
Question about adapting the LLAMA 3.1 model #361

Open
echo-valor opened this issue Oct 14, 2024 · 1 comment

Comments

@echo-valor

Regarding PAI's current adaptation approach, I have a question: why not also adapt low_freq_factor / high_freq_factor?

  1. Take the llama 3.1-70b-base model as an example. Its config.json contains the following parameter settings:
    "rope_scaling": {
        "factor": 8.0,
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_max_position_embeddings": 8192,
        "rope_type": "llama3"
    },
    The low_freq_factor / high_freq_factor scaling here affects the positional encoding applied in attention: with high_freq_factor=4, the higher-frequency dimensions of the encoding will certainly be affected. How should these parameters be adapted when actually training a Llama 3.1 model? Any guidance would be appreciated.
@lostkevin
Contributor

We implemented this parameter internally, but did not expose an external interface for it. If you need it, please modify the corresponding code yourself; see

high_freq_factor = rotary_scaling_config.get("high_freq_factor", 4)
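For reference, the "llama3" rope_type rescales the base rotary frequencies in three bands: high-frequency dimensions are left untouched, low-frequency dimensions are divided by `factor`, and the band in between is interpolated smoothly. Below is a minimal sketch of that computation using the parameter names from the config.json above; the helper function name is hypothetical, not PAI's internal code.

```python
import math

def llama3_scale_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                          high_freq_factor=4.0,
                          original_max_position_embeddings=8192):
    """Apply the llama3-style RoPE rescaling to a list of inverse frequencies."""
    old_ctx = original_max_position_embeddings
    # Wavelengths longer than this are fully scaled by `factor`
    low_freq_wavelen = old_ctx / low_freq_factor
    # Wavelengths shorter than this are left unchanged
    high_freq_wavelen = old_ctx / high_freq_factor
    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            scaled.append(f)            # high-frequency dims: keep as-is
        elif wavelen > low_freq_wavelen:
            scaled.append(f / factor)   # low-frequency dims: scale down
        else:
            # mid band: interpolate between the two regimes
            smooth = (old_ctx / wavelen - low_freq_factor) / \
                     (high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled

# Base rotary frequencies for head dim 128, rope theta 500000 (Llama 3.1)
dim, theta = 128, 500000.0
inv_freq = [theta ** (-2 * i / dim) for i in range(dim // 2)]
inv_freq_scaled = llama3_scale_inv_freq(inv_freq)
```

With these defaults, the fastest-rotating dimensions pass through unchanged while the slowest are divided by 8, which is exactly why a high_freq_factor of 4 shifts where the "untouched" band ends rather than scaling the high-frequency encodings themselves.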
