
Question about adapting the LLAMA 3.1 model #361

Open
echo-valor opened this issue Oct 14, 2024 · 1 comment

Comments

@echo-valor

Regarding PAI's current adaptation approach, I have a question: why not also adapt low_freq_factor / high_freq_factor?

  1. Take the llama 3.1-70b-base model as an example. Its config.json contains the following parameter settings:
    "rope_scaling": {
        "factor": 8.0,
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_max_position_embeddings": 8192,
        "rope_type": "llama3"
    },
    The low_freq_factor / high_freq_factor scaling here affects the positional encoding applied in attention: with high_freq_factor=4, the higher-frequency dimensions of the encoding will certainly be affected. How should these parameters be adapted when actually training a Llama 3.1 model? Any guidance would be appreciated.
@lostkevin
Contributor

We implemented this parameter internally, but did not expose an external interface for it. If you need it, please modify the corresponding code yourself; see

high_freq_factor = rotary_scaling_config.get("high_freq_factor", 4)
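For reference, the "llama3" rope_type rescales the base rotary frequencies in three bands: high-frequency dimensions are left untouched, low-frequency dimensions are divided by `factor`, and the band in between is interpolated smoothly. Below is a minimal sketch of that computation using the parameter names from the config.json above; the helper function name is hypothetical, not PAI's internal code.

```python
import math

def llama3_scale_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                          high_freq_factor=4.0,
                          original_max_position_embeddings=8192):
    """Apply the llama3-style RoPE rescaling to a list of inverse frequencies."""
    old_ctx = original_max_position_embeddings
    # Wavelengths longer than this are fully scaled by `factor`
    low_freq_wavelen = old_ctx / low_freq_factor
    # Wavelengths shorter than this are left unchanged
    high_freq_wavelen = old_ctx / high_freq_factor
    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            scaled.append(f)            # high-frequency dims: keep as-is
        elif wavelen > low_freq_wavelen:
            scaled.append(f / factor)   # low-frequency dims: scale down
        else:
            # mid band: interpolate between the two regimes
            smooth = (old_ctx / wavelen - low_freq_factor) / \
                     (high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled

# Base rotary frequencies for head dim 128, rope theta 500000 (Llama 3.1)
dim, theta = 128, 500000.0
inv_freq = [theta ** (-2 * i / dim) for i in range(dim // 2)]
inv_freq_scaled = llama3_scale_inv_freq(inv_freq)
```

With these defaults, the fastest-rotating dimensions pass through unchanged while the slowest are divided by 8, which is exactly why a high_freq_factor of 4 shifts where the "untouched" band ends rather than scaling the high-frequency encodings themselves.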
