[feature] support Gemma2Model for tensor parallel training #6122

jing-4369 · 2024-11-09T13:13:50Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

fixed #6120

📝 What does this PR do?

Support Gemma2Model for tensor parallel training.

Also included is a small bug fix needed to run the llama model successfully.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Edenzzzz · 2024-11-11T18:40:44Z

Thanks for contributing! To add a new model, we will also need unit tests. Please refer to the existing tests and feel free to ping other team members.
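As a rough illustration of the property such unit tests usually check, the sketch below compares a full matrix multiply against a column-sharded one whose partial outputs are concatenated. It is plain Python, not the Colossal-AI test harness; `matmul`, `split_columns`, and `column_parallel_matmul` are hypothetical helper names introduced only for this example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (unsharded) matrix multiply: x is [m, k], w is [k, n]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w: Matrix, world_size: int) -> List[Matrix]:
    """Split the weight along its output (column) dimension, one shard per rank."""
    n_cols = len(w[0])
    if n_cols % world_size != 0:
        raise ValueError("number of columns must be divisible by world_size")
    step = n_cols // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def column_parallel_matmul(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Simulate column-parallel execution in a single process.

    Each simulated rank multiplies the full input by its own weight shard;
    the partial outputs are then concatenated along the last dimension,
    which stands in for an all-gather.
    """
    partials = [matmul(x, shard) for shard in split_columns(w, world_size)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
    # The sharded result should match the unsharded reference.
    assert column_parallel_matmul(x, w, world_size=2) == matmul(x, w)
    print("sharded output matches the unsharded reference")
```

A real test for a new model would run the actual sharded and unsharded models and compare outputs and gradients within a tolerance; this sketch only shows the equivalence being asserted.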

Edenzzzz · 2024-11-18T02:02:22Z

colossalai/shardformer/modeling/llama.py

+            attn_kwargs: torch.Tensor = self._update_causal_mask(
+                attention_mask, hidden_states, cache_position, past_key_values, output_attentions
+            )


We don't need this? The main branch seems to work without it.

jing-4369 · 2024-11-19

This can be removed here, but it relates to a separate bug: the current code does not work when training llama3, llama3.1, or llama3.2.

https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/llama/benchmark.py
I hope you can try this script with HybridParallelPlugin.

Edenzzzz · 2024-11-22

I'm not sure what you are referring to. Running colossalai run --nproc_per_node 2 --master_port 29501 benchmark.py -p 3d -b 1 -g --zero 2 (flash attn disabled, so execution enters this if branch) doesn't throw any error.
Are you using the right transformers version?
To justify such changes and save time, please provide a command that easily reproduces the error.
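Since the discussion turns on which library versions are installed, a reproduction report could include output like that of the sketch below. It uses only the standard library; the function name `collect_versions` and the distribution names in the list (`torch`, `transformers`, `colossalai`, `flash-attn`) are assumptions for illustration and may need adjusting for a given environment.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Return the installed version of each distribution, or 'not installed'."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Assumed distribution names; edit to match the environment under test.
    for name, version in collect_versions(
        ["torch", "transformers", "colossalai", "flash-attn"]
    ).items():
        print(f"{name}: {version}")
```

Posting this output alongside the exact launch command makes it easier for a reviewer to tell a version mismatch apart from a genuine bug.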

jing-4369 and others added 2 commits November 9, 2024 15:51

support gemma2

4389089

[pre-commit.ci] auto fixes from pre-commit.com hooks

753db97

for more information, see https://pre-commit.ci

jing-4369 requested a review from a team as a code owner November 9, 2024 13:13

Edenzzzz reviewed Nov 18, 2024
