AssertionError: First dimension of the tensor should be divisible by tensor parallel size #332

Open
pizts opened this issue Sep 4, 2024 · 0 comments

pizts commented Sep 4, 2024

Traceback (most recent call last):
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/examples/qwen2/pretrain_qwen.py", line 242, in <module>
[rank0]:     pretrain(
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 326, in pretrain
[rank0]:     iteration, num_floating_point_operations_so_far = train(
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 1247, in train
[rank0]:     loss_dict, skipped_iter, grad_norm, num_zeros_in_grad = train_step(
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 688, in train_step
[rank0]:     losses_reduced = forward_backward_func(
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 1344, in forward_backward_pipelining_without_interleaving
[rank0]:     output_tensor, num_tokens = forward_step(
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 219, in forward_step
[rank0]:     output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/examples/qwen2/pretrain_qwen.py", line 166, in forward_step
[rank0]:     output_tensor = model(tokens, position_ids, attention_mask, labels=labels)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/distributed/distributed_data_parallel.py", line 204, in forward
[rank0]:     return self.module(*inputs, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/legacy/model/module.py", line 189, in forward
[rank0]:     outputs = self.module(*inputs, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/megatron_patch/model/qwen2/model.py", line 188, in forward
[rank0]:     decoder_input = self.embedding(input_ids=input_ids, position_ids=position_ids)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/models/common/embeddings/language_model_embedding.py", line 100, in forward
[rank0]:     word_embeddings = self.word_embeddings(input_ids)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/layers.py", line 242, in forward
[rank0]:     output = reduce_scatter_to_sequence_parallel_region(output_parallel)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 466, in reduce_scatter_to_sequence_parallel_region
[rank0]:     return _ReduceScatterToSequenceParallelRegion.apply(input_)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 569, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 319, in forward
[rank0]:     return _reduce_scatter_along_first_dim(input_)
[rank0]:   File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 135, in _reduce_scatter_along_first_dim
[rank0]:     dim_size[0] % world_size == 0
[rank0]: AssertionError: First dimension of the tensor should be divisible by tensor parallel size [16383, 16, 3584]/2

I checked: dim_size is [16383, 16, 3584] and world_size is 2.
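To illustrate why this fires: with sequence parallelism enabled, the embedding output is reduce-scattered along its first (sequence) dimension across the tensor-parallel ranks, so that dimension must be divisible by the tensor-parallel size. Here the sequence length 16383 is odd and TP size is 2. Below is a minimal sketch of the check (the real one lives in `megatron/core/tensor_parallel/mappings.py`; the helper name `check_reduce_scatter_shape` is made up for illustration):

```python
def check_reduce_scatter_shape(tensor_shape, tp_world_size):
    """Sketch of the divisibility assertion in _reduce_scatter_along_first_dim.

    Reduce-scatter splits the tensor along dim 0 (the sequence dimension
    under sequence parallelism), so dim 0 must divide evenly across the
    tensor-parallel ranks.
    """
    assert tensor_shape[0] % tp_world_size == 0, (
        "First dimension of the tensor should be divisible by tensor parallel size"
    )


# Failing case from the traceback: 16383 % 2 != 0 -> AssertionError.
# A sequence length that is a multiple of the TP size (e.g. 16384) passes.
```

So the practical fix is to use a sequence length (after any shifting/truncation the data pipeline does) that is a multiple of the tensor-parallel size when sequence parallelism is on.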
