Traceback (most recent call last):
[rank0]: File "/mnt/data/Pai-Megatron-Patch/examples/qwen2/pretrain_qwen.py", line 242, in
[rank0]: pretrain(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 326, in pretrain
[rank0]: iteration, num_floating_point_operations_so_far = train(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 1247, in train
[rank0]: loss_dict, skipped_iter, grad_norm, num_zeros_in_grad = train_step(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 688, in train_step
[rank0]: losses_reduced = forward_backward_func(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 1344, in forward_backward_pipelining_without_interleaving
[rank0]: output_tensor, num_tokens = forward_step(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 219, in forward_step
[rank0]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/examples/qwen2/pretrain_qwen.py", line 166, in forward_step
[rank0]: output_tensor = model(tokens, position_ids, attention_mask, labels=labels)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/distributed/distributed_data_parallel.py", line 204, in forward
[rank0]: return self.module(*inputs, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/legacy/model/module.py", line 189, in forward
[rank0]: outputs = self.module(*inputs, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/megatron_patch/model/qwen2/model.py", line 188, in forward
[rank0]: decoder_input = self.embedding(input_ids=input_ids, position_ids=position_ids)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/models/common/embeddings/language_model_embedding.py", line 100, in forward
[rank0]: word_embeddings = self.word_embeddings(input_ids)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/layers.py", line 242, in forward
[rank0]: output = reduce_scatter_to_sequence_parallel_region(output_parallel)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 466, in reduce_scatter_to_sequence_parallel_region
[rank0]: return ReduceScatterToSequenceParallelRegion.apply(input)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 569, in apply
[rank0]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 319, in forward
[rank0]: return reduce_scatter_along_first_dim(input)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 135, in _reduce_scatter_along_first_dim
[rank0]: dim_size[0] % world_size == 0
[rank0]: AssertionError: First dimension of the tensor should be divisible by tensor parallel size [16383, 16, 3584]/2
I checked: dim_size is [16383, 16, 3584] and world_size is 2.
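For context, the assertion in `_reduce_scatter_along_first_dim` requires the first (sequence) dimension of the embedding output to be divisible by the tensor-parallel world size when sequence parallelism is enabled. A minimal, hypothetical illustration of the failing check using the values from the traceback (this is not the library code, just the arithmetic behind the error):

```python
# Values reported in the traceback: tensor shape [16383, 16, 3584], TP size 2.
seq_len, micro_batch, hidden = 16383, 16, 3584
world_size = 2  # tensor-parallel size

# With sequence parallelism, the [seq_len, micro_batch, hidden] tensor is
# reduce-scattered along dim 0, so seq_len must split evenly across TP ranks.
assert seq_len % world_size == 0, (
    "First dimension of the tensor should be divisible by tensor parallel size "
    f"[{seq_len}, {micro_batch}, {hidden}]/{world_size}"
)  # 16383 % 2 == 1, so this raises AssertionError
```

In other words, with a tensor-parallel size of 2 the effective sequence length needs to be even (e.g. 16384 rather than 16383) for the reduce-scatter to split the first dimension evenly.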