Traceback (most recent call last):
[rank0]: File "/mnt/data/Pai-Megatron-Patch/examples/qwen2/pretrain_qwen.py", line 242, in
[rank0]: pretrain(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 326, in pretrain
[rank0]: iteration, num_floating_point_operations_so_far = train(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 1247, in train
[rank0]: loss_dict, skipped_iter, grad_norm, num_zeros_in_grad = train_step(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/training/training.py", line 688, in train_step
[rank0]: losses_reduced = forward_backward_func(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 1344, in forward_backward_pipelining_without_interleaving
[rank0]: output_tensor, num_tokens = forward_step(
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 219, in forward_step
[rank0]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/examples/qwen2/pretrain_qwen.py", line 166, in forward_step
[rank0]: output_tensor = model(tokens, position_ids, attention_mask, labels=labels)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/distributed/distributed_data_parallel.py", line 204, in forward
[rank0]: return self.module(*inputs, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/legacy/model/module.py", line 189, in forward
[rank0]: outputs = self.module(*inputs, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/megatron_patch/model/qwen2/model.py", line 188, in forward
[rank0]: decoder_input = self.embedding(input_ids=input_ids, position_ids=position_ids)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/models/common/embeddings/language_model_embedding.py", line 100, in forward
[rank0]: word_embeddings = self.word_embeddings(input_ids)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/layers.py", line 242, in forward
[rank0]: output = reduce_scatter_to_sequence_parallel_region(output_parallel)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 466, in reduce_scatter_to_sequence_parallel_region
[rank0]: return ReduceScatterToSequenceParallelRegion.apply(input)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 569, in apply
[rank0]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 319, in forward
[rank0]: return reduce_scatter_along_first_dim(input)
[rank0]: File "/mnt/data/Pai-Megatron-Patch/PAI-Megatron-LM-240718/megatron/core/tensor_parallel/mappings.py", line 135, in _reduce_scatter_along_first_dim
[rank0]: dim_size[0] % world_size == 0
[rank0]: AssertionError: First dimension of the tensor should be divisible by tensor parallel size [16383, 16, 3584]/2
I checked: dim_size is [16383, 16, 3584] and world_size is 2.
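For context, the assertion in `_reduce_scatter_along_first_dim` requires the first (sequence) dimension of the embedding output to be divisible by the tensor-parallel world size when sequence parallelism is enabled. A minimal, hypothetical illustration of the failing check using the values from the traceback (this is not the library code, just the arithmetic behind the error):

```python
# Values reported in the traceback: tensor shape [16383, 16, 3584], TP size 2.
seq_len, micro_batch, hidden = 16383, 16, 3584
world_size = 2  # tensor-parallel size

# With sequence parallelism, the [seq_len, micro_batch, hidden] tensor is
# reduce-scattered along dim 0, so seq_len must split evenly across TP ranks.
assert seq_len % world_size == 0, (
    "First dimension of the tensor should be divisible by tensor parallel size "
    f"[{seq_len}, {micro_batch}, {hidden}]/{world_size}"
)  # 16383 % 2 == 1, so this raises AssertionError
```

In other words, with a tensor-parallel size of 2 the effective sequence length needs to be even (e.g. 16384 rather than 16383) for the reduce-scatter to split the first dimension evenly.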