RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331 #10

ChesterXi · 2021-06-20T00:26:35Z

Why Runtime Error
Enviroment：
RTX3090
CUDA：
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Error：
06/19/2021 16:37:19 - INFO - main - device: cuda, n_gpu: 2, 16-bits training: False
06/19/2021 16:51:17 - INFO - main - Start epoch #0 (lr = 4e-05)...
Traceback (most recent call last):
File "code/run_trigger_qa.py", line 629, in
main(args)
File "code/run_trigger_qa.py", line 480, in main
loss = model(input_ids, token_type_ids = segment_ids, attention_mask = input_mask, labels = labels)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 1198, in forward
sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 734, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 411, in forward
hidden_states = layer_module(hidden_states, attention_mask)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 396, in forward
attention_output = self.attention(hidden_states, attention_mask)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 354, in forward
self_output = self.self(input_tensor, attention_mask)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 311, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331

Tibony · 2021-07-02T08:12:05Z

我也在跑这个。怎么可以联系上你
QQ：2411093921
微信：17392718405

YX-ZL · 2023-10-25T12:40:42Z

Check if your CUDA version matches Torch

kissaxin572 · 2024-05-19T15:05:27Z

Is the problem solved? I have the same error and i have checked my environment which is consistent with the requirements.txt. Could anyone give me a hand. Thank you very much!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331 #10

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331 #10

ChesterXi commented Jun 20, 2021

Tibony commented Jul 2, 2021

YX-ZL commented Oct 25, 2023

kissaxin572 commented May 19, 2024

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331 #10

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331 #10

Comments

ChesterXi commented Jun 20, 2021

Tibony commented Jul 2, 2021

YX-ZL commented Oct 25, 2023

kissaxin572 commented May 19, 2024