
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331 #10

Open
ChesterXi opened this issue Jun 20, 2021 · 3 comments

Comments

@ChesterXi

Why does this RuntimeError occur?
Environment:
RTX3090
CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
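The nvcc output above reports the CUDA toolkit installed on the system. The PyTorch binaries from pip/conda ship their own CUDA runtime, so that toolkit version is not necessarily what PyTorch uses. Below is a minimal sketch of printing what PyTorch itself sees. It only assumes that `torch` is importable, and `describe_torch_env` is a hypothetical helper name.

```python
def describe_torch_env():
    """Return a dict describing the CUDA setup as seen by PyTorch, or None if torch is missing."""
    try:
        import torch
        import torch.backends.cudnn as cudnn
    except ImportError:
        return None
    info = {
        "torch": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
        "torch_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "cudnn": cudnn.version() if cudnn.is_available() else None,
        "devices": [],
    }
    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            info["devices"].append({
                "index": i,
                "name": torch.cuda.get_device_name(i),
                # (major, minor) compute capability, e.g. (8, 6) for an RTX 3090
                "capability": torch.cuda.get_device_capability(i),
            })
    return info


if __name__ == "__main__":
    print(describe_torch_env())
```

Compare `torch_cuda` and each device's `capability` with the GPU in use. The nvcc version matters only when building CUDA extensions from source.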

Error:
06/19/2021 16:37:19 - INFO - __main__ - device: cuda, n_gpu: 2, 16-bits training: False
06/19/2021 16:51:17 - INFO - __main__ - Start epoch #0 (lr = 4e-05)...
Traceback (most recent call last):
  File "code/run_trigger_qa.py", line 629, in <module>
    main(args)
  File "code/run_trigger_qa.py", line 480, in main
    loss = model(input_ids, token_type_ids = segment_ids, attention_mask = input_mask, labels = labels)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 1198, in forward
    sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 734, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 411, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 396, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 354, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 311, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331
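The failing call is the attention-score `torch.matmul(query_layer, key_layer.transpose(-1, -2))`, which PyTorch dispatches to cuBLAS. One way to check whether cuBLAS works on this GPU at all, independent of the model code and of DataParallel, is to run the same kind of batched matmul on its own and compare it with a CPU result. The sketch below assumes BERT-base-like shapes (12 heads, head size 64). The shapes and the helper name `attention_matmul_check` are illustrative, not taken from the repository. Setting `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized makes kernel errors surface at the call that caused them.

```python
import os

# Report CUDA errors synchronously; only takes effect if set before CUDA is initialized.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

try:
    import torch
except ImportError:  # torch not installed: nothing to check
    torch = None


def attention_matmul_check(device="cuda", batch=2, heads=12, seq_len=128, head_dim=64):
    """Compute Q @ K^T on `device` and compare it with the CPU result.

    Returns True/False for match/mismatch, or None if torch or the device is unavailable.
    Backend failures (e.g. a cuBLAS RuntimeError) propagate to the caller.
    """
    if torch is None:
        return None
    if device.startswith("cuda") and not torch.cuda.is_available():
        return None
    torch.manual_seed(0)
    query = torch.randn(batch, heads, seq_len, head_dim)
    key = torch.randn(batch, heads, seq_len, head_dim)
    expected = torch.matmul(query, key.transpose(-1, -2))  # CPU reference
    result = torch.matmul(query.to(device), key.to(device).transpose(-1, -2))
    # Loose tolerance: some PyTorch versions use TF32 for float32 matmul on Ampere GPUs.
    return torch.allclose(result.cpu(), expected, rtol=1e-2, atol=5e-2)


if __name__ == "__main__":
    print(attention_matmul_check("cuda"))
```

If `attention_matmul_check("cuda")` raises the same cuBLAS error, the problem most likely lies in the PyTorch/CUDA installation rather than in `run_trigger_qa.py`.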

@Tibony

Tibony commented Jul 2, 2021

I'm also running this. How can I get in touch with you?
QQ: 2411093921
WeChat: 17392718405

@YX-ZL

YX-ZL commented Oct 25, 2023

Check whether your GPU and CUDA version match the CUDA version your PyTorch build was compiled against.
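In this report, the likely mismatch is between the GPU and the PyTorch build's CUDA version. The RTX 3090 is an Ampere card with compute capability 8.6, and native support for sm_86 first appeared in CUDA 11.1. A PyTorch binary built against CUDA 9.x or 10.x is therefore a likely cause of cuBLAS failures on this card. Below is a minimal sketch of that check. The table covers only a few architectures, with minimum versions taken from NVIDIA's CUDA release notes; verify it before relying on it. `MIN_CUDA_FOR_ARCH`, `parse_version` and `cuda_supports_arch` are hypothetical names.

```python
# Minimum CUDA toolkit version that added native support for each compute
# capability (from NVIDIA's CUDA release notes; verify before relying on it).
MIN_CUDA_FOR_ARCH = {
    (6, 0): (8, 0),   # Pascal (P100)
    (6, 1): (8, 0),   # Pascal (GTX 10xx)
    (7, 0): (9, 0),   # Volta (V100)
    (7, 5): (10, 0),  # Turing (RTX 20xx)
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30xx, e.g. RTX 3090)
}


def parse_version(text):
    """Turn a version string such as "10.2" or "11.1.74" into (major, minor)."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_supports_arch(cuda_version, capability):
    """Return True/False if the capability is in the table, otherwise None."""
    required = MIN_CUDA_FOR_ARCH.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    # Use the CUDA version PyTorch was built with, not the nvcc version.
    if torch is not None and torch.cuda.is_available() and torch.version.cuda:
        cap = torch.cuda.get_device_capability(0)
        ok = cuda_supports_arch(torch.version.cuda, cap)
        print(f"torch CUDA {torch.version.cuda}, GPU capability {cap}: supported={ok}")
```

For the GPU in this issue, `cuda_supports_arch("10.2", (8, 6))` returns False and `cuda_supports_arch("11.1", (8, 6))` returns True. On an RTX 3090 the usual remedy is a PyTorch build compiled against CUDA 11.1 or newer.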

@kissaxin572

Has this been solved? I get the same error, and I have checked that my environment matches requirements.txt. Could anyone help? Thank you very much!
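Matching requirements.txt only pins package versions. It does not guarantee that the installed PyTorch binary was built for the GPU in use, so an environment can match the file exactly and still fail on an Ampere card. Below is a small sketch that compares `name==version` pins against what is installed, using the standard library's `importlib.metadata` (Python 3.8+). The helper name `check_pins` is hypothetical, and the parser only handles plain `==` pins.

```python
import os
from importlib import metadata


def check_pins(lines):
    """Compare `name==version` pins with installed distributions.

    Returns {name: (pinned, installed_or_None)} for every pin that does not match.
    Comments, environment markers and non-`==` requirements are ignored.
    """
    mismatches = {}
    for raw in lines:
        # Drop trailing comments and environment markers such as "; python_version < '3.8'".
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__" and os.path.exists("requirements.txt"):
    with open("requirements.txt") as fh:
        for name, (pinned, installed) in check_pins(fh).items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

A local version suffix such as `+cu92` on an installed torch is reported as a mismatch. That is deliberate: the suffix shows which CUDA build is installed.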
