
Error in processing while using gen_text_plus_code_data.py #13

Open
mohitgureja opened this issue Jul 18, 2022 · 2 comments

Comments

@mohitgureja

Hello Dr. Darabi,

I am trying to run the gen_text_plus_code_data.py file to generate the dictionary containing patient IDs.

python gen_text_plus_code_data.py -p '<path-to-where-df-were-saved>/df_less_1.pkl' -s '<path-to-save-output>' -et -cpb './data/biobert_pubmed/bert_config.json' -sdp './data/biobert_pubmed_model' -vpb './data/biobert_pubmed/vocab.txt' -bsl 512 -diag -proc -med -cpt -sc -vp <path-to-medical-code-vocabulary>

I am using Google Colab Pro with a GPU, but it is still taking a very long time. I have run it for more than 15 hours, but it has not completed and still shows 0%. Can you please help us execute this step?

Thank you in advance!

Best regards,
Mohit Gureja


primus852 commented Oct 5, 2022

+1
I ran it locally on a GTX 1660 Ti, but it estimates approximately 1,900 hours to finish. Is that correct? I know the 1660 is not an ML GPU, but that still seems like a lot.

(screenshot of the estimated runtime)

Is there any way to speed this up? Installing apex did not change anything.

Also, during training it does not seem to use the GPU at all.

(screenshot of GPU utilization)

However, print('CUDA', torch.cuda.is_available(), torch.cuda.get_device_name(0)) outputs CUDA True NVIDIA GeForce GTX 1660 Ti. Any ideas?
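Note that torch.cuda.is_available() only reports that a CUDA device exists; it says nothing about whether the script actually places tensors on it. A minimal sketch (variable names are illustrative, not from the script) that distinguishes "CUDA present" from "CUDA in use":

```python
import torch

# CUDA being available does not mean it is being used:
# tensors live on the CPU unless moved explicitly.
print("CUDA available:", torch.cuda.is_available())

x = torch.randn(4, 4)   # created on the CPU by default
print(x.device)         # -> cpu

if torch.cuda.is_available():
    y = x.to("cuda")    # explicitly moved to the GPU
    print(y.device)
    # nonzero allocated device memory confirms the GPU is really in use
    print(torch.cuda.memory_allocated() > 0)
```

If every tensor in the script prints `cpu`, the GPU will sit idle no matter what torch.cuda.is_available() says.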

@primus852

@mohitgureja I fixed this by adding the flag -gpu 1. After some investigation I found that the GPU was not used at all when this flag is unset, since it defaults to 0. It reduced the estimated training time from ~1,900 to ~80 hours, which seems reasonable.
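A flag that defaults to CPU-only is a common argparse pattern. A hypothetical sketch of the behavior described above (the actual argument names and logic inside gen_text_plus_code_data.py may differ):

```python
import argparse
import torch

# Hypothetical reconstruction of the -gpu flag's effect, for illustration.
parser = argparse.ArgumentParser()
parser.add_argument("-gpu", type=int, default=0,
                    help="number of GPUs to use (0 means CPU only)")
args = parser.parse_args(["-gpu", "1"])

# With the default of 0, a script like this silently falls back to the
# CPU even when torch.cuda.is_available() is True -- which matches the
# ~1900h estimate above.
use_cuda = args.gpu > 0 and torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print("running on:", device)
```

The key point is the `default=0`: unless the flag is passed explicitly, `use_cuda` evaluates to False regardless of the hardware.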
