AssertionError for Phi-3.5-mini-instruct and Qwen2.5-7B-Instruct with NeMo + ThunderFX #1476
Comments
Here is a text file with the full traceback: full_traceback.txt
I think the cause is PR #1437. It relies on a PyTorch bug fix, pytorch/pytorch#139275, which is probably only available in the PyTorch nightly builds.
The error is fixed only in the latest PyTorch (Nov 1st or later, pytorch/pytorch@0cf4cc3). What's the PyTorch version used in
The functionality added in #1437 is not (yet) a blocker for our Q4 goals. I recommend a workaround that simply disables the functionality when/if PyTorch is too old.
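A minimal sketch of such a version gate, written as a pure string check so it can be applied to torch.__version__. The helper name torch_has_fix and both thresholds are assumptions: the thread only says that builds from Nov 1st onward contain the fix, so the year (2024), the ".devYYYYMMDD" nightly suffix, and 2.6 as the first stable release carrying the fix are not confirmed here.

```python
import re

# Assumptions, not confirmed in this thread: the fix first ships in stable 2.6,
# and nightly builds carry a ".devYYYYMMDD" suffix dated 2024-11-01 or later.
ASSUMED_MIN_RELEASE = (2, 6)
ASSUMED_MIN_NIGHTLY_DATE = 20241101


def torch_has_fix(version: str) -> bool:
    """Return True if `version` (e.g. torch.__version__) looks new enough."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        # Unknown format: be conservative and keep the feature disabled.
        return False
    release = (int(match.group(1)), int(match.group(2)))

    nightly = re.search(r"\.dev(\d{8})", version)
    if nightly is not None:
        # A nightly of X.Y precedes the X.Y release, so compare the build date
        # when the major/minor version equals the assumed minimum.
        if release != ASSUMED_MIN_RELEASE:
            return release > ASSUMED_MIN_RELEASE
        return int(nightly.group(1)) >= ASSUMED_MIN_NIGHTLY_DATE

    return release >= ASSUMED_MIN_RELEASE
```

The functionality from #1437 could then be skipped whenever torch_has_fix(torch.__version__) returns False. Vendor builds with unusual version strings may need extra handling.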
It is old:
Sure, if we need to make it work with older PyTorch, we can do that. A workaround could be to iterate over all submodules returned in lightning-thunder/thunder/dynamo/splitter.py (lines 134 to 137 at cd6977d) and add an output node to every submodule that is missing one. @kshitij12345, does this sound like a correct workaround?
Yes, I think that should work.
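The workaround discussed above could be sketched as follows. This is not the code in splitter.py: the helper name add_missing_output_nodes is hypothetical, the helper only relies on the torch.fx GraphModule interface (graph.nodes, graph.output, recompile), and returning None from a patched submodule is an assumption about what an output-less submodule should produce.

```python
from typing import TYPE_CHECKING, Iterable

if TYPE_CHECKING:
    # Only needed for type checking; the helper itself is duck-typed.
    from torch.fx import GraphModule


def add_missing_output_nodes(submodules: Iterable["GraphModule"]) -> int:
    """Append an output node to every submodule whose graph lacks one.

    Returns the number of submodules that were patched.
    """
    patched = 0
    for gm in submodules:
        if any(node.op == "output" for node in gm.graph.nodes):
            continue  # graph is already well-formed
        # Assumption: a submodule without an output has nothing to return,
        # so the added output node returns None.
        gm.graph.output(None)
        # Regenerate the module's forward() from the modified graph.
        gm.recompile()
        patched += 1
    return patched
```

It would be called on the submodules produced by the splitter, before they are handed to Thunder or to the fallback compiler. On a PyTorch build that already contains the fix it should be a no-op and return 0.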
🐛 Bug
When running Phi-3.5-mini-instruct and Qwen2.5-7B-Instruct with NeMo + ThunderFX, we get an AssertionError:
(I'll add a file with the full traceback.)
To Reproduce
The error is present on 1xH100.
Dockerfile used (I built it yesterday and I'm not yet sure how nemo:dev images are versioned, so I can't provide a detailed version):
Inside the Docker container, please run:
The script bench_targets/llm_peft/_nemo.py can be obtained from the internal GitLab, from akoumparouli/nemo_bench. You can contact me or @tfogal if you have any questions.
You can check that the command below works:
Expected behavior
No error for Thunder.
Environment
cc @tfogal