Add special logic for 'step' in _optimizer_to_device #20019
Conversation
Here is the performance information when using the test code from issue #19955 and resuming from a checkpoint. With the old code, many memory synchronizations are forced; with the update that keeps 'step' as-is, this issue is removed.
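For illustration, here is a minimal sketch of the idea, assuming a simplified `_optimizer_to_device` that only handles flat tensor state (the actual Lightning helper also handles nested collections and custom, non-dict optimizer state):

```python
import torch
from torch.optim import Optimizer


def _optimizer_to_device(optimizer: Optimizer, device: torch.device) -> None:
    """Move optimizer state tensors to `device`, but leave 'step' where it is."""
    for _, state in optimizer.state.items():
        for key, value in state.items():
            # 'step' is intentionally left on its original device (usually a
            # CPU scalar tensor). Moving it to the GPU would force a
            # host-device synchronization on every optimizer.step();
            # see pytorch/pytorch#74424.
            if key != "step" and isinstance(value, torch.Tensor):
                state[key] = value.to(device)
```

Because 'step' stays a CPU scalar tensor, `optimizer.step()` can read and increment it without synchronizing with the GPU.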
Codecov Report: All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##           master   #20019   +/-   ##
=======================================
  Coverage      89%      89%
=======================================
  Files         267      267
  Lines       23071    23076     +5
=======================================
+ Hits        20572    20577     +5
  Misses       2499     2499
@corwinjoy Thanks for the PR. I added the missing tests to cover the change. A previous test allows dataclasses as part of the optimizer state, so I had to keep that functionality. With that, I think the PR is ready to land.
Co-authored-by: Adrian Wälchli <[email protected]>
Fix a performance degradation when restoring the optimizer state from a checkpoint.

Fixes #19955

This fix addresses the issue discussed in #19955, which stems from a related issue in PyTorch: pytorch/pytorch#74424

This change could also use a test that checks for continued performance, but I'm not sure how to write one; a rough sketch of a device-placement check is given below. On a dedicated GPU the transfer time is negligible; this really becomes an issue when the GPU is shared or has more of a transfer bottleneck.
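A timing benchmark would be flaky, but the behavior could perhaps be approximated by asserting device placement instead. A hypothetical sketch, assuming the simplified `_optimizer_to_device` above and a recent PyTorch version in which Adam stores 'step' as a CPU scalar tensor:

```python
import pytest
import torch


@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a CUDA device")
def test_optimizer_to_device_keeps_step_on_cpu():
    model = torch.nn.Linear(2, 2)
    optimizer = torch.optim.Adam(model.parameters())
    model(torch.randn(4, 2)).sum().backward()
    optimizer.step()  # populate state: exp_avg, exp_avg_sq, and a CPU 'step' tensor

    _optimizer_to_device(optimizer, torch.device("cuda"))

    for state in optimizer.state.values():
        # The moments should move to the GPU, while 'step' stays behind
        # so that optimizer.step() does not trigger a host-device sync.
        assert state["exp_avg"].device.type == "cuda"
        assert state["step"].device.type == "cpu"
```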
📚 Documentation preview 📚: https://pytorch-lightning--20019.org.readthedocs.build/en/20019/