Add special logic for 'step' in _optimizer_to_device #20019
Conversation
Here is the performance information when using the test code from issue #19955 and resuming from a checkpoint. With the old code, many memory synchronizations are forced; with the update that keeps 'step' as-is, this issue is removed.
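For illustration, here is a minimal sketch of the idea, assuming a simplified `_optimizer_to_device` that only handles flat tensor state (the actual Lightning helper also handles nested collections and custom, non-dict optimizer state):

```python
import torch
from torch.optim import Optimizer


def _optimizer_to_device(optimizer: Optimizer, device: torch.device) -> None:
    """Move optimizer state tensors to `device`, but leave 'step' where it is."""
    for _, state in optimizer.state.items():
        for key, value in state.items():
            # 'step' is intentionally left on its original device (usually a
            # CPU scalar tensor). Moving it to the GPU would force a
            # host-device synchronization on every optimizer.step();
            # see pytorch/pytorch#74424.
            if key != "step" and isinstance(value, torch.Tensor):
                state[key] = value.to(device)
```

Because 'step' stays a CPU scalar tensor, `optimizer.step()` can read and increment it without synchronizing with the GPU.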
Codecov Report: All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##           master   #20019   +/-   ##
=======================================
  Coverage      89%      89%
=======================================
  Files         267      267
  Lines       23071    23076     +5
=======================================
+ Hits        20572    20577     +5
  Misses       2499     2499
@corwinjoy Thanks for the PR. I added the missing tests to cover the change. A previous test allows dataclasses as part of the optimizer state, so I had to keep that functionality. With that, I think the PR is ready to land.
Co-authored-by: Adrian Wälchli <[email protected]>
Fix a performance degradation when restoring the optimizer state from a checkpoint.

Fixes #19955

This fix addresses the issue discussed in #19955, which stems from a related issue in PyTorch: pytorch/pytorch#74424

This change could also use a test that checks for continued performance, but I'm not sure how to write one; a rough sketch of a device-placement check is given below. On a dedicated GPU the transfer time is negligible; this really becomes an issue when the GPU is shared or has more of a transfer bottleneck.
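A timing benchmark would be flaky, but the behavior could perhaps be approximated by asserting device placement instead. A hypothetical sketch, assuming the simplified `_optimizer_to_device` above and a recent PyTorch version in which Adam stores 'step' as a CPU scalar tensor:

```python
import pytest
import torch


@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a CUDA device")
def test_optimizer_to_device_keeps_step_on_cpu():
    model = torch.nn.Linear(2, 2)
    optimizer = torch.optim.Adam(model.parameters())
    model(torch.randn(4, 2)).sum().backward()
    optimizer.step()  # populate state: exp_avg, exp_avg_sq, and a CPU 'step' tensor

    _optimizer_to_device(optimizer, torch.device("cuda"))

    for state in optimizer.state.values():
        # The moments should move to the GPU, while 'step' stays behind
        # so that optimizer.step() does not trigger a host-device sync.
        assert state["exp_avg"].device.type == "cuda"
        assert state["step"].device.type == "cpu"
```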
📚 Documentation preview 📚: https://pytorch-lightning--20019.org.readthedocs.build/en/20019/