
Remove the optimizer_to_device logic if possible #20165

Open
awaelchli opened this issue Aug 5, 2024 · 3 comments

awaelchli commented Aug 5, 2024

Outline & Motivation

The trainer uses the function _optimizers_to_device here:

if trainer.state.fn == TrainerFn.FITTING:
    _optimizers_to_device(self.optimizers, self.root_device)

In #19955 an issue was raised that the function moved the "step" parameter in the optimizer state to the CUDA device, causing device-to-host syncs during optimizer.step() because the "step" tensor was expected to remain on CPU. #20019 fixed this with special treatment of that key. However, good arguments were made in #19955 that this optimizer_to_device shouldn't even be necessary in the first place (#19955 (comment)).
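For context, the special treatment from #20019 can be sketched roughly as follows. This is a simplified illustration in plain PyTorch, not Lightning's actual code, and optimizers_to_device_sketch is a hypothetical name:

```python
import torch

def optimizers_to_device_sketch(optimizers, device):
    # Hedged sketch of the behavior after the #20019 fix, not Lightning's
    # actual implementation: move per-parameter optimizer state tensors to
    # `device`, but leave the "step" counter where it is, since moving it to
    # CUDA forces a device-to-host sync each time optimizer.step() reads it.
    for optimizer in optimizers:
        for state in optimizer.state.values():
            for key, value in state.items():
                if isinstance(value, torch.Tensor) and key != "step":
                    state[key] = value.to(device)

model = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(8, 4)).sum().backward()
opt.step()  # materializes exp_avg, exp_avg_sq and the "step" counter

target = torch.device("cuda" if torch.cuda.is_available() else "cpu")
optimizers_to_device_sketch([opt], target)
```

After this, the moment-estimate tensors live on the target device while "step" stays put, avoiding the sync described above.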

Pitch

Remove optimizer_to_device and show that it is redundant by running the tests. We will still need an optimizer_to_cpu for teardown.

Additional context

No response

cc @justusschock @awaelchli @Borda

@awaelchli added the refactor, performance, and checkpointing (Related to checkpointing) labels and removed the needs triage (Waiting to be triaged by maintainers) label on Aug 5, 2024

corwinjoy commented Aug 5, 2024

Thanks @awaelchli .
The analysis I have done on the code seems to agree that we can remove this function (see #19955 (comment)). The remaining points I think are of note:

  1. This function is also called by Strategy.teardown(), where the idea is to transfer the optimizer state back from the GPU to the CPU. Maybe we don't really care about this, since once the fit is complete the optimizer will either be checkpointed or discarded. The question is whether users really depend on this final transfer behavior, but I would guess not. This factors into whether we feel we need optimizer_to_cpu for teardown, as mentioned above.
  2. I created additional tests in Test optimizer to device #20062 that I think are helpful in verifying that the function is not needed. I believe that some of these may be helpful in a PR to remove this function.
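The redundancy argument rests on documented torch.optim behavior: Optimizer.load_state_dict already casts per-parameter state onto each parameter's device when a checkpoint is restored. A minimal sketch in plain PyTorch, independent of Lightning:

```python
import torch

# Sketch supporting the redundancy argument from #19955: PyTorch's
# Optimizer.load_state_dict casts per-parameter state to each
# parameter's device, so state restored from a checkpoint ends up on
# the right device without an explicit _optimizers_to_device call.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(8, 4)).sum().backward()
opt.step()  # materializes momentum_buffer on CPU

state_dict = opt.state_dict()  # checkpoint-style round trip

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
restored = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
restored.load_state_dict(state_dict)

# The restored momentum buffer matches the parameters' device.
buffer = next(iter(restored.state.values()))["momentum_buffer"]
```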


corwinjoy commented Aug 5, 2024

Also tagging @janeyx99 for input.

janeyx99 commented

  1. I am likely not the right person to respond to this question, but my instinct agrees with yours and doesn't think moving the optimizer back to CPU needs to be part of the teardown. I'd defer to others on this, though.
  2. I reviewed the part of the tests that move the state_dict back and forth. Left my comments on the PR!


3 participants