Refactor: Addressing Sources of User Error #73

Open
wants to merge 2 commits into
base: main

thomasfortin1

I made two changes which should help future users implement MuP correctly:

Previously, a user could use mup.Adam, mup.AdamW, or mup.SGD (which are just the regular PyTorch optimizers) instead of the correct mup.MuAdam, mup.MuAdamW, or mup.MuSGD. The vanilla PyTorch optimizers can no longer be accessed accidentally through the mup package.
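For readers unfamiliar with how imported names can leak out of a package, here is a minimal sketch of one possible mechanism. It is not necessarily how this PR does it. The module names `vanilla_optim`, `leaky_optim`, and `guarded_optim` are hypothetical stand-ins built in memory, and the assumption is that the package re-exports its optimizer module with a star import.

```python
import sys
import types


def load_module(name, source):
    """Create an in-memory module from source and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_exports(name):
    """Return the public names bound by `from <name> import *`."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    return {n for n in namespace if not n.startswith("__")}


# Hypothetical stand-in for the vanilla optimizer library.
load_module("vanilla_optim", "class Adam:\n    pass\n")

# Hypothetical wrapper module: imports the vanilla class, defines a wrapper.
WRAPPERS = (
    "from vanilla_optim import Adam\n"
    "def MuAdam():\n"
    "    return 'scaled ' + Adam.__name__\n"
)

# Without __all__, the imported vanilla class leaks through a star import.
load_module("leaky_optim", WRAPPERS)
leaky = star_exports("leaky_optim")

# With __all__, only the listed wrapper is re-exported.
load_module("guarded_optim", WRAPPERS + "__all__ = ['MuAdam']\n")
guarded = star_exports("guarded_optim")

print(sorted(leaky))
print(sorted(guarded))
```

Note that `__all__` only controls star imports. The vanilla name would still be reachable as an attribute of the submodule itself, so importing it under a private alias is another option.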

If mup.MuAdam is used with weight decay, a warning now prompts the user to switch to mup.MuAdamW, which scales weight decay correctly as described in Appendix B.3 of the arXiv version of the paper. Note that a coord check will not flag MuAdam with weight decay as an incorrect implementation, but in my experience performance will still eventually degrade as model size increases unless MuAdamW is used instead.
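The kind of check described above could look roughly like the sketch below. The helper `check_weight_decay`, its arguments, and the warning text are all hypothetical and written only for illustration. They are not taken from the mup source, and the helper is not tied to any real optimizer class.

```python
import warnings


def check_weight_decay(optimizer_name, weight_decay=0.0):
    """Warn when MuAdam is requested with nonzero weight decay.

    Hypothetical helper; returns True if a warning was issued.
    """
    if optimizer_name == "MuAdam" and weight_decay != 0:
        warnings.warn(
            "MuAdam with weight_decay != 0 does not scale weight decay "
            "correctly; consider MuAdamW instead.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Record warnings so the behavior can be inspected without printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_adam = check_weight_decay("MuAdam", weight_decay=0.01)
    warned_adamw = check_weight_decay("MuAdamW", weight_decay=0.01)
    warned_no_decay = check_weight_decay("MuAdam", weight_decay=0.0)
```

A warning rather than an exception keeps existing scripts running while still pointing users toward MuAdamW.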

@thomasfortin1
Author

@microsoft-github-policy-service agree
