pytorch/test/optim
janEbert b0708654c0 Implement NAdamW optimizer (#103881)
NAdamW, which is simply NAdam with the AdamW weight decay term, has shown strong performance in optimizer comparisons such as:
1. https://arxiv.org/abs/2211.09760
2. https://arxiv.org/abs/2306.07179

[The VeLO paper](https://arxiv.org/abs/2211.09760) argues its power lies in its ability to act as a superset of other popular optimizers.

This PR adds NAdamW by ~~copying and making very small adaptations to the NAdam implementation (just like AdamW and Adam). To see the small changes in better detail, you can `diff torch/optim/nadam.py torch/optim/nadamw.py`.~~ adding to NAdam a boolean flag `decoupled_weight_decay` (`False` by default) that activates NAdamW behavior.
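The behavioral difference the flag controls can be sketched in plain Python. This is a minimal illustration of where the weight-decay term enters, not PyTorch's actual NAdam math (the momentum schedule, betas, and bias correction are omitted): with coupled L2 decay the term `wd * theta` is folded into the gradient and rescaled by the adaptive denominator, while decoupled (AdamW-style) decay shrinks the parameter directly.

```python
import math

def adam_like_first_step(theta, grad, lr=0.1, wd=0.01, eps=1e-8,
                         decoupled=False):
    """One simplified first optimizer step (m = g, v = g**2, no betas),
    showing only where the weight-decay term enters the update."""
    if decoupled:
        # Decoupled (AdamW/NAdamW-style): decay the parameter directly,
        # outside the adaptive normalization.
        theta = theta - lr * wd * theta
        g = grad
    else:
        # Coupled (classic L2): fold decay into the gradient, so it is
        # rescaled by 1/sqrt(v) together with the rest of the update.
        g = grad + wd * theta
    m, v = g, g * g
    return theta - lr * m / (math.sqrt(v) + eps)

coupled = adam_like_first_step(1.0, 0.5)                  # ≈ 0.900
decoupled = adam_like_first_step(1.0, 0.5, decoupled=True)  # ≈ 0.899
print(coupled, decoupled)
```

With the normalization in play the two variants take different steps even on this single scalar; with plain SGD they would coincide. In terms of the API this PR adds, the decoupled branch corresponds to constructing `torch.optim.NAdam(params, decoupled_weight_decay=True)`.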

Interest in the optimizer has also been shown in the PyTorch forums:
https://discuss.pytorch.org/t/nadamw-and-demon-optimizers/179778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103881
Approved by: https://github.com/janeyx99
2023-07-24 19:29:26 +00:00
test_lrscheduler.py [BE] f-stringify torch/ and scripts (#105538) 2023-07-21 19:35:24 +00:00
test_optim.py Implement NAdamW optimizer (#103881) 2023-07-24 19:29:26 +00:00
test_swa_utils.py Dont run test files that are already run in test_optim (#103017) 2023-06-06 17:31:21 +00:00