NAdamW, which is simply NAdam with the AdamW weight decay term, has shown strong performance in optimizer comparisons such as:

1. https://arxiv.org/abs/2211.09760
2. https://arxiv.org/abs/2306.07179

[The VeLO paper](https://arxiv.org/abs/2211.09760) argues that its power lies in its ability to act as a superset of other popular optimizers.

This PR adds NAdamW by ~~copying and making very small adaptations to the NAdam implementation (just like AdamW and Adam). To see the small changes in better detail, you can `diff torch/optim/nadam.py torch/optim/nadamw.py`.~~ adding a boolean flag `decoupled_weight_decay` to NAdam that activates NAdamW behavior (`False` by default).

Interest in the optimizer has also been shown in the PyTorch forums: https://discuss.pytorch.org/t/nadamw-and-demon-optimizers/179778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103881
Approved by: https://github.com/janeyx99
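To illustrate what the `decoupled_weight_decay` flag changes, here is a minimal plain-Python sketch (not the PyTorch implementation) contrasting classic L2 weight decay, which folds the decay into the gradient, with AdamW-style decoupled decay, which shrinks the parameter directly. With a plain SGD step the two coincide; the difference appears once the gradient is adaptively rescaled, so a simplified second-moment accumulator stands in for NAdam's:

```python
def coupled_step(param, grad, v, lr=0.1, wd=0.01, eps=1e-8):
    """Classic L2 decay (the NAdam default): the decay term is added to
    the gradient, so it is also divided by the adaptive denominator."""
    g = grad + wd * param
    v = v + g * g                      # simplified second-moment update
    return param - lr * g / (v ** 0.5 + eps), v

def decoupled_step(param, grad, v, lr=0.1, wd=0.01, eps=1e-8):
    """AdamW-style decay (decoupled_weight_decay=True): the parameter
    shrinks directly, outside the adaptive rescaling."""
    v = v + grad * grad
    param = param - lr * wd * param    # decay applied to the weight itself
    return param - lr * grad / (v ** 0.5 + eps), v

p_coupled, _ = coupled_step(1.0, 0.5, 0.0)
p_decoupled, _ = decoupled_step(1.0, 0.5, 0.0)
print(p_coupled, p_decoupled)  # the two updates differ
```

In actual use, per this PR the NAdamW behavior is requested on `NAdam` itself, e.g. `torch.optim.NAdam(model.parameters(), weight_decay=0.01, decoupled_weight_decay=True)` (other arguments shown are the usual optimizer parameters, not specific to this change).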