onnxruntime/orttraining
jingyanwangms 5dcaf70501
Adding this set_to_none flag to zero_grad to have signature parity with pytorch Adam (#16375)
### Description
The signature of torch.optim.Adam.zero_grad() is:
zero_grad(set_to_none=True)

https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.zero_grad

We set this flag at initialization, similar to DeepSpeed's FusedAdam:
https://deepspeed.readthedocs.io/en/latest/optimizers.html#deepspeed.ops.adam.FusedAdam

This change adds the flag to zero_grad() so that its signature matches PyTorch Adam.
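
A minimal sketch of the two behaviours the flag selects between, using toy classes rather than the actual ORT implementation. `Param`, `ToyOptimizer`, and the `set_grad_none` constructor argument are hypothetical names for illustration, and letting either the constructor flag or the call argument enable the set-to-None path is an assumption, not confirmed by this change:

```python
class Param:
    """Toy stand-in for a tensor parameter; grad is a list of floats or None."""

    def __init__(self, grad):
        self.grad = grad


class ToyOptimizer:
    """Hypothetical optimizer illustrating zero_grad(set_to_none=...)."""

    def __init__(self, params, set_grad_none=False):
        self.params = list(params)
        # Flag fixed at initialization (assumed name, DeepSpeed-style).
        self._set_grad_none = set_grad_none

    def zero_grad(self, set_to_none=True):
        for p in self.params:
            if p.grad is None:
                continue  # nothing to clear
            if self._set_grad_none or set_to_none:
                # Drop the gradient entirely.
                p.grad = None
            else:
                # Keep the gradient storage but fill it with zeros.
                p.grad = [0.0] * len(p.grad)


p = Param([0.5, -1.0])
opt = ToyOptimizer([p])

opt.zero_grad(set_to_none=False)
zeroed = p.grad            # gradient kept, values zeroed

p.grad = [0.5, -1.0]
opt.zero_grad()            # default matches the PyTorch signature
cleared = p.grad           # gradient dropped
```

With a matching signature, a training loop written for torch.optim.Adam can call zero_grad(set_to_none=...) without special-casing the optimizer.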

### Motivation and Context
Easier model integration: callers that pass set_to_none to zero_grad() no longer need to special-case this optimizer.

Co-authored-by: Jingyan Wang <jingywa@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-19 17:27:41 -07:00
orttraining Adding this set_to_none flag to zero_grad to have signature parity with pytorch Adam (#16375) 2023-06-19 17:27:41 -07:00
pytorch_frontend_examples Enable pylint and numpy rules (#15218) 2023-03-27 20:37:53 -07:00
tools [ROCm] reduce batch size to fix CI error (#15714) 2023-05-16 13:10:02 +08:00