onnxruntime/orttraining
ashbhandare 7cebf76a46
Improve checkpointing for Zero stage 1 (#5478)
* Initial running changes

* Checkpointing aggregation changes

* Compare with older version

* Initial cleanup

* Add zero test, minor fix

* Fix zero test, transform, formatting

* Review comments

* Add more unit tests

* Review comments

* Try fix CI

* Add additional check on just aggregation code

* Try fix ckpt gen

* Add pregenerated ckpt for CI, enable zero test in e2e

* Moving test to nightly, removing ckpt files

* Add tests to dist GPU CI

* Fix dist test

* Review comments

* Fix test
2020-12-07 09:16:01 -08:00
orttraining Improve checkpointing for Zero stage 1 (#5478) 2020-12-07 09:16:01 -08:00
pytorch_frontend_examples Fix mnist example (#4926) 2020-08-26 15:28:39 -07:00
tools Cache build docker images in container registry. (#5811) 2020-11-17 17:02:24 -08:00