Commit graph

4 commits

Author SHA1 Message Date
liqunfu
92662659ba
Liqun/remove number matching (#5606)
replace number matching with relaxed comparison in frontend tests
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-27 21:27:37 -07:00
liqunfu
773992c7d4
Liqun/bert pretrain tb (#5377)
* add tensor board, remove torch.distributed.lanuch because ort nccl depends on MPI. Use MPI to launch parallel training.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-06 16:28:31 -07:00
liqunfu
fe50213491
Liqun/bert pretrain2 (#5327)
* bert single node multi GPU pretrain w/o checkpoint

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-01 11:01:26 -07:00
edgchen1
6d5b93b805
Synchronize training dependency versions between Docker image and Python wheel. (#5261)
Synchronize training dependency versions between Docker image and wheel, update docs, refactor build scripts.
2020-09-23 19:03:42 -07:00