Commit graph

283 commits

Author SHA1 Message Date
liqunfu
fe50213491
Liqun/bert pretrain2 (#5327)
* bert single node multi GPU pretrain w/o checkpoint

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-01 11:01:26 -07:00
Sherlock
37445d1198
Update Bert Perf Script (#5339)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-30 14:30:20 -07:00
Sherlock
9ec1ed42a8
Enable BiasDropoutFusion for CUDA EP only (#5324)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-29 14:00:15 -07:00
Sherlock
11c194ce29
Minor fix for ComputeBroadcastBackwardAxesDynamic; Fix for GradientGraphBuilder logging (#5313)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-29 09:49:05 -07:00
Vincent Wang
eae2473dc1
Scale Op for ReduceMeanGrad. (#5191)
* Scale Op for ReduceMeanGrad

* fix Windows build error

* resove PR comments.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-09-29 09:30:49 +08:00
Tang, Cheng
d9ecc0cebf
add bert loss legacy back (#5224) 2020-09-27 13:41:16 -07:00
Guoyu Wang
3a3f26f38e
Move ort flatbuffers helper functions and value info r/w functions into separated lib (#5276)
* Move fbs include from header to cc

* add initial cmake for flatbuffers

* Move most flatbuffers util to ort_flatbuffers

* move code around

* fix

* move test/perf runner to use flatbuffer directly instead of model

* minor update

* Fix build break

* Clean up includes and foward decl

* Fix traning CI build breaks

* Addressed PR comment, replaced some include with forward decls

* Remove ORT_MUST_USE_RESULT temporarily
2020-09-25 05:36:29 -07:00
Changming Sun
17f1178c2e
Downgrade GCC (#5269)
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-09-24 21:14:54 -07:00
Sherlock
b03fb82ab7
Transformer layer-wise Recompute (#4526)
* Build Recomputation Graph

* Make topological sort to run FW nodes first

* Pattern match start and end of transformer layer

* Topological sort with Priority

* Add logger to Gradient Graph Builder

* Use Logger

* Introduce Execution Order
2020-09-24 19:56:32 -07:00
Ashwini Khade
16220f3848
Add FusedMatMul contrib op (#5213)
* bug fix transformer

* fuse cpu kernel for transposescalematmul and matmul

* fuse transpose_scale_matmul cpu kernel with matmul

* fix test

* Add FusedMatMul Contrib Op

* fix test

* fix typo

* plus more updates per review
2020-09-23 12:17:50 -07:00
Scott McKay
c52561d044
Rework broadcasting setup to decrease binary size. (#5227)
* Rework broadcasting setup to decrease binary size. Push all the type specific down and separate out the broadcasting/parallelization.

Reductions:
element_wise_ops: 521.0KB -> 268.8KB
where: 25.8 KB -> 17.3 KB
qlinear_binary_op: 28.1 -> 12.8
2020-09-23 14:15:40 +10:00
KeDengMS
8dceebda0e
[Training/Python] Add option to enable symbolic shape inference (#5107)
This change adds symbolic shape inference to ORT training which helps static memory planning for model like BART.
2020-09-22 10:49:07 -07:00
Sherlock
1478643215
Place Shape's output in CPU memory (#5245)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-21 20:21:59 -07:00
Pranav Sharma
974b9bfc09
Allow sharing of initializers between sessions. (#5092)
* Allow sharing of initializers between sessions.

* Allow sharing of initializers between sessions (2).

* Add test for C#

* Add test for C#; address PR comments

* Address PR comments
Moved AddInitializer logic to internal session options
Added tests for owned buffer
Clarified documentation
Fix bug where memory info and not device was getting compared

* Fix test

* Fix training build

* Add ver 5 end marker and ver 6 starter, add scenario and usage examples.
2020-09-21 14:09:37 -07:00
edgchen1
e9671e93f0
Fix TransposeScaleMatMul and MatMulScaleFusion issues (#5230)
- Rename TransposeScaleMatMul back to TransposeMatMul for backwards compatibility
- Fix MatMulScaleFusion issues:
  - Add check for supported execution providers
  - Add check for supported MatMul input types
2020-09-21 12:34:01 -07:00
Suffian Khan
84589c7e05
Fuse softmax(a + b) in case of simple broadcast (#4937)
* bias softmax kernel

* bias softmax kernel

* remove debug comments

* remove debug comment

* windows build doesnt handle unary minus on unsigned type

* int64 => int treated as error

* only support cuda

* add bias softmax fusion tests

* PR comments

* more PR comments

* use MLTypeCallDispatcher

* break function into pieces

* add loop unroll and add to list for inference as well

* use std::min and move operator==

* revert std::min (doesnt work ci pipeline) and fix int to size_t error

* pr comments

* fixes for windows ci

* fix for windows ci

* pr comments on consistency

* p_model_

* fix formatting and add anonymous namespace

Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-18 14:15:55 -07:00
Tang, Cheng
e0b49844e9
Provide option to let layernorm stash mean/var as fp32 or bfloat16 (#5215)
* add option to set layernorm stash type

* bug fix

* fix merge error

* fix win build error
2020-09-18 13:42:01 -07:00
Suffian Khan
e01e0b2e40
Fix softmax_warp_backward math when is_log_softmax = True and register LogSoftmax CUDA kernel (#5160)
* register logsoftmax cuda kernel; fix logsoftmaxgrad cuda kernal; fix tests to invoke dispatch_softmax_*

* forgot to remove axis check

* add tests all axis

Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-17 07:15:25 -07:00
Vincent Wang
c37472a1aa
Mixed Precision Transformer and Gradient Builder Refactor (#4892)
* transform mixed precision before build gradient

* resolve comments

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-09-17 02:44:50 +08:00
edgchen1
a20f8037f6
Install ssh in builder image, fix segfault in TrainingRunnerTest.Basic. (#5186) 2020-09-16 09:53:30 -07:00
Bowen Bao
400ac85565
Improve error message for FE model export checking (#5156) 2020-09-16 09:22:37 -07:00
Rayan-Krishnan
92a8c650ad
[Debuggability] Add feature to ORTTrainer Frontend (#5124)
* add option, feature to orttrainer and test

* address comments

* minor fixes

* further address comments

* minor changes

Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-11 12:16:07 -07:00
Scott McKay
59ee8ffb17
Remove SparseTensor support from minimal build. (#5114)
* Remove SparseTensor support from minimal build.

Currently the only valid usage of a SparseTensor is as an attribute of a Constant node. That would have been lifted to a dense tensor initializer when loading the onnx model, so would not exist when saving the ORT format model. Due to that there can be no SparseTensors in an ORT format model.

Co-authored-by: gwang <wanggy@outlook.com>
2020-09-11 17:56:54 +10:00
Wei-Sheng Chin
9ba56dcfed
Support Send and Recv for old NCCL versions (#5097)
If NCCL version < 2.7, MPI is sued. Otherwise, we use NCCL Send and Recv.
2020-09-09 20:58:05 -07:00
Wei-Sheng Chin
934f30fc38
Not to call NVTX when not available (#5095)
* Not to call NVTX when not available

* fix syntax

* Fix a syntax error
2020-09-09 20:01:45 -07:00
Xueyun Zhu
a90fae8c71
unify error handling in pipeline transformer (#5039) 2020-09-09 14:52:04 -07:00
Thiago Crepaldi
6594d6672f
Move onnxruntime.experiment to onnxruntime.training namespace (#5045) 2020-09-09 09:46:06 -07:00
Wei-Sheng Chin
4ccca20def
Replace MPI Send and Recv with NCCL Send and Recv (#5054)
* Prototype NCCL P2P

* Clean code

* Fix NCCL path and some minor bugs

* Add path

* Fix path

* Try fix path

* Add missed files

* Address some comments

* Clean code

* Rename files

* Add MPI path back and fix a path

* Put MPI path under USE_NCCL flag

* not to build Send and Recv when MPI is not installed
2020-09-09 09:39:56 -07:00
Vincent Wang
07bf8b968e
Register BiasGelu and BiasDropout for CUDA only. (#5060)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-09-09 11:46:55 +08:00
Sherlock
38453acae3
Further populate Stop Gradient list (#5021)
* Add to Stop Gradient list

* Improve Stop gradient
2020-09-08 12:49:09 -07:00
liqunfu
de58720a97
Liqun/transformer test and e2e golden numbers (#5064)
* match new/old api numbers

* new golden numbers for Roberta and MC

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-04 18:11:37 -07:00
Vincent Wang
84de14a833
Register OpSet13 CUDA Kernels for BERT/UniLMv2 (#4856)
* opset13 cuda kernels for BERT.

* add opset13 SoftmaxCrossEntropyLoss.

* opset13 size.

* fix argmax/min for ut.

* fix ut failure for argmax/min.

* OrtMemTypeCPUInput

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-09-05 08:09:52 +08:00
Bowen Bao
6dd4af3936
Fix initializer name only when wrapper is applied (#4920)
* Fix initializer name only when wrapper is applied

* fix inspect import
2020-09-04 12:08:07 -07:00
Thiago Crepaldi
0fc9c504fe
Re-enable CI tests for the new PyTorch frontend (#5017)
This PR includes:

* Re-enable CI tests for new PyTorch frontend
* Re-enable fp16 and adjust tolerances for number matching
2020-09-04 09:36:24 -07:00
liqunfu
bb13b52291
to allow parallel training with mpi4py (#4942)
to allow parallel training with mpi4py
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-03 12:47:12 -07:00
Thiago Crepaldi
9388d49c0d
Add warning to non pickable models (#5037) 2020-09-03 11:53:56 -07:00
Thiago Crepaldi
9d1bdef195
Update CODEOWNERS and minor docstring fix (#5002)
This PR includes:

* Previous CODEOWNERS was encompassing more files than just training files
* Polynomial optimizer config is missing part of its docstring
2020-09-03 11:52:38 -07:00
Suffian Khan
546965c2da
Add deterministic path for AllReduceL2 (used to compute gradient norm) (#5027)
* add deterministic path for reduce l2

* add unit tests

* memset zero size off by one

* eliminate windows warning as error

Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-03 10:02:41 -07:00
Bowen Bao
22ba266bd6
Add flag to _internal_use to control export of contrib ops in ort trainer (#4968) 2020-09-03 09:11:47 -07:00
Scott McKay
28445c88f9
Changes to enable saving and loading an ORT format model (#4995)
* Changes to enable saving and loading an ORT format model via the public APIs.
Cleanup session.py to try and make slightly more understandable. More refactoring is needed here.
Couple of bug fixes

* Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info.

* Address PR comments
  - tweak SessionOptions config to avoid double lookup
  - merge duplicated functionality in python binding around registering an EP with optional options

Fix a couple of build issues.

* Update C API to be consistent with python API
  - only load model in InferenceSession ctor if required
  - support loading ORT model in minimal build

* Fix nodejs test.
We get an invalid path error from LoadInterOp first now

* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.

The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.

* Fix couple of build issues.

* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.

* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion.

* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
2020-09-03 09:10:48 -07:00
Sherlock
a935731bd3
Neg Gradient (#5022)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-02 15:54:17 -07:00
Thiago Crepaldi
aabed34d5c
Fix checkpoint API and improve loss scaler handling (#4950)
This PR also includes:
	* More LossScaler tests
        * Minor LossScaler improvement
	* Check model after extra post processing
	* Improve basic training tests to include all optimizers
	* Set rtol=1e-7 tolerance for Legacy vs Experimental frontend API tests
	* Increase number of training tests for Legacy vs Experimental tests
	* Minor refactoring on existing tests
        * Fix Checkpoint API for Gradient Accumulation / fp16 scenarios
2020-09-02 09:38:02 -07:00
Thiago Crepaldi
eebc2cccce
Fix fetches when eval_step's input is a subset of train_step's input (#4966)
This PR also includes MNIST sample using the new forntend as a sample
2020-09-02 08:57:44 -07:00
Thiago Crepaldi
f38f2d5b54
Port #4920 into the new pytorch frontend (#4965) 2020-09-01 19:00:49 -07:00
Hariharan Seshadri
d30dd41c0e
Remove public default ctor in PyInferenceSession and replace it with a protected ctor (#4990) 2020-09-01 17:10:36 -07:00
liqunfu
d79af260bb
Liqun/new api orttraining test transformers (#4982)
* matching transformer model test with Lamb
* increase epochs
* use atol 1e-6 to pass full precision test
2020-09-01 13:11:06 -07:00
Xueyun Zhu
1e1f5a9c79
support data parallel + pipeline parallel (#4648)
* enable data + pipeline parallel

* distributed group calculation

* fix typo

* fix test and minor changes
2020-08-31 17:32:03 -07:00
Thiago Crepaldi
9817b8c8a7
Fix state_dict/checkpoint issue introduced by #4639 (#4984)
https://github.com/microsoft/onnxruntime/pull/4639 changed the default
behavior by removing optimizer state from state_dict/checkpoint APIs.
The reason for the previous change was to allow models trained on ORT to
be used for inference on PyTorch, which is an important feature.

Due to the change aforementioned, when resuming training from a checkpoint,
the optimizer would start with random weights, leading to a bad performance.
This behavior would also cause reproducibility issues, as the optimizer
wouldnt be able to resume from its previous state.

This PR adds a boolean flag to state_dict/save_xheckpoint API that
when True (default) it saves both model and optimizer state.
When False, only the model state is kept.
2020-08-31 17:00:14 -07:00
Sherlock
50c610e70a
Stop Gradient at Shape op (#4983) 2020-08-31 13:13:17 -07:00
M. Zeeshan Siddiqui
6d9d252bc3
Disable NegativeLogLikelihoodLoss_LargeSizeTensor test (#4979)
Disabling this test until it's intermittent failure is root caused, this is a function and does not have a dedicated op by itself. However, this op is not used in known model to the best of my knowledge to disabling this test for the sanity of CI until the investigation is over is probably reasonable.
2020-08-31 11:02:07 -07:00