Commit graph

3313 commits

Author SHA1 Message Date
Wei-Sheng Chin
4ccca20def
Replace MPI Send and Recv with NCCL Send and Recv (#5054)
* Prototype NCCL P2P

* Clean code

* Fix NCCL path and some minor bugs

* Add path

* Fix path

* Try fix path

* Add missed files

* Address some comments

* Clean code

* Rename files

* Add MPI path back and fix a path

* Put MPI path under USE_NCCL flag

* Do not build Send and Recv when MPI is not installed
2020-09-09 09:39:56 -07:00
Scott McKay
dbf4e7019d
Add ability to generate configuration file with required operators. (#5089)
* Add ability to generate configuration file with required operators.
2020-09-09 21:39:17 +10:00
Scott McKay
80ada0291f
Improve the minimal build size on android and linux (#5086)
Fix bug where linux build fails when python is enabled and rtti is disabled
Update doco for new build settings
2020-09-09 21:38:34 +10:00
Guoyu Wang
5019b2f3b9
fix for x86 android build break (#5088) 2020-09-09 21:38:22 +10:00
gwang-msft
a1a81470e3
Add minimal build binary size verification (arm64) to Android CI (#5087)
* Add minimal build binary size verification (arm64) to Android CI

* Add comments in the CI yaml
2020-09-09 19:06:20 +10:00
dependabot[bot]
b8d63f31c3 Bump bl from 4.0.2 to 4.0.3 in /nodejs
Bumps [bl](https://github.com/rvagg/bl) from 4.0.2 to 4.0.3.
- [Release notes](https://github.com/rvagg/bl/releases)
- [Commits](https://github.com/rvagg/bl/compare/v4.0.2...v4.0.3)

Signed-off-by: dependabot[bot] <support@github.com>
2020-09-09 00:33:28 -07:00
Vincent Wang
07bf8b968e
Register BiasGelu and BiasDropout for CUDA only. (#5060)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-09-09 11:46:55 +08:00
Brian Martin
f41614a875
User/brianma/telemetry (#5084)
* add runtime session id to (de)tensorization events

* append start or stop to the event names and remove opcodes

* add appsessionguid to telemetry events
2020-09-08 19:02:46 -07:00
Moshe David
1b46573bb7
Update BUILD.md (#5085)
* Update BUILD.md

- No need to format the word 'parameter' as code

* Update BUILD.md
2020-09-08 18:20:46 -07:00
gwang-msft
a40d34386a
Add Linux CPU CI for ORT minimal build (#5074)
* initial test version

* update yml

* minor updates

* minor updates

* Test minimal build

* update with include ops for minimal build ut only

* error case to see build failure

* test no_exceptio

* Remove error cases

* address pr comments

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-08 17:09:33 -07:00
Ye Wang
b23e08b85c
Add AutoModel selector in transformers tool (#5051)
* Add AutoModel selector in transformers tool

* change distilbert-*-squad's pipeline to AutoModelForQuestionAnswering

* rule-based selector and add model_class as parameter

* Update huggingface_models.py

* review comments
2020-09-08 15:06:04 -07:00
Cameron Maske
4553b2eecd
Expose DirectML provider to python (conflicts resolved from #3359) (#4630) 2020-09-08 14:34:09 -07:00
Ye Wang
c239ff0750
Modify embedlayernorm fusion due to shape node merging (#4967)
* modify embedlayernorm fusion due to shape integration

* update

* update comments

* review comments

* review comments

* fix test
2020-09-08 14:17:29 -07:00
Sherlock
38453acae3
Further populate Stop Gradient list (#5021)
* Add to Stop Gradient list

* Improve Stop gradient
2020-09-08 12:49:09 -07:00
Hariharan Seshadri
e1ed0fde2b
Prevent registering both DML and CUDA EPs in an ML op test (#5078) 2020-09-08 11:13:50 -07:00
Olivia Jain
8d91d4ff36
Build docker image instruction fix (CUDA) (#5070) 2020-09-08 09:59:16 -07:00
Scott McKay
6c33e95b88
Fix signed/unsigned mismatch on x86. (#5079) 2020-09-08 18:37:18 +10:00
Scott McKay
796ddeb2cb
Remove serialization of outer scope value info in ORT format model (#5077)
* Remove serialization of outer scope node arg info in ORT format model. We don't currently need it in a minimal build as only SessionState calls Graph::IsConstantInitializer and it doesn't search outer scope. If we do need it in the future the information can be calculated at runtime (small binary size cost to do so).

Motivation: because of this, the ORT format model was 32% bigger for a BERT model with multiple levels of subgraph and a lot of nodes. With the change, it is about 5% larger than the original ONNX model. ORT format has type/shape info for all nodes, and this model has 2000 nodes, so this seems reasonable.

Added example code to dump ORT format model to json.

Fixed misc bug in python test script around handling float and non-float expected output.
2020-09-08 17:43:42 +10:00
Scott McKay
e03a391895
Small updates to ORT Mobile documentation (#5075)
* Few documentation clarifications

* Few more tweaks
2020-09-08 11:02:31 +10:00
Scott McKay
36dc057913
Add unit test for C# setting of session options config entry. (#5073)
Make error message slightly more user friendly.
2020-09-07 20:15:33 +10:00
gwang-msft
5d60d57ce2
Add csharp API for AddSessionConfigEntry (#5072)
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-05 21:40:38 +10:00
Pranav Sharma
2c1410afe7
Remove usage of macros for constants in public header. (#5061)
* Remove usage of macros for constants

* Fix linkage issue
2020-09-05 01:27:20 -07:00
gwang-msft
6081c1cfa2
Update ONNX to latest (#5069)
* Update ONNX to latest

* update onnxml.cs

* revert changes in proto and cs files

* add broken test

* update broken tests

* update broken tests

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-05 00:49:09 -07:00
gwang-msft
d922cb1081
Add sequence and map support in ORT mobile file format, add UT (#5066)
* Init change

* Update schema header

* Address review comments

* fix for DISABLE_ML_OPS build break

* Fix build break

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-09-04 21:33:48 -07:00
liqunfu
de58720a97
Liqun/transformer test and e2e golden numbers (#5064)
* match new/old api numbers

* new golden numbers for Roberta and MC

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-04 18:11:37 -07:00
Vincent Wang
84de14a833
Register OpSet13 CUDA Kernels for BERT/UniLMv2 (#4856)
* opset13 cuda kernels for BERT.

* add opset13 SoftmaxCrossEntropyLoss.

* opset13 size.

* fix argmax/min for ut.

* fix ut failure for argmax/min.

* OrtMemTypeCPUInput

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-09-05 08:09:52 +08:00
Changming Sun
370d194db7
Add a docker file for CI build CUDA 10.2 (#5065) 2020-09-04 16:28:45 -07:00
Zhang Lei
ec88f14a7a
Implement QLinearMul in mlas (#4593)
* Implement QLinearMul
2020-09-04 15:02:19 -07:00
Scott McKay
b5c2932ae8
Last major set of ORT format model changes (#5056)
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial doco and ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
2020-09-05 07:59:01 +10:00
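Note on the std::hash replacement in the commit above: a hash that is stored in a model file has to give the same value on every platform and in every process, which an implementation-defined hash cannot promise. The sketch below illustrates the idea in Python. It uses FNV-1a only as a compact stand-in (the commit itself uses murmurhash3), and `kernel_key_hash` with its key layout is a hypothetical helper, not ORT's actual kernel hashing scheme.

```python
# Minimal sketch of a platform-independent hash, assuming a made-up key layout.
# FNV-1a is a stand-in here; the commit above uses murmurhash3 instead.

FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: fully specified, so the result never depends on the runtime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def kernel_key_hash(op_type: str, domain: str, since_version: int) -> int:
    """Hypothetical helper: hash a kernel identity into a value safe to store on disk."""
    key = f"{domain}:{op_type}:{since_version}".encode("utf-8")
    return fnv1a_32(key)


if __name__ == "__main__":
    # Python's built-in hash() of a str is salted per process by default,
    # which is the same kind of problem std::hash poses for serialized data.
    print(hex(kernel_key_hash("Add", "", 13)))
```

Any fully specified algorithm works for this purpose; the point is that the value written when the model is converted must match the value computed when it is loaded.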
Du Li
6134994db9
Parallelizing elementwise kernels (#4577)
* Parallelizing unary elementwise ops.

* Parallelizing binary elementwise ops.

* Accommodating PR comments.
2020-09-04 14:45:43 -07:00
Xiang Zhang
0dad79b495
Add SetLanguageProjection C Api and use it in four projections (#5023)
* Add SetLanguageProjection C Api and use it in four projections

* static cast enum languageprojection to uint32_t

* resolve comments

* fix typo and line added unintentionally

* revert unnecessary change

* reorder c# api

* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
2020-09-04 14:26:39 -07:00
Bowen Bao
6dd4af3936
Fix initializer name only when wrapper is applied (#4920)
* Fix initializer name only when wrapper is applied

* fix inspect import
2020-09-04 12:08:07 -07:00
Ryan Hill
d792af776d
Remove Cuda dependency from TensorRT shared provider (#5014) 2020-09-04 11:35:02 -07:00
Zhang Lei
78bb53381b
optimize resize op for NN mode for some fasterrcnn model (#4825)
Also add test case for 5-D. Disable 5-D test for CUDA provider.
2020-09-04 10:35:36 -07:00
Zhang Lei
8289981f0e
Implement QLinearSigmoid. (#5015)
Refactor QLinearLeakyRelu using QLinearLookupBase.
Parallelize the lookup phase.
2020-09-04 09:37:17 -07:00
Thiago Crepaldi
0fc9c504fe
Re-enable CI tests for the new PyTorch frontend (#5017)
This PR includes:

* Re-enable CI tests for new PyTorch frontend
* Re-enable fp16 and adjust tolerances for number matching
2020-09-04 09:36:24 -07:00
Andrews548
bd215b79a2
ACL v20.02 (#4981)
* Add ACL version 20.02

* fix logging typo

* check depthwise operation based on group param

* Generate ArmNN runtime inside class constructor

* Update to the latest ONNX operation set

* Update BUILD.md

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-09-03 20:44:27 -07:00
Bowen Bao
73456f10cd
Fix contrib ops unregister to match pytorch behavior (#5052) 2020-09-03 16:32:42 -07:00
Nat Kershaw (MSFT)
d7502eff8f
Add nodejs samples README (#5005) 2020-09-03 15:58:44 -07:00
xkszltl
4b9b5b6146
Imported protoc cannot have compile options. (#5030) 2020-09-03 15:20:00 -07:00
Sergii Dymchenko
d7984fe6ba
Add packages from training docker to cgmanifest. (#5033) 2020-09-03 13:11:41 -07:00
liqunfu
bb13b52291
to allow parallel training with mpi4py (#4942)
to allow parallel training with mpi4py
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-03 12:47:12 -07:00
Thiago Crepaldi
9388d49c0d
Add warning to non-picklable models (#5037) 2020-09-03 11:53:56 -07:00
Thiago Crepaldi
9d1bdef195
Update CODEOWNERS and minor docstring fix (#5002)
This PR includes:

* Previous CODEOWNERS was encompassing more files than just training files
* Polynomial optimizer config is missing part of its docstring
2020-09-03 11:52:38 -07:00
Suffian Khan
546965c2da
Add deterministic path for AllReduceL2 (used to compute gradient norm) (#5027)
* add deterministic path for reduce l2

* add unit tests

* memset zero size off by one

* eliminate windows warning as error

Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-03 10:02:41 -07:00
Ashwini Khade
9ba2cfb71b
fix py packaging pipeline (#5038)
* add test skip logic when opset > allowed opset

* fix attribute error

* plus fix
2020-09-03 09:32:10 -07:00
Bowen Bao
22ba266bd6
Add flag to _internal_use to control export of contrib ops in ort trainer (#4968) 2020-09-03 09:11:47 -07:00
Scott McKay
28445c88f9
Changes to enable saving and loading an ORT format model (#4995)
* Changes to enable saving and loading an ORT format model via the public APIs.
Clean up session.py to make it slightly more understandable. More refactoring is needed here.
Couple of bug fixes

* Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info.

* Address PR comments
  - tweak SessionOptions config to avoid double lookup
  - merge duplicated functionality in python binding around registering an EP with optional options

Fix a couple of build issues.

* Update C API to be consistent with python API
  - only load model in InferenceSession ctor if required
  - support loading ORT model in minimal build

* Fix nodejs test.
We get an invalid path error from LoadInterOp first now

* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.

The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.

* Fix couple of build issues.

* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.

* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. It wasn't being used in Session anyway, so passing it through just adds confusion.

* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
2020-09-03 09:10:48 -07:00
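Note on the last item of the commit above: the ordering constraint it describes can be sketched as follows. The classes here are simplified stand-ins that only borrow the names from the commit message, not the real onnxruntime Python classes, and the `backend` argument is a hypothetical placeholder for whatever object the subclass owns.

```python
# Minimal sketch, assuming simplified stand-in classes (not the real onnxruntime code).


class Session:
    def __init__(self):
        # The base class declares the member so it is clear where it lives.
        # It does not take or store a session object itself.
        self._sess = None


class TrainingSession(Session):
    def __init__(self, backend):
        # Base init must run first: it resets _sess to None, so calling it
        # after the assignment below would discard the value.
        Session.__init__(self)
        self._sess = backend  # the subclass owns and sets the member


if __name__ == "__main__":
    backend = object()  # hypothetical placeholder for the underlying session
    training = TrainingSession(backend)
    print(training._sess is backend)
```

Swapping the two statements in `TrainingSession.__init__` would leave `_sess` as `None`, which is the failure mode the commit message warns about.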
Tim Harris
bbb9d92a5f
Remove SchedulingParams variants of ThreadPool::TryParallelFor (#5050) 2020-09-03 09:04:31 -07:00
gwang-msft
fde7a2c848
Temporarily switch SafeInt to a fork for an option to disable exceptions (#5041)
* Removed submodule

* Add safeint fork
2020-09-02 23:21:39 -07:00