Commit graph

834 commits

Author SHA1 Message Date
Ryan Hill
1b953c6423
Fix some code defects (#9810) 2021-11-19 15:48:15 -08:00
Sergii Dymchenko
ba339e667b
Add training performance investigation script (ONNX graph analyzer) (#9791)
* Add first version of performance investigation script.

* Simplify and update performance investigation script.
2021-11-19 13:27:00 -08:00
Tang, Cheng
fcc167dd47
fix reshape implementation in eager mode (#9741) 2021-11-18 19:26:49 -08:00
satyajandhyala
3af14fc554
Updated SoftmaxGrad and LogSoftmaxGrad to support version 13. (#9733)
* Updated SoftmaxGrad_13/LogSoftmaxGrad_13 to support version 13.
2021-11-18 17:39:16 -08:00
Vincent Wang
3654a5d60e
Register Custom Symbolic of torch.einsum for ORTModule (#9590)
* register custom symbolic for einsum

* bugfix for case needs permute at the end

* refactor

* refactor equation parser

* support new case, use ReduceProd

* optimize perf and graph

* remove some Gather node

* add more ut, fix gemm trans fusion
2021-11-18 10:13:58 +08:00
satyajandhyala
421e4c03ce
Update default cast propagation strategy from None to FloodFill (#9713)
* Changed the default cast propagation strategy from None to FloodFill.
2021-11-16 13:15:57 -08:00
Edward Chen
9acbfeba09
Address some code scan issues. (#9752) 2021-11-16 10:24:46 -08:00
Tang, Cheng
99257eb8e3
support build option to include external graph transformers (#9478)
* temp code

* support external graph transformer from build script

* remove debug code

* add test case

* support register rewrite rule

* fix source_group issue if external source does not share any common prefix

* fix python code style checker

* resolve merge conflict

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-11-15 08:16:20 -08:00
pengwa
6e09fc5152
Implement block wise softmax for reduction dimension > 1024 cases. (#9696)
* implement block wise softmax for reduction dimension > 1024 cases.

* fix builds

* fix

* fix amd build

* fix amd build

* fix win-gpu build

* add tests

* remove cudnn path/add python tests
2021-11-14 11:47:58 +08:00
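To illustrate the idea behind the block wise softmax commit above, here is a minimal Python sketch, not the CUDA kernel from the PR. It assumes "block wise" means the reduction axis is read in fixed-size chunks; the function name `blockwise_softmax` and the default `block_size` of 1024 (borrowed from the threshold in the commit title) are illustrative assumptions.

```python
import math


def blockwise_softmax(row, block_size=1024):
    """Softmax over one row, reading the row in fixed-size blocks.

    Hypothetical helper for illustration only; the real change is a GPU kernel.
    """
    # Pass 1: running maximum across blocks, used for numerical stability.
    row_max = -math.inf
    for start in range(0, len(row), block_size):
        row_max = max(row_max, max(row[start:start + block_size]))

    # Pass 2: accumulate the sum of shifted exponentials block by block.
    total = 0.0
    for start in range(0, len(row), block_size):
        total += sum(math.exp(x - row_max) for x in row[start:start + block_size])

    # Pass 3: normalize every element by the accumulated sum.
    return [math.exp(x - row_max) / total for x in row]
```

The result should match a single-pass softmax; only the traversal order of the reduction differs.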
Aidan Beggs
f6edf13513
Implement a Gemm/Sum fusion pattern (#9699)
When the pattern Sum(Gemm(A, B), C) exists, we can convert it to
Gemm(A, B, C), assuming that the output of the original Gemm is
not used elsewhere and that this change does not break broadcasting.
2021-11-11 18:33:13 -08:00
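The equivalence that the Gemm/Sum fusion commit above relies on can be checked with a small pure-Python sketch. It assumes the ONNX Gemm defaults (alpha = beta = 1, no transposes) and a C with the same shape as the product, so broadcasting does not come into play; `matmul`, `gemm`, and `elementwise_sum` are illustrative helpers, not onnxruntime functions.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def elementwise_sum(x, y):
    """Stand-in for the Sum node, restricted to two same-shape inputs."""
    return [[x[i][j] + y[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def gemm(a, b, c=None):
    """Stand-in for Gemm with alpha = beta = 1: A @ B, plus C when given."""
    y = matmul(a, b)
    return y if c is None else elementwise_sum(y, c)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 1], [1, 1]]

unfused = elementwise_sum(gemm(A, B), C)  # Sum(Gemm(A, B), C)
fused = gemm(A, B, C)                     # Gemm(A, B, C)
```

Both paths produce the same matrix here, which is why the rewrite is safe when the intermediate Gemm output has no other consumer.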
George Wu
1541784f6c
[python api] align api with other language bindings' treatment of explicit provider registrations. enforce use of providers param in python InferenceSession when execution providers other than default CPU are enabled. (#9712)
* remove default python ep registration. raise exception if providers are not explicitly set if there are available providers

* temporarily disable exception

* fix python tests

* explicitly set CUDAProvider for python iobinding tests

* explicitly set providers param for InferenceSession()

* onnxrt

* raise ValueError if not explicitly set providers when creating InferenceSession

* add required providers param

* explicitly set providers

* typo
2021-11-10 12:17:53 -08:00
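The behaviour described in the commit above (a ValueError when providers are not set explicitly and non-CPU providers are available) might look roughly like the following sketch. `resolve_providers` is a hypothetical function written for illustration, not the onnxruntime implementation; only the provider names and the `providers` parameter of `InferenceSession` come from the library.

```python
CPU_PROVIDER = "CPUExecutionProvider"


def resolve_providers(requested, available):
    """Return the provider list to use, or raise if the caller must choose.

    Hypothetical helper: mirrors the rule from the commit message only.
    """
    non_default = [p for p in available if p != CPU_PROVIDER]
    if requested is None:
        if non_default:
            # Other execution providers are enabled, so the choice must be explicit.
            raise ValueError(
                "providers must be set explicitly; available: " + ", ".join(available)
            )
        # Only the default CPU provider is enabled, so there is nothing to choose.
        return [CPU_PROVIDER]
    return list(requested)
```

In user code the corresponding call would pass the list directly, for example `onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])`, where the model path is a placeholder.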
Vincent Wang
adf98feb2c
ATenOp Support for BCEWithLogitsLoss (#9670) 2021-11-10 08:36:57 +08:00
Wei-Sheng Chin
bdc279a7ed
Use the same allocator following Pytorch (#9697)
* Use the same allocator following Pytorch

* Polish

* Fix AMD build
2021-11-09 11:25:16 -08:00
satyajandhyala
229c9a4e1c
Added Trilu CUDA kernel. (#9633)
* Added Trilu CUDA kernel.

* Added TriluGrad.

* Added a training testcase for Trilu.

* Added Trilu gradient checker test.
2021-11-09 11:20:17 -08:00
mindest
c579ebfbc3
change a for iteration (#9678)
Co-authored-by: Min Lin <linmin@microsoft.com>
2021-11-09 08:33:50 +08:00
Ryan Hill
24e35fba32
Change TensorShape to typically not allocate heap memory (#9542) 2021-11-08 10:29:54 -08:00
Xavier Dupré
7e207ba3be
Use ORTMODULE_ONNX_OPSET_VERSION to modify the opset version in OrtModule (#9529)
* Use environment variable to change the ONNX opset in ORTModule
* overwrite ONNX_OPSET_VERSION
* store envvar in module constant
2021-11-08 17:03:16 +01:00
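A minimal sketch of the pattern named in the commit above: read ORTMODULE_ONNX_OPSET_VERSION once and keep the result in a module constant. The helper `read_opset_version` and the default value 12 are assumptions made for this example; the actual default and code layout in ORTModule may differ.

```python
import os

ENV_VAR = "ORTMODULE_ONNX_OPSET_VERSION"


def read_opset_version(default, environ=os.environ):
    """Return the opset from the environment, or `default` when it is unset."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return default
    return int(raw)  # raises ValueError on a non-integer value


# Evaluated once at import time and stored as a module-level constant.
# The default of 12 is a placeholder, not a value taken from the commit.
ONNX_OPSET_VERSION = read_opset_version(default=12)
```

Passing `environ` as a parameter keeps the lookup easy to test without touching the real process environment.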
ashari4
1151c661eb
Add gi overload (#9690) 2021-11-07 16:04:00 -08:00
Edward Chen
ddb4c05852
Save graph runtime optimizations for minimal build (#9508)
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
2021-11-04 10:49:46 -07:00
baijumeswani
230099e482
Make ORTModule serializable (#9634) 2021-11-03 13:54:05 -07:00
groenenboomj
5c56fa0def
Miopen conv grad (#9574)
* Add source for conv_grad

* Add sources for ROCm EP.
* Transliterate sources for conv_grad for ROCm EP.

* Add conv_grad to ROCm EP

Add conv_grad to ROCm execution
provider.

* Update ROCm EP ConvGrad

Update ConvGrad for the ROCm EP to match other EP
changes and fix a build issue.
2021-10-31 11:19:46 -07:00
Hariharan Seshadri
b5f7bb7d10
Update ONNX (#9462) 2021-10-29 10:33:40 -07:00
Xavier Dupré
9c15c68ed4
Enable fallback when forward fails due to non contiguous tensor (#9369) 2021-10-28 13:04:54 -07:00
Thiago Crepaldi
5d5c03bcdc
Fix opset version change by not using copy of global constant (#9393) 2021-10-27 12:42:06 -04:00
satyajandhyala
f29057c7c0
Added TanhGrad. (#9507)
* Added TanhGrad.
2021-10-26 09:10:03 -07:00
pengwa
b125446f9c
Optimize python overhead of APEX amp (#9447)
* optimize python overhead of _post_amp_backward

* overwrite apex amp's zero_grad for faster implementation

* move unscale_fp16_grads_into_fp32_grads into C++ impl

* improve the efficiency further, reducing 3.5ms to 1.7ms for unilm.

* unilm 1.7ms to 338us: 1) optimize python list <==> std::vector copy, 2) launch the kernels once num_elem reaches the threshold. This helps reduce the CUDA idle time.

* refine the logic a bit after validating

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
ashbhandare
0270ff7951
Minor import fix (#9538) 2021-10-25 21:29:31 -07:00
Vincent Wang
fb4f7dbbb7
Call ATenOp for ReduceSum on ORTModule (#9471)
* call ATenOp for ReduceSum

* Enable ReduceSum ATenOp for training only

* always load extension
2021-10-26 09:48:57 +08:00
Sherlock
3ed8ade675
Use SafeInt for malloc related computation (#9503)
* Use SafeInt for malloc related computation
2021-10-22 16:42:12 -07:00
Wei-Sheng Chin
beddbdec5a
Fix PythonOp exporter (#9318)
Register PythonOp exporter with the right symbol.
2021-10-22 10:45:45 -07:00
Wei-Sheng Chin
d2d480a0db
Allow None As Autograd Context (#9315)
* Allow none ctx

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_autograd.py

Co-authored-by: pengwa <pengwa@microsoft.com>

* Address a comment

Co-authored-by: pengwa <pengwa@microsoft.com>
2021-10-21 20:37:36 -07:00
Jeff Daily
66ceb6926d
rehipify ROCm EP files under orttraining (#9443)
* rehipify rocm ep files under orttraining committed to source control

* fix flake8 error
2021-10-21 13:36:21 -07:00
Xavier Dupré
5797bd6db3
Remove one unnecessary deepcopy in unflatten_user_output (#9353)
* Removes one unnecessary deepcopy
2021-10-21 10:44:27 +02:00
Nick Kreeger
f1123c2fb3
Fix whitespace and style in concat.cc (#9452) 2021-10-20 12:43:46 -05:00
Changming Sun
406f1629c1
Remove Featurizers code (#9300) 2021-10-20 10:20:35 -07:00
baijumeswani
20eaed43e5
Ignore all string inputs to ORTModule AB#1310803 (#9344) 2021-10-19 16:34:47 -07:00
baijumeswani
757bc66720
Set cuda version to be None instead of an empty string (#9435) 2021-10-19 11:10:52 -04:00
baijumeswani
5da4e07daa
Make FusedAdam mathematically equivalent to Transformers AdamW (#9343) 2021-10-18 16:03:18 -07:00
pengwa
f05c285a58
Exception when duplicated autograd.Function name detected (#9351)
* Exception when duplicated autograd.Function name detected

* reorder a bit for a little bit better perf

* fix a bug in previous PR :(

* correct the error message a bit
2021-10-15 12:23:13 +08:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
Abhishek Jindal
23700a15a0
Abjindal/eager windows build (#9326)
* removing warnings which are causing errors from torch and changing flags for Windows

* adding MKL library resolution and comments

* cleaning up the code

* fixing onnxruntime_python file for windows build

* fix the include order to avoid the python_d.lib issue on win debug build

* changes for warnings, typos and other comments

* merge conflict

* adding fix for mkl library error

* Revert "adding fix for mkl library error"

This reverts commit 73b87c73c2.

* fix for dll path for windows

* typo for dll path

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-10-14 12:54:49 -07:00
Xavier Dupré
22e3f8bf54
Refactor TrainingManager.forward (#9354)
* Refactor TrainingManager.forward
2021-10-14 12:54:31 +02:00
pengwa
5ee47e3ffa
legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184)
* megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional

* add deepspeed zero1 and zero2 - checkoverflow & clip norm

* re-structure code and add the copyright

* update the document

* refine the code after validation
2021-10-14 09:01:23 +08:00
Chandru Ramakrishnan
ba0cca96f0
Hooked up eager logging to ORT default logger. (#9340)
* Hooked up eager logging to ORT default logger.
2021-10-13 18:10:32 -04:00
Tang, Cheng
f0bc35c4ba
fix a hardcode type (#9337) 2021-10-12 13:44:46 -07:00
Tang, Cheng
48737091c0
resolve the provider options before creating training session in orttrainer (#9199)
* resolve the provider options before creating training session in orttrainer

* Update orttraining/orttraining/python/orttraining_pybind_common.h

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* support clear the training ep instance pool

* fix status error

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-10-12 09:30:45 -07:00
ashbhandare
52c021d1f3
Fix export of aten op for Max and Avg Pool 2D (#9330) 2021-10-12 09:03:14 -07:00
Edward Chen
79e736ed25
Make onnxruntime::Status nodiscard (#9279)
Mark onnxruntime::Status class with [[nodiscard]] attribute.
Fix existing warnings.
2021-10-08 17:10:31 -07:00
satyajandhyala
29379db432
Added SigmoidGrad schema and kernels. (#9244)
* Added SigmoidGrad schema and kernels.

* Added test_sigmoid_grad function.
2021-10-08 11:03:28 -07:00
Tang, Cheng
68601fc296
error handling for eager mode's data transfer (#9261) 2021-10-07 17:16:33 -07:00