Edward Chen
ddb4c05852
Save graph runtime optimizations for minimal build ( #9508 )
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
2021-11-04 10:49:46 -07:00
baijumeswani
230099e482
Make ORTModule serializable ( #9634 )
2021-11-03 13:54:05 -07:00
groenenboomj
5c56fa0def
Miopen conv grad ( #9574 )
* Add source for conv_grad
* Add sources for ROCm EP.
* Transliterate sources for conv_grad for ROCm EP.
* Add conv_grad to ROCm EP
Add conv_grad to ROCm execution provider.
* Update ROCm EP ConvGrad
Update ConvGrad for the ROCm EP to match other EP changes and fix a build issue.
2021-10-31 11:19:46 -07:00
Hariharan Seshadri
b5f7bb7d10
Update ONNX ( #9462 )
2021-10-29 10:33:40 -07:00
Xavier Dupré
9c15c68ed4
Enable fallback when forward fails due to non contiguous tensor ( #9369 )
2021-10-28 13:04:54 -07:00
Thiago Crepaldi
5d5c03bcdc
Fix opset version change by not using copy of global constant ( #9393 )
2021-10-27 12:42:06 -04:00
satyajandhyala
f29057c7c0
Added TanhGrad. ( #9507 )
* Added TanhGrad.
2021-10-26 09:10:03 -07:00
pengwa
b125446f9c
Optimize python overhead of APEX amp ( #9447 )
* optimize python overhead of _post_amp_backward
* overwrite apex amp's zero_grad for faster implementation
* move unscale_fp16_grads_into_fp32_grads into C++ impl
* improve the efficiency further, reducing 3.5ms to 1.7ms for unilm.
* unilm 1.7ms to 338us: 1) optimize python list <==> std::vector copy, 2) launch the kernels as soon as num_elem reaches the threshold. This helps reduce the CUDA idle time.
* refine the logic a bit after validating
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
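The threshold-based launch mentioned in this entry can be pictured with a small sketch. This is only an illustration in plain Python, not the C++ code from the commit: the function name `chunk_by_element_count` and the use of bare integer sizes in place of tensors are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def chunk_by_element_count(sizes: Iterable[int], threshold: int) -> Iterator[List[int]]:
    """Group tensor sizes, flushing a group as soon as its total reaches `threshold`.

    Each yielded group stands in for one kernel launch (hypothetical model).
    """
    batch: List[int] = []
    total = 0
    for n in sizes:
        batch.append(n)
        total += n
        if total >= threshold:
            # Enough work has accumulated: launch now instead of waiting for all inputs.
            yield batch
            batch, total = [], 0
    if batch:
        # Flush whatever is left once the inputs are exhausted.
        yield batch


groups = list(chunk_by_element_count([4, 4, 10, 1, 2], threshold=8))
```

Flushing early lets the device start working while the host is still collecting the remaining gradients, which is one plausible reading of the reduced idle time described above.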
ashbhandare
0270ff7951
Minor import fix ( #9538 )
2021-10-25 21:29:31 -07:00
Vincent Wang
fb4f7dbbb7
Call ATenOp for ReduceSum on ORTModule ( #9471 )
* call ATenOp for ReduceSum
* Enable ReduceSum ATenOp for training only
* always load extension
2021-10-26 09:48:57 +08:00
Sherlock
3ed8ade675
Use SafeInt for malloc related computation ( #9503 )
* Use SafeInt for malloc related computation
2021-10-22 16:42:12 -07:00
Wei-Sheng Chin
beddbdec5a
Fix PythonOp exporter ( #9318 )
Register PythonOp exporter with the right symbol.
2021-10-22 10:45:45 -07:00
Wei-Sheng Chin
d2d480a0db
Allow None As Autograd Context ( #9315 )
* Allow none ctx
* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_autograd.py
Co-authored-by: pengwa <pengwa@microsoft.com>
* Address a comment
Co-authored-by: pengwa <pengwa@microsoft.com>
2021-10-21 20:37:36 -07:00
Jeff Daily
66ceb6926d
rehipify ROCm EP files under orttraining ( #9443 )
* rehipify rocm ep files under orttraining committed to source control
* fix flake8 error
2021-10-21 13:36:21 -07:00
Xavier Dupré
5797bd6db3
Remove one unnecessary deepcopy in unflatten_user_output ( #9353 )
* Removes one unnecessary deepcopy
2021-10-21 10:44:27 +02:00
Nick Kreeger
f1123c2fb3
Fix whitespace and style in concat.cc ( #9452 )
2021-10-20 12:43:46 -05:00
Changming Sun
406f1629c1
Remove Featurizers code ( #9300 )
2021-10-20 10:20:35 -07:00
baijumeswani
20eaed43e5
Ignore all string inputs to ORTModule AB#1310803 ( #9344 )
2021-10-19 16:34:47 -07:00
baijumeswani
757bc66720
Set cuda version to be None instead of an empty string ( #9435 )
2021-10-19 11:10:52 -04:00
baijumeswani
5da4e07daa
Make FusedAdam mathematically equivalent to Transformers AdamW ( #9343 )
2021-10-18 16:03:18 -07:00
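The commit title does not spell out the update rule. As a rough point of reference, the following is a scalar sketch of one AdamW step in the style of the Hugging Face Transformers optimizer as commonly described (epsilon added to the square root before bias correction, decoupled weight decay applied after the update). The function name `adamw_step` and its defaults are assumptions for illustration, not code from this repository.

```python
import math


def adamw_step(p, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-6, weight_decay=0.0):
    """One scalar AdamW update; returns the new (param, first moment, second moment)."""
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Epsilon is added to sqrt(v) directly; bias correction is folded into the step size.
    denom = math.sqrt(v) + eps
    step_size = lr * math.sqrt(1 - beta2 ** step) / (1 - beta1 ** step)
    p = p - step_size * m / denom
    # Decoupled weight decay, applied to the already-updated parameter.
    p = p - lr * weight_decay * p
    return p, m, v


p_new, m_new, v_new = adamw_step(p=1.0, grad=0.5, m=0.0, v=0.0, step=1)
```

On the first step the parameter moves by roughly `lr` in the direction opposite to the gradient, which is a quick sanity check for any implementation that claims equivalence.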
pengwa
f05c285a58
Exception when duplicated autograd.Function name detected ( #9351 )
* Exception when duplicated autograd.Function name detected
* reorder a bit for a little bit better perf
* fix a bug in previous PR :(
* correct the error message a bit
2021-10-15 12:23:13 +08:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider ( #8877 )
* re-hipify all rocm EP sources
* fix all other files affected by re-hipify
* add cuda_provider_factory.h to amd_hipify.py
* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration
* Fix ReduceConsts template specialization introduced in #9101 .
Fixes the error when building for ROCm 4.3.1:
error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)
* fix flake8 error in amd_hipify.py
* speed up hipify with concurrent.futures
* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
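The "speed up hipify with concurrent.futures" item in this entry refers to running per-file conversions in parallel. A minimal sketch of that pattern follows. The `hipify_text` function is a hypothetical stand-in (the real script applies far more rewrites), and using threads instead of processes is an assumption, since the commit does not say which executor it uses.

```python
from concurrent.futures import ThreadPoolExecutor


def hipify_text(text: str) -> str:
    # Hypothetical stand-in for the real CUDA-to-HIP source rewrite.
    return text.replace("cuda", "hip").replace("CUDA", "HIP")


def hipify_all(sources: dict) -> dict:
    """Convert every source string concurrently, keyed by file name."""
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in the same order as its inputs.
        results = pool.map(hipify_text, sources.values())
        return dict(zip(sources.keys(), results))


converted = hipify_all({"a.cu": "cudaMalloc(&p, n);", "b.cu": "// CUDA kernel"})
```

For CPU-bound text rewriting, `ProcessPoolExecutor` exposes the same `map` interface and may scale better. Threads tend to pay off mainly when file I/O dominates.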
Abhishek Jindal
23700a15a0
Abjindal/eager windows build ( #9326 )
* removing warnings which are causing errors from torch and changing flags for Windows
* adding MKL library resolution and comments
* cleaning up the code
* fixing onnxruntime_python file for windows build
* fix the include order to avoid the python_d.lib issue on win debug build
* changes for warnings, typos and other comments
* merge conflict
* adding fix for mkl library error
* Revert "adding fix for mkl library error"
This reverts commit 73b87c73c2 .
* fix for dll path for windows
* typo for dll path
Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-10-14 12:54:49 -07:00
Xavier Dupré
22e3f8bf54
Refactor TrainingManager.forward ( #9354 )
* Refactor TrainingManager.forward
2021-10-14 12:54:31 +02:00
pengwa
5ee47e3ffa
legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper ( #9184 )
* megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional
* add deepspeed zero1 and zero2 - checkoverflow & clip norm
* re-structure code and add the copyright
* update the document
* refine the code after validation
2021-10-14 09:01:23 +08:00
Chandru Ramakrishnan
ba0cca96f0
Hooked up eager logging to ORT default logger. ( #9340 )
* Hooked up eager logging to ORT default logger.
2021-10-13 18:10:32 -04:00
Tang, Cheng
f0bc35c4ba
fix a hardcoded type ( #9337 )
2021-10-12 13:44:46 -07:00
Tang, Cheng
48737091c0
resolve the provider options before create training session in orttrainer ( #9199 )
* resolve the provider options before create training session in orttrainer
* Update orttraining/orttraining/python/orttraining_pybind_common.h
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* support clear the training ep instance pool
* fix status error
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-10-12 09:30:45 -07:00
ashbhandare
52c021d1f3
Fix export of aten op for Max and Avg Pool 2D ( #9330 )
2021-10-12 09:03:14 -07:00
Edward Chen
79e736ed25
Make onnxruntime::Status nodiscard ( #9279 )
Mark onnxruntime::Status class with [[nodiscard]] attribute.
Fix existing warnings.
2021-10-08 17:10:31 -07:00
satyajandhyala
29379db432
Added SigmoidGrad schema and kernels. ( #9244 )
* Added SigmoidGrad schema and kernels.
* Added test_sigmoid_grad function.
2021-10-08 11:03:28 -07:00
Tang, Cheng
68601fc296
error handling for eager mode's data transfer ( #9261 )
2021-10-07 17:16:33 -07:00
ytaous
7166586d7e
Enable SkipCheck by default ( #9215 )
* Enable SkipCheck by default
* fix UTs
* fix UT
* fix UTs
* fix UTs
* address comments
* fix UT
* enable skipchecks
* move _SkipCheck back
* move _SkipCheck back
* move _SkipCheck back
* Update orttraining/orttraining/python/training/ortmodule/_inference_manager.py
* Update orttraining/orttraining/python/training/ortmodule/_utils.py
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-10-07 15:47:14 -07:00
Tang, Cheng
c002dc86a3
set mpi group init flag after add group ( #9293 )
2021-10-07 10:09:16 -07:00
Thiago Crepaldi
52d067402a
Fix all-or-nothing fallback for bad ORTModule init ( #9277 )
* Fix all-or-nothing fallback for bad ORTModule init
* Address comments
2021-10-06 15:12:27 -04:00
baijumeswani
bcdb411c8d
Implement FusedAdam for ORT adapted from DeepSpeed ( #9266 )
2021-10-05 20:50:34 -07:00
ashbhandare
35c2102cfa
Fixes for GatherND, Multinomial ( #9143 )
* register gathernd kernel, aten multinomial
* fix CI, add test
* review comments
2021-10-05 14:51:58 -07:00
G. Ramalingam
0b77c9ca7c
Cleanup function definitions of contrib ops ( #9265 )
* Simplify function definitions
* Simplify fast-gelu function definition
* Simplify training function op body definitions
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
* Eliminate redundant function
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
* Formatting changes
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
* Minor formatting changes
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
* Add comment
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
* Specify int64 type for constant 1
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
2021-10-05 11:38:42 -07:00
Thiago Crepaldi
6e2f66ee9c
Allow custom exporter args + bug fix ( #9242 )
2021-10-04 11:32:42 -04:00
baijumeswani
45399d5ace
Remove TORCH_WARN to avoid torch string related operations that take up time ( #9238 )
2021-10-01 13:56:04 -04:00
Tang, Cheng
be4d887439
Fix ONNX exporter call with latest API for ORTTrainer ( #9228 )
* update the exporter call with latest api in orttrainer
* use official export api instead of the private call
2021-10-01 13:49:55 -04:00
G. Ramalingam
e79be39081
LayerNormGrad function body and LayerNorm inference/body fix ( #9160 )
* Add function body for LayerNormGrad
* Fix LayerNorm schema for multiple normalization dims
2021-09-30 12:03:08 -07:00
Thiago Crepaldi
ceb51dda4a
Support external torch cpp extensions on ORTModule ( #9223 )
2021-09-30 10:37:35 -04:00
satyajandhyala
278928a102
Added a test case for python gradient builder. ( #9207 )
* Register Cos operator gradient using ORTModule's register_gradient and compare gradient against PyTorch.
2021-09-29 09:24:12 -07:00
Suffian Khan
6f580f07de
Switch AMD CI pipeline to use environment image from onnxruntimecibuildenvironment ( #9206 )
* shift docker image reference for amd ci pipeline
* fix service endpoint
* reduce perf tolerance
2021-09-28 13:06:16 -07:00
ytaous
d3f859fe30
Dropout Vectorized Kernel ( #9157 )
* vectorized kernel
* fix build
* re-calibrate expected loss
* fix build
* re-calibrate convergence results
* more re-calibrate on loss
* divide kernels
* address comments
* more calibration
* calibration
* per comments
* enable sync
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-09-27 17:19:12 -07:00
Wei-Sheng Chin
1b0816859f
Only wrap sub-modules which can be wrapped as ORTModule ( #9021 )
2021-09-27 17:18:22 -07:00
baijumeswani
c30cc9190a
Change the agent pool for orttraining-distributed pipeline ( #9179 )
2021-09-26 21:26:44 -07:00
baijumeswani
fd91bf91c9
Print full stacktrace exception when exporter fails ( #9169 )
2021-09-24 10:24:37 -04:00
Vincent Wang
39dc6ea8a3
Fix to_dlpack Failure on PyTorch-1.10 ( #9151 )
* workaround to_dlpack fail in new pt version
* add torch code link
2021-09-24 09:48:07 +08:00