Commit graph

797 commits

Author SHA1 Message Date
baijumeswani
5da4e07daa
Make FusedAdam mathematically equivalent to Transformers AdamW (#9343) 2021-10-18 16:03:18 -07:00
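For reference, below is a minimal per-scalar sketch of the AdamW update as I understand Hugging Face Transformers' `AdamW` (`transformers.optimization.AdamW`, defaults `eps=1e-6`, `correct_bias=True`). It is not ORT's FusedAdam kernel. Three details matter for equivalence. Bias correction is folded into the step size. `eps` is added to the uncorrected `sqrt(v)`. Decoupled weight decay is applied after the Adam step. The helper name `adamw_step` is hypothetical.

```python
import math


def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-6, weight_decay=0.0, correct_bias=True):
    """One AdamW step for a single scalar parameter (hypothetical helper).

    Returns the updated (p, m, v); `t` is the 1-based step count.
    """
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    # eps is added to the *uncorrected* sqrt(v) ...
    denom = math.sqrt(v) + eps
    step_size = lr
    if correct_bias:
        # ... and both bias corrections are folded into the step size.
        step_size = lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    p = p - step_size * m / denom
    # Decoupled weight decay, applied to the already-updated parameter.
    if weight_decay > 0.0:
        p = p - lr * weight_decay * p
    return p, m, v
```

For example, `adamw_step(1.0, 0.5, 0.0, 0.0, t=1, lr=0.1)` moves `p` to roughly 0.9, because Adam's first step shifts each parameter by about `lr` against the gradient.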
pengwa
f05c285a58
Exception when duplicated autograd.Function name detected (#9351)
* Exception when duplicated autograd.Function name detected

* reorder slightly for a little better perf

* fix a bug in previous PR :(

* correct the error message a bit
2021-10-15 12:23:13 +08:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
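Each file's CUDA-to-HIP conversion is independent of the others, so the work maps naturally onto a `concurrent.futures` pool. The sketch below is minimal and makes two assumptions. It uses a hypothetical, heavily abbreviated in-memory substitution table. It does not reproduce what `amd_hipify.py` actually does per file. If the real per-file step shells out to an external tool, a `ThreadPoolExecutor` is enough. Pure-Python CPU-bound work would need a `ProcessPoolExecutor` to get around the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, heavily abbreviated CUDA -> HIP substitution table.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
}


def hipify_text(text):
    """Apply every substitution in CUDA_TO_HIP to one source string."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        text = text.replace(cuda_name, hip_name)
    return text


def hipify_all(sources, max_workers=8):
    """Translate a {path: text} mapping concurrently, preserving key order."""
    paths = list(sources)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip keeps paths aligned.
        results = pool.map(hipify_text, (sources[p] for p in paths))
        return dict(zip(paths, results))
```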
Abhishek Jindal
23700a15a0
Abjindal/eager windows build (#9326)
* removing torch warnings that were causing errors, and changing flags for Windows

* adding MKL library resolution and comments

* cleaning up the code

* fixing onnxruntime_python file for windows build

* fix the include order to avoid the python_d.lib issue on Windows debug builds

* changes for warnings, typos and other comments

* merge conflict

* adding fix for mkl library error

* Revert "adding fix for mkl library error"

This reverts commit 73b87c73c2.

* fix for dll path for windows

* typo for dll path

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-10-14 12:54:49 -07:00
Xavier Dupré
22e3f8bf54
Refactor TrainingManager.forward (#9354)
* Refactor TrainingManager.forward
2021-10-14 12:54:31 +02:00
pengwa
5ee47e3ffa
legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184)
* megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional

* add deepspeed zero1 and zero2 - checkoverflow & clip norm

* re-structure code and add the copyright

* update the document

* refine the code after validation
2021-10-14 09:01:23 +08:00
Chandru Ramakrishnan
ba0cca96f0
Hooked up eager logging to ORT default logger. (#9340)
* Hooked up eager logging to ORT default logger.
2021-10-13 18:10:32 -04:00
Tang, Cheng
f0bc35c4ba
fix a hardcoded type (#9337) 2021-10-12 13:44:46 -07:00
Tang, Cheng
48737091c0
resolve the provider options before creating the training session in orttrainer (#9199)
* resolve the provider options before creating the training session in orttrainer

* Update orttraining/orttraining/python/orttraining_pybind_common.h

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* support clearing the training EP instance pool

* fix status error

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-10-12 09:30:45 -07:00
ashbhandare
52c021d1f3
Fix export of aten op for Max and Avg Pool 2D (#9330) 2021-10-12 09:03:14 -07:00
Edward Chen
79e736ed25
Make onnxruntime::Status nodiscard (#9279)
Mark onnxruntime::Status class with [[nodiscard]] attribute.
Fix existing warnings.
2021-10-08 17:10:31 -07:00
satyajandhyala
29379db432
Added SigmoidGrad schema and kernels. (#9244)
* Added SigmoidGrad schema and kernels.

* Added test_sigmoid_grad function.
2021-10-08 11:03:28 -07:00
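Sigmoid's derivative can be written in terms of its own output. For y = sigmoid(x), dy/dx = y(1 - y). A gradient kernel therefore needs only the upstream gradient and the forward output. The elementwise sketch below assumes SigmoidGrad takes `(dY, Y)` as inputs; the schema itself is not shown here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(dy, y):
    """Elementwise dX = dY * Y * (1 - Y), where Y = sigmoid(X)."""
    return [d * s * (1.0 - s) for d, s in zip(dy, y)]
```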
Tang, Cheng
68601fc296
error handling for eager mode's data transfer (#9261) 2021-10-07 17:16:33 -07:00
ytaous
7166586d7e
Enable SkipCheck by default (#9215)
* Enable SkipCheck by default

* fix UTs

* fix UT

* fix UTs

* fix UTs

* address comments

* fix UT

* enable skipchecks

* move _SkipCheck back

* move _SkipCheck back

* move _SkipCheck back

* Update orttraining/orttraining/python/training/ortmodule/_inference_manager.py

* Update orttraining/orttraining/python/training/ortmodule/_utils.py

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-10-07 15:47:14 -07:00
Tang, Cheng
c002dc86a3
set mpi group init flag after add group (#9293) 2021-10-07 10:09:16 -07:00
Thiago Crepaldi
52d067402a
Fix all-or-nothing fallback for bad ORTModule init (#9277)
* Fix all-or-nothing fallback for bad ORTModule init

* Address comments
2021-10-06 15:12:27 -04:00
baijumeswani
bcdb411c8d
Implement FusedAdam for ORT adapted from DeepSpeed (#9266) 2021-10-05 20:50:34 -07:00
ashbhandare
35c2102cfa
Fixes for GatherND, Multinomial (#9143)
* register gathernd kernel, aten multinomial

* fix CI, add test

* review comments
2021-10-05 14:51:58 -07:00
G. Ramalingam
0b77c9ca7c
Cleanup function definitions of contrib ops (#9265)
* Simplify function definitions

* Simplify fast-gelu function definition

* Simplify training function op body definitions

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Eliminate redundant function

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Formatting changes

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Minor formatting changes

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add comment

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Specify int64 type for constant 1

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
2021-10-05 11:38:42 -07:00
Thiago Crepaldi
6e2f66ee9c
Allow custom exporter args + bug fix (#9242) 2021-10-04 11:32:42 -04:00
baijumeswani
45399d5ace
Remove TORCH_WARN to avoid torch string related operations that take up time (#9238) 2021-10-01 13:56:04 -04:00
Tang, Cheng
be4d887439
Fix ONNX exporter call with latest API for ORTrainer (#9228)
* update the exporter call with latest api in orttrainer

* use official export api instead of the private call
2021-10-01 13:49:55 -04:00
G. Ramalingam
e79be39081
LayerNormGrad function body and LayerNorm inference/body fix (#9160)
* Add function body for LayerNormGrad

* Fix LayerNorm schema for multiple normalization dims
2021-09-30 12:03:08 -07:00
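In a row-major tensor, normalizing over several trailing dimensions (from `axis` onward) is the same as normalizing one flattened block of size prod(shape[axis:]). The sketch below applies that to a single block, using the standard LayerNorm backward with biased variance: dx = inv_std * (g - mean(g) - x_hat * mean(g * x_hat)), where g = dY * gamma. It illustrates the math only and is not the ONNX function body added in the PR. For a batch of rows, `dgamma` and `dbeta` would additionally be summed over rows.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one flattened block x (all dims from `axis` onward)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    y = [g * h + b for g, h, b in zip(gamma, x_hat, beta)]
    return y, x_hat, inv_std


def layer_norm_grad(dy, x_hat, inv_std, gamma):
    """Return (dx, dgamma, dbeta) for a single normalized block."""
    n = len(dy)
    g = [d * w for d, w in zip(dy, gamma)]  # gradient w.r.t. x_hat
    mean_g = sum(g) / n
    mean_g_xhat = sum(a * b for a, b in zip(g, x_hat)) / n
    dx = [inv_std * (gi - mean_g - h * mean_g_xhat) for gi, h in zip(g, x_hat)]
    dgamma = [d * h for d, h in zip(dy, x_hat)]
    dbeta = list(dy)
    return dx, dgamma, dbeta
```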
Thiago Crepaldi
ceb51dda4a
Support external torch cpp extensions on ORTModule (#9223) 2021-09-30 10:37:35 -04:00
satyajandhyala
278928a102
Added a test case for python gradient builder. (#9207)
* Register Cos operator gradient using ORTModule's register_gradient and compare gradient against PyTorch.
2021-09-29 09:24:12 -07:00
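The gradient rule being registered for Cos is dX = -dY * sin(X). The PR checks it against PyTorch. The sketch below states the rule elementwise and can be checked against a central finite difference instead. It does not use ORTModule's `register_gradient` mechanism.

```python
import math


def cos_grad(dy, x):
    """Gradient of Y = cos(X): dX = -dY * sin(X), elementwise."""
    return [-d * math.sin(v) for d, v in zip(dy, x)]
```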
Suffian Khan
6f580f07de
Switch AMD CI pipeline to use environment image from onnxruntimecibuildenvironment (#9206)
* shift docker image reference for amd ci pipeline

* fix service endpoint

* reduce perf tolerance
2021-09-28 13:06:16 -07:00
ytaous
d3f859fe30
Dropout Vectorized Kernel (#9157)
* vectorized kernel

* fix build

* re-calibrate expected loss

* fix build

* re-calibrate convergence results

* more re-calibrate on loss

* divide kernels

* address comments

* more calibration

* calibration

* per comments

* enable sync

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-09-27 17:19:12 -07:00
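Dropout zeroes each element with probability `ratio` and scales survivors by 1 / (1 - ratio). A vectorized kernel usually has each thread process several contiguous elements per iteration, then finishes the leftover tail element by element. The serial Python sketch below mirrors only that loop structure and gains nothing in speed. Two details are assumptions rather than details of the ORT kernel. The width `VEC = 4` is one. The use of `random.Random` is the other: a GPU kernel would draw from a device RNG.

```python
import random

VEC = 4  # elements per "vectorized" step; the width is an assumption


def dropout(x, ratio, seed=0):
    """Return (y, mask) with y[i] = x[i] / (1 - ratio) if kept, else 0."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - ratio)
    n = len(x)
    y = [0.0] * n
    mask = [False] * n

    def apply(i):
        # Keep with probability 1 - ratio, since rng.random() is uniform in [0, 1).
        keep = rng.random() >= ratio
        mask[i] = keep
        y[i] = x[i] * scale if keep else 0.0

    main = n - n % VEC
    # Main loop: each iteration covers VEC contiguous elements, the way one
    # GPU thread would handle a packed vector load.
    for base in range(0, main, VEC):
        for i in range(base, base + VEC):
            apply(i)
    # Tail loop: the n % VEC leftover elements, handled one at a time.
    for i in range(main, n):
        apply(i)
    return y, mask
```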
Wei-Sheng Chin
1b0816859f
Only wrap sub-modules which can be wrapped as ORTModule (#9021) 2021-09-27 17:18:22 -07:00
baijumeswani
c30cc9190a
Change the agent pool for orttraining-distributed pipeline (#9179) 2021-09-26 21:26:44 -07:00
baijumeswani
fd91bf91c9
Print full stacktrace exception when exporter fails (#9169) 2021-09-24 10:24:37 -04:00
Vincent Wang
39dc6ea8a3
Fix to_dlpack Failure on PyTorch-1.10 (#9151)
* workaround to_dlpack fail in new pt version

* add torch code link
2021-09-24 09:48:07 +08:00
Thiago Crepaldi
153767bab4
Add internal determinism flag configuration for ORTModule (#9074) 2021-09-21 15:11:41 -04:00
Ryan Hill
b876e5675b
C API Enum Name Fixes (#9092) 2021-09-17 15:11:26 -07:00
Ryan Hill
26509465f0
Add default C++ initialization to OrtCUDAProviderOptions (#9064)
* Add default C++ initialization to OrtCUDAProviderOptions
2021-09-16 15:03:58 -07:00
Suffian Khan
e758870b18
Upgrade ROCm CI pipeline for ROCm 4.3.1 and permit run inside container (#9070)
* try to run inside 4.3.1 container

* no \ in container run command

* remove networking options

* try with adding video render groups

* add job to build docker image

* try without 1st stage

* change alpha, beta to float

* try adding service connection

* retain huggingface directory

* static video and render gid

* use runtime expression for variables

* install torch-ort

* pin sacrebleu==1.5.1

* update curves for rocm 4.3.1

* try again

* disable determinism and only check tail of loss curve and with a much larger threshold of 0.05

* disable RoBERTa due to high run variability on ROCm 4.3.1

* put reduction unit tests back in
2021-09-15 12:32:02 -07:00
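Comparing only the tail of a loss curve, with a loose threshold, tolerates early run-to-run noise while still catching convergence regressions. The sketch below is hypothetical: the helper name and the `tail` length are illustrative. Treating 0.05 as an absolute per-point tolerance is also an assumption, since the commit does not say whether the threshold is absolute or relative.

```python
def tail_matches(actual, expected, tail=10, threshold=0.05):
    """True if the last `tail` losses all lie within `threshold` of expected."""
    if len(actual) < tail or len(expected) < tail:
        raise ValueError("curves are shorter than the requested tail")
    pairs = zip(actual[-tail:], expected[-tail:])
    return all(abs(a - e) <= threshold for a, e in pairs)
```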
ashbhandare
98ac341c5b
Filter nones from ctx saved tensors (#9063)
Co-authored-by: Aishwarya Bhandare <aibhanda@5cb7a9c3931a4b19a66ae028b49221a6000001.ahkw4qp232huflxlm4gmpq4nbh.jx.internal.cloudapp.net>
2021-09-15 10:13:45 -07:00
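PyTorch's `ctx.save_for_backward` accepts `None` entries and hands them back through `ctx.saved_tensors`. Code that walks the saved tensors may therefore need to skip them. The commit gives no further detail, so the sketch below shows only the filtering step itself. It operates on a plain tuple standing in for `ctx.saved_tensors`, and `filter_saved_tensors` is a hypothetical name.

```python
def filter_saved_tensors(saved):
    """Drop None entries from a ctx.saved_tensors-style tuple, keeping order."""
    return tuple(t for t in saved if t is not None)
```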
G. Ramalingam
7d28b596f4
Add function-body to opschema of FastGeluGrad (#9028)
* Add function body to FastGeluGrad

* Add test case
2021-09-14 12:27:55 -07:00
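FastGelu uses the tanh approximation y = 0.5 x (1 + tanh(k (x + a x^3))), with k = sqrt(2/pi) and a = 0.044715. Differentiating gives dy/dx = 0.5 (1 + t) + 0.5 x (1 - t^2) k (1 + 3 a x^2), where t is the tanh term. The sketch below states that derivative for a scalar. The optional bias input of the contrib op is left out, and this is the math only, not the ONNX function body from the PR.

```python
import math

K = math.sqrt(2.0 / math.pi)
A = 0.044715


def fast_gelu(x):
    return 0.5 * x * (1.0 + math.tanh(K * (x + A * x ** 3)))


def fast_gelu_grad(dy, x):
    """dX = dY * d/dx FastGelu(x), using the tanh approximation."""
    t = math.tanh(K * (x + A * x ** 3))
    # Derivative of t with respect to x (chain rule through tanh).
    dt = (1.0 - t * t) * K * (1.0 + 3.0 * A * x * x)
    return dy * (0.5 * (1.0 + t) + 0.5 * x * dt)
```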
Sherlock
9174cbe3d5
Optimize CUDA Kernel for 3D and 4D Transpose (#8928)
* Optimize Transpose120 and Transpose102

* Generalize Transpose0123 for more input shapes

* Add Transpose3D test cases

* update rocm kernel
2021-09-13 23:00:53 -07:00
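Names like Transpose120 and Transpose102 presumably refer to the permutations (1, 2, 0) and (1, 0, 2). A naive reference implementation is handy for checking optimized kernels like these. The sketch below uses NumPy-style semantics on a row-major flat list: output dimension k is input dimension perm[k]. It is a correctness reference, not the CUDA kernel.

```python
def transpose3d(data, shape, perm):
    """Reference 3-D transpose of a row-major flat list.

    Returns (out, out_shape) with out_shape[k] = shape[perm[k]].
    """
    _, d1, d2 = shape
    out_shape = tuple(shape[p] for p in perm)
    in_strides = (d1 * d2, d2, 1)
    out = [None] * len(data)
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            for k in range(out_shape[2]):
                # Output index (i, j, k) sits at input position perm[0..2].
                idx = [0, 0, 0]
                idx[perm[0]], idx[perm[1]], idx[perm[2]] = i, j, k
                src = sum(a * s for a, s in zip(idx, in_strides))
                dst = (i * out_shape[1] + j) * out_shape[2] + k
                out[dst] = data[src]
    return out, out_shape
```

Applying the cyclic permutation (1, 2, 0) three times returns the original tensor, which makes a cheap sanity check.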
baijumeswani
34f37d2920
Disable fallback for ortmodule api tests (#9018) 2021-09-13 16:00:13 -07:00
mindest
a1021a1cf4
Add BatchNorm kernel for ROCm (#9014)
* Add BatchNorm kernel for ROCm, update BN test

* correct epsilon_ setting; limit min epsilon
2021-09-13 15:15:05 +08:00
Ryan Hill
c3321b1778
Fix NVTX profiling so it can run in the shared CUDA provider (#9035)
* Move NVTX profiling so it can run in the shared provider properly
2021-09-11 00:35:54 -07:00
Tang, Cheng
8eb6546e8e
enable eager mode with ortmodule (#8961)
* initial change for eager/ortmodule integration

* update to latest pytorch api

* add test model;fix torch version issue

* fix comments in pr

* fix python test break

* fix api change

* fix comments in PR

* pass device into the fw function
2021-09-10 15:09:23 -07:00
satyajandhyala
ce7b12bf5d
Added new fp16 allow/safe opcodes in PropagateCastOps (#8964)
* Removed RemoveInputOutputUpDownCasts strategy in PropagateCastOps.

* Added Expand, Squeeze and Unsqueeze ops to fp16 allow ops

* Added onnx models for squeeze/unsqueeze tests.
2021-09-10 11:53:26 -07:00
Bowen Bao
31af88c0bc
Update cross_entropy_loss symbolic for new argument from upstream torch (#9007)
In torch 1.10, `label_smoothing` is added as additional input to `cross_entropy_loss`. Update the symbolic function to handle this change.
2021-09-10 10:32:59 -07:00
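With `label_smoothing` = eps over C classes, PyTorch's cross-entropy targets the smoothed distribution (1 - eps) * one_hot + eps / C. The per-sample loss therefore equals (1 - eps) * NLL + eps * mean_c(-log p_c). The single-sample sketch below illustrates that math only. It does not show how the ONNX symbolic lowers it, and class weights, `ignore_index`, and reductions are left out.

```python
import math


def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, target, label_smoothing=0.0):
    """Cross-entropy of one sample against a smoothed one-hot target."""
    logp = log_softmax(logits)
    c = len(logits)
    nll = -logp[target]
    smooth = -sum(logp) / c  # cross-entropy against the uniform distribution
    return (1.0 - label_smoothing) * nll + label_smoothing * smooth
```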
baijumeswani
d78e90d1af
Adding preprocessor checks for torch version during torch cpp extensions compilation (#8989) 2021-09-09 10:26:38 -07:00
pengwa
d209fe29b9
custom autograd func memory refinement (#8993)
* Release torch tensor referenced by torch gradient graph (created in PythonOp)

* Update orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/torch_interop_utils/torch_interop_utils.cc

* refine with comments

Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
2021-09-09 18:37:24 +08:00
Ashwini Khade
ec63d10303
add model local function support (#8540)
* updates for picking onnx commit

* add tests filter to c# tests

* plus test fixes

* fix versioning for contrib ops

* fix tests

* test filter for optional ops

* more versioning related updates

* fix test

* fix layernorm spec

* more updates

* update docs

* add more test filters

* more filters

* update binary size threshold

* update docs

* draft - enable model local function

* enable model local functions in ORT

* update to latest rel onnx commit

* plus tests

* plus more updates

* plus updates

* test updates

* Fix for nested functions + shape inference

* plus bug fix and updates per review

* plus fixes per review

* plus test updates

* plus updates per review

* plus fixes

* fix a test
2021-09-08 11:47:01 -07:00
baijumeswani
0cc2909573
Auto forward non method attribute lookups to the user's model and bind custom methods to ORTModule (#8798) 2021-09-03 08:25:44 -07:00
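The pattern in this commit title has two parts. Attribute lookups the wrapper cannot satisfy are forwarded to the wrapped model. User-defined methods are re-bound so that `self` is the wrapper. The sketch below uses a plain Python class with the hypothetical name `ModuleWrapper`. `torch.nn.Module` overrides `__getattr__`/`__setattr__` itself, so ORTModule's real implementation necessarily differs.

```python
import types


class ModuleWrapper:
    """Hypothetical wrapper that forwards unknown attributes to the wrapped model."""

    def __init__(self, model):
        # Store the model without going through any custom attribute logic.
        object.__setattr__(self, "_model", model)
        # Re-bind plain methods (walking the MRO so subclass overrides win)
        # so that `self` inside them refers to the wrapper, not the model.
        for klass in reversed(type(model).__mro__):
            if klass is object:
                continue
            for name, attr in vars(klass).items():
                if isinstance(attr, types.FunctionType) and not name.startswith("__"):
                    object.__setattr__(self, name, types.MethodType(attr, self))

    def __getattr__(self, name):
        # Only called when normal lookup on the wrapper fails.
        return getattr(self._model, name)
```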
Vincent Wang
c343f7cb43
Add Algorithm Search for ConvGrad (#8613)
* algo search for conv grad

* global cache, bigger workspace size

* fix build error

* refactor

* refactor

* resolve comments

* fix rocm

* change lock places

* rename variable

* remove setting for inference

* resolve comments
2021-09-03 11:25:17 +08:00
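The idea behind the algorithm search is to benchmark the candidate backward-convolution algorithms once per problem configuration, cache the fastest, and reuse it afterwards. The sketch below uses hypothetical names throughout (`AlgoCache`, zero-argument candidate callables, a shape tuple as the key). It does not model cuDNN's own find routines or workspace sizing. The lock guards only the shared dictionary and is released during benchmarking, so two threads may occasionally benchmark the same key. `setdefault` then keeps whichever result landed first.

```python
import threading
import time


class AlgoCache:
    """Benchmark candidate algorithms once per key and cache the fastest."""

    def __init__(self):
        self._best = {}
        self._lock = threading.Lock()

    def get(self, key, candidates):
        """Return the name of the fastest callable in `candidates` for `key`.

        `candidates` maps algorithm names to zero-argument callables.
        """
        with self._lock:
            cached = self._best.get(key)
        if cached is not None:
            return cached
        timings = {}
        for name, run in candidates.items():
            start = time.perf_counter()
            run()
            timings[name] = time.perf_counter() - start
        best = min(timings, key=timings.get)
        with self._lock:
            # Another thread may have finished first; keep its answer.
            return self._best.setdefault(key, best)
```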
Gary Miguel
47435311f4
Include pytorch_export_contrib_ops in inference builds (#8878)
* Include pytorch_export_contrib_ops in inference builds

Rename / move it from tools/python/register_custom_ops_pytorch_exporter
to onnxruntime/python/tools/pytorch_export_contrib_ops.

Rationale for inclusion in inference builds:
This code is potentially useful for anyone using ORT, not just training.

Rationale for new name:
"Contrib op" is the nomenclature used within ORT to refer to the set of
ops that are not in the standard op set but are included by default with
ORT. This is more specific than "custom op", which is what the PyTorch
exporter uses to refer to any non-standard op.

Step 1 of addressing #8818. After this is merged I will update the docs.

* Enable test_pytorch_export_contrib_ops.py in CI

Fixes AB#1342330
2021-09-02 14:26:58 -07:00