958 commits

Author SHA1 Message Date
Wil Brady
b0e027c661
Add aten::_softmax to eager ops. (#11820) 2022-06-13 13:05:26 -04:00
Vincent Wang
f745eb1d3f
fix gradient ut (#11797) 2022-06-10 12:14:19 +08:00
Vincent Wang
5ecfaef042
ATen Fallback for Inference (#11597)
* aten op for inference

* fix build error

* move some code to training only

* remove domain from operator name

* move aten_op_executor ext out from ortmodule

* add pipeline

* add exec mode

* fix script

* fix ut script

* fix test pipeline

* failure test

* rollback

* bugfix

* resolve comments

* enable aten for python build only

* fix win build

* use target_compile_definitions

* support io binding

* turn off aten by default

* fix ut

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>
2022-06-09 16:07:30 +08:00
PeixuanZuo
908e19dc16
[FIX] using torch.version.cuda/hip to ensure build ORTModule Torch C++ CUDA extension for docker build (#11675)
* [FIX] cpp ext

* Update orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/install.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* [FIX] fix python format

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-06-07 07:51:26 +08:00
Changming Sun
3c1dd9514d
Revert "fixed point based requantization on arm64 (#11540)" (#11732)
This reverts commit 1f2c926 because it makes our packaging pipeline crash.

Error message:

[ RUN ] QLinearConvTest.Conv3D_S8S8_Depthwise
Test #1: onnxruntime_test_all ...................Subprocess killed***Exception: 838.24 sec

We haven't successfully reproduced the bug on real ARM64 hardware. So far we have only seen it show up with qemu. More investigation is ongoing.
2022-06-03 19:12:25 -07:00
Yufeng Li
1f2c92673b
fixed point based requantization on arm64 (#11540)
* fixed point based requantization on arm64

* reverse MlasConvSymDepthwiseKernel u8s8 and s8s8 order
2022-06-02 12:34:17 -07:00
Vincent Wang
54d1573d2f
[ORTModule] Enable SimplifiedLayerNormalization Fusion (#11580)
* enable SimplifiedLayerNormalization fuse

* remove allow_layer_norm_mod_precision flag
2022-06-01 15:09:39 +08:00
pengwa
44f7b1bf2c
MTA AdamWOptimizer (#11506)
* skeleton change

* adam compute kernels

* add rtol/atol for tests

* some clean up

* optional outputs

* more clean up

* add tests

* adamw mode=1 test pass

* clean up tests

* add HF AdamW test cases

* refactor adam test file

* make test pass

* all test pass, fix comments

* rename to adamw

* make test pass again

* fix cpplint

* minor fixes

* fix python lint

* Fix build and tests

* fix builds

* fix windows build

* fix win build

* minor fix

* Refine based on comments

* resolve comments

* formatting

* resolve comments

* add ut
2022-05-27 19:52:04 +08:00
Vincent Wang
02724c54ff
[CUDA] Implement BitmaskDropout, BitmaskBiasDropout and BitmaskDropoutGrad (#11534)
* Implement BitmaskDropout and associated unit tests.

* Implement BitmaskDropoutGrad and associated unit tests.

* Implement Dropout -> BitmaskDropout rewrite rule and associated unit tests.

* Implement (Dropout,DropoutGrad) -> (BitmaskDropout,BitmaskDropoutGrad) rewrite rule.

This commit does not yet include unit tests for this rewrite rule.

This commit also introduces improved documentation for all changes which will be grouped
into this PR.

* bitmask dropout

* fix win build

* bugfix for rocm

* bugfix

* fix code format

* fix ut

* fix build break

* fix ut in win

* resolve comments

* fix ut in trt

* resolve comments

* fix rocm build error

* fix typo

Co-authored-by: Aidan Beggs <aidanbeggs@microsoft.com>
2022-05-27 17:24:47 +08:00
Vincent Wang
eadb1a3128
Speed Up GradientChecker Running (#11579)
* fix gradient tester

* test size adjust

* fix win build
2022-05-27 15:14:53 +08:00
Thiago Crepaldi
427230431a
Fix torch cpp ext build when CPU wheel is installed but GPU card is present (#11608)
* Fix torch cpp ext build when CPU wheel is installed but GPU card is present

Also there is a minor improvement for ATen operator that allows both
"::op" and "aten::op" name for operators

* Fix flake8 false positive
2022-05-25 09:44:26 -04:00
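The note in #11608 about accepting both "::op" and "aten::op" names can be pictured with a small sketch. This is not the code from that commit: `normalize_aten_op_name` is a hypothetical helper, and stripping the prefix is only one plausible way to make the two spellings equivalent.

```python
def normalize_aten_op_name(name: str) -> str:
    """Return the bare operator name for "op", "::op", or "aten::op".

    Hypothetical helper for illustration only; the real ATen operator
    executor may resolve names differently.
    """
    # Check the longer prefix first so "aten::op" is not left as "aten::op".
    for prefix in ("aten::", "::"):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


# Both spellings resolve to the same bare name.
short_form = normalize_aten_op_name("::softmax")
long_form = normalize_aten_op_name("aten::softmax")
```

With a normalization step like this, later lookups only need to handle one canonical spelling.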
PeixuanZuo
a67994316a
Update rocm ci to ROCm5.1.1 + torch1.10.0
* [UPDATE] update amd ci pipeline to rocm5.1.1

* [FIX] json format error

* [ERROR] disable unit tests

* [FIX] ucx error

* [FIX] cmake version

* [FIX] units test
2022-05-20 11:07:21 +08:00
Vincent Wang
436c4f9b79
Add BFloat16 (bf16) support for ATen (#11546)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2022-05-19 10:04:08 -04:00
Vincent Wang
084165c748
Change MinGrad/MaxGrad to Use Distributed Logic (#11388)
* change min max grad

* resolve comments
2022-05-05 11:49:40 +08:00
Tang, Cheng
ae043e3963
Support ort device tensor in ortmodule's inference (#11112)
* support ort device tensor in ort module inference

* fallback aten equal to cpu; add ortmodule inference test case

* fix python format

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-05-03 14:28:30 -07:00
Changming Sun
5023f6750b
Revert "Call pluggable EP's shutdown function in Environment::~Environment() (#11120)" (#11393)
This reverts commit 4983d6e5d6. We can't destroy OrtEnv through python's atexit function, because at that time there might be many other ORT python objects alive.
2022-05-02 14:38:31 -07:00
G. Ramalingam
024747bff4
Allow int32 as shape type (#11345)
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
2022-05-02 10:10:30 -07:00
Tang, Cheng
4b875e3543
Re-implement the function support in onnxruntime (#11167)
* initial fix

* refactor the function handle

* update the implementation

* fix linux build break

* fix training build

* fix minimal build

* fix gradient checker

* deprecate the local function members in graph. host it in model

* fix changming's comments

* fix comments about inlined containers

* fix a missed inlined container

* fix training build

* avoid const for std string_view

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2022-04-29 10:15:58 -07:00
mindest
c8270c2940
Add ATen export and gradient for torch.max/min (#11275)
* add aten export for max, max.dim

* rewrite grad of max (no dim); add cases for min

* update UT cases

* mod sym shape infer

* resolve comments: shape infer, add comments, etc.

* add test for torch.max of two tensors

* resolve peng's comments: keepdim; test case

* correct python format

* fix recently introduced lint error
2022-04-28 17:30:33 +08:00
Vincent Wang
1c64351e09
Create Tensor with Strides (#11294)
* create tensor with strides

* resolve comments

* refactor

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2022-04-28 16:49:37 +08:00
Justin Chu
d64769c38e
Set black's target version (#11370)
Description: Set black's target version to be py37 - py310

Motivation and Context

Black by default targets its format for py3.10. Since our project supports Python 3.7, we need to set the target version to cover all supported Python versions.

Re-ran black. 13 files reformatted.
2022-04-27 14:52:19 -07:00
Justin Chu
fdce4fa6af
Format all python files under onnxruntime with black and isort (#11324)
Description: Format all python files under onnxruntime with black and isort.

After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame.

#11315, #11316
2022-04-26 09:35:16 -07:00
Gary Miguel
7aa4af238a
Add strict_shape_type_inference config option (#11081)
Prior to this, certain shape and type errors were surfaced only when
the model was using the latest known op set version.

Providing users an explicit option allows for better testing of code
that produces models, which includes unit tests within this repo and
other repos such as the TF-ONNX and PT-ONNX converters.

Remove the previous behavior which seems quite counter-intuitive:
an otherwise identical model with a later op set version should be treated
identically in this regard.

The option defaults to false to avoid causing errors for users that
rely on the previous permissive behavior.

Turned on the strict enforcement by default in OpTester, which revealed a few
disagreements between ORT and ONNX on what the correct output shape should
be.

Fix shape inference bug in ReduceSumTraining with noop_with_empty_axes=1
which was revealed.

Fix TensorOpTest.Unsqueeze_scalar, which was testing negative axes on an
op set version where the op did not actually support negative axes.

Fixes #9506.
2022-04-21 08:32:40 -07:00
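A minimal sketch of how a user might opt in to the strict behavior described in #11081, assuming the option is exposed as a session config entry. The commit message only names the option `strict_shape_type_inference`; the full key string `session.strict_shape_type_inference` and the value `"1"` are assumptions here and should be checked against the session options config keys of the onnxruntime version in use.

```python
# Assumed config key; the commit message only gives the option name.
STRICT_SHAPE_TYPE_INFERENCE_KEY = "session.strict_shape_type_inference"


def make_strict_session_options():
    """Build SessionOptions with strict shape/type inference turned on.

    onnxruntime is imported lazily so the module can be loaded even when
    the package is not installed.
    """
    import onnxruntime as ort

    options = ort.SessionOptions()
    # The option defaults to off, so it has to be enabled explicitly.
    options.add_session_config_entry(STRICT_SHAPE_TYPE_INFERENCE_KEY, "1")
    return options
```

A test suite for a model-producing tool could pass these options to `onnxruntime.InferenceSession` so that shape and type disagreements are raised as errors instead of being tolerated.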
Vincent Wang
06026fe8e6
SizeInBytes Fix for Strided Tensor (#11224)
* SizeInBytes Fix for Strided Tensor

* resolve comments
2022-04-19 15:13:00 +08:00
pengwa
9765ef8b4e
fix build warnings (#11213)
* fix build warning
2022-04-18 21:09:09 +08:00
ashbhandare
ddb17294b2
Fix gradient builder for Cast (#11008)
* fix grad builder for cast

* review comments

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-12 16:08:21 -07:00
Vincent Wang
bcc62e0cbf
move some process out of training step (#11150) 2022-04-08 17:30:11 +08:00
Changming Sun
4983d6e5d6
Call pluggable EP's shutdown function in Environment::~Environment() (#11120)
I disabled some tests temporarily. I will move them to a separate executable file in another PR.

In the future, I want to combine the onnxruntime::Environment and OrtEnv classes. We currently have three env classes, which is too confusing:

1. onnxruntime::Env
2. onnxruntime::Environment
3. OrtEnv

Our Python binding uses onnxruntime::Environment, while all other language bindings use OrtEnv. So Python doesn't unload EPs but the others do. It's better to make them consistent.

Please note that even though I added the call, the unload function is currently still a no-op on Linux. So, for now, on Windows we must unload the EPs while on Linux we must not.
2022-04-07 14:11:29 -07:00
Dmitri Smirnov
2700261f7c
Provide an API to supply external initializers data from user buffers (#11109)
Implement AddExternalInitializers
2022-04-07 12:21:53 -07:00
Xavier Dupré
3f42665a40
Improve transfer time from ort to torch (#9610)
* Improve transfer time from ort to torch
* Use static_cast
* fix call to Python API for python <= 3.8
* investigation
* fix ref counts
* disable import if no training
* one function to convert multiple ortvalues
* add proto_type
* enforce dlpack->deleter to be not null
* fix _ortvalues_to_torch_tensor for eager mode
* rename proto_type into element_type in the Python API
* conversion from ort to torch 2x faster
* fix conversion of list of OrtValue
* replace has_bool_tensor by bool_tensor_indices
* introduce _ortvalues_to_torch_tensor_list
* use _ortvalues_to_torch_tensor_list for cache
* fix ambiguity between c and python classes

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-04-06 09:12:58 +02:00
Abhishek Jindal
91c940b619
adding fill scalar for torch ones direct initialization on ort device (#10898)
* adding fill scalar for torch ones direct initialization on device and adding test case for it

* using ConstantOfShape to implement fill Scalar in atenops

* adding case for handling at::Tensor attribute

* handling the at::Tensor type for ConstantOfShape

* handling the at::Tensor type for ConstantOfShape with attr type

* handling the at::Tensor type case

* converting the data to tensor in case of aten tensor mapping is needed

* handling aten tensor case

* handling aten tensor case and reversing the string case

* changing type of scalar
2022-04-05 11:17:25 -07:00
G. Ramalingam
2c2408814f
Add function body for SoftmaxCrossEntropyLossGrad (#10779)
* Add function definition for SoftmaxCrossEntropyLossGrad

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Cleanup

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Eliminate unused variable

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix index of weight tensor

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* A few fixes to handle typing and weight

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix for zero D dimensions

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add function body to internal op also

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* A few fixes

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix type variable name

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix type constraint var

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix ignore_index handling in testcase

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Add fun def for SoftmaxCrossEntropyLossInternal

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Specify opset

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Handle opset in NLL function

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Address PR feedback

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Modify onehot

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Eliminate duplicate statement

Co-authored-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-05 10:52:40 -07:00
Vincent Wang
6a6840d5c6
Fuse LayerNormalization for Apex O2 (#10233) 2022-03-29 21:22:04 +08:00
Vincent Wang
3b6cee8059
[CUDA] Optimize Conv and ConvGrad for Training (#10999)
* Optimize Conv and ConvGrad for Training

* add provider option to control

* fix typo
2022-03-29 07:31:36 +08:00
Baiju Meswani
9c6cc018a9
Add utility to get the gradient graph from GradientGraphBuilder (#10995)
* Add pybind method to get the gradient graph

* Fix segmentation fault caused by logging during gradient building
2022-03-25 17:13:56 -07:00
Chandru Ramakrishnan
cb31b7eab1
Fixed creation of ORT_Value to pass offset of 0 (#11004) 2022-03-25 15:52:10 -04:00
Scott McKay
47c09e6701
Clarify usage of kOnnxDomainAlias. (#10962)
* Clarify usage of kOnnxDomainAlias.
2022-03-25 09:52:59 +10:00
pengwa
89ef987ab1
Improve NonZero on CUDA/ROCM (#10307)
* improve NonZero

* fix megatron_fp16 optimizer, fix the doc

* multi_tensor_applier

* resolve comment

* fix building warning

* fix build error when enabling training and use tensorrt
2022-03-25 07:35:45 +08:00
mpapdiwala
1e917c879e
Adding support for saving and loading train step info properties in the state dict and checkpoint file. (#10569)
* Adding optimization step and step parameter to the ORTTrainer constructor

* Added ORTTrainerOptions for optimization step

* Adding Train Step Info Settings to State Dictionary

* Adding train step info key

* Updating comments

* Reverting changes

* Updating test case for new state dict entry train_step_info
2022-03-24 11:50:45 -07:00
mindest
3c5853dcbc
register custom_op_symbolic for squeeze (#10970)
* register custom_op_symbolic for squeeze

* remove misleading warning msg from symbolic_opset9
2022-03-24 10:28:21 +08:00
Chandru Ramakrishnan
07201726ed
Fixed macros for graph transformer registration. (#10983) 2022-03-23 14:55:17 -04:00
raviskolli
480c793125
Update training packages to Pytorch 1.11.0 (#10851)
* Update ortmodule training packages to Pytorch 1.11.0

Co-authored-by: Harshitha Venkata <havenka@microsoft.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2022-03-22 16:45:51 -07:00
Baiju Meswani
565318ce86
Support ORT WASM compilation with the training flag (#10973)
* Add training support for ORT web assembly compilation

* Use wrapper for eigen includes in training
2022-03-22 16:13:35 -07:00
Chandru Ramakrishnan
4a5b5328a4
Added support to Eager CodeGen for multiple in-place parameters. (#10945)
* Added support to CodeGen for multiple inplace output parameters.

* Updated output Tensor to references.
2022-03-21 13:10:22 -07:00
G. Ramalingam
8703d37517
Extend DropoutGrad function to support bfloat16 (#10662)
* Update DropoutGrad function to support bfloat16

* Eliminate dead comments

* Set opset version for testcase

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Update to new builder

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
2022-03-20 15:11:08 -07:00
Vincent Wang
8860fded02
Disable Some Einsum ORTModule Tests Due to Issue from PyTorch Exporter (#10906)
* disable some einsum tests due to pytorch issue

* disable tests on specific torch versions

* use skipif
2022-03-18 21:28:18 +08:00
mindest
d7d7665023
restore random states after export_model (#10705)
* restore random states after export_model

* move get/set_random_states inside _export_model

* add comments for random state save/restore

* add unit test for random state check

* resolve comments

* fix error
2022-03-17 11:56:25 +08:00
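The idea behind #10705 (save the random state before export, restore it afterwards) can be sketched with the standard library alone. This is an illustration under assumptions, not the ORTModule code: `preserve_random_state` and `fake_export` are hypothetical names, and Python's `random` module stands in for whatever generators the real `_export_model` saves and restores.

```python
import contextlib
import random


@contextlib.contextmanager
def preserve_random_state():
    """Save the global `random` state and restore it on exit."""
    state = random.getstate()
    try:
        yield
    finally:
        # Restore even if the wrapped step raises.
        random.setstate(state)


def fake_export():
    """Stand-in for an export step that consumes random numbers."""
    return [random.random() for _ in range(3)]


# With the guard, the export does not disturb the sequence that follows.
random.seed(0)
with preserve_random_state():
    fake_export()
after_export = random.random()

# Reference: same seed, no export step at all.
random.seed(0)
without_export = random.random()
```

Because the state is restored in a `finally` clause, the next draw after the export matches the draw that would have happened had the export never run, which is the property the commit's unit test checks for.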
Edward Chen
f468ea40e5
Refactor Node::AddAttribute() (#10869) 2022-03-16 14:53:00 +10:00
Edward Chen
e53422c6d0
Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765)
Add runtime optimization support to ONNX -> ORT format conversion script.
Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option.
2022-03-14 16:50:41 -07:00
Abhishek Jindal
03181caeae
Creating test case for printing ort tensor (#10850)
* creating a test for printing ort tensor

* modifying comment for error case

* Using Output Grabber to assert the print output

* modifying the print ort test

* removing comments

* removing sys import
2022-03-11 21:39:48 -08:00