Commit graph

727 commits

Author SHA1 Message Date
Changming Sun
4bfff45859
Downgrade Eigen (#8817) 2021-08-23 18:06:23 -07:00
Chandru Ramakrishnan
2693af9799
Ported changes / bug fixes from torch/ort. (#8784)
* Ported changes / bug fixes from torch/ort.

* Fixed formatting

* Renamed function

* Renamed module_ to module.

* Revert "Renamed module_ to module."

This reverts commit b17fc114b3db20d174283811d90592b5b8154c19.

* Include pybind common header to fix linker errors on windows debug.

* Fix to generation of > 1 custom op.

Co-authored-by: Ashwin Hari <ashari@microsoft.com>
2021-08-23 17:45:40 -04:00
George Nash
d4a88cfe3f
Add Gemm op to DNNL Execution provider (#8799)
* Implement Gemm op for DNNL execution provider

Signed-off-by: George Nash <george.nash@intel.com>

* Remove KernelRegistry and Gemm op for dnnl ep

The KernelRegistry for the dnnl execution provider only
registered a Gemm op that as best we can tell was never
actually used and also was not using the dnnl library.

We have implemented a Gemm op in the DNNL execution provider
subgraph code and thus are removing the unused Gemm op that
was in the dnnl KernelRegistry.

Signed-off-by: George Nash <george.nash@intel.com>

* Fix duplicated output and kernelshape inference

fix getcapability to make sure subgraph outputs do not have duplicates

fix kernelshape inference in pool

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Removed most dnnl specialized ifdefs from gradient_ops_test code

Re-enable GlobalAveragePoolGrad test for dnnl ep

The bugs that were exposed by the GlobalAveragePoolGrad test have
been fixed and this test no longer needs to be disabled for DNNL.

Removed the ReluGradDnnl test. We are getting the testing from the
already existing ReluGrad test.

MaxPoolGrad test no longer has specialized execution provider
enabling for DNNL execution provider. It will now run without
the extra enabling.

ConvGrad is the only test that still has dnnl specialized ifdefs.
However, the ConvGrad code was not being executed by the code
unless it was listed first in the list of execution providers.

Signed-off-by: George Nash <george.nash@intel.com>

* Fix transpose issue on Gemm

On transposing square matrices, getmemoryandreshape will fail to reshape

fix by adding a bool

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Save memory space by reusing internal tensor for output

The intermediate matmul output tensor can be used as the output
tensor for the binary calculation.

Remove the unused IsAttributeSupported from the
DnnlGemmNodeCapability class since we now support all of the
Gemm attributes in our implementation.

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Wang <zhaoyang.wang@intel.com>
2021-08-23 08:45:34 -07:00
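For context on the Gemm commit above: ONNX Gemm computes Y = alpha * A' * B' + beta * C, where A' and B' are the optionally transposed inputs. The "reusing internal tensor for output" item can be pictured as writing the matmul result into a buffer and then applying the binary add in place on that same buffer. The following is a minimal pure-Python sketch of that idea, not the DNNL execution provider code. The names `transpose`, `matmul`, and `gemm_in_place` are hypothetical, and C is assumed to already have the full output shape (no broadcasting).

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b, alpha=1.0):
    """Compute alpha * (a @ b) into a freshly allocated buffer."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def gemm_in_place(a, b, c, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """Hypothetical sketch: Gemm as matmul followed by an in-place binary add."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    out = matmul(a, b, alpha)          # intermediate matmul buffer
    for i, row in enumerate(out):      # binary add writes into the same buffer
        for j in range(len(row)):
            row[j] += beta * c[i][j]
    return out                         # no second output buffer was allocated


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm_in_place(a, b, c)
print(result)
```

The point of the sketch is only the buffer reuse: the addition step mutates the matmul result rather than producing a third tensor.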
Suffian Khan
9fa0d8392a
Extend node debugging utilities to push tensors and node placement to SQL database (#8672)
* adding support for tracing to sqldb instead of files

* use compiled statements

* script to pull tensors from db

* link sqlite3

* remove node info redundant with onnx graph

* addressing PR comments

* address PR comments and include program counter

* third party notice

* use find_package

* add to cgmanifests.json

* address thread safety and add pid suffix

* build fi

* python script to select on devicetype

* remove unpopulated and redundant Shape and Type fields

* comment

* comment

* PR comments

* add graph execution counter to session state

* move increment to inference session

* std::endl to \n

* ifdef on graph execution counter

* add ifdef to inference session

* move DEBUG_NODE_INPUTS_OUTPUTS to CMakeLists.txt
2021-08-21 00:40:12 -07:00
Sherlock
81889a1cf6
Invertible ReluGrad (#8773)
* Invertible Relu Grad
2021-08-19 11:29:05 -07:00
Aaron Bockover
b2813656f5
eager: fix build against latest PyTorch master (#8745)
Improve README as well.
2021-08-18 14:27:21 -04:00
pengwa
0983d61969
refine glue code and tests (#8510) 2021-08-18 11:38:00 +08:00
ashbhandare
cc275e7529
Gradient Accumulation optimization verified for correctness (#8273)
* Fetching frontier tensors to frontend

* Move before session initialize call

* Fetch tensor and add to cache

* Rest of the changes for using cache

* Review comments

* Review changes

* Review comments

* switch to shared_ptr

* Fix bug after rebase

* FE docstring change
2021-08-17 16:24:44 -07:00
baijumeswani
871eeb4dbd
Support dicts as inputs to ORTModule (#8718) 2021-08-17 13:40:55 -07:00
Thiago Crepaldi
ed254c283f
Add support for experimental json config for fallback (#8759) 2021-08-17 13:35:42 -07:00
Thiago Crepaldi
419834d285
Add PyTorch fallback for ORTModule forward exceptions (#8346) 2021-08-17 10:41:15 -07:00
M. Zeeshan Siddiqui
0fb82f0f8a
Memory aware gradient builder. (#8582) 2021-08-16 19:01:22 -07:00
Nat Kershaw (MSFT)
aa12d68c37
Update ORTModule API docstrings (#8309) 2021-08-16 16:53:01 -07:00
George Nash
e695cd304a
Dnnl refactor (#8627)
* dnnl ep rework

    rework DnnlTensor,DnnlNode,DnnlSubgraph to support arbitrary graph topology and tensor data types

    rework GetCapability to claim nodes in graph greedily from node topological ordering and delay creation of DnnlSubgraph until Compile

    rework compile to have DnnlSubgraphPrimitive as the object to handle primitive creation and execution
        instead of thread local primitive pool which duplicates intermediate memory allocated by the EP across threads

    DnnlSubgraphPrimitive provides helpers to handle many common functions for each dnnl primitive builder and becomes the centralized place to store input, output, intermediate, and initializer memories, etc.
        it provides functions to obtain input memories with automatic reordering/reshaping and moving between engines
        it provides interfaces to add primitive, set output memory for single node, etc.

    add CONCURRENT_EXEC compile flag for dnnl library as without it, convolution primitive cannot be created and executed on different threads

    enable unit tests to run on dnnl ep as well if built with dnnl ep

    add dnnl ep support for Matmulinteger

* Add Relu to the DNNL refactor

Signed-off-by: George Nash <george.nash@intel.com>

* Add Convolution op to the DNNL rework

Signed-off-by: George Nash <george.nash@intel.com>

* Add Pooling ops to the DNNL rework

This adds the following ops:
    - AveragePool
    - GlobalAveragePool
    - GlobalMaxPool
    - MaxPool

Note: Pooling with dilation is not yet supported.
Note: GlobalLpPool, LpPool, MaxRoiPool, and MaxUnpool are not supported yet.

Signed-off-by: George Nash <george.nash@intel.com>

* Add Sum op to the DNNL rework

Signed-off-by: George Nash <george.nash@intel.com>

* Add ConvGrad op to the DNNL rework

Signed-off-by: George Nash <george.nash@intel.com>

* Add MaxPoolGrad and AveragePoolGrad ops to DNNL rework

Signed-off-by: George Nash <george.nash@intel.com>

* Added lrn operator to the refactored code

Signed-off by chethan.palangoutu.keshava@intel.com

* Added ReduceMean DNNL op to the refactor code

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Added Softmax DNNL op for the refactored code

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Added BatchNorm DNNL op inference-only for refactored code

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Added Binary Ops to DNNL rework

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Added ReluGrad to DNNL Rework

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Update OneDNN tag to v2.3

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Added support for memory up to dim size 12

this is to fix the CI test cases that contain binary ops of input dim
size > 5

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Prevent claiming support for float16 and bfloat16 when only float is supported

The string.find call was causing the code to claim support
for float16 and bfloat16 when we only supported float. We now explicitly
check for the data type itself or the data type with a 7 letter prefix,
i.e. prefixed with "tensor("

Signed-off-by: George Nash <george.nash@intel.com>

* Disable uint8 mul and div, improve type conversion

Disable mul_uint8 and div_uint8 test cases as they use modulo for
overflow handling while onednn uses saturation

improve type conversion using enum instead of string comparison as well
as adding more types

Signed-off-by: Wang <zhaoyang.wang@intel.com>

Co-authored-by: Wang <zhaoyang.wang@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
2021-08-13 14:15:43 -07:00
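The float16/bfloat16 item in the commit above describes a substring-matching pitfall: searching a type string for "float" also matches "float16" and "bfloat16". Below is a minimal Python sketch of the pitfall and of the exact-match approach the message describes (accept the bare type name, or the name behind the 7-character prefix "tensor("). The function names, the `SUPPORTED` tuple, and the assumption that the wrapped form ends with ")" are illustrative and not taken from the DNNL execution provider source.

```python
# Hypothetical list of element types the provider really supports.
SUPPORTED = ("float",)


def is_supported_substring(type_str):
    """Buggy check: a substring search also matches float16 and bfloat16."""
    return any(type_str.find(name) != -1 for name in SUPPORTED)


def is_supported_exact(type_str):
    """Exact check: the bare name, or the name wrapped as tensor(<name>)."""
    return any(
        type_str == name or type_str == "tensor(" + name + ")"
        for name in SUPPORTED
    )


for candidate in ("tensor(float)", "tensor(float16)", "tensor(bfloat16)"):
    print(candidate, is_supported_substring(candidate), is_supported_exact(candidate))
```

The commit also mentions a later move from string comparison to an enum, which avoids this class of bug altogether.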
Changming Sun
436ac6dd5f
Rename ml_value.h to ort_value.h (#8726) 2021-08-13 07:04:56 -07:00
baijumeswani
217b2c9f93
Removing filelock import from ORTModule (#8722) 2021-08-12 21:19:49 -07:00
Tang, Cheng
de2a53e46d
[eager mode] fix build and support customize shared provider entry point (#8680)
* fix build break

* support customizing the name of shared provider lib's entry point

* fix non training build

* check error code

* check return code
2021-08-11 15:10:35 -07:00
harshithapv
c24335246b
Support bool type for Pad Op and fix Unsqueeze in Tile grad for Opset 13 (#8602)
* changes

* tile grad unsqueeze fix for opset 13

* clean up

* remove bool support for opset 2 to 12 for Pad as it is not supported.

* Copy OperatorKernels.md from artifacts of Windows CI build.
2021-08-11 11:21:02 -07:00
mindest
a56e325eb8
constrain inputs for min/max grad UT (#8632)
* fix inputs for min/max grad UT

* use random inputs (truncated)
2021-08-07 18:29:06 +08:00
Tang, Cheng
6d3c2c85ef
Integrate eager mode source code into onnxruntime repo (#8584)
* integrate eager mode source code; build with cmake and integrate the python test

* Adding the python path for importing libraries in the Eager mode

* fix clang break;check if training and python enabled

* handling the linking of torch libraries across multiple platforms

* merge and fix the naming

* add build instruction

Co-authored-by: Abhishek Jindal <abjindal@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: ajindal1 <abjindal@microsoft.com>
2021-08-06 08:30:27 -07:00
Ashwini Khade
96eb9810ba
Update onnx (#8458)
* updates for picking onnx commit

* add tests filter to c# tests

* plus test fixes

* fix versioning for contrib ops

* fix tests

* test filter for optional ops

* more versioning related updates

* fix test

* fix layernorm spec

* more updates

* update docs

* add more test filters

* more filters

* update binary size threshold

* update docs

* plus more fixes

* updates per review

* update to release commit

* add filters for optional type tests

* plus updates
2021-08-05 09:21:44 -07:00
Changming Sun
0510688411
Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471)
1. Update SDLNativeRules from v2 to v3. The new one allows us setting excluded paths.
2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it.
3. Fix some parentheses warnings
4. Update cmake to the latest.
5. Remove "--x86" build option from pipeline yaml files. Now we can auto-detect cpu architecture from python. So we don't need to ask user to specify it.
2021-07-30 17:16:37 -07:00
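Item 5 of the commit above says the CPU architecture can now be detected from Python instead of being passed via "--x86". The commit message does not show how this is done. One possible way with the standard library is sketched below: `platform.machine()` and `struct.calcsize` are real standard-library calls, but the helper names `pointer_bits` and `is_32bit_python`, and the choice to inspect the interpreter's pointer width, are assumptions made for illustration.

```python
import platform
import struct


def pointer_bits():
    """Pointer width of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def is_32bit_python():
    """Hypothetical helper: True when the build should target a 32-bit ABI."""
    return pointer_bits() == 32


machine = platform.machine()  # e.g. an architecture name reported by the OS
print(f"machine={machine!r}, pointer width={pointer_bits()} bits")
```

Checking the interpreter's pointer width rather than only the machine name matters when a 32-bit Python runs on a 64-bit OS, since the wheel has to match the interpreter.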
baijumeswani
816ad86d14
Configuring ORTModule - Internal Options (#8537) 2021-07-30 13:05:32 -07:00
satyajandhyala
5e2f4263db
Enable cast propagation in the frontend. (#8517) 2021-07-28 17:06:49 -07:00
baijumeswani
2e28cbaa64
Configuring ORTModule - End User Facing Options (#8470) 2021-07-28 10:51:43 -07:00
Sherlock
1370cbe256
[ORTModule] Extract output schema in module's true train/eval mode (#8516)
* Extract output schema in module's true train/eval mode
2021-07-28 09:55:07 -07:00
mindest
a71dab691d
Implement BatchNormInternal for cuda (#8172)
* correct batchnorm replacement output order;

remove bn replacement in grad graph builder

* update op defs and kernel class

* implement batch norm internal and grad.

* change saved_var into saved_inv_std

* cuda test case: bn internal

* remove redundant include

* fix comment; add support and UT for 1d input.

* exclude batch_norm_internal in amd_hipify

* run BNInternal UT for CUDA only

* fix CI error

* fix comment errors

* fix error

* add comment for inconsistency with cudnnBN doc

* additional comments for cudnnBN inconsistency
2021-07-28 16:04:49 +08:00
Vincent Wang
1798698545
avgpool2d atenop (#8507) 2021-07-28 14:04:55 +08:00
Sherlock
686f9b530b
ORTModule set_seed in int (#8511) 2021-07-27 15:43:13 -07:00
Oliver Rausch
1685ab8138
Implement Concat with Strided copy (#8336)
Adds a StridedCopy function that implements a copy from strided tensor to another.

This parallelizes the Concat operator, and can also be used in the future to parallelize many other data movement operators (e.g. Transpose, Split, etc.).
This operation is also required for the proposed data layout extensions to ORT.
2021-07-27 18:27:56 +02:00
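As background for the Concat commit above: a strided copy moves elements between two buffers whose layouts are described by per-dimension strides, which lets Concat write each input directly into its slice of the output. The sketch below illustrates the idea in pure Python on flat lists. It is not the ORT StridedCopy implementation, and the signature `strided_copy(dst, dst_offset, dst_strides, src, src_strides, shape)` is hypothetical.

```python
from itertools import product


def strided_copy(dst, dst_offset, dst_strides, src, src_strides, shape):
    """Copy a tensor of `shape` from src to dst, each with its own strides."""
    for idx in product(*(range(n) for n in shape)):
        s = sum(i * st for i, st in zip(idx, src_strides))
        d = dst_offset + sum(i * st for i, st in zip(idx, dst_strides))
        dst[d] = src[s]


# Concatenate two row-major 2x2 matrices along axis 1 into a 2x4 output.
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
out = [0] * 8
strided_copy(out, 0, (4, 1), a, (2, 1), (2, 2))  # columns 0..1 of the output
strided_copy(out, 2, (4, 1), b, (2, 1), (2, 2))  # columns 2..3 of the output
print(out)
```

The two calls write disjoint regions of `out`, so in principle they could run concurrently; this independence is what makes the operation a candidate for parallelization.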
ytaous
1ae32655b3
fix t5 assert error (#8501)
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-07-27 09:04:01 -07:00
ytaous
ab5289f109
Performance: enable faster training with skip checks config (#8411)
* freeze/fastpath support

* more comments on _fast_path

* per comments

* minor fix

* IntFlag improve

* address comments

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-07-23 10:23:13 -07:00
Vincent Wang
c8d210de29
Decouple Forward and Backward of ATenOp (#8301)
* atenop for inference

* assert if dtype mismatch

* atenop config in frontend

* fix orttrainer test

* gradient def not only for ATenOp

* bugfix

* fix gradient input shape and type issue

* fix after merge master
2021-07-23 16:53:26 +08:00
Thiago Crepaldi
9073c094d4 Update torch lightning and re-enable test 2021-07-22 14:18:07 -07:00
pengwa
892ac9f55a
code structure update (rename only) (#8410) 2021-07-22 23:50:19 +08:00
Edward Chen
695536a7ac
Make some common macros safer to use. (#8445) 2021-07-21 12:14:36 -07:00
Sherlock
28527b4867
Handle duplicated names for output_grads (#8431) 2021-07-20 10:17:31 -07:00
Sherlock
4931ef666d
Update ORTModule frontend code owner file (#8335) 2021-07-14 09:26:04 -07:00
pengwa
7db4fc8c2a
Fix segment fault for custom function (#8331)
* unregister registered python functions upon normal interpreter termination
* atexit.register(unregister_python_functions) should be called by __init__.py
* minor fix
2021-07-13 18:01:33 +08:00
Tang, Cheng
e467d78a11
fix a typo (#8334)
Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-09 09:24:43 -07:00
Tang, Cheng
598454bb5f
Fix the mix precision handle for square case (#8333)
* handle unsqueeze change in opset13

* fix the node arguments index check for square case (x * x)

* Revert "fix the node arguments index check for square case (x * x)"

This reverts commit c66344f0a82c35d8c24d31f2264cf7e9b235ce22.

* handle the square case (x * x) for node argument search

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-09 09:24:19 -07:00
Hariharan Seshadri
46e5c8d4b9
Cosmetic change in test infrastructure (#8292) 2021-07-08 21:52:02 -07:00
pengwa
5454af4b95
decouple the shared python dependency (#8294)
* remove warning message for non-training build

* move to/from dlpack for onnxruntime_python back into python project
2021-07-09 11:47:11 +08:00
satyajandhyala
84bc20fe9d
Enable cast propagation with level one by default. (#8286) 2021-07-08 14:38:09 -07:00
pengwa
6dbfb8db0e
autograd function fallback perf (#8312)
* fix known issues

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_autograd.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-07-09 00:29:40 +08:00
baijumeswani
6652d17dcd
Support lists as inputs to ORTModule (#8311) 2021-07-07 13:04:19 -07:00
Tang, Cheng
d7c3703371
handle unsqueeze change in opset13 (#8308)
Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-06 22:30:24 -07:00
pengwa
2347a0aca8
Autograd Function Fallback bug fix - moe support (#8105)
* Support forward inputs orders like "Non_tensor/Tensor/Non_tensor". Correspondingly, support "None/Tensor_Grad/None" for backward outputs.

* Report RuntimeError when PythonOp detected but _enable_custom_autograd_function is enabled.

* Fix "PoliCheck ] - Defect : Term "hang", Component : orttraining\orttraining\python\training\ortmodule\__init__.py (1 issue)"

* rename call_convention->input_convention, input_tensor_requires_grads->input_requires_grads

* fix minor comment

* revert polycheck fix in case of conflict

* Update orttraining/orttraining/core/graph/training_op_defs.cc

Co-authored-by: Tim Harris <tiharr@microsoft.com>

* Apply suggestions from code review

Refine the schema description

Co-authored-by: Tim Harris <tiharr@microsoft.com>

* Resolve review comments

Co-authored-by: Tim Harris <tiharr@microsoft.com>
2021-07-07 08:58:01 +08:00
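The first item of the commit above concerns the autograd convention that backward returns one value per forward input, with None in the positions of non-tensor inputs. The sketch below shows that positional mapping in pure Python. The `Tensor` class is a stand-in so the example runs without PyTorch, and `backward_outputs` is a hypothetical helper, not ORTModule code.

```python
class Tensor:
    """Stand-in for a tensor type; only used for isinstance checks here."""

    def __init__(self, value):
        self.value = value


def backward_outputs(forward_inputs, tensor_grads):
    """Place each gradient at its tensor's input position, None elsewhere."""
    grads = iter(tensor_grads)
    return tuple(
        next(grads) if isinstance(x, Tensor) else None
        for x in forward_inputs
    )


# Forward inputs ordered Non_tensor / Tensor / Non_tensor.
inputs = ("mode", Tensor(1.0), 3)
outputs = backward_outputs(inputs, ["grad_of_tensor"])
print(outputs)
```

The result has the shape None / Tensor_Grad / None, matching the input order named in the commit message.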
Suffian Khan
036eee5b66
register softmaxinternal with rocm (#8289) 2021-07-02 16:29:18 -07:00
Vincent Wang
88ec95ea96
Support OrtMemTypeCPUInput for ATenOp/ATenOpGrad (#8116) 2021-07-02 23:04:43 +08:00