* Export numpy_T as onnx transpose
* further fixes, test
Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
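Note on the numpy_T export above: a minimal sketch, assuming the export maps numpy_T to an ONNX Transpose whose perm reverses all axes (the same result ONNX Transpose gives when perm is omitted). The helper names are hypothetical and are not taken from the exporter code.

```python
def transpose_perm_for_numpy_t(rank):
    """Return the axis permutation equivalent to numpy's .T (all axes reversed).

    Hypothetical helper for illustration only.
    """
    return list(range(rank - 1, -1, -1))


def transposed_shape(shape, perm):
    """Shape of a tensor after applying the permutation `perm` to its axes."""
    return [shape[axis] for axis in perm]


perm = transpose_perm_for_numpy_t(3)
print(perm, transposed_shape([2, 3, 4], perm))
```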
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InlinedShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TensorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TensorShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Patch with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
* clearing map for eager mode backends
* clearing map for eager mode backends manager
* making OrtBackendsManager an extern variable and trying to delete it
* cleaning backends manager when the Python interpreter exits
* adding ifdef for eager mode code
* disabling warning for pybind state file
* disabling warning for python module file
* running clang auto format and reducing redundancy
* remove new line
* moving declaration to a new header file
* adding the header file for eager mode for python module
* removing source files for eager mode
* add source file for python module in eager mode
* Update orttraining/orttraining/python/orttraining_python_module_eager.h
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* fix deadlock in model.train model forward run only
* fix tests
* clear the grad_fns before every forward run
* add clean up on exit
* fix
* refine code comments
* fix aten view op
* add test case
* fix signature
* fix the build
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
* Add Reduce Ops to DNNL ep
Combine the Reduction ops into one class
Add ReduceL1, ReduceL2, ReduceSum, ReduceMax, ReduceMin, and ReduceProd,
ReduceSumSquare, ReduceLogSum, and ReduceLogSumExp
Reduce code now also handles the keepdims attribute
Also updated code to use HandleNegativeAxis function from
the providers/common.h code instead of manually calculating.
In-code documentation was added to help explain the complex reduction op code.
Add elementwise ops to the Reduction op capability code; removed the keepdims check
from the Reduction op capability code.
Updated the error_tolerance for LogGrad (DNNL EP only) after finding a few
instances where the tests were slightly out of tolerance.
Signed-off-by: George Nash <george.nash@intel.com>
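Note on the reduction changes above: a minimal pure-Python sketch of how a negative axis can be normalized and how keepdims affects the output shape. It illustrates the idea only; it is not the DNNL EP code, and the function names below are hypothetical stand-ins (the real HandleNegativeAxis lives in providers/common.h). Treating an empty axes list as "reduce over every axis" is an assumption made for this sketch.

```python
def handle_negative_axis(axis, rank):
    """Map an axis in [-rank, rank) to its non-negative equivalent."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def reduced_shape(shape, axes, keepdims=True):
    """Output shape of a reduction over `axes`.

    Assumption for this sketch: an empty `axes` reduces over all axes.
    """
    rank = len(shape)
    if axes:
        reduce_over = {handle_negative_axis(a, rank) for a in axes}
    else:
        reduce_over = set(range(rank))
    out = []
    for index, dim in enumerate(shape):
        if index in reduce_over:
            if keepdims:
                out.append(1)  # reduced axis is kept with extent 1
        else:
            out.append(dim)
    return out


print(reduced_shape([2, 3, 4], [-1], keepdims=True))
print(reduced_shape([2, 3, 4], [-1], keepdims=False))
```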
* Documentation cleanup in dnnl_qattention
Cleaned up the comments documenting the QAttention operator.
For some reason, a number of newlines had been introduced into the
comment, making it harder to read.
Signed-off-by: George Nash <george.nash@intel.com>
* Add AtenOp at::bitwise_or
* Specify overload name for bitwise_or
* undo unnecessary import
* set output element type to BOOL
* Add broadcasting support
* Fix test
Co-authored-by: Gani Nazirov <ganaziro@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Gani Nazirov <ganaziro@microsoft.com>
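Note on the broadcasting support above: a minimal sketch of the usual NumPy-style rule for computing the output shape of a binary elementwise op such as bitwise_or. This is an illustration under that assumption, not the code added in the commit; broadcast_shape is a hypothetical name.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes, aligning them from the trailing dimension.

    Two extents are compatible when they are equal or one of them is 1.
    """
    result = []
    for offset in range(1, max(len(a), len(b)) + 1):
        dim_a = a[-offset] if offset <= len(a) else 1
        dim_b = b[-offset] if offset <= len(b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # The extent that is not 1 wins; if both are 1 the result is 1.
        result.append(dim_b if dim_a == 1 else dim_a)
    return result[::-1]


print(broadcast_shape([3, 1], [4]))
```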
* adding view operator changes
* adding the slice operator definition
* moving to opgen script for slice op and removing redundant steps in view op and reshape_copy
* adding for at definition
* adding for at::infer_size definition
* changing template style for reshape_copy to ensure int64_t type
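Note on the view/reshape changes above: a minimal pure-Python sketch of the kind of shape inference a reshape needs, where at most one dimension may be given as -1 and is computed from the element count. It illustrates the idea behind at::infer_size only; it is not the PyTorch implementation, and the error messages are made up for this sketch.

```python
def infer_size(shape, numel):
    """Resolve a single -1 entry in `shape` so the product equals `numel`."""
    known = 1
    infer_index = None
    for index, dim in enumerate(shape):
        if dim == -1:
            if infer_index is not None:
                raise ValueError("only one dimension can be inferred")
            infer_index = index
        elif dim >= 0:
            known *= dim
        else:
            raise ValueError(f"invalid dimension {dim}")
    result = list(shape)
    if infer_index is not None:
        if known == 0 or numel % known != 0:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        result[infer_index] = numel // known
    elif known != numel:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    return result


print(infer_size([2, -1], 6))
```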
* update to torch 1.10
* update torchvision version
* update torchtext version
* remove deprecated option enable_onnx_checker
* add unit test to test gradient of GatherElements
* add ORTMODULE_ONNX_OPSET_VERSION in a docker file
* add ortmodule and eager mode test
* add ortmodule dependency
* convert between aten ort tensor and ortvalue
* register the EP to ortmodule using ort device information
* remove duplicated test
* remove useless dependency
* handle half precision type for ortmodule outputs
* adjust the tensor conversion python code
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Potential comparison of a constant with another constant.
at D:\a\_work\1\s\orttraining\orttraining\training_ops\cuda\reduction\\reduction_all.cu@97,42
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* add ortmodule and eager mode test
* add ortmodule dependency
* fix eager pipeline
* skip the ortmodule test for windows due to win ci issue
* remove useless win ci change
* add torch
Co-authored-by: Abhishek Jindal <abjindal@microsoft.com>
* fix reshape implementation in eager mode
* test code
* update opgen script to support fallback to cpu
* enhance the eager backend to support torch cpu fallback
* add more tests
* disable the printensor test for now, as we need to merge a PR to pytorch first
* register custom symbolic for einsum
* bugfix for the case that needs a permute at the end
* refactor
* refactor equation parser
* support new case, use ReduceProd
* optimize perf and graph
* remove some Gather node
* add more unit tests, fix gemm trans fusion
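Note on the einsum equation parser above: a minimal sketch of splitting an einsum equation into per-operand subscripts and an output term. It assumes plain letter subscripts with no ellipsis, and it is not the parser from the commit; parse_einsum_equation is a hypothetical name. When the equation has no "->", the output is taken to be the labels that appear exactly once, in sorted order.

```python
def parse_einsum_equation(equation, num_operands):
    """Split an einsum equation into (input_terms, output_term).

    Sketch only: no ellipsis ("...") support and no label validation.
    """
    equation = equation.replace(" ", "")
    lhs, arrow, rhs = equation.partition("->")
    inputs = lhs.split(",")
    if len(inputs) != num_operands:
        raise ValueError(
            f"equation has {len(inputs)} terms but {num_operands} operands were given"
        )
    if not arrow:
        # Implicit output: labels seen exactly once across all inputs, sorted.
        counts = {}
        for term in inputs:
            for label in term:
                counts[label] = counts.get(label, 0) + 1
        rhs = "".join(sorted(label for label, n in counts.items() if n == 1))
    return inputs, rhs


print(parse_einsum_equation("ij,jk->ik", 2))
```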