Work on minimizing memory management calls by
reducing number of allocations and copies.
Replace std::unordered_set to InlinedHashSet
and add usage of InlinedVector.
Employ std::move() to minimize copying and memory allocations.
Remove copying of the const shared data into each of the
PropagateCast transformer instances.
Move inlined_containers.h header to include/common
Adjust AsSpan imlementation for C++ < 17
* Fix incorrect type constraint registration for RoiAlign. This led to the input type not actually being checked when matching a kernel as the invalid constraint name is treated as a missing optional input.
* fix missing dependency for the unit test exe. Whilst it doesn't link against the CUDA providers lib, without the dependency VS doesn't know it needs to rebuild the library if there are changes.
* Add check for invalid type constraints.
* Fix invalid registrations for other kernels.
* Add hash replacement logic to provide backwards compatibility in ORT format models when the registration is fixed.
* Add tests
* Fix UT
* UT
* UTs
* enable ROCm UT
* fix build attempt
* minor
* fix UT
* fix UT
* fix UTs
Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>
* Export numpy_T as onnx transpose
* further fixes, test
Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InineShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TesnorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TenshroShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Path with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
* clearing map for eager mode backends
* clearing map for eager mode backends manager
* making OrtBackendsManager an extern variable and trying to delete it
* cleaning backends manager when the python interpret exits
* adding ifdef for eager mode code
* disabling warning for pybind state file
* disabling warning for python module file
* running clang auto format and reducing redundancy
* remove new line
* moving declaration to a new header file
* adding the header file for eager mode for python module
* removing source files for eager mode
* add source file for python module in eager mode
* Update orttraining/orttraining/python/orttraining_python_module_eager.h
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* fix deadlock in model.train model forward run only
* fix tests
* clear the grad_fns before every forward run
* add clean up on exit
* fix
* refine code comments
* fix aten view op
* add test case
* fix signature
* fix the build
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
* Add Reduce Ops to DNNL ep
Combine the Reduction ops into one class
Add ReduceL1, ReduceL2, ReduceSum, ReduceMax, ReduceMin, and ReduceProd,
ReduceSumSquare, ReduceLogSum, and ReduceLogSumExp
Reduce code now also handles the keepdims attribute
Also updated code to use HandleNegativeAxis function from
the providers/common.h code instead of manually calculating.
In code documentation exists to help explain complex reduction op code
Add elementwise ops to Reduction op capability code removed keepdims check
from the Reduction op capability code.
Updated the error_tolerance for LogGrad(DNNL EP only) after finding a few
instances that the tests were a little out of tolerance.
Signed-off-by: George Nash <george.nash@intel.com>
* Documentation cleanup in dnnl_qattention
Cleaned up the Comments documenting the QAttention operator
For some reason a bunch of new lines were introduced to the
comment making it harder to read.
Signed-off-by: George Nash <george.nash@intel.com>