* Make ORT a PyTorch JIT backend
LORT likely doesn't work with aten fallback so we only test LORT in its own CI.
* Revert changes to enable external CUDA allocator. Will add it later.
Revert "Revert changes to enable external CUDA allocator. Will add it later."
This reverts commit d5487f2e193014c805505afae8fb577c53667658.
Fix external allocator
* Relax tolerance and remove commented code
* Print more information in CI
* Fix pointer
* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.
* Use PyTorch master branch as all PRs are merged
Fix
* Refine based on cpplint feedback
* Revert changes to allow custom CUDA allocator in public APIs
* Use torch.testing.assert_close
* Use unittest framework
* Switch docker repo
* Rename *.cpp to *.cc
* Address comments
* Add comment
* Use same pipeline file for eager and lort pipelines
* Address comments
* Add yaml comment
* Fix cmake files
* Address comments
* Rename flags, remove printing code, remove dead comment
* Remove ostream operator<< definitions for TensorShapeProto and TensorProto as they clash with ONNX definitions in onnx/defs/printer.h/cc.
Currently printer.h (unnecessarily) pulls in a number of other ONNX headers which causes naming clashes with parts of ORT. It is also excluded in a minimal build.
Instead convert the onnx::TensorShapeProto to onnxruntime::TensorShape so we use the existing ostream operator<< for TensorShape.
Make GetTensorShapeFromTensorProto consistent with GetTensorShapeFromTensorShapeProto so both return a TensorShape (as the name implies).
* use std::variant for synthetic data storage.
* use std::variant to replace TypedCheckpointProperty
* Remove shared ptr for checkpoint property
* fix tests
* refine std::variant usage a bit
* remove CheckpointProperty data abstraction
* use InlinedVector and InlinedHashMap if possible
* fix comments
* fix build and test
* fix some comments
* use gsl::span
* fix tests
* refine based on comments
* fix win build
* fix build
* enable PythonOp by default when --enable_training_torch_interop is enabled during build
* clean up
* fix
* fix comment
* fix
* fix tests
* fix fallback test
* pylint format
* refine based on comments
* Load checkpoint in cpp
* removed unused imports
* throw error on invalid name and change function name
* inplace model assignment, change name and other comments resolved
* name change on import
* Added unit test, resolved comments
* remove unused imports
* resolved comments
* refactoring to reduce memory allocation
* resolved extra comments
* changed file hierarchy and force-added onnx model
* solved order of function argument
* used gtest macros on test cases
Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
* make memory profiler work with multiple session runs.
(cherry picked from commit 5b636b4dd6fe91b75c063696dc73eda33ec36c8d)
* minor fix
* fix build
* fix Windows build
* 1. fix cpplint issues;
2. give a unique filename to each session's profiler result.
* Remove hand written add_.Tensor as it can now be generated.
* Generate .out for tensor version of basic math ops. Add.out testing added too.
* Remove sin tests as they are covered by parameterized tests. Also, moved all parameterized tests to the end in their own section.
* Add binary ops tests for tensors. Scalar tests are calling the aten .out which is for tensor.
* Add support for scalar input to add, div, mul, and sub.
* Apply project formatting rules to ort_aten.cpp
Formatting applied by formatting the file in VS Code.
This file is under active development and the inconsistent formatting
was causing friction due to:
1. cpplint job on Pipeline was flagging a lot of style issues,
resulting in a lot of noisy annotations.
2. local edits would result in changes that are not part of the core change.
While there are other files in this part of the source tree with
inconsistent formatting, this file was causing the most friction. We can
come back and address the other files later, which would be a much
larger change.
* Apply consistent pattern for invoker.Invoke(...)
* cpu adamwoptimizer implementation
* unit tests for cpu kernel pass
* refine based on comments
* parallelize the weights loop in PrepareForCompute.
* fix wrong test data path
* fix kernel hash
* fix rocm ci pipeline
* release cached cuda memory after temp model_copy run
* op schema change only: remove PythonOp forward output from PythonOpGrad inputs.
* always export model using torch.no_grad
* 1. update PythonOp's "input_requires_grads" attribute according to ORT gradient graph.
2. remove PythonOp's "output_tensor_requires_grads" attribute because in torch.no_grad mode, the exported value is not correct.
3. [related to 2] remove PythonOpGrad's "input_tensor_requires_grads" because it comes from the corresponding PythonOp's "output_tensor_requires_grads".
* fix unit tests
* refine based on wschin's comments && fix pylint
* fix comments
* fix unused variable
* Sort supported types order so we get a consistently generated order of types.
* Fix promote type to include all the input types and not just the first one.
* use 3D grid to avoid the upper limit of grid dimension
* enrich tests
* Revert "use 3D grid to avoid the upper limit of grid dimension"
This reverts commit 2d5badf2fe8cd985f3f29ee2cb18fff13d07c2ab.
* change to a fix: switch the 1st and 2nd dim
This change updates the implementation of the argmax_out operator to 1)
set the output tensor correctly and 2) remove the unnecessary use of a
temporary tensor to store intermediate result of onnx ArgMax operation.
Previously, the argmax_out operator did not correctly update the out
tensor - it replaced the OrtValue instead of the memory backing the
OrtValue. To properly update the output tensor, we need to calculate
the expected shape of the out tensor.
We add the helper function calculate_reduction_shape to calculate the
shape of the reduced tensor from the input tensor, dimension to reduce,
and option to keep the reduced dimension or not. This is based on the
utility functions in aten/src/ATen/native/ReduceOpsUtils.h in the
PyTorch repository, but is tailored to be a bit more specific to our
current needs.
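The reduction-shape calculation described above can be sketched roughly as
follows. This is an illustrative Python sketch only, not the actual helper
(which is written in C++ and may differ in signature and edge-case handling).
The parameter names shape, dim, and keepdim are assumptions chosen for this
example.

```python
from typing import List, Sequence


def calculate_reduction_shape(shape: Sequence[int], dim: int, keepdim: bool) -> List[int]:
    """Return the shape left after reducing `shape` along `dim`.

    Hypothetical sketch: assumes a single reduction dimension and that
    negative `dim` values count from the end, as in PyTorch.
    """
    rank = len(shape)
    if not -rank <= dim < rank:
        raise IndexError(f"dim {dim} is out of range for a tensor of rank {rank}")
    dim %= rank  # wrap a negative dim, e.g. -1 -> rank - 1
    result = list(shape)
    if keepdim:
        result[dim] = 1  # keep the reduced dimension with size 1
    else:
        del result[dim]  # drop the reduced dimension entirely
    return result
```

For an input of shape [2, 3, 4] reduced along dim 1, this sketch gives [2, 4]
without keepdim and [2, 1, 4] with it. With the expected shape known, the out
tensor can be resized and written in place.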
Notes:
We considered just directly leveraging PyTorch's utility functions (e.g.
get_reduction_shape) to calculate the shape of the reduced tensor from
aten/src/ATen/native/ReduceOpsUtils.h in the PyTorch repository, but
including this header file resulted in warnings around unused functions
that we need to handle. As we only need limited functionality at the
moment, we instead implemented our own utility function to calculate the
reduction shape for our specific current needs. If we need a utility
function to more generally calculate the reduction shape, we could
consider switching to leveraging the utility methods in PyTorch.