* Remove hand-written add_.Tensor as it can now be generated.
* Generate .out for the tensor versions of basic math ops. Also add add.out tests.
* Remove sin tests as they are covered by parameterized tests. Also move all parameterized tests to the end in their own section.
* Add binary ops tests for tensors. Scalar tests are calling the aten .out which is for tensor.
* Add support for scalar input to add, div, mul, and sub.
* Apply project formatting rules to ort_aten.cpp
Formatting applied by formatting the file in VS Code.
This file is under active development and the inconsistent formatting
was causing friction due to:
1. cpplint job on Pipeline was flagging a lot of style issues,
resulting in a lot of noisy annotations.
2. local edits would result in changes that are not part of the core change.
While there are other files in this part of the source tree with
inconsistent formatting, this file was causing the most friction. We can
come back and address the other files later, which would be a much
larger change.
* Apply consistent pattern for invoker.Invoke(...)
* cpu adamwoptimizer implementation
* unit tests for cpu kernel pass
* refine based on comments
* parallelize the weights loop in PrepareForCompute.
* fix wrong test data path
* fix kernel hash
* fix rocm ci pipeline
* release cached cuda memory after temp model_copy run
* op schema change only: remove PythonOp forward output from PythonOpGrad inputs.
* always export model using torch.no_grad
* 1. update PythonOp's "input_requires_grads" attribute according to the ORT gradient graph.
2. remove PythonOp's "output_tensor_requires_grads" attribute because in torch.no_grad mode, the exported value is not correct.
3. [related to 2] remove PythonOpGrad's "input_tensor_requires_grads" because it comes from the corresponding PythonOp's "output_tensor_requires_grads".
* fix unit tests
* refine based on wschin's comments && fix pylint
* fix comments
* fix unused variable
* Sort supported types order so we get a consistently generated order of types.
* Fix promote type to include all the input types and not just the first one.
* use 3D grid to avoid the upper limit of grid dimension
* enrich tests
* Revert "use 3D grid to avoid the upper limit of grid dimension"
This reverts commit 2d5badf2fe8cd985f3f29ee2cb18fff13d07c2ab.
* change to a fix: switch the 1st and 2nd dim
This change updates the implementation of the argmax_out operator to 1)
set the output tensor correctly and 2) remove the unnecessary use of a
temporary tensor to store intermediate result of onnx ArgMax operation.
Previously, the argmax_out operator did not correctly update the out
tensor: it replaced the OrtValue instead of the memory backing the
OrtValue. To properly update the output tensor, we need to calculate
the expected shape of the out tensor.
We add the helper function calculate_reduction_shape to calculate the
shape of the reduced tensor from the input tensor, dimension to reduce,
and option to keep the reduced dimension or not. This is based on the
utility functions in aten/src/ATen/native/ReduceOpsUtils.h in the
PyTorch repository, but is tailored to be a bit more specific to our
current needs.
Notes:
We considered directly using PyTorch's utility functions (e.g.
get_reduction_shape) from aten/src/ATen/native/ReduceOpsUtils.h in the
PyTorch repository to calculate the shape of the reduced tensor, but
including this header file resulted in warnings about unused functions
that we would need to handle. As we only need limited functionality at
the moment, we instead implemented our own utility function to calculate
the reduction shape for our current needs. If we later need a more
general utility for calculating the reduction shape, we could consider
switching to the utility functions in PyTorch.
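For illustration only, the shape calculation described above can be sketched in Python (the actual helper in this change is C++). The signature below is an assumption: the commit only states that the helper takes the input tensor, the dimension to reduce, and the keep-dimension option. Handling of negative and out-of-range dims is also assumed.

```python
from typing import List, Sequence


def calculate_reduction_shape(
    input_shape: Sequence[int], dim: int, keepdim: bool
) -> List[int]:
    """Return the shape left after reducing `input_shape` along `dim`.

    Hypothetical Python sketch; not the C++ helper from this change.
    """
    rank = len(input_shape)
    if not -rank <= dim < rank:
        raise IndexError(f"dim {dim} is out of range for rank {rank}")
    dim %= rank  # map a negative dim to its non-negative equivalent

    shape = list(input_shape)
    if keepdim:
        shape[dim] = 1  # the reduced dimension is kept with size 1
    else:
        del shape[dim]  # the reduced dimension is dropped entirely
    return shape


print(calculate_reduction_shape([2, 3, 4], dim=1, keepdim=False))  # [2, 4]
print(calculate_reduction_shape([2, 3, 4], dim=1, keepdim=True))   # [2, 1, 4]
```

With the expected shape known up front, the out tensor can be resized and written into in place rather than having its OrtValue replaced.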
* add scripts
* update docker scripts
* update build script
* create run script
* add test script
* add log 3 flags
* use the right build function
* build navi
* add clean script
* add pytorch like soln
* only build gfx 1030
* use HOST side var
* ignore logs
* update scripts
* GPU_WARP_SIZE_HOST
* update scripts
* remove scripts/amd
* match main
* add GPU_WARP_SIZE_HOST on cuda side
* match main
* correct gfx1030
* remove print
* move gfx add to rocm5.0
* remove inline
* make constexpr on cuda side
* [UPDATE] update ci to rocm5.2 + torch1.11
* [Revert] disable ort module test
* [DELETE] delete Rocm5.1.1 ci test result
* [UPDATE] update the comments
* test case for masked_select
* isolate variables per onnx_op, include line numbers for ORT errors
* format errors
* correct masked_select impl, broadcast test
* node attrs naming fixed
* Add tests for all unary aten ops supported in eager mode
* fixing the PR draft
* fixing the merge
* changing eval to be at compile time
* adding requirements for eager
* 1. adding function to {ops}_out
2. cleaning the code and adding comments
* editing the code according to code review
Co-authored-by: root <root@AHA-LIRONKESE-1>
Description: PR 12018 introduced a few fixable Python and C++ warnings, which this PR cleans up. It also adds a comment on the intent of test_mul_bool and adds out testing to test_ones.
Motivation and Context
When iterating in Python, use a list instead of a set, and don't use reserved words.
Fix a long line in the C++ code.
Clarify the intent of test_mul_bool for future developers.
fill_ implements torch.ones under the covers, but the previous PR did not add verification of the out param, so it is added here.
* Add utility methods for resize_output
* Eager mode: implement abs.out
This is an initial hand-written implementation of an out= operator to
demonstrate how to structure out= methods using resize_out helper
methods.
This is meant to be used as a reference when we update the code
generator to generate implementations for out= operations.
Add support for PyTorch `resize_` operation. The PyTorch API method is documented
here:
https://pytorch.org/docs/stable/generated/torch.Tensor.resize_.html
Implementation notes:
There are some implementation details that might deviate from
expectations:
- As onnxruntime::tensor does not support a resize operation, this
functionality is supported on the TensorImpl by swapping out the
backing tensor if the size changes.
- In the ORT model the shape of the TensorImpl is defined by the
backing onnxruntime::tensor, so a TensorImpl cannot have a different
shape / size than the backing onnxruntime::tensor. This means that
when resizing to a smaller TensorImpl, where other implementations
might keep the same backing storage, ORT will allocate a new
onnxruntime::tensor and copy over as many of the existing elements as
fit. Functionally, you will end up with the same output, but the
underlying buffer will be re-allocated.
A future change could be to allow ORTTensorImpl to have a different
size / shape than the onnxruntime::tensor backing it, and then we
could improve this behavior.
The canonical CPU / CUDA implementations in PyTorch repository:
CPU: aten/src/ATen/native/Resize.cpp
CUDA: aten/src/ATen/native/cuda/Resize.cpp
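The swap-and-copy behavior described in the implementation notes can be sketched in Python as follows. This is a simplified model, not the C++ code in this change: `BackedTensor` is a hypothetical stand-in for a TensorImpl whose shape always matches its backing buffer, and new elements are zero-filled here only to keep the sketch deterministic (PyTorch documents newly added elements of `resize_` as uninitialized).

```python
from math import prod
from typing import List, Sequence


class BackedTensor:
    """Hypothetical stand-in: the shape always matches the backing buffer."""

    def __init__(self, shape: Sequence[int], data: Sequence[float]) -> None:
        if len(data) != prod(shape):
            raise ValueError("data length does not match shape")
        self.shape: List[int] = list(shape)
        self.buffer: List[float] = list(data)

    def resize_(self, new_shape: Sequence[int]) -> "BackedTensor":
        new_shape = list(new_shape)
        if new_shape == self.shape:
            return self  # nothing changes, so the existing buffer is kept

        # Allocate a new backing buffer of the requested size.
        new_size = prod(new_shape)
        new_buffer = [0.0] * new_size  # zero fill is an assumption of this sketch

        # Copy over as many of the existing elements as fit.
        keep = min(new_size, len(self.buffer))
        new_buffer[:keep] = self.buffer[:keep]

        # Swap the backing buffer and update the shape together.
        self.buffer = new_buffer
        self.shape = new_shape
        return self


t = BackedTensor([2, 3], [1, 2, 3, 4, 5, 6])
old_buffer = t.buffer
t.resize_([2, 2])
print(t.shape, t.buffer)       # [2, 2] [1, 2, 3, 4]
print(t.buffer is old_buffer)  # False: shrinking still re-allocated the buffer
```

The visible contents match what an implementation that keeps the same storage would produce when shrinking; the only difference is the extra allocation and copy.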