* Fix GH issue 12151.
Need to use inverse perms for updating that axis to what is used for transposing the input. This only applies if the DQ node is doing per-axis dequantization.
**Description**: This fixes error handling in the JNI code in OnnxMap, OnnxSequence, OnnxRuntime, RunOptions. SessionOptions and OrtEnvironment are correct as is.
The bulk of the work will be in rewriting OnnxTensor, OnnxSparseTensor (after the merge of #10653) and OrtSession, along with the helper methods in OrtJniUtil. I plan to tackle those in separate PRs to reduce the amount of code to review.
**Motivation and Context**
- Why is this change required? What problem does it solve? The current native interop code doesn't return control to Java immediately on throwing an exception from an ORT error code, which can cause incorrect interactions with native ORT, and issues with exception propagation on the Java side.
- If it fixes an open issue, please link to the issue here. Partial work towards solving #11451.
* test case for masked_select
* isolate variables per onnx_op, include line numbers for ORT errors
* format errors
* correct masked_select impl, broadcast test
* node attrs naming fixed
* Add tests for all uniary aten ops supported in eager mode
* fixing the PR draft
* fixing the merge
* changing eval to be at compile time
* adding requirements for eager
* 1.adding function to {ops}_out
2.cleaning the code
and adding comments
* editing the code according to code review
Co-authored-by: root <root@AHA-LIRONKESE-1>
Description: In the PR 12018 a few fixable python and cpp warning were introduced that this PR cleans up. Also adding a comment on the intent of test_mul_bool and out testing on test_ones.
Motivation and Context
When iterating in Python, use a list instead of a set and don't use reserved words
Fix long line in cpp
Clarify test_mul_bool intent for future developers.
fill_ implements torch.ones under the covers but in previous pr verification on the out param was not added so adding it here.
* First attempt for half2 vectorized memory access in SkipLayerNorm
* Add some functions for debugging
* Clean up the code
* Clean up the code
* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp
* Add a unit test for a larger input size
* Fix some Lint C++ warnings
* Use ILP = 4 for the vectorized kernels
* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm
* Use conditional operator for input_v
* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel
* Clean some comments and rename the layernorm function
* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel
* Resolve a Lint C++ warning
* Fix SkipLayerNormBatch1_Float16_vec output data
* Add hipified code of bert SkipLayerNorm for ROCmEP
* Resolve some Lint C++ warnings
* Resolve some Lint C++ warnings
* Resolve some Lint C++ warnings
* Resolve Python formatting issue
* First attempt for half2 vectorized memory access in SkipLayerNorm
* Add some functions for debugging
* Clean up the code
* Clean up the code
* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp
* Add a unit test for a larger input size
* Fix some Lint C++ warnings
* Use ILP = 4 for the vectorized kernels
* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm
* Use conditional operator for input_v
* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel
* Clean some comments and rename the layernorm function
* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel
* Resolve a Lint C++ warning
* Fix SkipLayerNormBatch1_Float16_vec output data
The generated bindings causes C# build errors that require workaround code. Disabling generation should avoid the need for any workarounds.
As the user has the C# ORT package with the C# to C bindings there's no need for binding generation that calls the ORT Java API (which is C# -> Java ->C).
* Add op tuning functionality and example for vector add.
* Add namespace.
* Various improvements.
* use unique pointer
* fix lint errors
* Check return error.
* Add utility methods for resize_output
* Eager mode: implement abs.out
This is an initial hand written implementation of an out= operator to
demonstrate how to structure out= methods using resize_out helper
methods.
This is meant to be used as a reference when we update the code
generator to generate implementations for out= operations.
* Update C# runtest.sh for opset 17
Should have been part of https://github.com/microsoft/onnxruntime/pull/11924
* get appropriate opset version from onnx doc
* use absolute rather than relative path
* fix typo in var name
* Add net6 targets.
Remove maccatalyst as we don't have a native build targetting that.
* Set platform in macos targets
* Add targetFramework entries
* Move NativeLib.DllName definition and set using preprocessor values for simplicity. Couldn't get it to build with the preprocessor based setup when it was in a separate file.
Update the nuspec generation to set platform version for .net6 targets. TODO: Validate versions. I copied them from the managed nuget package the packaging pipeline generated prior to adding targets. Possibly w could/should lower some of the versions.
Hopefully the need to specify a version goes away when the release version of VS2022 supports .net6.
* Try android 31.1 as https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md suggests that should be available on the CI machines
* Fix patch version mismatch
Add some extra debug info in case it helps
* Debug nuget location in CI
* Add workspace entry back in
* Add steps
* One more attempt with hardcoded nuget.exe path and original android31.0 version
* Better fix - found explicit nuget download and updated version there.
* flake8 fixes
* Fix black complaints.
* Exit Microsoft_ML_OnnxRuntime_CheckPrerequisites for net6 iOS.
* Removed outdated comment
Add support for PyTorch `resize_` operation. The PyTorch API method is documented
here:
https://pytorch.org/docs/stable/generated/torch.Tensor.resize_.html
Implementation notes:
There are some implementation details that might deviate from
expectations:
- As the Onnxruntime::tensor does not support resize operation, this
functionality is supported on the TensorImpl by swapping out the
backing tensor if the size changes.
- In the ORT model the shape of the TensorImpl is defined by the
backing onnxruntime::tensor, so it is not supported to have a
TensorImpl with a different shape / size than the backing
onnxruntime::tensor. This means when resizing to a smaller TensorImpl,
other implementations might keep the same backing storage, ORT will
re-allocate a new onnxruntime::tensor and copy over as many of the
existing elements that fit. Functionally, you will end up with same
output, but the underlying buffer will be re-allocated.
A future change could be to allow ORTTensorImpl to have a different
size / shape than the onnxrutime::tensor backing it, and then we
could improve this behavior.
The canonical CPU / CUDA implementations in PyTorch repository:
CPU: aten/src/ATen/native/Resize.cpp
CUDA: aten/src/ATen/native/cuda/Resize.cpp
* Add FastGelu to kernel explorer for profiling.
* fix python lint errors
* Fix one more python lint error
* Delete white space (python lint)
* Various improvements.
* Update README.md
* refactor header files