### Description
The tensor creation code now allows the creation of boolean tensors from
non-direct `ByteBuffer` instances. It previously only allowed them from
arrays and direct `ByteBuffer` instances and this fixes that
inconsistency. The boolean tensor test has been updated to cover all
three cases.
### Motivation and Context
Fixes#15509.
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4MEFNUZ, FloatE5M2FNUZ
as described in PR https://github.com/onnx/onnx/pull/4805. It uses CUDA
API to cast float/half to float8 if CUDA>=11.8, a custom implementation
if CUDA<11.8.
* It implements, Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for control flow operator, Shape,
Reshape, Identity, If, Loop, Scan, Reshape
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate` only valid for float 8 types. It is true by
default. In that case, any value out of range is converted into the
maximum float 8 value. If false, it is infinite.
* QuantizeLinear, DequantizeLinear now supports multiple scales on CUDA
(and ROCm by extension), scale = 1D tensor with one scale per channel
### Motivation and Context
Supports latest onnx version.
Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)
---------
Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
### Description
Removing C4090 warning suppression after windows pipelines adapt vs2022
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
This PR partially reverts changes introduced in
https://github.com/microsoft/onnxruntime/pull/15643
We make two API return std::string always in UTF-8.
We also move the entry points from OrtApiBase to OrtApi to make them
versioned.
### Motivation and Context
`GetVersionString` always returns x.y.z numbers that are not subject to
internationalization.
`GetBuildInfoString` can hold international chars, but UTF-8 should be
fine to contain those.
We prefix them with u8"" in case the compiler default charset is not
UTF-8.
Furthermore, creating platform dependent APIs is discouraged.
`ORTCHAR_T` is platform dependent and was created for paths only.
On non-unix platforms would still produce `std::string` that can only
contain UTF-8
The API was introduced after the latest release, and can still be
adjusted.
### Description
This PR creates Nuget and Android for Training.
### Motivation and Context
These packages are intended to be released in ORT 1.15 to enable
On-Device Training Scenarios.
## Packaging Story for Learning On The Edge Release
### Nuget Packages:
1. New Native package -> **Microsoft.ML.OnnxRuntime.Training** (Native
package will contain binaries for: win-x86, win-x64, win-arm, win-arm64,
linux-x64, linux-arm64, android)
2. C# bindings will be added to existing package ->
**Microsoft.ML.OnnxRuntime.Managed**
### Android Package published to Maven:
1. New package for training (full build) ->
**onnxruntime-training-android-full-aar**
### Python Package published to PyPi:
1. Python bindings and offline tooling will be added to the existing ort
training package -> **onnxruntime-training**
### Description
Updating the build option for enabling training in java builds from
ENABLE_TRAINING -> ENABLE_TRAINING_APIS.
In the native codebase ENABLE_TRAINING is used for enabling full
training and ENABLE_TRAINING_APIS is used for creating the lte builds
with training apis. Making the change to sync the naming convention
across all the language bindings.
It was a bit confusing to see ENABLE_TRAINING when debugging the android
build failures for training. Making this change just to improve
readability of logs during debugging.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Allows the creation of zero length tensors via the buffer path (the
array path with zero length arrays still throws as the validation logic
to check it's not ragged would require more intrusive revision), and
allows the `tensor.getValue()` method to return a Java multidimensional
array with a zero dimension. Also added a test for the creation and
extraction behaviour.
### Motivation and Context
The Python interface can return zero length tensors (e.g. if object
detection doesn't find any objects), and before this PR in Java calling
`tensor.getValue()` throws an exception with a confusing error message.
Fixes#7270 & #15107.
- Update Gradle version used in most places from 6.8.3 to 8.0.1. Update Android Gradle Plugin version where applicable.
Not updated in this change: React Native Android projects (under `js/react_native/`). That can be done later along with updating the React Native projects.
- Add Gradle wrapper in `java/` to make it easier to consistently use a specific Gradle version.
### Description
<!-- Describe your changes. -->
I fixed some broken links in the C API documentation, but then did a
quick pass over all of the links I could find and then fixed those.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
I got some 404's when exploring the documentation and wanted to fix it.
### Description
<!-- Describe your changes. -->
Update java/build.gradle to not use deprecated features that were
removed in gradle 8.0.
Also move gradle wrapper setup from a script into a step template.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix builds which use hosted Mac agents and gradle.
Recently the system version of gradle got upgraded to 8.0. Even though
we use an older gradle wrapper version, java/build.gradle is still
processed with gradle 8.0 in the initial call to `gradle wrapper`.
* Added the OrtDnnlProviderOptions structure to expose configuration
options to the user
* The number of threads can be defined by the user with the -i flag on
the perftest
* Number of threads can also be configured via the OMP_NUM_THREADS
environment variable
* The number of threads defined in the OrtDnnlProviderOptions is
prioritized over the environment variable
### Description
Avoids thread oversubscription caused by OpenMP allocating the maximum
number of threads possible for oneDNN EP. Added support for the
OrtDnnlProviderOptions, this will allow for more EP customization
capabilities, and allows for user defined number of threads.
### Motivation and Context
- Improves performances and allows for user to fine tune the number of
threads
Fix C6011, C6385, C6386 found by Visual Studio. Basically, I set the
maximum number of options for every EP to 128. To my knowledge, 128 is
big enough to support all EPs.
For support arbitrary number of EP options, we probably need #13999 and
create a "std::vector"-like struct in C language.
Description
Add bindings for Android and iOS.
Motivation and Context
Enable mobile app linking against ort-extensions library and registering the custom ops with ORT.
**Description**:
Adds support for creating and receiving sparse tensors in the ORT Java
API.
CSRC and COO tensors as inputs are tested, but there is no op which
accepts a block sparse tensor to test. COO tensors are tested as
outputs, but there is no op which emits a CSRC or block sparse tensor to
test.
**Motivation and Context**
- Why is this change required? What problem does it solve? Request to
expose ORT sparse tensor support in Java.
cc @yuslepukhin
### Description
Update protobuf-java to version 3.21.7. This change only impact tests.
### Motivation and Context
The current version exhibits CVE-2022-3509
Previously OnnxSequence would flatten out a list of tensors into a
single output array assuming they were all scalar values. This doesn't
accurately represent the semantics of an ONNX sequence, but was what the
semantics appeared to be years ago when I first wrote that class. This
PR changes it so that the `getValue` method on `OnnxSequence` unwraps
the sequence and returns `List<? extends OnnxValue>` allowing the user
to process the individual ONNX values separately. It's done this way
rather than returning a multidimensional array for a tensor and a Java
map for a map as multidimensional arrays are very inefficient in Java
and best practice when operating with a OnnxTensor in Java is to use a
`java.nio.ByteBuffer`. So allowing users to access each `OnnxTensor`s
individually allows them to control how the data is materialised on the
Java heap.
# Motivation
Currently, ORT minimal builds use kernel def hashes to map from nodes to
kernels to execute when loading the model. As the kernel def hashes must
be known ahead of time, this works for statically registered kernels.
This works well for the CPU EP.
For this approach to work, the kernel def hashes must also be known at
ORT format model conversion time, which means the EP with statically
registered kernels must also be enabled then. This is not an issue for
the always-available CPU EP. However, we do not want to require that any
EP which statically registers kernels is always available too.
Consequently, we explore another approach to match nodes to kernels that
does not rely on kernel def hashes. An added benefit of this is the
possibility of moving away from kernel def hashes completely, which
would eliminate the maintenance burden of keeping the hashes stable.
# Approach
In a full build, ORT uses some information from the ONNX op schema to
match a node to a kernel. We want to avoid including the ONNX op schema
in a minimal build to reduce binary size. Essentially, we take the
necessary information from the ONNX op schema and make it available in a
minimal build.
We decouple the ONNX op schema from the kernel matching logic. The
kernel matching logic instead relies on per-op information which can
either be obtained from the ONNX op schema or another source.
This per-op information must be available in a minimal build when there
are no ONNX op schemas. We put it in the ORT format model.
Existing uses of kernel def hashes to look up kernels are replaced
with the updated kernel matching logic. We no longer store
kernel def hashes in the ORT format model’s session state and runtime
optimization representations. We no longer keep the logic to
generate and ensure stability of kernel def hashes.
Working on JNI refactor for OnnxTensor.
Simplifying the error handling logic in createTensor.
Collapsing casting branches and migrating to ONNX element type enum.
Disable cpplint for JNI C files.
**Description**: This fixes error handling in the JNI code in OnnxMap, OnnxSequence, OnnxRuntime, RunOptions. SessionOptions and OrtEnvironment are correct as is.
The bulk of the work will be in rewriting OnnxTensor, OnnxSparseTensor (after the merge of #10653) and OrtSession, along with the helper methods in OrtJniUtil. I plan to tackle those in separate PRs to reduce the amount of code to review.
**Motivation and Context**
- Why is this change required? What problem does it solve? The current native interop code doesn't return control to Java immediately on throwing an exception from an ORT error code, which can cause incorrect interactions with native ORT, and issues with exception propagation on the Java side.
- If it fixes an open issue, please link to the issue here. Partial work towards solving #11451.
* Specify list/map capacity when initializing where possible
- This really depends on the use case, but in some cases the array/map resizing can be slightly costly, there is effectively no downside setting the initial capacity for a collection if we know for sure its final size
* Supply list/map capacity when initializing where possible
- This really depends on the use case, but in some cases the array/map resizing can be slightly costly, there is effectively no downside setting the initial capacity for a collection if we know for sure its final size
- Introduce an extra utility to help creating maps with expected capacity
* Move utility function to OrtUtil and drop MapUtil, also add Java doc to method
* Move test to the right class
Java side parts for configuring CUDA and TensorRT.
Adding tests for CUDA and TensorRT. Refactoring library loading logic as provider options need to have their shared library loaded before they can be constructed.
* Add android package build settings for full build
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* squashed commit for standalone tvm execution provider
* critical fix for correct python build with stvm ep
* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG
* updates and fixes
* update parsing of stvm provider options
* add support of external data for onnx model
* add conditional dump of subgraphs
* remove unused code
* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API
* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)
* add fp16
* add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options
* fix license text in header. fix log format
* small fixes
* fix issues from flake8
* remove model proto construction from GetCapability
* reserve memory for vector of DLTensors
* add simple tutorial for STVM EP
* STVM docs
* jroesch/tvm -> apache/tvm
* remove dead code, unneccessary logs and comments
* fix in readme
* improve tutorial notebook
* tvm update
* update STVM_EP.md
* fix default value
* update STVM_EP.md
* some TODOs for the future development
* shorten long lines
* add hyperlink to STVM_EP.md
* fix Linux CI error
* fix error in csharp test
Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>