Commit graph

166 commits

Adrian Lizarraga
3b4c7df4e9
[QNN EP] Make QNN EP a shared library (#23120)
### Description
- Makes QNN EP a shared library **by default** when building with
`--use_qnn` or `--use_qnn shared_lib`. Generates the following build
artifacts:
- **Windows**: `onnxruntime_providers_qnn.dll` and
`onnxruntime_providers_shared.dll`
- **Linux**: `libonnxruntime_providers_qnn.so` and
`libonnxruntime_providers_shared.so`
  - **Android**: Not supported. Must build QNN EP as a static library.
- Allows QNN EP to still be built as a static library with `--use_qnn
static_lib`. This is primarily for the Android QNN AAR package.
- Unit tests run for both the static and shared QNN EP builds.

### Detailed changes
- Updates Java bindings to support both shared and static QNN EP builds.
- Provider bridge API:
- Adds logging sink ETW to the provider bridge. Allows EPs to register
ETW callbacks for ORT logging.
- Adds a variety of methods for onnxruntime objects that are needed by
QNN EP.
- QNN EP:
- Adds `ort_api.h` and `ort_api.cc`, which encapsulate the API provided
by ORT in a manner that allows the EP to be built as either a shared or
static library.
- Adds custom function to transpose weights for Conv and Gemm (instead
of adding util to provider bridge API).
- Adds custom function to quantize data for LeakyRelu (instead of adding
util to provider bridge API).
  - Adds custom ETW tracing for QNN profiling events:
    - shared library: defines its own TraceLogging provider handle
- static library: uses ORT's TraceLogging provider handle and existing
telemetry provider.
- ORT-QNN Packages:
- **Python**: Pipelines build QNN EP as a shared library by default.
User can build a local python wheel with QNN EP as a static library by
passing `--use_qnn static_lib`.
- **NuGet**: Pipelines build QNN EP as a shared library by default.
`build.py` currently enforces QNN EP to be built as a shared library.
Can add support for building a QNN NuGet package with static later if
deemed necessary.
- **Android**: Pipelines build QNN EP as a **static library**.
`build.py` enforces QNN EP to be built as a static library. Packaging
multiple shared libraries into an Android AAR package is not currently
supported due to the added need to also distribute a shared libcpp.so
library.

2025-01-22 12:11:00 -08:00
Changming Sun
5d7030e4c6
Revert DML pipeline changes (#23135)
### Description
Previously we wanted to add DirectML EP to existing onnxruntime Windows
CUDA packages. After careful consideration, we will postpone the change.
This PR reverts some pipeline changes previously made by @mszhanyi and
@jchen351 .
2024-12-18 10:42:10 -08:00
Jian Chen
9ed0c7fe26
Redo "Update Gradle version 8.7 and java version 17 within onnxruntime/java" (#22923)
2024-12-02 18:34:25 -08:00
wejoncy
c284a686f2
[CoreML] Create EP by AppendExecutionProvider (#22675)
### Description
AppendExecutionProvider("CoreML", {{"MLComputeUnits","MLProgram"}})




---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-27 09:26:31 +08:00
Yi Zhang
85751e7276
Build DML in Windows GPU CI pipeline (#22869)
### Description
Add a new stage to build CUDA and DML in the Windows GPU CI pipeline (PR
checks) to prevent regressions introduced by new CUDA tests.
Rename all tests under cuda/testcases with the CudaEp prefix so they can
be skipped easily.

### Motivation and Context
1. CudaNhwcEP is added by default when using the CUDA EP.
2. If onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS is enabled, the tests in
tests/provider/cuda/testcases are added too.

### To do
Add enable_pybind in the new stage.
Currently, --enable_pybind triggers some Python tests, like
onnxruntime_test_python.py, which use the get_available_providers() API.
More discussion is needed to decide how to make this work.
2024-11-25 10:50:52 +08:00
Yi Zhang
a28246a994
Revert "Update Gradle version 8.7 and java version 17 within onnxrunt… (#22914)
…ime/java (#22771)"

This reverts commit 632a36a233.

### Motivation and Context
Run E2E tests using Browserstack failed due to this PR.
2024-11-21 18:12:28 +08:00
Jian Chen
632a36a233
Update Gradle version 8.7 and java version 17 within onnxruntime/java (#22771)
### Description
This change updates the Gradle version within the Java project to 8.7
and upgrades Java to 17. The Gradle version from react-native was also
updated to 7.5 to make it compatible with the changes from the Java
directory. However, the target Java version there remains the same and
will be upgraded in a separate PR.

This is split from #22206

### Motivation and Context
This is the first step to upgrade the react native version.
2024-11-14 17:10:44 -08:00
sheetalarkadam
e8f1d73b0b
Add Android QNN Browserstack test (#22434)

### Motivation and Context
Real device test in CI
2024-11-10 16:10:29 -08:00
Yi Zhang
8e8b62b8b5
Build CUDA and DML together (#22602)
### Description
We now need to build CUDA and DML in one package, but the CUDA EP and
DML EP can't run in one process; doing so throws the exception `the GPU
device instance has been suspended`.
So the issue is that the CUDA EP and DML EP coexist at compile time but
can't coexist at run time.

This PR splits the CUDA EP tests and the DML EP tests in all unit tests.
The solution is to use two environment variables, NO_CUDA_TEST and
NO_DML_TEST, in CI.

For example, if NO_CUDA_TEST is set, the DefaultCudaExecutionProvider
will be nullptr and the tests will not run with the CUDA EP; in
debugging, the CUDAExecutionProvider will not be called.
As long as CUDA functions such as cudaSetDevice are not called, DML EP
tests can pass.

Disabled the Java test testDirectML because it doesn't work now even
without the CUDA EP.
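The NO_CUDA_TEST / NO_DML_TEST gating described above can be sketched with a small helper (illustrative only; `TestProviderGate` and its method are invented for this sketch, not ORT code):

```java
import java.util.function.Supplier;

final class TestProviderGate {
    // Returns the provider built by the factory, or null when the named
    // environment variable is set -- mirroring how DefaultCudaExecutionProvider
    // becomes nullptr under NO_CUDA_TEST.
    static <T> T providerUnlessDisabled(String envVar, Supplier<T> factory) {
        return System.getenv(envVar) != null ? null : factory.get();
    }
}
```

A test that receives null from such a gate simply skips the corresponding EP.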
2024-10-31 15:51:13 -07:00
wejoncy
20a45dd67b
[CoreML ML Program] support accelerator selection (#22383)
### Description
For now, CoreML only supports running mlmodels on CPU/ALL; however,
CPU_GPU can sometimes be much faster.

This PR adds the option to select different hardware to boost
performance.




---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-10-15 11:50:11 +08:00
sheetalarkadam
c06ecd415c
RC releases to Maven for Android (#22391)
### Description
Allows alpha, beta and rc version releases to Maven for Android
artifacts.



### Motivation and Context
Helpful for releasing rc versions or test artifacts to Maven. For
example, a new QNN Android package is being released, and it will be
nice to test the RC version for dependencies before release.

### Future Work
Allow RC version for all Maven artifacts.
2024-10-11 08:58:02 -07:00
sheetalarkadam
dd2ea8469e
Add qnn android package (#22296)
### Description
Pre built QNN Android package


### Future Work
1. Setting up CI with Browserstack- onnxruntime_tests and Android test
2. ESRP Release to Maven
2024-10-10 10:37:22 -07:00
Yulong Wang
c5d28cac4d
Initial WebGPU EP checkin (#22318)
### Description

This change introduces the WebGPU EP into ONNX Runtime.

To make the PR as simple as possible, this PR excluded the following:
- C API changes for WebGPU EP
- actual implementation of WebGPU EP. Currently in this PR, WebGPU is a
stub implementation that does not register any kernel.
- Python IO Binding update
- Node.js IO Binding update

This PR now contains only 43 file changes (while the working branch
contains 130+) and hopefully this makes it easier to review.

There is going to be separated PRs for each mentioned above.

Current working branch: #21904
2024-10-08 16:10:46 -07:00
jingyanwangms
5036e63d58
Increase TensorRT tolerance from default 1e-5 to 1e-3 after TRT 10.4 (#22321)
### Description
Increase TensorRT tolerance from the default 1e-5 to 1e-3 after TRT 10.4.

2024-10-05 18:06:06 +10:00
Adam Pocock
14d1bfc34b
[java] Multi-LoRA support (#22280)
### Description
Java parts of Multi-LoRA support - #22046.

### Motivation and Context
API equivalence with Python & C#.

---------

Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
2024-10-01 13:54:37 -07:00
Edward Chen
c24e55b1f1
[Java] Add API for appending QNN EP (#22208)
- Add Java API for appending QNN EP
- Update Java unit test setup
  - Fix issues with setting system properties for tests
  - Unify Windows/non-Windows setup to simplify
2024-10-01 10:18:04 -07:00
Adam Pocock
cfa45df6b5
[java] Migrate OnnxTensors created from arrays over to a backing Java buffer (#18556)
### Description
Following from #16578 and #16835 this migrates over
`OnnxTensor.createTensor(<array>)` to first instantiate a
`java.nio.Buffer` and then copy the array into that buffer in Java
before creating the tensor. It also changes the `OnnxTensor.getValue()`
method which returns a multidimensional array so it does the array
construction and value copy in Java. This allows the removal of some
unpleasant recursive C code which repeatedly calls into the JVM to
traverse Java's arrays. The equivalent Java code is still unpleasant and
recursive, but it's easier to reason about and memory safe. As a bonus,
more `OnnxTensor`s are now backed by buffers which allow users to pin
memory and reduce allocations by reusing them for same sized inputs.

Some of the JNI code which parses Java arrays still exists as it's used
by `OnnxMap`, removing that will be the target of a future refactor.
Strings are still processed in JNI as it is easier to work with String
tensors and UTF-8 arrays in C.

### Motivation and Context
Minimizing the amount of JNI code makes it easier to maintain and using
buffers in preference to arrays allows for fewer allocations.
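The array-to-buffer migration can be illustrated with plain `java.nio` code (a hedged sketch; `ArrayFlatten` is a made-up name, not the ORT implementation):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class ArrayFlatten {
    // Copy a rectangular 2-D float array into a direct FloatBuffer on the
    // Java side, in the spirit of replacing the recursive JNI traversal.
    static FloatBuffer flatten(float[][] data) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;
        FloatBuffer buf = ByteBuffer.allocateDirect(rows * cols * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (float[] row : data) {
            buf.put(row);  // bulk copy per row, no per-element JNI calls
        }
        buf.rewind();
        return buf;
    }
}
```

The resulting direct buffer can then back the tensor, and can be reused for later inputs of the same shape.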
2024-09-24 15:36:52 +10:00
Adam Pocock
6d7235ba5a
[Java] Exposing SessionOptions.SetDeterministicCompute (#18998)
### Description
Exposes `SetDeterministicCompute` in Java, added to the C API by #18944.

### Motivation and Context
Parity between C and Java APIs.
2024-09-16 11:55:38 +10:00
Adam Pocock
02e00dc023
[java] Adding ability to load a model from a memory mapped byte buffer (#20062)
### Description
Adds support for constructing an `OrtSession` from a
`java.nio.ByteBuffer`. These buffers can be memory mapped from files
which means there doesn't need to be copies of the model protobuf held
in Java, reducing peak memory usage during session construction.

### Motivation and Context
Reduces memory usage on model construction by not requiring as many
copies on the Java side. Should help with #19599.
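Memory-mapping a model file with the JDK alone looks roughly like this (`ModelMapper` is hypothetical; only the `FileChannel.map` mechanics are shown, not the ORT session API):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ModelMapper {
    // Map a file read-only; the returned buffer exposes the file's bytes
    // without copying them into the Java heap.
    static MappedByteBuffer map(Path modelPath) throws IOException {
        try (FileChannel ch = FileChannel.open(modelPath, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

A buffer produced this way is what the new `OrtSession` constructor path can consume.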
2024-09-16 08:31:55 +10:00
Michael Tyler
904b850b44
Update Arm Compute Library Execution Provider (#22032)
### Description
This PR makes the following updates to the Arm Compute Library execution
provider:

- Target Arm Compute Library 24.07  
- Add support for the following operators: 
  - Conv (FP16) 
  - NhwcConv 
  - QLinearConv 
  - MatMul 
  - FusedMatMul 
  - MatMulIntegerToFloat 
- Optimize memory usage and performance
- Expose the enable_fast_math setting 
- Use the main runtime thread pool 



### Motivation and Context
These updates improve performance and memory usage, and enable use of a
more recent version of Arm Compute Library.

@microsoft-github-policy-service agree company="Arm Ltd"

---------

Signed-off-by: Michael Tyler <michael.tyler@arm.com>
2024-09-12 20:51:59 -07:00
Adam Pocock
22437b581b
[java] Fix for OnnxTensor creation when passing in a ByteBuffer containing elements of a different type (#21774)
### Description
Fixes a bug where the buffer offset and position was incorrectly
computed if the user supplied a `ByteBuffer` to `createTensor` but set
the type of the tensor to something other than `INT8`. This would be
more common if the user was trying to load the initializers from a
serialized representation and didn't want to bother with the type
information (which is the case in #21321).
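The position/offset semantics behind the bug can be seen with plain `java.nio` view buffers (illustrative sketch, not the actual fix):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class BufferView {
    // A typed view over a ByteBuffer starts at the buffer's CURRENT position
    // and spans remaining()/4 elements -- the quantities that must be
    // recomputed when the declared tensor type is wider than INT8.
    static FloatBuffer floatView(ByteBuffer buf) {
        return buf.order(ByteOrder.nativeOrder()).asFloatBuffer();
    }
}
```

For a buffer holding four floats with its position set to byte 4, the view exposes three elements starting at the second float, not four starting at the first.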

### Motivation and Context
Partial fix for #21321. The remainder of the fix is to add a helper
which allows users to load initializers out of an `onnx_data` file, but
that will require adding protobuf as a dependency for the Java API to
allow the parsing of an ONNX file separately from the native code. It
might be nicer to put that functionality into ORT's C API so it can
return the lengths & offsets of the initializers when provided with an
ONNX file containing external initializers. We hit this kind of thing in
Java more often than other languages as in Java models can be supplied
as classpath resources which we can easily read, but not materialize on
disk for the ORT native library to read.
2024-09-13 12:38:17 +10:00
mindest
5b9369e93c
Fix typos according to reviewdog report. (#21335)
### Description
Fix typos based on reviewdog report but with some
exceptions/corrections.
2024-07-22 13:37:32 -07:00
Edward Chen
9c2b85ad58
Fix Android build on Windows (#21304)
- Pass a list of files instead of path separator-delimited string to project.files(). See this issue: https://github.com/gradle/gradle/issues/19817
- Check for host (instead of target) being Windows when using fallback patch program.
2024-07-15 12:29:02 -07:00
Jian Chen
f81c0ec32a
Remove warning suppression from Java Packaging pipeline. (#21010)
### Description
Remove warning suppression from Java Packaging pipeline.


### Motivation and Context
We want the CI step not to produce warnings.
2024-06-24 16:46:21 -07:00
Edward Chen
981893c318
Remove deprecated "mobile" packages (#20941)
# Description

This PR removes the building of the ORT "mobile" packages and much of the associated infrastructure which is no longer needed.

Not removed yet - tools/ci_build/github/android/mobile_package.required_operators.config and the helper scripts that depend on it.

# Motivation and Context

The mobile packages were deprecated in 1.18. Users should use the full packages (Android - onnxruntime-android, iOS - onnxruntime-c/onnxruntime-objc) instead or do a custom build.
2024-06-07 16:20:32 -05:00
Jian Chen
d32adb26f2
Refactor deprecated gradle syntax (#20922)
To replace deprecated APIs.
Should be verified with the `Gradle cmakeCheck` step of the
`Windows_Packaging_CPU_x64_default` stage in the Zip-Nuge-...
pipeline.
2024-06-07 11:08:52 -07:00
Jian Chen
228713f635
adding publishing stage to publish java CUDA 12 pkg to ado (#20834) 2024-05-29 16:24:23 -07:00
Adam Pocock
a36692066d
[java] CUDA & TensorRT options fix (#20549)
### Description
I misunderstood how UpdateCUDAProviderOptions and
UpdateTensorRTProviderOptions work in the C API, I had assumed that they
updated the options struct, however they re-initialize the struct to the
defaults then only apply the values in the update. I've rewritten the
Java bindings for those classes so that they aggregate all the updates
and apply them in one go. I also updated the C API documentation to note
that these classes have this behaviour. I've not checked if any of the
other providers with an options struct have this behaviour, we only
expose CUDA and TensorRT's options in Java.

There's a small unrelated update to add a private constructor to the
Fp16Conversions classes to remove a documentation warning (they
shouldn't be instantiated anyway as they are utility classes containing
static methods).

### Motivation and Context
Fixes #20544.
2024-05-05 00:16:55 -07:00
Adam Pocock
262b6bd3b7
[java][DML EP] Modifying dml_provider_factory.h so it can compile as a C header file (#20157)
### Description
The dml_provider_factory header file can't be used in C programs as it
defines C++ inline operators. This PR rearranges that header file so
that it looks like valid C when used from C, and also makes a couple of
small modifications to the Java code so it correctly binds to the DML EP
at build time.

I'm having some difficulty testing it as I think it's pulling in the old
version of DirectML on my computer and I can't figure out what the
library loading path is in Java to make it look at the recent version I
downloaded. So the test I added fails with:

```
InferenceTest > testDirectML() FAILED
    ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: Exception during initialization: <path-to-ort>\onnxruntime\core\providers\dml\DmlExecutionProvider\src\AbiCustomRegistry.cpp(518)\onnxruntime.dll!00007FFF74819333: (caller: 00007FFF74793509) Exception(3) tid(4f58) 80070057 The parameter is incorrect.
        at app//ai.onnxruntime.OrtSession.createSession(Native Method)
        at app//ai.onnxruntime.OrtSession.<init>(OrtSession.java:74)
        at app//ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:236)
        at app//ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:221)
        at app//ai.onnxruntime.InferenceTest.openSessionSqueezeNet(InferenceTest.java:1961)
        at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:665)
        at app//ai.onnxruntime.InferenceTest.testDirectML(InferenceTest.java:657)
```

But it does correctly compile, and this error seems very similar to
other issues with the DML provider when it doesn't like a model due to
the loaded library being old. The test is using the squeezenet file
that's been in the repo since 2019. If someone can help me figure out
how to get the right version of DML in the library path I can test it
more on my end. I tried adding the folder with the new version into the
system path, but I'm not very familiar with Windows' library loading
behaviour.

### Motivation and Context
Fixes #19656 to allow use of the DirectML EP from ORT Java.

cc @martinb35
2024-04-01 21:58:50 -07:00
Adam Pocock
2f82400b13
[java] Java 21 build support (#19876)
### Description
Bump spotless and the Gradle wrapper to 6.25.0 and 8.6 respectively to
allow compiling ORT on Java 21. The build still targets Java 8.

I'm not sure if there will be CI changes necessary to use this PR,
specifically for the Gradle version as I don't know if that is cached
somewhere earlier in the CI build process.

The new Gradle version adds a warning that using `--source` and
`--target` to select the Java language version is obsolete which is
annoying, we can fix it if we decide to only allow building on newer
versions of Java, while still supporting running on Java 8.

### Motivation and Context
Java 21 is the latest LTS release of Java and ORT should be able to
build on it.
2024-03-28 15:51:22 -07:00
Adam Pocock
e5ce81ae84
[java] Adding ML program flag for CoreML (#19551)
### Description
Adds the new CoreML enum flags to enable ML Program support in Java.

### Motivation and Context
Adds support for #19347 to the Java API.
2024-02-21 12:24:41 -08:00
Tianlei Wu
fbff99a432
Change Java Test Threshold (#19508)
### Description
Increase the threshold to 1e-5 to avoid test failures in CUDA when the
difference is slightly larger than 1e-6.
This may be because TF32 is used in those CUDA tests.
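The tolerance comparison at issue amounts to an element-wise absolute-difference check (a minimal sketch of what `assertArrayEquals(expected, actual, delta)` does; `Tolerance` is an invented name):

```java
final class Tolerance {
    // True when every pair of elements differs by at most tol.
    static boolean allClose(float[] expected, float[] actual, float tol) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > tol) {
                return false;
            }
        }
        return true;
    }
}
```

With the failing values below (0.0102678 vs 0.010266338, a gap of about 1.5e-6), a 1e-6 tolerance fails while 1e-5 passes.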

### Motivation and Context


https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1291322&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d

ProviderOptionsTest > testCUDAOptions() FAILED
    org.opentest4j.AssertionFailedError: array contents differ at index [103], expected: <0.0102678> but was: <0.010266338>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
        at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1293200&view=logs&jobId=f2f63060-d9d6-52d0-adee-b97db5a9ab91&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d
        
InferenceTest > testCUDA() FAILED
    org.opentest4j.AssertionFailedError: array contents differ at index [103], expected: <0.0102678> but was: <0.010266337>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
        at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
        at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
        at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)
2024-02-14 10:08:46 -08:00
Changming Sun
a28abeb241
Change "#ifdef WIN32" to "#ifdef _WIN32" (#19254)
### Description
`_WIN32` is a standard macro listed at
https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-170
. But `WIN32` is not.
2024-01-24 14:35:44 -08:00
Heflin Stephen Raj
0ea48fc73e
Modified the condition to load the optimiser model (#18891) 2024-01-23 10:10:54 -08:00
Adam Pocock
191525301f
[java] Updating TensorInfo so it contains the named dimensions (#18962)
### Description
The Java `TensorInfo` object which is used to describe a tensor's shape,
along with the input and output placeholders for a model couldn't show
any symbolic/named dimensions in that tensor. Now this information is
stored in Java strings on construction and included in the toString.

### Motivation and Context
Setting symbolic dimensions required external information in Java, the
names were not discoverable from within the API.
2024-01-15 14:42:50 -08:00
Adam Pocock
71657d1eb8
[java] Fix double close (#19133)
### Description
The `OnnxValue` and `OrtProviderOptions` implementations now check to
see if they've been closed before accessing the native pointer, and also
before close is called.

### Motivation and Context
Before they could be closed twice which SIGSEGV'd the JVM. Fixes #19125.
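The closed-flag guard described above follows a standard pattern, sketched here without any native code (`GuardedHandle` is invented for illustration):

```java
final class GuardedHandle implements AutoCloseable {
    private long nativeHandle;
    private boolean closed = false;

    GuardedHandle(long handle) { this.nativeHandle = handle; }

    // Check the flag before every native-pointer access.
    long handle() {
        if (closed) {
            throw new IllegalStateException("Trying to use a closed object");
        }
        return nativeHandle;
    }

    @Override
    public void close() {
        if (closed) {
            return;        // second close is a no-op instead of a native crash
        }
        closed = true;
        nativeHandle = 0;  // stands in for freeing the native object
    }
}
```

Double-closing such a handle is harmless, and any use after close fails in Java rather than in native code.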
2024-01-14 14:53:26 -08:00
Adam Pocock
3456831413
[java] Make the backing byte buffer in an OrtValue accessible (#16578)
### Description
Adds a method to access the backing direct byte buffer from a Java
`OnnxTensor` object, assuming it is backed by a direct byte buffer
(tensors created by ORT's run call or ones created in Java from
multidimensional arrays are not). Also adds a method to check if the
backing byte buffer was copied from the user's buffer supplied on
creation (this could be tested via a pointer comparison from the output
of `getBufferRef` and the user's input buffer, so I'm not sure if it's
necessary).

### Motivation and Context
This is the first part of changes necessary to support output pinning in
Java OrtSession.run/OrtTrainingSession.run calls. I split it out from
the rest of the work as it's useful by itself (e.g. to allow users to
keep a single input tensor and rewrite it each time with new inputs
rather than allocate a fresh one) and the other change will be much more
involved so splitting it makes it easier to review.

cc @yuslepukhin
2023-10-17 10:03:49 -07:00
Chi Lo
569876fb16
[TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617)
Two major modifications in this PR:

1. Refactor OrtTensorRTProviderOptions initialization and make it easy
to add new fields.
2. Make the Python API capable of using TensorRT plugins by adding a new
Python binding API `register_tensorrt_plugins_as_custom_ops`. (This
needs to register the EP's custom op domain before model load. The C++
API is slightly different: calling
SessionOptionsAppendExecutionProvider_TensorRT_XX appends the custom op
domain to the session options, and ORT later registers the custom op
domain from the session options before model loading.)
2023-10-06 14:12:20 -07:00
Adam Pocock
522cc968e8
[java] Filling out the javadoc for the float8 types (#17694) 2023-09-27 10:52:11 -07:00
Adam Pocock
aed43f429a
[java] Enable output pinning in OrtSession and OrtTrainingSession (#16835) 2023-09-26 01:49:13 -07:00
Adam Pocock
03c3e91b0d
[java] Relaxing CoreML test (#16777)
### Description
Reduces precision on the CoreML provider test as it returns slightly
different answers than the other tested providers. Checked on a 2020 13"
M1 MBP.

### Motivation and Context
Fixes Java CoreML test failure after #16763.
2023-08-09 11:43:05 -07:00
Adam Pocock
340f4ded73
[java] Fills out the javadoc so there are no more documentation warnings (#16776)
### Description
Adds javadoc for all protected and public members, methods and classes.

### Motivation and Context
The javadoc warnings were annoying me when running the builds. Also,
those types should have been documented.

---------

Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-07-27 16:17:03 +10:00
Adam Pocock
a1bb670536
[java] Fp16 fix for android/react native (#16832)
### Description
This PR splits out the FP16 conversions into a separate package we can
override in the android build with a version which works on old versions
of Android.

I'm not sure the android build system changes are correct as I haven't
got an android build environment configured on my workstation.
@YUNQIUGUO if the CI build fails we should follow up offline to get my
environment configured so I can iterate on it.

### Motivation and Context
Fixes the CI failure after #16703.
2023-07-25 12:31:32 -07:00
Adam Pocock
a8e776b78b
[java] Adds support for fp16 and bf16 tensors (#16703)
### Description
The Java API currently only supports fp16 output tensors which it
automatically casts to floats on the way out. This PR adds support for
creating fp16 and bf16 tensors (from `java.nio.Buffer` objects or as the
output of models, creation from Java short arrays is not supported),
along with efficient methods for casting `FloatBuffer` into
`ShortBuffer` filled with fp16 or bf16 values and vice versa.

The fp16 conversions use a trick to pull in the efficient conversion
methods added to Java 20, falling back to ports of the MLAS methods
otherwise. The Java 20 methods can be special cased by the C2 JIT
compiler to emit the single instruction on x86 and ARM which converts
fp32<->fp16, or the vectorized versions thereof, so they should be quite
a bit faster than the MLAS ported one.
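The bit-level conversion is easiest to see for bf16, where the fallback path reduces to a shift with round-to-nearest-even (a sketch, not the MLAS port; NaN is special-cased to stay a NaN):

```java
final class Bf16 {
    // fp32 -> bf16 with round-to-nearest-even.
    static short floatToBf16(float f) {
        if (Float.isNaN(f)) {
            return (short) 0x7FC0;  // canonical bf16 NaN
        }
        int bits = Float.floatToIntBits(f);
        int lsb = (bits >>> 16) & 1;
        bits += 0x7FFF + lsb;       // round to nearest, ties to even
        return (short) (bits >>> 16);
    }

    // bf16 -> fp32 is exact: place the 16 bits in the top half of an int.
    static float bf16ToFloat(short h) {
        return Float.intBitsToFloat((h & 0xFFFF) << 16);
    }
}
```

fp16 works the same way with a 5-bit exponent and 10-bit mantissa; on Java 20+ the intrinsified `Float.floatToFloat16`/`Float.float16ToFloat` can be used instead of a hand-written port.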

### Motivation and Context
fp16 and bf16 are increasingly popular formats and we've had several
requests for this functionality. Fixes #7003.

cc @yuslepukhin  @cassiebreviu

---------

Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-07-21 21:14:41 +10:00
Adam Pocock
ba91457183
[java] Adding addExternalInitializers and addInitializer to OrtSession.SessionOptions (#16198)
### Description
Adds support for adding external initializers or overriding initializers
to a session options from Java.

### Motivation and Context
We want to instantiate large models from Java without filesystem access.

cc @yuslepukhin
2023-07-05 12:51:59 -07:00
Adam Pocock
13cc6192e5
[java] Adding native library loader to SessionOptions and RunOptions static init (#16435)
### Description
Unlike most ORT classes `SessionOptions` and `RunOptions` don't trigger
native library loading of the JNI binding and ORT when the classes are
initialized (after class loading). This was initially because I thought
that loading an inner class would trigger the static initialization of
the outer class, but this is not true. So if you create a
`SessionOptions` instance before referencing `OrtEnvironment` then you
won't trigger library loading and you'll get an error saying it couldn't
link the native method that creates a `SessionOptions` object.

Note this doesn't prevent users from creating a `SessionOptions` and
modifying it before the `OrtEnvironment` is created, which can still
cause issues. It would be a breaking API change to modify the
`SessionOptions` constructor to take an environment, and it wouldn't
mirror the way it works in the C API which requires this by convention
rather than API design, but we can discuss making that modification
later.
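The class-initialization behaviour underlying the bug is easy to demonstrate: initializing a nested class does not initialize its enclosing class (self-contained sketch; all names are invented):

```java
final class InitDemo {
    static boolean outerInitialized = false;

    static class Outer {
        // Runs only when Outer itself is initialized.
        static { outerInitialized = true; }

        static class Inner {
            // Non-constant, so reading it triggers Inner's initialization.
            static int value = 42;
        }
    }
}
```

Reading `InitDemo.Outer.Inner.value` initializes `Inner` but leaves `Outer` untouched, which mirrors why referencing the inner options class did not run the outer class's native library loader.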

### Motivation and Context
Reduces the occurrence of mysterious Java library loading errors. Helps
with #16434.
2023-07-03 15:59:03 -07:00
Baiju Meswani
10ba1e270c
Minimal Build for On-Device Training (#16326)
🛠️ __Changes in this pull request:__

This pull request introduces two significant changes to the project:

- Changing on device training checkpoint format: The current
implementation stores the on device training checkpoint as a sequence of
tensors in multiple files inside a checkpoint folder, which can be
inefficient in terms of storage and performance. In this PR, I have
modified the checkpoint format to utilize the flatbuffer table to save
the checkpoint to a single file, providing a more compact and efficient
representation. The changes around this are twofold:
- Add the checkpoint flatbuffer schema that will generate the necessary
checkpoint source files.
- Update the checkpoint saving and loading functionality to use the new
format.

- Adding support for onnxruntime minimal build: To support scenarios
where binary size is a constraint, I made changes to ensure that the
training build can work well with the minimal build.

🔍 __Open Issues:__
- In order to extract the optimizer type, the existing implementation
re-loaded the onnx optimizer model and parsed it. This is no longer
possible, since the model format can either be onnx or ort. One idea is
to do the same for ort format optimizer model. This needs some
investigation.
- Changes to the offline tooling to generate ort format training
artifacts.
- End-to-end training example showcasing the use of the minimal training
build.
- Add support for export model for inferencing in a minimal build.
2023-06-22 12:27:23 -07:00
Adam Pocock
bca49d62a0
Fixing CoreML in Java (#16231)
### Description
The name of the flag we set when compiling the JNI binding to enable the CoreML EP changed at some point in the past. This PR fixes it by updating the flag in the JNI. I also added a quick smoke test for the CoreML provider to make sure it doesn't crash and can be enabled.

### Motivation and Context
All the EPs should work as expected in Java. Fixes #16230.
2023-06-07 12:24:57 -07:00
Adam Pocock
3c2a11f2f1
[java] Allow the creation of boolean tensors from ByteBuffer (#15556)
### Description
The tensor creation code now allows the creation of boolean tensors from
non-direct `ByteBuffer` instances. It previously only allowed them from
arrays and direct `ByteBuffer` instances and this fixes that
inconsistency. The boolean tensor test has been updated to cover all
three cases.

### Motivation and Context
Fixes #15509.
2023-06-05 09:58:50 -07:00
Xavier Dupré
e726151b5c
Introduce float 8 types (#14731)
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4M3FNUZ, FloatE5M2FNUZ
as described in ONNX PR https://github.com/onnx/onnx/pull/4805. It uses
the CUDA API to cast float/half to float8 if CUDA>=11.8, and a custom
implementation if CUDA<11.8.

* It implements, Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for the control flow operators (If,
Loop, Scan) and for Shape, Reshape, Identity.
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate` only valid for float 8 types. It is true by
default. In that case, any value out of range is converted into the
maximum float 8 value. If false, it is infinite.
* QuantizeLinear, DequantizeLinear now supports multiple scales on CUDA
(and ROCm by extension), scale = 1D tensor with one scale per channel
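For intuition, decoding a FloatE4M3FN byte works like any small IEEE-style format (a hedged sketch under the E4M3FN layout: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, NaN but no infinity; not ORT's implementation):

```java
final class Fp8E4M3 {
    // Decode one FloatE4M3FN byte into fp32.
    static float toFloat(int b) {
        int sign = (b >>> 7) & 1;
        int exp = (b >>> 3) & 0xF;
        int man = b & 0x7;
        if (exp == 0xF && man == 0x7) {
            return Float.NaN;  // 0x7F / 0xFF encode NaN; there is no infinity
        }
        float val = (exp == 0)
                ? (man / 8f) * 0x1p-6f                     // subnormal
                : (1f + man / 8f) * Math.scalb(1f, exp - 7); // normal
        return sign == 1 ? -val : val;
    }
}
```

The largest finite value under this layout is (1 + 7/8 - 1/8) adjusted for the NaN pattern, i.e. 0x7E decodes to 448; saturation maps out-of-range inputs to it.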

### Motivation and Context
Supports latest onnx version.

Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)

---------

Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-05-30 13:25:58 -07:00