### Description
Remove the C4090 warning suppression now that the Windows pipelines have adopted VS2022.
### Motivation and Context
### Description
This PR partially reverts changes introduced in
https://github.com/microsoft/onnxruntime/pull/15643
We make two APIs always return `std::string` in UTF-8.
We also move the entry points from OrtApiBase to OrtApi to make them
versioned.
### Motivation and Context
`GetVersionString` always returns x.y.z numbers that are not subject to
internationalization.
`GetBuildInfoString` can hold international chars, but UTF-8 is fine
for encoding those.
We prefix them with u8"" in case the compiler default charset is not
UTF-8.
Furthermore, creating platform dependent APIs is discouraged.
`ORTCHAR_T` is platform dependent and was created for paths only.
On non-Unix platforms it would still produce a `std::string` that can only
contain UTF-8.
The API was introduced after the latest release, and can still be
adjusted.
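Since the strings are plain UTF-8 bytes at the native boundary, any language binding can decode them with a standard UTF-8 decoder. A minimal stdlib-only Java sketch of that consuming side (not actual ORT binding code; the NUL-terminated layout is an assumption for illustration):

```java
import java.nio.charset.StandardCharsets;

/** Sketch (not ORT code): how a language binding would consume a UTF-8
 *  byte string returned by a native API such as GetBuildInfoString. */
public final class Utf8Decode {
    /** Decode a NUL-terminated UTF-8 byte array as a native API might return it. */
    public static String fromNativeUtf8(byte[] raw) {
        int len = 0;
        while (len < raw.length && raw[len] != 0) { len++; } // find the NUL terminator
        return new String(raw, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "1.15.0" plus a non-ASCII character, as UTF-8 bytes with a trailing NUL.
        byte[] raw = {'1', '.', '1', '5', '.', '0', (byte) 0xC3, (byte) 0xA9, 0};
        System.out.println(fromNativeUtf8(raw)); // "1.15.0é"
    }
}
```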
### Description
This PR creates Nuget and Android packages for Training.
### Motivation and Context
These packages are intended to be released in ORT 1.15 to enable
On-Device Training Scenarios.
## Packaging Story for Learning On The Edge Release
### Nuget Packages:
1. New Native package -> **Microsoft.ML.OnnxRuntime.Training** (Native
package will contain binaries for: win-x86, win-x64, win-arm, win-arm64,
linux-x64, linux-arm64, android)
2. C# bindings will be added to existing package ->
**Microsoft.ML.OnnxRuntime.Managed**
### Android Package published to Maven:
1. New package for training (full build) ->
**onnxruntime-training-android-full-aar**
### Python Package published to PyPi:
1. Python bindings and offline tooling will be added to the existing ort
training package -> **onnxruntime-training**
### Description
Updating the build option for enabling training in Java builds from
ENABLE_TRAINING to ENABLE_TRAINING_APIS.
In the native codebase ENABLE_TRAINING is used for enabling full
training, and ENABLE_TRAINING_APIS is used for creating the on-device
("learning on the edge") builds with training APIs. This change syncs
the naming convention across all the language bindings.
It was a bit confusing to see ENABLE_TRAINING when debugging the android
build failures for training. Making this change just to improve
readability of logs during debugging.
### Motivation and Context
### Description
Allows the creation of zero-length tensors via the buffer path (the
array path with zero-length arrays still throws, as the validation logic
that checks the array isn't ragged would require more intrusive revision), and
allows the `tensor.getValue()` method to return a Java multidimensional
array with a zero dimension. Also added a test for the creation and
extraction behaviour.
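The validation change can be pictured as computing the element count from the shape, where a zero dimension legitimately yields an empty tensor. A stdlib-only sketch of that idea (illustrative, not the actual ORT JNI validation):

```java
/** Sketch (not the actual ORT JNI code): validating a buffer-backed tensor
 *  shape, where a zero dimension legitimately yields zero elements. */
public final class ShapeCheck {
    public static long elementCount(long[] shape) {
        long count = 1;
        for (long d : shape) {
            if (d < 0) throw new IllegalArgumentException("negative dimension " + d);
            count *= d; // a zero dimension makes the whole tensor empty, which is fine
        }
        return count;
    }

    public static void validate(long[] shape, int bufferElements) {
        long expected = elementCount(shape);
        if (expected != bufferElements) {
            throw new IllegalArgumentException(
                "shape needs " + expected + " elements, buffer has " + bufferElements);
        }
    }

    public static void main(String[] args) {
        validate(new long[]{3, 0, 5}, 0); // zero-length tensor: accepted
        System.out.println(elementCount(new long[]{3, 0, 5})); // 0
    }
}
```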
### Motivation and Context
The Python interface can return zero length tensors (e.g. if object
detection doesn't find any objects), and before this PR in Java calling
`tensor.getValue()` throws an exception with a confusing error message.
Fixes #7270 & #15107.
- Update Gradle version used in most places from 6.8.3 to 8.0.1. Update Android Gradle Plugin version where applicable.
Not updated in this change: React Native Android projects (under `js/react_native/`). That can be done later along with updating the React Native projects.
- Add Gradle wrapper in `java/` to make it easier to consistently use a specific Gradle version.
### Description
I fixed some broken links in the C API documentation, then did a
quick pass over all the links I could find and fixed those as well.
### Motivation and Context
I got some 404s when exploring the documentation and wanted to fix them.
### Description
Update java/build.gradle to not use deprecated features that were
removed in gradle 8.0.
Also move gradle wrapper setup from a script into a step template.
### Motivation and Context
Fix builds which use hosted Mac agents and gradle.
Recently the system version of gradle got upgraded to 8.0. Even though
we use an older gradle wrapper version, java/build.gradle is still
processed with gradle 8.0 in the initial call to `gradle wrapper`.
* Added the OrtDnnlProviderOptions structure to expose configuration
options to the user
* The number of threads can be defined by the user with the -i flag on
the perftest
* Number of threads can also be configured via the OMP_NUM_THREADS
environment variable
* The number of threads defined in the OrtDnnlProviderOptions is
prioritized over the environment variable
### Description
Avoids thread oversubscription caused by OpenMP allocating the maximum
possible number of threads for the oneDNN EP. Added support for
OrtDnnlProviderOptions, which allows for more EP customization and a
user-defined number of threads.
### Motivation and Context
- Improves performance and allows the user to fine-tune the number of
threads
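The precedence between the provider option, the environment variable, and the default can be sketched as follows (names and defaults are illustrative, not the oneDNN EP's actual code):

```java
/** Sketch of the precedence described above: an explicit option value wins
 *  over the OMP_NUM_THREADS environment variable, which wins over a default.
 *  Names are illustrative, not the actual oneDNN EP code. */
public final class ThreadCountResolver {
    public static int resolve(Integer optionValue, String ompEnvValue, int defaultValue) {
        if (optionValue != null && optionValue > 0) {
            return optionValue; // the provider option has the highest priority
        }
        if (ompEnvValue != null) {
            try {
                int n = Integer.parseInt(ompEnvValue.trim());
                if (n > 0) return n; // fall back to OMP_NUM_THREADS
            } catch (NumberFormatException ignored) { }
        }
        return defaultValue; // e.g. the number of physical cores
    }

    public static void main(String[] args) {
        System.out.println(resolve(4, "8", 16));     // 4: option wins
        System.out.println(resolve(null, "8", 16));  // 8: env var wins
        System.out.println(resolve(null, null, 16)); // 16: default
    }
}
```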
Fix C6011, C6385, C6386 found by Visual Studio. Basically, I set the
maximum number of options for every EP to 128. To my knowledge, 128 is
big enough to support all EPs.
To support an arbitrary number of EP options, we probably need #13999 and
a `std::vector`-like struct in C.
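The fixed 128-entry approach can be pictured as a bounded option table whose writes are checked against the capacity, avoiding the out-of-bounds accesses that C6385/C6386 flag. A hypothetical stdlib-only sketch, not the actual C code:

```java
/** Sketch of the fixed-capacity idea: a bounded option table sized for 128
 *  entries, rejecting writes past the end instead of reading or writing out
 *  of bounds. Illustrative only. */
public final class BoundedOptions {
    public static final int MAX_OPTIONS = 128;
    private final String[] keys = new String[MAX_OPTIONS];
    private final String[] values = new String[MAX_OPTIONS];
    private int size = 0;

    public void add(String key, String value) {
        if (size >= MAX_OPTIONS) {
            throw new IllegalStateException("too many options, max is " + MAX_OPTIONS);
        }
        keys[size] = key;   // write is provably in bounds after the check
        values[size] = value;
        size++;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        BoundedOptions opts = new BoundedOptions();
        opts.add("device_id", "0");
        System.out.println(opts.size()); // 1
    }
}
```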
### Description
Add bindings for Android and iOS.
### Motivation and Context
Enable mobile app linking against ort-extensions library and registering the custom ops with ORT.
**Description**:
Adds support for creating and receiving sparse tensors in the ORT Java
API.
CSRC and COO tensors as inputs are tested, but there is no op which
accepts a block sparse tensor to test. COO tensors are tested as
outputs, but there is no op which emits a CSRC or block sparse tensor to
test.
**Motivation and Context**
- Request to expose ORT sparse tensor support in Java.
cc @yuslepukhin
### Description
Update protobuf-java to version 3.21.7. This change only impacts tests.
### Motivation and Context
The current version is affected by CVE-2022-3509.
Previously OnnxSequence would flatten out a list of tensors into a
single output array assuming they were all scalar values. This doesn't
accurately represent the semantics of an ONNX sequence, but was what the
semantics appeared to be years ago when I first wrote that class. This
PR changes it so that the `getValue` method on `OnnxSequence` unwraps
the sequence and returns `List<? extends OnnxValue>` allowing the user
to process the individual ONNX values separately. It's done this way
rather than returning a multidimensional array for a tensor and a Java
map for a map as multidimensional arrays are very inefficient in Java
and best practice when operating with an `OnnxTensor` in Java is to use a
`java.nio.ByteBuffer`. So allowing users to access each `OnnxTensor`
individually lets them control how the data is materialised on the
Java heap.
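The consuming pattern this enables can be sketched with stand-in types (illustrative only; the real API returns `List<? extends OnnxValue>` from `ai.onnxruntime`):

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the consuming pattern the unwrapped sequence enables: the caller
 *  receives a list of values and decides per element how to materialise the
 *  data. The types here are illustrative stand-ins, not ai.onnxruntime classes. */
public final class SequenceSketch {
    interface Value { }

    static final class Tensor implements Value {
        final float[] data;
        Tensor(float[] data) { this.data = data; }
    }

    /** Walk the sequence, touching only the tensor elements we care about. */
    static int countTensorElements(List<? extends Value> seq) {
        int total = 0;
        for (Value v : seq) {
            if (v instanceof Tensor) {
                total += ((Tensor) v).data.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Value> seq = Arrays.asList(
            new Tensor(new float[]{1f, 2f}),
            new Tensor(new float[]{3f}));
        System.out.println(countTensorElements(seq)); // 3
    }
}
```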
# Motivation
Currently, ORT minimal builds use kernel def hashes to map from nodes to
kernels to execute when loading the model. As the kernel def hashes must
be known ahead of time, this works for statically registered kernels.
This works well for the CPU EP.
For this approach to work, the kernel def hashes must also be known at
ORT format model conversion time, which means the EP with statically
registered kernels must also be enabled then. This is not an issue for
the always-available CPU EP. However, we do not want to require that any
EP which statically registers kernels is always available too.
Consequently, we explore another approach to match nodes to kernels that
does not rely on kernel def hashes. An added benefit of this is the
possibility of moving away from kernel def hashes completely, which
would eliminate the maintenance burden of keeping the hashes stable.
# Approach
In a full build, ORT uses some information from the ONNX op schema to
match a node to a kernel. We want to avoid including the ONNX op schema
in a minimal build to reduce binary size. Essentially, we take the
necessary information from the ONNX op schema and make it available in a
minimal build.
We decouple the ONNX op schema from the kernel matching logic. The
kernel matching logic instead relies on per-op information which can
either be obtained from the ONNX op schema or another source.
This per-op information must be available in a minimal build when there
are no ONNX op schemas. We put it in the ORT format model.
Existing uses of kernel def hashes to look up kernels are replaced
with the updated kernel matching logic. We no longer store
kernel def hashes in the ORT format model’s session state and runtime
optimization representations. We no longer keep the logic to
generate and ensure stability of kernel def hashes.
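The hash-free matching idea can be sketched as a registry keyed on the per-op information itself. The key fields and the highest-compatible-version lookup here are assumptions for illustration, not the actual ORT implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Sketch of matching by per-op information rather than a precomputed hash:
 *  the registry is keyed on (domain, op type, since-version), fields a kernel
 *  registration already carries. Illustrative, not the ORT code. */
public final class KernelMatcher {
    static final class OpKey {
        final String domain, opType; final int sinceVersion;
        OpKey(String domain, String opType, int sinceVersion) {
            this.domain = domain; this.opType = opType; this.sinceVersion = sinceVersion;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof OpKey)) return false;
            OpKey k = (OpKey) o;
            return sinceVersion == k.sinceVersion
                && domain.equals(k.domain) && opType.equals(k.opType);
        }
        @Override public int hashCode() { return Objects.hash(domain, opType, sinceVersion); }
    }

    private final Map<OpKey, String> registry = new HashMap<>();

    void register(String domain, String opType, int sinceVersion, String kernelName) {
        registry.put(new OpKey(domain, opType, sinceVersion), kernelName);
    }

    /** Find the kernel whose since-version is the highest one <= the node's opset. */
    String match(String domain, String opType, int opset) {
        for (int v = opset; v >= 1; v--) {
            String k = registry.get(new OpKey(domain, opType, v));
            if (k != null) return k;
        }
        return null;
    }

    public static void main(String[] args) {
        KernelMatcher m = new KernelMatcher();
        m.register("", "Relu", 6, "Relu_6_kernel");
        m.register("", "Relu", 13, "Relu_13_kernel");
        System.out.println(m.match("", "Relu", 14)); // Relu_13_kernel
    }
}
```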
Working on JNI refactor for OnnxTensor.
Simplifying the error handling logic in createTensor.
Collapsing casting branches and migrating to ONNX element type enum.
Disable cpplint for JNI C files.
**Description**: This fixes error handling in the JNI code in OnnxMap, OnnxSequence, OnnxRuntime, RunOptions. SessionOptions and OrtEnvironment are correct as is.
The bulk of the work will be in rewriting OnnxTensor, OnnxSparseTensor (after the merge of #10653) and OrtSession, along with the helper methods in OrtJniUtil. I plan to tackle those in separate PRs to reduce the amount of code to review.
**Motivation and Context**
- The current native interop code doesn't return control to Java immediately on throwing an exception from an ORT error code, which can cause incorrect interactions with native ORT, and issues with exception propagation on the Java side.
- Partial work towards solving #11451.
* Supply list/map capacity when initializing where possible
- This really depends on the use case, but in some cases the array/map resizing can be slightly costly, there is effectively no downside setting the initial capacity for a collection if we know for sure its final size
- Introduce an extra utility to help creating maps with expected capacity
* Move utility function to OrtUtil and drop MapUtil, also add Java doc to method
* Move test to the right class
Java side parts for configuring CUDA and TensorRT.
Adding tests for CUDA and TensorRT. Refactoring library loading logic as provider options need to have their shared library loaded before they can be constructed.
* Add android package build settings for full build
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* squashed commit for standalone tvm execution provider
* critical fix for correct python build with stvm ep
* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG
* updates and fixes
* update parsing of stvm provider options
* add support of external data for onnx model
* add conditional dump of subgraphs
* remove unused code
* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API
* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)
* add fp16
* add functionality for converting the model layout to NHWC if needed. The necessary parameter was added to the STVM provider options
* fix license text in header. fix log format
* small fixes
* fix issues from flake8
* remove model proto construction from GetCapability
* reserve memory for vector of DLTensors
* add simple tutorial for STVM EP
* STVM docs
* jroesch/tvm -> apache/tvm
* remove dead code, unnecessary logs and comments
* fix in readme
* improve tutorial notebook
* tvm update
* update STVM_EP.md
* fix default value
* update STVM_EP.md
* some TODOs for the future development
* shorten long lines
* add hyperlink to STVM_EP.md
* fix Linux CI error
* fix error in csharp test
Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* re-hipify all rocm EP sources
* fix all other files affected by re-hipify
* add cuda_provider_factory.h to amd_hipify.py
* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration
* Fix ReduceConsts template specialization introduced in #9101.
Fixes the error when building for ROCm 4.3.1:
error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)
* fix flake8 error in amd_hipify.py
* speed up hipify with concurrent.futures
* flake8 fix in amd_hipify.py
* First iteration of making cuda a shared provider.
Separated out shared OpKernel change, so doing this to merge with that change.
* More cuda shared library refactoring
* More cuda shared library refactoring
* More build options tested, converted the training ops over.
* Fix merge breaks
* Fix submodules
* Fix submodules
* Fix submodules
* Fix python
* Fix compile errors
* Duplicate symbol fix
* Test fix for ROCM provider
* Another ROCM test workaround
* ROCM Build Test
* ROCM build fix
* ROCM
* ROCM
* ROCM
* ROCM
* ROCM
* ROCM test
* Reduce header dependencies
* Remove redundant namespace
* Test fix for linux
* Fix linux build
* Fix Eigen build error
* Fix unused parameter warning
* Test link error
* Another linker test
* Linker test
* Linker test
* Another test
* Another build test
* Fix linux link error
* Build test
* Fix control flow ops to use common base class with core code
* Remove extra qualifiers
* Fix template syntax for linux
* Fix cuda memory leak
* Fix pybind
* Test disabling cast
* Cleanup
* Restore cuda in test
* Remove more header dependencies
* Test not adding cuda provider to session
* Make GetProviderInfo_CUDA throw
* No-op cuda provider creation
* Fix some setup issues
* Fix memory cleanup on unload
* Diagnostics
* Don't unload library
* Add diagnostics
* Fix deleting registry at right time.
* Test disabling profiler
* Fix merge break
* Revert profiler change
* Move unloading of shared providers into Environment
* Free more global allocations before library unloads
* Add more diagnostics
* Move unloading back to the OrtEnv as there are multiple Environments created during a session.
Remove some library dependencies for tests.
* Fix more cmake files
* ERROR -> WARNING
* Fix python shutdown
* Test not using dml in pipeline
* Change python version and disable dml
* Update python version
* Test adding unload method for shared providers
* Disable DLL test
* Python test
* Revert "Python test"
This reverts commit c7ec2cfe98.
* Revert "Disable DLL test"
This reverts commit e901cb93aa.
* Revert "Test adding unload method for shared providers"
This reverts commit c427b78799.
* Point to RyanWinGPU
* Revert python version
* Fix id_to_allocator_map
* Another python exit test
* Remove extra debug messages
Try a more clean python shutdown through DllMain
* Revert DllMain idea, it didn't work
* Merge conflicts
* Fix merge with master issues.
* Comments
* Undo edit to file
* Cleanup + new training ops
* Revert yml changes
* Fix another merge error
* ROCM fix
* ROCM fix v2
* Put back Linux hack, it is necessary
* Stupid fixes
* Fix submodule out of sync
* ROCM fix 3
* ROCM 4
* Test java fix
* Fix typos
* Java test on my VM
* Fix build error
* Spotless fix
* Leave temp file around to load properly
* Fix cleanup on exit
* Fix break
* Java comments
* Remove LongformerAttentionBase workaround
* Spotless fix
* Switch yml back to regular build pool
* Revert "Switch yml back to regular build pool"
This reverts commit be35fc2a5a.
* Code review feedback
* Fix errors due to merge
* Spotless fix
* Fix minimal build
* Java fix for non cuda case
* Java fix for CPU build
* Fix Nuphar?
* Fix nuphar 2
* Fix formatting
* Revert "Remove LongformerAttentionBase workaround"
This reverts commit 648679b370.
* Training fix
* Another java fix
* Formatting
* Formatting
* For orttraining
* Last orttraining build fix...
* training fixes
* Fix test provider error
* Missing pass command
* Removed in wrong spot
* Python typo
* Python typos
* Python crash on exit, possibly due to unloading of libraries.
* Remove test_execution_provider from training build
Only enable python atexit on windows
Remove assert on provider library exit
* Still can't unload providers in python, alas.
* Disable Nvtx temporarily
* MPI Kernels for Training
* MPI Kernels part 2
* Patch through INcclService
* Oops, wrong CMakeLists
* Missing namespace
* Fix missing ()
* Move INcclService::GetInstance around to link nicer
* Missing }
* Missing MPI libraries for Cuda
* Add extra GetType functions used by MPI
* Missing Nccl library
* Remove LOGS statements as a test
* Add in a couple more missing GetType methods
* Update comments
* Missed a logging reference in mpi_context.h
* Convert aten_op to shared (due to merge with master)
* Test moving DistributedRunContext instance into shared provider layer
(with purpose error to verify it's being built properly)
* Test passed, now with fix
* Missing static
* Oops, scope DistributedRunContext to just NCCL
* Merge related issues and code review feedback.
* Merge error
* Bump to rel-1.9.1 (#7684)
* Formatting
* Code review feedback for Java build on non Windows
* Remove cupti library dependency from core library
* Test Java pipeline fix
* Linux build fix
* Revert "Linux build fix"
This reverts commit a73a811516.
* Revert "Remove cupti library dependency from core library"
This reverts commit 6a889ee8bf.
* Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages
* Add cuda to Tensorrt nuget package
* onnxruntime_common still has a cuda header dependency
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
* test
* [gwang] make cmake compile work
* [gwang] enable building APKs
* some build update
* add simple sigmoid test android project and cmake
* add build.py
* refine and remove unused import lib
* address CR comments
* remove unnecessary files
* add README.md
* minor update
* remove
* minor change
* fix ci failure and minor update
* fix typo in project folder
* remove
* remove and minor update
* refine
* minor fix
* fix
* fix typo
* add gradle spotlessApply task to fix CI failure
* fix
* enable spotlessApply in build gradle
* revert some changes
* minor fix
* run spotless apply for format
* address CR comments and fix CI version and format
* refine
* Refine
* address comments
* refine
* refine
* modify
* reformat
* resolve version conflicts
* minor update
* minor update
* address comments
* minor update
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
Add providers for CoreML, ROCM, NNAPI, ArmNN
Adding the structs for OrtCUDAProviderOptions and OrtOpenVINOProviderOptions
Updating NNAPI flags.
Adding the new CoreML flag.
Adding hooks to the build system to tell Java about the new providers.
* Updates for Gradle 7.
* Adding support for OrtThreadingOptions into the Java API.
* Fixing a typo in the JNI code.
* Adding a test for the environment's thread pool.
* Fix cuda test, add comment to failure.
* Updating build.gradle
* Adding Java support for getAvailableProviders, addFreeDimensionOverrideByName, disablePerSessionThreads and getProfilingStartTimeNs.
* Fixing copyright years, running spotless and adding javadoc and an accessor to OrtProvider.
* Renaming OrtSession.getProfilingStartTimeInNs.
* Removing ngraph as it's been deprecated.
* Rearranging checks in onnxruntime_mlas.cmake to pick up Apple Silicon.
On an M1 Macbook Pro clang reports:
$ clang -dumpmachine
arm64-apple-darwin20.1.0
So the regex check needs to look for "arm64" first, as otherwise it
matches 32-bit ARM and you get NEON compilation failures.
* Adding Java side library loading support for Apple Silicon (and other aarch64 architectures).
* Adding Qgemm fix from @tracysh
* Fixes the java packaging on Windows.
* Missed a check in the java platform detector.
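The arm64-ordering problem above can be mirrored in a few lines: a prefix check for "arm" also accepts "arm64-apple-darwin20.1.0", so the arm64 test must come first (a Java sketch of the cmake regex logic, not the cmake code itself):

```java
/** Sketch of the ordering issue described above: if the 32-bit "arm" check
 *  runs first it also matches "arm64-apple-darwin20.1.0", so the arm64 check
 *  must come first. Illustrative, mirroring the cmake regex logic in Java. */
public final class ArchDetect {
    public static String classify(String triple) {
        if (triple.startsWith("arm64") || triple.startsWith("aarch64")) {
            return "ARM64"; // must be tested before the plain "arm" prefix
        }
        if (triple.startsWith("arm")) {
            return "ARM32"; // would wrongly claim arm64 if tested first
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(classify("arm64-apple-darwin20.1.0")); // ARM64
        System.out.println(classify("armv7-linux-gnueabihf"));    // ARM32
    }
}
```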
* Remove nGraph Execution Provider
Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice
**Deprecation Notice**
| Milestone | Date |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.
Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.
* Remove nGraph Licence info from ThirdPartyNotices.txt
* Use simple Test.Run() for tests without EP exclusions
To be consistent with rest of test code.
* Remove nGraph EP functions from Java code
* Add options for nnapi ep
* Add nnapi flags test
* add comments
* Add flag comments
* Make the flags bitset const
* Fix build break
* Add stub changes to java and c# api
* Fix java related build break
* Fix java build break
* Switch to bit flags instead of bitset
Fixed static IS_ANDROID detection
The final static IS_ANDROID was causing an "Unsupport arch: aarch64" Error, so IS_ANDROID was removed and replaced with an isAndroid() method.
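The fix amounts to replacing an eagerly initialised static flag with an on-demand method, so a detection failure surfaces as an ordinary exception rather than an Error during class initialisation. A sketch, where the vendor-property check is an assumption for illustration, not necessarily the exact check used:

```java
/** Sketch of replacing an eagerly initialised static flag with a method call.
 *  The vendor-property check is an assumption for illustration. */
public final class PlatformCheck {
    public static boolean isAndroid(String javaVendor) {
        // Evaluated on demand, so a failure can surface as a normal exception
        // rather than an Error thrown during static class initialisation.
        return "The Android Project".equals(javaVendor);
    }

    public static void main(String[] args) {
        System.out.println(isAndroid(System.getProperty("java.vendor", "")));
    }
}
```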
* [java] Fixing the buffer semantics.
* Renaming bufferCapacity to bufferRemaining.
* Adding a cast to char* so the pointer arithmetic works on Windows.
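The rename reflects standard `java.nio` semantics: `capacity()` ignores the position/limit window the caller set up, while `remaining()` is the number of elements actually readable. A small stdlib demonstration:

```java
import java.nio.ByteBuffer;

/** Sketch of why "remaining" is the right notion for a buffer-backed tensor:
 *  capacity ignores the position/limit window the caller set up, while
 *  remaining() counts readable elements from the current position. */
public final class BufferSemantics {
    public static int readableBytes(ByteBuffer buf) {
        return buf.remaining(); // limit - position, not capacity
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.position(4).limit(12); // caller exposes an 8-byte window
        System.out.println(buf.capacity());     // 16
        System.out.println(readableBytes(buf)); // 8
    }
}
```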
* Add SetLanguageProjection C Api and use it in four projections
* static cast enum languageprojection to uint32_t
* resolve comments
* fix typo and line added unintentionally
* revert unecessary change
* reorder c# api
* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
* update java API docs
* fix link
* rearrange
* update platforms, use table
* use javadoc.io
* craigacp tested it in java 14
* update link
* fix broken link
* fix testdata link
Modify gradle build so artifactID has _gpu for GPU builds.
Pass USE_CUDA flag on CUDA build
Adjust publishing pipelines to extract POM from a correct path.
Co-Authored-By: @Craigacp
1. Enlarge the read buffer size further, so that our code can run even faster. TODO: apply similar changes to Python and some other language bindings.
2. Add coreml_VGG16_ImageNet to the test exclusion set of x86_32. It is not a new model but previously we didn't run the test against x86_32.
* Add amd migraphx execution provider to onnx runtime
* rename MiGraphX to MIGraphX
* remove unnecessary changes in migraphx_execution_provider.cc
* add migraphx EP to tests
* add input requests of the batchnorm operator
* add support for the ONNX operator PRelu
* update migrapx dockerfile and removed one unused line
* sync submodules with master branch
* fixed a small bug
* fix various bugs to run msft real models correctly
* some code cleanup
* fix python file format
* fixed a code style issue
* add default provider for migraphx execution provider
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
Fixes an onnxruntime init failure due to a wrong path when reading native libraries. On a 64-bit OS X system, the arch name is detected as x86, which generates an invalid path for reading the native libraries.
Exception java.lang.UnsatisfiedLinkError: no onnxruntime in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at ai.onnxruntime.OnnxRuntime.load(OnnxRuntime.java:174)
at ai.onnxruntime.OnnxRuntime.init(OnnxRuntime.java:81)
at ai.onnxruntime.OrtEnvironment.<clinit>(OrtEnvironment.java:24)
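The fix boils down to normalising `os.arch` so a 64-bit JVM maps to the 64-bit resource directory. A stdlib-only sketch (the directory names are illustrative, not necessarily those used by the package):

```java
import java.util.Locale;

/** Sketch of the fix described above: normalise os.arch so a 64-bit JVM
 *  reporting "x86_64" or "amd64" maps to the x64 resource directory.
 *  Directory names are illustrative. */
public final class LibraryPath {
    public static String archDir(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("x86_64") || arch.equals("amd64")) {
            return "x64"; // previously mis-detected as x86 on 64-bit OS X
        }
        if (arch.equals("aarch64") || arch.equals("arm64")) {
            return "aarch64";
        }
        return arch;
    }

    public static void main(String[] args) {
        System.out.println(archDir("x86_64")); // x64
        System.out.println(archDir(System.getProperty("os.arch")));
    }
}
```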
* Initial update of readme
* Readme updates
* Review of consolidated README (#3930)
* Proposed updates for readme (#3953)
I found some of the information was duplicated within the doc, so I attempted to streamline it
* Fix links
* More updates
- fix build instructions
- nodejs doc reorganization
- roadmap update
- version fixes
* Update ORT Server build instructions
* More doc cleanup
* fix python dev notes name
* Update nodejs and some links
* sync eigen version back to master
* Minor fixes
* add nodejs to sample table of contents
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* address PR feedback
* address PR feedback
* nodejs build instruction
* Update Java instructions to include gradle
* Roadmap refresh
Reformat some data, fix link, minor rewording
* Clarify Visual C++ runtime req
Co-authored-by: Nat Kershaw (MSFT) <nakersha@microsoft.com>
Co-authored-by: Prasanth Pulavarthi <prasantp@microsoft.com>
Co-authored-by: manashgoswami <magoswam@microsoft.com>
* [java] - adding a cuda enabled test.
* Adding --build_java to the windows gpu ci pipeline.
* Removing a stray line from the unit tests that always enabled CUDA for Java.
Detect os and arch and move the artifacts to a new folder.
Remove unnecessary jars so we can focus on those we publish.
Add signing
Make signature simpler.
Fix indent.
Halt on 32-bit arch.
Credits: @Craigacp
* java - adding support for custom op libraries.
* Adding support for RunOptions and additional methods for SessionOptions and OrtSession.
As a result OrtEnvironment.LoggingLevel moved to be a top level enum
called OrtLoggingLevel.
* java - adding unit tests for RunOptions and SessionOptions.
* java - removing unused releaseNamesHandle method
* java - add test for custom op library.
* java - adding log verbosity methods, and tests for the same.
* java - fixes for custom op loading test on Windows.
* Cleanup after rebase on master.