- Update Gradle version used in most places from 6.8.3 to 8.0.1. Update Android Gradle Plugin version where applicable.
Not updated in this change: React Native Android projects (under `js/react_native/`). That can be done later along with updating the React Native projects.
- Add Gradle wrapper in `java/` to make it easier to consistently use a specific Gradle version.
### Description
Allow onnxruntime_test_all to run in the browser for WebAssembly builds (use
flag `--wasm_run_tests_in_browser`).
To output the logs from stdout correctly, this test needs to be built
with `--enable_wasm_threads`.
### Description
Merging extensions from Git submodule to cmake FetchContent
### Motivation and Context
---------
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Jian Chen <jchen351@MacBook-Pro.local>
### Description
Introduce collective ops into the onnxruntime inference build, including:
1) AllReduce and AllGather schemas in contrib ops, controlled by the USE_MPI
flag
2) AllReduce and AllGather kernels in the CUDA EP, controlled by the ORT_USE_NCCL
flag
### Motivation and Context
Enable the collective ops in onnxruntime inference build so we have the
ability to run distributed inference with multiple GPUs.
The original ncclAllReduce ops in the training build require quite complex
configurations, which are not suitable for the inference case, and they are already
broken, so we introduce a new implementation.
---------
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Only link MPI when either use_mpi or use_nccl is enabled
To fix the issue https://github.com/microsoft/onnxruntime/issues/14278.
After talking with @askhade, we think that if users want to enable NCCL/MPI but MPI
is not found, it should be a failure instead of a warning.
So this PR makes that change. As a result, to make CIs pass, we would need to
disable NCCL/MPI explicitly in the build command. This PR takes an
alternative approach: since NCCL and MPI are not used by
customers, disable NCCL by default if "--disable_nccl" is not specified, and
disable MPI by default if "--use_mpi" is not specified.
### Motivation and Context
Two modifications:
- After [TRT 8.5](https://github.com/microsoft/onnxruntime/pull/13867)
was merged, we can manually set a timeout and make the TRT EP run only a small
portion of the unit tests
(`onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS=ON`), because the
additional TRT kernel overhead introduced by TRT 8.5 increases
test time a lot. This PR modifies the checking condition so that
TensorRT CIs (which can enable the builder placeholder) still run most of the unit
tests.
- Exclude the TRT EP from [Resize Opset
18](https://github.com/microsoft/onnxruntime/pull/13890) unit tests,
since TensorRT 8.5 supports operators up to Opset 17.
### Description
Use dlsym/GetProcAddress to look up a custom ops registration function by
name and call it.
This works better on mobile platforms, where the custom ops library is
linked against and there isn't necessarily a filesystem from which a library
path can be loaded.
The alternative is to wire up passing in the address of the function, but
that has multiple complications that differ by platform.
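The lookup-by-name idea can be illustrated with a minimal sketch in Python using `ctypes`, rather than the C/C++ used inside ORT. It assumes a POSIX platform, where `ctypes.CDLL(None)` exposes the symbols already linked into the running process. `find_symbol` is a hypothetical helper, and libc's `abs` merely stands in for a custom ops registration function.

```python
import ctypes


def find_symbol(name, handle=None):
    """Return the function exported as `name`, or None if it is not found."""
    # On POSIX, CDLL(None) wraps dlopen(NULL): it searches symbols that are
    # already part of the running process, so no library path is needed.
    process = handle if handle is not None else ctypes.CDLL(None)
    try:
        return getattr(process, name)  # resolved through dlsym under the hood
    except AttributeError:
        return None  # symbol is not linked into this process


# `abs` is only a stand-in for a registration function (assumption).
registration = find_symbol("abs")
if registration is not None:
    result = registration(-7)
```

A missing symbol yields `None` instead of an exception, so a caller can fall back to loading a library by path when one is available.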
### Motivation and Context
Enable using ort and ort-ext packages on mobile platforms.
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
1. Renames all references of on device training to training apis. This
is to keep the naming general. Nothing really prevents us from using the
same apis on servers/non-edge devices.
2. Update ENABLE_TRAINING option: With this PR, when this option is
enabled, training apis and torch interop are also enabled.
3. Refactoring for onnxruntime_ENABLE_TRAINING_TORCH_INTEROP option:
- Removed user facing option
- Setting onnxruntime_ENABLE_TRAINING_TORCH_INTEROP to ON when
onnxruntime_ENABLE_TRAINING is ON as we always build with torch interop.
Once this PR is merged, selecting --enable_training will do a
"FULL Build" for training (with all the training entry points and
features).
Training entry points include:
1. ORTModule
2. Training APIs
Features include:
1. ATen Fallback
2. All training ops, including communication and collectives
3. Strided Tensor Support
4. Python Op (torch interop)
5. ONNXBlock (front end tools for training artifacts prep when using
training apis)
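The option handling described above can be sketched roughly as follows. This is only an illustration of the "one switch enables everything" rule, not the actual build script. The function name and the `training_apis` key are hypothetical; the two `onnxruntime_*` names are the options mentioned in this description.

```python
def resolve_training_options(enable_training: bool) -> dict:
    """Derive dependent training options from the single user-facing switch."""
    return {
        "onnxruntime_ENABLE_TRAINING": enable_training,
        # No longer user facing: it always follows ENABLE_TRAINING.
        "onnxruntime_ENABLE_TRAINING_TORCH_INTEROP": enable_training,
        # Hypothetical key standing in for "training apis are enabled".
        "training_apis": enable_training,
    }


full_build = resolve_training_options(True)
```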
### Motivation and Context
The intention is to simplify the options for building training enabled builds.
This is part of the larger work item to create a dedicated build for
learning on the edge scenarios with just training apis enabled.
Implement CloudEP for hybrid inferencing.
The PR introduces no new API; customers can configure session and
run options to do inferencing with an Azure [triton
endpoint](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint).
A sample configuration in Python looks like:
```
import onnxruntime as ort

sess_opt = ort.SessionOptions()
sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton')
sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com')
sess_opt.add_session_config_entry('cloud.model_name', 'detection2')
sess_opt.add_session_config_entry('cloud.model_version', '7')  # optional, default 1
sess_opt.add_session_config_entry('cloud.verbose', '1')  # optional, default '0', meaning no verbose
...
run_opt = ort.RunOptions()
run_opt.add_run_config_entry('use_cloud', '1')  # 0 for local inferencing, 1 for cloud endpoint.
run_opt.add_run_config_entry('cloud.auth_key', '...')
...
sess.run(None, {'input': input_}, run_opt)
```
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description
Update CUDA ArgMin/ArgMax op kernels to have end version 11 since opset
12+ is not supported yet.
With the way these kernels are currently registered, the documentation
shows support for opset 11+. This is not accurate.
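To see why the end version matters, here is a minimal sketch of how an open-ended versus a bounded kernel registration matches opsets. `kernel_matches` is a hypothetical helper, not ORT's registry code, and the start version of 11 is an assumption taken from the "11+" shown in the documentation.

```python
from typing import Optional


def kernel_matches(opset: int, since: int, end: Optional[int] = None) -> bool:
    """True if a kernel registered for [since, end] covers `opset`.

    end=None models an open-ended registration ("since+").
    """
    if opset < since:
        return False
    return end is None or opset <= end


# Open-ended registration: looks as if opset 13 were supported.
open_ended = kernel_matches(13, since=11)
# Registration with end version 11: opset 13 is correctly rejected.
bounded = kernel_matches(13, since=11, end=11)
```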
### Motivation and Context
Fix #13781
### Description
Add the ability to run graph
### Motivation and Context
A brief description is as follows:
1) If the whole graph is supported, it is processed by the graph
engine directly.
2) If the whole graph is not supported, it is divided
into subgraphs and single operators. The subgraphs run on the graph
engine, and the single operators fall back to the traditional mode.
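The two cases can be sketched as below. This is a simplified illustration, not the EP's actual partitioning code: it assumes the graph is a linear, topologically ordered list of op names, and `partition`, the op names, and the `supported` set are all hypothetical.

```python
from itertools import groupby


def partition(ops, supported):
    """Split an ordered op list into graph-engine subgraphs and fallback ops."""
    # Case 1: everything is supported, so the whole graph goes to the engine.
    if all(op in supported for op in ops):
        return [("graph_engine", list(ops))]
    # Case 2: consecutive supported ops form a subgraph; every unsupported
    # op falls back to the traditional single-operator mode on its own.
    parts = []
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        run = list(run)
        if is_supported:
            parts.append(("graph_engine", run))
        else:
            parts.extend(("fallback", [op]) for op in run)
    return parts


plan = partition(["Conv", "Relu", "Foo", "Add"], {"Conv", "Relu", "Add"})
```

A real graph has branches, so grouping consecutive ops is only a stand-in for proper subgraph discovery.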
### Description
For compilation in a container, the ADO Cache task doesn't work directly.
The workaround is to mount the cache directory into the container and let
CCache in the container read/write the cache data.
In short, we just leverage the ADO API to download/upload cache data.
Post-job steps run in stack order, so the PostBuildCleanUp task should
be defined first.
That way, PostBuildCleanUp is executed last.
Otherwise, the Cache task would fail to upload the cache because the agent directory
has already been cleaned.
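The ordering argument can be illustrated with a tiny simulation. This is not pipeline code: `post_job_order` is a hypothetical helper that only models the stack (last defined, first run) behaviour described above.

```python
def post_job_order(tasks_in_definition_order):
    """Post-job steps run like a stack: the last defined task runs first."""
    return list(reversed(tasks_in_definition_order))


# PostBuildCleanUp is defined first, the Cache task after it.
order = post_job_order(["PostBuildCleanUp", "Cache"])
print(order)  # ['Cache', 'PostBuildCleanUp']
```

With this definition order the cache upload happens while the agent directory still exists, and the cleanup runs last.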
Integrate TensorRT 8.5
- Update TensorRT EP to support TensorRT 8.5
- Update relevant CI pipelines
- Disable known non-supported ops for TensorRT
- Make timeout configurable.
We observe more than [20
hours](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=256729&view=logs&j=71ce39d8-054f-502a-dcd0-e89fa9931f40)
of running unit tests with TensorRT 8.5 in package pipelines. Because we
can't use the placeholder to significantly reduce testing time (the c-api
application test will deadlock) in package pipelines, we only run
subsets of model tests and unit tests that are related to TRT (add a new
build flag `--test_all_timeout` and set it to 72000 seconds in package
pipelines). Note that we still run all the tests in TensorRT CI
pipelines to have full test coverage.
- Include https://github.com/microsoft/onnxruntime/pull/13918 to fix
the onnx-tensorrt compile error.
Co-authored-by: George Wu <jywu@microsoft.com>
### Description
This PR enables building nuget packages locally for on device training
using --build_nuget arg.
This PR also enables the C# bindings by default in the managed package.
If a user triggers any training apis when the native binary is not built
for training, an exception with message "Training is disabled in the
current build. Please build ONNXRuntime from source with the build flags
enable_training and enable_training_on_device. " is thrown.
Build command for creating nuget packages for on device training:
`build.bat --enable_training --enable_training_on_device --build_nuget`
2 nuget packages are built:
1. Microsoft.ML.OnnxRuntime.Managed
2. Microsoft.ML.OnnxRuntime.Training OR
Microsoft.ML.OnnxRuntime.Training.Gpu
### Motivation and Context
## Description
1. Convert some git submodules to cmake external projects
2. Update nsync from
[1.23.0](https://github.com/google/nsync/releases/tag/1.23.0) to
[1.25.0](https://github.com/google/nsync/releases/tag/1.25.0)
3. Update re2 from 2021-06-01 to 2022-06-01
4. Update wil from an old commit to 1.0.220914.1 tag
5. Update gtest to a newer commit so that it can optionally leverage
absl/re2 for parsing command line flags.
The following git submodules are deleted:
1. FP16
2. safeint
3. XNNPACK
4. cxxopts
5. dlpack
6. flatbuffers
7. googlebenchmark
8. json
9. mimalloc
10. mp11
11. pthreadpool
More will come.
## Motivation and Context
There are 3 ways of integrating 3rd party C/C++ libraries into ONNX
Runtime:
1. Install them to a system location, then use cmake's find_package
module to locate them.
2. Use git submodules
3. Use cmake's external projects (externalproject_add).
At first when this project was just started, we considered both option 2
and option 3. We preferred option 2 because:
1. It's easier to handle authentication. At first this project was not
open source, and it had some other non-public dependencies. If we use
git submodule, ADO will handle authentication smoothly. Otherwise we
need to manually pass tokens around and be very careful on not exposing
them in build logs.
2. At that time, cmake fetched dependencies after "cmake" finished
generating vcprojects/makefiles. So it was very difficult to make cflags
consistent. Since cmake 3.11, it has a new command: FetchContent, which
fetches dependencies when it generates vcprojects/makefiles just before
add_subdirectories, so the parent project's variables/settings can be
easily passed to the child projects.
And when the project went on, we had some new concerns:
1. As we started to have more and more EPs and build configs, the number
of submodules grew quickly. For most developers, most ORT submodules are
not relevant to them. They shouldn't need to download all of them.
2. It is impossible to let two different build configs use two different
versions of the same dependency. For example, right now we have protobuf
3.18.3 in the submodules. Then every EP must use the same version.
Whenever we have a need to upgrade protobuf, we need to coordinate
across the whole team and many external developers. I can't manage it
anymore.
3. Some projects want to manage the dependencies in a different way,
either because of their preference or because of compliance
requirements. For example, some Microsoft teams want to use vcpkg, but
we don't want to force every user of onnxruntime using vcpkg.
4. Someone wants to dynamically link to protobuf, but our build script
only does static linking.
5. It is hard to handle security vulnerabilities. For example, whenever
protobuf has a security patch, we have a lot of things to do. But if we
allowed people to build ORT with a different version of protobuf without
changing ORT's source code, customers who build ORT from source would
be able to act on such things more quickly. They would not need to
wait for ORT to have a patch release.
6. Every time we do a release, github will also publish a source file
zip file and a source file tarball for us. But they are not usable,
because they miss submodules.
### New features
After this change, users will be able to:
1. Build the dependencies in the way they want, then install them
somewhere (for example, /usr or a temp folder).
2. Or download the dependencies by using cmake commands from these
dependencies' official websites.
3. Similar to the above, but use your private mirrors to mitigate supply
chain risks.
4. Use different versions of the dependencies, as long as our source
code is compatible with them. For example, you can't use
protobuf 3.20.x, as that needs code changes in ONNX Runtime.
5. Only download the things the current build needs.
6. Avoid building external dependencies again and again in every build.
### Breaking change
The onnxruntime_PREFER_SYSTEM_LIB build option is removed; you can think of it as
defaulting to ON from now on. If you don't like the new behavior, you can set FETCHCONTENT_TRY_FIND_PACKAGE_MODE to NEVER.
Besides, for those who relied on the onnxruntime_PREFER_SYSTEM_LIB build
option, please be aware that this PR changes find_package calls from
Module mode to Config mode. For example, in the past, if you had
installed protobuf via apt-get from Ubuntu 20.04's official repo,
find_package could find it and use it. After this PR, it won't. This
is because the protobuf version provided by Ubuntu 20.04 is too old to
support "config mode". This can be resolved by getting a newer version
of protobuf from somewhere.
Fix for https://github.com/microsoft/onnxruntime/issues/13383,
https://github.com/microsoft/onnxruntime/issues/13408
Currently ort-web doesn't catch exceptions because turning on exception
catching increases the binary size by 3MB (~30%).
But ort can throw (e.g. onnx errors or ORT_ENFORCE), and then there is no
usable error message.
Turning on exception catching just for the released top-level API file will
fix the error messages with a minimal increase in binary size.
1. Remove the cmake option onnxruntime_DEV_MODE and replace it with
"--compile-no-warning-as-error"
2. Suppress some GSL warnings because now we treat nvcc diag warnings as
errors
### Description
Fix the documentation generation CI. It's not currently updating the docs because
we're skipping the tests, and the test step is the invocation of build.py that would
have generated the documentation.
Set up a specific task to generate documentation for greater clarity.
### Motivation and Context
Operator kernel documentation is not getting updated and is now out of
date.
### Description
Currently, hipify happens before cmake is configured, and then cmake globs
the directories. This gets rid of that customized Python threading
logic and opts for the build system itself to generate the files.
This also supersedes the half-baked branch
[sukha/hipify-with-cmake](https://github.com/microsoft/onnxruntime/tree/sukha/hipify-with-cmake)
### Description
ROCm developers always need to build the onnxruntime *.whl with
`--enable_rocm_profiling`.
Add a ROCm dev Python package pipeline which produces a *.whl with build
arg `--enable_rocm_profiling`.
The dev *.whl needs to be uploaded to Azure storage and can be downloaded from
https://download.onnxruntime.ai/onnxruntime_nightly_rocm53.profiling.html
### Motivation and Context
**Description**: This PR adds Ascend CANN execution provider support.
**Motivation and Context**
- Why is this change required? What problem does it solve?
As shown in the issue, CANN is the API layer for Ascend
processors. Adding the CANN EP allows users to run onnx models on Ascend hardware
via onnxruntime.
The detailed changes:
1. Added CANN EP framework.
2. Added the basic operators to support ResNet and VGG models.
3. Added C/C++ and Python API support.
- If it fixes an open issue, please link to the issue here.
https://github.com/microsoft/onnxruntime/issues/11477
Author:
lijiawei <lijiawei19@huawei.com>
wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: FFrog <ljw1101.vip@gmail.com>
These changes align the OV 2022.2 release with ORT. Changes:
CPU FP16 support, dGPU support, RHEL Dockerfile, Ubuntu 20 Dockerfile
**Motivation and Context**
- This change is required to ensure ORT-OpenVINO Execution Provider is
aligned with latest changes.
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: shamaksx <shamax.kshirsagar@intel.com>
Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com>
Co-authored-by: pratiksha <mohsinx.mohammad@intel.com>
Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: nmaajidk <n.maajid.khan@intel.com>
Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com>
Co-authored-by: intel <intel@iotgecsp-nuc04.iind.intel.com>
1. add node test data to current model tests
2. support opset version to filter tests.
3. remove old filter based on onnx version. To avoid confusion, ONLY
support opset version filter in onnxruntime_test_all
4. support reading onnx test data from an absolute path on Windows.
* Fix bug in pybind get_all_operator_schema due to premature reference dropping
* Add updated operator kernels markdown table
* Update build.py to include documentation generation for DML operators too
* Update GPU pipeline to include DML in the build so operators can be generated.
* Use a separate pipeline stage, feedback from Changming and Scott
* Appease annoying Python linter
* Add onnxruntime_BUILD_UNIT_TESTS=OFF and remove stale --use_dml in cuda stage
* drop nuphar code and configs
* refactor test case
* format python
* remove nuphar from training test
* remove commented nuphar logics
* restore llvm setting
* drop nuphar ci
* fix compile err
* fix compile err
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* upgrade emsdk to 3.1.19
* fix build break
* ignore '-Wunused-but-set-variable' in eigen
* add malloc and free in exported functions
* EXPORTED_FUNCTIONS
* Add first pass of rocm kernel profiler
* Clean up rocm_profiler. Format args. Demangle kernel names.
Add Api EventRecords
* Remove debug output
* Temporarily disable profiling unit test 'api record check' for cupti
* Fix compile error for non-gpu builds
* Use common file for demangle and pid/tid. Namespace ThreadUtil. Fix gpu buffer clearing.
* Merge demangle into profiler_common
* Merge demangle into profiler_common part 2
* Style cleanup
* Resolve linking issues via ProviderHost interface
* Demangle cuda kernel names
* Clean up comments
* Fix formatting
* Fix anal retentive formatting
* Make ORT as Pytorch JIT backend
LORT likely doesn't work with aten fallback so we only test LORT in its own CI.
* Revert changes to enable external CUDA allocator. Will add it later.
Revert "Revert changes to enable external CUDA allocator. Will add it later."
This reverts commit d5487f2e193014c805505afae8fb577c53667658.
Fix external allocator
* Relax tolerance and remove commented code
* Print more information in CI
* Fix pointer
* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.
* Use Pytorch master branch as all PRs are merged
Fix
* Refine based on cpplint feedback
* Revert changes to allow custom CUDA allocator in public APIs
* Use torch.testing.assert_close
* Use unittest framework
* Switch docker repo
* Rename *.cpp to *.cc
* Address comments
* Add comment
* Use same pipeline file for eager and lort pipelines
* Address comments
* Add yaml comment
* Fix cmake files
* Address comments
* Rename flags, remove printing code, remove dead comment
* Add build option to link prebuilt TensorRT parser
* Test without the build option to link prebuilt TRTParser
* Minor: update name of build option
* Minor: update name of build option
Loosen the following test timeouts:
1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min
2. Node.js binding default per-case timeout: 30 sec -> 90 sec
* refactor test for model with undefined shapes
* add test for TVMso EP
* update build script for TVM EP tests
* fix pylint
* disable test for Windows
* fix black
* fix python format
* fix pylint
* fix python format
* replace Path.resolve with os.path.join
* fix python path issue
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Fix bug where the onnxruntime_USE_NCCL flag would default to ON, causing ORT to not build properly. New functionality: the flag is ON when training is enabled and NCCL is not disabled. The flag is OFF otherwise.
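The new default can be written as a single boolean expression. A minimal sketch follows; the function and parameter names are hypothetical stand-ins for the build script's arguments.

```python
def use_nccl(enable_training: bool, disable_nccl: bool) -> bool:
    """Value of onnxruntime_USE_NCCL under the rule described above."""
    # ON only when training is enabled and NCCL was not explicitly disabled.
    return enable_training and not disable_nccl


inference_default = use_nccl(enable_training=False, disable_nccl=False)
training_default = use_nccl(enable_training=True, disable_nccl=False)
```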
* add description of build ORT+TVM EP on Windows
* fix cmake error related to symlink creation on Windows
* add llvm config path to build flags for correct build on Windows
* update TVM_EP.md for llvm_config build arg
* fix warnings skipping during build on Windows
* fix using string or wstring for model path to correct build on Windows (MSVC error)
* fix error in custom logger for correct build on Windows
* implement glob algorithm for Windows
* additional build fixes
* update TVM with export of VM symbols for dll
* description of nasm issue and workaround
* update TVM with export of Executable from VM symbols for dll
* description of installation of ipp-crypto dependencies on Windows
* cmake key for ipp-crypto build
* fix wstring for TVMso EP
* fix ipp-crypto build
* cmake key onnxruntime_TVM_USE_HASH switches off the full hash functionality, not just specific methods
* fix absolute path to compiled lib
* update TVM_EP.md, fix lint warnings
* update TVM_EP.md
* small fixes after review
* switch on handshake functionality for Linux workflow
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>