### Description
Remove the onnxruntime-extensions submodule since it now was used via
cmake FetchContent
### Motivation and Context
The submodule relies on an outdated version of the extensions, and the
build instructions should be updated to eliminate any confusion.
### Description
This PR fixes build break for WebAssembly introduced in
6986981482
(435ad2b1d8).
This change updates onnx.patch in onnxruntime repo. the corresponding PR
in onnx repo is: https://github.com/onnx/onnx/pull/5495.
It may takes a while for the next onnx version bump.
### Description
Some pipelines are failing. It is because PR #16325 set ONNX version to
`rel-1.14.1` . It is a branch name, not a commit or tag name. It means
whenever the branch got a new commit, we will auto pick it and use it.
### Description
This change upgrade emsdk to 3.1.44.
Because backend is upgraded to LLVM 16, so need to fix a lot of build
failures caused by "-Wshorten-64-to-32".
most of the build failures comes from generated `onnx.pb.h`, and this
can be fixed by including "core/graph/onnx_protobuf.h", which detects
and ignore shorten-64-to-32 warnings.
This is in preparation for planned ROCm 6.0 changes that are not
backward compatible. However, the adjustments made by this PR to the
current onnxruntime cmake files will work with ROCm 5.x and 6.x.
### Description
1. Add "--windows_sdk_version" argument to build.py
2. Fix Windows Static Analysis build pipeline. It is failing because it
picks up a different Windows SDK version after a build machine image
update. If we can explicitly specify Windows SDK version, we can avoid
such things happening again.
3. Remove --enable_training from Windows Static Analysis build pipeline
because PR #16993 makes it incompatible with "no_rtti".
AB#18315
OpenVINO EP ORT 5.1 Branch
Changes for the new API to take in OpenVINO Provider Options
and compatibility with OV 2023.1
### Motivation and Context
The change is required for the new API to take in OpenVINO Provider
Options
and make it seamless.
---------
Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: saurabhintel0 <saurabh1.kale@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
### Description
<!-- Describe your changes. -->
Adjust nativs to display tagged strings.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Hard to debug without seeing names.
### Motivation and Context
When we handle PyTorch models' inputs in different places (ORTModule or
others), it's common for us to flatten a structured data into a 1-D
tensor list (required by lib for example torch.onnx.export,
torch.autograd.Function.forward or ORT inference session), then do
subsequent work, then unflatten back to original hierarchy as returned
values.
DeepStage3 hooks support work also need such a lib to do similar things,
so I was proposing to extract this pair of APIs in training/utils/,
which can be more used more generally. Also a comprehensive set of test
data are used for testing unflatten/flatten in unit tests.
Let me know if you have any other suggestions.
### Refactor schema extraction and output unflattening
Move `_extract_schema` and `unflatten_user_output` in
`orttraining/orttraining/python/training/ortmodule/_io.py` . to
`extract_data_and_schema` and `unflatten_data_using_schema` in
`orttraining/orttraining/python/training/utils/torch_io_helper.py` as
shared libs, which can be used later by other features (deepspeed stage
3 hook rewrite).
While there are still a few duplicated logic handling flatten with
different task by recursively loop the data struct, will change them
step by step in case of heavy review efforts.
Last week I fixed error #16484 found when trying to build onnxruntime
with the icpx compiler. Another thing I found out is that icpx uses
-ffast-math flag by default. You can check it by running the compiler
with -v flag like following:
```bash
# Setup the environment
. /opt/intel/oneapi/setvars.sh
# Compile any file to see all the implicit flags
icpx -v main.cpp
```
This leads to a bunch of warnings during the build like:
```bash
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/cpu/tensor/upsample_op_test.cc:5:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/provider_test_utils.h:6:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/checkers.h:10:
In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68:
In file included from /mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/Core:172:
/mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/src/Core/MathFunctions.h:1019:12: warning: comparison with NaN always evaluates to false in fast floating point modes [-Wtautological-constant-compare]
return isnan EIGEN_NOT_A_MACRO (x);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
```
And some tests are failing as well, usually with infinities involved. To
list a few:
```bash
# ...
1: [ FAILED ] IsInfTest.test_isinf_float
1: [ FAILED ] IsInfTest.test_isinf_double
1: [ FAILED ] IsInfTest.test_isinf_positive_float
1: [ FAILED ] IsInfTest.test_isinf_positive_double
1: [ FAILED ] IsInfTest.test_isinf_negative_float
1: [ FAILED ] IsInfTest.test_isinf_negative_double
1: [ FAILED ] IsNaNOpTest.IsNaNFloat
1: [ FAILED ] IsNaNOpTest.IsNaNDouble
# ...
```
This PR adds a quick global check for the IntelLLVM compiler, as in the
way its name is reported by CMake and then, depending on the compiler
driver, sets either MSVC-like or GCC-like switch to disable fast-maths.
Probably a bit cleaner solution would be to use
```target_compile_options(${TARGET} PRIVATE MEOW)``` instead of a
global-wide ```set(CMAKE_CXX_FLAGS MEOW)```, but then we'd be required
to add it to all the individual targets and execution providers and this
will lead to a lot of code duplication.
In additions to `onnxruntime_test_all`, `onnxruntime_shared_lib_test`
and `onnxruntime_customopregistration_test` should
also add "-Wno-deprecated-declarations" flag to ignore compiler warning
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
### Description
Added Gelu operator to JSEP
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add some safety check for conv op.
It is to validate if the attributes coming from a conv op are in a valid
range. (shouldn't be too large or too small).
### Description
Changes allow downloading prebuilt protoc compiler when building
WebAssebly version on mac systems.
Otherwise it tries to build a js/wasm version of protoc and throws an
error while executing it: "protoc.js permission denied"
### Motivation and Context
I need to switch between my main working computer and a PC to make
changes to WebAssebly build. Would like not to do that anymore.
Otherwise, an unsupported version of gtest/gmock will be found at
/opt/conda/include for ROCm builds. Though this issue was initially
found for ROCm builds, the issue is generic. onnxruntime requires a
specific version of googletest and should not rely on locating
googletest using find_package.
The ROCm error was:
```
In file included from /opt/conda/include/gmock/gmock-spec-builders.h:75,
from /opt/conda/include/gmock/gmock-generated-function-mockers.h:47,
from /opt/conda/include/gmock/gmock-function-mocker.h:39,
from /opt/conda/include/gmock/gmock.h:61,
from /stage/onnxruntime/onnxruntime/test/util/test_utils.cc:17:
/opt/conda/include/gmock/gmock-matchers.h: In instantiation of ‘bool testing::internal::PointwiseMatcher<TupleMatcher, RhsContainer>::Impl<LhsContainer>::
MatchAndExplain(LhsContainer, testing::MatchResultListener*) const [with LhsContainer = const gsl::span<const float>&; TupleMatcher = testing::internal::
FloatingEq2Matcher<float>; RhsContainer = gsl::span<const float>]’:
/opt/conda/include/gmock/gmock-matchers.h:2303:10: required from here
/opt/conda/include/gmock/gmock-matchers.h:2312:48: error: no type named ‘const_iterator’ in ‘testing::internal::PointwiseMatcher<testing::internal::
FloatingEq2Matcher<float>, gsl::span<const float> >::Impl<const gsl::span<const float>&>::LhsStlContainer’ {aka ‘class gsl::span<const float>’}
```
- Fix link errors by including the needed onnxruntime-extensions libraries in the static framework.
- Add Objective-C API to register custom ops from embedded onnxruntime-extensions.
Caveat: Not all onnxruntime-extensions build options are working yet. E.g., building with the onnxruntime-extensions OpenCV dependency does not work.
### Description
Introduce `Float16/BFloat16` support for C# and C++ APIs.
User should be able to perform conversions from `float` to/from
`Float16/BFloat16`, compare values and tests for `NaN, Inifnity, and
whether the number is denormalized.`
### Motivation and Context
User filed issues such as:
https://github.com/microsoft/onnxruntime/issues/14303
### Description
1. Replacing AMX intrinsics with machine code macros in QGEMM kernel.
2. Removing AMX build flags for GCC in cmake file.
3. Fixing the link time optimization (LTO) issue introduced with asm
.include of an assembly file.
I have moved the AMX instruction macro definitions from
QgemmU8S8KernelAmxCommon.S to the amx_common.h to fix the LTO issue.
Note that I am also pushing the macros defined in
QgemmU8S8KernelAmxCommon.S for future reference.
A special thanks to @laxmansole who helped in the development of the
instruction macro definitions for AMX intrinsics and fixing the LTO
issue.
### Motivation and Context
The additional AMX flag in cmake adds an extra layer of dependency on
GCC version to use the feature.These changes should allow the usage of
the AMX feature with just the CPU ID check.
### Description
- Update existing rocBLAS get_solutions API using
`*_get_solutions_by_type` (supported from ROCm5.6); remove the original
nested TunableOp logic.
- Update kernel_explorer.
### Description
<!-- Describe your changes. -->
Add ort_value.h to session_options.h so OrtValue is defined.
Update a unit test binary to add required include paths. Adding
ort_value.h pulls in more data type headers.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
#16193
- Fix some warnings from Xcode build (`-Wshorten-64-to-32`).
- Enable `-Wshorten-64-to-32` warning if available. Currently it's not fully enabled for `onnxruntime_test_all` and `onnxruntime_providers_xnnpack` yet.
- Some clean up in build.py including setting CMake generator more consistently.
### Description
<!-- Describe your changes. -->
Split out the more basic changes from #15552 for easier review.
Re-organize to clarify the structure
- Separate out generic base functionality from ORT specific components
- pass in handlers for internal ORT ops to Optimize
- Split out layout transformation from transpose optimization
- Separate out level 1 transpose optimizer
- Cleanup some naming to try and clarify things like an optimizer vs.
general optimization code
Most of the changes are from this movement of code.
Two implementation changes:
- the extended handlers are queried first in GetHandler
- allows the extended handlers to override the default behaviour for an
ONNX operator
- simplify the Optimize function to remove OptimizerMode.
- `can_modify_node` is used instead of `mode` and
`ignore_assigned_nodes` and a long description of the current usage is
added. I don't _think_ that changes the current behavior and hopefully
clarifies what happens and when, and makes the base transpose optimizer
implementation more generic.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Create a cleaner separation to support adding EP specific logic next to
cleanly handle where an EP has additional layout sensitive behaviour
required (e.g. it's Resize implementation only handles one layout).
### Description
Remove AllocatorManager class
### Motivation and Context
After the refactor PR #15833 is in, AllocatorManager class is not
referenced anymore.
### Description
- Implement Objective-C binding for `ORTTrainingSession`
- Add `ORTUtils` utility class to handle conversion between C++ and
Objective-C types
- Add test case for saving checkpoint
- Add unit test cases for `ORTTrainingSession`
### Motivation and Context
This PR is part of implementing Objective-C bindings for training API.
It implements objective-c binding for training session. The objective-C
API closely resembles the C++ API.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
<!-- Describe your changes. -->
Update file list to adjust for recent changes to test infra.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
🛠️ __Changes in this pull request:__
This pull request introduces two significant changes to the project:
- Changing on device training checkpoint format: The current
implementation stores the on device training checkpoint as a sequence of
tensors in multiple files inside a checkpoint folder, which can be
inefficient in terms of storage and performance. In this PR, I have
modified the checkpoint format to utilize the flatbuffer table to save
the checkpoint to a single file, providing a more compact and efficient
representation. The changes around this are twofold:
- Add the checkpoint flatbuffer schema that will generate the necessary
checkpoint source files.
- Update the checkpoint saving and loading functionality to use the new
format.
- Adding support for onnxruntime minimal build: To support scenarios
where binary size is a constraint, I made changes to ensure that the
training build can work well with the minimal build.
🔍 __Open Issues:__
- In order to extract the optimizer type, the existing implementation
re-loaded the onnx optimizer model and parsed it. This is no longer
possible, since the model format can either be onnx or ort. One idea is
to do the same for ort format optimizer model. This needs some
investigation.
- Changes to the offline tooling to generate ort format training
artifacts.
- End-to-end training example showcasing the use of the minimal training
build.
- Add support for export model for inferencing in a minimal build.
CUDA EP already supports [CUDA
graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs),
also we observed some models can benefit from using CUDA graph with
`trtexec`. Therefore, this PR enables the CUDA graph support for TRT EP.
The implementation is based on
https://github.com/microsoft/onnxruntime/pull/9978 with the same
[constraints](https://github.com/microsoft/onnxruntime/pull/9978) as
below:
- Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
- Usage of CUDA Graphs is limited to models where-in all the model ops
(graph nodes) can be partitioned to the TRT EP.
- The input/output types of models need to be tensors.
- Shapes of inputs/outputs cannot change across inference calls.
- IObinding is required.
### Description
<!-- Describe your changes. -->
1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar
to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is
only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all
CUDA UTs through this ut lib without affecting production lib
`onnxruntime_providers_cuda`.
2. Move all test cases from `core/providers/cuda/test/` to
`test/providers/cuda/`. These test cases are built into lib
`onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all
--gtest_filter="*CUDA_EP_Unittest*"`. Since the lib is only for test, we
can use gtest macros in the test cases. Previous implementation do not
support using gtest lib in the CUDA UT cases.
3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a
bit. A new function `onnxruntime_add_object_library` is to build a
object target. The 2 libs `onnxruntime_providers_cuda_ut` &
`onnxruntime_providers_cuda` share most of the code, so the object files
can be used in both libs, which helps reduce build time. Another
function `config_cuda_provider_shared_module` is used to configure all 3
similar
targets(onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut).
4. Refactored the test to call `testing::InitGoogleTest` &
`RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`.
After this change, we can see all the cases running in
`CUDA_EP_Unittest.All`:

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
After https://github.com/microsoft/onnxruntime/pull/13016, there are
still test files in test/providers/cuda/ that are not moved to
core/providers/cuda/test/ and the test cases are disabled. This PR helps
to clean the unfinished TODOs.
Even through onnxruntime_shared_lib_test covers some test for CUDA
provider. onnxruntime_shared_lib_test works like a coarse grain
end-to-end test for CUDA provider. If CUDA unittest can run cases for a
single component, this wound be helpful for CUDA developers.
---------
Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
**Description**:
Adds support for cmake find_package.
**Motivation and Context**
As mentioned in issue #7150 onnxruntime doesn't have support for CMake
find_package, this PR adds that and also adds the CMake package version
file. Now anyone can link onnxruntime like this:
```cmake
find_package(onnxruntime)
add_executable(test Source.cpp)
target_link_libraries(test PRIVATE onnxruntime::onnxruntime)
```
this also simplifies #3124
### Description
<!-- Describe your changes. -->
Split up OpTester to separate out operator vs model testing. This led to
a lot of other cleanups/refactoring.
- create BaseTester class and derived OpTester/ModelTester classes to
limit APIs to what is applicable for each test type
- e.g. adding an attribute isn't relevant to a model test
- cleanup structure
- don't expose member variables either directly or via public methods
returning them
- split out checkers so they can be easily re-used
- refactor so there's one public Check method for comparing two OrtValue
instances containing any data type
- refactor the GradientOpTester usage
- it required a lot of OpTester internals to be exposed and no other
tests needed this
- it also returned Status through various parts which prevented the
usage of the google test macros which provide better output. change to
return void and use the macros.
- fix some other minor issues
- update some cmake files so all the source files are included
- remove some low value helpers (FetchTensor and GetShapeVector)
- remove some outdated code to allow unreleased opset versions from when
onnx opset 15 wasn't released
- move files from test/util/include/test to test/util/include
- doesn't seem to be any reason for the additional subdirectory given
they're not files use to test the code in test/util
- files were moved with no changes
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Cleanup test infrastructure.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
This PR enables execution of subgraphs in OVEP and currently, when OVEP
developers install the onnxruntime-openvino package on windows from
pypi, they would have to additionally download OpenVINO windows binaries
and run the setupvars.bat script which sets the environment PATH to
locate the OV dll's. Also this PR fixes issues of OVEP windows io buffer
sample.
### Motivation and Context
Fix: We want to make the user experience easy for OVEP Python developers
on windows platform.
This fix, introduces a function add_openvino_libs_to_path at the
location tools/python/util/add_openvino_win_libs.py.
The above function, can be called by OVEP python users in the
application code and that takes care of setting
the OpenVINO dll's to the path from the OpenVINO pypi packge (openvino)
which was installed.
This change also makes sure that add_openvino_libs_to_path() function is
added to onnxruntime python package
only when it is build for OpenVINO Execution Provider for ONNXRuntime
and not for default ORT python package builds.
New user experience for Python OVEP developers on windows platform:
step 1: pip install onnxruntime-openvino
step 2: pip install openvino
step 3: <Add these 2 lines in the application code>
import onnxruntime.tools.add_openvino_win_libs as utils
utils.add_openvino_libs_to_path()
---------
Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
### Description
Before this PR, the CMake code assumed that when on Windows a
multiple-config CMake generator was used, while on non-Windows there was
the assumption of a single-config CMake generator. After this PR this
information is obtained from the
[`GENERATOR_IS_MULTI_CONFIG`](https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html)
global CMake propery.
### Motivation and Context
I discovered this problem when building with Ninja generator on Windows,
but I guess this should fix problems also on non-Windows platforms when
using a multiple-config generator (such as Xcode on macOS or "Ninja
Multi-Config" on all platforms).
See
https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html
for more info.