### Description
This PR updates the version of Dawn to
`b9b4a37041dec3dd62ac92014a6cc1aece48d9f3` (ref:
[chromium](67f86f01dd/DEPS (399)))
in the `deps.txt` file.
The newer version of Dawn includes the previous changes from dawn.patch
so that we can remove the patch file.
There is a little interface changes and code is updated correspondingly.
### Description
Enable coremltools for Linux build. In order to do this, I did:
1. Add uuid-devel to the Linux images and regenerate them.
2. Patch the coremltools code a little bit to add some missing header
files.
### Motivation and Context
To make the code simpler. Later on I will create another PR to remove
the COREML_ENABLE_MLPROGRAM C/C++ macro.
Also, after this PR I will bring more changes to
onnxruntime_provider_coreml.cmake to make it work with vcpkg.
### Description
- Makes QNN EP a shared library **by default** when building with
`--use_qnn` or `--use_qnn shared_lib`. Generates the following build
artifacts:
- **Windows**: `onnxruntime_providers_qnn.dll` and
`onnxruntime_providers_shared.dll`
- **Linux**: `libonnxruntime_providers_qnn.so` and
`libonnxruntime_providers_shared.so`
- **Android**: Not supported. Must build QNN EP as a static library.
- Allows QNN EP to still be built as a static library with `--use_qnn
static_lib`. This is primarily for the Android QNN AAR package.
- Unit tests run for both the static and shared QNN EP builds.
### Detailed changes
- Updates Java bindings to support both shared and static QNN EP builds.
- Provider bridge API:
- Adds logging sink ETW to the provider bridge. Allows EPs to register
ETW callbacks for ORT logging.
- Adds a variety of methods for onnxruntime objects that are needed by
QNN EP.
- QNN EP:
- Adds `ort_api.h` and `ort_api.cc` that encapsulates the API provided
by ORT in a manner that allows the EP to be built as either a shared or
static library.
- Adds custom function to transpose weights for Conv and Gemm (instead
of adding util to provider bridge API).
- Adds custom function to quantize data for LeakyRelu (instead of adding
util to provider bridge API).
- Adds custom ETW tracing for QNN profiling events:
- shared library: defines its own TraceLogging provider handle
- static library: uses ORT's TraceLogging provider handle and existing
telemetry provider.
- ORT-QNN Packages:
- **Python**: Pipelines build QNN EP as a shared library by default.
User can build a local python wheel with QNN EP as a static library by
passing `--use_qnn static_lib`.
- **NuGet**: Pipelines build QNN EP as a shared library by default.
`build.py` currently enforces QNN EP to be built as a shared library.
Can add support for building a QNN NuGet package with static later if
deemed necessary.
- **Android**: Pipelines build QNN EP as a **static library**.
`build.py` enforces QNN EP to be built as a static library. Packaging
multiple shared libraries into an Android AAR package is not currently
supported due to the added need to also distribute a shared libcpp.so
library.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add custom vcpkg ports for the following packages:
1. cpuinfo
2. onnx
3. pthreadpool
4. xnnpack
Because:
- The cpuinfo/pthreadpool/xnnpack packages in the official vcpkg repo
are too old.
- XNNPack's version is updated from 2022-12-22 to 2025-01-17
- CPUINFO's version is updated from 2022-07-19 to 2024-12-09
- Pthreadpool's version is updated from 2020-04-10 to 2024-12-17, and
the source code location is changed from
https://github.com/Maratyszcza/pthreadpool to
https://github.com/google/pthreadpool
- The onnx package in the official repo requires building python from
source, which then requires a lot of additional dependencies to be
installed. This PR removes them.
- Added a disable_gcc_warning.patch file for xnnpack for addressing the
issue reported in https://github.com/google/XNNPACK/issues/7650. I will
remove this patch when the issue is fully addressed.
- Added " -DONNX_DISABLE_STATIC_REGISTRATION=ON" to ONNX's config
options.
-
### Description
This PR updates the triplets files that manage the compile flags for
vcpkg packages.
All the changes are autogenerated except for the gen.py file in this PR.
Main changes:
1. Enable debug info for all Linux build config(Release and Debug)
2. Set CMAKE_CXX_STANDARD in each triplet. The value is set to 20 for
macOS targets and 17 for the others.
3. Only set _FORTIFY_SOURCE in release build. This is to address a build
issue on some platforms with the following glibc change:
"Warn if user requests __FORTIFY_SOURCE but it is disabled"
https://sourceware.org/git/?p=glibc.git;a=commit;f=include/features.h;h=05c2c9618f583ea4acd69b3fe5ae2a2922dd2ddc
### Motivation and Context
Address a Linux build error.
### Description
This PR allows WebGPU EP to be built with Emscripten for WebAssembly,
Including:
- cmake build files update to support correct setup for Emscripten.
- code changes to fix build breaks for wasm
- change in Web CI pipeline to add a build-only target for wasm with
`--use_webgpu`.
Add support to mainline Onnxruntime of changes from the ROCm Team's changes
### Motivation and Context
Various bugfixes, and changes added between ROCm 6.2 and 6.3 that
haven't been upstreamed yet to mainline
---------
Co-authored-by: Yueqing Zhang <yuz75@Pitt.edu>
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>
Co-authored-by: ikalinic <ilija.kalinic@amd.com>
Co-authored-by: sstamenk <sstamenk@amd.com>
### Description
Update xnnpack to remove the dependency on psimd and fp16 libraries.
However, coremltool still depends on them, which will be addressed
later.
Also, update CPUINFO because the latest xnnpack requires CPUINFO's avx10
support.
### Motivation and Context
The fewer dependencies the better.
### Description
This PR contains a part of the changes in #23318.
The reason of creating this PR is: The works to support building WebGPU
EP in WASM depends on #23318, which cannot be merged since it's blocked
by upstream (https://github.com/llvm/llvm-project/issues/122166). This
PR contains the changes can be safely merged separately and can unblock
the development of supporting building WebGPU EP in WASM.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### Description
Fixes build when specify with flag `--target
onnxruntime_providers_webgpu`
Otherwise the following error will occur:
```
range.cc
D:\code\onnxruntime\build\Windows\Debug\_deps\onnx-src\onnx\onnx_pb.h(65,10): error C1083: Cannot open include file: 'o
nnx/onnx-ml.pb.h': No such file or directory [D:\code\onnxruntime\build\Windows\Debug\onnxruntime_providers_webgpu.vcxp
roj]
(compiling source file '../../../onnxruntime/core/providers/webgpu/math/binary_elementwise_ops.cc')
```
The new images contain the following updates:
1. Added Git, Ninja and VCPKG to all docker images
2. Updated CPU containers' GCC version from 12 to 14
3. Pinned CUDA 12 images' CUDNN version to 9.5(The latest one is 9.6)
4. Addressed container supply chain warnings by building CUDA 12 images
from scratch(avoid using Nvidia's prebuilt images)
5. Updated manylinux commit id to
75aeda9d18eafb323b00620537c8b4097d4bef48
Also, this PR updated some source code to make the CPU EP's source code
compatible with GCC 14.
### Description
This change fixes the WebGPU delay load test.
<details>
<summary>Fix UB in macro</summary>
The following C++ code outputs `2, 1` in MSVC, while it outputs `1, 1`
in GCC:
```c++
#include <iostream>
#define A 1
#define B 1
#define ENABLE defined(A) && defined(B)
#if ENABLE
int x = 1;
#else
int x = 2;
#endif
#if defined(A) && defined(B)
int y = 1;
#else
int y = 2;
#endif
int main()
{
std::cout << x << ", " << y << "\n";
}
```
Clang reports `macro expansion producing 'defined' has undefined
behavior [-Wexpansion-to-defined]`.
</details>
<details>
<summary>Fix condition of build option
onnxruntime_ENABLE_DELAY_LOADING_WIN_DLLS</summary>
Delay load is explicitly disabled when python binding is being built.
modifies the condition.
</details>
### Description
CMake's
[target_link_libraries](https://cmake.org/cmake/help/latest/command/target_link_libraries.html#id2)
function accepts plain library name(like `re2`) or target name(like
`re2::re2`) or some other kinds of names. "plain library names" are
old-fashioned, for compatibility only. We should use target names.
### Motivation and Context
To make vcpkg work with winml build. See #23158
### Description
<!-- Describe your changes. -->
Update CIs to TRT10.7
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
This change fixes the DLL delay load problem for the WebGPU EP and
DirectML EP. See detailed explanation below.
### Problem
When onnxruntime.dll uses delay loading for its dependencies, the
dependencies are loaded using `LoadLibraryEx()`, which search the
directory of process (.exe) instead of this library (onnxruntime.dll).
This is a problem for usages of Node.js binding and python binding,
because Windows will try to find the dependencies in the directory of
node.exe or python.exe, which is not the directory of onnxruntime.dll.
There was previous attempt to fix this by loading DirectML.dll in the
initialization of onnxruntime nodejs binding, which works for DML EP but
is not a good solution because it does not really "delay" the load.
For WebGPU, the situation became worse because webgpu_dawn.dll depends
on dxil.dll and dxcompiler.dll, which are explicitly dynamically loaded
in the code using `LoadLibraryA()`. This has the same problem of the DLL
search.
### Solutions
For onnxruntime.dll loading its direct dependencies, it can be resolved
by set the [`__pfnDliNotifyHook2`
hook](https://learn.microsoft.com/en-us/cpp/build/reference/understanding-the-helper-function?view=msvc-170#structure-and-constant-definitions)
to load from an absolute path that constructed from the onnxruntime.dll
folder and the DLL name.
For webgpu_dawn.dll loading dxil.dll and dxcompiler.dll, since they are
explicitly loaded in the code, the hook does not work. Instead, it can
be resolved by ~~using WIN32 API `SetDllDirectory()` to add the
onnxruntime.dll folder to the search path.~~ preloading the 2 DLLs from
the onnxruntime.dll folder .
### Description
<!-- Describe your changes. -->
This patches Eigen source to remove an unused deprecated static var.
### Motivation and Context
Internal customer request.
### Description
OVEP development changes for ORT 1.21 Release
### Motivation and Context
- Has Critical Bug Fixes
- Improved Performance optimizations for both memory & inference latency
(https://github.com/intel/onnxruntime/pull/513)
- Enabled Model Compilation using NPUW
(https://github.com/intel/onnxruntime/pull/508)
- Fixed support for EPContext embed mode 0 for lower memory utilization
- Updated NuGet package name as `Intel.ML.OnnxRuntime.OpenVino`
- Fixed QDQ Stripping logic on NPU
- Use `ANDROID` instead of `CMAKE_SYSTEM_NAME STREQUAL "Android"`.
- Put common gradle arguments into `COMMON_GRADLE_ARGS` to make them easier to reuse.
### Description
Add fp16 kernel to rotary embedding to boost performance.
### Motivation and Context
Part of performance optimization work for group query attention
### Description
Upgrade version of Dawn.
Removed dawn.patch, because all patches are included in upstream.
Updated code that affected by API changes (`const char*` ->
`WGPUStringView`)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Allows to build ONNX Runtime with a custom local path of Dawn's source
code.
Usage:
```sh
build --use_webgpu --cmake_extra_defines "onnxruntime_CUSTOM_DAWN_SRC_PATH=C:/src/dawn"
```
### Description
-Add split/pad/neg/not/ceil/round/min/max op support
-Fix conv2d op default pads value issue
-Add VSINPU EP to support python bindings
### Motivation and Context
-New OPs support for VSINPU EP
---------
Signed-off-by: Kee <xuke537@hotmail.com>
Option is named onnxruntime_FORCE_GENERIC_ALGORITHMS
Follow up to https://github.com/microsoft/onnxruntime/pull/22125.
### Description
This change adds compile-time option to disable optimized algorithms and
use generic algorithms (exclude AVX* and SSE etc in GEMM) on x86. This
new option is intended only for testing these algorithms, not for
production use.
Following build command on linux x86_64 builds onnxruntime with new
option enabled:
`./build.sh --parallel --cmake_extra_defines
onnxruntime_FORCE_GENERIC_ALGORITHMS=1`
### Motivation and Context
This change allows testing generic algorithms. This may be needed for
platforms which don't have optimized implementations available, like in
https://github.com/microsoft/onnxruntime/pull/22125.
### Description
A breakdown PR of https://github.com/microsoft/onnxruntime/pull/22651
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Fixes build failure for the cuda minimal build
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
[This change](https://github.com/microsoft/onnxruntime/pull/19470) in
1.20 is causing build failures for the cuda minimal build.
Essentially, some cudnn logic was not guarded by the `USE_CUDA_MINIMAL`.
Also the build is looking for cudnn while in the cuda minimal build it
shouldn't depend on it, resulting in linking error.
cc @gedoensmax @chilo-ms
### Description
OVEP development changes for ORT 1.21 Release
### Motivation and Context
Has critical bug fixes
Support for concurrency execution of models is enabled
Support for OV 2024.5
Memory optimizations for NPU platform
---------
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
### Description
A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651
Op API change only.
- add template to functions and classes that support fp32 and fp16
- rename functions, classes and files that support fp32 and fp16 from
SQNBxxx to QNBxxx
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
A break down PR of https://github.com/microsoft/onnxruntime/pull/22651
Add fp16 kernels.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
[VitisAI] Cache node subgraph when necessary
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
Co-authored-by: zhenzew <zhenzew@amd.com>
### Description
With recent changes, below build error is found under AIX.
```
ld: 0706-012 The -p flag is not recognized.
ld: 0706-012 The -a flag is not recognized.
ld: 0706-012 The -t flag is not recognized.
ld: 0706-012 The -h flag is not recognized.
ld: 0706-012 The -= flag is not recognized.
ld: 0706-012 The -$ flag is not recognized.
ld: 0706-012 The -$ flag is not recognized.
ld: 0706-012 The -O flag is not recognized.
ld: 0706-027 The -R IGIN flag is ignored.
collect2: error: ld returned 255 exit status
```
### Motivation and Context
AIX linker doesn't support -rpath option , so blocking this option under
AIX.
### Description
<!-- Describe your changes. -->
* Update CI with TRT 10.6
* Update oss parser to [10.6-GA-ORT-DDS
](https://github.com/onnx/onnx-tensorrt/tree/10.6-GA-ORT-DDS) and update
dependency version
* Update Py-cuda11 CI to use trt10.6
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
(There will be 3rd PR to further reduce trt_version hardcoding)
### Description
* Build cuda nhwc ops by default.
* Deprecate `--enable_cuda_nhwc_ops` in build.py and add
`--disable_cuda_nhwc_ops` option
Note that it requires cuDNN 9.x. If you build with cuDNN 8, NHWC ops
will be disabled automatically.
### Motivation and Context
In general, NHWC is faster than NCHW for convolution in Nvidia GPUs with
Tensor Cores, and this could improve performance for vision models.
This is the first step to prefer NHWC for CUDA in 1.21 release. Next
step is to do some tests on popular vision models. If it help in most
models and devices, set `prefer_nhwc=1` as default cuda provider option.
### Description
Refactor the cmake code that is related to delay loading. Provide a
cmake option to control if delay loading should be enabled or not.
Disabling the option when python is enabled, due to a known issue.
### Motivation and Context
ONNX Runtime's python package depends on DirectML.dll, but supposedly
the DLL should be delay loaded.
This PR only refactor the code. It doesn't change the behavior.
### Description
[DML EP] Update DML to 1.15.4
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We want the customer to use the latest DirectML.
### Description
Upgrade python from 3.9 to 3.10 in ROCm and MigraphX docker files and CI
pipelines. Upgrade ROCm version to 6.2.3 in most places except ROCm CI,
see comment below.
Some improvements/upgrades on ROCm/Migraphx docker or pipeline:
* rocm 6.0/6.1.3 => 6.2.3
* python 3.9 => 3.10
* Ubuntu 20.04 => 22.04
* Also upgrade ml_dtypes, numpy and scipy packages.
* Fix message "ROCm version from ..." with correct file path in
CMakeList.txt
* Exclude some NHWC tests since ROCm EP lacks support for NHWC
convolution.
#### ROCm CI Pipeline:
ROCm 6.1.3 is kept in the pipeline for now.
- Failed after upgrading to ROCm 6.2.3: `HIPBLAS_STATUS_INVALID_VALUE ;
GPU=0 ; hostname=76123b390aed ;
file=/onnxruntime_src/onnxruntime/core/providers/rocm/rocm_execution_provider.cc
; line=170 ; expr=hipblasSetStream(hipblas_handle_, stream);` . It need
further investigation.
- cupy issues:
(1) It currently supports numpy < 1.27, might not work with numpy 2.x.
So we locked numpy==1.26.4 for now.
(2) cupy support of ROCm 6.2 is still in progress:
https://github.com/cupy/cupy/issues/8606.
Note that miniconda issues: its libstdc++.so.6 and libgcc_s.so.1 might
have conflict with the system ones. So we created links to use the
system ones.
#### MigraphX CI pipeline
MigraphX CI does not use cupy, and we are able to use ROCm 6.2.3 and
numpy 2.x in the pipeline.
#### Other attempts
Other things that I've tried which might help in the future:
Attempt to use a single docker file for both ROCm and Migraphx:
https://github.com/microsoft/onnxruntime/pull/22478
Upgrade to ubuntu 24.04 and python 3.12, and use venv like
[this](27903e7ff1/tools/ci_build/github/linux/docker/rocm-ci-pipeline-env.Dockerfile).
### Motivation and Context
In 1.20 release, ROCm nuget packaging pipeline will use 6.2:
https://github.com/microsoft/onnxruntime/pull/22461.
This upgrades rocm to 6.2.3 in CI pipelines to be consistent.
### Description
1. Remove the onnxruntime::OrtMutex class and replace it with
~absl::Mutex~ std::mutex.
2. After this change, most source files will not include <Windows.h>
indirectly.
### Motivation and Context
To reduce the number of deps we have, and address some Github issues
that are related to build ONNX Runtime from source.
In PR #3000 , I added a custom implementation of std::mutex . It was
mainly because at that time std::mutex's default constructor was not
trivial on Windows. If you had such a mutex as a global var, it could
not be initialized at compile time. Then VC++ team fixed this issue.
Therefore we don't need this custom implementation anymore.
This PR also removes nsync. I ran several models tests on Linux. I
didn't see any perf difference.
This PR also reverts PR #21005 , which is no longer needed since conda
has updated its msvc runtime DLL.
This PR unblocks #22173 and resolves#22092 . We have a lot of open
issues with nsync. This PR can resolve all of them.