Commit graph

1826 commits

Author SHA1 Message Date
shaoboyan091
5ef18328bf
[WebGPU] Support PIX Capture for WebGPU EP (#23192)
The PIX Capture tool requires a 'present' to end a frame capture. ORT does no
rendering work, so no 'present' ever happens.

To avoid the PIX capture tool waiting endlessly, this PR adds a blank
surface and a 'present' on it in each session run.

The surface is created in the WebGPU EP constructor and closed in the
WebGPU EP destructor.

2025-02-08 02:05:15 -08:00
Ankit Maheshkar
a6ea57b8f3
OpenVINO EP Weights Sharing Feature (#23553)
### Description
These changes ensure that weights are shared between two models using the session context option `ep_weight_sharing`.

Key changes introduced in this feature:

- Creating a shared context between the two models.
- Extracting external constant initializers and relabelling them back as inputs to the model, to allow weight loading from the direct blob.
- Creating EP Context nodes when subgraph partitioning is happening.

### Motivation and Context
This change was required to ensure that LLM prefill and kv-cache models can use the same shared weights.
It was also required to ensure that EP Context nodes can be formed even when the model is being subgraph-partitioned.

---------

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: saurabh <saurabh1.kale@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
2025-02-06 14:57:38 -08:00
Changming Sun
328a13c06d
Enable VCPKG in more pipelines (#23590)
### Description
Enable VCPKG in more pipelines
2025-02-06 10:10:31 -08:00
Yifan Li
6728d6085d
[TensorRT EP] support TensorRT 10.8-GA (#23592)
2025-02-06 10:05:57 -08:00
Changming Sun
5f6a3158f8
Enable VCPKG in CI build (#23426)
### Description
1. Enable the VCPKG flag in Windows CPU CI build pipelines.
2. Increase the minimum supported cmake version from 3.26 to 3.28. Because
of this, support for the old way of finding Python via
"find_package(PythonLibs)" is dropped; therefore build.py no longer sets
the "PYTHON_EXECUTABLE" cmake variable when doing cmake configure.
3. Add "xnnpack-ep" as a feature in ORT's vcpkg config.
4. Add asset cache support for ORT's vcpkg build.
5. Add VCPKG triplet files for the Android build.
6. Set the VCPKG triplet to "universal2-osx" if CMAKE_OSX_ARCHITECTURES is
found in the cmake extra defines.
7. Remove a small piece of code in build.py that supported CUDA
versions < 11.8.
8. Fix an issue where CMAKE_OSX_ARCHITECTURES sometimes got specified
twice when build.py invoked cmake.
9. Add more model tests to the Android build. After this change, we
test all ONNX versions instead of just the latest one.
10. Fix issues related to build.py's "--build_nuget" parameter, and
enable the flag in most Windows CPU CI build jobs.
11. Remove a restriction in build.py that disallowed cross-compiling the
Windows ARM64 nuget package on Windows x86.
 
### Motivation and Context
Adopt vcpkg.
2025-02-05 10:58:53 -08:00
Gavin Kinsey
1fce51b3b2
Fix all instances of 4244 and 4267 warnings in OV EP code (#23567)
### Description
Remove MSVC warnings 4244, 4267 from the list of disabled warnings in
cmake.
Fix the code that generates the warnings so that it no longer does.

### Motivation and Context
This makes onnxruntime_providers_openvino.dll pass BinSkim analysis.
Without this change BinSkim complains about the disabled warnings.
2025-02-04 17:13:27 -08:00
Tianlei Wu
9e18b6a0f3
[CUDA] Update nvcc flags (#23572)
### Description
(1) Remove `if (CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 11)`
since the build requires CUDA >= 11.4.
(2) Add sm_86 and sm_89, since we generate SASS code only for the
specified CUDA architectures. This change supports popular consumer GPUs
(like RTX 30x0 and RTX 40x0).
(3) Add sm_120 to support Blackwell GPUs (like RTX 50x0).
(4) Add `-Xfatbin=-compress-all` to reduce wheel size. When
CMAKE_CUDA_ARCHITECTURES is not specified, the Linux wheel built with
CUDA 12.8 is reduced by 8% (from 324MB to 299MB).

### Motivation and Context

To support popular consumer GPUs (RTX 30x0, 40x0, 50x0) in the default
setting. Reduce binary size.

Note that the default SM settings do not impact the officially released
binaries. ORT official binaries are built with explicit settings like
CMAKE_CUDA_ARCHITECTURES=75;80;90, which produce both SASS (real) and
PTX (virtual) by default. See
https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html for
more info.
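As a sketch of the real-vs-virtual distinction described above (the exact values here are illustrative, not the pipeline's actual settings), a CMake fragment could look like:

```cmake
# Plain numbers emit SASS (real) plus PTX (virtual); the "-real" suffix
# emits SASS only, and "-virtual" emits PTX only.
set(CMAKE_CUDA_ARCHITECTURES "75-real;80-real;86-real;89-real;90")

# Compress all device code in the fatbin to reduce binary size.
string(APPEND CMAKE_CUDA_FLAGS " -Xfatbin=-compress-all")
```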
2025-02-04 11:47:02 -08:00
George Wu
bb7f9616e6
remove log spam from cpuinfo (#23548)
cpuinfo outputs errors when the CPU is not recognized.
This has been a longstanding issue, e.g.
https://github.com/microsoft/onnxruntime/issues/21947
https://github.com/microsoft/onnxruntime/issues/21393

The issue has been exacerbated by
https://github.com/microsoft/onnxruntime/pull/22856:
this change
4fa0f1e0ed/onnxruntime/core/mlas/lib/qnbitgemm_kernel_neon.cpp (L189)
causes the messages to appear during static initialization.

This means that for Python, you see the errors immediately when you
import onnxruntime.

```
>>> import onnxruntime
Error in cpuinfo: Unknown chip model name 'snapdragon (tm) 8cx gen 3 @ 3.40 GHz'.
Please add new Windows on Arm SoC/chip support to arm/windows/init.c!
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
```

The fix is to patch pytorch_cpuinfo and comment out the std::cerr lines in
cpuid_uarch.cc. The errors are not actionable by the user, so they should
not be emitted.

Tested that after these changes, the errors no longer show up.
2025-01-31 18:16:24 -08:00
PARK DongHa
169917b1e7
Use latest vcpkg commit in configuration, sync manifest with deps.txt (#23554)
### Description

The `python3` dependency was removed from the `onnx` port of the
https://github.com/microsoft/vcpkg upstream.

* https://github.com/microsoft/vcpkg/pull/43236
* https://github.com/microsoft/onnxruntime/pull/23285#issuecomment-2579073056 (previous work)

Removed `nsync`, and use ONNX 1.17.0+ in vcpkg.json (manifest).

### Motivation and Context

* Help #23158
* #23456
2025-01-31 12:34:07 -08:00
Takeshi Watanabe
7e2408880e
Enable dlpack by default (#23110)
### Description
This PR enables the Python dlpack interface by default.


### Motivation and Context

The dlpack Python interface is useful in inference mode, not only training
mode, since some inference result pre-processing may be written in torch,
and unnecessary device transfers should be reduced in those cases.
closes https://github.com/microsoft/onnxruntime/issues/15963
closes https://github.com/microsoft/onnxruntime/issues/22061

TODOs:
- [x] Add tests like
5407c69028/orttraining/orttraining/test/python/orttraining_test_ortvalue.py
that's unrelated to training feature

---------

Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2025-01-30 23:23:56 +01:00
Karim Vadsariya
655a23ff1d
[onnxruntime/build] Add new flag enable_generic_interface to build primary EPs by default (#23342)
### Description
- Add a new build flag in build.py to build onnxruntime.dll supporting
interfaces for all primary EPs (QNN, TensorRT, OpenVINO, VitisAI).
- Modify onnxruntime.dll/onnxruntime_shared.dll build settings to remove
the requirement that the IHV SDK toolset be installed on the system.
- Change CMake variables to be explicit about building an EP vs. ORT, e.g.
onnxruntime_USE_TENSORRT vs. onnxruntime_USE_TENSORRT_INTERFACE, to
evolve the build system toward building ORT independently of the EPs.



### Motivation and Context
Changes in the build system required to evolve the repo to build the
components independently while removing unnecessary dependencies.

---------

Co-authored-by: Lei Cao <jslhcl@gmail.com>
Co-authored-by: Karim Vadsariya <kvadsariya@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-01-28 15:24:09 -08:00
Yulong Wang
8db97a68f2
[webgpu] Bump version of Dawn to b9b4a370 (#23494)
### Description

This PR updates the version of Dawn to
`b9b4a37041dec3dd62ac92014a6cc1aece48d9f3` (ref:
[chromium](67f86f01dd/DEPS (399)))
in the `deps.txt` file.

The newer version of Dawn includes the previous changes from dawn.patch
so that we can remove the patch file.

There are some small interface changes, and the code is updated accordingly.
2025-01-27 14:02:06 -08:00
Changming Sun
1fc9c4823d
Enable coremltools for Linux build (#23481)
### Description

Enable coremltools for Linux build. In order to do this, I did:

1. Add uuid-devel to the Linux images and regenerate them.
2. Patch the coremltools code a little bit to add some missing header
files.

### Motivation and Context
To make the code simpler. Later on I will create another PR to remove
the COREML_ENABLE_MLPROGRAM C/C++ macro.
Also, after this PR I will bring more changes to
onnxruntime_provider_coreml.cmake to make it work with vcpkg.
2025-01-24 18:18:37 -08:00
Jing Fang
13348c572a
[ARM CPU] hgemm optimized for gqa (#23107)
### Description
Add fp16 kernels for GQA matmul on ARM CPU.
The kernels are mlas hgemm for C = alpha * A x B' + beta * C


### Motivation and Context
Add fp16 support for GQA, speed up the operator and reduce memory usage.

__Token Generation__
| | HGEMM Runtime (ns) | SGEMM Runtime (ns) | Speed-up (%) |
|---------------------------------|--------------------|--------------------|--------------|
| M:1/N:4096/K:4096 | 251551 | 1775905 | 85.84 |
| M:1/N:11008/K:4096 | 892507 | 4649145 | 80.80 |
| M:1/N:4096/K:11008 | 866860 | 3240015 | 73.25 |
| M:1/N:11008/K:11008 | 2631615 | 8783877 | 70.04 |

__Prompting__
| | HGEMM Runtime (ns) | SGEMM Runtime (ns) | Speed-up (%) |
|---------------------------------|--------------------|--------------------|--------------|
| M:1024/N:4096/K:4096 | 90508701 | 111283029 | 18.67 |
| M:2048/N:4096/K:4096 | 181307522 | 240211107 | 24.52 |
| M:1024/N:11008/K:4096 | 241120234 | 307707933 | 21.64 |
| M:2048/N:11008/K:4096 | 481091232 | 648921367 | 25.86 |
| M:1024/N:4096/K:11008 | 241736343 | 310129880 | 22.05 |
| M:2048/N:4096/K:11008 | 480456703 | 644814999 | 25.49 |
| M:1024/N:11008/K:11008 | 642121440 | 847925766 | 24.27 |
| M:2048/N:11008/K:11008 | 1276097154 | 1731314509 | 26.29 |
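The Speed-up column appears to be computed as (SGEMM − HGEMM) / SGEMM × 100. A quick check against the first token-generation row:

```python
# Reproduce the speed-up figure for the M:1/N:4096/K:4096 row above.
hgemm_ns = 251551
sgemm_ns = 1775905

speedup_pct = (sgemm_ns - hgemm_ns) / sgemm_ns * 100
print(f"{speedup_pct:.2f}")  # 85.84, matching the table
```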
2025-01-24 15:25:24 -08:00
Adrian Lizarraga
3b4c7df4e9
[QNN EP] Make QNN EP a shared library (#23120)
### Description
- Makes QNN EP a shared library **by default** when building with
`--use_qnn` or `--use_qnn shared_lib`. Generates the following build
artifacts:
- **Windows**: `onnxruntime_providers_qnn.dll` and
`onnxruntime_providers_shared.dll`
- **Linux**: `libonnxruntime_providers_qnn.so` and
`libonnxruntime_providers_shared.so`
  - **Android**: Not supported. Must build QNN EP as a static library.
- Allows QNN EP to still be built as a static library with `--use_qnn
static_lib`. This is primarily for the Android QNN AAR package.
- Unit tests run for both the static and shared QNN EP builds.

### Detailed changes
- Updates Java bindings to support both shared and static QNN EP builds.
- Provider bridge API:
- Adds logging sink ETW to the provider bridge. Allows EPs to register
ETW callbacks for ORT logging.
- Adds a variety of methods for onnxruntime objects that are needed by
QNN EP.
- QNN EP:
- Adds `ort_api.h` and `ort_api.cc` that encapsulates the API provided
by ORT in a manner that allows the EP to be built as either a shared or
static library.
- Adds custom function to transpose weights for Conv and Gemm (instead
of adding util to provider bridge API).
- Adds custom function to quantize data for LeakyRelu (instead of adding
util to provider bridge API).
  - Adds custom ETW tracing for QNN profiling events:
    - shared library: defines its own TraceLogging provider handle
- static library: uses ORT's TraceLogging provider handle and existing
telemetry provider.
- ORT-QNN Packages:
- **Python**: Pipelines build QNN EP as a shared library by default.
User can build a local python wheel with QNN EP as a static library by
passing `--use_qnn static_lib`.
- **NuGet**: Pipelines build QNN EP as a shared library by default.
`build.py` currently enforces QNN EP to be built as a shared library.
Support for building a QNN NuGet package with a static library can be
added later if deemed necessary.
- **Android**: Pipelines build QNN EP as a **static library**.
`build.py` enforces QNN EP to be built as a static library. Packaging
multiple shared libraries into an Android AAR package is not currently
supported due to the added need to also distribute a shared libcpp.so
library.

2025-01-22 12:11:00 -08:00
Changming Sun
77adf4b040
Add custom vcpkg ports (#23456)
### Description
Add custom vcpkg ports for the following packages:
1. cpuinfo
2. onnx 
3. pthreadpool  
4. xnnpack

Because:

- The cpuinfo/pthreadpool/xnnpack packages in the official vcpkg repo
are too old.
   - XNNPACK's version is updated from 2022-12-22 to 2025-01-17
   - CPUINFO's version is updated from 2022-07-19 to 2024-12-09
- pthreadpool's version is updated from 2020-04-10 to 2024-12-17, and
its source code location has changed from
https://github.com/Maratyszcza/pthreadpool to
https://github.com/google/pthreadpool
- The onnx package in the official repo requires building Python from
source, which in turn requires a lot of additional dependencies to be
installed. This PR removes those requirements.
- Added a disable_gcc_warning.patch file for xnnpack to address the
issue reported in https://github.com/google/XNNPACK/issues/7650. I will
remove this patch when the issue is fully addressed.
- Added `-DONNX_DISABLE_STATIC_REGISTRATION=ON` to ONNX's config
options.
2025-01-22 11:49:16 -08:00
Changming Sun
3dcc90119b
Update the compile flags for vcpkg packages (#23455)
### Description

This PR updates the triplets files that manage the compile flags for
vcpkg packages.
All the changes are autogenerated except for the gen.py file in this PR.

Main changes:
1. Enable debug info for all Linux build configs (Release and Debug).
2. Set CMAKE_CXX_STANDARD in each triplet: 20 for macOS targets and 17
for the others.
3. Only set _FORTIFY_SOURCE in release builds. This addresses a build
issue on some platforms caused by the following glibc change:
"Warn if user requests __FORTIFY_SOURCE but it is disabled"

https://sourceware.org/git/?p=glibc.git;a=commit;f=include/features.h;h=05c2c9618f583ea4acd69b3fe5ae2a2922dd2ddc
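A triplet fragment implementing points 2 and 3 might look like the following sketch (the specific values are assumptions; the real files are generated by gen.py):

```cmake
# Hypothetical generated triplet fragment for a non-macOS target.
set(VCPKG_CRT_LINKAGE dynamic)
set(VCPKG_LIBRARY_LINKAGE static)

# Point 2: pin the C++ standard per target (17 here; 20 for macOS targets).
list(APPEND VCPKG_CMAKE_CONFIGURE_OPTIONS "-DCMAKE_CXX_STANDARD=17")

# Point 3: only define _FORTIFY_SOURCE in release builds, since glibc now
# warns when it is requested while optimizations are disabled.
set(VCPKG_C_FLAGS_RELEASE "-D_FORTIFY_SOURCE=2")
set(VCPKG_CXX_FLAGS_RELEASE "-D_FORTIFY_SOURCE=2")
```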


### Motivation and Context
Address a Linux build error.
2025-01-22 11:48:38 -08:00
Changming Sun
368e243194
Make ORT and Dawn use the same protobuf/abseil source code (#23447)
### Description
Make ORT and Dawn use the same protobuf/abseil source code
2025-01-21 17:17:47 -08:00
junchao-zhao
5b5aa11b83
Fix eigen external deps (#23439)
### Description
<!-- Describe your changes. -->

I think we should not use the system Eigen directly; we should prefer the
Eigen version specified in deps.txt. On Ubuntu 22.04, ORT fails to
compile when libeigen3-dev (which ROS 2 Humble depends on) is installed.
The error message is below:

```
[ 62%] Built target onnxruntime_lora
[ 62%] Building CXX object CMakeFiles/onnxruntime_session.dir/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/session/IOBinding.cc.o
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc: In member function ‘onnxruntime::common::Status onnxruntime::Min_6<T>::Compute(onnxruntime::OpKernelContext*) const [with T = float]’:
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: error: no matching function for call to ‘Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >::min<Eigen::PropagateNaN>(Eigen::ArrayWrapper<Eigen::Map<const Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >)’
  750 |     min = min.array().template min<Eigen::PropagateNaN>(EigenMap<float>(data_n).array());
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/eigen3/Eigen/Core:19,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:10,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:4:
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: candidate: ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >]’
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note:   template argument deduction/substitution failed:
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: error: type/value mismatch at argument 1 in template parameter list for ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >]’
  750 |     min = min.array().template min<Eigen::PropagateNaN>(EigenMap<float>(data_n).array());
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: note:   expected a type, got ‘Eigen::PropagateNaN’
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc: In lambda function:
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: error: no matching function for call to ‘Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >::min<Eigen::PropagateNaN>(Eigen::half)’
  802 |           output_vec_map = input_1_vec_map.template min<Eigen::PropagateNaN>(
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  803 |               static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/eigen3/Eigen/Core:19,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:10,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:4:
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: candidate: ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >]’
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note:   template argument deduction/substitution failed:
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: error: type/value mismatch at argument 1 in template parameter list for ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >]’
  802 |           output_vec_map = input_1_vec_map.template min<Eigen::PropagateNaN>(
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  803 |               static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: note:   expected a type, got ‘Eigen::PropagateNaN’
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:805:77: error: no matching function for call to ‘Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >::max<Eigen::PropagateNaN>(Eigen::half)’
  805 |           output_vec_map = input_1_vec_map.template max<Eigen::PropagateNaN>(
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  806 |               static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
### Motivation and Context

Fix #23407

@lixing-star
2025-01-21 12:40:42 -08:00
Yulong Wang
080c67e900
[WebGPU] allow build WebGPU EP for WebAssembly (#23364)
### Description

This PR allows the WebGPU EP to be built with Emscripten for WebAssembly,
including:

- cmake build file updates to support correct setup for Emscripten.
- code changes to fix build breaks for wasm.
- a change in the Web CI pipeline to add a build-only target for wasm with
`--use_webgpu`.
2025-01-16 10:52:17 -08:00
Ted Themistokleous
7cd08a6004
[MigraphX EP] [ROCm EP] Upstream ROCm changes for bugfixes and features (#23249)
Upstream the ROCm team's changes to mainline ONNX Runtime.

### Motivation and Context
Various bugfixes and changes added between ROCm 6.2 and 6.3 that
haven't yet been upstreamed to mainline.

---------

Co-authored-by: Yueqing Zhang <yuz75@Pitt.edu>
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>
Co-authored-by: ikalinic <ilija.kalinic@amd.com>
Co-authored-by: sstamenk <sstamenk@amd.com>
2025-01-15 12:57:04 -08:00
Changming Sun
6a7ea5c896
Update xnnpack, cpuinfo and pthreadpool (#23362)
### Description
Update xnnpack to remove the dependency on the psimd and fp16 libraries.
However, coremltools still depends on them; this will be addressed
later.

Also, update CPUINFO because the latest xnnpack requires CPUINFO's avx10
support.

### Motivation and Context
The fewer dependencies the better.
2025-01-15 09:42:15 -08:00
Yulong Wang
444fcebaa4
Pre-requisites of upgrading EMSDK (#23347)
### Description

This PR contains a part of the changes in #23318.

The reason for creating this PR: the work to support building the WebGPU
EP in WASM depends on #23318, which cannot be merged since it's blocked
by an upstream issue (https://github.com/llvm/llvm-project/issues/122166).
This PR contains the changes that can be safely merged separately, to
unblock the development of WebGPU EP support in WASM.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-01-14 11:07:21 -08:00
Changming Sun
4e4fd2bdcf
Update ORT extension to the latest (#23314)
Update ORT extension to the latest, to include some build system fixes.
2025-01-13 18:59:42 -08:00
Yulong Wang
a74817ab10
add missing build dependency for onnxruntime_providers_webgpu (#23324)
### Description

Fixes the build when building with the flag `--target onnxruntime_providers_webgpu`.

Otherwise the following error occurs:
```
  range.cc
D:\code\onnxruntime\build\Windows\Debug\_deps\onnx-src\onnx\onnx_pb.h(65,10): error C1083: Cannot open include file: 'onnx/onnx-ml.pb.h': No such file or directory [D:\code\onnxruntime\build\Windows\Debug\onnxruntime_providers_webgpu.vcxproj]
  (compiling source file '../../../onnxruntime/core/providers/webgpu/math/binary_elementwise_ops.cc')
```
2025-01-10 18:07:12 -08:00
Changming Sun
b461f06a15
Remove a hack in adjust_global_compile_flags.cmake (#23313)
### Description

Remove a hack in adjust_global_compile_flags.cmake because the issue
should have been resolved.
2025-01-10 18:05:43 -08:00
Changming Sun
1ce59577d5
Add VCPKG triplet files (#23298)
Add VCPKG triplet files. All the triplet files are automatically
generated by gen.py. The files are checked in for ease of use.
2025-01-09 16:18:51 -08:00
Changming Sun
0ec2171b9f
Update Linux docker images (#23244)
The new images contain the following updates:

1. Added Git, Ninja and VCPKG to all docker images
2. Updated CPU containers' GCC version from 12 to 14
3. Pinned CUDA 12 images' cuDNN version to 9.5 (the latest is 9.6)
4. Addressed container supply-chain warnings by building CUDA 12 images
from scratch (avoiding Nvidia's prebuilt images)
5. Updated manylinux commit id to
75aeda9d18eafb323b00620537c8b4097d4bef48

Also, this PR updates some source code to make the CPU EP compatible
with GCC 14.
2025-01-09 10:20:33 -08:00
PARK DongHa
5b9c968eaa
Correct ONNX and Protobuf version in vcpkg build (#23285)
### Description

Changes vcpkg manifest and configuration file (vcpkg.json &
vcpkg-configuration.json)

* Update vcpkg version to
https://github.com/microsoft/vcpkg/releases/tag/2024.12.16
* Use protobuf 3.21.12 (= `v21.12`) to sync with
[cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt)
  * Resolve https://github.com/microsoft/onnxruntime/issues/22750
* Add `onnx` to vcpkg manifest so `find_package(ONNX)` and
`find_dependency(Protobuf)` can work as expected.
  * Currently, it uses 1.16.2
* v1.17.0 will become available after
https://github.com/microsoft/vcpkg/pull/42942

However, `onnx` in vcpkg doesn't configure the
`ONNX_DISABLE_STATIC_REGISTRATION` build option.

* https://github.com/microsoft/vcpkg/pull/38879
* Create a "cmake/vcpkg-triplets/" folder and triplet files that use
`VCPKG_CMAKE_CONFIGURE_OPTIONS` to set the option
* This requires the `VCPKG_OVERLAY_TRIPLETS` environment variable in CI
steps, which is a bit inconvenient. I will try to find a simpler way to
get the same result.
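For reference, wiring the overlay into a CI step is a one-liner (the relative path here assumes the command runs from the repository root):

```shell
# Make vcpkg pick up the custom triplets from cmake/vcpkg-triplets/.
export VCPKG_OVERLAY_TRIPLETS="$PWD/cmake/vcpkg-triplets"
echo "overlay triplets dir: $VCPKG_OVERLAY_TRIPLETS"
```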

### Motivation and Context

* Help #23158 
  * "ONNX is not consumed from vcpkg"
* "Mismatched protobuf versions. When vcpkg is enabled, we should not
fetch protoc from GitHub, which may cause version mismatches."
* https://github.com/microsoft/vcpkg/pull/43126
* #21348
2025-01-08 12:25:17 -08:00
Changming Sun
69bb53db85
Enable delay loading hooker for python packages (#23227)
### Description
Enable delay loading hooker for python packages
2024-12-31 10:12:31 -08:00
liqun Fu
a9a881cc98
Integrate onnx 1.17.0 (#21897)
### Description
For the ORT 1.21.0 release.

Created the following issues to track tests skipped due to updated
ONNX operators in the ONNX 1.17.0 release:
https://github.com/microsoft/onnxruntime/issues/23162
https://github.com/microsoft/onnxruntime/issues/23164
https://github.com/microsoft/onnxruntime/issues/23163
https://github.com/microsoft/onnxruntime/issues/23161


---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Yifan Li <109183385+yf711@users.noreply.github.com>
Co-authored-by: yf711 <yifanl@microsoft.com>
2024-12-24 09:02:02 -08:00
Yulong Wang
6806174096
fix webgpu delay load test (#23157)
### Description

This change fixes the WebGPU delay load test.


<details>
<summary>Fix UB in macro</summary>

The following C++ code outputs `2, 1` in MSVC, while it outputs `1, 1`
in GCC:

```c++
#include <iostream>

#define A 1
#define B 1

#define ENABLE defined(A) && defined(B)

#if ENABLE
int x = 1;
#else
int x = 2;
#endif

#if defined(A) && defined(B)
int y = 1;
#else
int y = 2;
#endif

int main()
{
    std::cout << x << ", " << y << "\n";
}
```

Clang reports `macro expansion producing 'defined' has undefined
behavior [-Wexpansion-to-defined]`.

</details>

<details>
<summary>Fix condition of build option
onnxruntime_ENABLE_DELAY_LOADING_WIN_DLLS</summary>

Delay load is explicitly disabled when the Python binding is being built.
This change modifies the condition.

</details>
2024-12-20 13:37:12 -08:00
Changming Sun
fcc34da5e9
Fix a tiny problem in winml.cmake (#23173)
### Description
CMake's
[target_link_libraries](https://cmake.org/cmake/help/latest/command/target_link_libraries.html#id2)
function accepts plain library names (like `re2`), target names (like
`re2::re2`), or some other kinds of names. Plain library names are
old-fashioned and supported for compatibility only; we should use target names.

### Motivation and Context
To make vcpkg work with winml build. See #23158
2024-12-20 11:48:43 -08:00
Yifan Li
d9d07ad8ae
[TensorRT EP] support TensorRT 10.7-GA (#23011)
### Description
<!-- Describe your changes. -->
Update CIs to TRT10.7

2024-12-19 10:39:15 -08:00
Yulong Wang
8680244ebc
Fix delay load for WebGPU EP and DML EP (#23111)
### Description

This change fixes the DLL delay load problem for the WebGPU EP and
DirectML EP. See detailed explanation below.

### Problem

When onnxruntime.dll uses delay loading for its dependencies, the
dependencies are loaded using `LoadLibraryEx()`, which searches the
directory of the process (.exe) instead of that of this library
(onnxruntime.dll). This is a problem for the Node.js and Python
bindings, because Windows will try to find the dependencies in the
directory of node.exe or python.exe, which is not the directory of
onnxruntime.dll.

There was a previous attempt to fix this by loading DirectML.dll during
the initialization of the onnxruntime Node.js binding, which works for
the DML EP but is not a good solution because it does not really "delay"
the load.

For WebGPU, the situation became worse because webgpu_dawn.dll depends
on dxil.dll and dxcompiler.dll, which are explicitly dynamically loaded
in the code using `LoadLibraryA()`. This has the same problem of the DLL
search.

### Solutions

For onnxruntime.dll loading its direct dependencies, it can be resolved
by setting the [`__pfnDliNotifyHook2`
hook](https://learn.microsoft.com/en-us/cpp/build/reference/understanding-the-helper-function?view=msvc-170#structure-and-constant-definitions)
to load from an absolute path constructed from the onnxruntime.dll
folder and the DLL name.

For webgpu_dawn.dll loading dxil.dll and dxcompiler.dll, since they are
explicitly loaded in the code, the hook does not work. Instead, it can
be resolved by ~~using the Win32 API `SetDllDirectory()` to add the
onnxruntime.dll folder to the search path~~ preloading the 2 DLLs from
the onnxruntime.dll folder.
2024-12-19 10:23:48 -08:00
Yulong Wang
3a0b958586
add 2 CMake build options of Dawn (#23096)
### Description

This change adds the following CMake build options for Dawn:
- onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY
  - OFF by default
  - when enabled, builds Dawn as a monolithic library (webgpu_dawn.dll)
- onnxruntime_ENABLE_DAWN_BACKEND_VULKAN
  - OFF by default
  - when enabled, builds with the Vulkan backend for Dawn on Windows
- onnxruntime_ENABLE_DAWN_BACKEND_D3D12
  - ON by default
  - when enabled, builds with the DirectX 12 backend for Dawn on Windows



### File Size Comparison (Windows)

| Build | cmdline | File Size |
|---|---|---|
| Baseline | --config Release<br/> --build_shared_lib | `12,755,456 onnxruntime.dll` |
| WebGPU D3D12 (default) | --use_webgpu<br/> --config Release<br/> --build_shared_lib | `17,082,368 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`18,708,480 onnxruntime.dll` |
| WebGPU D3D12+Vulkan | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_ENABLE_DAWN_BACKEND_D3D12=1<br/> onnxruntime_ENABLE_DAWN_BACKEND_VULKAN=1 | `17,081,344 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`19,388,416 onnxruntime.dll` |
| WebGPU Vulkan | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_ENABLE_DAWN_BACKEND_D3D12=0<br/> onnxruntime_ENABLE_DAWN_BACKEND_VULKAN=1 | `17,615,872 onnxruntime.dll` |
| Monolithic | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY=1 | `17,082,368 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`13,277,696 onnxruntime.dll`<br/>` 5,616,640 webgpu_dawn.dll` |
| External Dawn | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_USE_EXTERNAL_DAWN=1<br/> --skip_tests | `17,081,344 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`13,277,184 onnxruntime.dll` |
2024-12-13 16:05:48 -08:00
Dmitri Smirnov
890a719c91
Remove deprecated static from Eigen that contributes to size increase (#23084)
### Description
<!-- Describe your changes. -->
This patches the Eigen source to remove an unused deprecated static variable.

### Motivation and Context
Internal customer request.
2024-12-12 10:19:47 -08:00
Ankit Maheshkar
1f88284f96
OVEP 1.21.0 Development Updates (#23080)
### Description
OVEP development changes for ORT 1.21 Release
 
 
### Motivation and Context
- Has Critical Bug Fixes
- Improved Performance optimizations for both memory & inference latency
(https://github.com/intel/onnxruntime/pull/513)
- Enabled Model Compilation using NPUW
(https://github.com/intel/onnxruntime/pull/508)
- Fixed support for EPContext embed mode 0 for lower memory utilization
- Updated NuGet package name as `Intel.ML.OnnxRuntime.OpenVino`
- Fixed QDQ Stripping logic on NPU
2024-12-11 22:26:32 -08:00
Edward Chen
fa6ad202aa
Minor updates to onnxruntime_java.cmake (#23068)
- Use `ANDROID` instead of `CMAKE_SYSTEM_NAME STREQUAL "Android"`.
- Put common gradle arguments into `COMMON_GRADLE_ARGS` to make them easier to reuse.
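The first bullet can be sketched as follows (surrounding code is illustrative, not the exact cmake file contents):

```cmake
# Before: string comparison against the system name
if(CMAKE_SYSTEM_NAME STREQUAL "Android")
  # ...
endif()

# After: CMake's built-in ANDROID variable expresses the same check
if(ANDROID)
  # ...
endif()
```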
2024-12-10 15:44:36 -08:00
Misha Chornyi
bf4d3e1a5b
Update vcpkg.json - lock flatbuffer version (#23046)
### Description
Locking the flatbuffers version introduced in:

03ea5dc495/onnxruntime/core/flatbuffers/schema/ort_training_checkpoint.fbs.h (L11-L13)

### Motivation and Context
Resolves an issue for versions `>=1.20`:
https://github.com/microsoft/onnxruntime/issues/22666
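For reference, pinning a port in a vcpkg manifest uses an `overrides` entry plus a baseline. A hedged sketch of the shape of such a `vcpkg.json` (the version and baseline values here are illustrative, not the exact values from this PR):

```json
{
  "name": "onnxruntime",
  "version-string": "1.21.0",
  "builtin-baseline": "<vcpkg-registry-commit-sha>",
  "dependencies": [ "flatbuffers" ],
  "overrides": [
    { "name": "flatbuffers", "version": "23.5.26" }
  ]
}
```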
2024-12-10 11:23:01 -08:00
Jing Fang
bd5a759d0c
[ARM CPU] Add rotary embedding fp16 kernel (#23013)
### Description
Add fp16 kernel to rotary embedding to boost performance.


### Motivation and Context
Part of performance optimization work for group query attention
2024-12-06 13:25:48 -08:00
Yulong Wang
a615bd6688
Bump version of Dawn to 12a3b24c4 (#23002)
### Description

Upgrade version of Dawn.

Removed dawn.patch, because all patches are now included upstream.

Updated code affected by API changes (`const char*` ->
`WGPUStringView`).


2024-12-04 09:47:16 -08:00
Yulong Wang
e84b8e7bd5
allow specify a custom local source path for Dawn (#22999)
### Description

Allows building ONNX Runtime with a custom local path to Dawn's source
code.

Usage:
```sh
build --use_webgpu --cmake_extra_defines "onnxruntime_CUSTOM_DAWN_SRC_PATH=C:/src/dawn"
```
2024-12-03 19:25:22 -08:00
Kee
8c52fa3924
[VSINPU]Split/Pad and some element-wise OPs support (#22916)
### Description
- Add split/pad/neg/not/ceil/round/min/max op support
- Fix conv2d op default pads value issue
- Add VSINPU EP support to the Python bindings


### Motivation and Context
- New op support for the VSINPU EP

---------

Signed-off-by: Kee <xuke537@hotmail.com>
2024-12-02 13:57:30 -08:00
Aleksei Nikiforov
f6e1d44829
Add option to force generic algorithms on x86 (#22917)
The option is named `onnxruntime_FORCE_GENERIC_ALGORITHMS`.

Follow up to https://github.com/microsoft/onnxruntime/pull/22125.

### Description
This change adds a compile-time option to disable optimized algorithms
and use generic algorithms (excluding AVX* and SSE implementations in
GEMM, for example) on x86. This new option is intended only for testing
these algorithms, not for production use.

The following build command on Linux x86_64 builds onnxruntime with the
new option enabled:
`./build.sh --parallel --cmake_extra_defines
onnxruntime_FORCE_GENERIC_ALGORITHMS=1`

### Motivation and Context
This change allows testing generic algorithms. This may be needed for
platforms which don't have optimized implementations available, like in
https://github.com/microsoft/onnxruntime/pull/22125.
2024-11-21 13:45:46 -08:00
Changming Sun
13346fdf18
Cleanup code (#22827)
### Description
1. Delete the TVM EP because it is no longer maintained.
2. Delete ORTModule-related Docker files and scripts.
2024-11-19 14:13:33 -08:00
Jing Fang
c73a3d1804
[ARM] MatMulNBits fp16 support - connect kernels (#22856)
### Description
A breakdown PR of https://github.com/microsoft/onnxruntime/pull/22651



2024-11-15 14:59:11 -08:00
Po-Wei (Vincent)
bbe7c87738
Fix 1.20 cuda minimal build failure (#22751)
### Description
Fixes the build failure for the CUDA minimal build.




### Motivation and Context
[This change](https://github.com/microsoft/onnxruntime/pull/19470) in
1.20 is causing build failures for the CUDA minimal build.
Essentially, some cuDNN logic was not guarded by `USE_CUDA_MINIMAL`.
The build also looks for cuDNN even though the CUDA minimal build
shouldn't depend on it, resulting in a linking error.


cc @gedoensmax @chilo-ms
2024-11-15 10:50:55 -08:00
Preetha Veeramalai
ac9c135b95
Ovep develop 1.21 (#22824)
### Description
OVEP development changes for ORT 1.21 Release


### Motivation and Context
- Has critical bug fixes
- Enables support for concurrent execution of models
- Support for OpenVINO 2024.5
- Memory optimizations for the NPU platform

---------

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
2024-11-14 20:10:07 -08:00
Jing Fang
c02b398980
[ARM] MatMulNBits Fp16 support - API change only (#22826)
### Description
A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651
Op API change only.
- add templates to functions and classes that support fp32 and fp16
- rename functions, classes and files that support fp32 and fp16 from
SQNBxxx to QNBxxx


2024-11-14 10:38:59 -08:00