onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-21 02:18:09 +00:00

Author	SHA1	Message	Date
shaoboyan091	5ef18328bf	[WebGPU] Support PIX Capture for WebGPU EP (#23192 ) PIX Capture tool requires 'present' to end a frame capture. ORT doesn't have rendering work so no 'present' happens. To avoid endless waiting for PIX capture tool, this PR added a blank surface and 'present' on it in each session run. The surface is created in WebGPU ep constructor and closed in WebGPU ep destructor. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2025-02-08 02:05:15 -08:00
Changming Sun	5f6a3158f8	Enable VCPKG in CI build (#23426 ) ### Description 1. Enable VCPKG flag in Windows CPU CI build pipelines. 2. Increased the min supported cmake version from 3.26 to 3.28. Because of it, drop the support for the old way of finding python by "find_package(PythonLibs)". Therefore, in build.py we no longer set "PYTHON_EXECUTABLE" cmake var when doing cmake configure. 3. Added "xnnpack-ep" as a feature for ORT's vcpkg config. 4. Added asset cache support for ORT's vcpkg build 5. Added VCPKG triplet files for Android build. 6. Set VCPKG triplet to "universal2-osx" if CMAKE_OSX_ARCHITECTURES was found in cmake extra defines. 7. Removed a small piece of code in build.py, which was for support CUDA version < 11.8. 8. Fixed an issue that CMAKE_OSX_ARCHITECTURES sometimes got specified twice when build.py invoked cmake. 9. Added more model tests to Android build. After this change, we will test all ONNX versions instead of just the latest one. 10. Fixed issues that are related to build.py's "--build_nuget" parameter. Also, enable the flag in most Windows CPU CI build jobs. 11. Removed a restriction in build.py that disallowed cross-compiling Windows ARM64 nuget package on Windows x86. ### Motivation and Context Adopt vcpkg.	2025-02-05 10:58:53 -08:00
Changming Sun	ead9d5cf43	Set ANDROID_USE_LEGACY_TOOLCHAIN_FILE to false (#23544 ) NDK has two toolchain cmake files as you can see in https://android.googlesource.com/platform/ndk/+/refs/heads/main/build/cmake By default NDK use the legacy one for providing the best compatibility. We don't need to. This PR changes to use the new one. The new toolchain cmake file uses standard cmake flags like CMAKE_ANDROID_RTTI to control C++ features.	2025-01-30 16:10:09 -08:00
Karim Vadsariya	655a23ff1d	[onnxruntime/build] Add new flag enable_generic_interface to build primary EPs by default (#23342 ) ### Description - Add new build flag in build.py to build onnxruntime.dll supporting interfaces for all primary EPs( QNN, TensoRT, OpenVino, VitisAI). - Modify onnxruntime.dll/onnxruntime_shared.dll build settings to remove dependency of IHV SDK Toolset to be installed on the system. - Change CMake variables to be explicit when building EP vs ORT. e.g. onnxruntime_USE_TENSORRT vs onnxruntime_USE_TENSORRT_INTERFACE, to evolve the build system to build ORT independent of EPs. ### Motivation and Context Changes in the build system required to evolve the repo to build the components independently while removing unnecessary dependencies --------- Co-authored-by: Lei Cao <jslhcl@gmail.com> Co-authored-by: Karim Vadsariya <kvadsariya@microsoft.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2025-01-28 15:24:09 -08:00
Adrian Lizarraga	3b4c7df4e9	[QNN EP] Make QNN EP a shared library (#23120 ) ### Description - Makes QNN EP a shared library by default when building with `--use_qnn` or `--use_qnn shared_lib`. Generates the following build artifacts: - Windows: `onnxruntime_providers_qnn.dll` and `onnxruntime_providers_shared.dll` - Linux: `libonnxruntime_providers_qnn.so` and `libonnxruntime_providers_shared.so` - Android: Not supported. Must build QNN EP as a static library. - Allows QNN EP to still be built as a static library with `--use_qnn static_lib`. This is primarily for the Android QNN AAR package. - Unit tests run for both the static and shared QNN EP builds. ### Detailed changes - Updates Java bindings to support both shared and static QNN EP builds. - Provider bridge API: - Adds logging sink ETW to the provider bridge. Allows EPs to register ETW callbacks for ORT logging. - Adds a variety of methods for onnxruntime objects that are needed by QNN EP. - QNN EP: - Adds `ort_api.h` and `ort_api.cc` that encapsulates the API provided by ORT in a manner that allows the EP to be built as either a shared or static library. - Adds custom function to transpose weights for Conv and Gemm (instead of adding util to provider bridge API). - Adds custom function to quantize data for LeakyRelu (instead of adding util to provider bridge API). - Adds custom ETW tracing for QNN profiling events: - shared library: defines its own TraceLogging provider handle - static library: uses ORT's TraceLogging provider handle and existing telemetry provider. - ORT-QNN Packages: - Python: Pipelines build QNN EP as a shared library by default. User can build a local python wheel with QNN EP as a static library by passing `--use_qnn static_lib`. - NuGet: Pipelines build QNN EP as a shared library by default. `build.py` currently enforces QNN EP to be built as a shared library. Can add support for building a QNN NuGet package with static later if deemed necessary. - Android: Pipelines build QNN EP as a static library. `build.py` enforces QNN EP to be built as a static library. Packaging multiple shared libraries into an Android AAR package is not currently supported due to the added need to also distribute a shared libcpp.so library. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2025-01-22 12:11:00 -08:00
Justin Chu	ad312d9677	Enable comprehension simplification in ruff rules (#23414 ) Enable comprehension simplification rules (C4) for ruff and apply autofix.	2025-01-17 08:43:06 -08:00
Justin Chu	09c4cc7b36	Target py310 and modernize codebase with ruff (#23401 ) Change `target-version = "py310"` and modernize the code base with ruff.	2025-01-16 19:10:14 -08:00
Justin Chu	c7c8757a1c	Use ruff as the formatter to replace black-isort (#23397 ) Use ruff as the code formatter in place of black and isort since it is much faster, and as projects like PyTorch and ONNX have adopted ruff format as well. This PR include only auto-fixed changes in formatting.	2025-01-16 11:14:15 -08:00
Ted Themistokleous	7cd08a6004	[MigraphX EP] [ROCm EP] Upstream ROCm changes for bugfixes and features (#23249 ) Add support to mainline Onnxruntime of changes from the ROCm Team's changes ### Motivation and Context Various bugfixes, and changes added between ROCm 6.2 and 6.3 that haven't been upstreamed yet to mainline --------- Co-authored-by: Yueqing Zhang <yuz75@Pitt.edu> Co-authored-by: Yueqing Zhang <yueqingz@amd.com> Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com> Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com> Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com> Co-authored-by: ikalinic <ilija.kalinic@amd.com> Co-authored-by: sstamenk <sstamenk@amd.com>	2025-01-15 12:57:04 -08:00
dependabot[bot]	1461a16e71	Bump ruff from 0.5.4 to 0.9.1 (#23328 ) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.4 to 0.9.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/astral-sh/ruff/releases">ruff's releases</a>.</em></p> <blockquote> <h2>0.9.1</h2> <h2>Release Notes</h2> <h3>Preview features</h3> <ul> <li>[<code>pycodestyle</code>] Run <code>too-many-newlines-at-end-of-file</code> on each cell in notebooks (<code>W391</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15308">#15308</a>)</li> <li>[<code>ruff</code>] Omit diagnostic for shadowed private function parameters in <code>used-dummy-variable</code> (<code>RUF052</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15376">#15376</a>)</li> </ul> <h3>Rule changes</h3> <ul> <li>[<code>flake8-bugbear</code>] Improve <code>assert-raises-exception</code> message (<code>B017</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15389">#15389</a>)</li> </ul> <h3>Formatter</h3> <ul> <li>Preserve trailing end-of line comments for the last string literal in implicitly concatenated strings (<a href="https://redirect.github.com/astral-sh/ruff/pull/15378">#15378</a>)</li> </ul> <h3>Server</h3> <ul> <li>Fix a bug where the server and client notebooks were out of sync after reordering cells (<a href="https://redirect.github.com/astral-sh/ruff/pull/15398">#15398</a>)</li> </ul> <h3>Bug fixes</h3> <ul> <li>[<code>flake8-pie</code>] Correctly remove wrapping parentheses (<code>PIE800</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15394">#15394</a>)</li> <li>[<code>pyupgrade</code>] Handle comments and multiline expressions correctly (<code>UP037</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15337">#15337</a>)</li> </ul> <h2>Contributors</h2> <ul> <li><a href="https://github.com/AntoineD"><code>@AntoineD</code></a></li> <li><a href="https://github.com/InSyncWithFoo"><code>@InSyncWithFoo</code></a></li> <li><a href="https://github.com/MichaReiser"><code>@MichaReiser</code></a></li> <li><a href="https://github.com/calumy"><code>@calumy</code></a></li> <li><a href="https://github.com/dcreager"><code>@dcreager</code></a></li> <li><a href="https://github.com/dhruvmanila"><code>@dhruvmanila</code></a></li> <li><a href="https://github.com/dylwil3"><code>@dylwil3</code></a></li> <li><a href="https://github.com/sharkdp"><code>@sharkdp</code></a></li> <li><a href="https://github.com/tjkuson"><code>@tjkuson</code></a></li> </ul> <h2>Install ruff 0.9.1</h2> <h3>Install prebuilt binaries via shell script</h3> <pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.sh \| sh </code></pre> <h3>Install prebuilt binaries via powershell script</h3> <pre lang="sh"><code>powershell -ExecutionPolicy ByPass -c "irm https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.ps1 \| iex" </code></pre> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's changelog</a>.</em></p> <blockquote> <h2>0.9.1</h2> <h3>Preview features</h3> <ul> <li>[<code>pycodestyle</code>] Run <code>too-many-newlines-at-end-of-file</code> on each cell in notebooks (<code>W391</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15308">#15308</a>)</li> <li>[<code>ruff</code>] Omit diagnostic for shadowed private function parameters in <code>used-dummy-variable</code> (<code>RUF052</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15376">#15376</a>)</li> </ul> <h3>Rule changes</h3> <ul> <li>[<code>flake8-bugbear</code>] Improve <code>assert-raises-exception</code> message (<code>B017</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15389">#15389</a>)</li> </ul> <h3>Formatter</h3> <ul> <li>Preserve trailing end-of line comments for the last string literal in implicitly concatenated strings (<a href="https://redirect.github.com/astral-sh/ruff/pull/15378">#15378</a>)</li> </ul> <h3>Server</h3> <ul> <li>Fix a bug where the server and client notebooks were out of sync after reordering cells (<a href="https://redirect.github.com/astral-sh/ruff/pull/15398">#15398</a>)</li> </ul> <h3>Bug fixes</h3> <ul> <li>[<code>flake8-pie</code>] Correctly remove wrapping parentheses (<code>PIE800</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15394">#15394</a>)</li> <li>[<code>pyupgrade</code>] Handle comments and multiline expressions correctly (<code>UP037</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15337">#15337</a>)</li> </ul> <h2>0.9.0</h2> <p>Check out the <a href="https://astral.sh/blog/ruff-v0.9.0">blog post</a> for a migration guide and overview of the changes!</p> <h3>Breaking changes</h3> <p>Ruff now formats your code according to the 2025 style guide. As a result, your code might now get formatted differently. See the formatter section for a detailed list of changes.</p> <p>This release doesn’t remove or remap any existing stable rules.</p> <h3>Stabilization</h3> <p>The following rules have been stabilized and are no longer in preview:</p> <ul> <li><a href="https://docs.astral.sh/ruff/rules/stdlib-module-shadowing/"><code>stdlib-module-shadowing</code></a> (<code>A005</code>). This rule has also been renamed: previously, it was called <code>builtin-module-shadowing</code>.</li> <li><a href="https://docs.astral.sh/ruff/rules/builtin-lambda-argument-shadowing/"><code>builtin-lambda-argument-shadowing</code></a> (<code>A006</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/slice-to-remove-prefix-or-suffix/"><code>slice-to-remove-prefix-or-suffix</code></a> (<code>FURB188</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/boolean-chained-comparison/"><code>boolean-chained-comparison</code></a> (<code>PLR1716</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/decimal-from-float-literal/"><code>decimal-from-float-literal</code></a> (<code>RUF032</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/post-init-default/"><code>post-init-default</code></a> (<code>RUF033</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/useless-if-else/"><code>useless-if-else</code></a> (<code>RUF034</code>)</li> </ul> <p>The following behaviors have been stabilized:</p> <ul> <li><a href="https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/"><code>pytest-parametrize-names-wrong-type</code></a> (<code>PT006</code>): Detect <a href="https://docs.pytest.org/en/7.1.x/how-to/parametrize.html#parametrize"><code>pytest.parametrize</code></a> calls outside decorators and calls with keyword arguments.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`12f86f39a4`"><code>12f86f3</code></a> Ruff 0.9.1 (<a href="https://redirect.github.com/astral-sh/ruff/issues/15407">#15407</a>)</li> <li><a href="`2b28d566a4`"><code>2b28d56</code></a> Associate a trailing end-of-line comment in a parenthesized implicit concaten...</li> <li><a href="`adca7bd95c`"><code>adca7bd</code></a> Remove pygments pin (<a href="https://redirect.github.com/astral-sh/ruff/issues/15404">#15404</a>)</li> <li><a href="`6b98a26452`"><code>6b98a26</code></a> [red-knot] Support <code>assert_type</code> (<a href="https://redirect.github.com/astral-sh/ruff/issues/15194">#15194</a>)</li> <li><a href="`c87463842a`"><code>c874638</code></a> [red-knot] Move tuple-containing-Never tests to Markdown (<a href="https://redirect.github.com/astral-sh/ruff/issues/15402">#15402</a>)</li> <li><a href="`c364b586f9`"><code>c364b58</code></a> [<code>flake8-pie</code>] Correctly remove wrapping parentheses (<code>PIE800</code>) (<a href="https://redirect.github.com/astral-sh/ruff/issues/15394">#15394</a>)</li> <li><a href="`73d424ee5e`"><code>73d424e</code></a> Fix outdated doc for handling the default file types with the pre-commit hook...</li> <li><a href="`6e9ff445fd`"><code>6e9ff44</code></a> Insert the cells from the <code>start</code> position (<a href="https://redirect.github.com/astral-sh/ruff/issues/15398">#15398</a>)</li> <li><a href="`f2c3ddc5ea`"><code>f2c3ddc</code></a> [red-knot] Move intersection type tests to Markdown (<a href="https://redirect.github.com/astral-sh/ruff/issues/15396">#15396</a>)</li> <li><a href="`b861551b6a`"><code>b861551</code></a> Remove unnecessary backticks (<a href="https://redirect.github.com/astral-sh/ruff/issues/15393">#15393</a>)</li> <li>Additional commits viewable in <a href="https://github.com/astral-sh/ruff/compare/0.5.4...0.9.1">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.5.4&new-version=0.9.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-01-15 11:11:17 -08:00
Changming Sun	1ce59577d5	Add VCPKG triplet files (#23298 ) Add VCPKG triplet files. All the triplet files are automatically generated by gen.py. Put the files there to ease use.	2025-01-09 16:18:51 -08:00
Yifan Li	a3bb3f1487	[TensorRT EP] New CIs to test TRT+minimal CUDA build (#23028 ) ### Description <!-- Describe your changes. --> New CI: [Linux_TRT_Minimal_CUDA_Test_CI](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=230&_a=summary) and [Win_TRT_Minimal_CUDA_Test_CI ](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=231) Setting config for new CI to monitor if there's no issue to build ORT-TRTEP with minimal CUDA * yaml content is following Linux TRT CI yaml, with different build arg/cache name * build arg is following [[TensorRT EP] Enable a minimal CUDA EP compilation without kernels](https://github.com/microsoft/onnxruntime/pull/19052#issuecomment-1888066851) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Monitor if user is able to build ORT-TRTEP-minimalCUDA without any blocker (which takes ~30min to build)	2024-12-19 10:30:39 -08:00
Ankit Maheshkar	1f88284f96	OVEP 1.21.0 Development Updates (#23080 ) ### Description OVEP development changes for ORT 1.21 Release ### Motivation and Context - Has Critical Bug Fixes - Improved Performance optimizations for both memory & inference latency (https://github.com/intel/onnxruntime/pull/513) - Enabled Model Compilation using NPUW (https://github.com/intel/onnxruntime/pull/508) - Fixed support for EPContext embed mode 0 for lower memory utilization - Updated NuGet package name as `Intel.ML.OnnxRuntime.OpenVino` - Fixed QDQ Stripping logic on NPU	2024-12-11 22:26:32 -08:00
A-Satti	b14b4ec703	Restore Qspectre flag (#23060 ) Restore a removed Qspectre flag and update comment ### Motivation and Context Adjustment for PR `f5293d253c`	2024-12-09 21:52:21 -08:00
A-Satti	f5293d253c	Update Intel Thread Counts (#22894 ) ### Description The default thread count methodology by onnxruntime did not account for new upcoming Intel microarchitectures leading to a suboptimal thread count. Optimizing the thread count for new Intel microarchitectures reveal gains on the majority of models across datatypes and shows gains up to ~1.5x speedup. ### Motivation and Context Applications should run on Intel with the most performant thread configuration for the majority of models. With new microarchitectures, adjusting the thread count methodology is required to take advantage of their differences. <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-12-06 13:56:50 -08:00
Tianlei Wu	c97dd6e3c1	Update transformers test requirements (#22911 ) ### Description * Install PyTorch for transformers tests. The installation is before python tests so that it can use torch if needed. * Update protobuf and numpy versions used in transformers test. ### Motivation and Context Currently, transformers tests are enabled in the following CI pipelines: * Linux CPU CI Pipeline (torch for cpu-only) * Linux GPU CI Pipeline (torch for cuda 12) * Windows GPU CUDA CI Pipeline (torch for cpu-only right now, note that we might change it to torch for cuda 12 in the future). For ROCm CI Pipeline, transformer tests are enabled but skipped since onnx package is not installed in CI. Previously, torch was not installed before python tests, so some tests depending on torch were skipped like [test_bind_onnx_types_not_supported_by_numpy](`f6e1d44829/onnxruntime/test/python/onnxruntime_test_python_iobinding.py (L199)`) or [test user_compute_stream](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python.py#L465-L476). In this PR, we changed build.py to install torch before running python tests.	2024-11-22 09:45:12 -08:00
Tianlei Wu	8d99b1a8dc	reduce GQA test combinations (#22918 ) ### Description * Reduce GQA test combinations to save about 35 minutes test time in CI pipelines. * Show latency of transformers tests * Use seed in DMMHA test to avoid random failure. * For test_flash_attn_rocm.py, test skipping condition from "has cuda ep" to "not has rocm ep", so that it does not run in cpu build. * For test_flash_attn_cuda.py, move flash attention and memory efficient attention tests to different classes, so that we can skip a test suite instead of checking in each test. ### Motivation and Context It takes too long to run GQA tests in CI pipelines since there are too many combinations. ###### Linux GPU CI Pipeline Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34) After: 150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50) Time Saved: 1424 seconds (0:23:44) ###### Windows GPU CUDA CI Pipeline Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05) After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35) Time Saved: 330 seconds (0:05:30) ###### Linux CPU CI Pipeline Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47) - 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past - 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past - 26.45s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33) - 0.97s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past - 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past - 2.41s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch Time Saved: 374 seconds (0:06:14).	2024-11-21 12:26:46 -08:00
Changming Sun	13346fdf18	Cleanup code (#22827 ) ### Description 1. Delete TVM EP because it is out of maintain 2. Delete ortmodule related docker files and scripts.	2024-11-19 14:13:33 -08:00
Tianlei Wu	72186bbb71	[CUDA] Build nhwc ops by default (#22648 ) ### Description * Build cuda nhwc ops by default. * Deprecate `--enable_cuda_nhwc_ops` in build.py and add `--disable_cuda_nhwc_ops` option Note that it requires cuDNN 9.x. If you build with cuDNN 8, NHWC ops will be disabled automatically. ### Motivation and Context In general, NHWC is faster than NCHW for convolution in Nvidia GPUs with Tensor Cores, and this could improve performance for vision models. This is the first step to prefer NHWC for CUDA in 1.21 release. Next step is to do some tests on popular vision models. If it help in most models and devices, set `prefer_nhwc=1` as default cuda provider option.	2024-11-06 09:54:55 -08:00
Yulong Wang	7a8fa12850	Add implementation of WebGPU EP (#22591 ) ### Description This PR adds the actual implementation of the WebGPU EP based on https://github.com/microsoft/onnxruntime/pull/22318. This change includes the following: <details> <summary><b>core framework of WebGPU EP</b></summary> - WebGPU EP factory classes for: - handling WebGPU options - creating WebGPU EP instance - creating WebGPU context - WebGPU Execution Provider classes - GPU Buffer allocator - data transfer - Buffer management classes - Buffer Manager - BufferCacheManager - DisabledCacheManager - SimpleCacheManager - LazyReleaseCacheManager - BucketCacheManager - Program classes - Program (base) - Program Cache Key - Program Manager - Shader helper classes - Shader Helper - ShaderIndicesHelper - ShaderVariableHelper - Utils - GPU Query based profiler - compute context - string utils - Miscs - Python binding webgpu support (basic) </details> <details> <summary><b>Kernel implementation</b></summary> - onnx.ai (default opset): - Elementwise (math): Abs, Neg, Floor, Ceil, Reciprocal, Sqrt, Exp, Erf, Log, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh, Tanh, Not, Cast - Elementwise (activation): Sigmoid, HardSigmoid, Clip, Elu, Relu, LeakyRelu, ThresholdedRelu, Gelu - Binary (math): Add, Sub, Mul, Div, Pow, Equal, Greater, GreaterOrEqual, Less, LessOrEqual - (Tensors): Shape, Reshape, Squeeze, Unsqueeze - Where - Transpose - Concat - Expand - Gather - Tile - Range - LayerNormalization - com.microsoft - FastGelu - MatMulNBits - MultiHeadAttention - RotaryEmbedding - SkipLayerNormalization - LayerNormalization - SimplifiedLayerNormalization - SkipSimplifiedLayerNormalization </details> <details> <summary><b>Build, test and CI pipeline integration</b></summary> - build works for Windows, macOS and iOS - support onnxruntime_test_all and python node test - added a new unit test for `--use_external_dawn` build flag. - updated MacOS pipeline to build with WebGPU support - added a new pipeline for WebGPU Windows </details> This change does not include: - Node.js binding support for WebGPU (will be a separate PR)	2024-10-29 18:29:40 -07:00
Yifan Li	951d9aa99f	[TensorRT EP] Refactor TRT version update logic & apply TRT 10.5 (#22483 ) ### Description <!-- Describe your changes. --> * Leverage template `common-variables.yml` and reduce usage of hardcoded trt_version `8391b24447/tools/ci_build/github/azure-pipelines/templates/common-variables.yml (L2-L7)` * Among all CI yamls, this PR reduces usage of hardcoding trt_version from 40 to 6, by importing trt_version from `common-variables.yml` * Apply TRT 10.5 and re-enable control flow op test ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> - Reduce usage of hardcoding trt_version among all CI ymls ### Next refactor PR will work on reducing usage of hardcoding trt_version among `.dockerfile`, `.bat` and remaining 2 yml files (download_win_gpu_library.yml & set-winenv.yml, which are step-template yaml that can't import variables)	2024-10-29 09:23:41 -07:00
Satya Kumar Jandhyala	4ed5bec2e7	[JS/WebGPU] Support WASM64 (#21836 ) ### Description Support wasm64 ### Motivation and Context Overcome memory limitations --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-10-24 20:21:51 -07:00
Changming Sun	88676e62b9	Remove nsync (#20413 ) ### Description 1. Remove the onnxruntime::OrtMutex class and replace it with ~absl::Mutex~ std::mutex. 2. After this change, most source files will not include <Windows.h> indirectly. ### Motivation and Context To reduce the number of deps we have, and address some Github issues that are related to build ONNX Runtime from source. In PR #3000 , I added a custom implementation of std::mutex . It was mainly because at that time std::mutex's default constructor was not trivial on Windows. If you had such a mutex as a global var, it could not be initialized at compile time. Then VC++ team fixed this issue. Therefore we don't need this custom implementation anymore. This PR also removes nsync. I ran several models tests on Linux. I didn't see any perf difference. This PR also reverts PR #21005 , which is no longer needed since conda has updated its msvc runtime DLL. This PR unblocks #22173 and resolves #22092 . We have a lot of open issues with nsync. This PR can resolve all of them.	2024-10-21 15:32:14 -07:00
Edward Chen	04404ea482	Fix Xcode 16 iOS build issues (#22379 ) - Work around Xcode 16 iOS test build issue: `error: Multiple commands produce '.../PlugIns'`. - Fix link error in iOS static framework test. - Update build.py to check for the right kind of build before running iOS tests on the simulator. - Update Xcode 16 build images to 'macos-15' because that's the only image that will have Xcode 16 soon. See https://github.com/actions/runner-images/issues/10703.	2024-10-14 09:24:38 -07:00
Changming Sun	6ada97c84c	Fix a build issue when statically link to MSVC Runtime (#22393 ) Yesterday I updated ABSL to a newer version which added a new cmake option: ABSL_MSVC_STATIC_RUNTIME . I wasn't aware of it. This PR fixes it.	2024-10-10 20:09:13 -07:00
Yi Zhang	25b1c38e87	Add conv fp16 kernel in xnnpack EP (#22301 ) ### Description Add FP16 kernels of Conv and ConvTranspose [AB#50186](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50186) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ---------	2024-10-10 08:48:09 +08:00
Yulong Wang	c5d28cac4d	Initial WebGPU EP checkin (#22318 ) ### Description This change introduces the WebGPU EP into ONNX Runtime. To make the PR as simple as possible, this PR excluded the following: - C API changes for WebGPU EP - actual implementation of WebGPU EP. Currently in this PR, WebGPU is a stub implementation that does not register any kernel. - Python IO Binding update - Node.js IO Binding update This PR now contains only 43 file changes (while the working branch contains 130+) and hopefully this makes it easier to review. There is going to be separated PRs for each mentioned above. Current working branch: #21904	2024-10-08 16:10:46 -07:00
jingyanwangms	d0b0ecfdb9	[Running CI] Update TensorRT to 10.4 (#22049 ) ### Description TensorRT 10.4 is GA now, update to 10.4 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-09-26 11:10:52 -07:00
George Wu	944d87381d	[QNN EP] set up py packaging pipeline for Linux x64 (#22132 ) set up a pipeline to produce nightly Linux x64 whls for onnxruntime-qnn this can be used for offline context binary generation.	2024-09-18 23:24:32 -07:00
Michael Tyler	904b850b44	Update Arm Compute Library Execution Provider (#22032 ) ### Description This PR makes the following updates to the Arm Compute Library execution provider: - Target Arm Compute Library 24.07 - Add support for the following operators: - Conv (FP16) - NhwcConv - QLinearConv - MatMul - FusedMatMul - MatMulIntegerToFloat - Optimize memory usage and performance - Expose the enable_fast_math setting - Use the main runtime thread pool ### Motivation and Context These updates improve performance and memory usage, and enable use of a more recent version of Arm Compute Library. @microsoft-github-policy-service agree company="Arm Ltd" --------- Signed-off-by: Michael Tyler <michael.tyler@arm.com>	2024-09-12 20:51:59 -07:00
PARK DongHa	f633caa0b1	Create CMake option `onnxruntime_USE_VCPKG` (#21348 ) ### Changes 1. CMake option `onnxruntime_USE_VCPKG`. It will be used in the vcpkg port * Unit test may fail because this option leads to a mixture of unexpected external library versions. Especially ONNX, Protobuf, and Flatbuffers version can be different 2. Overhaul of `onnxruntime_external_deps.cmake` * Make `FetchContent_Declare` to try `find_package`. See https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html * Relocated `FetchContent_Declare` and `FetchContent_MakeAvailable`(or `onnxruntime_fetchcontent_makeavailable`) to closer lines. It was too hard to navigate the entire file to search related sections... * Alias `IMPORTED` targets like build targets (e.g. `ONNX::onnx` --> `onnx`) ```cmake # The script uses `find_package` with the changes. # In this case, use vcpkg to search dependencies # See https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html include(external/onnxruntime_external_deps.cmake) ``` 3. Create CMakePresets.json and presets to [run vcpkg in manifest mode](https://learn.microsoft.com/en-us/vcpkg/concepts/manifest-mode) * Currently, it's NOT for training build * Main triplets are `x64-windows` and `x64-osx` ```pwsh Push-Location "cmake" cmake --preset "x64-windows-vcpkg" cmake --build --preset "x64-windows-vcpkg-debug" Pop-Location ``` ```bash pushd "cmake" cmake --preset "x64-osx-vcpkg" cmake --build --preset "x64-osx-vcpkg-debug" popd ``` 4. Updated tools/ci_build/build.py * `--use_vcpkg` option: it needs `CMAKE_TOOLCHAIN_FILE` with [vcpkg.cmake toolchain script](https://github.com/microsoft/vcpkg/blob/master/scripts/buildsystems/vcpkg.cmake) * `--compile_no_warning_as_error` is recommended because library version differences will cause unexpected compiler warnings ```bash python ./tools/ci_build/build.py \ --compile_no_warning_as_error \ --use_vcpkg \ --cmake_extra_defines "CMAKE_TOOLCHAIN_FILE:FILEPATH=${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" \ --cmake_extra_defines "VCPKG_TARGET_TRIPLET=..." ``` 5. Created Job `Vcpkg` for Windows and macOS * Show how to setup and use vcpkg. Similar to the CMakePresets.json usage ### Motivation and Context * Help #7150 * Help https://github.com/microsoft/vcpkg/pull/36850 * https://github.com/luncliff/vcpkg-registry/pull/212 * https://github.com/microsoft/vcpkg/pull/39881 * https://github.com/luncliff/vcpkg-registry/pull/215 * https://github.com/luncliff/vcpkg-registry/pull/216 * https://github.com/luncliff/vcpkg-registry/pull/227 * https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html * https://github.com/microsoft/vcpkg/blob/master/scripts/buildsystems/vcpkg.cmake ### Future Works? More feature coverage with the vcpkg supported libraries * CUDA feature support * Training feature support	2024-09-10 16:39:27 -07:00
Guenther Schmuelling	ba7baae994	Revert "Upgrade emsdk from 3.1.59 to 3.1.62" (#21817 ) Reverts microsoft/onnxruntime#21421 Users are seeing chrome memory grow to 16GB before it crashes: https://github.com/microsoft/onnxruntime/issues/21810 Revert for now so we have time to debug.	2024-08-22 11:21:00 -07:00
Satya Kumar Jandhyala	6d8de1f7b8	Upgrade emsdk from 3.1.59 to 3.1.62 (#21421 ) ### Description Upgrade EM SDK to 3.1.62. ### Motivation and Context The changes are required to clear wasm64 errors.	2024-08-14 12:38:52 -07:00
Justin Chu	c203d89958	Update ruff and clang-format versions (#21479 ) ruff -> 0.5.4 clang-format -> 18	2024-07-24 11:50:11 -07:00
mindest	5b9369e93c	Fix typos according to reviewdog report. (#21335 ) ### Description Fix typos based on reviewdog report but with some exceptions/corrections.	2024-07-22 13:37:32 -07:00
Jian Chen	4e75605eec	Replace inline pip install with pip install from requirements.txt (#21106 ) ### Description Replace inline pip install with pip install from requirements.txt ### Motivation and Context so that CG can recognize ### Dependency - [x] https://github.com/microsoft/onnxruntime/pull/21085	2024-07-22 12:39:10 -07:00
Ted Themistokleous	4ac4cd2668	Migraphx ep windows build (#21284 ) ### Description Repeat of #21084 with removal of policy CMP0144 to suppress warnings which uses CMake 3.27.0. ### Motivation and Context Already approved PR: https://github.com/microsoft/onnxruntime/pull/21084 Removed the added policy from CMake 3.27.0.	2024-07-11 21:21:38 -07:00
Baiju Meswani	116398c1a4	onnxruntime shared lib inside python package (#21223 )	2024-07-02 15:37:50 -07:00
Chen Feiyue	56b36a58ba	Initial PR for VSINPU execution provider (#20903 ) ### Description <!-- Describe your changes. --> -It is an initial PR for VSINPU execution provider ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> - For support VeriSilicon hardware - TIM-VX(Tensor Interface Module) (https://github.com/VeriSilicon/TIM-VX) is an integrated software solution by Verisilicon for our hardware(A311D/i.MX 8M Plus etc.) design, it is easy to use Verisilicon’s hardware by simply connecting onnxruntime with the TIM-VX API by this VSINPU execution provider.	2024-06-28 21:48:34 -07:00
Preetha Veeramalai	6baaaf5165	OVEP options to disable CPU fallback at compile time (#21166 ) ### Description Provide user level options to control the fallback on CPU for models not supported on Intel's NPU hardware. ### Motivation and Context - Current workflow of OVEP allows safe fallback from OV NPU to OV CPU on compilation failures. Also supports MLAS CPU fallback in presence of unsupported custom ops. - The PR provides a build-time option to disable fallback from OV NPU to OV CPU. - The session Option "kOrtSessionOptionsDisableCPUEPFallback" disables OV CPU and MLAS CPU fallback. - Also has bug fix for proto creation. --------- Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com> Co-authored-by: ankitm3k <ankit.maheshkar@intel.com>	2024-06-28 08:31:02 -07:00
Changming Sun	f5625b8858	Revert "[MIGraphX EP] enable compilation and execution on Windows (21084)" (#21132 ) ### Description This reverts commit `1d7bf56947` because it broken the AMD GPU CI pipeline. Sorry when I reviewed the PR I forgot to run the AMD GPU CI pipeline. Will revert the PR first then ask the author to fix the issue.	2024-06-21 01:01:07 -07:00
Ted Themistokleous	1d7bf56947	[MIGraphX EP] enable compilation and execution on Windows (#36 ) (#21084 )	2024-06-20 16:21:11 -07:00
Changming Sun	be423747b1	Delete pyop (#21094 ) ### Description Remove the "--enable_language_interop_ops" build flag, because the code is incompatible with the latest numpy, and the build flag is not used anywhere except a macOS CI pipeline. It does not seem to have a ship plan. ### Motivation and Context The build error was: ``` onnxruntime/core/language_interop_ops/pyop/pyop.cc:122:85: error: no member named 'elsize' in '_PyArray_Descr' static_cast<int64_t>(PyArray_DescrFromType(type)->elsize), ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ```	2024-06-19 16:21:33 -07:00
Clément Péron	8ab8e649a7	tools: build: fix typo (#21052 ) ### Description Typo in the python build script	2024-06-19 16:14:58 -07:00
Scott McKay	5fc60f36f2	Update to the net8 MAUI targets. Remove Xamarin. (#21062 ) ### Description <!-- Describe your changes. --> Xamarin is EOL so remove support. The MAUI targets are EOL and need updating. https://dotnet.microsoft.com/en-us/platform/support/policy/maui Other cleanups: - netcoreapp3.1 is EOL - the net6 macos target was added in the mistaken belief that was for MAUI mac support, but that is actually via the mac-catalyst target which we recently added support for. - some CIs that were using the old build setup of splitting pre-net6 targets. The ORT C# bindings csproj was updated last year and the `PreNet6` and `SelectedTargets` properties no longer exist as they were replaced by the simpler `IncludeMobileTargets` property. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Remove EOL components. #21058	2024-06-19 16:20:58 +10:00
Jian Chen	1ad2c0a4b2	fix Window_CI in Github Action (#21070 ) ### Description fix Window_CI in Github Action	2024-06-18 23:14:08 -07:00
Changming Sun	ffb8e8eb0e	Update build.py: add a comment (#20993 ) ### Description Update build.py: add a comment ### Motivation and Context See the comment.	2024-06-18 13:52:34 -07:00
Nikolai Svakhin	7b3fff650a	Updated build script for CUDA case (#20987 ) ### Description In CUDA case, use the cuda_home variable to set CMAKE's CUDA compiler to a correct version of NVCC Otherwise, an NVCC from a current PATH would be picked up, which could be from a different version of CUDA. ### Motivation and Context I had a case when I had main CUDA installed, and it was a version 11.8. I wanted to build against 12.5, so I downloaded and unpacked it into a separate directory and passed it as a `--cuda-home` parameter, however the ONNX builder was still picking the NVCC compiler from 11.8. This would fix the issue https://github.com/microsoft/onnxruntime/issues/20928 cc @gedoensmax	2024-06-17 14:41:43 -07:00
Changming Sun	feec8efae4	Add "-allow-unsupported-compiler" flags to Windows CUDA flags (#21004 ) ### Description Add "-allow-unsupported-compiler" flags to Windows CUDA flags. This change only impacts our pipelines. By default it would not reach this code path. ### Motivation and Context nvcc refuses working with the latest VS toolset unless this flag is set. If without this change, our CI build will fail with the compiler is the latest VS 2022 17.10. Here is the log: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1405549&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=c7e55e04-f02b-57dc-d19a-29b7d3528c44&l=715 The error message is: `D:\a\_work\_temp\v11.8\include\crt/host_config.h(153): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. [D:\a\_work\1\b\RelWithDebInfo\CMakeFiles\CMakeScratch\TryCompile-g5rudf\cmTC_7b8ff.vcxproj]`	2024-06-12 14:23:00 -07:00
Baiju Meswani	94aa21c3dd	Define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR (#21005 ) https://github.com/microsoft/STL/pull/3824 introduces constexpr mutex. An older version of msvcp140.dll will lead to ```A dynamic link library (DLL) initialization routine failed```. This error can be encountered if using conda Python since conda packages msvc dlls and these are older right now. This PR disables the constexpr mutex so that ort package can work with older msvc dlls. Thanks @snnn for the discovery.	2024-06-11 22:23:28 -07:00

1 2 3 4 5 ...

623 commits