Commit graph

623 commits

Author SHA1 Message Date
shaoboyan091
5ef18328bf
[WebGPU] Support PIX Capture for WebGPU EP (#23192)
PIX Capture tool requires 'present' to end a frame capture. ORT doesn't
have rendering work so no 'present' happens.

To avoid endless waiting for PIX capture tool, this PR added a blank
surface and 'present' on it in each session run.

The surface is created in WebGPU ep constructor and closed in WebGPU ep
destructor.

2025-02-08 02:05:15 -08:00
Changming Sun
5f6a3158f8
Enable VCPKG in CI build (#23426)
### Description
1. Enable VCPKG flag in Windows CPU CI build pipelines. 
2. Increased the minimum supported cmake version from 3.26 to 3.28. As a
result, dropped support for the old way of finding Python via
"find_package(PythonLibs)". Therefore, build.py no longer sets the
"PYTHON_EXECUTABLE" cmake var during cmake configure.
3. Added "xnnpack-ep" as a feature for ORT's vcpkg config.
4. Added asset cache support for ORT's vcpkg build
5. Added VCPKG triplet files for Android build.
6. Set VCPKG triplet to "universal2-osx" if CMAKE_OSX_ARCHITECTURES was
found in cmake extra defines.
7. Removed a small piece of code in build.py that existed to support CUDA
versions < 11.8.
8. Fixed an issue that CMAKE_OSX_ARCHITECTURES sometimes got specified
twice when build.py invoked cmake.
9. Added more model tests to Android build. After this change, we will
test all ONNX versions instead of just the latest one.
10. Fixed issues that are related to build.py's "--build_nuget"
parameter. Also, enable the flag in most Windows CPU CI build jobs.
11. Removed a restriction in build.py that disallowed cross-compiling
Windows ARM64 nuget package on Windows x86.
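
The triplet selection in item 6 can be pictured with a small sketch. This is not the code in build.py: the function name `pick_vcpkg_triplet`, the `default_triplet` argument, and the `KEY=VALUE` format assumed for the cmake extra defines are all illustrative assumptions. Only the `universal2-osx` triplet name and the `CMAKE_OSX_ARCHITECTURES` variable come from the description above.

```python
def pick_vcpkg_triplet(cmake_extra_defines, default_triplet):
    """Return the vcpkg triplet for a build (hypothetical helper).

    cmake_extra_defines: list of "KEY=VALUE" strings (assumed format).
    default_triplet: triplet to use when no override applies (assumed input).
    """
    for define in cmake_extra_defines:
        # Split on the first '=' only, so values containing '=' stay intact.
        key, _, _value = define.partition("=")
        if key.strip() == "CMAKE_OSX_ARCHITECTURES":
            # Per the description, finding this variable in the extra
            # defines is enough to select the universal2 triplet.
            return "universal2-osx"
    return default_triplet
```

For example, under these assumptions `pick_vcpkg_triplet(["CMAKE_OSX_ARCHITECTURES=arm64;x86_64"], "arm64-osx")` selects `universal2-osx`, while a define list without that variable falls back to the default triplet.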
 
### Motivation and Context
Adopt vcpkg.
2025-02-05 10:58:53 -08:00
Changming Sun
ead9d5cf43
Set ANDROID_USE_LEGACY_TOOLCHAIN_FILE to false (#23544)
NDK has two toolchain cmake files as you can see in 

https://android.googlesource.com/platform/ndk/+/refs/heads/main/build/cmake

By default, the NDK uses the legacy one to provide the best compatibility.
We don't need that, so this PR switches to the new one.

The new toolchain cmake file uses standard cmake flags like
CMAKE_ANDROID_RTTI to control C++ features.
2025-01-30 16:10:09 -08:00
Karim Vadsariya
655a23ff1d
[onnxruntime/build] Add new flag enable_generic_interface to build primary EPs by default (#23342)
### Description
- Add a new build flag in build.py to build onnxruntime.dll with support
for the interfaces of all primary EPs (QNN, TensorRT, OpenVINO, VitisAI).
- Modify onnxruntime.dll/onnxruntime_shared.dll build settings to remove
the dependency on the IHV SDK toolset being installed on the system.
- Change CMake variables to be explicit when building EP vs ORT. e.g.
onnxruntime_USE_TENSORRT vs onnxruntime_USE_TENSORRT_INTERFACE, to
evolve the build system to build ORT independent of EPs.



### Motivation and Context
Changes in the build system required to evolve the repo to build the
components independently while removing unnecessary dependencies

---------

Co-authored-by: Lei Cao <jslhcl@gmail.com>
Co-authored-by: Karim Vadsariya <kvadsariya@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-01-28 15:24:09 -08:00
Adrian Lizarraga
3b4c7df4e9
[QNN EP] Make QNN EP a shared library (#23120)
### Description
- Makes QNN EP a shared library **by default** when building with
  `--use_qnn` or `--use_qnn shared_lib`. Generates the following build
  artifacts:
  - **Windows**: `onnxruntime_providers_qnn.dll` and
    `onnxruntime_providers_shared.dll`
  - **Linux**: `libonnxruntime_providers_qnn.so` and
    `libonnxruntime_providers_shared.so`
  - **Android**: Not supported. Must build QNN EP as a static library.
- Allows QNN EP to still be built as a static library with
  `--use_qnn static_lib`. This is primarily for the Android QNN AAR package.
- Unit tests run for both the static and shared QNN EP builds.
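
One way to get a flag that works both bare (`--use_qnn`) and with an explicit value (`--use_qnn static_lib`) is an optional argparse argument. The sketch below is an assumption about how such a flag could be declared, not a copy of build.py; the helper names `make_parser` and `qnn_build_mode` are hypothetical.

```python
import argparse


def make_parser():
    """Build a parser with a --use_qnn flag that takes an optional value."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--use_qnn",
        nargs="?",                 # the value after the flag is optional
        const="shared_lib",        # used when the flag is given without a value
        default=None,              # used when the flag is absent
        choices=["shared_lib", "static_lib"],
        help="Build with QNN EP as a shared (default) or static library.",
    )
    return parser


def qnn_build_mode(argv):
    """Return None, 'shared_lib', or 'static_lib' for the given arguments."""
    return make_parser().parse_args(argv).use_qnn
```

With this declaration, a bare `--use_qnn` resolves to `shared_lib`, matching the default described above, and omitting the flag leaves QNN disabled (`None`).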

### Detailed changes
- Updates Java bindings to support both shared and static QNN EP builds.
- Provider bridge API:
  - Adds logging sink ETW to the provider bridge. Allows EPs to register
    ETW callbacks for ORT logging.
  - Adds a variety of methods for onnxruntime objects that are needed by
    QNN EP.
- QNN EP:
  - Adds `ort_api.h` and `ort_api.cc` that encapsulates the API provided
    by ORT in a manner that allows the EP to be built as either a shared or
    static library.
  - Adds custom function to transpose weights for Conv and Gemm (instead
    of adding util to provider bridge API).
  - Adds custom function to quantize data for LeakyRelu (instead of adding
    util to provider bridge API).
  - Adds custom ETW tracing for QNN profiling events:
    - shared library: defines its own TraceLogging provider handle
    - static library: uses ORT's TraceLogging provider handle and existing
      telemetry provider.
- ORT-QNN Packages:
  - **Python**: Pipelines build QNN EP as a shared library by default.
    User can build a local python wheel with QNN EP as a static library by
    passing `--use_qnn static_lib`.
  - **NuGet**: Pipelines build QNN EP as a shared library by default.
    `build.py` currently enforces QNN EP to be built as a shared library.
    Can add support for building a QNN NuGet package with static later if
    deemed necessary.
  - **Android**: Pipelines build QNN EP as a **static library**.
    `build.py` enforces QNN EP to be built as a static library. Packaging
    multiple shared libraries into an Android AAR package is not currently
    supported due to the added need to also distribute a shared libcpp.so
    library.

2025-01-22 12:11:00 -08:00
Justin Chu
ad312d9677
Enable comprehension simplification in ruff rules (#23414)
Enable comprehension simplification rules (C4) for ruff and apply
autofix.
2025-01-17 08:43:06 -08:00
Justin Chu
09c4cc7b36
Target py310 and modernize codebase with ruff (#23401)
Change `target-version = "py310"` and modernize the code base with ruff.
2025-01-16 19:10:14 -08:00
Justin Chu
c7c8757a1c
Use ruff as the formatter to replace black-isort (#23397)
Use ruff as the code formatter in place of black and isort since it is
much faster, and as projects like PyTorch and ONNX have adopted ruff
format as well.

This PR includes only auto-fixed formatting changes.
2025-01-16 11:14:15 -08:00
Ted Themistokleous
7cd08a6004
[MigraphX EP] [ROCm EP] Upstream ROCm changes for bugfixes and features (#23249)
Upstream the ROCm team's changes to mainline ONNX Runtime.

### Motivation and Context
Various bugfixes, and changes added between ROCm 6.2 and 6.3 that
haven't been upstreamed yet to mainline

---------

Co-authored-by: Yueqing Zhang <yuz75@Pitt.edu>
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>
Co-authored-by: ikalinic <ilija.kalinic@amd.com>
Co-authored-by: sstamenk <sstamenk@amd.com>
2025-01-15 12:57:04 -08:00
dependabot[bot]
1461a16e71
Bump ruff from 0.5.4 to 0.9.1 (#23328)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.4 to 0.9.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.9.1</h2>
<h2>Release Notes</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>pycodestyle</code>] Run
<code>too-many-newlines-at-end-of-file</code> on each cell in notebooks
(<code>W391</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15308">#15308</a>)</li>
<li>[<code>ruff</code>] Omit diagnostic for shadowed private function
parameters in <code>used-dummy-variable</code> (<code>RUF052</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15376">#15376</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>flake8-bugbear</code>] Improve
<code>assert-raises-exception</code> message (<code>B017</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15389">#15389</a>)</li>
</ul>
<h3>Formatter</h3>
<ul>
<li>Preserve trailing end-of line comments for the last string literal
in implicitly concatenated strings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15378">#15378</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Fix a bug where the server and client notebooks were out of sync
after reordering cells (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15398">#15398</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-pie</code>] Correctly remove wrapping parentheses
(<code>PIE800</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15394">#15394</a>)</li>
<li>[<code>pyupgrade</code>] Handle comments and multiline expressions
correctly (<code>UP037</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15337">#15337</a>)</li>
</ul>
<h2>Contributors</h2>
<ul>
<li><a
href="https://github.com/AntoineD"><code>@​AntoineD</code></a></li>
<li><a
href="https://github.com/InSyncWithFoo"><code>@​InSyncWithFoo</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a href="https://github.com/calumy"><code>@​calumy</code></a></li>
<li><a
href="https://github.com/dcreager"><code>@​dcreager</code></a></li>
<li><a
href="https://github.com/dhruvmanila"><code>@​dhruvmanila</code></a></li>
<li><a href="https://github.com/dylwil3"><code>@​dylwil3</code></a></li>
<li><a href="https://github.com/sharkdp"><code>@​sharkdp</code></a></li>
<li><a href="https://github.com/tjkuson"><code>@​tjkuson</code></a></li>
</ul>
<h2>Install ruff 0.9.1</h2>
<h3>Install prebuilt binaries via shell script</h3>
<pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf
https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.sh
| sh
</code></pre>
<h3>Install prebuilt binaries via powershell script</h3>
<pre lang="sh"><code>powershell -ExecutionPolicy ByPass -c &quot;irm
https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.ps1
| iex&quot;
</code></pre>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.9.0</h2>
<p>Check out the <a href="https://astral.sh/blog/ruff-v0.9.0">blog
post</a> for a migration guide and overview of the changes!</p>
<h3>Breaking changes</h3>
<p>Ruff now formats your code according to the 2025 style guide. As a
result, your code might now get formatted differently. See the formatter
section for a detailed list of changes.</p>
<p>This release doesn’t remove or remap any existing stable rules.</p>
<h3>Stabilization</h3>
<p>The following rules have been stabilized and are no longer in
preview:</p>
<ul>
<li><a
href="https://docs.astral.sh/ruff/rules/stdlib-module-shadowing/"><code>stdlib-module-shadowing</code></a>
(<code>A005</code>).
This rule has also been renamed: previously, it was called
<code>builtin-module-shadowing</code>.</li>
<li><a
href="https://docs.astral.sh/ruff/rules/builtin-lambda-argument-shadowing/"><code>builtin-lambda-argument-shadowing</code></a>
(<code>A006</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/slice-to-remove-prefix-or-suffix/"><code>slice-to-remove-prefix-or-suffix</code></a>
(<code>FURB188</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/boolean-chained-comparison/"><code>boolean-chained-comparison</code></a>
(<code>PLR1716</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/decimal-from-float-literal/"><code>decimal-from-float-literal</code></a>
(<code>RUF032</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/post-init-default/"><code>post-init-default</code></a>
(<code>RUF033</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/useless-if-else/"><code>useless-if-else</code></a>
(<code>RUF034</code>)</li>
</ul>
<p>The following behaviors have been stabilized:</p>
<ul>
<li><a
href="https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/"><code>pytest-parametrize-names-wrong-type</code></a>
(<code>PT006</code>): Detect <a
href="https://docs.pytest.org/en/7.1.x/how-to/parametrize.html#parametrize"><code>pytest.parametrize</code></a>
calls outside decorators and calls with keyword arguments.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="12f86f39a4"><code>12f86f3</code></a>
Ruff 0.9.1 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15407">#15407</a>)</li>
<li><a
href="2b28d566a4"><code>2b28d56</code></a>
Associate a trailing end-of-line comment in a parenthesized implicit
concaten...</li>
<li><a
href="adca7bd95c"><code>adca7bd</code></a>
Remove pygments pin (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15404">#15404</a>)</li>
<li><a
href="6b98a26452"><code>6b98a26</code></a>
[red-knot] Support <code>assert_type</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15194">#15194</a>)</li>
<li><a
href="c87463842a"><code>c874638</code></a>
[red-knot] Move tuple-containing-Never tests to Markdown (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15402">#15402</a>)</li>
<li><a
href="c364b586f9"><code>c364b58</code></a>
[<code>flake8-pie</code>] Correctly remove wrapping parentheses
(<code>PIE800</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15394">#15394</a>)</li>
<li><a
href="73d424ee5e"><code>73d424e</code></a>
Fix outdated doc for handling the default file types with the pre-commit
hook...</li>
<li><a
href="6e9ff445fd"><code>6e9ff44</code></a>
Insert the cells from the <code>start</code> position (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15398">#15398</a>)</li>
<li><a
href="f2c3ddc5ea"><code>f2c3ddc</code></a>
[red-knot] Move intersection type tests to Markdown (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15396">#15396</a>)</li>
<li><a
href="b861551b6a"><code>b861551</code></a>
Remove unnecessary backticks (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15393">#15393</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.5.4...0.9.1">compare
view</a></li>
</ul>
</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-01-15 11:11:17 -08:00
Changming Sun
1ce59577d5
Add VCPKG triplet files (#23298)
Add VCPKG triplet files. All the triplet files are automatically
generated by gen.py. Put the files there to ease use.
2025-01-09 16:18:51 -08:00
Yifan Li
a3bb3f1487
[TensorRT EP] New CIs to test TRT+minimal CUDA build (#23028)
### Description
New CIs:
[Linux_TRT_Minimal_CUDA_Test_CI](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=230&_a=summary)
and
[Win_TRT_Minimal_CUDA_Test_CI](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=231)

These CIs are configured to monitor whether ORT-TRTEP builds with minimal
CUDA without issues.
* The yaml content follows the Linux TRT CI yaml, with different build
args and cache name.
* The build args follow [[TensorRT EP] Enable a minimal CUDA EP
compilation without
kernels](https://github.com/microsoft/onnxruntime/pull/19052#issuecomment-1888066851)



### Motivation and Context
Monitor whether users can build ORT-TRTEP with minimal CUDA without any
blockers (the build takes ~30 min).
2024-12-19 10:30:39 -08:00
Ankit Maheshkar
1f88284f96
OVEP 1.21.0 Development Updates (#23080)
### Description
OVEP development changes for ORT 1.21 Release
 
 
### Motivation and Context
- Has Critical Bug Fixes
- Improved Performance optimizations for both memory & inference latency
(https://github.com/intel/onnxruntime/pull/513)
- Enabled Model Compilation using NPUW
(https://github.com/intel/onnxruntime/pull/508)
- Fixed support for EPContext embed mode 0 for lower memory utilization
- Updated NuGet package name to `Intel.ML.OnnxRuntime.OpenVino`
- Fixed QDQ Stripping logic on NPU
2024-12-11 22:26:32 -08:00
A-Satti
b14b4ec703
Restore Qspectre flag (#23060)
Restore a removed Qspectre flag and update comment

### Motivation and Context
Adjustment for PR
f5293d253c
2024-12-09 21:52:21 -08:00
A-Satti
f5293d253c
Update Intel Thread Counts (#22894)
### Description
ONNX Runtime's default thread count methodology did not account for
upcoming Intel microarchitectures, leading to a suboptimal thread
count. Optimizing the thread count for the new Intel microarchitectures
yields gains on the majority of models across datatypes, with speedups
of up to ~1.5x.


### Motivation and Context
Applications should run on Intel with the most performant thread
configuration for the majority of models. With new microarchitectures,
adjusting the thread count methodology is required to take advantage of
their differences.
2024-12-06 13:56:50 -08:00
Tianlei Wu
c97dd6e3c1
Update transformers test requirements (#22911)
### Description

* Install PyTorch for transformers tests. The installation is before
python tests so that it can use torch if needed.
* Update protobuf and numpy versions used in transformers test.

### Motivation and Context

Currently, transformers tests are enabled in the following CI pipelines:
* Linux CPU CI Pipeline (torch for cpu-only)
* Linux GPU CI Pipeline (torch for cuda 12)
* Windows GPU CUDA CI Pipeline (torch for cpu-only right now, note that
we might change it to torch for cuda 12 in the future).

For ROCm CI Pipeline, transformer tests are enabled but skipped since
onnx package is not installed in CI.

Previously, torch was not installed before python tests, so some tests
depending on torch were skipped like
[test_bind_onnx_types_not_supported_by_numpy](f6e1d44829/onnxruntime/test/python/onnxruntime_test_python_iobinding.py (L199))
or [test
user_compute_stream](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python.py#L465-L476).

In this PR, we changed build.py to install torch before running python
tests.
2024-11-22 09:45:12 -08:00
Tianlei Wu
8d99b1a8dc
reduce GQA test combinations (#22918)
### Description
* Reduce GQA test combinations to save about 35 minutes test time in CI
pipelines.
* Show latency of transformers tests
* Use seed in DMMHA test to avoid random failure.
* For test_flash_attn_rocm.py, change the test skipping condition from "has
cuda ep" to "not has rocm ep", so that it does not run in cpu build.
* For test_flash_attn_cuda.py, move flash attention and memory efficient
attention tests to different classes, so that we can skip a test suite
instead of checking in each test.

### Motivation and Context
It takes too long to run GQA tests in CI pipelines since there are too
many combinations.

###### Linux GPU CI Pipeline
Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)
After:  150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)
Time Saved: **1424** seconds (0:23:44)

###### Windows GPU CUDA CI Pipeline
Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)
After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35) 
Time Saved: **330** seconds (0:05:30)

###### Linux CPU CI Pipeline
Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)
- 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 26.45s
transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)
- 0.97s  transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 2.41s
transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

Time Saved: **374** seconds (0:06:14).
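
The reported savings can be re-derived from the pytest summary times above. The snippet below is only a check of that arithmetic; the helper names are hypothetical, and rounding to the nearest second is an assumption about how the figures were produced.

```python
# (before_seconds, after_seconds) copied from the pytest summaries above.
timings = {
    "Linux GPU CI Pipeline": (1954.64, 530.38),
    "Windows GPU CUDA CI Pipeline": (605.48, 275.48),
    "Linux CPU CI Pipeline": (467.04, 93.41),
}


def seconds_saved(before, after):
    """Difference between two wall-clock times, rounded to whole seconds."""
    return round(before - after)


def as_h_mm_ss(total_seconds):
    """Format whole seconds as H:MM:SS, the style used in the summaries."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


for name, (before, after) in timings.items():
    saved = seconds_saved(before, after)
    print(f"{name}: {saved} seconds ({as_h_mm_ss(saved)})")
```

Under that rounding assumption, the three differences come out to 1424, 330, and 374 seconds, which agrees with the figures quoted for each pipeline.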
2024-11-21 12:26:46 -08:00
Changming Sun
13346fdf18
Cleanup code (#22827)
### Description
1.  Delete TVM EP because it is no longer maintained.
2.  Delete ortmodule related docker files and scripts.
2024-11-19 14:13:33 -08:00
Tianlei Wu
72186bbb71
[CUDA] Build nhwc ops by default (#22648)
### Description

* Build cuda nhwc ops by default.
* Deprecate `--enable_cuda_nhwc_ops` in build.py and add
`--disable_cuda_nhwc_ops` option

Note that it requires cuDNN 9.x. If you build with cuDNN 8, NHWC ops
will be disabled automatically.
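
The resulting behavior can be summarized in a short sketch. It is an illustration under assumptions, not the actual build logic: the function `resolve_cuda_nhwc_ops` and its `cudnn_major_version` parameter are hypothetical, and in the real build the cuDNN check may live in CMake rather than in Python. Only the two flag names and the cuDNN 9.x requirement come from the description.

```python
import argparse


def resolve_cuda_nhwc_ops(argv, cudnn_major_version):
    """Decide whether CUDA NHWC ops get built (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # Kept only so existing command lines still parse; it no longer changes
    # the outcome because NHWC ops are now on by default.
    parser.add_argument("--enable_cuda_nhwc_ops", action="store_true",
                        help="Deprecated: NHWC ops are built by default.")
    parser.add_argument("--disable_cuda_nhwc_ops", action="store_true",
                        help="Opt out of building CUDA NHWC ops.")
    args = parser.parse_args(argv)

    if args.disable_cuda_nhwc_ops:
        return False
    # NHWC ops require cuDNN 9.x, so older cuDNN disables them automatically.
    return cudnn_major_version >= 9
```

In this sketch the deprecated flag is accepted but ignored, so existing build scripts that still pass it keep working while the default does the actual enabling.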

### Motivation and Context

In general, NHWC is faster than NCHW for convolution in Nvidia GPUs with
Tensor Cores, and this could improve performance for vision models.

This is the first step toward preferring NHWC for CUDA in the 1.21 release.
The next step is to run tests on popular vision models. If it helps on most
models and devices, set `prefer_nhwc=1` as the default cuda provider option.
2024-11-06 09:54:55 -08:00
Yulong Wang
7a8fa12850
Add implementation of WebGPU EP (#22591)
### Description

This PR adds the actual implementation of the WebGPU EP based on
https://github.com/microsoft/onnxruntime/pull/22318.

This change includes the following:

<details>
<summary><b>core framework of WebGPU EP</b></summary>

  - WebGPU EP factory classes for:
    - handling WebGPU options
    - creating WebGPU EP instance
    - creating WebGPU context
  - WebGPU Execution Provider classes
    - GPU Buffer allocator
    - data transfer
  - Buffer management classes
    - Buffer Manager
    - BufferCacheManager
      - DisabledCacheManager
      - SimpleCacheManager
      - LazyReleaseCacheManager
      - BucketCacheManager
  - Program classes
    - Program (base)
    - Program Cache Key
    - Program Manager
  - Shader helper classes
    - Shader Helper
    - ShaderIndicesHelper
    - ShaderVariableHelper
  - Utils
    - GPU Query based profiler
    - compute context
    - string utils
  - Miscs
    - Python binding webgpu support (basic)
 
</details>

<details>
<summary><b>Kernel implementation</b></summary>


  - onnx.ai (default opset):
    - Elementwise (math): Abs, Neg, Floor, Ceil, Reciprocal, Sqrt, Exp, Erf,
      Log, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh,
      Tanh, Not, Cast
    - Elementwise (activation): Sigmoid, HardSigmoid, Clip, Elu, Relu,
      LeakyRelu, ThresholdedRelu, Gelu
    - Binary (math): Add, Sub, Mul, Div, Pow, Equal, Greater,
      GreaterOrEqual, Less, LessOrEqual
    - (Tensors): Shape, Reshape, Squeeze, Unsqueeze
    - Where
    - Transpose
    - Concat
    - Expand
    - Gather
    - Tile
    - Range
    - LayerNormalization
  - com.microsoft
    - FastGelu
    - MatMulNBits
    - MultiHeadAttention
    - RotaryEmbedding
    - SkipLayerNormalization
    - LayerNormalization
    - SimplifiedLayerNormalization
    - SkipSimplifiedLayerNormalization

</details>

<details>
<summary><b>Build, test and CI pipeline integration</b></summary>

  - build works for Windows, macOS and iOS
  - support onnxruntime_test_all and python node test
  - added a new unit test for `--use_external_dawn` build flag.
  - updated MacOS pipeline to build with WebGPU support
  - added a new pipeline for WebGPU Windows

</details>

This change does not include:

- Node.js binding support for WebGPU (will be a separate PR)
2024-10-29 18:29:40 -07:00
Yifan Li
951d9aa99f
[TensorRT EP] Refactor TRT version update logic & apply TRT 10.5 (#22483)
### Description
* Leverage template `common-variables.yml` and reduce usage of hardcoded
trt_version

8391b24447/tools/ci_build/github/azure-pipelines/templates/common-variables.yml (L2-L7)
* Among all CI yamls, this PR reduces usage of hardcoding trt_version
from 40 to 6, by importing trt_version from `common-variables.yml`
* Apply TRT 10.5 and re-enable control flow op test


### Motivation and Context
- Reduce usage of hardcoding trt_version among all CI ymls

### Next refactor PR 
will work on reducing usage of hardcoding trt_version among
`.dockerfile`, `.bat` and remaining 2 yml files
(download_win_gpu_library.yml & set-winenv.yml, which are step-template
yaml that can't import variables)
2024-10-29 09:23:41 -07:00
Satya Kumar Jandhyala
4ed5bec2e7
[JS/WebGPU] Support WASM64 (#21836)
### Description
Support wasm64



### Motivation and Context
Overcome memory limitations

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-10-24 20:21:51 -07:00
Changming Sun
88676e62b9
Remove nsync (#20413)
### Description
1. Remove the onnxruntime::OrtMutex class and replace it with
~absl::Mutex~ std::mutex.
2. After this change, most source files will not include <Windows.h>
indirectly.


### Motivation and Context
To reduce the number of deps we have, and address some GitHub issues
related to building ONNX Runtime from source.
In PR #3000 , I added a custom implementation of std::mutex . It was
mainly because at that time std::mutex's default constructor was not
trivial on Windows. If you had such a mutex as a global var, it could
not be initialized at compile time. Then VC++ team fixed this issue.
Therefore we don't need this custom implementation anymore.

This PR also removes nsync. I ran several models tests on Linux. I
didn't see any perf difference.
This PR also reverts PR #21005 , which is no longer needed since conda
has updated its msvc runtime DLL.

This PR unblocks #22173 and resolves #22092 . We have a lot of open
issues with nsync. This PR can resolve all of them.
2024-10-21 15:32:14 -07:00
Edward Chen
04404ea482
Fix Xcode 16 iOS build issues (#22379)
- Work around Xcode 16 iOS test build issue: `error: Multiple commands produce '.../PlugIns'`.
- Fix link error in iOS static framework test.
- Update build.py to check for the right kind of build before running iOS tests on the simulator.
- Update Xcode 16 build images to 'macos-15' because that's the only image that will have Xcode 16 soon. See https://github.com/actions/runner-images/issues/10703.
2024-10-14 09:24:38 -07:00
Changming Sun
6ada97c84c
Fix a build issue when statically link to MSVC Runtime (#22393)
Yesterday I updated ABSL to a newer version which added a new cmake
option: ABSL_MSVC_STATIC_RUNTIME . I wasn't aware of it. This PR fixes
it.
2024-10-10 20:09:13 -07:00
Yi Zhang
25b1c38e87
Add conv fp16 kernel in xnnpack EP (#22301)
### Description
Add FP16 kernels of Conv and ConvTranspose

[AB#50186](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50186)




---------
2024-10-10 08:48:09 +08:00
Yulong Wang
c5d28cac4d
Initial WebGPU EP checkin (#22318)
### Description

This change introduces the WebGPU EP into ONNX Runtime.

To make the PR as simple as possible, this PR excluded the following:
- C API changes for WebGPU EP
- actual implementation of WebGPU EP. Currently in this PR, WebGPU is a
stub implementation that does not register any kernel.
- Python IO Binding update
- Node.js IO Binding update

This PR now contains only 43 file changes (while the working branch
contains 130+) and hopefully this makes it easier to review.

There will be separate PRs for each item mentioned above.

Current working branch: #21904
2024-10-08 16:10:46 -07:00
jingyanwangms
d0b0ecfdb9
[Running CI] Update TensorRT to 10.4 (#22049)
### Description
TensorRT 10.4 is GA now, update to 10.4



2024-09-26 11:10:52 -07:00
George Wu
944d87381d
[QNN EP] set up py packaging pipeline for Linux x64 (#22132)
Set up a pipeline to produce nightly Linux x64 whls for onnxruntime-qnn.
This can be used for offline context binary generation.
2024-09-18 23:24:32 -07:00
Michael Tyler
904b850b44
Update Arm Compute Library Execution Provider (#22032)
### Description
This PR makes the following updates to the Arm Compute Library execution
provider:

- Target Arm Compute Library 24.07  
- Add support for the following operators: 
  - Conv (FP16) 
  - NhwcConv 
  - QLinearConv 
  - MatMul 
  - FusedMatMul 
  - MatMulIntegerToFloat 
- Optimize memory usage and performance
- Expose the enable_fast_math setting 
- Use the main runtime thread pool 



### Motivation and Context
These updates improve performance and memory usage, and enable use of a
more recent version of Arm Compute Library.

@microsoft-github-policy-service agree company="Arm Ltd"

---------

Signed-off-by: Michael Tyler <michael.tyler@arm.com>
2024-09-12 20:51:59 -07:00
PARK DongHa
f633caa0b1
Create CMake option onnxruntime_USE_VCPKG (#21348)
### Changes

1. CMake option `onnxruntime_USE_VCPKG`. It will be used in the vcpkg
port
   * Unit tests may fail because this option leads to a mixture of
     unexpected external library versions.
     In particular, the ONNX, Protobuf, and Flatbuffers versions can differ
2. Overhaul of `onnxruntime_external_deps.cmake`
   * Make `FetchContent_Declare` try `find_package` first.
     See
     https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html
   * Relocated the `FetchContent_Declare` and `FetchContent_MakeAvailable` (or
     `onnxruntime_fetchcontent_makeavailable`) calls closer to each other.
     It was too hard to navigate the entire file to find related
     sections.
   * Alias `IMPORTED` targets like build targets (e.g. `ONNX::onnx` -->
     `onnx`)

```cmake
# The script uses `find_package` with the changes.
# In this case, use vcpkg to search dependencies
# See https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html
include(external/onnxruntime_external_deps.cmake)
```

3. Create CMakePresets.json and presets to [run vcpkg in manifest
mode](https://learn.microsoft.com/en-us/vcpkg/concepts/manifest-mode)
   * Currently, it's NOT for training build
   * Main triplets are `x64-windows` and `x64-osx`

```pwsh
Push-Location "cmake"
    cmake --preset "x64-windows-vcpkg"
    cmake --build --preset "x64-windows-vcpkg-debug"
Pop-Location
```
```bash
pushd "cmake"
    cmake --preset "x64-osx-vcpkg"
    cmake --build --preset "x64-osx-vcpkg-debug"
popd
```

4. Updated tools/ci_build/build.py
   * `--use_vcpkg` option: it needs `CMAKE_TOOLCHAIN_FILE` with
     [vcpkg.cmake toolchain
     script](https://github.com/microsoft/vcpkg/blob/master/scripts/buildsystems/vcpkg.cmake)
   * `--compile_no_warning_as_error` is recommended because library version
     differences will cause unexpected compiler warnings

```bash
python ./tools/ci_build/build.py \
    --compile_no_warning_as_error \
    --use_vcpkg \
    --cmake_extra_defines "CMAKE_TOOLCHAIN_FILE:FILEPATH=${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" \
    --cmake_extra_defines "VCPKG_TARGET_TRIPLET=..."
```

5. Created Job `Vcpkg` for Windows and macOS
   * Shows how to set up and use vcpkg.  
     Similar to the CMakePresets.json usage

### Motivation and Context

* Help #7150
* Help https://github.com/microsoft/vcpkg/pull/36850
   * https://github.com/luncliff/vcpkg-registry/pull/212
   * https://github.com/microsoft/vcpkg/pull/39881
* https://github.com/luncliff/vcpkg-registry/pull/215
   * https://github.com/luncliff/vcpkg-registry/pull/216
   * https://github.com/luncliff/vcpkg-registry/pull/227
* https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html
* https://github.com/microsoft/vcpkg/blob/master/scripts/buildsystems/vcpkg.cmake

### Future Works?

More feature coverage with the vcpkg supported libraries

* CUDA feature support
* Training feature support
2024-09-10 16:39:27 -07:00
Guenther Schmuelling
ba7baae994
Revert "Upgrade emsdk from 3.1.59 to 3.1.62" (#21817)
Reverts microsoft/onnxruntime#21421

Users are seeing chrome memory grow to 16GB before it crashes:
https://github.com/microsoft/onnxruntime/issues/21810

Revert for now so we have time to debug.
2024-08-22 11:21:00 -07:00
Satya Kumar Jandhyala
6d8de1f7b8
Upgrade emsdk from 3.1.59 to 3.1.62 (#21421)
### Description
Upgrade EM SDK to 3.1.62.



### Motivation and Context
The changes are required to clear wasm64 errors.
2024-08-14 12:38:52 -07:00
Justin Chu
c203d89958
Update ruff and clang-format versions (#21479)
ruff -> 0.5.4
clang-format -> 18
2024-07-24 11:50:11 -07:00
mindest
5b9369e93c
Fix typos according to reviewdog report. (#21335)
### Description
Fix typos based on reviewdog report but with some
exceptions/corrections.
2024-07-22 13:37:32 -07:00
Jian Chen
4e75605eec
Replace inline pip install with pip install from requirements*.txt (#21106)
### Description
Replace inline pip install with pip install from requirements*.txt



### Motivation and Context
So that CG can recognize the dependencies.

### Dependency

- [x] https://github.com/microsoft/onnxruntime/pull/21085
2024-07-22 12:39:10 -07:00
Ted Themistokleous
4ac4cd2668
Migraphx ep windows build (#21284)
### Description
Repeat of #21084 with the policy CMP0144 setting (used to suppress
warnings, and available only from CMake 3.27.0) removed.


### Motivation and Context


Already approved PR: 
https://github.com/microsoft/onnxruntime/pull/21084

Removed the added policy, which was introduced in CMake 3.27.0.
2024-07-11 21:21:38 -07:00
Baiju Meswani
116398c1a4
onnxruntime shared lib inside python package (#21223)
2024-07-02 15:37:50 -07:00
Chen Feiyue
56b36a58ba
Initial PR for VSINPU execution provider (#20903)
### Description
- This is an initial PR for the VSINPU execution provider.



### Motivation and Context
- Adds support for VeriSilicon hardware.
- TIM-VX (Tensor Interface Module,
https://github.com/VeriSilicon/TIM-VX) is an integrated software
solution by VeriSilicon for its hardware (A311D, i.MX 8M Plus, etc.).
This VSINPU execution provider makes that hardware easy to use by
connecting onnxruntime to the TIM-VX API.
2024-06-28 21:48:34 -07:00
Preetha Veeramalai
6baaaf5165
OVEP options to disable CPU fallback at compile time (#21166)
### Description
Provide user-level options to control CPU fallback for models not
supported on Intel's NPU hardware.


### Motivation and Context
- The current OVEP workflow allows safe fallback from OV NPU to OV CPU on
compilation failures. It also supports MLAS CPU fallback in the presence of
unsupported custom ops.
- This PR provides a build-time option to disable fallback from OV NPU to
OV CPU.
- The session option "kOrtSessionOptionsDisableCPUEPFallback" disables
OV CPU and MLAS CPU fallback.
- Also includes a bug fix for proto creation.
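
A minimal sketch of setting this session option from Python. It assumes the constant maps to the config key `session.disable_cpu_ep_fallback` and that the options object exposes `add_session_config_entry`, as `onnxruntime.SessionOptions` does. The helper name is hypothetical and is not part of this PR:

```python
# Assumed string value of kOrtSessionOptionsDisableCPUEPFallback.
DISABLE_CPU_EP_FALLBACK_KEY = "session.disable_cpu_ep_fallback"


def disable_cpu_ep_fallback(session_options):
    """Set the config entry that disables CPU EP fallback (hypothetical helper).

    `session_options` is any object with an `add_session_config_entry(key, value)`
    method, for example an `onnxruntime.SessionOptions` instance.
    """
    session_options.add_session_config_entry(DISABLE_CPU_EP_FALLBACK_KEY, "1")
    return session_options
```

With onnxruntime installed, this would typically be called as `disable_cpu_ep_fallback(onnxruntime.SessionOptions())` before creating the inference session.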

---------

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: ankitm3k <ankit.maheshkar@intel.com>
2024-06-28 08:31:02 -07:00
Changming Sun
f5625b8858
Revert "[MIGraphX EP] enable compilation and execution on Windows (21084)" (#21132)
### Description

This reverts commit 1d7bf56947 because it
broke the AMD GPU CI pipeline. Sorry, when I reviewed the PR I forgot to
run the AMD GPU CI pipeline.

Will revert the PR first, then ask the author to fix the issue.
2024-06-21 01:01:07 -07:00
Ted Themistokleous
1d7bf56947
[MIGraphX EP] enable compilation and execution on Windows (#36) (#21084)
2024-06-20 16:21:11 -07:00
Changming Sun
be423747b1
Delete pyop (#21094)
### Description
Remove the "--enable_language_interop_ops" build flag, because the code
is incompatible with the latest numpy, and the build flag is not used
anywhere except a macOS CI pipeline. It does not seem to have a ship
plan.


### Motivation and Context
The build error was:
```
onnxruntime/core/language_interop_ops/pyop/pyop.cc:122:85: error: no member named 'elsize' in '_PyArray_Descr'
                                  static_cast<int64_t>(PyArray_DescrFromType(type)->elsize),
                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^
```
2024-06-19 16:21:33 -07:00
Clément Péron
8ab8e649a7
tools: build: fix typo (#21052)
### Description
Typo in the python build script
2024-06-19 16:14:58 -07:00
Scott McKay
5fc60f36f2
Update to the net8 MAUI targets. Remove Xamarin. (#21062)
### Description
Xamarin is EOL so remove support.
The MAUI targets are EOL and need updating.
https://dotnet.microsoft.com/en-us/platform/support/policy/maui

Other cleanups:
- netcoreapp3.1 is EOL
- the net6 macos target was added in the mistaken belief that it was for
MAUI mac support, but that is actually via the mac-catalyst target which
we recently added support for.
- some CIs were still using the old build setup of splitting pre-net6
targets. The ORT C# bindings csproj was updated last year and the
`PreNet6` and `SelectedTargets` properties no longer exist as they were
replaced by the simpler `IncludeMobileTargets` property.

### Motivation and Context
Remove EOL components.
#21058
2024-06-19 16:20:58 +10:00
Jian Chen
1ad2c0a4b2
fix Window_CI in Github Action (#21070)
### Description
fix Window_CI in Github Action
2024-06-18 23:14:08 -07:00
Changming Sun
ffb8e8eb0e
Update build.py: add a comment (#20993)
### Description
Update build.py: add a comment


### Motivation and Context
See the comment.
2024-06-18 13:52:34 -07:00
Nikolai Svakhin
7b3fff650a
Updated build script for CUDA case (#20987)
### Description

In the CUDA case, use the cuda_home variable to point CMake's CUDA
compiler at the matching version of NVCC.

Otherwise, the NVCC found on the current PATH would be picked up, which
could belong to a different version of CUDA.
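
The idea can be sketched as follows. This is an illustration rather than the actual `build.py` change: the function name and the `bin/nvcc` layout under the CUDA home directory are assumptions, while `CMAKE_CUDA_COMPILER` is the standard CMake variable for selecting the CUDA compiler.

```python
import os


def cuda_compiler_define(cuda_home: str, windows: bool = (os.name == "nt")) -> str:
    """Build a CMake define that pins the CUDA compiler to nvcc under cuda_home.

    Hypothetical helper: assumes nvcc lives in `<cuda_home>/bin`.
    """
    nvcc_name = "nvcc.exe" if windows else "nvcc"
    nvcc_path = os.path.join(cuda_home, "bin", nvcc_name)
    # Passing this define keeps CMake from falling back to whatever nvcc is on PATH.
    return f"-DCMAKE_CUDA_COMPILER={nvcc_path}"
```

The returned string would be appended to the CMake configure arguments, so a CUDA 12.5 directory passed as the CUDA home selects its own nvcc even when an 11.8 install is first on PATH.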


### Motivation and Context

My main CUDA installation was version 11.8.

I wanted to build against 12.5, so I downloaded and unpacked it into a
separate directory and passed that directory as the `--cuda_home` parameter.
However, the ONNX Runtime build was still picking up the NVCC compiler from 11.8.

This fixes the issue
https://github.com/microsoft/onnxruntime/issues/20928


cc @gedoensmax
2024-06-17 14:41:43 -07:00
Changming Sun
feec8efae4
Add "-allow-unsupported-compiler" flags to Windows CUDA flags (#21004)
### Description
Add "-allow-unsupported-compiler" flags to Windows CUDA flags. This
change only impacts our pipelines. By default it would not reach this
code path.
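
A minimal sketch of how such a flag might be appended to an existing CUDA flags string without duplicating it. The function name is hypothetical, and how the result reaches CMake (for example through `CMAKE_CUDA_FLAGS`) is an assumption, not a description of this PR's code:

```python
ALLOW_UNSUPPORTED_COMPILER = "-allow-unsupported-compiler"


def with_allow_unsupported_compiler(cuda_flags: str) -> str:
    """Return cuda_flags with the nvcc override flag added exactly once."""
    parts = cuda_flags.split()
    if ALLOW_UNSUPPORTED_COMPILER not in parts:
        parts.append(ALLOW_UNSUPPORTED_COMPILER)
    return " ".join(parts)
```

Keeping the helper idempotent means it can be applied on every configure run without the flag piling up.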

### Motivation and Context
nvcc refuses to work with the latest VS toolset unless this flag is set.

Without this change, our CI build fails when the compiler is the
latest VS 2022 17.10. Here is the log:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1405549&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=c7e55e04-f02b-57dc-d19a-29b7d3528c44&l=715

The error message is:
`D:\a\_work\_temp\v11.8\include\crt/host_config.h(153): fatal error
C1189: #error: -- unsupported Microsoft Visual Studio version! Only the
versions between 2017 and 2022 (inclusive) are supported! The nvcc flag
'-allow-unsupported-compiler' can be used to override this version
check; however, using an unsupported host compiler may cause compilation
failure or incorrect run time execution. Use at your own risk.
[D:\a\_work\1\b\RelWithDebInfo\CMakeFiles\CMakeScratch\TryCompile-g5rudf\cmTC_7b8ff.vcxproj]`
2024-06-12 14:23:00 -07:00
Baiju Meswani
94aa21c3dd
Define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR (#21005)
https://github.com/microsoft/STL/pull/3824 introduces a constexpr mutex.
With an older version of msvcp140.dll, this leads to the error `A dynamic
link library (DLL) initialization routine failed`.

This error can be encountered when using conda Python, since conda packages
its own MSVC DLLs and these are currently older.

This PR disables the constexpr mutex so that the ort package can work with
older MSVC DLLs.

Thanks @snnn for the discovery.
2024-06-11 22:23:28 -07:00