Commit graph

12200 commits

Author SHA1 Message Date
Hector Li
afbee6eb0c
Fix the power mode issue for the case that run from context model (#23330)
### Description
Set the power config ID and the default power mode from the provider option (if one is given) for the main thread. Otherwise, the power mode is set incorrectly when a user creates a session without running it.

The issue fixed by this PR:
Process 1 creates a session but does not run it.
Process 2 then creates a session and runs it in power saver mode, but the run uses burst power mode instead.
2025-01-16 09:07:54 -08:00
Changming Sun
82aa355904
Update android_min_sdk_version/android_target_sdk_version (#23369)
Update android_min_sdk_version to 24 and android_target_sdk_version to
34.
Previously, Jian updated the values for some pipelines. This PR
updates the remaining occurrences to make them consistent.

Why android_min_sdk_version is set to 24:
Because React Native requires it:

https://github.com/react-native-community/discussions-and-proposals/discussions/802

Why android_target_sdk_version is set to 34:
Because according to Google Play's policy, new apps and app updates must
target Android 14 (API level 34) to be submitted to Google Play.

https://support.google.com/googleplay/android-developer/answer/11926878?hl=en
2025-01-16 08:03:31 -08:00
Yulong Wang
1d97d6ef55
[webgpu] fix Split operator implementation when input is 1D (#23376)
### Description

[webgpu] fix Split operator implementation when input is 1D
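
For reference, ONNX `Split` cuts a tensor into consecutive chunks along `axis`, using either explicit `split` sizes or an equal split into `num_outputs` parts. The pure-Python sketch below illustrates those semantics on nested lists, including the 1D case this fix targets. It is not the WebGPU shader; `split_along_axis` is a hypothetical helper, and it assumes an even split when only `num_outputs` is given (uneven splits raise instead of following the spec's smaller-last-chunk rule).

```python
from typing import List, Optional, Sequence


def _rank(x) -> int:
    """Number of nesting levels of a nested-list tensor."""
    rank = 0
    while isinstance(x, list):
        rank += 1
        x = x[0] if x else None
    return rank


def split_along_axis(
    data: list,
    axis: int,
    sizes: Optional[Sequence[int]] = None,
    num_outputs: Optional[int] = None,
) -> List[list]:
    """Split a nested-list tensor along `axis` (negative axes allowed)."""
    rank = _rank(data)
    if axis < 0:
        axis += rank
    if not 0 <= axis < rank:
        raise ValueError("axis out of range")

    # Length of the axis being split.
    sub = data
    for _ in range(axis):
        sub = sub[0]
    dim = len(sub)

    if sizes is None:
        if not num_outputs or dim % num_outputs:
            raise ValueError("need split sizes or an evenly dividing num_outputs")
        sizes = [dim // num_outputs] * num_outputs
    if sum(sizes) != dim:
        raise ValueError("split sizes must sum to the axis length")

    def split_level(x: list, level: int) -> List[list]:
        if level == axis:
            # Cut this level into consecutive chunks.
            chunks, start = [], 0
            for size in sizes:
                chunks.append(x[start:start + size])
                start += size
            return chunks
        # Split every sub-tensor, then regroup the pieces by output index.
        parts = [split_level(child, level + 1) for child in x]
        return [[p[k] for p in parts] for k in range(len(sizes))]

    return split_level(data, 0)
```

For a 1D input, `split_along_axis([1, 2, 3, 4, 5, 6], 0, num_outputs=3)` gives `[[1, 2], [3, 4], [5, 6]]`.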
2025-01-15 21:01:05 -08:00
Yi-Hong Lyu
e51bcfb541
Implement DepthToSpace uint8_t and Enable DropQDQNodesRules (#23352)
### Description

- Implemented the DepthToSpace uint8_t kernel.
- Enabled DropQDQNodesRules for DepthToSpace.
- Added unit tests for the DepthToSpace uint8_t kernel.
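
DepthToSpace only rearranges elements, so a uint8 kernel can copy quantized values without dequantizing them; that is presumably why DropQDQNodesRules can remove the surrounding DQ/Q pair when their scale and zero point match (an inference, not something the PR states). The sketch below shows the index mapping for the two ONNX modes (`DCR` and `CRD`) on a flat NCHW buffer. It is illustrative pure Python, not the ORT kernel, and `depth_to_space` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Shape4D = Tuple[int, int, int, int]


def depth_to_space(
    data: Sequence[int], shape: Shape4D, blocksize: int, mode: str = "DCR"
) -> Tuple[List[int], Shape4D]:
    """Rearrange a flat NCHW buffer; values are copied, never converted."""
    n, c, h, w = shape
    b = blocksize
    if mode not in ("DCR", "CRD"):
        raise ValueError("mode must be 'DCR' or 'CRD'")
    if c % (b * b):
        raise ValueError("channels must be divisible by blocksize**2")
    if len(data) != n * c * h * w:
        raise ValueError("data length does not match shape")
    c_out = c // (b * b)
    out = [0] * len(data)
    for ni in range(n):
        for co in range(c_out):
            for hi in range(h):
                for i in range(b):
                    for wi in range(w):
                        for j in range(b):
                            if mode == "DCR":
                                # Depth-column-row: the block offset is the outer channel index.
                                ci = (i * b + j) * c_out + co
                            else:
                                # Column-row-depth: the output channel is the outer index.
                                ci = co * b * b + i * b + j
                            src = ((ni * c + ci) * h + hi) * w + wi
                            dst = ((ni * c_out + co) * h * b + hi * b + i) * w * b + wi * b + j
                            out[dst] = data[src]
    return out, (n, c_out, h * b, w * b)
```

Because the function only moves elements, running it on uint8 values gives the same result as dequantizing, rearranging, and requantizing with an unchanged scale and zero point.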

### Motivation and Context

This commit improves the performance of the Image Super-Resolution
INT8 model (RFDN): it increases inferences per second (IPS) by 25%.
2025-01-15 19:24:50 -08:00
Jian Chen
331fc36b6a
Remove hot path for pre-0.70.15 RN fix (#23382)
### Description
This undoes the changes from #23281.
2025-01-15 16:16:38 -08:00
Ted Themistokleous
7cd08a6004
[MigraphX EP] [ROCm EP] Upstream ROCm changes for bugfixes and features (#23249)
Upstream the ROCm team's changes to mainline ONNX Runtime.

### Motivation and Context
Various bug fixes and changes added between ROCm 6.2 and 6.3 that
had not yet been upstreamed to mainline.

---------

Co-authored-by: Yueqing Zhang <yuz75@Pitt.edu>
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Artur Wojcik <artur.wojcik@outlook.com>
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>
Co-authored-by: ikalinic <ilija.kalinic@amd.com>
Co-authored-by: sstamenk <sstamenk@amd.com>
2025-01-15 12:57:04 -08:00
dependabot[bot]
1461a16e71
Bump ruff from 0.5.4 to 0.9.1 (#23328)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.4 to 0.9.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.9.1</h2>
<h2>Release Notes</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>pycodestyle</code>] Run
<code>too-many-newlines-at-end-of-file</code> on each cell in notebooks
(<code>W391</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15308">#15308</a>)</li>
<li>[<code>ruff</code>] Omit diagnostic for shadowed private function
parameters in <code>used-dummy-variable</code> (<code>RUF052</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15376">#15376</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>flake8-bugbear</code>] Improve
<code>assert-raises-exception</code> message (<code>B017</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15389">#15389</a>)</li>
</ul>
<h3>Formatter</h3>
<ul>
<li>Preserve trailing end-of-line comments for the last string literal
in implicitly concatenated strings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15378">#15378</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Fix a bug where the server and client notebooks were out of sync
after reordering cells (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15398">#15398</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-pie</code>] Correctly remove wrapping parentheses
(<code>PIE800</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15394">#15394</a>)</li>
<li>[<code>pyupgrade</code>] Handle comments and multiline expressions
correctly (<code>UP037</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15337">#15337</a>)</li>
</ul>
<h2>Contributors</h2>
<ul>
<li><a
href="https://github.com/AntoineD"><code>@​AntoineD</code></a></li>
<li><a
href="https://github.com/InSyncWithFoo"><code>@​InSyncWithFoo</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a href="https://github.com/calumy"><code>@​calumy</code></a></li>
<li><a
href="https://github.com/dcreager"><code>@​dcreager</code></a></li>
<li><a
href="https://github.com/dhruvmanila"><code>@​dhruvmanila</code></a></li>
<li><a href="https://github.com/dylwil3"><code>@​dylwil3</code></a></li>
<li><a href="https://github.com/sharkdp"><code>@​sharkdp</code></a></li>
<li><a href="https://github.com/tjkuson"><code>@​tjkuson</code></a></li>
</ul>
<h2>Install ruff 0.9.1</h2>
<h3>Install prebuilt binaries via shell script</h3>
<pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf
https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.sh
| sh
</code></pre>
<h3>Install prebuilt binaries via powershell script</h3>
<pre lang="sh"><code>powershell -ExecutionPolicy ByPass -c &quot;irm
https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.ps1
| iex&quot;
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.9.1</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>pycodestyle</code>] Run
<code>too-many-newlines-at-end-of-file</code> on each cell in notebooks
(<code>W391</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15308">#15308</a>)</li>
<li>[<code>ruff</code>] Omit diagnostic for shadowed private function
parameters in <code>used-dummy-variable</code> (<code>RUF052</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15376">#15376</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>flake8-bugbear</code>] Improve
<code>assert-raises-exception</code> message (<code>B017</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15389">#15389</a>)</li>
</ul>
<h3>Formatter</h3>
<ul>
<li>Preserve trailing end-of line comments for the last string literal
in implicitly concatenated strings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15378">#15378</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Fix a bug where the server and client notebooks were out of sync
after reordering cells (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15398">#15398</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-pie</code>] Correctly remove wrapping parentheses
(<code>PIE800</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15394">#15394</a>)</li>
<li>[<code>pyupgrade</code>] Handle comments and multiline expressions
correctly (<code>UP037</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15337">#15337</a>)</li>
</ul>
<h2>0.9.0</h2>
<p>Check out the <a href="https://astral.sh/blog/ruff-v0.9.0">blog
post</a> for a migration guide and overview of the changes!</p>
<h3>Breaking changes</h3>
<p>Ruff now formats your code according to the 2025 style guide. As a
result, your code might now get formatted differently. See the formatter
section for a detailed list of changes.</p>
<p>This release doesn’t remove or remap any existing stable rules.</p>
<h3>Stabilization</h3>
<p>The following rules have been stabilized and are no longer in
preview:</p>
<ul>
<li><a
href="https://docs.astral.sh/ruff/rules/stdlib-module-shadowing/"><code>stdlib-module-shadowing</code></a>
(<code>A005</code>).
This rule has also been renamed: previously, it was called
<code>builtin-module-shadowing</code>.</li>
<li><a
href="https://docs.astral.sh/ruff/rules/builtin-lambda-argument-shadowing/"><code>builtin-lambda-argument-shadowing</code></a>
(<code>A006</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/slice-to-remove-prefix-or-suffix/"><code>slice-to-remove-prefix-or-suffix</code></a>
(<code>FURB188</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/boolean-chained-comparison/"><code>boolean-chained-comparison</code></a>
(<code>PLR1716</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/decimal-from-float-literal/"><code>decimal-from-float-literal</code></a>
(<code>RUF032</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/post-init-default/"><code>post-init-default</code></a>
(<code>RUF033</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/useless-if-else/"><code>useless-if-else</code></a>
(<code>RUF034</code>)</li>
</ul>
<p>The following behaviors have been stabilized:</p>
<ul>
<li><a
href="https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/"><code>pytest-parametrize-names-wrong-type</code></a>
(<code>PT006</code>): Detect <a
href="https://docs.pytest.org/en/7.1.x/how-to/parametrize.html#parametrize"><code>pytest.parametrize</code></a>
calls outside decorators and calls with keyword arguments.</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="12f86f39a4"><code>12f86f3</code></a>
Ruff 0.9.1 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15407">#15407</a>)</li>
<li><a
href="2b28d566a4"><code>2b28d56</code></a>
Associate a trailing end-of-line comment in a parenthesized implicit
concaten...</li>
<li><a
href="adca7bd95c"><code>adca7bd</code></a>
Remove pygments pin (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15404">#15404</a>)</li>
<li><a
href="6b98a26452"><code>6b98a26</code></a>
[red-knot] Support <code>assert_type</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15194">#15194</a>)</li>
<li><a
href="c87463842a"><code>c874638</code></a>
[red-knot] Move tuple-containing-Never tests to Markdown (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15402">#15402</a>)</li>
<li><a
href="c364b586f9"><code>c364b58</code></a>
[<code>flake8-pie</code>] Correctly remove wrapping parentheses
(<code>PIE800</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15394">#15394</a>)</li>
<li><a
href="73d424ee5e"><code>73d424e</code></a>
Fix outdated doc for handling the default file types with the pre-commit
hook...</li>
<li><a
href="6e9ff445fd"><code>6e9ff44</code></a>
Insert the cells from the <code>start</code> position (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15398">#15398</a>)</li>
<li><a
href="f2c3ddc5ea"><code>f2c3ddc</code></a>
[red-knot] Move intersection type tests to Markdown (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15396">#15396</a>)</li>
<li><a
href="b861551b6a"><code>b861551</code></a>
Remove unnecessary backticks (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15393">#15393</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.5.4...0.9.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.5.4&new-version=0.9.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-01-15 11:11:17 -08:00
Changming Sun
6a7ea5c896
Update xnnpack, cpuinfo and pthreadpool (#23362)
### Description
Update xnnpack to remove the dependency on the psimd and fp16 libraries.
However, coremltools still depends on them; this will be addressed
later.

Also, update CPUINFO because the latest xnnpack requires CPUINFO's avx10
support.

### Motivation and Context
The fewer dependencies the better.
2025-01-15 09:42:15 -08:00
Vincent Wang
cff0ec5278
Disable QNN HTP MatMul Op Test to Avoid Random Failure (#23371)
The QNN HTP backend's MatMul results are not stable across versions and
platforms. Disable the unit test to avoid intermittent failures.
2025-01-15 13:46:53 +08:00
Sam Webster
5d215ff810
[QNN EP] Clean up correctly from a partial setup (#23320)
### Description
Fix a bug in a previous change where a failure during `SetupBackend` causes `ReleaseResources` to be called for cleanup, but it does nothing because `backend_setup_completed_` is false. `backend_setup_completed_` _seems_ to be redundant now, so removing it fixes the problem.
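
The general pattern behind this fix is that cleanup should key off what was actually acquired rather than off a single "setup completed" flag, so that a setup that fails partway is still undone. A minimal Python sketch of that pattern follows; `BackendSketch` and its methods are hypothetical and do not mirror the QNN EP classes.

```python
from typing import List


class BackendSketch:
    """Hypothetical backend wrapper illustrating partial-setup cleanup."""

    def __init__(self) -> None:
        self.log_callback_registered = False
        self.backend_created = False
        self.events: List[str] = []

    def setup_backend(self, fail_after_logging: bool = False) -> None:
        try:
            self.log_callback_registered = True
            self.events.append("register log callback")
            if fail_after_logging:
                raise RuntimeError("simulated backend creation failure")
            self.backend_created = True
            self.events.append("create backend")
        except RuntimeError:
            # Undo whatever was acquired before the failure, then re-raise.
            self.release_resources()
            raise

    def release_resources(self) -> None:
        # Each resource is checked individually, so this is safe after a
        # partial setup and idempotent when called more than once.
        if self.backend_created:
            self.backend_created = False
            self.events.append("free backend")
        if self.log_callback_registered:
            self.log_callback_registered = False
            self.events.append("deregister log callback")
```

With a single completion flag, the failing path would skip `release_resources` entirely and leave the log callback registered, which matches the crash described below.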

### Motivation and Context
We are seeing crashes caused by the log callback failing to be deregistered.
2025-01-14 20:58:17 -08:00
kunal-vaishnavi
ab59e9e31e
Add unit tests for Phi vision (#23357)
### Description

This PR adds unit tests for [fusing the vision
components](https://github.com/microsoft/onnxruntime/pull/20721) of
Phi-3 vision and Phi-3.5 vision.

### Motivation and Context

Many multi-modal models use a CLIP encoder or a variant of CLIP as part
of their encoders. These fusion unit tests will ensure that the vision
components of Phi-3 vision and Phi-3.5 vision can still be fused when
existing fusions are modified to support more models.
2025-01-14 19:24:06 -08:00
Wanming Lin
b67983c553
[WebNN] Support RotaryEmbedding op (#23283)
WebNN doesn't provide a dedicated op for RotaryEmbedding, so we
implement it with a combination of WebNN ops. The decomposed graph
is based on the DML EP implementation in:

`onnxruntime/core/providers/dml/DmlExecutionProvider/src/Operators/DmlOperatorRotaryEmbedding.cpp`
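
Rotary embedding rotates each pair of features by a position-dependent angle, which is why it can be expressed with elementwise Mul/Add/Sub once the head dimension is split in two halves (or de-interleaved). The sketch below shows that math for one head vector in pure Python. It computes cos/sin inline with an assumed `base` of 10000, whereas the ONNX Runtime contrib op takes precomputed cos/sin caches as inputs; it is not the WebNN or DML decomposition itself.

```python
import math
from typing import List, Sequence


def rotary_embedding(
    x: Sequence[float], position: int, base: float = 10000.0, interleaved: bool = False
) -> List[float]:
    """Apply rotary position embedding to one head vector."""
    d = len(x)
    if d % 2:
        raise ValueError("head size must be even")
    half = d // 2
    # Angle for feature pair k: position * base^(-2k/d).
    angles = [position * base ** (-2.0 * k / d) for k in range(half)]
    cos = [math.cos(a) for a in angles]
    sin = [math.sin(a) for a in angles]
    if interleaved:
        # Pairs are (x[0], x[1]), (x[2], x[3]), ...
        x1, x2 = list(x[0::2]), list(x[1::2])
    else:
        # Pairs are (x[k], x[k + d/2]), the "rotate half" layout.
        x1, x2 = list(x[:half]), list(x[half:])
    # Elementwise only: a 2D rotation of each pair.
    r1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    r2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    if not interleaved:
        return r1 + r2
    out = [0.0] * d
    out[0::2] = r1
    out[1::2] = r2
    return out
```

At position 0 every angle is zero, so the output equals the input; at any position the rotation preserves the vector's norm.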
2025-01-14 17:58:06 -08:00
Prathik Rao
c07afd3198
slice operator implementation for webgpu native (#23264)
Increases operator coverage for the WebGPU native EP.
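
Any Slice implementation first has to normalize each axis's `starts`, `ends`, and `steps` before indexing. The sketch below applies the clamping rules described in the ONNX Slice specification for one axis (negative values count from the end; forward steps clamp to `[0, dim]`, backward steps clamp `start` to `[0, dim - 1]` and `end` to `[-1, dim - 1]`). It is a hypothetical pure-Python helper, not the WebGPU kernel.

```python
from typing import List


def slice_indices(dim: int, start: int, end: int, step: int = 1) -> List[int]:
    """Indices one Slice axis selects, using ONNX-style normalization."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # Negative starts/ends count from the end of the axis.
    if start < 0:
        start += dim
    if end < 0:
        end += dim
    if step > 0:
        # Forward: clamp both bounds to [0, dim].
        start = min(max(start, 0), dim)
        end = min(max(end, 0), dim)
    else:
        # Backward: start in [0, dim - 1], end in [-1, dim - 1] so index 0 is reachable.
        start = min(max(start, 0), dim - 1)
        end = min(max(end, -1), dim - 1)
    return list(range(start, end, step))
```

For example, reversing a length-5 axis with `start=-1, end=-6, step=-1` selects `[4, 3, 2, 1, 0]`.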
2025-01-14 15:47:15 -08:00
Yifan Li
5c3c7643db
Update range of gpu arch (#23309)
### Description
* Remove deprecated GPU architectures to control NuGet/Python package size
(the latest TensorRT supports sm75 (Turing) and newer architectures).
* Add sm90 to support the Blackwell series in the next release (sm86 and sm89
are not included because adding them would significantly increase package size).

| arch_range | Python-cuda12 | Nuget-cuda12 |
| -------------- | ------------------------ | ------------------------ |
| 60;61;70;75;80 | Linux: 279MB Win: 267MB | Linux: 247MB Win: 235MB |
| 75;80 | Linux: 174MB Win: 162MB | Linux: 168MB Win: 156MB |
| **75;80;90** | **Linux: 299MB Win: 277MB** | **Linux: 294MB Win: 271MB** |
| 75;80;86;89 | [Linux: MB Win: 390MB](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=647457&view=results) | Linux: 416MB Win: 383MB |
| 75;80;86;89;90 | [Linux: MB Win: 505MB](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=646536&view=results) | Linux: 541MB Win: 498MB |
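
One quick way to read the table is the marginal size each extra architecture adds. The snippet below does that arithmetic on the Nuget-cuda12 Linux column; the numbers are copied from the table, and the per-architecture average is only a rough estimate.

```python
from typing import Dict, Tuple

# Nuget-cuda12 Linux package sizes in MB, taken from the table above.
NUGET_LINUX_MB: Dict[str, int] = {
    "75;80": 168,
    "75;80;90": 294,
    "75;80;86;89": 416,
    "75;80;86;89;90": 541,
}


def added_size(base: str, extended: str) -> Tuple[int, float]:
    """Return (total MB added, average MB per added architecture)."""
    added_archs = set(extended.split(";")) - set(base.split(";"))
    delta = NUGET_LINUX_MB[extended] - NUGET_LINUX_MB[base]
    return delta, delta / len(added_archs)


for base, extended in [("75;80", "75;80;90"), ("75;80", "75;80;86;89")]:
    total, per_arch = added_size(base, extended)
    print(f"{base} -> {extended}: +{total} MB (~{per_arch:.0f} MB per arch)")
```

Each added architecture costs roughly 125 MB in this column, which is consistent with limiting the list to 75;80;90.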

### Motivation and Context

Callout: along with adding sm90 support, the CUDA 11.8 + cuDNN 8 build will be
dropped in the coming ORT release, because that build has issues with Blackwell
(mentioned in the comments) and, according to the internal ort-cuda11 repo,
demand for CUDA 11 is minor.
2025-01-14 14:27:34 -08:00
Jian Chen
39db20f3ff
Updating react-native to 0.70.15 (#23279)
### Description
Updating react-native to 0.70.15.

### Motivation and Context
To address the checksum failure that occurred after Boost moved its
download URL away from JFrog.
2025-01-14 14:09:49 -08:00
Tianlei Wu
6550f4b35b
Stable Diffusion 3.x and Flux Optimization (#22986)
### Description

It has dependency on the following PRs:
- https://github.com/microsoft/onnxruntime/pull/23297

Optimize the ONNX pipeline for Stable Diffusion 3.x and Flux 1.0 models
(fp32 or fp16).
- [x] Update optimize_pipeline script
- [x] Update benchmark script
- [x] Update documentation for Stable Diffusion 3.x and Flux 1.0 models
- [x] Add graph optimizations for MMDiT model
  - [x] FastGelu fusion
  - [x] RMSNorm fusion
  - [x] MultiHeadAttention fusion
- [x] Add graph optimizations for Flux transformer models
  - [x] MultiHeadAttention fusion
- [x] Update graph optimizations for T5
- [x] Add tests
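
The fusions above replace small subgraphs with single contrib ops. As a reference for what two of them compute, here is a pure-Python sketch of RMSNorm (which ONNX Runtime represents as `SimplifiedLayerNormalization`) and the tanh approximation used by `FastGelu`. The `eps` default is an assumption; the models' actual epsilon comes from the graph.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], scale: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm / SimplifiedLayerNormalization: no mean subtraction, no bias."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, scale)]


def fast_gelu(x: float) -> float:
    """Tanh approximation of GELU, the formula FastGelu computes."""
    k = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(k * (x + 0.044715 * x ** 3)))
```

Unlike LayerNormalization, RMSNorm skips the mean subtraction, so after normalization (with unit scale) the mean of the squared outputs is about 1 rather than the variance.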

Example: optimize the Flux 1.0 Schnell ONNX pipeline from FP32 to FP16:
```
python optimize_pipeline.py -i ./flux1_schnell_onnx/fp32 -o ./flux1_schnell_onnx/fp16 --float16

  Optimize flux1_schnell_onnx/fp32/transformer/model.onnx ...
  Fused LayerNormalization: 115
  Fused SimplifiedLayerNormalization: 152
  Fused FastGelu: 76
  Fused MultiHeadAttention: 57
```

### H100 Benchmark Results

* GPU: NVIDIA H100 80GB HBM3
* Image Size: 1024x1024
* Batch Size: 1

Model | Steps | Precision | Engine | Latency (Seconds) | GPU Memory (MB)
-- | -- | -- | -- | -- | --
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (compile) | 8.198 | 37,603
Flux 1.0 Dev | 50 | FP16+BF16 | Optimum (ORT) | 10.762 | 41,469
Flux 1.0 Dev | 50 | FP16+FP32 | Optimum (ORT) | 10.891 | 43,545
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (eager) | 12.339 | 36,651
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (compile) | 0.775 | 37,857
Flux 1.0 Schnell | 4 | FP16+BF16 | Optimum (ORT) | 0.931 | 41,433
Flux 1.0 Schnell | 4 | FP16+FP32 | Optimum (ORT) | 0.939 | 43,809
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (eager) | 1.120 | 36,629
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (compile) | 7.466 | 32,217
SD 3.5 Large | 50 | FP16+BF16 | Optimum (ORT) | 10.275 | 36,609
SD 3.5 Large | 50 | FP16+FP32 | Optimum (ORT) | 10.283 | 36,729
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (eager) | 11.615 | 31,517
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (compile) | 3.240 | 21,143
SD 3.5 Medium | 50 | FP16+BF16 | Optimum (ORT) | 4.799 | 25,097
SD 3.5 Medium | 50 | FP16+FP32 | Optimum (ORT) | 4.838 | 25,109
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (eager) | 5.582 | 20,489
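
To compare engines in the table, divide their latencies. The snippet below does this for the H100 Flux 1.0 Dev rows; the numbers are copied from the table, the dictionary keys are just labels, and the ratios are simple arithmetic, not additional measurements.

```python
# H100, Flux 1.0 Dev, 50 steps: latency in seconds, taken from the table above.
LATENCY_S = {
    "torch_compile_bf16": 8.198,
    "ort_fp16_bf16": 10.762,
    "torch_eager_bf16": 12.339,
}


def speedup(baseline: str, candidate: str) -> float:
    """Times faster `candidate` runs than `baseline` (below 1.0 means slower)."""
    return LATENCY_S[baseline] / LATENCY_S[candidate]


print(f"ORT vs eager:   {speedup('torch_eager_bf16', 'ort_fp16_bf16'):.2f}x")
print(f"ORT vs compile: {speedup('torch_compile_bf16', 'ort_fp16_bf16'):.2f}x")
```

With these numbers, ORT (FP16+BF16) is about 1.15x faster than eager PyTorch and runs at about 0.76x the speed of `torch.compile`.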

### A100 Benchmark Results

* GPU: A100-SXM4-80GB
* Image Size: 1024x1024
* Batch Size: 1

Model | Steps | Precision | Engine | Latency (Seconds) | GPU Memory (MB)
-- | -- | -- | -- | -- | --
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (compile) | 17.593 | 37,723
Flux 1.0 Dev | 50 | FP16+BF16 | Optimum (ORT) | 21.918 | 41,348
Flux 1.0 Dev | 50 | FP16+FP32 | Optimum (ORT) | 22.060 | 44,860
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (eager) | 24.267 | 36,847
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (compile) | 1.627 | 37,881
Flux 1.0 Schnell | 4 | FP16+BF16 | Optimum (ORT) | 1.884 | 41,537
Flux 1.0 Schnell | 4 | FP16+FP32 | Optimum (ORT) | 1.902 | 44,858
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (eager) | 2.162 | 36,831
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (compile) | 15.881 | 32,307
SD 3.5 Large | 50 | FP16+FP32 | Optimum (ORT) | 19.837 | 36,451
SD 3.5 Large | 50 | FP16+BF16 | Optimum (ORT) | 19.964 | 36,461
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (eager) | 22.477 | 31,513
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (compile) | 6.476 | 21,341
SD 3.5 Medium | 50 | FP16+FP32 | Optimum (ORT) | 8.775 | 25,183
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (eager) | 10.057 | 20,433

### Future Work

* Triton kernel for matrix multiplication and auto-tuning.
* FP8/INT8 quantization.

### Motivation and Context

SD 3.5 Architecture:

https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/mmdit-x.png
2025-01-14 13:37:58 -08:00
Edward Chen
04030f64be
Add QNN EP HTP shared memory allocator (#23136)
Adds QNN EP HTP shared memory allocator.

The HTP shared memory allocator (`HtpSharedMemoryAllocator`) calls the
rpcmem shared library (libcdsprpc.so/dll) to allocate and free memory
that can be shared between HTP and CPU.

The allocator can be enabled by setting QNN EP option
`enable_htp_shared_memory_allocator` to `1`.
`QNNExecutionProvider::CreatePreferredAllocators()` will then return an
instance of `HtpSharedMemoryAllocator`.

For each QNN context, we also need to register and unregister memory
handles in order to use the HTP shared memory. This memory handle
management is added to `QnnBackendManager`, which also manages the QNN
context handles.

For more information about using HTP shared memory with QNN, see:
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_shared_buffer_tutorial.html#shared-buffer-tutorial

Limitations:
- HTP shared memory usage is only supported for graph inputs and
outputs. Intermediate values are not supported.
- An allocation is assigned to a single shared memory buffer. The
allocator cannot make multiple allocations share a single shared memory
buffer.
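
The per-context registration described above amounts to bookkeeping: register a buffer with a context on first use, reuse that handle afterwards, and drop it when either the context or the buffer goes away. The sketch below models only that bookkeeping. `MemHandleRegistry` is hypothetical and hands out plain integers; it does not call any QNN or rpcmem API and is not `QnnBackendManager`'s interface.

```python
import itertools
from typing import Dict, Tuple


class MemHandleRegistry:
    """Hypothetical per-context registry of shared-memory handles."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        # (context_id, buffer_address) -> registered handle
        self._handles: Dict[Tuple[int, int], int] = {}

    def get_or_register(self, context_id: int, buffer_address: int) -> int:
        """Register a buffer with a context on first use and reuse it afterwards."""
        key = (context_id, buffer_address)
        if key not in self._handles:
            self._handles[key] = next(self._ids)
        return self._handles[key]

    def release_context(self, context_id: int) -> int:
        """Unregister every handle of a context; returns how many were dropped."""
        stale = [key for key in self._handles if key[0] == context_id]
        for key in stale:
            del self._handles[key]
        return len(stale)

    def release_buffer(self, buffer_address: int) -> int:
        """Unregister a freed buffer from every context that used it."""
        stale = [key for key in self._handles if key[1] == buffer_address]
        for key in stale:
            del self._handles[key]
        return len(stale)
```

Keying on the (context, buffer) pair reflects the stated limitation that each allocation maps to exactly one shared memory buffer, while the same buffer may still need a separate handle in each context.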

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2025-01-14 11:09:50 -08:00
Yulong Wang
444fcebaa4
Pre-requisites of upgrading EMSDK (#23347)
### Description

This PR contains a part of the changes in #23318.

This PR is created because the work to support building the WebGPU EP in
WASM depends on #23318, which cannot be merged since it is blocked by an
upstream issue (https://github.com/llvm/llvm-project/issues/122166). This
PR contains the changes that can be safely merged separately, which
unblocks development of WebGPU EP support in WASM builds.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-01-14 11:07:21 -08:00
Jianhui Dai
4a0269a5d4
[webgpu] Simplify o2i_output implementation (#23351)
### Description
This change simplifies the o2i_output implementation by reducing
unnecessary intermediate variables, with no change in functionality.

### Motivation and Context
As above.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
2025-01-14 11:06:42 -08:00
Changming Sun
228dd16893
Bump clang-format from 18.1.8 to 19.1.6 (#23346)
Replaces #23327.
2025-01-14 09:02:04 -08:00
Yulong Wang
d9cd27a0a7
[WebGPU EP] use LOGS_DEFAULT for device lost logging (#23353)
### Description

Use `LOGS_DEFAULT` for device-lost logging.

Now that the GPU device lifecycle is managed by `WebGpuContext`, it can
use ORT logging.
2025-01-14 08:12:55 -08:00
Yulong Wang
af4126c2f1
[WebGPU EP] increase abs error for test case MatMulNBits.Float16Large (#23350)
### Description

Increase the absolute error tolerance for test case `MatMulNBits.Float16Large`
to 0.1 for WebGPU with the subgroup implementation.

Fixes the WebGPU CI pipeline.
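
An absolute tolerance is only meaningful relative to the spacing between adjacent fp16 values at the output's magnitude, and that spacing grows with the magnitude. The stdlib-only sketch below measures it using `struct`'s half-precision `e` format. The output magnitudes of `MatMulNBits.Float16Large` are not stated here, so the example values are illustrative only.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_spacing(x: float) -> float:
    """Gap between |x| (rounded to fp16) and the next larger fp16 value.

    Raises OverflowError if |x| is too large for fp16.
    """
    base = to_fp16(abs(x))
    bits = struct.unpack("<H", struct.pack("<e", base))[0]
    next_up = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return next_up - base


for magnitude in (1.0, 10.0, 100.0, 1000.0):
    print(f"|y| ~ {magnitude:>6}: fp16 spacing = {fp16_spacing(magnitude)}")
```

At magnitudes around 100, a single fp16 rounding step is already 0.0625, so an absolute tolerance of 0.1 allows only about one or two steps of difference there.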
2025-01-14 08:00:53 -08:00
Changming Sun
4e4fd2bdcf
Update ORT extension to the latest (#23314)
Update ORT extension to the latest, to include some build system fixes.
2025-01-13 18:59:42 -08:00
Jie Chen
a9be6b71a0
[webgpu] Implement Split operator (#23198)
Test: `onnxruntime_test_all.exe --gtest_filter=SplitOperatorTest.*`

2025-01-13 15:51:36 -08:00
dependabot[bot]
377165fe1d
Bump black from 24.2.0 to 24.10.0 (#23325)
Bumps [black](https://github.com/psf/black) from 24.2.0 to 24.10.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/psf/black/releases">black's
releases</a>.</em></p>
<blockquote>
<h2>24.10.0</h2>
<h3>Highlights</h3>
<ul>
<li>Black is now officially tested with Python 3.13 and provides Python
3.13
mypyc-compiled wheels. (<a
href="https://redirect.github.com/psf/black/issues/4436">#4436</a>) (<a
href="https://redirect.github.com/psf/black/issues/4449">#4449</a>)</li>
<li>Black will issue an error when used with Python 3.12.5, due to an
upstream memory
safety issue in Python 3.12.5 that can cause Black's AST safety checks
to fail. Please
use Python 3.12.6 or Python 3.12.4 instead. (<a
href="https://redirect.github.com/psf/black/issues/4447">#4447</a>)</li>
<li>Black no longer supports running with Python 3.8 (<a
href="https://redirect.github.com/psf/black/issues/4452">#4452</a>)</li>
</ul>
<h3>Stable style</h3>
<ul>
<li>Fix crashes involving comments in parenthesised return types or
<code>X | Y</code> style unions.
(<a
href="https://redirect.github.com/psf/black/issues/4453">#4453</a>)</li>
<li>Fix skipping Jupyter cells with unknown <code>%%</code> magic (<a
href="https://redirect.github.com/psf/black/issues/4462">#4462</a>)</li>
</ul>
<h3>Preview style</h3>
<ul>
<li>Fix type annotation spacing between * and more complex type variable
tuple (i.e. <code>def fn(*args: *tuple[*Ts, T]) -&gt; None: pass</code>)
(<a
href="https://redirect.github.com/psf/black/issues/4440">#4440</a>)</li>
</ul>
<h3>Caching</h3>
<ul>
<li>Fix bug where the cache was shared between runs with and without
<code>--unstable</code> (<a
href="https://redirect.github.com/psf/black/issues/4466">#4466</a>)</li>
</ul>
<h3>Packaging</h3>
<ul>
<li>Upgrade version of mypyc used to 1.12 beta (<a
href="https://redirect.github.com/psf/black/issues/4450">#4450</a>) (<a
href="https://redirect.github.com/psf/black/issues/4449">#4449</a>)</li>
<li><code>blackd</code> now requires a newer version of aiohttp. (<a
href="https://redirect.github.com/psf/black/issues/4451">#4451</a>)</li>
</ul>
<h3>Output</h3>
<ul>
<li>Added Python target version information on parse error (<a
href="https://redirect.github.com/psf/black/issues/4378">#4378</a>)</li>
<li>Add information about Black version to internal error messages (<a
href="https://redirect.github.com/psf/black/issues/4457">#4457</a>)</li>
</ul>
<h2>24.8.0</h2>
<h3>Stable style</h3>
<ul>
<li>Fix crash when <code># fmt: off</code> is used before a closing
parenthesis or bracket. (<a
href="https://redirect.github.com/psf/black/issues/4363">#4363</a>)</li>
</ul>
<h3>Packaging</h3>
<ul>
<li>Packaging metadata updated: docs are explicitly linked, the issue
tracker is now also
linked. This improves the PyPI listing for Black. (<a
href="https://redirect.github.com/psf/black/issues/4345">#4345</a>)</li>
</ul>
<h3>Parser</h3>
<ul>
<li>Fix regression where Black failed to parse a multiline f-string
containing another
multiline string (<a
href="https://redirect.github.com/psf/black/issues/4339">#4339</a>)</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1b2427a2b7"><code>1b2427a</code></a>
Prepare release 24.10.0 (<a
href="https://redirect.github.com/psf/black/issues/4471">#4471</a>)</li>
<li><a
href="a22b1ebbfd"><code>a22b1eb</code></a>
Add mypyc 3.13 wheel build (<a
href="https://redirect.github.com/psf/black/issues/4449">#4449</a>)</li>
<li><a
href="b7d0e7212b"><code>b7d0e72</code></a>
Bump AndreMiras/coveralls-python-action from
65c1672f0b8a201702d86c81b79187df...</li>
<li><a
href="f1a2f92bba"><code>f1a2f92</code></a>
Include --unstable in cache key (<a
href="https://redirect.github.com/psf/black/issues/4466">#4466</a>)</li>
<li><a
href="8d9d18c033"><code>8d9d18c</code></a>
Fix skipping Jupyter cells with unknown %% magic (<a
href="https://redirect.github.com/psf/black/issues/4462">#4462</a>)</li>
<li><a
href="bbfdba3a5e"><code>bbfdba3</code></a>
Fix docs CI: use venv for uv to fix 'failed to create directory' (<a
href="https://redirect.github.com/psf/black/issues/4460">#4460</a>)</li>
<li><a
href="8fb2add1f7"><code>8fb2add</code></a>
Use builtin generics (<a
href="https://redirect.github.com/psf/black/issues/4458">#4458</a>)</li>
<li><a
href="2a45cecf29"><code>2a45cec</code></a>
Fix crashes with comments in parentheses (<a
href="https://redirect.github.com/psf/black/issues/4453">#4453</a>)</li>
<li><a
href="b4d6d8632d"><code>b4d6d86</code></a>
Drop Python 3.8 support (<a
href="https://redirect.github.com/psf/black/issues/4452">#4452</a>)</li>
<li><a
href="ac018c16ca"><code>ac018c1</code></a>
Require newer aiohttp for blackd (<a
href="https://redirect.github.com/psf/black/issues/4451">#4451</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/psf/black/compare/24.2.0...24.10.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=black&package-manager=pip&previous-version=24.2.0&new-version=24.10.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-13 15:10:11 -08:00
Jiajia Qin
80d8931f1d
[webgpu] Use subgroup for matmulnbits (#23224)
### Description
This PR uses subgroups to implement MatMulNBits when tile_m > 1 on
Intel devices.
With this PR, prefill for a 500-token prompt for Phi-3 drops from 8.5s to
3.5s on Intel Meteor Lake.
2025-01-13 08:20:42 -08:00
Tianlei Wu
73f5b0c597
LayerNormalization broadcast (limited support for axis=2) (#23297)
### Description

The LayerNormalization spec supports broadcasting (tensors Scale and B
should be unidirectionally broadcastable to tensor X).
https://onnx.ai/onnx/operators/onnx__LayerNormalization.html
However, the current implementation only allows the scale and bias size to be
X.shape()[axis:].

Examples of input tensors normalized with axis=2:

| X shape |  Scale shape | B shape | Before | After |
| - | - | - | - | - |
| (B, S, D) | (D) | (D) | Supported | Supported |
| (B, S, D) | (1, 1, D) | (1, 1, D) | Supported | Supported |
| (B, S, D) | (B, 1, D) | (B, 1, D) | Not Supported | Supported |
| (B, S, D) | (1, S, D) | (1, S, D) | Not Supported | Supported |
| (B, S, D) | (B, S, D) | (B, S, D) | Not Supported | Supported |


This PR adds limited support: axis=2, scale and bias have the same shape,
and scale, bias and X have the same number of dimensions. This can support
common use cases in LLM and vision models.
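The support matrix above can be restated as a shape check. The Python function below is a hypothetical sketch that encodes only the conditions listed in this description (the pre-existing `X.shape[axis:]` case, plus axis=2, identical scale/bias shapes, equal rank and unidirectional broadcasting); it is not the kernel's actual validation code.

```python
def layernorm_broadcast_supported(x_shape, scale_shape, bias_shape, axis):
    """Hypothetical check mirroring the limited broadcast support described above."""
    x, s, b = tuple(x_shape), tuple(scale_shape), tuple(bias_shape)
    # Pre-existing case: scale and bias both have shape X.shape[axis:].
    if s == x[axis:] and b == x[axis:]:
        return True
    # New limited case: axis=2, same scale/bias shape, same rank as X.
    if axis != 2 or s != b or len(s) != len(x):
        return False
    # Unidirectional broadcast: each scale/bias dim is 1 or equals X's dim.
    return all(sd in (1, xd) for sd, xd in zip(s, x))


B, S, D = 2, 3, 4
for shape in [(D,), (1, 1, D), (B, 1, D), (1, S, D), (B, S, D)]:
    print(shape, layernorm_broadcast_supported((B, S, D), shape, shape, axis=2))
```

All five rows of the table report `True` under these assumptions.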

### Motivation and Context

Support Stable Diffusion 3.x and Flux model.
2025-01-10 21:57:18 -08:00
Yulong Wang
a74817ab10
add missing build dependency for onnxruntime_providers_webgpu (#23324)
### Description

Fixes the build when it is invoked with `--target onnxruntime_providers_webgpu`.

Otherwise, the following error occurs:
```
  range.cc
D:\code\onnxruntime\build\Windows\Debug\_deps\onnx-src\onnx\onnx_pb.h(65,10): error C1083: Cannot open include file: 'onnx/onnx-ml.pb.h': No such file or directory [D:\code\onnxruntime\build\Windows\Debug\onnxruntime_providers_webgpu.vcxproj]
  (compiling source file '../../../onnxruntime/core/providers/webgpu/math/binary_elementwise_ops.cc')
```
2025-01-10 18:07:12 -08:00
Changming Sun
b461f06a15
Remove a hack in adjust_global_compile_flags.cmake (#23313)
### Description

Remove a hack in adjust_global_compile_flags.cmake because the issue
should have been resolved.
2025-01-10 18:05:43 -08:00
Xiaoyu
6e5efb5dba
Fix quant modelproto error (#23322)
### Description
Fixes
[issue](https://github.com/microsoft/onnxruntime/issues/23268#issuecomment-2579010227).
Saving a `ModelProto` with `save_as_external_data=True` updates its
metadata, which could lead to issues later if not managed carefully.
This PR uses a deep copy to prevent such problems.
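A minimal sketch of the deep-copy pattern, using stdlib-only stand-ins: `Tensor`, `Model`, `save_with_external_data` and `save_copy` are hypothetical and only mimic a save routine that rewrites the metadata of the object it is given. They are not the onnx or quantization-tool APIs.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Tensor:
    name: str
    raw_data: bytes = b""
    data_location: str = "DEFAULT"  # becomes "EXTERNAL" once data is moved out
    external_path: str = ""


@dataclass
class Model:
    initializers: list = field(default_factory=list)


def save_with_external_data(model: Model, path: str) -> None:
    """Hypothetical save routine that mutates the model object it receives."""
    for tensor in model.initializers:
        tensor.external_path = path + ".data"
        tensor.data_location = "EXTERNAL"
        tensor.raw_data = b""  # bytes would now live in the side file (not written here)


def save_copy(model: Model, path: str) -> None:
    # Save a deep copy so the caller's model keeps its inline data and metadata.
    save_with_external_data(copy.deepcopy(model), path)


model = Model([Tensor("w", raw_data=b"\x01\x02")])
save_copy(model, "model.onnx")
print(model.initializers[0].data_location)
```

Without the copy, later code that reuses `model` would see tensors pointing at external data instead of their original bytes.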
2025-01-10 17:48:01 -08:00
Changming Sun
ecdeecae61
Update MACOSX_DEPLOYMENT_TARGET (#23308)
Fix some inconsistencies.

All our iOS builds should target iOS 15.1.
All our macOS desktop builds should target macOS 13.3 to align with the
changes made in #17361.
2025-01-10 14:25:32 -08:00
Satya Kumar Jandhyala
436dfc3c9d
[Native WebGPU] Fix the error when past and present key/value share buffer (#23315)
### Description
Fix an error that caused incorrect output when the past key/value shares a
buffer with the present key/value.
2025-01-10 13:31:50 -08:00
Changming Sun
e7d8596c7c
Update docker images: remove python 3.8 and 3.9 (#23310)
Python 3.8 and 3.9 are removed from the new manylinux images, to reduce
image size.
2025-01-10 13:09:04 -08:00
Changming Sun
1ce59577d5
Add VCPKG triplet files (#23298)
Add VCPKG triplet files. All the triplet files are automatically
generated by gen.py. The files are added to the repository for ease of use.
2025-01-09 16:18:51 -08:00
Jiajia Qin
7be006c466
[js/webgpu] Optimize convtranspose (#23302)
### Description
BUG #23273

With this change, the ConvTranspose time in that bug drops from ~90s to
~7s on my Meteor Lake machine.

This PR does the following:
1. Uses the stride as the loop increment.
In the bug, the stride is 1024, which greatly reduces the number of loop
iterations.
2. Supports components for A to reduce the number of memory accesses.
3. When the number of output channels is 1, B can use the same number of
components as A, further reducing the number of memory accesses.
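To illustrate item 1, here is a 1D ConvTranspose in gather form (each output element collects from the kernel taps that map onto it) in plain Python. This is a sketch of the general technique, not the WGSL shader: the naive loop visits every tap and skips those whose offset is not a multiple of the stride, while the strided loop starts at the first aligned tap and steps by `stride`, doing roughly `1/stride` of the iterations. Dilation, groups and output padding are ignored, and all function names are hypothetical.

```python
def conv_transpose_1d_scatter(x, w, stride, pad=0):
    """Reference: each input scatters x[i] * w[k] to output i * stride + k - pad."""
    out_len = (len(x) - 1) * stride + len(w) - 2 * pad
    out = [0.0] * out_len
    for i, xv in enumerate(x):
        for k, wv in enumerate(w):
            o = i * stride + k - pad
            if 0 <= o < out_len:
                out[o] += xv * wv
    return out


def conv_transpose_1d_naive(x, w, stride, pad=0):
    """Gather form: visit every kernel tap, skip taps not aligned to the stride."""
    out_len = (len(x) - 1) * stride + len(w) - 2 * pad
    out = [0.0] * out_len
    for o in range(out_len):
        acc = 0.0
        for k in range(len(w)):
            t = o + pad - k  # must equal i * stride for a contributing input i
            if t < 0 or t % stride:
                continue
            i = t // stride
            if i < len(x):
                acc += x[i] * w[k]
        out[o] = acc
    return out


def conv_transpose_1d_strided(x, w, stride, pad=0):
    """Gather form that starts at the first aligned tap and steps by stride."""
    out_len = (len(x) - 1) * stride + len(w) - 2 * pad
    out = [0.0] * out_len
    for o in range(out_len):
        acc = 0.0
        first_k = (o + pad) % stride  # first k with (o + pad - k) divisible by stride
        for k in range(first_k, len(w), stride):
            t = o + pad - k
            if t < 0:
                break  # t only decreases as k grows
            i = t // stride
            if i < len(x):
                acc += x[i] * w[k]
        out[o] = acc
    return out


print(conv_transpose_1d_strided([1, 2], [1, 10], stride=3))
```

With a stride of 1024, as in the bug, the strided loop skips 1023 of every 1024 tap checks.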
2025-01-09 11:24:42 -08:00
Yulong Wang
0627a6cb93
[js/web] fix package export for bundlers (#23257)
### Description

This PR tries to fix #22615 (see the detailed description in the issue).

A perfect solution would be too difficult to make, because there is a
huge number of combinations of usage scenarios, including combinations
of development framework, bundler, dev/prod mode, and so on.

This PR uses the following approach:
- Introduce a new type of end-to-end test: the export test. Tests of this
type are complete web apps that use popular web development frameworks;
they use puppeteer to run the apps and check that the apps run without
error.
  - Added one Next.js-based web app and one Vite-based web app.
- In the test, perform the following steps:
  - `npm install` for packages built locally
  - `npm run dev` to start the dev server, then use puppeteer to launch
    the browser and test
  - `npm run build && npm run start` to test the prod build, then use
    puppeteer to launch the browser and test
- Make changes to ort-web, including:
  - special handling for Webpack's behavior of rewriting `import.meta.url`
    to a `file://` string
  - revise build definitions
  - fix the wasm URL for the proxy, if used in a bundled build
2025-01-09 11:01:00 -08:00
Changming Sun
0ec2171b9f
Update Linux docker images (#23244)
The new images contain the following updates:

1. Added Git, Ninja and VCPKG to all docker images
2. Updated the CPU containers' GCC version from 12 to 14
3. Pinned the cuDNN version in the CUDA 12 images to 9.5 (the latest is 9.6)
4. Addressed container supply chain warnings by building the CUDA 12 images
from scratch (avoiding Nvidia's prebuilt images)
5. Updated the manylinux commit ID to
75aeda9d18eafb323b00620537c8b4097d4bef48

Also, this PR updates some CPU EP source code to make it compatible with
GCC 14.
2025-01-09 10:20:33 -08:00
Corentin Maravat
16a246dc1c
Add Gradient for Atan (#23172) 2025-01-09 09:30:53 -08:00
Satya Kumar Jandhyala
d0c7438f5a
[JSEP/WebGPU] Add a fatal error message for unsupported GQA do_rotary attribute. (#23287)
### Description
Added a fatal error message for the unsupported GroupQueryAttention
do_rotary attribute.

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/22987
This helps users understand that this attribute is not supported.
2025-01-09 08:52:17 -08:00
Vincent Wang
3b1a9002f5
Fix Build Error (#23299)
Fix build error.
2025-01-09 13:34:19 +08:00
Vincent Wang
4134cd9e42
Add Optional Redundant Clip Node to NodeUnit (#22888)
Currently, the Clip/Relu + Q fusion runs at optimization level 2, but it is
not applied for EPs that use NodeUnit. Removing such redundant Clip/Relu
nodes would therefore require separate handling code in each EP.

This PR detects when a Clip/Relu is made redundant by a Q node and adds this
information to the corresponding QDQ NodeUnit, so that EPs can ignore the
Clip/Relu and handle only the target node in the QDQ NodeUnit.
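As a sketch of when a Clip/Relu is made redundant by a following Q node: if the Clip bounds enclose the float interval that the Q node can represent, `[scale * (qmin - zero_point), scale * (qmax - zero_point)]`, then Q's saturation alone yields the same output. The helper names below are hypothetical, and ORT's own check may differ in details such as tolerances.

```python
import math


def quantize(x: float, scale: float, zero_point: int, qmin: int, qmax: int) -> int:
    # QuantizeLinear-style: round half to even (Python's round), add zero point, saturate.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def clip_is_redundant(clip_min: float, clip_max: float, scale: float,
                      zero_point: int, qmin: int, qmax: int) -> bool:
    # Float interval the Q node can represent before it saturates.
    q_float_min = scale * (qmin - zero_point)
    q_float_max = scale * (qmax - zero_point)
    # If the Clip bounds lie on or outside that interval, Q's own saturation
    # already produces the same result, so the Clip can be dropped.
    return clip_min <= q_float_min and clip_max >= q_float_max


# Relu is Clip(0, +inf): redundant for uint8 when the zero point equals qmin (0).
print(clip_is_redundant(0.0, math.inf, 0.1, 0, 0, 255))
# With zero point 128, negative values are representable, so Relu still matters.
print(clip_is_redundant(0.0, math.inf, 0.1, 128, 0, 255))
```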
2025-01-09 10:25:32 +08:00
Preetha Veeramalai
ca77de54d2
Updated the Documentation for nuget packages (#23182)
### Description
Update the documentation for the NuGet packages for OVEP.

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
2025-01-08 17:19:39 -08:00
Changming Sun
3328eb3bb3
Update min iOS version to 15.1 to align with React Native 0.76 (#23292)
Update the minimum iOS version to 15.1 to align with React Native 0.76. We
need to update React Native.
See
https://github.com/react-native-community/discussions-and-proposals/discussions/812
for background.

Similar to PR #20773
2025-01-08 16:02:45 -08:00
Changming Sun
ccbe66d422
Update NDK (#23280)
Similar to #21989
2025-01-08 13:57:23 -08:00
Sam Webster
080f87fa0b
[QNN EP] Make sure everything gets cleaned up (#23275)
### Description
Always make sure resources and callbacks are cleaned up.

### Motivation and Context
We've seen problems where the log callback isn't deregistered, which can
lead to crashes.
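The QNN EP is C++, where the usual tool for this is RAII. As a language-neutral sketch of the same idea, the Python below uses `contextlib.ExitStack` so that a registered log callback is deregistered on every exit path, including exceptions. `LogCallbackRegistry` and `run_session` are hypothetical stand-ins, not QNN or ORT APIs.

```python
from contextlib import ExitStack


class LogCallbackRegistry:
    """Hypothetical stand-in for a backend that keeps a reference to a log callback."""

    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def deregister(self, callback):
        self.callbacks.remove(callback)


def run_session(registry, work):
    """Register a log callback for the duration of `work` and always deregister it."""
    messages = []

    def on_log(msg):
        messages.append(msg)

    with ExitStack() as cleanup:
        registry.register(on_log)
        # Runs when the block exits, whether work() returns or raises.
        cleanup.callback(registry.deregister, on_log)
        return work()
```

Registering the cleanup immediately after the resource is acquired means no later failure can skip it, which is the failure mode described above.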

---------

Co-authored-by: Adrian Lizarraga <adrianlm2@gmail.com>
2025-01-08 12:56:30 -08:00
Hector Li
76d6345f0b
Fix the issue for Gather int64 indices handling (#23274)
### Description
Fix the handling of int64 indices for Gather: still insert a Cast node if the Gather node is not quantized.
2025-01-08 12:52:08 -08:00
PARK DongHa
5b9c968eaa
Correct ONNX and Protobuf version in vcpkg build (#23285)
### Description

Changes the vcpkg manifest and configuration files (vcpkg.json &
vcpkg-configuration.json):

* Update the vcpkg version to
https://github.com/microsoft/vcpkg/releases/tag/2024.12.16
* Use protobuf 3.21.12 (= `v21.12`) to sync with
[cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt)
  * Resolve https://github.com/microsoft/onnxruntime/issues/22750
* Add `onnx` to the vcpkg manifest so `find_package(ONNX)` and
`find_dependency(Protobuf)` work as expected.
  * Currently, it uses 1.16.2
  * v1.17.0 will become available after
    https://github.com/microsoft/vcpkg/pull/42942

However, `onnx` in vcpkg doesn't configure the
`ONNX_DISABLE_STATIC_REGISTRATION` build option.

* https://github.com/microsoft/vcpkg/pull/38879
* Create a "cmake/vcpkg-triplets/" folder and triplet files that use
`VCPKG_CMAKE_CONFIGURE_OPTIONS` for the option
  * This requires the `VCPKG_OVERLAY_TRIPLETS` environment variable for CI
    steps, which is a bit inconvenient.
    I will try to find a simpler way to get the same result.
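
For reference, a vcpkg triplet file is a CMake script, and per-port options can be set by checking the `PORT` variable. The Python below is a hypothetical sketch of a generator that produces such a triplet with `ONNX_DISABLE_STATIC_REGISTRATION=ON` for the `onnx` port only. It is not the repository's gen.py, and the architecture and linkage values are placeholders.

```python
def render_triplet(arch, crt_linkage, library_linkage, port_options):
    """Render the text of a vcpkg triplet file (a CMake script).

    port_options maps a port name to extra -D flags for that port's configure step.
    """
    lines = [
        f"set(VCPKG_TARGET_ARCHITECTURE {arch})",
        f"set(VCPKG_CRT_LINKAGE {crt_linkage})",
        f"set(VCPKG_LIBRARY_LINKAGE {library_linkage})",
    ]
    for port, flags in sorted(port_options.items()):
        # Exact match so that e.g. an "onnxruntime" port is not affected.
        lines.append(f'if(PORT STREQUAL "{port}")')
        for flag in flags:
            lines.append(f'    list(APPEND VCPKG_CMAKE_CONFIGURE_OPTIONS "{flag}")')
        lines.append("endif()")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_triplet("x64", "dynamic", "static",
                         {"onnx": ["-DONNX_DISABLE_STATIC_REGISTRATION=ON"]}))
```

A directory of such files is what `VCPKG_OVERLAY_TRIPLETS` would point at.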

### Motivation and Context

* Help #23158
  * "ONNX is not consumed from vcpkg"
  * "Mismatch protobuf version. When vcpkg is enabled, we should not
    fetch protoc from Github which may cause version mismatches."
* https://github.com/microsoft/vcpkg/pull/43126
* #21348
2025-01-08 12:25:17 -08:00
Jian Chen
da35cceac9
Add a temporary path to RN 0.69.3 to update the boost url (#23281)
### Description
Add a temporary patch to RN 0.69.3 to update the Boost URL.

### Motivation and Context
Fixes the React Native CI until we update RN to 0.70.15 or 0.73.3+.
2025-01-08 09:28:35 -08:00
Vincent Wang
34d70f5fae
[QNN] MatMul Op Builder to Handle All Cases of ONNX's MatMul (#22639)
ONNX's MatMul behaves like numpy.matmul, which supports input tensors of
rank >= 1, but QNN's MatMul only supports input tensors of rank >= 2. This
PR adds a MatMulOpBuilder to the QNN EP that builds a QNN graph supporting
all cases of ONNX's MatMul by inserting Reshape nodes where necessary, e.g.,
reshaping a 1D input to 2D and reshaping the output back to the expected
shape at the end.
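The Reshape bookkeeping can be sketched as a shape-planning function. The Python below is hypothetical (not the QNN EP's MatMulOpBuilder): it promotes 1D inputs to 2D the way numpy.matmul does, broadcasts the batch dimensions, and computes the output shape that the final Reshape must restore.

```python
def plan_matmul_reshapes(a_shape, b_shape):
    """Return (a_nd, b_nd, out_shape) for a numpy-style MatMul computed with
    a MatMul that only accepts inputs of rank >= 2."""
    a, b = list(a_shape), list(b_shape)
    a_was_1d, b_was_1d = len(a) == 1, len(b) == 1
    if a_was_1d:
        a = [1] + a  # Reshape (K,) -> (1, K)
    if b_was_1d:
        b = b + [1]  # Reshape (K,) -> (K, 1)
    if a[-1] != b[-2]:
        raise ValueError(f"inner dimensions differ: {a[-1]} vs {b[-2]}")
    # Broadcast the leading (batch) dimensions, numpy style.
    batch_a, batch_b = a[:-2], b[:-2]
    rank = max(len(batch_a), len(batch_b))
    batch_a = [1] * (rank - len(batch_a)) + batch_a
    batch_b = [1] * (rank - len(batch_b)) + batch_b
    batch = []
    for da, db in zip(batch_a, batch_b):
        if da != db and 1 not in (da, db):
            raise ValueError(f"batch dimensions {da} and {db} do not broadcast")
        batch.append(db if da == 1 else da)
    # Final Reshape: drop the dimensions that were added for 1D inputs.
    out_shape = batch + ([] if a_was_1d else [a[-2]]) + ([] if b_was_1d else [b[-1]])
    return a, b, out_shape


print(plan_matmul_reshapes((4,), (4, 3)))  # 1D x 2D
print(plan_matmul_reshapes((2, 5, 4), (4,)))  # 3D x 1D
```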
 
This PR also tries to use the FullyConnected op for MatMul when the second
input is a 2D initializer or a 1D tensor, because FullyConnected is faster
than MatMul on the QNN EP. A 2D second input must be an initializer because
FullyConnected requires it in [n, k] shape; an initializer can be transposed
at graph-build time, which avoids adding an extra Transpose node.
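Why a 2D initializer works with FullyConnected: with the weight stored as `[n, k]`, each output is a dot product of an input row with a weight row, so transposing a `[k, n]` MatMul weight once at graph-build time gives the same result as MatMul. The Python below is a small list-based sketch with hypothetical helper names; it assumes nothing about QNN's FullyConnected beyond the `[n, k]` layout stated above.

```python
def matmul_2d(x, w):
    """x: [m, k], w: [k, n] -> [m, n]"""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w))) for j in range(len(w[0]))]
            for i in range(len(x))]


def transpose_2d(m):
    """[r, c] -> [c, r]; done once at graph-build time for an initializer."""
    return [list(row) for row in zip(*m)]


def fully_connected(x, weight_nk):
    """x: [m, k], weight_nk: [n, k] -> [m, n]; dot product with each weight row."""
    return [[sum(xv * wv for xv, wv in zip(row, w_row)) for w_row in weight_nk]
            for row in x]


w_kn = [[1, 2, 3], [4, 5, 6]]  # MatMul weight, k=2, n=3
x = [[1, 0], [2, 1]]
print(fully_connected(x, transpose_2d(w_kn)) == matmul_2d(x, w_kn))
```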

Taking the swin_base model as an example, which contains several MatMul
nodes whose second input is a 2D initializer (not followed by Add): on a
Gen3 mobile device, it took 34.8876 ms before this change and 27.0639 ms
after it.
2025-01-08 10:15:55 +08:00
Vincent Wang
ff0ab0a8a5
Quantize Weight for Gemm/Conv on Quantized Model (#22969)
Some quantized models have QDQ around Conv/Gemm, but the weight and/or
bias are not quantized. This PR adds a WeightBiasQuantization optimizer
that quantizes float weight and/or bias to INT8 and INT32 tensors,
respectively. This is done only for weight and/or bias initializers, so
that ConstantFolding folds the sub-graph into real quantized initializers
in the next round of graph optimization.
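A sketch of the arithmetic, assuming the common QDQ convention of per-tensor symmetric INT8 weights (zero point 0) and an INT32 bias with zero point 0 and scale `input_scale * weight_scale`. The text above does not spell out the scheme WeightBiasQuantization uses, so the function names and these choices are assumptions.

```python
def quantize_weight_int8_symmetric(weights):
    """Per-tensor symmetric INT8: zero point 0, scale chosen so max |w| maps to 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def quantize_bias_int32(bias, input_scale, weight_scale):
    """INT32 bias with zero point 0 and scale = input_scale * weight_scale."""
    scale = input_scale * weight_scale
    lo, hi = -(2 ** 31), 2 ** 31 - 1
    q = [max(lo, min(hi, round(b / scale))) for b in bias]
    return q, scale


q_w, s_w = quantize_weight_int8_symmetric([0.5, -1.27, 0.0])
q_b, s_b = quantize_bias_int32([0.3, -0.05], input_scale=0.1, weight_scale=s_w)
print(q_w, q_b)
```

Tying the bias scale to the product of the input and weight scales lets the INT32 bias be added directly to the INT32 accumulator of the quantized Conv/Gemm.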
2025-01-08 10:00:24 +08:00