Commit graph

11862 commits

Author SHA1 Message Date
Changming Sun
a910cedf73
Move Linux Github actions to a dedicated pool (#22566)
### Description
Move Linux GitHub actions to a dedicated pool. Currently the
"orttraining-linux-ci-pipeline" is too slow.

### Motivation and Context
To speed up pipeline runs.
2024-10-24 07:34:05 -07:00
Kyle
d9ca84ef96
Add DoEsrp Check for Signature Verification (#22570)
### Description
Add DoEsrp Check for Signature Verification
2024-10-24 16:55:36 +08:00
Prathik Rao
742594c8f0
Clears GPU Cache when there are no more active sessions (#22490)
Fixes https://github.com/microsoft/onnxruntime/issues/21574
2024-10-23 22:22:57 -07:00
dependabot[bot]
08cc2612f4
Bump onnx from 1.16.0 to 1.17.0 in /onnxruntime/python/tools/transformers/models/stable_diffusion/requirements (#22561)
Bumps [onnx](https://github.com/onnx/onnx) from 1.16.0 to 1.17.0.
Release notes, sourced from [onnx's releases](https://github.com/onnx/onnx/releases):

**v1.17.0**

ONNX v1.17.0 is now available with exciting new features! We would like to thank everyone who contributed to this release! Please visit [onnx.ai](https://onnx.ai/) to learn more about ONNX and associated projects.

Key updates:

- ai.onnx opset 22 updates the following operators to support bfloat16 (see the operator docs at https://onnx.ai/onnx/operators/ for each): Acos, Acosh, Asin, Asinh, Atan, Atanh, AveragePool, Bernoulli, Conv, ConvTranspose, Cos, Cosh, DeformConv, Det, Dropout, Elu, EyeLike, GRU, GlobalAveragePool, GlobalLpPool, GlobalMaxPool, GridSample, HardSigmoid, HardSwish, InstanceNormalization, LSTM, LpNormalization, LpPool, MaxPool, MaxRoiPool, MaxUnpool, Mish, Multinomial, NegativeLogLikelihoodLoss, RNN, RandomNormal, RandomNormalLike, RandomUniform, RandomUniformLike, RoiAlign, Round, Selu, Sin, Sinh, Softplus, Softsign, Tan, ThresholdedRelu.

Python changes:

- Support for numpy >= 2.0

Bug fixes and infrastructure improvements (onnx PR numbers):

- Fix Check URLs errors (#5972)
- Use CMAKE_PREFIX_PATH in finding libprotobuf (#5975)
- Bump main VERSION_NUMBER to 1.17.0 (#5968)
- Fix source and pip tar.gz builds on s390x systems (#5984)
- Fix unique_name (#5992)
- Fix SegFault bug in shape inference (#5990)
- Fix onnx.compose when connecting subgraphs (#5991)
- Fix conversion from Split-11 to Split-18 (#6020)
- Update error messages for the NegativeLogLikelihoodLoss inference function (#6021)
- Generalize input/output number check in shape inference (#6005)
- Replace rank inference with shape inference for the Einsum op (#6010)
- Update the build-from-source instructions for the latest cmake change (#6038)
- Handle OneHot's depth value during shape inference (#5963)
- Do not install cmake via pyproject.toml on Windows (#6045)
- Fix a skipped shape-inference code path (#6049)
- Include the ".onnxtext" extension in supported serialization formats (#6051)
- Allow ReferenceEvaluator to return intermediate results (#6066)
- Fix a typo in numpy_helper.py (#6041)
- Remove benchmarking code (#6076)
- Prevent crash on import after GCC 8 builds (#6048)
- Check that graph outputs are defined (#6083)
- Enable additional ruff rules (#6032)
- Add a missing shape-inference check for DequantizeLinear (#6080)
- Add bfloat16 to all relevant ops (#6099)
- CI: install python dependencies with --only-binary :all: in manylinux (#6120)
- Install google-re2 with the --only-binary option (#6129)
- Specify the axis parameter for DequantizeLinear when the input rank is 1 (#6095)
- Pin onnxruntime to 1.17.3 for release CIs (#6143)
- Fix INT4 TensorProto byte size being 5x larger than expected with negative values (#6161)
- Mitigate tarball directory-traversal risks (#6164)
- Fix the reference implementation of ScatterND with 4D tensors (#6174)
- Add group > 1 support in tests and backend for ConvTranspose (#6175)
- Support bfloat16 for binary and unary operators in the reference implementation (#6166)
- Refactor the Windows workflow to work on standard Windows (#6190)
- Fix a few crashes while running shape inference (#6195)
- Update onnx to work with numpy >= 2.0 (#6196)
- Use sets to improve performance of DFS search (#6213)

... (truncated)
Commits (shortened SHAs; onnx PR numbers):

- b8baa84 Set version 1.17.0 for official release (#6405)
- 6d77b80 [Cherry-Pick] Fix main url checks (#6312) (#6327)
- 174938d [Cherry-Pick] Fix protobuf pkg 5.28.0 failing on Windows (#6342) (#6347)
- f18d593 [Cherry-Pick] Remove unused variables (#6303) (#6324)
- c588905 Set version in rel-1.17.0 to 1.17.0rc1 (#6317)
- 4392c2c Prepare for rel-1.17.0 (#6281)
- cb54169 Update ort filter to 1.20.0 to skip tests known to fail with ort 1.19.0 (#6306)
- 99e1fd3 Bump reviewdog/action-misspell from 1.21.0 to 1.23.0 (#6268)
- 1920565 Bump ossf/scorecard-action from 2.3.3 to 2.4.0 (#6273)
- 2e8f228 Bump mypy from 1.10.1 to 1.11.1 (#6275)
- Additional commits viewable in the [compare view](https://github.com/onnx/onnx/compare/v1.16.0...v1.17.0)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-23 15:04:46 -07:00
Changming Sun
a25c9315ea
Move ORT Training pipeline to github actions (#22543)
Move the ORT Training pipeline to GitHub Actions and enable CodeQL scanning for the code (including inference code).
We will move all pull request pipelines to GitHub Actions.
2024-10-23 11:57:15 -07:00
Satya Kumar Jandhyala
fd8ee4894d
[JS/WebGPU] GroupQueryAttention rewrite (#20946)
### Description
Implement JSEP GroupQueryAttention



### Motivation and Context
Required to enable certain LLM models to run using WebGPU.
2024-10-23 10:14:09 -07:00
Wanming Lin
33e2f6ad8d
[WebNN EP] Support external data (#22263)
### Description
This PR introduces support for registering external data inside WebNN
EP.

### Motivation and Context

- The WebNN EP needs to register the initializers at the graph compilation stage. For initializers from external data, it can't leverage the general external data loader framework, because WebNN EP graph compilation executes before the external data loader is called.
- Exposes `utils::GetExternalDataInfo`, which is useful for the WebNN EP to read an external tensor's information.
- Defines a new `registerMLConstant` in JSEP to create WebNN constants from external data in the WebNN backend, with the tensor's info as parameters, as well as `Module.MountedFiles`, which holds all preloaded external files.
2024-10-23 08:18:16 -07:00
Jian Chen
ffaddead0a
Refactor cuda packaging pipeline (#22542)
2024-10-23 08:14:10 -07:00
ivberg
0028d3f332
Fix crash in QNN EP - ResetQnnLogLevel (#22456)
### Description
Fix a crash by adding extra checks to ResetQnnLogLevel.

From the dump, it looks like we attempt to reset the QNN log level during ETW callbacks while the provider is stopping; at that point the QnnBackendManager (`this`) is still alive, but `logger_` is no longer valid.
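Below is a minimal, self-contained sketch of the guard pattern this fix implies. The type and member names (`Logger`, `logger_`, the mutex) are illustrative stand-ins, not the actual QNN EP code.

```cpp
#include <mutex>

// Illustrative stand-in for the real logger type.
struct Logger { int severity = 2; };

class QnnBackendManager {
 public:
  void AttachLogger(Logger* l) { std::lock_guard<std::mutex> g(m_); logger_ = l; }
  void DetachLogger()          { std::lock_guard<std::mutex> g(m_); logger_ = nullptr; }

  // Called from an ETW provider callback; must tolerate the logger having
  // been torn down while the backend manager itself is still alive.
  void ResetQnnLogLevel() {
    std::lock_guard<std::mutex> g(m_);
    if (logger_ == nullptr) return;  // the extra check that prevents the crash
    current_level_ = logger_->severity;
  }

 private:
  std::mutex m_;
  Logger* logger_ = nullptr;
  int current_level_ = 0;
};
```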

### Motivation and Context
ORT should not crash
2024-10-22 20:45:44 -07:00
Wanming Lin
ba40022ec4
[WebNN EP] Support axes and fix some validation for Resize (#21952)
- Supports arbitrary axes for Resize opset 18+
- Check all inputs and attributes more carefully

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-22 20:26:34 -07:00
Wanming Lin
034ab4fa04
[WebNN EP] Fixed a minor bug in ConvTranspose (#22384)
For ConvTranspose, the filter should be transposed from iohw -> ohwi if the preferred layout is NHWC.
2024-10-22 20:03:09 -07:00
Wanming Lin
e6e94e6252
[WebNN EP] Use boolean flags instead of MLTensorUsage (#22497)
Fixed #22495

We will keep MLTensorUsage until it is removed from Chromium.

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-22 17:20:36 -07:00
Tianlei Wu
63a07c1838
update pipeline name list in run_CIs_for_external_pr.py (#22540)
### Description
Update list of CI pipelines to trigger for external PRs.

### Motivation and Context
The pipelines triggered for external PRs are not consistent with
internal PRs.
2024-10-22 17:14:48 -07:00
Hector Li
fc2be09386
Enable QLinearMatMul for opset21 (#22488)
### Description
Enable QLinearMatMul for opset21
2024-10-22 14:33:36 -07:00
Sophie Schoenmeyer
62f99d8a8d
Change API docs schedule from monthly to every 2 weeks (#22524)
### Description
<!-- Describe your changes. -->
Current API docs workflows are scheduled to run monthly, but artifacts
expire after 30 days, which could create issues for 31-day months.
Updating to regenerate artifacts every 2 weeks.


2024-10-22 09:21:27 -07:00
Tianlei Wu
8a04ab421d
[CUDA] upgrade opencv in stable diffusion demo (#22470)
### Description
(1) Upgrade opencv
(2) Add some comments about onnxruntime-gpu installation

### Motivation and Context
opencv-python was locked to an older version, which has security
vulnerabilities: see https://github.com/microsoft/onnxruntime/pull/22445
for more info
2024-10-21 23:20:49 -07:00
Ted Themistokleous
c1f7485193
[MIGraphX EP] Add GroupNormalization and LayerNormalization Support (#22503)
Needed to ensure we use the GroupNormalization and LayerNormalization operators in MIGraphX.
2024-10-21 20:59:07 -07:00
genmingz@AMD
34a61e2df4
[VitisAI] Vitis ai ep support dynamic options (#22386)
### Description
Related to #22282. Let the Vitis AI EP handle dynamic_options.



### Motivation and Context

---------

Co-authored-by: genmingz <genmingz@amd.com>
2024-10-21 17:05:12 -07:00
Changming Sun
88676e62b9
Remove nsync (#20413)
### Description
1. Remove the onnxruntime::OrtMutex class and replace it with
~absl::Mutex~ std::mutex.
2. After this change, most source files will not include <Windows.h>
indirectly.


### Motivation and Context
To reduce the number of dependencies we have, and to address some GitHub issues related to building ONNX Runtime from source.
In PR #3000 I added a custom implementation of std::mutex. That was mainly because, at the time, std::mutex's default constructor was not trivial on Windows: if you had such a mutex as a global variable, it could not be initialized at compile time. The VC++ team has since fixed this issue, so we no longer need the custom implementation.

This PR also removes nsync. I ran several model tests on Linux and didn't see any perf difference.
This PR also reverts PR #21005, which is no longer needed since conda has updated its MSVC runtime DLL.

This PR unblocks #22173 and resolves #22092. We have a lot of open issues with nsync; this PR can resolve all of them.
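To illustrate the point about global mutexes and compile-time initialization (a minimal sketch, not code from this PR):

```cpp
#include <mutex>

// On a conformant standard library, std::mutex has a constexpr default
// constructor, so a namespace-scope mutex like this one is constant-
// initialized at compile time and is immune to dynamic-initialization-order
// problems. Older MSVC STLs lacked this, which is why ORT once carried a
// custom OrtMutex; current VC++ no longer has that limitation.
std::mutex g_state_mutex;
int g_counter = 0;

void Increment() {
  std::lock_guard<std::mutex> lock(g_state_mutex);
  ++g_counter;
}
```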
2024-10-21 15:32:14 -07:00
Changming Sun
c7138a2630
Update CMake (#22516)
This pull request upgrades the CMake version from v3.31.0-rc1 to v3.31.0-rc2 to include a CUDA bug fix from Nvidia:
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9902

AB#51692
2024-10-21 07:51:05 -07:00
kailums
3174e3da57
update pipeline python version from 3.8 to 3.12 (#22517)
### Description
As Python 3.8 is reaching end of life:

https://discuss.python.org/t/python-3-13-0-final-has-been-released/
https://discuss.python.org/t/python-3-8-is-now-officially-eol/66983

we update the CI pipelines that still use Python 3.8 to Python 3.12.
2024-10-21 07:50:31 -07:00
Vincent Wang
60da4a2ccd
Fix SegFault (#22499)
Fix SegFault reported by
https://github.com/microsoft/onnxruntime/issues/22493.
2024-10-21 14:36:28 +08:00
Adrian Lizarraga
abad69b322
[StableDiffusion] Pin huggingface_hub to 0.25.2 due to breaking changes in 0.26.0 (#22508)
### Description
Pin huggingface_hub to 0.25.2 due to breaking changes in 0.26.0.



### Motivation and Context
We depend on `diffusers==0.28.0`, which [depends
on](https://github.com/huggingface/diffusers/blob/v0.28.0-release/setup.py#L104)
`huggingface_hub>=0.20.2`. There are breaking changes with the latest
huggingface_hub 0.26.0 release that break our Big Models pipeline:
[Release v0.26.0: Multi-tokens support, conversational VLMs and quality
of life improvements ·
huggingface/huggingface_hub](https://github.com/huggingface/huggingface_hub/releases/tag/v0.26.0)

Specifically, the breaking changes to `cached_download()` cause our
pipeline to fail.

![image](https://github.com/user-attachments/assets/c1d15c7e-9a5d-4ef3-8d1b-35bde0a2ca82)
2024-10-18 20:31:20 -07:00
Jeff Daily
5aabc53121
[ROCm] redo hipify of version controlled files (#22449)
### Description
Updates the ROCm EP opsets to match the current CUDA EP opsets. Also
enable the test CApiTest.basic_cuda_graph_with_annotation.

Note that some changes are whitespace-only. These changes were made to
improve the comparison of corresponding ROCm and CUDA EP source files
when using a side by side diff tool.

### Motivation and Context
The ROCm EP derives from the CUDA EP. Many source files are shared
between the EPs and "hipified" during the ROCm EP build, however quite a
few files within the ROCm EP are under source control after their
initial hipification. Over time these ROCm EP files get stale relative
to their CUDA EP counterparts. It becomes necessary to re-hipify these
otherwise static files in order to pick up important changes such as
opset differences.
2024-10-18 12:40:54 -07:00
Hector Li
d2a5ee2e5e
Update the python wrapper script to support weight sharing case (#22341)
Update the python wrapper script to support the weight sharing case.
### Description
Update the script to support a JSON file from the QNN converter, or one extracted from a QNN context binary file, for the weight sharing scenario.
2024-10-18 11:16:20 -07:00
Sophie Schoenmeyer
a732f7a4b3
Update README.md with release roadmap info (#22486)
The ONNX Runtime Release Roadmap on our website is not very easy to find
right now, so I'm adding a link here to make it more accessible.


---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2024-10-18 11:00:43 -07:00
Edward Chen
7964d3aef6
Specify iOS simulator runtime version (#22474)
- Allow specification of iOS simulator runtime version to use.
- Pick simulator runtime version (iphonesimulator 16.4) that is supported by the Xcode version (14.3.1) that we use.
- Disable CoreML EP's DepthToSpace op support for CoreML version less than 7, with DCR mode, and FP16 input. It doesn't produce the correct output in this case.
- Some cleanup of iOS test infrastructure.
2024-10-18 09:26:06 -07:00
Enrico Galli
1e5bda88f0
[WebNN EP] Cache MLTensors between runs (#22278)
### Description
This change enables caching `MLTensor`s between inference runs. This is done by keeping a reference to `MLTensor`s alive after they have been released. `MLTensor`s are only destroyed once the session goes out of scope.

### Motivation and Context
Creating and destroying `MLTensor`s on every run has a non-trivial performance penalty. This penalty materializes when using `ort.Tensors`[location=cpu] for inputs/outputs or when using the CPU EP as a fallback EP for unsupported operators. The former can be mitigated by developers using `ort.Tensors`[location=ml-tensor]; the latter cannot be mitigated by developers.
2024-10-18 08:07:00 -07:00
Yulong Wang
b4cb937440
fix LayerNorm f16 CPU implementation (#22479)
### Description

The recent PR #22223 introduced 2 bugs in the implementation of CPU LayerNorm f16:
- possible access to nullptr for bias: `const TensorShape& bias_shape = bias->Shape();` will crash when `bias` does not exist (amazingly, this one seems not to be covered by any test case).
   - fix: guard with a pointer check
- a race condition inside ComputeJob: `ComputeJob()` is dispatched to the thread pool and internally tries to modify `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_`, which are `std::unique_ptr`s and are not thread-safe.
   - fix: move the modification of `LayerNormImpl::scale_fp32_` and `LayerNormImpl::bias_fp32_` out of `ComputeJob()` and into `LayerNormImpl::ComputeWithoutContext()`. A race may still be possible because `ConcurrentRunSupported` is set to `true` for the CPU EP, so an OrtMutex was added.

This should fix the recent flaky tests as well. A simplified sketch of the fix pattern follows.
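A simplified sketch of the two fixes (hypothetical names, with a plain `float` buffer standing in for the real fp16-to-fp32 conversion in `LayerNormImpl`):

```cpp
#include <memory>
#include <vector>

struct Tensor { std::vector<float> data; };  // stand-in for the real Tensor

// The one-time buffer preparation (fp16 -> fp32 in the real code) happens
// here, on the calling thread, before any ComputeJob() is dispatched to the
// thread pool, so worker jobs only ever read the prepared buffers.
std::unique_ptr<std::vector<float>> PrepareOnce(const Tensor* t) {
  if (t == nullptr) return nullptr;  // fix 1: bias is optional, guard it
  return std::make_unique<std::vector<float>>(t->data);
}

void ComputeWithoutContext(const Tensor* scale, const Tensor* bias) {
  auto scale_fp32 = PrepareOnce(scale);
  auto bias_fp32 = PrepareOnce(bias);  // no crash when bias is absent
  // ... dispatch ComputeJob() workers that only read scale_fp32 / bias_fp32,
  // never mutate shared members (fix 2) ...
}
```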
2024-10-17 18:49:38 -07:00
Akshay Sonawane
e5c2e50849
bumps up version in main from 1.20 -> 1.21 (#22482)
Bump up version in main from 1.20.0 to 1.21.0 since the release branch
has been cut.
2024-10-17 12:32:35 -07:00
Yulong Wang
55c584954c
fix supports_device() in python interface (#22473)
### Description

`get_device()` returns a string of hyphen-connected device names, such as "GPU-DML". This is a problem when CUDA is disabled but OpenVINO GPU is enabled in the build, because in that case `get_device()` returns "CPU-OPENVINO_GPU", so `supports_device("CUDA")` will return `True` in this build.

Splitting the value of `get_device()` by "-" and checking whether the input is in the resulting list is not an option, because some code in the code base stores the value of `get_device()` and uses that value to call `supports_device()`. That implementation would cause `supports_device("GPU-DML")` to return `False` for a build with `get_device() == "GPU-DML"`, because `"GPU-DML" in ["GPU", "DML"]` is `False`.

This change also helps avoid further problems when "WebGPU" is introduced.
2024-10-17 12:10:25 -07:00
Yulong Wang
1247d69c28
Add onnxtestdata cache for win-web-multi-browsers pipeline (#22477)
### Description

Apply onnxtestdata cache to win-web-multi-browsers pipeline

Same change as was applied to win-web-ci in #16659.
2024-10-17 12:03:29 -07:00
Edward Chen
d649cac9af
Consolidate CPU allocator arena creation checks into a helper function. (#22460) 2024-10-17 09:08:44 -07:00
Wanming Lin
52b77762bd
[WebNN EP] Remove the numThreads option (#22464)
Chromium has removed this option via
https://chromium-review.googlesource.com/c/chromium/src/+/5905656.
2024-10-17 07:45:39 -07:00
Hector Li
ac98bcae37
Update QNN default version to 2.27 in CI pipeline (#22471)
### Description
Update QNN default version to 2.27 in CI pipeline
2024-10-16 22:05:47 -07:00
Adrian Lizarraga
84d48b6ad6
[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP (#22436)
### Description
Adds QNN provider option `offload_graph_io_quantization` to offload
graph input quantization and graph output dequantization to the CPU EP.
Option is disabled by default to maintain current behavior.


### Motivation and Context
Offloading the handling of I/O quantization to the CPU EP significantly
improves inference latency for many models.
2024-10-16 15:00:53 -07:00
Yulong Wang
b7050c8390
remove unused _fence_ events for profiler (#22403)
### Description

The current code that logs the profiler events "_fence_before" and "_fence_after" seems to be useless: the measured durations of the two events are 0.

Removed them.
2024-10-16 13:38:32 -07:00
Yulong Wang
c3a94c6c5f
Fix Memcpy transformer when dealing multiple EPs (#22413)
### Description

Fix Memcpy transformer when dealing multiple EPs.

---------

Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-10-16 13:38:22 -07:00
Patrice Vignola
f610605a48
[DML EP] Support partial rotary embedding (#22417)
### Description
This adds support for partial RotaryEmbedding to DML. Essentially, partial RotaryEmbedding simply consists of doing the rotary embedding calculation on a subregion of the input tensor, as if its head size were `rotary_embedding_dim`, while leaving the second part of the tensor (i.e. `head_size - rotary_embedding_dim`) alone.

To achieve this, all we need to do is follow these steps:

1. Split the tensor into 2 parts
2. Run the rotary embedding algorithm on the first part, just like we
were doing before on the entire tensor
3. Join the 2 parts back together

Since we're leaving the remaining part intact, the RotaryEmbedding fusion will still be done within DML. Also, the concat at the end is essentially free because DML optimizes it out and directly allocates the result of RotaryEmbedding in the right place. The only overhead here is the splitting of the tensor at the beginning, which we should eventually make part of the RotaryEmbedding fusion within DML. A standalone sketch of the scheme follows.
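As a minimal illustration of steps 1-3 above, here is a hypothetical standalone sketch for a single head vector, assuming the non-interleaved rotation convention (this is not the DML implementation):

```cpp
#include <cstddef>
#include <vector>

// Applies rotary embedding to the first rotary_dim elements of one head
// vector and passes the tail through untouched: split, rotate, (implicit) join.
void PartialRotary(std::vector<float>& x,            // length: head_size
                   const std::vector<float>& cos_v,  // length: rotary_dim / 2
                   const std::vector<float>& sin_v,  // length: rotary_dim / 2
                   std::size_t rotary_dim) {
  const std::size_t half = rotary_dim / 2;
  for (std::size_t i = 0; i < half; ++i) {
    const float a = x[i];
    const float b = x[i + half];
    x[i]        = a * cos_v[i] - b * sin_v[i];
    x[i + half] = a * sin_v[i] + b * cos_v[i];
  }
  // Elements x[rotary_dim .. head_size) are intentionally left unchanged,
  // which is what makes the final "join" free.
}
```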



### Motivation and Context
This fix allows us to correctly run models that have a
`partial_rotary_factor` setting in huggingface, including Nvidia's
Nemotron: https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct
2024-10-16 13:28:44 -07:00
Patrice Vignola
a164228c10
[DML EP] Add QDQ fusions for DML and disable QDQ + Resample fusion (#22458)
2024-10-16 12:40:39 -07:00
Changming Sun
f9e623e4d1
Update CMake to 3.31.0rc1 (#22433)
To include a bug fix:
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/9890

Discussion:

https://discourse.cmake.org/t/cmake-incorrectly-links-to-nvrtc-builtins/12723/4

This bug fix should be included in our upcoming release, because right
now our GPU package depends on “libnvrtc-builtins.so.12.2" which has a
hardcoded CUDA version: 12.2. The minor CUDA version should not be
there.
2024-10-16 11:50:13 -07:00
Caroline Zhu
691de83892
Enable BrowserStack tests (#22457)
### Description
BrowserStack account issues have been resolved -- this PR enables E2E
BrowserStack tests in the pipeline again.
2024-10-16 11:10:12 -07:00
PeixuanZuo
bf604428aa
[ROCm] Update ROCm Nuget pipeline to ROCm 6.2 (#22461)
1. Update ROCm Nuget pipeline build version to ROCm 6.2
2. Update the AMD-GPU agent pool base docker image for the ROCm Nuget pipeline test stage. Search the `AMD GPU pipeline Nuget` page in OneNote to see how to update it.

passed pipeline:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=580846&view=results
2024-10-16 10:36:49 -07:00
Yi Zhang
2b8fc5529b
Enable all RunMatMulTest test cases to support FP16 (#22440)
### Description


### Motivation and Context
Increase FP16 test coverage for all related EPs.
2024-10-16 09:57:05 +08:00
Jian Chen
af00a20f8a
Change ORT nightly python packages' name (#22450)
### Description
Our nightly CPU python package's name is "ort-nightly" instead of "onnxruntime". This was for historical reasons; TensorFlow did the same. Now we would prefer to make the names consistent.
This change applies to all nightly python packages, including CPU, GPU (CUDA), and possibly others.


2024-10-15 18:44:59 -07:00
Justin Beavers
a5e85a950c
Fix training artifacts for 2GB+ models and MSELoss (#22414) 2024-10-15 16:47:16 -07:00
Caroline Zhu
6407d81b35
Disable BrowserStack testing stage (#22438)
### Description
We are seeing this [packaging
pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=940&_a=summary)
fail because we are running into BrowserStack account issues. Disabling
this step until the issues are resolved.
2024-10-15 13:27:05 -07:00
Ted Themistokleous
4c47bca8fe
[MIGraphX EP] Add additional operators (#22446)
* Add in missing operators for llama run

* Add simplified layer norm ops

### Description
Adding additional supported operators into the MIGraphX EP that are supported in MIGraphX.

### Motivation and Context
Allows for more models to be run through the MIGraphX EP.
2024-10-15 12:21:22 -07:00
Yi Zhang
c5a0fb182a
Fix big models exception caused by timm upgrade (#22442)
### Description
Today the stable diffusion stage failed due to an upgrade in timm, which controlnet_aux depends on; the latest controlnet_aux limits the timm version to less than 0.6.7, so upgrading controlnet_aux solves it. controlnet_aux also uses opencv-python-headless, so pin opencv-python-headless to 4.8.0.74 too.


2024-10-15 21:13:52 +08:00
wejoncy
20a45dd67b
[CoreML ML Program] support accelerator selector (#22383)
### Description
For now, CoreML only supports running mlmodels on CPU/ALL. However, CPU_GPU can sometimes be a lot faster.

This PR adds the option to select different hardware to boost performance.




---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-10-15 11:50:11 +08:00