### Description
Increase fp16 qnbitgemm UT tol and use fixed seeds.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
#22380 removes the file
`tools/ci_build/github/linux/docker/inference/x86_64/python/cpu/scripts/requirements.txt`
but it is still used in `dockerfiles/Dockerfile.cuda`.
This change updates the file path of the requirements.txt
fixes#22945.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Upgrade version of Dawn.
Removed dawn.patch, because all patches are included in upstream.
Updated code that affected by API changes (`const char*` ->
`WGPUStringView`)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Use TensorRT and CUDA version fetched at **runtime** to get the hash
value which determines the cache name.
The old way to get the version is at compile/build time that might have
some issues in some cases,
ex:
TRT EP uses the TRT version which we or users built against at compile
time.
However, users can change different TRT version at run time, that can
cause issue because TRT EP always checks the "fixed" TRT version, not
the TRT version it uses now. This can cause TRT EP to use incompatible
TRT engine cache.
see the github issue here:
https://github.com/microsoft/onnxruntime/issues/22382#issuecomment-2404140754
### Description
Allows to build ONNX Runtime with a custom local path of Dawn's source
code.
Usage:
```sh
build --use_webgpu --cmake_extra_defines "onnxruntime_CUSTOM_DAWN_SRC_PATH=C:/src/dawn"
```
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from
7.0.3 to 7.0.6.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/moxystudio/node-cross-spawn/blob/master/CHANGELOG.md">cross-spawn's
changelog</a>.</em></p>
<blockquote>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.5...v7.0.6">7.0.6</a>
(2024-11-18)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>update cross-spawn version to 7.0.5 in package-lock.json (<a
href="f700743918">f700743</a>)</li>
</ul>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.4...v7.0.5">7.0.5</a>
(2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>fix escaping bug introduced by backtracking (<a
href="640d391fde">640d391</a>)</li>
</ul>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.4">7.0.4</a>
(2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)
(<a
href="5ff3a07d9a">5ff3a07</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="77cd97f3ca"><code>77cd97f</code></a>
chore(release): 7.0.6</li>
<li><a
href="6717de49ff"><code>6717de4</code></a>
chore: upgrade standard-version</li>
<li><a
href="f700743918"><code>f700743</code></a>
fix: update cross-spawn version to 7.0.5 in package-lock.json</li>
<li><a
href="9a7e3b2165"><code>9a7e3b2</code></a>
chore: fix build status badge</li>
<li><a
href="085268352d"><code>0852683</code></a>
chore(release): 7.0.5</li>
<li><a
href="640d391fde"><code>640d391</code></a>
fix: fix escaping bug introduced by backtracking</li>
<li><a
href="bff0c87c8b"><code>bff0c87</code></a>
chore: remove codecov</li>
<li><a
href="a7c6abc6fe"><code>a7c6abc</code></a>
chore: replace travis with github workflows</li>
<li><a
href="9b9246e096"><code>9b9246e</code></a>
chore(release): 7.0.4</li>
<li><a
href="5ff3a07d9a"><code>5ff3a07</code></a>
fix: disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.6">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
This PR updates installation script to fix it for CUDA v12. However, it
may be difficult for CUDA v11 since the steps are quite complicated to
automate. Added a few lines of instructions instead.
fixes#22877
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
-Add split/pad/neg/not/ceil/round/min/max op support
-Fix conv2d op default pads value issue
-Add VSINPU EP to support python bindings
### Motivation and Context
-New OPs support for VSINPU EP
---------
Signed-off-by: Kee <xuke537@hotmail.com>
### Description
<!-- Describe your changes. -->
Support GroupQueryAttention operator for native webgpu ep.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This is required for inferencing some LLMs.
### Description
This pull request introduces several enhancements and new
functionalities to the `tools/python/util/android/android.py` file,
focusing on improving the management of Android emulators. The most
important changes include adding a timeout parameter to the
`start_emulator` function, adding checks to prevent multiple emulators
from running simultaneously, and introducing new utility functions to
manage emulator processes more effectively.
Enhancements to `start_emulator` function:
* Added a `timeout_minutes` parameter to the `start_emulator` function
to make the startup timeout configurable.
[[1]](diffhunk://#diff-c54db556a9c445989f830c09ab90ce2704e648deaccce9c9e0ee4875ddaa864dL108-R117)
[[2]](diffhunk://#diff-c54db556a9c445989f830c09ab90ce2704e648deaccce9c9e0ee4875ddaa864dL158-R170)
* Added a check to prevent starting a new emulator if one with the same
AVD name is already running.
* Included additional emulator arguments `-verbose` for better control
and debugging.
* Added a final verification step to ensure the emulator has started
successfully.
New utility functions for managing emulator processes:
* Introduced `check_emulator_running_using_avd_name `,
`check_emulator_running_using_process`, and
`check_emulator_running_using_pid` to check if an emulator is running
based on AVD name, process instance, or PID, respectively.
* Added `stop_emulator_by_proc` and `stop_emulator_by_pid` functions to
stop the emulator process using a `subprocess.Popen` instance or PID,
with a configurable timeout.
* Updated the `stop_emulator` function to use the new utility functions
for stopping the emulator process.
These changes enhance the robustness and flexibility of the emulator
management utilities, making it easier to handle different scenarios in
CI environments and development workflows.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Some quantized models don't have Conv/Gemm node's bias quantized but
still leave them in float. This PR is to create a sub-graph to quantize
the bias for Conv/Gemm nodes with scale = scale_input_0 * scale_input_1
and zp = 0. We only do this for bias initializer so that ConstantFolding
will fold the sub-graph to a real quantized int32 bias initializer
during the graph optimization next round.
Add ReduceL2 support to QNN EP. Some of the QNN AI Hub models contain
Reduce L2, such as openai_clip_CLIPTextEncoder and
openai_clip_CLIPIamgeEncoder, without this PR, the ReduceL2 will be
assigned to CPU and the graph will be split to 2 QNN graphs, which this
PR, all nodes will be in QNN EP.
### Description
Fix mamtulnbits accuracy level
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
- Erf
- Round
- Max
- ReduceMax
- ReduceMean
- ReduceSum
- Unsqueeze
- Squeeze
- Softmax
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Fixes regression in post merge pipeline caused by #22612
### Motivation and Context
So far, there isn't the artifactFeeds in Public Project
### Description
AppendExecutionProvider("CoreML", {{"MLComputeUnits","MLProgram"}})
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
<!-- Describe your changes. -->
Update this patch because the origin file has changed
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Fix sequential_executor.cc to avoid segfault when profiling is used on
model with empty Optional
### Motivation and Context
Fixes#22890
We need to be able to control/override the exact version of qnn sdk used
for the android build as qnn-runtime (maven package) releases are slower
to QNN SDK releases.
### Description
Add a new stage to build cuda and dml in Windows GPU CI pipeline (PR
checks) to prevent regressions introduced by new cuda tests.
Update all tests in cuda/testcases name prefix to CudaEp for skipping
them easily
### Motivation and Context
1. CudaNhwcEP is added by default when using cuda ep
2. if onnxruntime_ENABLE_CUDA_EP_INTERNAL_TES is enable, the tests in
tests/provider/cuda/testcases is added too.
### To do
add enable_pybind in the new stage.
Now, --enable_pybind will trigger some python test, like
onnxruntime_test_python.py.
It uses the API of get_avaible_providers() .
More discussions are needed to decide how to make it works
### Description
* Install PyTorch for transformers tests. The installation is before
python tests so that it can use torch if needed.
* Update protobuf and numpy versions used in transformers test.
### Motivation and Context
Currently, transformers tests are enabled in the following CI pipelines:
* Linux CPU CI Pipeline (torch for cpu-only)
* Linux GPU CI Pipeline (torch for cuda 12)
* Windows GPU CUDA CI Pipeline (torch for cpu-only right now, note that
we might change it to torch for cuda 12 in the future).
For ROCm CI Pipeline, transformer tests are enabled but skipped since
onnx package is not installed in CI.
Previously, torch was not installed before python tests, so some tests
depending on torch were skipped like
[test_bind_onnx_types_not_supported_by_numpy](f6e1d44829/onnxruntime/test/python/onnxruntime_test_python_iobinding.py (L199))
or [test
user_compute_stream](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python.py#L465-L476).
In this PR, we changed build.py to install torch before running python
tests.
### Description
<!-- Describe your changes. -->
Update comment for `-I` to mention that symbolic dim values can be
provided with `-f`.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Option is named onnxruntime_FORCE_GENERIC_ALGORITHMS
Follow up to https://github.com/microsoft/onnxruntime/pull/22125.
### Description
This change adds compile-time option to disable optimized algorithms and
use generic algorithms (exclude AVX* and SSE etc in GEMM) on x86. This
new option is intended only for testing these algorithms, not for
production use.
Following build command on linux x86_64 builds onnxruntime with new
option enabled:
`./build.sh --parallel --cmake_extra_defines
onnxruntime_FORCE_GENERIC_ALGORITHMS=1`
### Motivation and Context
This change allows testing generic algorithms. This may be needed for
platforms which don't have optimized implementations available, like in
https://github.com/microsoft/onnxruntime/pull/22125.
### Description
* Reduce GQA test combinations to save about 35 minutes test time in CI
pipelines.
* Show latency of transformers tests
* Use seed in DMMHA test to avoid random failure.
* For test_flash_attn_rocm.py, test skipping condition from "has cuda
ep" to "not has rocm ep", so that it does not run in cpu build.
* For test_flash_attn_cuda.py, move flash attention and memory efficient
attention tests to different classes, so that we can skip a test suite
instead of checking in each test.
### Motivation and Context
It takes too long to run GQA tests in CI pipelines since there are too
many combinations.
###### Linux GPU CI Pipeline
Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)
After: 150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)
Time Saved: **1424** seconds (0:23:44)
###### Windows GPU CUDA CI Pipeline
Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)
After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35)
Time Saved: **330** seconds (0:05:30)
###### Linux CPU CI Pipeline
Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)
- 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 26.45s
transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch
After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)
- 0.97s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 2.41s
transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch
Time Saved: **374** seconds (0:06:14).
### Description
Match new SDPA pattern for huggingface BERT model that exported from
latest transformers package.
Some changes of transformers tests in CI pipeline:
(1) Enable tests for bert, distilbert and roberta models in CI.
(2) Remove out-of-date tests for huggingface models that were marked as
slow and not enabled in CI pipeline.
(3) Upgrade transformers package version to the latest.
### Motivation and Context
Recent huggingface transformers use torch SDPA in bert modeling. The
graph pattern change causes attention fusion not working anymore. Update
the fusion script to match the new pattern.
### Description
<!-- Describe your changes. -->
when updating from cp38 to cp310, there has some issues for bigmodel
pipeine. there are two jobs failed: stable_diffusion and whisper.
1. for stable_diffusion, we are now using
"nvcr.io/nvidia/pytorch:22.11-py3" from nvidia repo. it is for cuda11
and python3.8. and they are not providing python3.10 version for cuda
11. the latest version of this docker image is for cuda12 and
python3.10. To solve this problem, i use a docker image of ubuntu22.04,
and then install all need python package for this job.
2. for whisper. the original docker image is ubuntu20.04 which doesn't
have python3.10, and has to update to ubuntu22.04.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…ime/java (#22771)"
This reverts commit 632a36a233.
### Description
<!-- Describe your changes. -->
### Motivation and Context
Run E2E tests using Browserstack failed due to this PR.
This change fixes multiple tests like QDQTransformerTests.MatMul_U8S8S8,
for all architectures where architecture-specific
optimized function is not available yet, like s390x.
### Description
Matrix B is packed by 16 elements, thus new row starts 16 items later.
Also, for next C increment index only by 1 for each increment of C.
### Motivation and Context
This change fixes mlas sgemm fallback implementation for all
architectures which don't have architecture-specific implementations
available, like s390x.
### Description
<!-- Describe your changes. -->
Extend timeout for always failed job.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Chromium will rename split's output name from "output" to "outputs" in
`OpSupportLimits` to align with spec, the EP should check which name is
available to make it compatible.