### Description
This PR only upgrade the gradle version and
`com.android.tools.build:gradle` version from build.gradle.
This only update the react-native library gradle version, not the e2e
test.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
We have use cases where multiple sessions are created concurrently.
Minimizing the usage of the default logger is important for these
scenarios.
Wire through the session logger to as many places as possible. The EP
logger can also be used once the session is created (can't be used
during EP construction/kernel registration but can be used in
GetCapability and Compile).
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve logging when there are concurrent sessions.
### Description
refactor unsquzee's implementation
add more flags to boost peformance.
add profile flag
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: jicwen <jicwen@YiMacBook-Pro.local>
Co-authored-by: wejoncy <wejoncy@.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
### Description
This PR adds the logic needed to consider only the needed implicit
inputs on BeamSearch op in case of T5 model (encoder/decoder, 2 graphs).
The logic added is similar to what happens in the _If_ kernel setup.
### Motivation and Context
Fixes#23043
### Description
<!-- Describe your changes. -->
- fix some missing end of version markers and since_version info
- fix include to use onnx_protobuf.h which handles minimal builds
- we should always prefer that header over directly using the onnx ones
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Some ops should allow empty tensor as input, e.g. roi, scales inputs in
Resize
### Motivation and Context
It avoid some unexpected fallback for optional input with empty tensor.
e.g. roi and scales are both optional inputs in Resize, in some models
they have non-empty name but with empty initializer presented as `[0]`,
WebNN currently will fallback all nodes with 0 dimension, which is not
expected.

### Description
The default thread count methodology by onnxruntime did not account for
new upcoming Intel microarchitectures leading to a suboptimal thread
count. Optimizing the thread count for new Intel microarchitectures
reveal gains on the majority of models across datatypes and shows gains
up to ~1.5x speedup.
### Motivation and Context
Applications should run on Intel with the most performant thread
configuration for the majority of models. With new microarchitectures,
adjusting the thread count methodology is required to take advantage of
their differences.
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add fp16 kernel to rotary embedding to boost performance.
### Motivation and Context
Part of performance optimization work for group query attention
### Description
Enable QNN HTP spill fill buffer setting to save RAM usage.
This feature is available after QNN 2.28. Need to re-generate QNN
context binary.
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api
Requirements:
1. Need to re-generate the Onnx model with QNN context binary by set the
EP option enable_htp_spill_fill_buffer = 1.
2. Works for a model with multiple Context binaries. Need manually merge
2 Onnx model with context binary into 1 Onnx model.
3. Requires Linux platform if generate the context binary offline since
QnnSystem lib is not available for Windows x86_64 platform.
No need to do extra thing while running the model inference.
The generated EPContext node will have a max_size attribute with the
maximum spill fill buffer size for the context binary
<img width="353" alt="image"
src="https://github.com/user-attachments/assets/a3bf48be-a8da-4381-8a1d-3f2558eea37d">
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
This change uses a TypeScript trick to infer global types in
onnxruntime-common. Thanks to the strong type system of TypeScript, we
are able to refer to types that may not be available in the context.
This helps to keep onnxruntime-common not to include dependencies like
"@webgpu/types", and still being able to use the types in the
declaration. See comments of `TryGetGlobalType` in `type-helper.ts`.
Merge the util functions to create or retrieve:
- A WebNN constant MLOperand filled with the specified value, data type,
and shape.
- A WebNN scalar constant MLOperand with the specified value and data
type.
### Description
Increase fp16 qnbitgemm UT tol and use fixed seeds.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
#22380 removes the file
`tools/ci_build/github/linux/docker/inference/x86_64/python/cpu/scripts/requirements.txt`
but it is still used in `dockerfiles/Dockerfile.cuda`.
This change updates the file path of the requirements.txt
fixes#22945.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Upgrade version of Dawn.
Removed dawn.patch, because all patches are included in upstream.
Updated code that affected by API changes (`const char*` ->
`WGPUStringView`)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Use TensorRT and CUDA version fetched at **runtime** to get the hash
value which determines the cache name.
The old way to get the version is at compile/build time that might have
some issues in some cases,
ex:
TRT EP uses the TRT version which we or users built against at compile
time.
However, users can change different TRT version at run time, that can
cause issue because TRT EP always checks the "fixed" TRT version, not
the TRT version it uses now. This can cause TRT EP to use incompatible
TRT engine cache.
see the github issue here:
https://github.com/microsoft/onnxruntime/issues/22382#issuecomment-2404140754
### Description
Allows to build ONNX Runtime with a custom local path of Dawn's source
code.
Usage:
```sh
build --use_webgpu --cmake_extra_defines "onnxruntime_CUSTOM_DAWN_SRC_PATH=C:/src/dawn"
```
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from
7.0.3 to 7.0.6.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/moxystudio/node-cross-spawn/blob/master/CHANGELOG.md">cross-spawn's
changelog</a>.</em></p>
<blockquote>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.5...v7.0.6">7.0.6</a>
(2024-11-18)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>update cross-spawn version to 7.0.5 in package-lock.json (<a
href="f700743918">f700743</a>)</li>
</ul>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.4...v7.0.5">7.0.5</a>
(2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>fix escaping bug introduced by backtracking (<a
href="640d391fde">640d391</a>)</li>
</ul>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.4">7.0.4</a>
(2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)
(<a
href="5ff3a07d9a">5ff3a07</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="77cd97f3ca"><code>77cd97f</code></a>
chore(release): 7.0.6</li>
<li><a
href="6717de49ff"><code>6717de4</code></a>
chore: upgrade standard-version</li>
<li><a
href="f700743918"><code>f700743</code></a>
fix: update cross-spawn version to 7.0.5 in package-lock.json</li>
<li><a
href="9a7e3b2165"><code>9a7e3b2</code></a>
chore: fix build status badge</li>
<li><a
href="085268352d"><code>0852683</code></a>
chore(release): 7.0.5</li>
<li><a
href="640d391fde"><code>640d391</code></a>
fix: fix escaping bug introduced by backtracking</li>
<li><a
href="bff0c87c8b"><code>bff0c87</code></a>
chore: remove codecov</li>
<li><a
href="a7c6abc6fe"><code>a7c6abc</code></a>
chore: replace travis with github workflows</li>
<li><a
href="9b9246e096"><code>9b9246e</code></a>
chore(release): 7.0.4</li>
<li><a
href="5ff3a07d9a"><code>5ff3a07</code></a>
fix: disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.6">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
This PR updates installation script to fix it for CUDA v12. However, it
may be difficult for CUDA v11 since the steps are quite complicated to
automate. Added a few lines of instructions instead.
fixes#22877
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
-Add split/pad/neg/not/ceil/round/min/max op support
-Fix conv2d op default pads value issue
-Add VSINPU EP to support python bindings
### Motivation and Context
-New OPs support for VSINPU EP
---------
Signed-off-by: Kee <xuke537@hotmail.com>
### Description
<!-- Describe your changes. -->
Support GroupQueryAttention operator for native webgpu ep.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This is required for inferencing some LLMs.
### Description
This pull request introduces several enhancements and new
functionalities to the `tools/python/util/android/android.py` file,
focusing on improving the management of Android emulators. The most
important changes include adding a timeout parameter to the
`start_emulator` function, adding checks to prevent multiple emulators
from running simultaneously, and introducing new utility functions to
manage emulator processes more effectively.
Enhancements to `start_emulator` function:
* Added a `timeout_minutes` parameter to the `start_emulator` function
to make the startup timeout configurable.
[[1]](diffhunk://#diff-c54db556a9c445989f830c09ab90ce2704e648deaccce9c9e0ee4875ddaa864dL108-R117)
[[2]](diffhunk://#diff-c54db556a9c445989f830c09ab90ce2704e648deaccce9c9e0ee4875ddaa864dL158-R170)
* Added a check to prevent starting a new emulator if one with the same
AVD name is already running.
* Included additional emulator arguments `-verbose` for better control
and debugging.
* Added a final verification step to ensure the emulator has started
successfully.
New utility functions for managing emulator processes:
* Introduced `check_emulator_running_using_avd_name `,
`check_emulator_running_using_process`, and
`check_emulator_running_using_pid` to check if an emulator is running
based on AVD name, process instance, or PID, respectively.
* Added `stop_emulator_by_proc` and `stop_emulator_by_pid` functions to
stop the emulator process using a `subprocess.Popen` instance or PID,
with a configurable timeout.
* Updated the `stop_emulator` function to use the new utility functions
for stopping the emulator process.
These changes enhance the robustness and flexibility of the emulator
management utilities, making it easier to handle different scenarios in
CI environments and development workflows.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Some quantized models don't have Conv/Gemm node's bias quantized but
still leave them in float. This PR is to create a sub-graph to quantize
the bias for Conv/Gemm nodes with scale = scale_input_0 * scale_input_1
and zp = 0. We only do this for bias initializer so that ConstantFolding
will fold the sub-graph to a real quantized int32 bias initializer
during the graph optimization next round.
Add ReduceL2 support to QNN EP. Some of the QNN AI Hub models contain
Reduce L2, such as openai_clip_CLIPTextEncoder and
openai_clip_CLIPIamgeEncoder, without this PR, the ReduceL2 will be
assigned to CPU and the graph will be split to 2 QNN graphs, which this
PR, all nodes will be in QNN EP.
### Description
Fix mamtulnbits accuracy level
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
- Erf
- Round
- Max
- ReduceMax
- ReduceMean
- ReduceSum
- Unsqueeze
- Squeeze
- Softmax
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Fixes regression in post merge pipeline caused by #22612
### Motivation and Context
So far, there isn't the artifactFeeds in Public Project
### Description
AppendExecutionProvider("CoreML", {{"MLComputeUnits","MLProgram"}})
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
<!-- Describe your changes. -->
Update this patch because the origin file has changed
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Fix sequential_executor.cc to avoid segfault when profiling is used on
model with empty Optional
### Motivation and Context
Fixes#22890