Commit graph

12085 commits

Author SHA1 Message Date
Misha Chornyi
bf4d3e1a5b
Update vcpkg.json - lock flatbuffer version (#23046)
### Description
Locking version introduced in:

03ea5dc495/onnxruntime/core/flatbuffers/schema/ort_training_checkpoint.fbs.h (L11-L13)

### Motivation and Context
Resolve issue for version `>=1.20.` 
https://github.com/microsoft/onnxruntime/issues/22666
2024-12-10 11:23:01 -08:00
Jian Chen
5f7b9d0245
Upgrade gradle to 8.7 (#23016)
### Description
This PR only upgrades the Gradle version and the
`com.android.tools.build:gradle` version in build.gradle.

It only updates the Gradle version of the react-native library, not the
e2e test.



### Motivation and Context
2024-12-10 10:49:03 -08:00
A-Satti
b14b4ec703
Restore Qspectre flag (#23060)
Restore a removed Qspectre flag and update comment

### Motivation and Context
Adjustment for PR
f5293d253c
2024-12-09 21:52:21 -08:00
Scott McKay
708ee8556e
Reduce default logger usage (#23030)
### Description
We have use cases where multiple sessions are created concurrently.
Minimizing the usage of the default logger is important for these
scenarios.

Wire through the session logger to as many places as possible. The EP
logger can also be used once the session is created (can't be used
during EP construction/kernel registration but can be used in
GetCapability and Compile).

### Motivation and Context
Improve logging when there are concurrent sessions.
2024-12-10 12:54:14 +11:00
wejoncy
e12421be30
[CoreML] more performance flag (#22975)
### Description
- Refactor the Unsqueeze implementation.
- Add more flags to boost performance.
- Add a profile flag.


### Motivation and Context

---------

Co-authored-by: jicwen <jicwen@YiMacBook-Pro.local>
Co-authored-by: wejoncy <wejoncy@.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-12-10 09:35:05 +08:00
amancini-N
8f3384b4c1
Fix BeamSearch T5 if initializers are on outer scope (#23044)
### Description
This PR adds the logic needed to consider only the needed implicit
inputs on BeamSearch op in case of T5 model (encoder/decoder, 2 graphs).
The logic added is similar to what happens in the _If_ kernel setup.


### Motivation and Context
Fixes #23043
2024-12-09 15:15:20 -08:00
Scott McKay
2f2c73bdde
Miscellaneous cleanups (#23048)
### Description
- fix some missing end of version markers and since_version info
- fix include to use onnx_protobuf.h which handles minimal builds
- we should always prefer that header over directly using the onnx ones


### Motivation and Context
2024-12-10 09:24:16 +11:00
Yulong Wang
22ae97c7dc
[webgpu] Add Alias def for Flatten (#23038)
### Description

Add `Alias` definition for Flatten in WebGPU EP.

Also add int32/uint32 to type constraint T.
2024-12-09 14:19:43 -08:00
Wanming Lin
6d9636f07c
[WebNN] Allow ops to handle ignoring an empty tensor as input (#22972)
### Description
Some ops should allow an empty tensor as input, e.g. the roi and scales
inputs of Resize.
### Motivation and Context
This avoids unexpected fallback for optional inputs backed by an empty
tensor. For example, roi and scales are both optional inputs of Resize; in
some models they have a non-empty name but an empty initializer presented
as `[0]`. WebNN currently falls back for every node with a 0 dimension,
which is not expected here.

![image](https://github.com/user-attachments/assets/599ba351-b5f6-49ac-8a1f-69fb28dbaf9b)
2024-12-06 17:58:15 -08:00
A-Satti
f5293d253c
Update Intel Thread Counts (#22894)
### Description
The default thread count methodology in onnxruntime did not account for
new upcoming Intel microarchitectures, leading to a suboptimal thread
count. Optimizing the thread count for new Intel microarchitectures
yields gains on the majority of models across datatypes, with speedups of
up to ~1.5x.


### Motivation and Context
Applications should run on Intel with the most performant thread
configuration for the majority of models. With new microarchitectures,
adjusting the thread count methodology is required to take advantage of
their differences.
2024-12-06 13:56:50 -08:00
Jing Fang
bd5a759d0c
[ARM CPU] Add rotary embedding fp16 kernel (#23013)
### Description
Add fp16 kernel to rotary embedding to boost performance.


### Motivation and Context
Part of performance optimization work for group query attention
2024-12-06 13:25:48 -08:00
Hector Li
401d16c671
Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853)
### Description
Enable the QNN HTP spill fill buffer setting to reduce RAM usage.
This feature is available after QNN 2.28. The QNN context binary must be
regenerated.

https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api

Requirements:
1. Regenerate the ONNX model with the QNN context binary by setting the
EP option enable_htp_spill_fill_buffer = 1.
2. Works for a model with multiple context binaries. The 2 ONNX models
with context binaries must be merged manually into 1 ONNX model.
3. Generating the context binary offline requires a Linux platform, since
the QnnSystem lib is not available for the Windows x86_64 platform.

No extra steps are needed when running model inference.

The generated EPContext node will have a max_size attribute with the
maximum spill fill buffer size for the context binary
<img width="353" alt="image"
src="https://github.com/user-attachments/assets/a3bf48be-a8da-4381-8a1d-3f2558eea37d">

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-12-06 11:36:52 -08:00
dependabot[bot]
d27fecd3d3
Bump cross-spawn from 6.0.5 to 6.0.6 in /js/web (#23019)
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from
6.0.5 to 6.0.6.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/moxystudio/node-cross-spawn/blob/v6.0.6/CHANGELOG.md">cross-spawn's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v6.0.5...v6.0.6">6.0.6</a>
(2024-11-18)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)
(<a
href="https://github.com/moxystudio/node-cross-spawn/commit/ba5aaef">ba5aaef</a>)</li>
<li><strong>core:</strong> support worker threads (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/127">#127</a>)
(<a
href="https://github.com/moxystudio/node-cross-spawn/commit/f4af31c">f4af31c</a>)</li>
</ul>
<p><!-- raw HTML omitted --><!-- raw HTML omitted --></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d35c865b87"><code>d35c865</code></a>
chore(release): 6.0.6</li>
<li><a
href="5a37e19173"><code>5a37e19</code></a>
chore: update package.json and package.lock</li>
<li><a
href="ba5aaef783"><code>ba5aaef</code></a>
fix: disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)</li>
<li><a
href="f4af31c8ee"><code>f4af31c</code></a>
fix(core): support worker threads (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/127">#127</a>)</li>
<li>See full diff in <a
href="https://github.com/moxystudio/node-cross-spawn/compare/v6.0.5...v6.0.6">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-05 10:07:08 -08:00
Yi Zhang
6ed77cc374
Deprecate macos-12 (#23017)
### Description



### Motivation and Context
The ESRP code-sign task now supports .NET 8, so we can remove macos-12.
2024-12-05 14:07:21 +08:00
Yulong Wang
1c79a4c9dd
[js/common] use TS type inference to eliminate unknown (#23012)
### Description

This change uses a TypeScript trick to infer global types in
onnxruntime-common. Thanks to the strong type system of TypeScript, we
are able to refer to types that may not be available in the context.

This keeps onnxruntime-common free of dependencies like
"@webgpu/types" while still allowing those types to be used in
declarations. See the comments on `TryGetGlobalType` in `type-helper.ts`.
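
The idea can be sketched with a conditional type that looks a global class up by name. Only the name `TryGetGlobalType` comes from the commit; the definition below is an illustrative assumption and may differ from what `type-helper.ts` actually contains, and `HypotheticalGpuDevice` is a made-up name.

```typescript
// If a global constructor called `Name` is declared in the current compilation
// context, resolve to its instance type; otherwise resolve to `Fallback`.
// (Assumed definition, for illustration only.)
type TryGetGlobalType<Name extends string, Fallback = unknown> =
  typeof globalThis extends { [K in Name]: { prototype: infer T } } ? T : Fallback;

// `Date` is always declared, so this resolves to the real `Date` type and
// its methods are available without importing anything.
const epoch: TryGetGlobalType<'Date'> = new Date(0);
const epochMillis: number = epoch.getTime();

// A hypothetical global that is not declared anywhere falls back to `unknown`,
// so the declaration still compiles without the missing type package.
const placeholder: TryGetGlobalType<'HypotheticalGpuDevice'> = 'anything';
```

With a definition like this, a declaration file can mention a type such as a WebGPU device only when the consumer's project has the corresponding type definitions installed.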
2024-12-04 19:01:26 -08:00
Jian Chen
f340b3cad3
Adding DML to python cuda package (#22606)
2024-12-04 21:20:12 -05:00
Yulong Wang
3234487385
[js] remove more unused training types (#22753)
### Description

remove more unused training types
2024-12-04 16:44:09 -08:00
dependabot[bot]
3975e79303
Bump axios from 1.6.1 to 1.7.9 in /js/node (#23009)
2024-12-04 23:52:24 +00:00
Wanming Lin
cacd97dba3
[WebNN] Improve the util function of creating WebNN constant MLOperand (#22935)
Merge the util functions to create or retrieve:
- A WebNN constant MLOperand filled with the specified value, data type,
and shape.
- A WebNN scalar constant MLOperand with the specified value and data
type.
2024-12-04 15:09:54 -08:00
Jing Fang
fbe22fdac7
[ARM CPU] Fix flaky hqnbitgemm UT (#23010)
### Description
Increase the fp16 qnbitgemm unit test tolerance and use fixed seeds.


### Motivation and Context
2024-12-04 14:55:52 -08:00
Yulong Wang
7b0fa407eb
fix requirements.txt path (#22946)
### Description

#22380 removes the file
`tools/ci_build/github/linux/docker/inference/x86_64/python/cpu/scripts/requirements.txt`
but it is still used in `dockerfiles/Dockerfile.cuda`.

This change updates the file path of the requirements.txt

fixes #22945.
2024-12-04 13:08:29 -08:00
Yulong Wang
d0dde4f7d4
[wasm/test] update packages versions (#23008)
### Description

Upgrade package versions to resolve the following dependabot alerts:
- https://github.com/microsoft/onnxruntime/security/dependabot/269
- https://github.com/microsoft/onnxruntime/security/dependabot/268
- https://github.com/microsoft/onnxruntime/security/dependabot/275
- https://github.com/microsoft/onnxruntime/security/dependabot/306



```
# npm audit report

braces  <3.0.3
Severity: high
Uncontrolled resource consumption in braces - https://github.com/advisories/GHSA-grv7-fg5c-xmjg
fix available via `npm audit fix`
node_modules/braces

cookie  <0.7.0
cookie accepts cookie name, path, and domain with out of bounds characters - https://github.com/advisories/GHSA-pxg6-pf52-xh8x
fix available via `npm audit fix`
node_modules/cookie
  engine.io  0.7.8 - 0.7.9 || 1.8.0 - 6.6.1
  Depends on vulnerable versions of cookie
  Depends on vulnerable versions of ws
  node_modules/engine.io
    socket.io  1.6.0 - 4.7.5
    Depends on vulnerable versions of engine.io
    node_modules/socket.io


ws  8.0.0 - 8.17.0
Severity: high
ws affected by a DoS when handling a request with many HTTP headers - https://github.com/advisories/GHSA-3h5v-q93c-6h6q
fix available via `npm audit fix`
node_modules/ws
  socket.io-adapter  2.5.2 - 2.5.4
  Depends on vulnerable versions of ws
  node_modules/socket.io-adapter

6 vulnerabilities (1 low, 1 moderate, 4 high)

```
2024-12-04 13:08:13 -08:00
Yulong Wang
fdf5ffe2cf
[js/node] fix TypeScript declaration in onnxruntime-node (#23000)
### Description
fix TypeScript declaration in onnxruntime-node

### Motivation and Context

Fixes #22978
2024-12-04 11:29:27 -08:00
Xu Xing
c19617a24a
[js/webgpu] Add GatherND (#22847)
### Description



### Motivation and Context
2024-12-04 09:57:32 -08:00
Yulong Wang
a615bd6688
Bump version of Dawn to 12a3b24c4 (#23002)
### Description

Upgrade version of Dawn.

Removed dawn.patch, because all patches are included in upstream.

Updated code affected by API changes (`const char*` ->
`WGPUStringView`).


### Motivation and Context
2024-12-04 09:47:16 -08:00
Yulong Wang
50b38ca9d5
[js/web] update default export to include webgpu (#22754)
### Description

This PR changes the following exports:
- `onnxruntime-web` is now the same as `onnxruntime-web/webgpu`.
- `onnxruntime-web/webgpu` is being deprecated.

### Migration instructions:
- Use `onnxruntime-web` instead of `onnxruntime-web/webgpu`.
- Use `onnxruntime-web/wasm` if you want to use onnxruntime-web without
webgpu/webnn.

### Export table

| file name | export entry | includes WASM | includes JSEP (WebGPU & WebNN) | includes WebGL |
| ------------- | ------------- | ----- | ----- | ----- |
| ort.all.min.js<br/>ort.all.js<br/>ort.all.min.mjs<br/>ort.all.mjs | `onnxruntime-web/all` | ✔️ | ✔️ | ✔️ |
| ort.min.js<br/>ort.js<br/>ort.min.mjs<br/>ort.mjs | `onnxruntime-web` | ✔️ | ❌ --> ✔️ | ✔️ --> ❌ |
| ort.webgpu.min.js<br/>ort.webgpu.js<br/>ort.webgpu.min.mjs<br/>ort.webgpu.mjs | `onnxruntime-web/webgpu` | ✔️ | ✔️ | ❌ |
| ort.wasm.min.js<br/>ort.wasm.js<br/>ort.wasm.min.mjs<br/>ort.wasm.mjs | `onnxruntime-web/wasm` | ✔️ | ❌ | ❌ |
2024-12-04 09:46:45 -08:00
Chi Lo
9b9f881475
[TensorRT EP] Use TRT/CUDA/ORT version from runtime instead of build time to generate hash value (#22921)
Use the TensorRT and CUDA versions fetched at **runtime** to compute the
hash value that determines the cache name.

Previously the versions were taken at compile/build time, which can cause
problems in some cases. For example, the TRT EP uses the TRT version that
we or users built against at compile time. However, users can switch to a
different TRT version at run time. Because the TRT EP always checked the
"fixed" build-time TRT version rather than the TRT version actually in
use, it could end up using an incompatible TRT engine cache.

see the github issue here:

https://github.com/microsoft/onnxruntime/issues/22382#issuecomment-2404140754
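
A minimal sketch of why the version source matters for the cache name, assuming the name is derived from a hash over the model identity plus the library versions. The `RuntimeVersions` shape, the `engineCacheName` helper, and the FNV-1a hash are all hypothetical stand-ins, not the TRT EP's actual code.

```typescript
// Hypothetical record of versions queried from the loaded libraries at run time.
interface RuntimeVersions {
  tensorrt: string;
  cuda: string;
  ort: string;
}

// 32-bit FNV-1a over the UTF-16 code units of a string (illustrative hash only).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Build a cache name whose suffix changes whenever any runtime version changes.
function engineCacheName(modelId: string, versions: RuntimeVersions): string {
  const key = [modelId, versions.tensorrt, versions.cuda, versions.ort].join('|');
  return `${modelId}_${fnv1a(key).toString(16).padStart(8, '0')}`;
}

const builtAgainst: RuntimeVersions = { tensorrt: '10.4.0', cuda: '12.4', ort: '1.20.0' };
const loadedAtRuntime: RuntimeVersions = { ...builtAgainst, tensorrt: '10.5.0' };

const nameFromBuildVersions = engineCacheName('model', builtAgainst);
const nameFromRuntimeVersions = engineCacheName('model', loadedAtRuntime);
```

If the hash were always computed from the build-time versions, both calls would produce the same name and an engine cached under one TensorRT version could be picked up by another; hashing the versions observed at run time gives each combination its own cache entry.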
2024-12-03 21:58:43 -08:00
dependabot[bot]
bd701e4f33
Bump cross-spawn from 7.0.3 to 7.0.6 in /js (#23003)
2024-12-04 05:07:21 +00:00
Yulong Wang
06526af346
[js/webgpu] fix a bug in transpose shader (#22997)
### Description

Fix a bug in transpose shader, when input/output rank is 1.

### Motivation and Context

Fixes #22994
2024-12-03 20:21:08 -08:00
Yulong Wang
e84b8e7bd5
Allow specifying a custom local source path for Dawn (#22999)
### Description

Allows building ONNX Runtime with a custom local path to Dawn's source
code.

Usage:
```sh
build --use_webgpu --cmake_extra_defines "onnxruntime_CUSTOM_DAWN_SRC_PATH=C:/src/dawn"

```
2024-12-03 19:25:22 -08:00
dependabot[bot]
4497c97d54
Bump cross-spawn from 7.0.3 to 7.0.6 in /js/node (#22998)
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from
7.0.3 to 7.0.6.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/moxystudio/node-cross-spawn/blob/master/CHANGELOG.md">cross-spawn's
changelog</a>.</em></p>
<blockquote>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.5...v7.0.6">7.0.6</a>
(2024-11-18)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>update cross-spawn version to 7.0.5 in package-lock.json (<a
href="f700743918">f700743</a>)</li>
</ul>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.4...v7.0.5">7.0.5</a>
(2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>fix escaping bug introduced by backtracking (<a
href="640d391fde">640d391</a>)</li>
</ul>
<h3><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.4">7.0.4</a>
(2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)
(<a
href="5ff3a07d9a">5ff3a07</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="77cd97f3ca"><code>77cd97f</code></a>
chore(release): 7.0.6</li>
<li><a
href="6717de49ff"><code>6717de4</code></a>
chore: upgrade standard-version</li>
<li><a
href="f700743918"><code>f700743</code></a>
fix: update cross-spawn version to 7.0.5 in package-lock.json</li>
<li><a
href="9a7e3b2165"><code>9a7e3b2</code></a>
chore: fix build status badge</li>
<li><a
href="085268352d"><code>0852683</code></a>
chore(release): 7.0.5</li>
<li><a
href="640d391fde"><code>640d391</code></a>
fix: fix escaping bug introduced by backtracking</li>
<li><a
href="bff0c87c8b"><code>bff0c87</code></a>
chore: remove codecov</li>
<li><a
href="a7c6abc6fe"><code>a7c6abc</code></a>
chore: replace travis with github workflows</li>
<li><a
href="9b9246e096"><code>9b9246e</code></a>
chore(release): 7.0.4</li>
<li><a
href="5ff3a07d9a"><code>5ff3a07</code></a>
fix: disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.6">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-03 18:48:22 -08:00
Yulong Wang
d3bc3180d8
[js/node] fix CUDA artifact installation script for Linux/x64 (#22984)
### Description

This PR updates the installation script to fix it for CUDA v12. CUDA v11
is harder to support because the steps are quite complicated to
automate, so a few lines of instructions were added instead.

fixes #22877
2024-12-03 16:07:43 -08:00
Prathik Rao
5c644d3747
[WebGPU EP] Flatten implementation (#22964)
Implements flatten operator for native webgpu.
2024-12-03 14:40:57 -08:00
Jian Chen
9ed0c7fe26
Redo "Update Gradle version 8.7 and java version 17 within onnxruntime/java" (#22923)
### Description



### Motivation and Context
2024-12-02 18:34:25 -08:00
Edward Chen
e2356a0403
Use UTF8 string encoding in ORTSaveCodeAndDescriptionToError(). (#22982)
Update from ASCII to UTF8 string encoding when creating the `NSString` description.
2024-12-02 17:41:52 -08:00
Kee
8c52fa3924
[VSINPU]Split/Pad and some element-wise OPs support (#22916)
### Description
- Add split/pad/neg/not/ceil/round/min/max op support
- Fix conv2d op default pads value issue
- Add VSINPU EP to support python bindings


### Motivation and Context
- New OPs support for VSINPU EP

---------

Signed-off-by: Kee <xuke537@hotmail.com>
2024-12-02 13:57:30 -08:00
Satya Kumar Jandhyala
e8bf46a70e
[WebGPU EP] Support GroupQueryAttention (#22658)
### Description
Support GroupQueryAttention operator for native webgpu ep.


### Motivation and Context
This is required for inferencing some LLMs.
2024-12-02 12:40:03 -08:00
Jian Chen
6c2ff5fc55
Refactor emulator start and stop functions for clarity and efficiency (#22861)
### Description
This pull request introduces several enhancements and new
functionalities to the `tools/python/util/android/android.py` file,
focusing on improving the management of Android emulators. The most
important changes include adding a timeout parameter to the
`start_emulator` function, adding checks to prevent multiple emulators
from running simultaneously, and introducing new utility functions to
manage emulator processes more effectively.

Enhancements to `start_emulator` function:

* Added a `timeout_minutes` parameter to the `start_emulator` function
to make the startup timeout configurable.
* Added a check to prevent starting a new emulator if one with the same
AVD name is already running.
* Included the additional emulator argument `-verbose` for better control
and debugging.
* Added a final verification step to ensure the emulator has started
successfully.

New utility functions for managing emulator processes:

* Introduced `check_emulator_running_using_avd_name`,
`check_emulator_running_using_process`, and
`check_emulator_running_using_pid` to check if an emulator is running
based on AVD name, process instance, or PID, respectively.
* Added `stop_emulator_by_proc` and `stop_emulator_by_pid` functions to
stop the emulator process using a `subprocess.Popen` instance or PID,
with a configurable timeout.
* Updated the `stop_emulator` function to use the new utility functions
for stopping the emulator process.

These changes enhance the robustness and flexibility of the emulator
management utilities, making it easier to handle different scenarios in
CI environments and development workflows.



### Motivation and Context

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-12-02 09:29:17 -08:00
Chi Lo
e234023d11
[TensorRT EP] Fix wrong input order when generating IndexedSubGraph (#22857)
The input order of generated indexedSubGraph needs to be consistent with
the input order of original graph.

This PR will also fix the github issue
https://github.com/microsoft/onnxruntime/issues/22729
2024-12-02 01:45:29 -08:00
Chi Lo
49a80df77f
Keep the model metadata on the generated EP context model (use bridge api) (#22860)
In addition to the
[PR](https://github.com/microsoft/onnxruntime/pull/22825) which directly
uses internal graph api, this PR updates the bridge api for the case of
TRT EP and OpenVINO EP.
2024-12-01 21:57:45 -08:00
Vincent Wang
1128882bfd
Quantize Bias for Conv/Gemm on Quantized Model (#22889)
Some quantized models leave the bias of Conv/Gemm nodes in float instead
of quantizing it. This PR creates a sub-graph that quantizes the bias for
Conv/Gemm nodes with scale = scale_input_0 * scale_input_1 and zp = 0.
This is done only for bias initializers, so that ConstantFolding folds
the sub-graph into a real quantized int32 bias initializer during the
next round of graph optimization.
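
The arithmetic behind the folded initializer can be sketched as below, assuming per-tensor scales. `quantizeBias` is a hypothetical helper for illustration, not an ONNX Runtime function, and `Math.round` rounds halves upward, which may differ from the rounding the real quantization op uses.

```typescript
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Quantize a float bias to int32 with scale = inputScale * weightScale and zero point 0.
function quantizeBias(
  bias: readonly number[],
  inputScale: number,   // scale_input_0 (assumed per-tensor)
  weightScale: number,  // scale_input_1 (assumed per-tensor)
): Int32Array {
  const biasScale = inputScale * weightScale;
  return Int32Array.from(bias, (value) => {
    const quantized = Math.round(value / biasScale); // zero point is 0, so no offset
    return Math.min(INT32_MAX, Math.max(INT32_MIN, quantized)); // saturate to int32
  });
}

// With scales 0.5 and 0.25 the bias scale is 0.125.
const quantizedBias = quantizeBias([1.0, -0.5, 0.3], 0.5, 0.25);
```

Because the bias shares the accumulator's scale (input scale times weight scale), the int32 bias can be added directly to the integer accumulation of the quantized Conv/Gemm.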
2024-11-28 10:10:24 +08:00
Vincent Wang
42ecb05080
[QNN] ReduceL2 Support (#22636)
Add ReduceL2 support to QNN EP. Some of the QNN AI Hub models contain
ReduceL2, such as openai_clip_CLIPTextEncoder and
openai_clip_CLIPIamgeEncoder. Without this PR, ReduceL2 is assigned to
the CPU and the graph is split into 2 QNN graphs; with this PR, all nodes
run in the QNN EP.
2024-11-28 10:09:13 +08:00
Jing Fang
08abab0b14
[CPU] Fix MatMulNBits accuracy level (#22963)
### Description
Fix MatMulNBits accuracy level



### Motivation and Context
2024-11-27 17:40:04 -08:00
wejoncy
a24723df16
[CoreML] ML Program more operators support [3/N] (#22710)
### Description
- Erf
- Round
- Max
- ReduceMax
- ReduceMean
- ReduceSum
- Unsqueeze
- Squeeze
- Softmax



### Motivation and Context

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-28 09:21:02 +08:00
Yi Zhang
b930b4ab5b
Limit PipAuthenticate in Private Project now (#22954)
### Description
Fixes regression in post merge pipeline caused by #22612



### Motivation and Context
So far, there are no artifactFeeds in the Public Project.
2024-11-27 13:32:35 +08:00
Wanming Lin
fe749a88a5
[WebNN EP] Fixed bug in usage of Array.reduce() (#22944)
In JS, calling reduce on an empty array with no initial value throws an
error. Fix it by checking the array length first.
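
A small sketch of the failure mode and the guard. The function names and the fallback value are illustrative assumptions, not the WebNN EP's actual code.

```typescript
// Unguarded: throws a TypeError when `values` is empty, because there is
// neither a first element nor an initial value to start from.
function sumUnguarded(values: readonly number[]): number {
  return values.reduce((a, b) => a + b);
}

// Guarded: check the length first and return a caller-chosen fallback.
function sumOrDefault(values: readonly number[], fallback = 0): number {
  if (values.length === 0) {
    return fallback;
  }
  return values.reduce((a, b) => a + b);
}
```

Passing an initial value to `reduce` is the other common way to avoid the exception; the length check is the approach the commit describes.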
2024-11-26 19:03:44 -08:00
wejoncy
c284a686f2
[CoreML] Create EP by AppendExecutionProvider (#22675)
### Description
`AppendExecutionProvider("CoreML", {{"MLComputeUnits","MLProgram"}})`



### Motivation and Context

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-27 09:26:31 +08:00
Chen Feiyue
487184fa42
[VSINPU] update crosscompiling patch (#22937)
### Description
Update this patch because the original file has changed.


### Motivation and Context
2024-11-26 14:35:16 -08:00
amancini-N
8826e39a81
#22890 Fix profiling on empty Optional (#22891)
### Description
Fix sequential_executor.cc to avoid segfault when profiling is used on
model with empty Optional



### Motivation and Context
Fixes #22890
2024-11-26 11:18:47 -08:00
shiyi
afbb53937c
[WebNN] Support negative steps for slice (#22871)
Slice with negative steps can be emulated by reverse+slice.
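
The index arithmetic can be sketched on a 1-D array as follows. This is an illustration of the reverse+slice idea only, assuming `start` and `end` are already normalized the way ONNX Slice does for negative steps (`start` in `[0, n-1]`, `end` in `[-1, n-1]`); the helper names are hypothetical and no WebNN API is called.

```typescript
// Ordinary forward slice: indices start, start+step, ... while below end.
function slicePositive(data: readonly number[], start: number, end: number, step: number): number[] {
  const out: number[] = [];
  for (let i = start; i < end; i += step) {
    out.push(data[i]);
  }
  return out;
}

// Emulate a negative-step slice: reverse the axis, mirror the bounds
// (index i maps to n - 1 - i), and slice forward with the negated step.
function sliceNegativeViaReverse(data: readonly number[], start: number, end: number, step: number): number[] {
  const n = data.length;
  const reversed = [...data].reverse();
  return slicePositive(reversed, n - 1 - start, n - 1 - end, -step);
}

const input = [0, 1, 2, 3, 4, 5];
const everyOtherBackwards = sliceNegativeViaReverse(input, 4, 0, -2); // picks indices 4, 2
const fullyReversed = sliceNegativeViaReverse(input, 5, -1, -1);      // picks indices 5..0
```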
2024-11-25 23:06:23 -08:00