Commit graph

232 commits

Author SHA1 Message Date
Satya Kumar Jandhyala
544bdd6073
Fix ConvTranspose for certain attribute combinations (#23488)
### Description
Convert output_padding attribute from 1D to 2D convtranspose



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
https://github.com/microsoft/onnxruntime/issues/23403
2025-02-05 12:22:47 -08:00
Yulong Wang
fbae88f5ad
[js/web] use the recommended workaround for Vite (#23531)
### Description

After some investigation and debug, I decided to follow the recommended
workaround as suggested in https://github.com/vitejs/vite/issues/8427.

### Motivation and Context

There is a known issue with Vite 5.x when using WebAssembly package.
Detail information is in https://github.com/vitejs/vite/issues/8427.

There are previous attempts to fix this problem (#23487). I tried
various ways to make it working out of the box for Vite users but none
of them worked: Some "fixes" did fix the usage of Vite but broke other
use case/bundler and some introduced other issues. Eventually I figured
out that there is no good way to fix this inside ONNX Runtime.

Considering the root cause is inside Vite and it may be fixed in Vite
v6. I think now the best way is to follow the recommended workaround.
2025-01-29 17:38:22 -08:00
Yulong Wang
bf023ab3d5
[js/web] allow import .mjs/.wasm file (#23487)
### Description

Allow importing the `.mjs` and `.wasm` files.

when using Vite, this enables web app to consume ORT-web for simplify
the setup:
   ```js
   import * as ort from 'onnxruntime-web';

   import wasmFileUrl from 'onnxruntime-web/.wasm?url';
   ort.env.wasm.wasmPaths = { wasm: wasmFileUrl };
2025-01-28 16:24:41 -08:00
Jiajia Qin
25f427466e
[js/webgpu] Optimize ConvTranspose (Continue) (#23429)
BUG #23273

This PR does below optimizations:
1. When output channels is one, 1) calculate the offset before the
inchannel loop to reduce indices to offsets calculation, 2) split the
`inputChannelsPerGroup` into `inputChannelsPerGroupInt` and
`inputChannelsRemainder` parts so that we can always access 4 data for
`inputChannelsPerGroupInt`.
2. Use precise initial value to reduce useless loop iterations. Thanks
@jiangzhaoming 's suggestion's on this.

With this PR, ConvTranspose becomes 3.7s from 8.4s on Intel Meteor Lake.
On NV RTX 2000 Ada, it becomes 1.6s from 2.7s.
2025-01-22 08:59:17 -08:00
dependabot[bot]
f4dc965522
Bump vite from 6.0.7 to 6.0.11 in /js/web/test/e2e/exports/testcases/vite-default (#23446)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite)
from 6.0.7 to 6.0.11.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vitejs/vite/releases">vite's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.11</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.11/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v6.0.10</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.10/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v6.0.9</h2>
<p>This version contains a breaking change due to security fixes. See <a
href="https://github.com/vitejs/vite/security/advisories/GHSA-vg6x-rcgg-rjx6">https://github.com/vitejs/vite/security/advisories/GHSA-vg6x-rcgg-rjx6</a>
for more details.</p>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.9/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v6.0.8</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.8/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md">vite's
changelog</a>.</em></p>
<blockquote>
<h2><!-- raw HTML omitted -->6.0.11 (2025-01-21)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: <code>preview.allowedHosts</code> with specific values was not
respected (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19246">#19246</a>)
(<a
href="aeb3ec84a2">aeb3ec8</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19246">#19246</a></li>
<li>fix: allow CORS from loopback addresses by default (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19249">#19249</a>)
(<a
href="3d03899737">3d03899</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19249">#19249</a></li>
</ul>
<h2><!-- raw HTML omitted -->6.0.10 (2025-01-20)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: try parse <code>server.origin</code> URL (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19241">#19241</a>)
(<a
href="2495022420">2495022</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19241">#19241</a></li>
</ul>
<h2><!-- raw HTML omitted -->6.0.9 (2025-01-20)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix!: check host header to prevent DNS rebinding attacks and
introduce <code>server.allowedHosts</code> (<a
href="bd896fb5f3">bd896fb</a>)</li>
<li>fix!: default <code>server.cors: false</code> to disallow fetching
from untrusted origins (<a
href="b09572acc9">b09572a</a>)</li>
<li>fix: verify token for HMR WebSocket connection (<a
href="029dcd6d77">029dcd6</a>)</li>
</ul>
<h2><!-- raw HTML omitted -->6.0.8 (2025-01-20)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: avoid SSR HMR for HTML files (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19193">#19193</a>)
(<a
href="3bd55bcb7e">3bd55bc</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19193">#19193</a></li>
<li>fix: build time display 7m 60s (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19108">#19108</a>)
(<a
href="cf0d2c8e23">cf0d2c8</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19108">#19108</a></li>
<li>fix: don't resolve URL starting with double slash (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19059">#19059</a>)
(<a
href="35942cde11">35942cd</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19059">#19059</a></li>
<li>fix: ensure <code>server.close()</code> only called once (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19204">#19204</a>)
(<a
href="db81c2dada">db81c2d</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19204">#19204</a></li>
<li>fix: resolve.conditions in ResolvedConfig was
<code>defaultServerConditions</code> (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19174">#19174</a>)
(<a
href="ad75c56dce">ad75c56</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19174">#19174</a></li>
<li>fix: tree shake stringified JSON imports (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19189">#19189</a>)
(<a
href="f2aed62d0b">f2aed62</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19189">#19189</a></li>
<li>fix: use shared sigterm callback (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19203">#19203</a>)
(<a
href="47039f4643">47039f4</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19203">#19203</a></li>
<li>fix(deps): update all non-major dependencies (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19098">#19098</a>)
(<a
href="8639538e64">8639538</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19098">#19098</a></li>
<li>fix(optimizer): use correct default install state path for yarn PnP
(<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19119">#19119</a>)
(<a
href="e690d8bb1e">e690d8b</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19119">#19119</a></li>
<li>fix(types): improve <code>ESBuildOptions.include / exclude</code>
type to allow <code>readonly (string | RegExp)[]</code> (<a
href="ea53e70952">ea53e70</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19146">#19146</a></li>
<li>chore(deps): update dependency pathe to v2 (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19139">#19139</a>)
(<a
href="71506f0a8d">71506f0</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19139">#19139</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a0ed4057c9"><code>a0ed405</code></a>
release: v6.0.11</li>
<li><a
href="3d03899737"><code>3d03899</code></a>
fix: allow CORS from loopback addresses by default (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19249">#19249</a>)</li>
<li><a
href="aeb3ec84a2"><code>aeb3ec8</code></a>
fix: <code>preview.allowedHosts</code> with specific values was not
respected (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19246">#19246</a>)</li>
<li><a
href="9654348258"><code>9654348</code></a>
release: v6.0.10</li>
<li><a
href="2495022420"><code>2495022</code></a>
fix: try parse <code>server.origin</code> URL (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19241">#19241</a>)</li>
<li><a
href="a55f8ba3e4"><code>a55f8ba</code></a>
release: v6.0.9</li>
<li><a
href="bd896fb5f3"><code>bd896fb</code></a>
fix!: check host header to prevent DNS rebinding attacks and introduce
`serve...</li>
<li><a
href="029dcd6d77"><code>029dcd6</code></a>
fix: verify token for HMR WebSocket connection</li>
<li><a
href="b09572acc9"><code>b09572a</code></a>
fix!: default <code>server.cors: false</code> to disallow fetching from
untrusted origins</li>
<li><a
href="c0f72a695c"><code>c0f72a6</code></a>
release: v6.0.8</li>
<li>Additional commits viewable in <a
href="https://github.com/vitejs/vite/commits/v6.0.11/packages/vite">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=vite&package-manager=npm_and_yarn&previous-version=6.0.7&new-version=6.0.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-21 17:18:39 -08:00
Yulong Wang
444fcebaa4
Pre-requisites of upgrading EMSDK (#23347)
### Description

This PR contains a part of the changes in #23318.

The reason of creating this PR is: The works to support building WebGPU
EP in WASM depends on #23318, which cannot be merged since it's blocked
by upstream (https://github.com/llvm/llvm-project/issues/122166). This
PR contains the changes can be safely merged separately and can unblock
the development of supporting building WebGPU EP in WASM.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-01-14 11:07:21 -08:00
Jiajia Qin
7be006c466
[js/webgpu] Optimize convtranspose (#23302)
### Description
<!-- Describe your changes. -->
BUG #23273

With this change, I see the convTranspose time in that bug becomes ~7s
from ~90s on my Meteor Lake.

This PR does below things:
1. Use stride to update the increasement in the loop.
In the bug, the stride is 1024, which can greatly reduce the loop times.
2. Support components for A to reduce the memory access times.
3. When output channels is 1, the b components can be same with A to
further reduce the memory access times.
2025-01-09 11:24:42 -08:00
Yulong Wang
0627a6cb93
[js/web] fix package export for bundlers (#23257)
### Description
<!-- Describe your changes. -->

This PR tries to fix #22615. (see detailed description in the issue)

A perfect solution would be too difficult to make, because there are a
huge number of combinations of usage scenarios, including combinations
of development framework, bundler, dev/prod mode, and so on.

This PR is using the following approach:
- Introduce a new type of end to end test: export test. This type of
tests are complete web apps that use popular web development frameworks,
and the tests are using puppeteer to run the apps and check if the apps
can run without error.
  - added one nextjs based web app and one vite based web app.
- In the test, perform the following test steps:
  - `npm install` for packages built locally
- `npm run dev` to start dev server and use puppeteer to launch the
browser to test
- `npm run build && npm run start` to test prod build and use puppeteer
to launch the browser to test
- Make changes to ort-web, including:
- special handling on Webpack's behavior of rewriting `import.meta.url`
to a `file://` string
  - revise build definitions
  - fix wasm URL for proxy, if used in a bundled build
2025-01-09 11:01:00 -08:00
Changming Sun
5d692b0136
Merge web machine pools (#23243)
### Description
The Web CI pipeline uses three different Windows machine pools:
1. onnxruntime-Win2022-webgpu-A10
2. onnxruntime-Win2022-VS2022-webgpu-A10
3. onnxruntime-Win-CPU-2022-web

This PR merges them together to reduce ongoing maintenance cost.
2025-01-03 13:53:17 -08:00
Yulong Wang
ae6dcc839e
Revert "[js/webgpu] disable failed tests temporarily (#23127)" (#23130)
### Description

This reverts commit 9115682d69.

### Motivation and Context
2024-12-18 18:07:50 -08:00
Yulong Wang
9115682d69
[js/webgpu] disable failed tests temporarily (#23127)
### Description


Those test cases start to fail for unknown reasons.

To unblock the CI, I disabled those tests temporarily to earn time to
investigate the root cause.
2024-12-16 15:35:47 -08:00
Yulong Wang
e605870783
[js/web] Update API for ort.env.webgpu (#23026)
### Description

This PR is a replacement of #21671. It offers a new way for accessing
the following:
- `ort.env.webgpu.adapter`:
- **deprecating**. There is no point to get the value of it. Once
`GPUDevice.adapterInfo` is supported, there is no point to set the value
too.
- `ort.env.webgpu.device`:
  - set value of `GPUDevice` if user created it. Use at user's own risk.
- get value of `Promise<GPUDevice>`. if not exist, create a new one. if
exist return it.
- `ort.env.webgpu.powerPreference`:
- **deprecating**. encouraging users to set `ort.env.webgpu.device` if
necessary.
- `ort.env.webgpu.forceFallbackAdapter`:
- **deprecating**. encouraging users to set `ort.env.webgpu.device` if
necessary.
2024-12-11 10:24:14 -08:00
Xu Xing
c19617a24a
[js/webgpu] Add GatherND (#22847)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-12-04 09:57:32 -08:00
Yulong Wang
50b38ca9d5
[js/web] update default export to include webgpu (#22754)
### Description

This PR changes the following exports:
- `onnxruntime-web` now is same to `onnxruntime-web/webgpu`.
- `onnxruntime-web/webgpu` is deprecating.

### Migration instructions:
- use `onnxruntime-web` instead of `onnxruntime-web/webgpu`.
- use `onnxruntime-web/wasm` if want to use onnxruntime-web without
webgpu/webnn.

### Export table

| file name | export entry | includes WASM | includes JSEP (WebGPU &
WebNN) | includes WebGL
| ------------- | ------------- | ----- | ----- | -----
| ort.all.min.js<br/>ort.all.js<br/>ort.all.min.mjs<br/>ort.all.mjs |
`onnxruntime-web/all` | ✔️| ✔️| ✔️
| ort.min.js<br/>ort.js<br/>ort.min.mjs<br/>ort.mjs | `onnxruntime-web`
| ✔️|  --> ✔️| ✔️ -->
|
ort.webgpu.min.js<br/>ort.webgpu.js<br/>ort.webgpu.min.mjs<br/>ort.webgpu.mjs
| `onnxruntime-web/webgpu` | ✔️ | ✔️ |
| ort.wasm.min.js<br/>ort.wasm.js<br/>ort.wasm.min.mjs<br/>ort.wasm.mjs
| `onnxruntime-web/wasm` | ✔️ |  |
2024-12-04 09:46:45 -08:00
Jiajia Qin
e597eaed4a
[js/webgpu] Optimize transpose as reshape when suitable (#22870)
BUG #22031
2024-11-18 12:52:48 -08:00
Wanming Lin
82681205e4
[WebNN] Fix MLTensorUsage is undefined issue (#22831)
`MLTensorUsage` has been removed from Chromium:
https://chromium-review.googlesource.com/c/chromium/src/+/6015318, but
we still need to make it compatible with old Chrome versions, so just
make it `undefined` for latest Chrome version.
2024-11-13 20:22:22 -08:00
Xu Xing
ff57ac4f3d
[js/webgpu] Add scatterND (#22755)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-13 09:13:00 -08:00
Jiajia Qin
7e0dd9d433
[js/webgpu] Optimize Expand (#22752)
Use components = 4 if possible.

llama3.2-1B becomes 20 tokens/s from 18 tokens/s on my iGPUs.
2024-11-12 12:37:19 -08:00
shiyi
63cb53257b
[WebNN] Support steps >= 1 for slice operator (#22708)
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
2024-11-09 18:20:52 -08:00
xhcao
b5ee4ac760
[js/webgpu] support GridSample operator (#22652)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-08 11:02:36 -08:00
jzm-intel
d9b91682f1
WebGPU JSEP: Make shader code not depend on input broadcasting patterns (#22536)
This PR make MatMul shaders not depend on inputs broadcasting pattern,
but only depend on input ranks and their shape provided in uniform. This
change fix the issue that currently shaders code are different for
different broadcasting, but have identical cache key and results in
wrong cache hit.
2024-11-08 11:00:51 -08:00
Bin Miao
777fe7922c
[WebNN EP] Support Sign and CumSum operators (#22616)
This PR supports Sign and CumSum operators for WebNN EP. @Honry @fdwr
PTAL, thanks.
2024-11-03 20:08:16 -08:00
Jiajia Qin
8fbbf2fd4f
[js/webgpu] Optimize MatMul with M = 1 (#22577)
### Description
<!-- Describe your changes. -->
BUG #22031

In the demucs model, there are lots of MatMul ops with shapes like
below:
`input[0]: [3448,1,512] | float32, input[1]: [512,1536] | float32,
output[0]: [3448,1,1536] | float32`

We can see that for this kind of shape, the batch size is a big value,
but M = 1. Our current algorithm is based on [M, N] to partition tiles,
which is not efficient for such kind of shapes. This PR reshapes the
inputs to improve the matmul performance.
Before:  [3448,1,512] x [512,1536] =  [3448,1,1536]
After: [1, 3448, 512] x [512, 1536] = [1, 3448, 1536] , then the output
can be reshaped to [3448, 1, 1536]

The overall MatMul time in demucs model becomes 1778.45 ms from 4418.17
ms on my iGPUs.

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-11-01 08:04:42 -07:00
Wanming Lin
fc375a6f58
[WebNN] Support And, Or and Xor ops (#22598)
Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-30 17:52:10 -07:00
shiyi
46ff240821
[WebNN] Add ScatterElements and GatherElements (#22534) 2024-10-30 10:20:21 -07:00
shiyi
dcf91266bd
[WebNN EP] Support GatherND and ScatterND op (#22181) 2024-10-28 15:04:45 -07:00
Satya Kumar Jandhyala
05fbb43b34
[JSEP/WebGPU] Fix data causing output mismatch resulting in CI build failures occasionally (#22596)
### Description
<!-- Describe your changes. -->
Test case failing sometimes and passing other times.


### Motivation and Context
Prevent unnecessary CI build failures requiring manually rerunning tests
2024-10-26 01:37:12 -07:00
Satya Kumar Jandhyala
fd8ee4894d
[JS/WebGPU] GroupQueryAttention rewrite (#20946)
### Description
Implement JSEP GroupQueryAttention



### Motivation and Context
Required to enable certain LLM models to run using WebGPU.
2024-10-23 10:14:09 -07:00
Wanming Lin
e6e94e6252
[WebNN EP] Use boolean flags instead of MLTensorUsage (#22497)
Fixed #22495

We will keep MLTensorUsage until it is removed from Chromium.

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-22 17:20:36 -07:00
Wanming Lin
52b77762bd
[WebNN EP] Remove the numThreads option (#22464)
Chromium has removed this option via
https://chromium-review.googlesource.com/c/chromium/src/+/5905656.
2024-10-17 07:45:39 -07:00
mingmingtasd
004bd36f3d
[WebNN EP] Support Tile operator (#22148)
PTAL, thanks! @Honry , @fdwr thanks!
2024-10-05 00:56:55 -07:00
Yang Gu
9e5153b688
[js/webgpu] Manage model download with a specific unittest option (#22214)
Currently in debug mode, unit test will always download models to local
file system, which is a bit annoying. This PR fixes this by adding a
specific option to enable model download.
2024-09-30 18:27:43 -07:00
Yang Gu
c75f4a09b7
[js/webgpu] Remove the limitation on axis in softmax (#22231)
In current implementation, axis in softmax has to be the last, which is
an obvious limitation. This PR removes this limitation and will fix
issues #20710 and #22176.
2024-09-30 18:27:11 -07:00
Enrico Galli
52a8c1cae8
[WebNN EP] Enable IO Bindings with MLTensor (#21301)
### Description
Enables using the MLTensor to pass data between models. 


### Motivation and Context
Using MLTensor instead of ArrayBuffers reduces the number of copies
between the CPU and devices as well as the renderer and GPU process in
Chromium.
2024-09-27 17:24:21 -07:00
shiyi
1e3cd86d80
[WebNN EP] Support LSTM op (#20293)
<!-- Describe your changes. -->




<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-09-27 14:23:08 -07:00
Yulong Wang
291a5352b2
[js/web] remove training release (#22103)
### Description

Remove training from onnxruntime-web

Following up of #22082
2024-09-16 10:56:22 -07:00
Bin Miao
4d82404544
[WebNN EP] Support GRU operator (#20405)
This PR support Gru operator for WebNN EP.
@Honry ,  @fdwr thanks!
2024-09-11 14:16:36 -07:00
Jiajia Qin
3580e01348
[js/webgpu] Optimize grouped conv (#21892)
### Description
<!-- Describe your changes. -->
#21618

This PR optimizes grouped conv by 1) more sequential memory access in
gpu 2) reusing input's data to reduce global memory access times.

See `Conv|GroupedConv` op in
[Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) becomes
92 ms from 1058 ms on iGPUs with 32 EU.

For the whole model on my iGPUs with 32 EU,
wav2vec2 model becomes 982ms from 1942 ms.
squeezebert-uncased model becomes 71.86ms from 431.77ms.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-09-04 17:16:35 -07:00
Jiajia Qin
a80bfed5b4
[js/webgpu] Optimize transpose (#21964)
### Description
<!-- Describe your changes. -->
Fix bugs in previous implementation and add more situations to go the
optimized path.

Below situations will go to the optimized path.
1. 2d inputs or squeezed 2d inputs
2. channels last or channels first transpose. For example, channel last
transpose: [1, 256, 512, 512] -> [1, 512, 512, 256]
For this case, the transpose becomes [256, 512x512] -> [512x512, 256]

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
For SD Turbo demo, the total transpose time becomes 39.98ms from
122.09ms. And the correspnding percents becomes 3.89% from 11.05% in
this demo.

This PR will also help #21618, the total transpose time in that demo
becomes 17.32 ms from 70.25 ms on my iGPUs.
2024-09-04 12:04:04 -07:00
xhcao
3bfb5e4f62
[js/webgpu] support float16 for Clip (#21584)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-28 13:19:20 -07:00
Satya Kumar Jandhyala
af18824f43
[JS/WebGPU] Add GatherBlockQuantized op support (#21734)
### Description
Add GatherBlockQuantized operator to JSEP.



### Motivation and Context
Gemma model requires this.
2024-08-26 14:46:04 -07:00
Xu Xing
d9c57ac7db
[js/webgpu] Enable pad f16 uniform (#21691)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-08-26 07:58:48 -07:00
Jiajia Qin
27a6890529
[js/webgpu] Optimize conv1d by conv2d (#19388)
### Description
<!-- Describe your changes. -->

Optimize conv1d to go to the conv2d path to utilize the conv2d's
optimization path.

See whisper-tiny-encoder model becomes 158.66 ms from 532.28 ms. Conv
goes to Conv2DMatMul(8 ms) instead of GroupedConv(382 ms).

Old profiling result:
Kernel | Time (ms) | Percentage (%)
-- | -- | --
Conv\|GroupedConv | 382.99 | 71.95
MatMul | 126.16 | 23.70
Softmax | 7.01 | 1.32
Transpose | 4.59 | 0.86
Add | 4.39 | 0.82
Mul | 2.36 | 0.44
Div | 1.44 | 0.27
ReduceMean\|ReduceMeanShared | 1.25 | 0.23
Erf | 0.85 | 0.16
Sub | 0.72 | 0.14
Pow | 0.46 | 0.09
Sqrt | 0.07 | 0.01
Sum | 532.28 |  

New profiling result with this PR:

Kernel | Time (ms) | Percentage (%)
-- | -- | --
MatMul | 127.07 | 80.09
Conv\|Conv2DMatMul | 8.00 | 5.04
Softmax | 6.95 | 4.38
Transpose | 4.65 | 2.93
Add | 4.26 | 2.68
Mul | 2.56 | 1.61
Div | 1.51 | 0.95
ReduceMean\|ReduceMeanShared | 1.31 | 0.83
Erf | 0.85 | 0.54
Sub | 0.79 | 0.50
Pow | 0.46 | 0.29
Conv\|Transpose | 0.26 | 0.17
Sqrt | 0.00 | 0.00
Sum | 158.66 |  

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-08-22 22:56:07 -07:00
Satya Kumar Jandhyala
1fb2e71ddc
[JS/WebGPU] Avoid producing presentKey/presentValue outputs if pastKey/pastValue … (#21782)
Avoid producing presentKey/presentValue outputs if pastKey/pastValue
don't exists.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-19 18:02:19 -07:00
Yang Gu
49fc168eed
[js/webgpu] Handle negative axis in op Split (#21771)
This is to fix issue #21703, where the axis is a negative value in the
model. According to the spec
(https://onnx.ai/onnx/operators/onnx__Split.html), negative axis means
counting dimensions from the back.
2024-08-17 16:41:23 -07:00
Tianlei Wu
d79e3c5791
Extend Attention Bias Broadcast Support (#21710)
### Description
Previously, MultiHeadAttention supports relative position bias of shape
[1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention
supports [1, N, S, T]. This will extend the support to allow [1, N, S,
T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for CUDA and CPU EPs.

- [x] Rename the input of "relative position bias" to "attention bias"
because it can also be used for other types of bias, like ALiBi
(Attention with Linear Biases) or attention mask.
- [x] Update unfused kernel to support broadcasting 2nd dimension of
attention bias.
- [x] Update efficient attention to support broadcasting 2nd dimension
of attention bias.
- [x] Update operators (MultiHeadAttention,
DecoderMaskedMultiHeadAttention, Attention, PackedAttention,
PackedMultiHeadAttention) to support broadcast attention bias on CUDA
and CPU EPs.
- [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that
those EPs do not support broadcasting attention_bias for now).
- [x] Add attention bias tests for MultiHeadAttention.
- [x] Update operator documents
- [x] Update benchmark script

Other changes:
* Fix some checks in multihead-attention.ts
* Add helper functions to dump tensors given dimensions.
2024-08-16 15:40:04 -07:00
Yulong Wang
ef2ccc477b
[js/web] Add support for int4/uint4 tensor (#21720)
### Description
Add support for int4/uint4 tensor.
2024-08-15 21:32:10 -07:00
Yulong Wang
abdc31de40
[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728)
### Description

See
454996d496
for manual changes (excluded auto-generated formatting changes)

### Why

Because the toolsets for old clang-format is out-of-date. This reduces
the development efficiency.

- The NPM package `clang-format` is already in maintenance mode. not
updated since 2 years ago.
- The VSCode extension for clang-format is not maintained for a while,
and a recent Node.js security update made it not working at all in
Windows.

No one in community seems interested in fixing those.

Choose Prettier as it is the most popular TS/JS formatter.

### How to merge

It's easy to break the build:
- Be careful of any new commits on main not included in this PR.
- Be careful that after this PR is merged, other PRs that already passed
CI can merge.

So, make sure there is no new commits before merging this one, and
invalidate js PRs that already passed CI, force them to merge to latest.
2024-08-14 16:51:22 -07:00
Xu Xing
7172aff1cf
[js/webgpu] Fix max pool shape end with 0 (#21698)
Bug: https://github.com/microsoft/onnxruntime/issues/21386

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-13 20:59:24 -07:00
Satya Kumar Jandhyala
51b2044120
[JS/WebGPU] Add Dequantizelinear operator (#21642)
### Description
Added DequantizeLinear operator for JSEP.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-09 14:44:19 -07:00