Commit graph

741 commits

Author SHA1 Message Date
Jiajia Qin
8fbbf2fd4f
[js/webgpu] Optimize MatMul with M = 1 (#22577)
### Description
BUG #22031

In the demucs model, there are lots of MatMul ops with shapes like
below:
`input[0]: [3448,1,512] | float32, input[1]: [512,1536] | float32,
output[0]: [3448,1,1536] | float32`

For shapes like this, the batch size is large but M = 1. Our current
algorithm partitions tiles based on [M, N], which is inefficient for
such shapes. This PR reshapes the inputs to improve the matmul
performance.
Before: [3448, 1, 512] x [512, 1536] = [3448, 1, 1536]
After: [1, 3448, 512] x [512, 1536] = [1, 3448, 1536]; the output can
then be reshaped to [3448, 1, 1536]

The overall MatMul time in the demucs model drops from 4418.17 ms to
1778.45 ms on my iGPUs.
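
The shape bookkeeping behind this reshape can be sketched as follows. This is only an illustration, not the code in this PR: the helper name `foldBatchIntoM` and the `FoldedShapes` type are hypothetical, and the sketch assumes the second input is a plain 2D `[K, N]` matrix. Because M = 1, `[batch, 1, K]` and `[1, batch, K]` describe the same memory layout, so no data has to move.

```ts
// Hypothetical helper (not from the PR): computes the shapes used when the
// batch dims are folded into M for a MatMul whose left input has M = 1.
interface FoldedShapes {
  a: number[]; // reshaped left input: [1, batch, K]
  output: number[]; // shape produced by the matmul: [1, batch, N]
  restored: number[]; // final output shape: [...batchDims, 1, N]
}

function foldBatchIntoM(aDims: readonly number[], bDims: readonly number[]): FoldedShapes | undefined {
  // Assumption: only handle a batched left input and a 2D right input.
  if (aDims.length < 3 || bDims.length !== 2) return undefined;
  const m = aDims[aDims.length - 2];
  const k = aDims[aDims.length - 1];
  if (m !== 1 || k !== bDims[0]) return undefined;
  const batchDims = aDims.slice(0, -2);
  const batch = batchDims.reduce((x, y) => x * y, 1);
  const n = bDims[1];
  return { a: [1, batch, k], output: [1, batch, n], restored: [...batchDims, 1, n] };
}

// The demucs shape from the description above.
const folded = foldBatchIntoM([3448, 1, 512], [512, 1536]);
```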

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-11-01 08:04:42 -07:00
Wanming Lin
eb66bfa7b4
[WebNN] Convert MLOperand methods into readonly attributes (#22653)
Adapt to spec change at
https://github.com/webmachinelearning/webnn/pull/774
2024-10-30 17:54:49 -07:00
Wanming Lin
fc375a6f58
[WebNN] Support And, Or and Xor ops (#22598)
Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-30 17:52:10 -07:00
Enrico Galli
df236c7894
[WebNN EP] Add cache for MLContexts in the WebNNBackend (#22510)
### Description
This change adds a cache of `MLContext`s, keyed by their options, to the
`WebNNBackend`. As a result, multiple `InferenceSession`s created with
the same options share the same context.
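
A cache keyed by options can be sketched roughly as below. This is not the `WebNNBackend` implementation: `ContextCache`, `ContextOptions`, and the `create` callback are placeholders, and the sketch assumes the options are a flat object of primitive values so that a sorted, serialized form can serve as the key.

```ts
// Placeholder for the context options; assumed to be flat and serializable.
type ContextOptions = Record<string, string | number | boolean>;

// Hypothetical cache: `Ctx` stands in for MLContext, `create` for whatever
// actually creates one.
class ContextCache<Ctx> {
  private readonly contexts = new Map<string, Promise<Ctx>>();
  private readonly create: (options: ContextOptions) => Promise<Ctx>;

  constructor(create: (options: ContextOptions) => Promise<Ctx>) {
    this.create = create;
  }

  get(options: ContextOptions = {}): Promise<Ctx> {
    // Sort the entries so that key order does not affect the cache key.
    const entries = Object.entries(options).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const key = JSON.stringify(entries);
    let context = this.contexts.get(key);
    if (context === undefined) {
      context = this.create(options);
      this.contexts.set(key, context);
    }
    return context;
  }
}

// Two lookups with the same options (in a different key order) share one context.
let created = 0;
const cache = new ContextCache(async () => ({ id: ++created }));
const first = cache.get({ deviceType: 'gpu', powerPreference: 'default' });
const second = cache.get({ powerPreference: 'default', deviceType: 'gpu' });
```

Storing the promise rather than the resolved context means two sessions created concurrently also end up sharing a single context.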

### Motivation and Context
Since `MLTensor`s are tied to `MLContext`s, developers can't easily share
tensors between `InferenceSession`s (other than by manually creating an
`MLContext` and specifying the `context` option). This leads to strange
behaviors such as:
```js
const sessionA = await ort.InferenceSession.create(urlA, {
  executionProviders: ["webnn"],
  preferredOutputLocation: "ml-buffer",
});
const sessionB = await ort.InferenceSession.create(urlB, {
  executionProviders: ["webnn"],
});
const temp = await sessionA.run({/* arguments */});
const result = await sessionB.run({"input":temp["output"]}); // ERROR: Failed to execute 'dispatch' on 'MLContext': Invalid inputs: The context of MLGraph doesn't match the context of the MLTensor with name "input".
```
We encountered this behavior when updating the transformers.js version
in the developer preview demos (microsoft/webnn-developer-preview#46).
2024-10-30 10:26:33 -07:00
shiyi
46ff240821
[WebNN] Add ScatterElements and GatherElements (#22534)
2024-10-30 10:20:21 -07:00
Prathik Rao
5cc7fb4a74
[JSEP] Upgrade to ONNX Opset 21 (#22595)
### JSEP Ops that need updating

- [x] Cast
- [x] ReduceMax
- [x] ReduceMin
- [x] Squeeze
- [x] Unsqueeze
- [x] Transpose
- [x] AveragePool
- [x] Flatten
- [x] Pad
- [x] If
2024-10-29 17:44:38 -07:00
Jiajia Qin
04e696d8e0
[js/webgpu] Optimize InstanceNorm in some shapes (#22637)
BUG #22031

Optimize the following two situations:
1. Increase workgroupSize if only one workgroup is dispatched.
2. Avoid the transpose when it is not necessary.

With this PR and PR #22577, the overall time of the demucs model drops
from 154.60 ms to 106.36 ms on my dGPUs.
2024-10-29 17:10:14 -07:00
Yulong Wang
dbe8c83893
[js/web] remove "node": null in export table (#22618)
### Description

This change resolves issue No.3 described in #22615
2024-10-29 04:01:26 -07:00
shiyi
dcf91266bd
[WebNN EP] Support GatherND and ScatterND op (#22181)
2024-10-28 15:04:45 -07:00
Satya Kumar Jandhyala
05fbb43b34
[JSEP/WebGPU] Fix data causing output mismatch resulting in CI build failures occasionally (#22596)
### Description
The test case fails intermittently: it sometimes fails and sometimes passes.


### Motivation and Context
Prevent unnecessary CI build failures that require manually rerunning tests.
2024-10-26 01:37:12 -07:00
Wanming Lin
008c9090b4
[WebNN] Support int4 and uint4 data types (#22575)
2024-10-25 17:44:46 -07:00
Satya Kumar Jandhyala
4ed5bec2e7
[JS/WebGPU] Support WASM64 (#21836)
### Description
Support wasm64



### Motivation and Context
Overcome memory limitations

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-10-24 20:21:51 -07:00
jzm-intel
374022e988
JSEP: Use global-agent in scripts to enable using network proxy (#22537)
This PR adds a dependency on the global-agent package and uses it in the
JSEP scripts that download files from the network (i.e.
`js/scripts/utils.ts` and `js/web/script/pull-prebuilt-wasm-artifacts.ts`),
so that users can make these scripts use a network proxy by setting the
environment variable `GLOBAL_AGENT_HTTPS_PROXY`.
2024-10-24 16:27:11 -07:00
Yulong Wang
ef7f1ce08b
Update Node.js version from 18.x to 20.x in CI pipelines (#22576)
2024-10-24 07:34:42 -07:00
Prathik Rao
742594c8f0
Clears GPU Cache when there are no more active sessions (#22490)
Fixes https://github.com/microsoft/onnxruntime/issues/21574
2024-10-23 22:22:57 -07:00
Satya Kumar Jandhyala
fd8ee4894d
[JS/WebGPU] GroupQueryAttention rewrite (#20946)
### Description
Implement JSEP GroupQueryAttention



### Motivation and Context
Required to enable certain LLM models to run using WebGPU.
2024-10-23 10:14:09 -07:00
Wanming Lin
33e2f6ad8d
[WebNN EP] Support external data (#22263)
### Description
This PR introduces support for registering external data inside WebNN
EP.

### Motivation and Context

- The WebNN EP needs to register initializers at the graph compilation
stage. For initializers that come from external data, it can't use the
general external data loader framework, because the WebNN EP compiles
the graph before the external data loader is called.
- Exposes `utils::GetExternalDataInfo`, which is useful for the WebNN EP
to read the external tensor's information.
- Defines a new `registerMLConstant` in JSEP that creates WebNN constants
from external data in the WebNN backend. It takes the tensor's info as
parameters, along with `Module.MountedFiles`, which holds all preloaded
external files.
2024-10-23 08:18:16 -07:00
Wanming Lin
ba40022ec4
[WebNN EP] Support axes and fix some validation for Resize (#21952)
- Supports arbitrary axes for Resize opset 18+
- Check all inputs and attributes more carefully

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-22 20:26:34 -07:00
Wanming Lin
e6e94e6252
[WebNN EP] Use boolean flags instead of MLTensorUsage (#22497)
Fixed #22495

We will keep MLTensorUsage until it is removed from Chromium.

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-22 17:20:36 -07:00
Enrico Galli
1e5bda88f0
[WebNN EP] Cache MLTensors between runs (#22278)
### Description
This change enables caching `MLTensor`s between inference runs. This is
done by keeping a reference to `MLTensor`s alive after they have been
released. `MLTensor`s are only destroyed once the session goes out of
scope.
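
The idea can be sketched as a small pool, shown below. This is not the WebNN EP code: `PooledTensor` and `TensorPool` are hypothetical stand-ins, and matching released tensors by data type and shape is an assumption about how reuse could work.

```ts
// Stand-in for an MLTensor; only the fields needed for the sketch.
interface PooledTensor {
  dataType: string;
  shape: readonly number[];
  destroyed: boolean;
}

// Hypothetical pool: released tensors stay alive and are handed out again
// when a later run asks for the same data type and shape.
class TensorPool {
  private free: PooledTensor[] = [];
  private all: PooledTensor[] = [];

  acquire(dataType: string, shape: readonly number[]): PooledTensor {
    const index = this.free.findIndex(
      (t) => t.dataType === dataType && t.shape.length === shape.length && t.shape.every((d, i) => d === shape[i]),
    );
    if (index >= 0) {
      return this.free.splice(index, 1)[0]; // reuse instead of creating a new tensor
    }
    const tensor: PooledTensor = { dataType, shape: [...shape], destroyed: false };
    this.all.push(tensor);
    return tensor;
  }

  // Releasing only returns the tensor to the pool; nothing is destroyed yet.
  release(tensor: PooledTensor): void {
    this.free.push(tensor);
  }

  // Called when the owning session goes away: destroy everything.
  dispose(): void {
    for (const tensor of this.all) tensor.destroyed = true;
    this.free = [];
    this.all = [];
  }
}

const pool = new TensorPool();
const runOne = pool.acquire('float32', [1, 3, 224, 224]);
pool.release(runOne);
const runTwo = pool.acquire('float32', [1, 3, 224, 224]);
```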

### Motivation and Context
Creating and destroying `MLTensor`s on every run has a non-trivial
performance penalty. This penalty materializes when using
`ort.Tensors`[location=cpu] for inputs/outputs or when using the CPU EP
as a fallback EP for unsupported operators. The former can be mitigated
by developers using `ort.Tensors`[location=ml-tensor]. The latter cannot
be mitigated by developers.
2024-10-18 08:07:00 -07:00
Akshay Sonawane
e5c2e50849
bumps up version in main from 1.20 -> 1.21 (#22482)
Bump up version in main from 1.20.0 to 1.21.0 since the release branch
has been cut.
2024-10-17 12:32:35 -07:00
Wanming Lin
52b77762bd
[WebNN EP] Remove the numThreads option (#22464)
Chromium has removed this option via
https://chromium-review.googlesource.com/c/chromium/src/+/5905656.
2024-10-17 07:45:39 -07:00
wejoncy
20a45dd67b
[CoreML ML Program] support acclerators selector (#22383)
### Description
For now, CoreML only supports running mlmodels on CPU/ALL. However,
CPU_GPU is sometimes much faster.

This PR adds an option to select different hardware to boost
performance.




---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-10-15 11:50:11 +08:00
Jiajia Qin
8159723ba7
[js/webgpu] Optimize matmulnbits (#22360)
### Description
This PR further optimizes matmulnbits, especially for iGPUs. The phi3
demo improves from ~8 tokens/second to ~12 tokens/second on iGPUs.

Some todos:
1. Make the optimization more general: remove the blockSize = 32
limitation.
2. Tune parameters such as workgroupSize and components size (currently
only components = 1 is supported) to see how performance changes.
2024-10-14 15:49:29 -07:00
dependabot[bot]
2bc3754494
Bump cookie and socket.io in /js/web (#22408)
Bumps [cookie](https://github.com/jshttp/cookie) and
[socket.io](https://github.com/socketio/socket.io). These dependencies
needed to be updated together.
Updates `cookie` from 0.4.2 to 0.7.2
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jshttp/cookie/releases">cookie's
releases</a>.</em></p>
<blockquote>
<h2>v0.7.2</h2>
<p><strong>Fixed</strong></p>
<ul>
<li>Fix object assignment of <code>hasOwnProperty</code> (<a
href="https://redirect.github.com/jshttp/cookie/issues/177">#177</a>)
bc38ffd</li>
</ul>
<p><a
href="https://github.com/jshttp/cookie/compare/v0.7.1...v0.7.2">https://github.com/jshttp/cookie/compare/v0.7.1...v0.7.2</a></p>
<h2>0.7.1</h2>
<p><strong>Fixed</strong></p>
<ul>
<li>Allow leading dot for domain (<a
href="https://redirect.github.com/jshttp/cookie/issues/174">#174</a>)
<ul>
<li>Although not permitted in the spec, some users expect this to work
and user agents ignore the leading dot according to spec</li>
</ul>
</li>
<li>Add fast path for <code>serialize</code> without options, use
<code>obj.hasOwnProperty</code> when parsing (<a
href="https://redirect.github.com/jshttp/cookie/issues/172">#172</a>)</li>
</ul>
<p><a
href="https://github.com/jshttp/cookie/compare/v0.7.0...v0.7.1">https://github.com/jshttp/cookie/compare/v0.7.0...v0.7.1</a></p>
<h2>0.7.0</h2>
<ul>
<li>perf: parse cookies ~10% faster (<a
href="https://redirect.github.com/jshttp/cookie/issues/144">#144</a> by
<a href="https://github.com/kurtextrem"><code>@​kurtextrem</code></a>
and <a
href="https://redirect.github.com/jshttp/cookie/issues/170">#170</a>)</li>
<li>fix: narrow the validation of cookies to match RFC6265 (<a
href="https://redirect.github.com/jshttp/cookie/issues/167">#167</a> by
<a href="https://github.com/bewinsnw"><code>@​bewinsnw</code></a>)</li>
<li>fix: add <code>main</code> to <code>package.json</code> for rspack
(<a href="https://redirect.github.com/jshttp/cookie/issues/166">#166</a>
by <a
href="https://github.com/proudparrot2"><code>@​proudparrot2</code></a>)</li>
</ul>
<p><a
href="https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.0">https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.0</a></p>
<h2>0.6.0</h2>
<ul>
<li>Add <code>partitioned</code> option</li>
</ul>
<h2>0.5.0</h2>
<ul>
<li>Add <code>priority</code> option</li>
<li>Fix <code>expires</code> option to reject invalid dates</li>
<li>pref: improve default decode speed</li>
<li>pref: remove slow string split in parse</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d19eaa1a2b"><code>d19eaa1</code></a>
0.7.2</li>
<li><a
href="bc38ffd0ea"><code>bc38ffd</code></a>
Fix object assignment of <code>hasOwnProperty</code> (<a
href="https://redirect.github.com/jshttp/cookie/issues/177">#177</a>)</li>
<li><a
href="cf4658f492"><code>cf4658f</code></a>
0.7.1</li>
<li><a
href="6a8b8f5a49"><code>6a8b8f5</code></a>
Allow leading dot for domain (<a
href="https://redirect.github.com/jshttp/cookie/issues/174">#174</a>)</li>
<li><a
href="58015c0b93"><code>58015c0</code></a>
Remove more code and perf wins (<a
href="https://redirect.github.com/jshttp/cookie/issues/172">#172</a>)</li>
<li><a
href="ab057d6c06"><code>ab057d6</code></a>
0.7.0</li>
<li><a
href="5f02ca8768"><code>5f02ca8</code></a>
Migrate history to GitHub releases</li>
<li><a
href="a5d591ce84"><code>a5d591c</code></a>
Migrate history to GitHub releases</li>
<li><a
href="51968f94b5"><code>51968f9</code></a>
Skip isNaN</li>
<li><a
href="9e7ca51ade"><code>9e7ca51</code></a>
perf(parse): cache length, return early (<a
href="https://redirect.github.com/jshttp/cookie/issues/144">#144</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/jshttp/cookie/compare/v0.4.2...v0.7.2">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~blakeembrey">blakeembrey</a>, a new
releaser for cookie since your current version.</p>
</details>
<br />

Updates `socket.io` from 4.7.5 to 4.8.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/socket.io/releases">socket.io's
releases</a>.</em></p>
<blockquote>
<h2>socket.io-client@4.8.0</h2>
<h3>Features</h3>
<h4>Custom transport implementations</h4>
<p>The <code>transports</code> option now accepts an array of transport
implementations:</p>
<pre lang="js"><code>import { io } from &quot;socket.io-client&quot;;
import { XHR, WebSocket } from &quot;engine.io-client&quot;;

const socket = io({
  transports: [XHR, WebSocket]
});
</code></pre>
<p>Here is the list of provided implementations:</p>
<table>
<thead>
<tr>
<th>Transport</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Fetch</code></td>
<td>HTTP long-polling based on the built-in <code>fetch()</code>
method.</td>
</tr>
<tr>
<td><code>NodeXHR</code></td>
<td>HTTP long-polling based on the <code>XMLHttpRequest</code> object
provided by the <code>xmlhttprequest-ssl</code> package.</td>
</tr>
<tr>
<td><code>XHR</code></td>
<td>HTTP long-polling based on the built-in <code>XMLHttpRequest</code>
object.</td>
</tr>
<tr>
<td><code>NodeWebSocket</code></td>
<td>WebSocket transport based on the <code>WebSocket</code> object
provided by the <code>ws</code> package.</td>
</tr>
<tr>
<td><code>WebSocket</code></td>
<td>WebSocket transport based on the built-in <code>WebSocket</code>
object.</td>
</tr>
<tr>
<td><code>WebTransport</code></td>
<td>WebTransport transport based on the built-in
<code>WebTransport</code> object.</td>
</tr>
</tbody>
</table>
<p>Usage:</p>
<table>
<thead>
<tr>
<th>Transport</th>
<th>browser</th>
<th>Node.js</th>
<th>Deno</th>
<th>Bun</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Fetch</code></td>
<td></td>
<td> (1)</td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>NodeXHR</code></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>XHR</code></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>NodeWebSocket</code></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>WebSocket</code></td>
<td></td>
<td> (2)</td>
<td></td>
<td></td>
</tr>
<tr>
<td><code>WebTransport</code></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>(1) since <a
href="https://nodejs.org/api/globals.html#fetch">v18.0.0</a>
(2) since <a
href="https://nodejs.org/api/globals.html#websocket">v21.0.0</a></p>
<p>Added in <a
href="f4d898ee96">f4d898e</a>
and <a
href="b11763beec">b11763b</a>.</p>
<h4>Test each low-level transports</h4>
<p>When setting the <code>tryAllTransports</code> option to
<code>true</code>, if the first transport (usually, HTTP long-polling)
fails, then the other transports will be tested too:</p>
<pre lang="js"><code>import { io } from &quot;socket.io-client&quot;;
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d0fc720420"><code>d0fc720</code></a>
chore(release): socket.io@4.8.0</li>
<li><a
href="4a0555c671"><code>4a0555c</code></a>
chore(release): socket.io-client@4.8.0</li>
<li><a
href="2b60df18a8"><code>2b60df1</code></a>
chore(release): engine.io@6.6.1</li>
<li><a
href="d4cb375856"><code>d4cb375</code></a>
ci: ignore tests when publishing to npm</li>
<li><a
href="c251ae7ba7"><code>c251ae7</code></a>
chore(release): engine.io-client@6.6.1</li>
<li><a
href="8a2f5a3da0"><code>8a2f5a3</code></a>
fix(eio-client): move 'offline' event listener at the top</li>
<li><a
href="b04fa64365"><code>b04fa64</code></a>
fix(sio): allow to join a room in a middleware (uws)</li>
<li><a
href="7085f0e3e4"><code>7085f0e</code></a>
refactor(sio-client): mangle private attributes</li>
<li><a
href="4f66708210"><code>4f66708</code></a>
chore(sio-client): use babel loose mode when transpiling classes</li>
<li><a
href="1a95db2145"><code>1a95db2</code></a>
chore(sio-client): add a script to compute the bundle size</li>
<li>Additional commits viewable in <a
href="https://github.com/socketio/socket.io/compare/socket.io@4.7.5...socket.io@4.8.0">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-14 15:47:01 -07:00
Jiajia Qin
0409c639f7
[js/webgpu] Optimize MultiHeadAttention|Transpose (#22420)
### Description
With this optimization, 96 MultiHeadAttention|Transpose ops in phi3
disappear. Phi3 improves from 107 tokens to 113 tokens on my dGPUs.

The optimization skips the transpose op when one of the transposed dims
is 1, since a reshape is enough in that case.
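
A check for "this transpose is only a reshape" can be sketched as below. This is an illustration rather than the code in this PR: the function name `isTransposeReshape` is hypothetical, and the condition used here (all axes with size greater than 1 keep their relative order) is a general form of the rule, stated as an assumption.

```ts
// Hypothetical check: a transpose moves no data if, ignoring axes of size 1,
// the remaining axes appear in their original order in `perm`.
function isTransposeReshape(dims: readonly number[], perm: readonly number[]): boolean {
  let lastAxis = -1;
  for (const axis of perm) {
    if (dims[axis] === 1) continue; // size-1 axes can move freely
    if (axis < lastAxis) return false; // two non-trivial axes swapped order
    lastAxis = axis;
  }
  return true;
}

// Swapping a size-1 axis with its neighbour: reshape is enough.
const skippable = isTransposeReshape([1, 1, 32, 96], [0, 2, 1, 3]);
// Swapping two non-trivial axes: a real transpose is required.
const required = !isTransposeReshape([1, 8, 32, 96], [0, 2, 1, 3]);
```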
2024-10-14 15:43:14 -07:00
mingmingtasd
004bd36f3d
[WebNN EP] Support Tile operator (#22148)
PTAL, thanks! @Honry , @fdwr thanks!
2024-10-05 00:56:55 -07:00
Wanming Lin
39c8b3759f
[JS/WebGPU] Fixed bugs in inputs validation of Resize (#21955)
- 'scales' and 'sizes' may be empty tensors; make sure each is a
non-empty 1D tensor
- If 'scales' or 'sizes' is present, make sure its length is non-zero
2024-10-04 18:29:53 -07:00
Yang Gu
9e5153b688
[js/webgpu] Manage model download with a specific unittest option (#22214)
Currently, in debug mode, the unit tests always download models to the
local file system, which is a bit annoying. This PR fixes that by adding
a specific option to enable model download.
2024-09-30 18:27:43 -07:00
Yang Gu
c75f4a09b7
[js/webgpu] Remove the limitation on axis in softmax (#22231)
In the current implementation, the softmax axis has to be the last one,
which is an obvious limitation. This PR removes this limitation and
fixes issues #20710 and #22176.
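
For reference, softmax along an arbitrary axis can be written on the CPU as below. This is not the WebGPU shader from this PR, only a sketch of the expected result that can be used to check outputs; the function name `softmaxAlongAxis` is hypothetical, and the input is assumed to be a flat row-major array.

```ts
// CPU reference (hypothetical helper): softmax over `axis` of a row-major tensor.
function softmaxAlongAxis(data: readonly number[], dims: readonly number[], axis: number): number[] {
  const a = axis < 0 ? axis + dims.length : axis;
  const outer = dims.slice(0, a).reduce((x, y) => x * y, 1);
  const n = dims[a];
  const inner = dims.slice(a + 1).reduce((x, y) => x * y, 1);
  const out = new Array<number>(data.length);
  for (let o = 0; o < outer; o++) {
    for (let i = 0; i < inner; i++) {
      const base = o * n * inner + i;
      // Subtract the maximum first for numerical stability.
      let max = -Infinity;
      for (let k = 0; k < n; k++) max = Math.max(max, data[base + k * inner]);
      let sum = 0;
      for (let k = 0; k < n; k++) {
        const e = Math.exp(data[base + k * inner] - max);
        out[base + k * inner] = e;
        sum += e;
      }
      for (let k = 0; k < n; k++) out[base + k * inner] /= sum;
    }
  }
  return out;
}

// Softmax over axis 0 of a 2x2 tensor: each column sums to 1.
const columnSoftmax = softmaxAlongAxis([1, 2, 3, 4], [2, 2], 0);
```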
2024-09-30 18:27:11 -07:00
Yulong Wang
1bda91fc57
[js/webgpu] fix external buffer registration (#22254)
### Description

Fixes a failure that occurs when GPU inputs are shuffled between
iterations.
2024-09-28 10:36:40 -07:00
Enrico Galli
52a8c1cae8
[WebNN EP] Enable IO Bindings with MLTensor (#21301)
### Description
Enables using the MLTensor to pass data between models. 


### Motivation and Context
Using MLTensor instead of ArrayBuffers reduces the number of copies
between the CPU and devices as well as the renderer and GPU process in
Chromium.
2024-09-27 17:24:21 -07:00
shiyi
1e3cd86d80
[WebNN EP] Support LSTM op (#20293)
2024-09-27 14:23:08 -07:00
Scott McKay
3846f84218
Increase React Native E2E (#22230)
### Description
Increase the detox setup timeout to 4 minutes.

The iOS RN E2E tests take around 2 minutes to set up, which causes
flakiness.

### Motivation and Context
Improve RN CI pass rate
2024-09-27 08:59:36 +10:00
Claude
3494f80e83
Check if HTMLCanvasElement exists (i.e. we are not running in a webworker) (#22153)
This fixes #22152


### Description
Tensor.fromImage fails in a webworker context, because HTMLCanvasElement
does not exist:

> HTMLCanvasElement is not defined



### Motivation and Context
This fixes #22152

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-09-25 11:52:52 -07:00
Yulong Wang
df25006d1b
upgrade micromatch to v4.0.8 (#22174)
### Description

Upgrade `micromatch` to v4.0.8

https://github.com/advisories/GHSA-952p-6rrq-rcjv
2024-09-23 14:39:32 -07:00
Jiajia Qin
80e9df826e
[js/webgpu] Optimize InstanceNormalization (#21995)
### Description
InstanceNormalization computes `y = scale * (x - mean) /
sqrt(variance + epsilon) + B`, where mean and variance are computed per
instance per channel. Calculating the mean and variance per channel is a
reduce operation, which is NCHW layout friendly because adjacent threads
can access contiguous data in GPU memory.

This PR optimizes both NHWC and NCHW InstanceNormalization. To calculate
the mean and variance efficiently, we make sure the input is NCHW
instead of NHWC, then use shared memory to do the reduce operation to
get `channel_scale` and `channel_shift`.

With this PR, `channel_scale` and `channel_shift` are computed the same
way for NHWC and NCHW InstanceNormalization, and the overall performance
of the two is now very close.
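
The role of `channel_scale` and `channel_shift` can be illustrated with a CPU reference. This is a sketch, not the shader code: it assumes `channel_scale = scale / sqrt(variance + epsilon)` and `channel_shift = B - mean * channel_scale`, which follows from rearranging the formula above so that `y = x * channel_scale + channel_shift`. The function name `instanceNormNCHW` is hypothetical.

```ts
// CPU reference (hypothetical helper) for InstanceNormalization on NCHW data.
function instanceNormNCHW(
  x: readonly number[],
  dims: readonly [number, number, number, number],
  scale: readonly number[],
  bias: readonly number[],
  epsilon = 1e-5,
): number[] {
  const [n, c, h, w] = dims;
  const hw = h * w;
  const y = new Array<number>(x.length);
  for (let i = 0; i < n * c; i++) {
    const channel = i % c;
    const offset = i * hw; // each (instance, channel) slice is contiguous in NCHW
    let mean = 0;
    for (let j = 0; j < hw; j++) mean += x[offset + j];
    mean /= hw;
    let variance = 0;
    for (let j = 0; j < hw; j++) variance += (x[offset + j] - mean) ** 2;
    variance /= hw;
    // Assumed definitions, derived from the formula in the description.
    const channelScale = scale[channel] / Math.sqrt(variance + epsilon);
    const channelShift = bias[channel] - mean * channelScale;
    for (let j = 0; j < hw; j++) y[offset + j] = x[offset + j] * channelScale + channelShift;
  }
  return y;
}

const normalized = instanceNormNCHW([1, 2, 3, 4], [1, 1, 2, 2], [2], [1]);
```

Once the two per-channel values are known, the normalization itself is a single multiply-add per element, which is why the reduce step dominates the cost.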

Below data comes from SD Turbo profiling results.
Before (InstanceNormalization overall time: 140.84 ms)

| Kernel | Time (ms) |
| -- | -- |
| InstanceNormalization\|InstanceNormComputeMean | 129.70 |
| InstanceNormalization\|InstanceNormalizationNHWC | 10.55 |
| InstanceNormalization\|InstanceNormComputeChannelScaleShift | 0.59 |

After (InstanceNormalization overall time: 59.44 ms)

| Kernel | Time (ms) |
| -- | -- |
| InstanceNormalization\|InstanceNormComputeChannelScaleShift | 28.57 |
| InstanceNormalization\|TransposeShared | 20.19 |
| InstanceNormalization\|InstanceNormalizationNHWC | 10.68 |
2024-09-23 11:32:09 -07:00
Jian Chen
fa68ae2def
Update pool to MacOS-13 (#17361)
### Description
See https://github.com/microsoft/onnxruntime-extensions/pull/476
and https://github.com/actions/runner-images/issues/7671


### Current issue
- [ ] For the default Xcode 15.2 that comes with MacOS-13, we need to
update the boost container header boost/container_hash/hash.hpp version
to pass the build
- [x] For Xcode 14.2, the build passed but the `Run React Native Detox
Android e2e Test` failed.
Possible flaky test, https://github.com/microsoft/onnxruntime/pull/21969
- [x] For Xcode 14.3.1, we encountered the following issue in `Build React
Native Detox iOS e2e Tests`
```
ld: file not found: /Applications/Xcode_14.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/arc/libarclite_iphonesimulator.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
Appending the following code to the end of both ios/Podfile files fixed
the issue:
```
post_install do |installer|
    installer.generated_projects.each do |project|
        project.targets.each do |target|
            target.build_configurations.each do |config|
                config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '13.0'
            end
        end
    end
end
```


- [x] https://github.com/facebook/react-native/issues/32483

Applying changes to ios/Podfile
```
pre_install do |installer|
  # Custom pre-install script or commands
  puts "Running pre-install script..."

  # Recommended fix for https://github.com/facebook/react-native/issues/32483
  # from https://github.com/facebook/react-native/issues/32483#issuecomment-966784501
  system("sed -i '' 's/typedef uint8_t clockid_t;//' \"${SRCROOT}/Pods/RCT-Folly/folly/portability/Time.h\"")
end
```

- [ ] Detox environment setup exceeded the timeout of 120000 ms during
the iOS e2e test


### dependent 

- [x] https://github.com/microsoft/onnxruntime/pull/21159

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2024-09-17 10:07:30 -07:00
Wanming Lin
9786909ab5
[WebNN EP] Support QuantizeLinear and DequantizeLinear ops (#22097)
2024-09-17 08:18:47 -07:00
Xu Xing
afd642a194
[js/webgpu] Replace array with string in transpose perm (#21930)
Perf test data (100000 iterations):
Array: 12.599999997764826 ms
String: 1.6000000014901161 ms

Perf test case:

```
// Builds the function body by collecting the pieces in an array and joining them.
const permFunctionBodyArray = (rank: number, input: string): string => {
  const reverseFunc: string[] = [];
  reverseFunc.push(`fn perm(i: int) -> int {
    var a: int;`);
  for (let i = 0; i < rank; ++i) {
    reverseFunc.push(input);
  }
  reverseFunc.push('return a;}');
  return reverseFunc.join('\n');
};

// Builds the same function body by appending to a single string.
const permFunctionBodyString = (rank: number, input: string): string => {
  let reverseFunc = `fn perm(i: int) -> int {
    var a: int;`;
  for (let i = 0; i < rank; ++i) {
    reverseFunc += input;
  }
  reverseFunc += 'return a;}';
  return reverseFunc;
};
const count = 100000;
let start, end
console.time('array');
start = performance.now();
for(let i =0 ; i < count; i ++) {
    permFunctionBodyArray(3, 'input');
}
end = performance.now();
console.timeEnd('array');
console.log("Array: "+ (end-start));

console.time('string');
start = performance.now();
for(let i =0 ; i < count; i ++) {
    permFunctionBodyString(3, 'input');
}
end = performance.now();
console.log("String: " +(end-start));
console.timeEnd('string');
```

2024-09-16 23:17:46 -07:00
Yang Gu
2db6b734f5
[js/webgpu] Fix issue to run model demucs (#22074)
This fixes issue #22031 so that the demucs model can run.
For conv-transpose, outputPadding.length can be 1 while spatialRank
is 2. The fix is to append enough 0s to outputPadding. Conv has a
similar issue: kernelShape.length can sometimes be 1 while
inputs[1].dims.length is 4. The fix is likewise to append enough 0s to
kernelShape.
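
The padding step can be sketched as below. The helper name `padWithZeros` is hypothetical and the snippet only mirrors the description above, not the exact code in the fix.

```ts
// Hypothetical helper: append 0s until the attribute has the expected length.
function padWithZeros(values: readonly number[], targetLength: number): number[] {
  const padded = [...values];
  while (padded.length < targetLength) {
    padded.push(0);
  }
  return padded;
}

// outputPadding has length 1 while spatialRank is 2.
const spatialRank = 2;
const outputPadding = padWithZeros([1], spatialRank);
```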
2024-09-16 23:17:10 -07:00
Yulong Wang
291a5352b2
[js/web] remove training release (#22103)
### Description

Remove training from onnxruntime-web

Follow-up to #22082
2024-09-16 10:56:22 -07:00
Prathik Rao
d495e6cf1c
adds support for Uint8ClampedArray (#21985)
Fixes https://github.com/microsoft/onnxruntime/issues/21753
2024-09-11 22:02:30 -07:00
Bin Miao
4d82404544
[WebNN EP] Support GRU operator (#20405)
This PR adds support for the GRU operator to the WebNN EP.
@Honry ,  @fdwr thanks!
2024-09-11 14:16:36 -07:00
dependabot[bot]
19954decaf
Bump body-parser from 1.20.2 to 1.20.3 in /js/web (#22044)
2024-09-10 23:05:44 +00:00
Jiajia Qin
3580e01348
[js/webgpu] Optimize grouped conv (#21892)
### Description
#21618

This PR optimizes grouped conv by 1) making memory access on the GPU
more sequential and 2) reusing the input's data to reduce the number of
global memory accesses.

The `Conv|GroupedConv` op in
[Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) drops
from 1058 ms to 92 ms on iGPUs with 32 EU.

For the whole model on my iGPUs with 32 EU:
- the wav2vec2 model drops from 1942 ms to 982 ms.
- the squeezebert-uncased model drops from 431.77 ms to 71.86 ms.


2024-09-04 17:16:35 -07:00
Jiajia Qin
a80bfed5b4
[js/webgpu] Optimize transpose (#21964)
### Description
Fix bugs in the previous implementation and route more situations to the
optimized path.

The following situations take the optimized path:
1. 2D inputs or squeezed 2D inputs
2. Channels-last or channels-first transpose. For example, the
channels-last transpose [1, 256, 512, 512] -> [1, 512, 512, 256]
becomes [256, 512x512] -> [512x512, 256].
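
How such a transpose collapses to a 2D one can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the helper name `collapseTransposeTo2D` is hypothetical, and it assumes the optimized case is the one where, after dropping size-1 axes, the permutation is a rotation of the axes.

```ts
// Hypothetical helper: returns the [rows, cols] view of the input if the
// transpose is equivalent to a 2D transpose, otherwise undefined.
function collapseTransposeTo2D(dims: readonly number[], perm: readonly number[]): [number, number] | undefined {
  // Axes of size 1 do not affect memory order, so drop them first.
  const keptAxes = dims.map((_, axis) => axis).filter((axis) => dims[axis] !== 1);
  const newIndex = new Map(keptAxes.map((axis, i) => [axis, i] as [number, number]));
  const squeezedPerm = perm.filter((axis) => dims[axis] !== 1).map((axis) => newIndex.get(axis)!);
  const rank = squeezedPerm.length;
  if (rank < 2) return undefined;
  const split = squeezedPerm[0];
  if (split === 0) return undefined; // identity or not a rotation
  for (let i = 0; i < rank; i++) {
    if (squeezedPerm[i] !== (split + i) % rank) return undefined;
  }
  const squeezedDims = keptAxes.map((axis) => dims[axis]);
  const rows = squeezedDims.slice(0, split).reduce((x, y) => x * y, 1);
  const cols = squeezedDims.slice(split).reduce((x, y) => x * y, 1);
  return [rows, cols];
}

// The channels-last example from the list above: [256, 512x512] -> [512x512, 256].
const view = collapseTransposeTo2D([1, 256, 512, 512], [0, 2, 3, 1]);
```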

### Motivation and Context
For the SD Turbo demo, the total transpose time drops from 122.09 ms to
39.98 ms, and its share of the total time drops from 11.05% to 3.89%.

This PR also helps #21618: the total transpose time in that demo drops
from 70.25 ms to 17.32 ms on my iGPUs.
2024-09-04 12:04:04 -07:00
Edward Chen
cbf3c50d75
Improve stability of Android ReactNative E2E test (#21969)
- Remove redundant `OnnxruntimeModuleExampleE2ETest CheckOutputComponentExists` test
- Attempt to close any Application Not Responding (ANR) dialog prior to running Android test
- Add `--take-screenshots failing` option to detox test commands to save screenshots on failure
2024-09-04 08:41:07 -07:00
Guenther Schmuelling
4fece0430f
remove duplicate function definition (#21903)
2024-08-28 16:18:56 -07:00
xhcao
3bfb5e4f62
[js/webgpu] support float16 for Clip (#21584)
2024-08-28 13:19:20 -07:00