Commit graph

262 commits

Author SHA1 Message Date
Arthur Islamov
65249f42e4
[js/web] FP16 Gemm, Softmax & Transpose (#17494)
### Description
First three OPs to support fp16. Will add more once this gets merged
since others depend on changes in js_data_types
2023-09-11 21:09:37 -07:00
satyajandhyala
bf6d6961cc
[JS/Web] Added Einsum operator support. (#17401)
### Description
Added Einsum operator support to JSEP.



2023-09-11 15:57:15 -07:00
Yulong Wang
89da5a0108
[js/webgpu] exclude WebGPU reduce_log_sum_exp_* float64 test cases (#17472)
### Description

As explained in the comments, the "test_reduce_log_sum_exp_*" tests on
opset17/opset18 are excluded because they use float64.

They pass today only because they fall back to CPU. WebGPU does not
support f64.


This is one of the prerequisites for supporting IO binding for WebGPU
buffer in onnxruntime-web.

list of prerequisites PRs:
https://github.com/microsoft/onnxruntime/pull/17465
https://github.com/microsoft/onnxruntime/pull/17469
https://github.com/microsoft/onnxruntime/pull/17470
https://github.com/microsoft/onnxruntime/pull/17472 (this one)
2023-09-08 17:03:04 -07:00
Caroline Zhu
dcc93909b4
Add training WASM generation to Web CI pipeline (#17319)
### Description
[Successful pipeline
run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results)

Added flag to build the training artifacts & updated the
pull-wasm-artifacts script to pull the training artifacts as well.

Bundled into this PR are minor formatting fixes + naming fixes.

### Motivation and Context
[This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended
the WASM API wrapper to build training WASM artifacts as well.
The ORT training WASM artifacts are required to support ORT training web
bindings.
2023-09-08 15:49:47 -07:00
xhcao
9017ea131b
[js/webgpu] support GreaterOrEqual and LessOrEqual operators (#17310)
2023-09-07 17:41:16 -07:00
dependabot[bot]
eaef485461
Bump electron from 23.1.2 to 23.3.13 in /js/web (#17436)
Bumps [electron](https://github.com/electron/electron) from 23.1.2 to
23.3.13.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/electron/electron/releases">electron's
releases</a>.</em></p>
<blockquote>
<h2>electron v23.3.13</h2>
<h1>Release Notes for v23.3.13</h1>
<h2>End of Support for 23.x.y</h2>
<p>Electron 23.x.y has reached end-of-support as per the project's <a
href="https://www.electronjs.org/docs/latest/tutorial/electron-timelines#version-support-policy">support
policy</a>. Developers and applications are encouraged to upgrade to a
newer version of Electron.</p>
<h2>electron v23.3.12</h2>
<h1>Release Notes for v23.3.12</h1>
<h2>Other Changes</h2>
<ul>
<li>Fixed a crash while screen sharing on Wayland with PipeWire. <a
href="https://redirect.github.com/electron/electron/pull/39274">#39274</a></li>
<li>Security: backported fix for CVE-2023-3732.
<ul>
<li>Security: backported fix for CVE-2023-3728.</li>
<li>Security: backported fix for CVE-2023-3730. <a
href="https://redirect.github.com/electron/electron/pull/39268">#39268</a></li>
</ul>
</li>
</ul>
<h2>electron v23.3.11</h2>
<h1>Release Notes for v23.3.11</h1>
<h2>Fixes</h2>
<ul>
<li>Fixed a crash when listing desktop capture sources on Wayland with
PipeWire. <a
href="https://redirect.github.com/electron/electron/pull/39116">#39116</a>
<!-- raw HTML omitted -->(Also in <a
href="https://redirect.github.com/electron/electron/pull/39050">24</a>,
<a
href="https://redirect.github.com/electron/electron/pull/39051">25</a>,
<a
href="https://redirect.github.com/electron/electron/pull/39049">26</a>)<!--
raw HTML omitted --></li>
</ul>
<h2>electron v23.3.10</h2>
<h1>Release Notes for v23.3.10</h1>
<h2>Other Changes</h2>
<ul>
<li>Security: backported fix for CVE-2023-3422.
<ul>
<li>Security: backported fix for CVE-2023-3421.</li>
<li>Security: backported fix for CVE-2023-3420.</li>
<li>Security: backported fix for 1454860. <a
href="https://redirect.github.com/electron/electron/pull/38948">#38948</a></li>
</ul>
</li>
</ul>
<h2>electron v23.3.9</h2>
<h1>Release Notes for v23.3.9</h1>
<h2>Fixes</h2>
<ul>
<li>Fixed <code>preload</code> script may not run in some child windows
opened by <code>window.open</code>. <a
href="https://redirect.github.com/electron/electron/pull/38933">#38933</a>
<!-- raw HTML omitted -->(Also in <a
href="https://redirect.github.com/electron/electron/pull/38932">24</a>,
<a
href="https://redirect.github.com/electron/electron/pull/38931">25</a>,
<a
href="https://redirect.github.com/electron/electron/pull/38930">26</a>)<!--
raw HTML omitted --></li>
<li>Fixed minimize button to be visible when all buttons reenabled. <a
href="https://redirect.github.com/electron/electron/pull/38880">#38880</a>
<!-- raw HTML omitted -->(Also in <a
href="https://redirect.github.com/electron/electron/pull/38881">24</a>,
<a
href="https://redirect.github.com/electron/electron/pull/38879">25</a>)<!--
raw HTML omitted --></li>
</ul>
<h2>electron v23.3.8</h2>
<h1>Release Notes for v23.3.8</h1>
<h2>Other Changes</h2>
<ul>
<li>Security: backported fix for CVE-2023-3215.
<ul>
<li>Security: backported fix for CVE-2023-3216.</li>
<li>Security: backported fix for 1450536. <a
href="https://redirect.github.com/electron/electron/pull/38788">#38788</a></li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4b782e259b"><code>4b782e2</code></a>
fix: avoid package.json check on built-in modules (<a
href="https://redirect.github.com/electron/electron/issues/39426">#39426</a>)</li>
<li><a
href="b2047d710c"><code>b2047d7</code></a>
ci: fix hang when validating AppVeyor artifacts (<a
href="https://redirect.github.com/electron/electron/issues/39401">#39401</a>)</li>
<li><a
href="10b2baea43"><code>10b2bae</code></a>
docs: clean up removed systemPreferences methods (<a
href="https://redirect.github.com/electron/electron/issues/39349">#39349</a>)</li>
<li><a
href="454990a201"><code>454990a</code></a>
chore: cherry-pick 4 changes from Release-0-M115 (<a
href="https://redirect.github.com/electron/electron/issues/39268">#39268</a>)</li>
<li><a
href="10b49ffa12"><code>10b49ff</code></a>
chore: cherry-pick 2 changes from webrtc (<a
href="https://redirect.github.com/electron/electron/issues/39274">#39274</a>)</li>
<li><a
href="dc0fc78fac"><code>dc0fc78</code></a>
fix: do not resolve electron entrypoints on disk (<a
href="https://redirect.github.com/electron/electron/issues/39249">#39249</a>)</li>
<li><a
href="1aafc2ae38"><code>1aafc2a</code></a>
ci: fail appveyor build if artifacts are missing (<a
href="https://redirect.github.com/electron/electron/issues/39219">#39219</a>)</li>
<li><a
href="595e25a270"><code>595e25a</code></a>
fix: use StartUpdating method for PipeWire capturer (<a
href="https://redirect.github.com/electron/electron/issues/39116">#39116</a>)</li>
<li><a
href="7fe5925c94"><code>7fe5925</code></a>
build: disable unneeded depot_tools update on Windows CI (<a
href="https://redirect.github.com/electron/electron/issues/39016">#39016</a>)</li>
<li><a
href="c4b0ff4994"><code>c4b0ff4</code></a>
chore: cherry-pick 4 changes from Release-3-M114 (<a
href="https://redirect.github.com/electron/electron/issues/38948">#38948</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/electron/electron/compare/v23.1.2...v23.3.13">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-07 17:39:49 -07:00
Jian Chen
8914fe687b
[js/webgpu] Include Support for neg.int32 (#17374)
### Description
Include Support for neg.int32



2023-09-06 12:00:16 -07:00
Yulong Wang
75710f0006
[js/webgpu] add matmul broadcast tests (#17335)
### Description

Commit fffefb1c22 (#16969) optimized
matmul and also fixed broadcasting, so #17191 is no longer needed.
However, the operator test file added in that PR by @dakenf is
useful, so this change picks it up to strengthen the tests.
2023-09-05 20:41:46 -07:00
xhcao
026672e947
[js/webgpu] Support slice int32 (#16968)
Co-authored-by: Xing Xu <xing.xu@intel.com>
2023-09-05 18:05:47 -07:00
Jiajia Qin
5e747071be
[js/webgpu] Fix bug in conv2dByMatMul path (#17369)
### Description
For the conv2dByMatMul path, the simulated matmul output shape is the
reshape of the original conv2d. So we should pass this information to
`createMatmulProgramInfo` so that it can process it correctly.
2023-09-02 00:16:28 -07:00
Jian Chen
e60493525f
[js/webgpu] Adding support for abs with int32 type (#17359)
2023-08-31 08:13:54 -07:00
Jiajia Qin
352b745deb
[js/webgpu] Add input/output shapes information to profiling (#17342)
### Description
This PR enhances the profiling information.
With this PR, the profiling output looks like this:
```
[profiling] kernel "[Split] 51288384" input[0]: 1,256,64,64, output[0]: 1,256,64,64, execution time: 37135 ns
program-manager.ts:114 
[profiling] kernel "[Concat] 52361040" input[0]: 1,256,64,64, output[0]: 1,256,64,64, execution time: 50833 ns
program-manager.ts:114 
[profiling] kernel "[Transpose] 52375264" input[0]: 1,256,64,64, output[0]: 1,64,64,256, execution time: 99791 ns
program-manager.ts:114 
[profiling] kernel "[Sub] 51098472" input[0]: , input[1]: 1, output[0]: 1, execution time: 7448 ns
program-manager.ts:114 
[profiling] kernel "[Mul] 51344440" input[0]: 1, input[1]: 1,256,1,1, output[0]: 1,256,1,1, execution time: 8334 ns
```
Without this PR, the profiling output looks like this:
```
[profiling] kernel "52097928|[Split] 52097928" execution time: 37760 ns
program-manager.ts:105 
[profiling] kernel "41898328|[Concat] 41898328" execution time: 51666 ns
program-manager.ts:105 
[profiling] kernel "41915648|[Transpose] 41915648" execution time: 95416 ns
program-manager.ts:105 
[profiling] kernel "49757856|[Sub] 49757856" execution time: 7969 ns
program-manager.ts:105 
[profiling] kernel "51680504|[Mul] 51680504" execution time: 8906 ns
```
With the new information, it is easy to see which ops, at which shapes,
perform poorly. It also helps to check whether ops with very small
shapes are running on the GPU.
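
As a rough illustration of how the new output could be used, the sketch below parses lines in the new format and sorts kernels by execution time. The function names `parseProfilingLine` and `slowestKernels` are made up for this example and are not part of onnxruntime-web; the regular expression assumes the exact line layout from the sample output in this description.

```js
// Hypothetical helper: parse one "[profiling] kernel ..." line (new format).
// Returns null for anything else, e.g. the "program-manager.ts:114" lines.
function parseProfilingLine(line) {
  const match = /^\[profiling\] kernel "\[(\w+)\] (\d+)" (.*), execution time: (\d+) ns$/.exec(line.trim());
  if (!match) return null;
  const [, op, kernelId, shapesText, ns] = match;
  const shapes = {};
  // Entries are separated by ", " only where the next entry name starts;
  // the commas inside a shape are not followed by a space.
  for (const part of shapesText.split(/, (?=(?:input|output)\[)/)) {
    const [name, dims] = part.split(': ');
    // A scalar is logged with an empty shape.
    shapes[name] = dims ? dims.split(',').map(Number) : [];
  }
  return { op, kernelId, shapes, executionTimeNs: Number(ns) };
}

// Hypothetical helper: return the `count` slowest kernels found in a log.
function slowestKernels(logText, count) {
  return logText
    .split('\n')
    .map(parseProfilingLine)
    .filter((record) => record !== null)
    .sort((a, b) => b.executionTimeNs - a.executionTimeNs)
    .slice(0, count);
}
```

Feeding the console output to `slowestKernels(text, 10)` would give the ten most expensive kernels together with their input and output shapes.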
2023-08-31 08:12:28 -07:00
Yulong Wang
e5ca3f3dcb
[js/api] introducing IO binding for tensor (#16452)

### Description
This PR adds a few properties, methods and factories to the Tensor type to
support the IO-binding feature. This allows users to create a tensor from
GPU/CPU bound data without forcing a data transfer between CPU and
GPU.

This change is a way to resolve #15312

### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicates where the data resides. Valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sits alongside `data`; a readonly property of `WebGLTexture`
type, available only when `location === 'texture'`.
c. `gpuBuffer`: sits alongside `data`; a readonly property of `GPUBuffer`
type, available only when `location === 'gpu-buffer'`.

2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` lets the user download data from GPU to
CPU manually.
- function `dispose()` lets the user release GPU resources manually.

3. Add factories for creating `Tensor` instances:
    a. `fromTexture()` creates a tensor bound to a WebGL texture
    b. `fromGpuBuffer()` creates a tensor bound to a WebGPU buffer
    c. `fromPinnedBuffer()` creates a tensor using a CPU pinned buffer

### Examples:

Create tensors from a texture and pass them to an inference session as inputs:
```js
// when creating the session, specify that we prefer 'image_output:0' to be stored on GPU as a texture
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: [ 'webgl' ],
  preferredOutputLocation: { 'image_output:0': 'texture' }
});

// ...

const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 12:58:26 -07:00
Jiajia Qin
fffefb1c22
[js/webgpu] Optimize matmul (#16969)
### Description
Changes in this PR:
1) use the optimized version `makeMatMulPacked[Vec4]Source` to support
matmul.
2) enable the conv2dByMatMul path.
3) support broadcast
4) use IndicesHelper.

MatMul with M = 512, K = 512, N = 512 drops from 15 ms to 2 ms with
profilingMode enabled on my ADL machine.
2023-08-29 12:40:57 -07:00
Caroline
228db24317
Add training API functions to WASM API (#16521)
### Description
* Created `wasm/training_api` source and header files & modified
WebAssembly CMake to include training flags
* The `wasm/training_api` files use an `OrtTrainingManager` handle which
is a struct of an OrtCheckpointState and an OrtTrainingSession, rather
than creating a CheckpointState handle & a separate TrainingSession
handle.
* This is so that the TypeScript side only has to manage one handle that
will be passed between TrainingSession & CheckpointState
representations, rather than the TypeScript side managing separate
CheckpointStateHandle and TrainingSessionHandle.


### Motivation and Context
WASM API needs to be updated with ORT training API function calls so
that ORT training web bindings can be added for on-device training.

---------

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: carzh <carolinezhu@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
2023-08-28 11:05:02 -07:00
Hariharan Seshadri
cbd97515cd
[JS/WebGPU] Support GatherElements kernel (#17243)
### Description
As title


### Motivation and Context
Improve WebGPU kernel coverage
2023-08-28 09:55:25 -07:00
Yulong Wang
bb1871332f
[js/webgpu] add kernel Not and Equal (#17306)
### Description
This PR adds kernel implementation for operator "Not" and "Equal". Also
removed download cache in gpu data manager.

**Why removing download cache**
The following test case failed. ("Or" is on CPU, "Greater" and "Equal"
are on JSEP)

![image](https://github.com/microsoft/onnxruntime/assets/7679871/8d9798ad-2703-4fb9-907e-ff716c67d0b2)
After debugging, I found that both "Equal" and "Greater" are using the
same output GPU Data ID. This is because when ORT executes the graph, it
first runs "Equal", allowing its shader to write into GPU Data ID 2; then
a Gpu2Cpu copy for it is issued (because currently "Or" is on CPU EP);
at this point, ORT thinks GPU Data ID=2 is free to use, so it reuses it
as output for "Greater". This means there is no allocation for the output
of the "Greater" kernel, and both kernels write to GPU Data ID=2.

For the GPU data manager, there will be 2 downloads from the same GPU
buffer. Previously I thought this was a waste of resources, so I cached
the data. But now it shows that we need to perform 2 downloads because
the GPU data is already different. The download data cache should be
removed.
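
The reuse problem can be reproduced with a toy model. Everything in the sketch below (`ToyGpuDataManager`, `runEqualThenGreater`, the array contents) is hypothetical and only mimics the sequence described above; it is not the real GPU data manager.

```js
// Toy stand-in for a GPU data manager, keyed by GPU data ID (hypothetical).
class ToyGpuDataManager {
  constructor(useDownloadCache) {
    this.buffers = new Map();        // id -> current buffer contents
    this.downloadCache = new Map();  // id -> copy from an earlier download
    this.useDownloadCache = useDownloadCache;
  }

  // A kernel writes its output into the buffer with the given ID.
  write(id, data) {
    this.buffers.set(id, [...data]);
  }

  // Copy buffer contents back to the CPU, optionally through the cache.
  download(id) {
    if (this.useDownloadCache && this.downloadCache.has(id)) {
      return this.downloadCache.get(id); // stale if the ID was reused since
    }
    const copy = [...this.buffers.get(id)];
    if (this.useDownloadCache) this.downloadCache.set(id, copy);
    return copy;
  }
}

// Replays the sequence from the description: two kernels share GPU data ID 2.
function runEqualThenGreater(manager) {
  manager.write(2, [1, 0, 1]);                // "Equal" writes to ID 2
  const equalResult = manager.download(2);    // Gpu2Cpu copy for "Or" on CPU
  manager.write(2, [0, 1, 1]);                // "Greater" reuses ID 2
  const greaterResult = manager.download(2);  // second download of the same ID
  return { equalResult, greaterResult };
}
```

With the cache enabled, the second download returns the data written by "Equal"; with the cache disabled, it returns the data written by "Greater", which is the behavior this change restores.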


2023-08-27 19:50:17 -07:00
Yulong Wang
ddcd46174e
[js/webgpu] fix jsepOnRunEnd (#17300)
### Description
Fix jsepOnRunEnd: `jsepOnRunEnd()` needs to run after `runPromise` is
resolved.
2023-08-26 00:30:28 -07:00
Jiajia Qin
873ef8b8f0
[js/webgpu] add label for some webgpu APIs (#17291)
### Description
With the label, it is easier to identify which op causes the error.

Without the label, the error message is like below: 
```
Tint WGSL reader failure: :12:5 error: return statement type must match its function return type, returned 'vec4<f32>', expected 'f32'
    return W[i2o_W(indices)];
    ^^^^^^

 - While validating [ShaderModuleDescriptor]
 - While calling [Device].CreateShaderModule([ShaderModuleDescriptor]).
```
With the label, the error message is like below:
```
Tint WGSL reader failure: :12:5 error: return statement type must match its function return type, returned 'vec4<f32>', expected 'f32'
    return W[i2o_W(indices)];
    ^^^^^^

 - While validating [ShaderModuleDescriptor "ConvTranspose2D"]
 - While calling [Device].CreateShaderModule([ShaderModuleDescriptor "ConvTranspose2D"]).
```
### Motivation and Context
This change is mainly for debugging. With this change, we can easily
tell from the message above that `ConvTranspose2D`'s shader has a problem.
2023-08-25 12:12:56 -07:00
xhcao
5e8d94cec8
[js/webgpu] support Greater and Less operators (#17296)
2023-08-25 12:11:25 -07:00
Yulong Wang
79c4ed9a45
[js/webgpu] support error pop and kernel name (#17260)
### Description
This PR contains changes to support error pop and kernel name.

- Add a function `JsepGetNodeName` to allow reading kernel name from JS
to C++
- When in debug mode ( `env.debug = true;` ) or in profiling mode (
`env.webgpu.profilingMode = 'default';` ), kernel name will be read from
ORT; otherwise use the kernel pointer ( a number ) as kernel name to
save calls from JS to C++.
- When in debug mode, WebGPU validation errors will be recorded and if
any error occurs, `inferenceSession.run()` will fail (the Promise gets
rejected). Behavior when not in debug mode is not changed. This is
because recording errors is not zero-overhead, and GPU validation
errors should occur consistently whether or not debug mode is on.
- Add `jsepOnRunStart()` and `jsepOnRunEnd()` hook to:
   - allow implementation of the features mentioned above.
   - pass session ID to backend.
2023-08-25 08:08:15 -07:00
satyajandhyala
da180b20fa
[JS/Web] Fix ConvTranspose shader code compilation errors. (#17232)
### Description
Fix JSEP ConvTranspose shader code errors.



2023-08-25 06:25:54 -07:00
Yulong Wang
fb51faea64
[js/webgpu] fix 2 build breaks introduced in merge (#17273)
### Description
fix 2 build breaks introduced in merge. Fixes web build
2023-08-23 18:09:50 -07:00
Yulong Wang
8b18d48c7c
[js/webgpu] make IndicesHelper implementation implicit (#17193)
### Description
This change makes it no longer required to call indicesHelper.impl() in
shader code.
2023-08-23 14:41:35 -07:00
Arthur Islamov
5842144d98
[js/web] JSEP Gemm for opset 13 (#16936)
### Description
Added JSEP Gemm registration for opset 13. Without it, Gemm was falling
back to the CPU provider, which has a registration for opset 13.

---------

Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
2023-08-22 18:13:20 -07:00
Guenther Schmuelling
d3d3dde844
fix webgpu split (#17258)
fix webgpu split for the case of split_sizes coming from input[1]
2023-08-22 16:49:22 -07:00
Yulong Wang
6fc3fd9ece
[js/webgpu] support Cast operator (#16489)
### Description
support `Cast` operator for webgpu backend.

Cast operator for webgpu backend currently only supports f32, u32, i32
and bool.
2023-08-18 23:51:03 -07:00
xhcao
dd3b2cefd6
[js/webgpu] Support int32 type for binary (#16901)
### Description
Enable typed binary and support int32 type for binary.

---------

Co-authored-by: Xing Xu <xing.xu@intel.com>
2023-08-18 12:19:01 -07:00
Hariharan Seshadri
a476dbf430
[JS/WebGPU] Support Tile operator (#17123)
### Description
As title

### Motivation and Context
Improve WebGPU op coverage
2023-08-18 10:07:21 -07:00
satyajandhyala
7d1a5635a0
[JS/Web] Added SkipLayerNormalization operator. (#17102)
### Description
Add SkipLayerNormalization operator to JSEP.



2023-08-18 09:59:03 -07:00
Yulong Wang
cbee84ddfb
[js/web] allow optional input/output in operator test (#17184)
### Description
allow optional input/output in operator test
2023-08-16 11:50:11 -07:00
Hariharan Seshadri
66df11769c
[JS/WebGPU] Expand operator fixes (#17137)
2023-08-16 11:24:26 -07:00
satyajandhyala
89b682e3f3
[JS/Web] The bias input is optional, not required, for LayerNormalization operator (#17143)
### Description
Fix a typo. LayerNormalization takes 2 or 3 inputs. The third input,
bias, is optional.



2023-08-16 10:41:20 -07:00
Yulong Wang
133af1385c
[js/webgpu] update shader cache key to include input tensor datatype (#17176)
### Description
Update the shader cache key to include the input tensor data type, and
make the key a little easier to read.
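
To show why the data type belongs in the key, here is a minimal sketch. The function `buildShaderCacheKey` and the key layout are assumptions made for this example, not the format used in the repository; the only point taken from the description is that the input data type is part of the key.

```js
// Hypothetical key builder: program name plus one "dataType;dims" entry per
// input. The real key format in onnxruntime-web may differ.
function buildShaderCacheKey(programName, inputs) {
  const inputPart = inputs
    .map((tensor) => `${tensor.dataType};${tensor.dims.join(',')}`)
    .join('|');
  return `${programName}:[${inputPart}]`;
}

// Same op and same shape, different element type: the two keys must differ,
// otherwise a shader compiled for f32 could be reused for i32 data.
const floatKey = buildShaderCacheKey('Add', [{ dataType: 'float32', dims: [2, 3] }]);
const intKey = buildShaderCacheKey('Add', [{ dataType: 'int32', dims: [2, 3] }]);
```

A key that omitted the data type would map both calls to the same cached shader.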
2023-08-16 09:14:19 -07:00
xhcao
33ecde9af1
[js/webgpu] Fix reshape int32 test case (#17113)
Co-authored-by: Xing Xu <xing.xu@intel.com>
2023-08-15 21:18:13 -07:00
Guenther Schmuelling
8289e8b6ef
[js/webgpu] fix a few shader errors (#17171)
Fix for segment anything decoder, reduceMax with rank1 and concat.
2023-08-15 21:14:20 -07:00
Yulong Wang
35363dd9a5
[js/web] a few optimizations for test runner (#17174)
### Description
1. allows passing session options to operator test (eg. graph
optimization level)
2. add a short flag '-x' for '--wasm-number-threads' as it is frequently
used.
2023-08-15 21:00:23 -07:00
Arthur Islamov
ccf14e891e
[js/web] JSEP node assignment optimization (#17128)
### Description
Since WebGPU supports only float32 and int32, having Gather, Reshape,
Shape, Squeeze and Unsqueeze ops with other data types on the JS
execution provider creates additional MemCpy ops and slows down the
overall execution, as all other ops on those tensor types run on CPU.

Before this patch SD Unet had these numbers:
Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1141
Node(s) placed on [JsExecutionProvider]. Number of nodes: 4025
memcpy tokens: 2001

After patch:
Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1735
Node(s) placed on [JsExecutionProvider]. Number of nodes: 2243
memcpy tokens: 813

It also gives more than a 5x performance benefit: from 12 sec for one Unet
step to 2.2 sec on an RTX 3090 Ti, so we are almost getting to native
performance.

UPD: with the latest changes from the main branch and multi-threading it
went down to 1.6 sec. Will try re-exporting my model to ONNX with maximum
optimizations, like using MultiHeadAttention to decrease the node count.
Maybe after implementing that it can go below 1 sec.
2023-08-15 18:58:05 -07:00
xhcao
24e0bd37b4
[JS/WebGPU] Support Log operator (#17045)
2023-08-14 18:04:12 -07:00
Guenther Schmuelling
9204cd7392
[js/webgpu] Add C++ registration for operator Tanh in JSEP (#17124)
add webgpu/tanh

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-08-12 11:43:39 -07:00
Yulong Wang
e7adbb38f6
[js/webgpu] disable test case 'test_batchnorm_epsilon_training_mode' temporarily (#17129)
### Description

The test case 'test_batchnorm_epsilon_training_mode' is failing on webgpu.
The issue needs time to investigate, so the test is commented out for now
and will be re-enabled once the root cause is fixed.
2023-08-12 08:53:10 -07:00
Yulong Wang
14a8315f10
[js/web] [webgpu] new incides helper (#16957)
### Description
This PR introduces the new indices helper.

IndicesHelper is a helper class for generating WGSL code for
manipulating indices and data for a shader's input or output.

This class is designed to offer a unified way to generate WGSL code for
manipulating indices and data for a shader's input or output. The
following is a list of terminologies used in this class:
- `offset`: a uint32 value representing the offset of an element in the
data buffer.
- `indices`: an abstraction of a multi-dimensional array's indices
representing the data's index on each dimension.
- `value`: a value of a data element.

Users are expected to create an instance of this class for each shader's
input or output, and use the instance to generate WGSL code for
manipulating indices and data. The following 2 exported functions are
for users to call to create an instance of an indices helper:
 - `inputVariable()`: create an indices helper instance for an input.
 - `outputVariable()`: create an indices helper instance for an output.


An indices helper instance contains helper functions for the following
operations:
- access readonly basic information, including: `name`(the name of the
input or output), `usage`(whether it's an input or an output) and
`shape`(the passed in shape).
- `type`: access readonly type information, including: `indices`(the
type of indices), `value`(the type of value at runtime), `storage`(the
type of value at storage) and `tensor`(the tensor type as represented in
TensorView).
- generate WGSL code for converting between offset and indices. Use
`offsetToIndices()` for a WGSL code snippet that calculates indices from
an offset, and use `indicesToOffset()` for a WGSL code snippet that
calculates an offset from indices.
- to manipulate an instance of indices, use `setIndices()` and
`getIndices()` to set and get the indices on an indices variable.
- to manipulate data, use `set()`/`get()` to access data at the given
indices from parameter list, use `setByIndices()`/`getByIndices()` to
access data at the given indices from an indices variable, and use
`setByOffset()`/`getByOffset()` to access data at the given offset.
- `impl`: get WGSL code of function implementation for the util
functions mentioned above.

This change adopts the new IndicesHelper throughout the code, though
not necessarily in all of it.
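
For readers unfamiliar with the offset/indices terminology, here is the underlying arithmetic in plain JavaScript, assuming a row-major (C-order) layout. The real IndicesHelper generates WGSL snippets rather than running JavaScript; the function bodies and the `computeStrides` helper below are illustrative assumptions, and only the names `offsetToIndices` and `indicesToOffset` come from the description.

```js
// Row-major strides: the last dimension is contiguous (assumption).
function computeStrides(shape) {
  const strides = new Array(shape.length).fill(1);
  for (let i = shape.length - 2; i >= 0; i--) {
    strides[i] = strides[i + 1] * shape[i + 1];
  }
  return strides;
}

// Convert a flat element offset into one index per dimension.
function offsetToIndices(offset, shape) {
  let rest = offset;
  return computeStrides(shape).map((stride) => {
    const index = Math.floor(rest / stride);
    rest %= stride;
    return index;
  });
}

// Convert per-dimension indices back into a flat element offset.
function indicesToOffset(indices, shape) {
  const strides = computeStrides(shape);
  return indices.reduce((sum, index, i) => sum + index * strides[i], 0);
}
```

For a tensor of shape `[1, 256, 64, 64]`, the strides are `[1048576, 4096, 64, 1]`, so indices `[0, 3, 2, 1]` correspond to offset 3 * 4096 + 2 * 64 + 1 = 12417, and the two functions are inverses of each other for any in-range offset.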
2023-08-11 11:36:59 -07:00
satyajandhyala
e8a9d4f04d
[JS/Web] Fix Resize kMSInternalNHWCDomain (#17023)
### Description
Fix some Resize failing tests.




---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-08-10 09:14:43 -07:00
Zimon Tai
a3e02e8e2a
Fix Resize op input check (#16594)
### Description
onnxjs contains a `Resize` op input check which has been outdated since
opset 9. Currently `Resize` supports up to 4 inputs. This PR loosens the
input check.



### Motivation and Context

Fixes #15636
2023-08-09 15:42:30 -07:00
Yulong Wang
56bced0581
[js/web] enable webgpu in browser unit test (#16310)
### Description
enable webgpu in browser unit test.

The CI pipeline uses Edge v113+ which enables WebGPU.

===

**UPDATE on 08/07/2023:**
- add flags to Edge browser launch commandline so that Edge on CI agents
can initialize WebGPU correctly.
- ONLY enable webgpu on web release build. Other pipelines are using
flag `-b=wasm,webgl,xnnpack` to specify the other 3 backends explicitly.
- disable the failing "Resize" related tests. Once they are fixed the
tests can be re-enabled.

---------

Co-authored-by: Satya Jandhyala <satya.k.jandhyala@gmail.com>
2023-08-08 11:45:04 -07:00
Arthur Islamov
c3f04251c7
[js/web] JSEP LayerNormalization and InstanceNormalizations kernels (#16830)
### Description
Added two kernels for Layer and Instance norm

Also raised the `maxBufferSize` limit to the maximum when requesting the
GPU device, since the default limit is 256 MB and allocating a 600 MB
buffer fails when running fp32 StableDiffusion weights.
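
A minimal sketch of requesting a higher limit with the standard WebGPU API follows. The helper name `buildDeviceDescriptor` is made up for this example, and it raises only `maxBufferSize`; whether the actual change requests other limits as well is not stated here.

```js
// Hypothetical helper: build a GPUDeviceDescriptor that asks for the largest
// buffer size the adapter supports instead of the WebGPU default
// (268435456 bytes, i.e. 256 MiB).
function buildDeviceDescriptor(adapter) {
  return {
    requiredLimits: {
      maxBufferSize: adapter.limits.maxBufferSize,
    },
  };
}

// In a browser with WebGPU enabled, usage would look like:
//   const adapter = await navigator.gpu.requestAdapter();
//   const device = await adapter.requestDevice(buildDeviceDescriptor(adapter));
```

Requesting the adapter's own reported limit avoids asking for more than the hardware supports, which would make `requestDevice()` reject.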


### Motivation and Context
These two are used in StableDiffusion and many other networks
2023-08-08 09:09:37 -07:00
Jiajia Qin
9ea0a3129b
[js/webgpu] Make sure only storage buffers are reused (#16893)
### Description
This PR makes sure that only storage buffers are reused. Previously, a
query buffer could also be taken from the freeBuffers list if the list
contained a buffer of matching size. But the two have different usages,
which results in errors.
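
The idea can be sketched with a toy pool. `ToyBufferPool` and the usage strings are hypothetical stand-ins; only the `freeBuffers` name comes from the description, and the real implementation works with `GPUBuffer` objects and usage flags.

```js
const STORAGE = 'storage';
const QUERY_RESOLVE = 'query-resolve';

// Toy buffer pool: free buffers are looked up by size, storage usage only.
class ToyBufferPool {
  constructor() {
    this.freeBuffers = new Map(); // size -> array of released storage buffers
    this.created = 0;             // number of buffers actually allocated
  }

  acquire(size, usage) {
    // Only storage requests may be served from the free list.
    if (usage === STORAGE) {
      const candidates = this.freeBuffers.get(size);
      if (candidates && candidates.length > 0) return candidates.pop();
    }
    this.created++;
    return { id: this.created, size, usage };
  }

  release(buffer) {
    // Non-storage buffers never enter the free list.
    if (buffer.usage !== STORAGE) return;
    if (!this.freeBuffers.has(buffer.size)) this.freeBuffers.set(buffer.size, []);
    this.freeBuffers.get(buffer.size).push(buffer);
  }
}
```

A query buffer request of the same size as a released storage buffer now gets a fresh allocation, while a later storage request still reuses the released one.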
2023-08-04 13:40:52 -07:00
satyajandhyala
7ad43d9564
[JS/Web] Fixed ArgMin and ArgMax and refactored (#17002)
Fixed ArgMin and ArgMax and refactored using functionality from Reduce
operator code.

### Description
Removed code/functionality duplication and fixed some issues.



2023-08-04 12:59:36 -07:00
satyajandhyala
cc4b64f646
[JS/Web] Modify Reduce, Expand and Slice to pass op and node tests. (#16979)
### Description
Make the CacheHint mechanism work by adding input dims. The mechanism is
designed to avoid running the same test multiple times by saving the
result mapped against a key.



2023-08-03 15:48:47 -07:00
Yulong Wang
641c3a4a37
[js/web] update op test schema (#16921)
### Description
update op test schema.

This change fixes several problems with operator tests for web:
- `opsets` -> `opset`: an operator uses exactly one opset instead of
multiple
- `condition` -> `platformCondition`: make it less confusing
- `inputShapeDefinitions`: allows testing ORT behavior when it gets
no/partial/full shape info.

Added a JSON schema file and also an example file
2023-08-03 14:20:20 -07:00