Commit graph

451 commits

Author SHA1 Message Date
Xu Xing
fa106942a7
[js/webgpu] Refactor matmul conv to support uniforms for matmul (#18452)
This change refactored matmul/conv related programs to support shape
uniforms. Currently only matmul shape uniforms are fully enabled.
TODOs: add input dependencies for conv related programs, turn clipMax
and clipMin to uniforms.
2023-11-22 14:42:55 -08:00
satyajandhyala
841f7ed3e0
[[JS/Web]Added uniform to Expand op. (#18558)
### Description
<!-- Describe your changes. -->
Added Uniforms to Expand operator kernel


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve performance
2023-11-22 14:14:24 -08:00
Arthur Islamov
1c555c5fc1
[JS/Web] Resize & BiasSplitGelu fp16 support (#18536)
### Description
Resize and BiasSplitGelu fp16 support on WebGPU
2023-11-22 12:12:07 -08:00
Yulong Wang
c7fd930330
[js/web] unify resolve rules for "Clip" (#18527)
### Description
It was a mistake to use 2 different names for Clip operator in
op-resolve-rules.ts for different opset. An optimized implementation can
handle both cases (opset < 11 and opset >=11). Remove "ClipV10" as an
entry from the table.
2023-11-20 23:18:06 -08:00
Jiajia Qin
abdf8b7c3f
[js/webgpu] Optimize broadcast binary. (#18185)
### Description
Currently, the binary algorithms are divided into the vectorize one
(efficient) and non-vectorize one (less efficient). Below situations
will go to the vectorize one:
1) A or B's shape length is 1.
2) The shared dimensions length of A and B are divisible by 4.
3) A and B have same shape.

This PR adds another situation as below to go to the vectorize
algorithm.
4. A or B's last dimension is divisible by 4.

With this change, the aggerate time of Add in sam-b-encoder becomes
309.65 ms from 409.12 ms on Intel ADL.
2023-11-20 16:52:17 -08:00
Yulong Wang
247ce21859
[js] optimize eslint config (#18460)
### Description
optimize eslint config to:
- set parserOptions.project to `true` to allow @typescript-eslint/parser
to find the nearest tsconfig.json file to that source file. This helps
to avoid parsing extra files, may helps with:
- reduce the possibility of seeing OOM or stackoverflow with "npm run
lint"
   - faster processing
- enforce rule "no-underscore-dangle" with a list of exceptions.
2023-11-20 12:00:56 -08:00
Yulong Wang
34c5424456
[js] update a few packages (#18499)
### Description
[js] update a few packages

- update semver
- update reference of onnx_proto to local folder in order to upgrade
protobufjs@7.2.4

Resolve AB#18513
2023-11-17 22:40:51 -08:00
Arthur Islamov
fac3e33da5
[js/web] JSEP Attention & MultiHeadAttention (#17742)
### Description
This is a narrow implementation of Attention/MultiHeadAttention as it
does not support:
a. inputs 5-7 for MHA
b. packed QKV/KV
c. past/present
d. attention mask

But it works well for StableDiffusion and can be extended later. It
reduces VRAM usage as it combines many ops into few
I've updated demo here https://islamov.ai/stable-diffusion-webgpu/ it
takes ~13sec for 1 image with 20 steps on RTX3090Ti and about 25s on M1
Pro
VRAM usage is about 8gb if you don't use img2img

Going to focus on SDXL now

---------

Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-11-17 12:23:52 -08:00
satyajandhyala
b291b20fa0
[JS/Web]Added uniforms support to Slice op. (#18422)
### Description
Support uniforms in Slice op



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve ferformance
2023-11-16 09:44:13 -08:00
Yulong Wang
586f06f5a1
[js/web] set noUnusedParameters to true and fix a few bugs (#18404)
### Description
- set tsconfig "noUnusedParameters" to `true` and fix a few bugs
discovered by typescript.
   how unused parameter is fixed:
- for most code (webgl), add underscore as prefix, which is the standard
ignore pattern for typescript check.
- remove unused parameter from function and modify corresponding
function calls (jsep)
- fix a bug in ArgMinMax: this 2 operators do not have more than one
input(s) so the `createArgMinMaxAttributesFromInputs()` is removed.
- add proxy main.ts into typescript check and fix a bug in parameter
passing
   - fixed `run()` function call and add typecheck fix (hack)
2023-11-15 09:16:29 -08:00
dependabot[bot]
5aeed62630
Bump axios from 1.3.4 to 1.6.1 in /js/node (#18400)
Bumps [axios](https://github.com/axios/axios) from 1.3.4 to 1.6.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/axios/axios/releases">axios's
releases</a>.</em></p>
<blockquote>
<h2>Release v1.6.1</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>formdata:</strong> fixed content-type header normalization
for non-standard browser environments; (<a
href="https://redirect.github.com/axios/axios/issues/6056">#6056</a>)
(<a
href="dd465ab22b">dd465ab</a>)</li>
<li><strong>platform:</strong> fixed emulated browser detection in
node.js environment; (<a
href="https://redirect.github.com/axios/axios/issues/6055">#6055</a>)
(<a
href="3dc8369e50">3dc8369</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+432/-65
([#6059](https://github.com/axios/axios/issues/6059)
[#6056](https://github.com/axios/axios/issues/6056)
[#6055](https://github.com/axios/axios/issues/6055) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/meyfa"
title="+5/-2 ([#5835](https://github.com/axios/axios/issues/5835)
)">Fabian Meyer</a></li>
</ul>
<h2>Release v1.6.0</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>CSRF:</strong> fixed CSRF vulnerability CVE-2023-45857 (<a
href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>)
(<a
href="96ee232bd3">96ee232</a>)</li>
<li><strong>dns:</strong> fixed lookup function decorator to work
properly in node v20; (<a
href="https://redirect.github.com/axios/axios/issues/6011">#6011</a>)
(<a
href="5aaff532a6">5aaff53</a>)</li>
<li><strong>types:</strong> fix AxiosHeaders types; (<a
href="https://redirect.github.com/axios/axios/issues/5931">#5931</a>)
(<a
href="a1c8ad008b">a1c8ad0</a>)</li>
</ul>
<h3>PRs</h3>
<ul>
<li>CVE 2023 45857 ( <a
href="https://api.github.com/repos/axios/axios/pulls/6028">#6028</a>
)</li>
</ul>
<pre><code>
⚠️ Critical vulnerability fix. See
https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459
</code></pre>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+449/-114
([#6032](https://github.com/axios/axios/issues/6032)
[#6021](https://github.com/axios/axios/issues/6021)
[#6011](https://github.com/axios/axios/issues/6011)
[#5932](https://github.com/axios/axios/issues/5932)
[#5931](https://github.com/axios/axios/issues/5931) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a
href="https://github.com/valentin-panov" title="+4/-4
([#6028](https://github.com/axios/axios/issues/6028) )">Valentin
Panov</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/therealrinku"
title="+1/-1 ([#5889](https://github.com/axios/axios/issues/5889)
)">Rinku Chaudhari</a></li>
</ul>
<h2>Release v1.5.1</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>adapters:</strong> improved adapters loading logic to have
clear error messages; (<a
href="https://redirect.github.com/axios/axios/issues/5919">#5919</a>)
(<a
href="e4107797a7">e410779</a>)</li>
<li><strong>formdata:</strong> fixed automatic addition of the
<code>Content-Type</code> header for FormData in non-browser
environments; (<a
href="https://redirect.github.com/axios/axios/issues/5917">#5917</a>)
(<a
href="bc9af51b18">bc9af51</a>)</li>
<li><strong>headers:</strong> allow <code>content-encoding</code> header
to handle case-insensitive values (<a
href="https://redirect.github.com/axios/axios/issues/5890">#5890</a>)
(<a
href="https://redirect.github.com/axios/axios/issues/5892">#5892</a>)
(<a
href="4c89f25196">4c89f25</a>)</li>
<li><strong>types:</strong> removed duplicated code (<a
href="9e6205630e">9e62056</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+89/-18
([#5919](https://github.com/axios/axios/issues/5919)
[#5917](https://github.com/axios/axios/issues/5917) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/DavidJDallas"
title="+11/-5 ()">David Dallas</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/fb-sean"
title="+2/-8 ()">Sean Sattler</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/0o001"
title="+4/-4 ()">Mustafa Ateş Uzun</a></li>
<li><!-- raw HTML omitted --> <a
href="https://github.com/sfc-gh-pmotacki" title="+2/-1
([#5892](https://github.com/axios/axios/issues/5892) )">Przemyslaw
Motacki</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/Cadienvan"
title="+1/-1 ()">Michael Di Prisco</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/axios/axios/blob/v1.x/CHANGELOG.md">axios's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/axios/axios/compare/v1.6.0...v1.6.1">1.6.1</a>
(2023-11-08)</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>formdata:</strong> fixed content-type header normalization
for non-standard browser environments; (<a
href="https://redirect.github.com/axios/axios/issues/6056">#6056</a>)
(<a
href="dd465ab22b">dd465ab</a>)</li>
<li><strong>platform:</strong> fixed emulated browser detection in
node.js environment; (<a
href="https://redirect.github.com/axios/axios/issues/6055">#6055</a>)
(<a
href="3dc8369e50">3dc8369</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+432/-65
([#6059](https://github.com/axios/axios/issues/6059)
[#6056](https://github.com/axios/axios/issues/6056)
[#6055](https://github.com/axios/axios/issues/6055) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/meyfa"
title="+5/-2 ([#5835](https://github.com/axios/axios/issues/5835)
)">Fabian Meyer</a></li>
</ul>
<h1><a
href="https://github.com/axios/axios/compare/v1.5.1...v1.6.0">1.6.0</a>
(2023-10-26)</h1>
<h3>Bug Fixes</h3>
<ul>
<li><strong>CSRF:</strong> fixed CSRF vulnerability CVE-2023-45857 (<a
href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>)
(<a
href="96ee232bd3">96ee232</a>)</li>
<li><strong>dns:</strong> fixed lookup function decorator to work
properly in node v20; (<a
href="https://redirect.github.com/axios/axios/issues/6011">#6011</a>)
(<a
href="5aaff532a6">5aaff53</a>)</li>
<li><strong>types:</strong> fix AxiosHeaders types; (<a
href="https://redirect.github.com/axios/axios/issues/5931">#5931</a>)
(<a
href="a1c8ad008b">a1c8ad0</a>)</li>
</ul>
<h3>PRs</h3>
<ul>
<li>CVE 2023 45857 ( <a
href="https://api.github.com/repos/axios/axios/pulls/6028">#6028</a>
)</li>
</ul>
<pre><code>
⚠️ Critical vulnerability fix. See
https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459
</code></pre>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+449/-114
([#6032](https://github.com/axios/axios/issues/6032)
[#6021](https://github.com/axios/axios/issues/6021)
[#6011](https://github.com/axios/axios/issues/6011)
[#5932](https://github.com/axios/axios/issues/5932)
[#5931](https://github.com/axios/axios/issues/5931) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a
href="https://github.com/valentin-panov" title="+4/-4
([#6028](https://github.com/axios/axios/issues/6028) )">Valentin
Panov</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/therealrinku"
title="+1/-1 ([#5889](https://github.com/axios/axios/issues/5889)
)">Rinku Chaudhari</a></li>
</ul>
<h2><a
href="https://github.com/axios/axios/compare/v1.5.0...v1.5.1">1.5.1</a>
(2023-09-26)</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>adapters:</strong> improved adapters loading logic to have
clear error messages; (<a
href="https://redirect.github.com/axios/axios/issues/5919">#5919</a>)
(<a
href="e4107797a7">e410779</a>)</li>
<li><strong>formdata:</strong> fixed automatic addition of the
<code>Content-Type</code> header for FormData in non-browser
environments; (<a
href="https://redirect.github.com/axios/axios/issues/5917">#5917</a>)
(<a
href="bc9af51b18">bc9af51</a>)</li>
<li><strong>headers:</strong> allow <code>content-encoding</code> header
to handle case-insensitive values (<a
href="https://redirect.github.com/axios/axios/issues/5890">#5890</a>)
(<a
href="https://redirect.github.com/axios/axios/issues/5892">#5892</a>)
(<a
href="4c89f25196">4c89f25</a>)</li>
<li><strong>types:</strong> removed duplicated code (<a
href="9e6205630e">9e62056</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li><!-- raw HTML omitted --> <a
href="https://github.com/DigitalBrainJS" title="+89/-18
([#5919](https://github.com/axios/axios/issues/5919)
[#5917](https://github.com/axios/axios/issues/5917) )">Dmitriy
Mozgovoy</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/DavidJDallas"
title="+11/-5 ()">David Dallas</a></li>
<li><!-- raw HTML omitted --> <a href="https://github.com/fb-sean"
title="+2/-8 ()">Sean Sattler</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f6d2cf9763"><code>f6d2cf9</code></a>
chore(ci): fix publish action content permission; (<a
href="https://redirect.github.com/axios/axios/issues/6061">#6061</a>)</li>
<li><a
href="a22f4b918a"><code>a22f4b9</code></a>
chore(release): v1.6.1 (<a
href="https://redirect.github.com/axios/axios/issues/6060">#6060</a>)</li>
<li><a
href="cb8bb2beb2"><code>cb8bb2b</code></a>
chore(ci): Publish to NPM with provenance (<a
href="https://redirect.github.com/axios/axios/issues/5835">#5835</a>)</li>
<li><a
href="37cbf9214a"><code>37cbf92</code></a>
chore(ci): added labeling and notification for published PRs; (<a
href="https://redirect.github.com/axios/axios/issues/6059">#6059</a>)</li>
<li><a
href="dd465ab22b"><code>dd465ab</code></a>
fix(formdata): fixed content-type header normalization for non-standard
brows...</li>
<li><a
href="3dc8369e50"><code>3dc8369</code></a>
fix(platform): fixed emulated browser detection in node.js environment;
(<a
href="https://redirect.github.com/axios/axios/issues/6055">#6055</a>)</li>
<li><a
href="f7adacdbaa"><code>f7adacd</code></a>
chore(release): v1.6.0 (<a
href="https://redirect.github.com/axios/axios/issues/6031">#6031</a>)</li>
<li><a
href="9917e67cbb"><code>9917e67</code></a>
chore(ci): fix release-it arg; (<a
href="https://redirect.github.com/axios/axios/issues/6032">#6032</a>)</li>
<li><a
href="96ee232bd3"><code>96ee232</code></a>
fix(CSRF): fixed CSRF vulnerability CVE-2023-45857 (<a
href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>)</li>
<li><a
href="7d45ab2e2a"><code>7d45ab2</code></a>
chore(tests): fixed tests to pass in node v19 and v20 with
<code>keep-alive</code> enabl...</li>
<li>Additional commits viewable in <a
href="https://github.com/axios/axios/compare/v1.3.4...v1.6.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=axios&package-manager=npm_and_yarn&previous-version=1.3.4&new-version=1.6.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-14 00:38:00 -08:00
Xu Xing
949ac4b7ce
[js/webgpu] Support uniforms for gather (#18312) 2023-11-13 11:24:34 -08:00
Wanming Lin
73ed34ac4b
[WebNN EP] Support numThreads option for WebNN CPU device (#18054) 2023-11-12 16:45:10 -08:00
Xu Xing
0c8c0014f6
[js/webgpu] Use builtin num_workgroups to fix shader key conflict (#18387)
This fixes conformance failure of tinyyolov2-8 and potential shader key
conflict issues.
2023-11-10 17:37:45 -08:00
Yulong Wang
6b0c97b43f
[js/web] fix typescript type check (#18343)
### Description

This PR fixes the TypeScript type check.

Previously, when I use esbuild to replace webpack (#17745), typescript
typecheck was disabled. This causes a few TypeScript type error checked
in into the code base. This PR fixes the followings:

- Use "Node16" as default "module" value in tsconfig.json, because in
TypeScript v5, `(module == "ES2015" && moduleResolution == "Node16")` is
an invalid combination.
- Set `noUnusedParameters` to true as default. in web override it to
false because multiple code need to be updated ( a following-up PR will
do this )
- set correct project file for 'web/lib/**/*.ts' for ESLint (otherwise
WebGPU types are not populated correctly)
- fix type error in file js/web/lib/wasm/jsep/webgpu/program-manager.ts
- upgrade "@webgpu/types" to latest to fix type error in file
js/web/lib/wasm/jsep/backend-webgpu.ts
- add package script "prebuild" for web to run tsc type check
- add type check in CI yml file
2023-11-10 16:03:38 -08:00
Xu Xing
8dba6efd61
[js/webgpu] Add uniforms support to concat op (#18238) 2023-11-10 13:46:03 -08:00
Jiajia Qin
28c23aed04
[js/webgpu] Fix conv2d with activation (#18388)
### Description
Fix #18297

With PR #17766, conv2d activation in mobilenetv2-12 will not be empty.
However, activation is not supported yet in
[biasActivationSnippet](https://github.com/microsoft/onnxruntime/blob/main/js/web/lib/wasm/jsep/webgpu/ops/3rd-party/activation_util.ts#L48C14-L48C36).
This PR makes all places unify to use
[getActivationSnippet](https://github.com/microsoft/onnxruntime/blob/main/js/web/lib/wasm/jsep/webgpu/ops/fuse-utils.ts#L13)
to fix this issue.
2023-11-10 12:54:35 -08:00
Xu Xing
dd1bb760eb
[js/webgpu] Fix scalar uniform (#18318) 2023-11-10 10:12:22 -08:00
Xu Xing
829d802337
[js/webgpu] Support uniform for softmax (#18345) 2023-11-09 11:19:23 -08:00
Guenther Schmuelling
25fbc2b0ab
fix fused relu activation (#18303) 2023-11-09 08:18:21 -08:00
Yulong Wang
10df847baf
[js] fix linter out-of-memory issue (#18307)
### Description
fix linter out-of-memory issue by ignoring file pattern 'test/data/'.
2023-11-07 17:12:22 -08:00
Jiajia Qin
606356d0b1
[js/webgpu] Simplify the Resize shader when noScale is true (#18321)
### Description
For Resize, when `noScale` is true, the shader can become very simple,
which is not related with `attributes.mode` anymore. So we should remove
those parts of shader code for simplification.

This PR can also fix #18311 since the `noScale` are all true in that
model.

However, #18311 also exposes that the Resize implementation for `linear`
mode has bug. It seems that the currently implementation always treat
the input as either 2d or 4d tensor, however, the actual input is 3d
tensor, that's why the shader compilation is failed. We may need to fix
it in a separate PR.
2023-11-07 12:54:20 -08:00
satyajandhyala
a16d528399
[JS/Web] Added Uniforms support to binary ops. (#18260)
### Description
Added Uniform support to binary ops



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
To improve performance
2023-11-07 08:41:52 -08:00
satyajandhyala
e207060ac9
[JS/Web] Added Unifroms support to unary ops. (#18223)
### Description
Added uniforms support to unary ops.


### Motivation and Context
Improve performance
2023-11-03 09:30:54 -07:00
Scott McKay
4f2096be38
Update XNNPACK to latest version (#18038)
### Description
<!-- Describe your changes. -->
Update XNNPACK to latest version
- adds fp16 kernels and various other improvements
- requires pthreadpool update as well

Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API
- 'setup' is split into 'reshape' and 'setup'
-  some ops use a workspace buffer
   -  copied workspace allocation from XNNPACK unit test code
- some suffixes changed 

Added wrapper for XNNPACK caches to base XNNPACK EP kernel
- simplifies usage
- XNNPACK split out the code and weights caches, but the code cache
isn't currently usable via the public API
- we could use the internal types if we think it's required for
performance reasons. non-trivial though as we'd need to propagate ifdef
values from the XNNPACK build up to the ORT build.
- using XNNPACK internals would also mean we would not be able to
support using a pre-build XNNPACK package
    - not an issue currently
  
Fixed opset registration for internal NHWC domain
- was not being tied to the ONNX version, so nodes inserted by layout
transformation had the incorrect opset
- a number of other places needed updating once this issue was fixed

Remove support for NCHW Resize from XNNPACK EP so it's NHWC only
- we only supported NCHW for fp32,
- doing so adds complexity in multiple places (XNNPACK EP kernel
implementation, layout transformation and transpose optimization)
- unclear if that complexity provides any benefit. can add back if
required by production scenario

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We're looking at enabling fp16 support for CoreML and NNAPI. If we do
that we need a good fallback story if the CPU EP will be used. The
XNNPACK fp16 kernels will hopefully provide that.

NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That
can be done as required in separate EPs and should be relatively simple
to do.
2023-11-03 09:04:28 -07:00
xhcao
8d48d3e9cc
[js/web] optimize reduce related operators (#17957)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-11-02 12:51:48 -07:00
Caroline Zhu
e3b043ba17
[js/web/training] runTrainStep implementation (#18006)
### Description
* based on design document & following InferenceSession's run
implementation, implemented TrainingSession.runTrainStep

### Motivation and Context
* Adding web bindings for training

#### Related work
* #16521 allowed for training artifacts to be built
* #17333 added interfaces for training
* #17474 allowed for training package to be built + added training
backend to web package
* #17891 implementation for createTrainingSession on the TypeScript side
**[SHOULD BE MERGED IN BEFORE THIS PR]**

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
2023-11-02 08:32:50 -07:00
satyajandhyala
a2e9ba72d5
[JS/Web]Added FusedConv. (#17766)
### Description
Added FusedConv and FusedConvTranspose



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve performance
2023-11-01 15:34:51 -07:00
Jiajia Qin
785e2b1eae
[js/webgpu] Optimize softmax by vector (#18153)
### Description
This PR enables `softmax` outputs max supported components instead of
scalar for each thread.

Softmax with input[0]: [12,4096,4096] becomes 47.86 ms from 55.11 ms
2023-10-30 16:05:35 -07:00
Yulong Wang
9bba990871
[js/web] fix a few package consuming problems (#18109)
### Description
This PR tries to fix a part of the NPM package consuming problems for
onnxruntime-web (ES module) as described in #10913:

- reduce the package size to fit the 150MB restriction in jsdelivr, by
removing dev build targets for uncommon exports
- add default export to support `import ort from 'onnxruntime-web';`
(currently only support `import * as ort from 'onnxruntime-web';`
2023-10-30 08:11:43 -07:00
Yang Gu
52f4968359
[js/webgpu] Change timestamp-query-in-passes to timestamp-query (#18108)
Timestamp-query has a broader support than timestamp-query-in-passes on
all the platforms, including macOS.
Note that to enable timestamp-query, you still need to add switch
"--enable-dawn-features=allow_unsafe_apis" to Chrome. By default, the
lowest 16 bits are masked with 0 (at a granularity about 0.1ms) for
privacy. To get the highest precision, you need to add another switch
"--enable-webgpu-developer-features".
2023-10-26 16:33:03 -07:00
Caroline Zhu
64de71c5e2
[js/web/training] Add CreateTrainingSession (#17891)
### Description
* Adds TrainingSession.create() functionality following the web bindings
for training design doc
* Added 2 new training APIs to wasm/api.h:
   * OrtTrainingGetInputOutputName
   * OrtTrainingGetInputOutputCount
* Moved isOrtEnvInitialized boolean to the wasm-core-impl and added a
method that references it

### Motivation and Context
* Adding web bindings for training

#### Related work
* #16521 allowed for training artifacts to be built
* #17333 added interfaces for training
* #17474 allows for training package to be built + adds training backend
to web package **[MUST BE MERGED IN BEFORE THIS ONE]**

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
2023-10-26 09:22:10 -07:00
satyajandhyala
f3cfe08c42
[JS/Web] Enabled 1d spacial input to GlobalAveragePool (#17973)
### Description
Enable one-dim special  input to GlobalAveragePoll input



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Currently only 2D input is supported.
2023-10-23 16:02:50 -07:00
Jiajia Qin
8a12b2cea6
[js/webgpu] Fix the transpose error when dims > 4D (#18027)
### Description
<!-- Describe your changes. -->
Currently, the uniform support has bugs when dims rank is larger than 4.
See https://github.com/microsoft/onnxruntime/issues/17860 item 1.
So this PR only enables shapes uniforms when shape rank is <= 4 for
transpose. Otherwise, below compilation errors are thrown:
```
1 error(s) generated while compiling the shader:
:3:50 error: uniform storage requires that array elements are aligned to 16 bytes, but array element of type 'u32' has a stride of 4 bytes. Consider using a vector or struct as the element type instead.
      struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> };
                                                 ^^^^^^^^^^^^^

:3:7 note: see layout of struct:
/*            align(4) size(84) */ struct Uniforms {
/* offset( 0) align(4) size( 4) */   output_size : u32;
/* offset( 4) align(4) size(20) */   a_shape : array<u32, 5>;
/* offset(24) align(4) size(20) */   a_strides : array<u32, 5>;
/* offset(44) align(4) size(20) */   output_shape : array<u32, 5>;
/* offset(64) align(4) size(20) */   output_strides : array<u32, 5>;
/*                              */ };
      struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> };
      ^^^^^^

:4:42 note: 'Uniforms' used in address space 'uniform' here
      @group(0) @binding(2) var<uniform> uniforms: Uniforms;
                                         ^^^^^^^^
```
2023-10-23 11:02:19 -07:00
liqun Fu
020824ed50
Update ONNX to 1.15.0rc1 (#17914) 2023-10-20 15:08:25 -07:00
Arthur Islamov
22947109f2
[js/web] FP16 LayerNorm, InstanceNorm, SkipLayerNorm (#17630)
### Description
This PR includes fixes for Norm operations to support FP16 and also some
optimizations to use vec2/vec4 if possible
2023-10-18 10:47:41 -07:00
dependabot[bot]
f9694c5b97
Bump @babel/traverse from 7.18.5 to 7.23.2 in /js/react_native/e2e (#17963) 2023-10-18 05:25:27 +00:00
dependabot[bot]
cf974f0905
Bump @babel/traverse from 7.18.2 to 7.23.2 in /js/react_native (#17962) 2023-10-16 18:24:01 +00:00
Yulong Wang
ad817d0efa
[js/web] optimize tsc for web: split out "npm prepare" (#17955)
### Description
optimize tsc for web: split out "npm prepare"
2023-10-16 09:04:54 -07:00
Caroline Zhu
c373a808a2
Add "glue" between training WASM artifacts and training web (#17474)
### Description
* follows the packaging approach according to the design document
    * adds `ENABLE_TRAINING` boolean flag to `BUILD_DEFS`
    * modifies `package.json` to include training submodule
* modifies build script to handle, validate, and minimize training WASM
artifacts
* adds the binding for the new backend with training enabled & the new
training artifacts
    * adds training backend
    * edits `index.ts` to use training backend depending on `BUILD_DEFS`
    * edits `wasm-factory.ts` to use the training artifacts if necessary

### Motivation and Context
* we are in the process of adding web bindings to enable training. 
* Adding the "glue" to allow onnxruntime-web to use the training WASM
artifacts is required for this work.
* Since BUILD_DEFS is defined and used at build time, I thought that it
made sense to bundle the changes to building in the same PR.
#### Related work
* #16521 allowed for training artifacts to be built
* #17333 must be merged in before this one

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-10-12 11:16:56 -07:00
Yulong Wang
d532645bed
[js/webgpu] revise uniform support (#17871)
### Description
<!-- Describe your changes. -->

work for items (2) and (3) in #17860
2023-10-11 16:41:46 -07:00
Yulong Wang
a441a71e8e
[js/web] support different export format for ort-web (#17878)
### Description
support different export format for ort-web.
2023-10-11 09:38:51 -07:00
Yulong Wang
5228332c9f
[js] upgrade JS shared dev dependencies (#17831)
### Description
upgrade JS shared dev dependencies.

- webpack: removed
- eslint: upgrade to latest.
   - eslint config upgraded to compatible with latest version
- typescript upgrade to v5
   - update module "CommonJS" to "Node16" in tsconfig
- update deprecated config "importsNotUsedAsValues" to
"verbatimModuleSyntax"
- remove webpack bundles in onnxruntime-common
2023-10-10 17:44:39 -07:00
Yulong Wang
c6f1a1ce69
update build_jsep.bat to add release build flags (#17471)
### Description
flags `--enable_wasm_api_exception_catching --disable_rtti` are used in
release build, so fix the build_jsep.bat script to make it more
consistent with CI.
2023-10-10 17:38:35 -07:00
Yulong Wang
d9b9c5a537
[js/webgpu] support using uniform buffer (#17803)
### Description
support using uniform buffer.

This PR allows to use uniform buffer in shader program, so that some
runtime information (eg. input/output shape) is no longer need to be
hardcoded into shader code.

There are 2 commits in this PR:
-
[667f31c](667f31c83d):
framework changes to support uniform buffer, as well as updates in
program manager, gpu data manager and indices helper.
-
[09e1d2a](09e1d2ad1d):
an example change for operator `Transpose` to use input's rank-only
instead of dims as shader key. With this change, model mobilenetv2-12
shader compile times dropped from 71 to 52.
2023-10-10 00:31:12 -07:00
Chi Lo
569876fb16
[TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617)
Two major modifications of this PR:

1. Refactor OrtTensorRTProviderOptions initialization and make it easy
to add new field.
2. Make Python API capable of using TensorRT plugins by adding new
Python binding api `register_tensorrt_plugins_as_custom_ops`. (It needs
to register ep's custom op domain before model load. For C++ API, it's
slightly different, when calling
SessionOptionsAppendExecutionProvider_TensorRT_XX, it appends cutom op
domain to session option. Later ORT can register custom op domain from
session option before model loading)
2023-10-06 14:12:20 -07:00
Yulong Wang
6ea493571e
[js/web] use esbuild to accelerate bundle build (#17745)
### Description

Use esbuild to accelerate bundle build.

This change uses esbuild to replace webpack for onnxruntime-web. Bundle
build time reduced from ~20sec to ~0.6sec on my windows dev box.

A few changes applied:
- import nodejs modules using "node:" prefix
- remove enum declaration inside namespace (EncoderUsage)
- use "fs/promise" to replace the old promisify from "util"
- separate ort-web and test-runner. Previously they are bundled
together, now they are built into 2 files.
- optimize karma runner launch time
- remove unnecessary sourcemap preprocessor. sourcemaps are handled
inside esbuild
- remove unnecessary proxies (because ort-web and test-runner are
separated now, the path are correctly inferred)
    - remove file watcher for test data
- optimize special handling as esbuild plugins:
- polyfill dummy imports for node.js modules when targetting browser.
    - load as content string for ort-wasm-*.worker.js
    - load as content string for ./proxy-worker/main.ts
- a source patch to ort-wasm*-threaded*.js (see details in comments in
code)
- updated debug configurations for sourcemap mapping to ensure
out-of-box good dev experience
2023-10-06 13:37:37 -07:00
Jiajia Qin
db3901ab97
[js/webgpu] Enable the NCHW ConvMatMul path (#17717)
1) Enable pointwise NCHW conv2d by MatMul.
2) Enable non-pointwise NCHW conv2d by convMatMul.
3) Fix bug when `sameSize` is true

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-10-05 00:26:01 -07:00
Xu Xing
992f3e4609
[js/webgpu] Support where (#17544)
Supported type: float. int32_t, uint32_t, bool.
Case where_broadcast.jsonc is not enabled due to
https://github.com/microsoft/onnxruntime/issues/17405.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-10-03 14:28:21 -07:00
Guenther Schmuelling
f8a8452a6b
[js/webgpu] fix pad operator (#17775)
fix pad operator
2023-10-03 13:39:50 -07:00