onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Jiajia Qin	64dacc2892	[js/webgpu] Add BatchNormalization Op (#18468 ) ### Description This PR adds `BatchNormalization` with `float` support. Some TODOs: 1. Inputs that do not all share the same data type. For example, x/y is float16, but bias/scale is float32 or double. 2. Training mode support. Many models use `BatchNormalization` ops. However, because the op was missing in JSEP, all of them ran on the CPU, which resulted in very poor performance. With this PR, the densenet-9 model takes 20.29 ms, down from 250.69 ms.	2023-11-22 15:58:06 -08:00
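For readers unfamiliar with the operator, here is a minimal CPU reference sketch of inference-mode `BatchNormalization` as defined by ONNX: `y = scale * (x - mean) / sqrt(var + epsilon) + bias`, applied per channel. It is not the WebGPU kernel from the PR. The function name, the flattened `[N, C, spatial]` layout, and the all-float32 parameters are assumptions made for illustration.

```typescript
// Hypothetical reference implementation (not the JSEP/WebGPU shader).
// Assumes x is a flattened [n, c, spatial] tensor and all parameters are float32.
function batchNormInference(
  x: Float32Array,
  n: number,
  c: number,
  spatial: number,
  scale: Float32Array,
  bias: Float32Array,
  mean: Float32Array,
  variance: Float32Array,
  epsilon = 1e-5, // ONNX default for the epsilon attribute
): Float32Array {
  const y = new Float32Array(x.length);
  for (let b = 0; b < n; b++) {
    for (let ch = 0; ch < c; ch++) {
      // Per-channel statistics are shared by every element of the channel.
      const invStd = 1 / Math.sqrt(variance[ch] + epsilon);
      const base = (b * c + ch) * spatial;
      for (let i = 0; i < spatial; i++) {
        y[base + i] = (x[base + i] - mean[ch]) * invStd * scale[ch] + bias[ch];
      }
    }
  }
  return y;
}
```

Training mode (the second TODO in the commit) would additionally compute the mean and variance from the batch, which this sketch does not attempt.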
Xu Xing	fa106942a7	[js/webgpu] Refactor matmul conv to support uniforms for matmul (#18452 ) This change refactored matmul/conv related programs to support shape uniforms. Currently only matmul shape uniforms are fully enabled. TODOs: add input dependencies for conv related programs, turn clipMax and clipMin to uniforms.	2023-11-22 14:42:55 -08:00
satyajandhyala	841f7ed3e0	[JS/Web] Added uniform to Expand op. (#18558 ) ### Description Added uniforms to the Expand operator kernel. ### Motivation and Context Improve performance.	2023-11-22 14:14:24 -08:00
Arthur Islamov	1c555c5fc1	[JS/Web] Resize & BiasSplitGelu fp16 support (#18536 ) ### Description Resize and BiasSplitGelu fp16 support on WebGPU	2023-11-22 12:12:07 -08:00
Yulong Wang	c7fd930330	[js/web] unify resolve rules for "Clip" (#18527 ) ### Description It was a mistake to use 2 different names for Clip operator in op-resolve-rules.ts for different opset. An optimized implementation can handle both cases (opset < 11 and opset >=11). Remove "ClipV10" as an entry from the table.	2023-11-20 23:18:06 -08:00
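The two cases mentioned above differ in where the bounds come from: in ONNX, `Clip` before opset 11 carries `min`/`max` as attributes, while from opset 11 on they are optional inputs. A single implementation can resolve both, roughly as in the sketch below. The `ClipNodeSketch` type and function names are hypothetical and are not taken from op-resolve-rules.ts.

```typescript
// Largest finite float32 value; ONNX uses the float limits as default bounds.
const FLOAT32_MAX = 3.4028234663852886e38;

// Hypothetical, simplified node description for illustration only.
interface ClipNodeSketch {
  opset: number;
  attributes: { min?: number; max?: number }; // used when opset < 11
  minInput?: number; // optional scalar input, used when opset >= 11
  maxInput?: number; // optional scalar input, used when opset >= 11
}

// Pick the bounds from attributes or inputs depending on the opset.
function resolveClipRange(node: ClipNodeSketch): [number, number] {
  const min = node.opset < 11 ? node.attributes.min : node.minInput;
  const max = node.opset < 11 ? node.attributes.max : node.maxInput;
  return [min ?? -FLOAT32_MAX, max ?? FLOAT32_MAX];
}

// One clamp routine serves both opset ranges.
function clip(data: Float32Array, node: ClipNodeSketch): Float32Array {
  const [lo, hi] = resolveClipRange(node);
  return data.map((v) => Math.min(Math.max(v, lo), hi));
}
```

With the source of the bounds isolated in one function, a single table entry can serve every opset, which is the simplification the commit describes.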
Jiajia Qin	abdf8b7c3f	[js/webgpu] Optimize broadcast binary. (#18185 ) ### Description Currently, the binary algorithms are divided into a vectorized one (efficient) and a non-vectorized one (less efficient). The following situations go to the vectorized one: 1) A or B's shape length is 1. 2) The shared dimensions length of A and B is divisible by 4. 3) A and B have the same shape. This PR adds one more situation that goes to the vectorized algorithm: 4) A or B's last dimension is divisible by 4. With this change, the aggregate time of Add in sam-b-encoder drops to 309.65 ms from 409.12 ms on Intel ADL.	2023-11-20 16:52:17 -08:00
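The four conditions listed in this commit can be sketched as a single predicate. This is an interpretation, not the code from the PR. It assumes that "shape length is 1" means the tensor has exactly one element, and that "shared dimensions length" means the product of the trailing dimensions on which A and B agree. All names are hypothetical.

```typescript
type Shape = readonly number[];

// Total number of elements in a shape.
const shapeSize = (s: Shape): number => s.reduce((p, d) => p * d, 1);

// Product of the trailing dimensions on which both shapes agree.
function sharedTrailingSize(a: Shape, b: Shape): number {
  let shared = 1;
  for (let i = 1; i <= Math.min(a.length, b.length); i++) {
    const da = a[a.length - i];
    const db = b[b.length - i];
    if (da !== db) break;
    shared *= da;
  }
  return shared;
}

// True when, under the assumptions above, the vectorized path would apply.
function canVectorize(a: Shape, b: Shape): boolean {
  const oneElement = shapeSize(a) === 1 || shapeSize(b) === 1; // condition 1
  const sharedDivisibleBy4 = sharedTrailingSize(a, b) % 4 === 0; // condition 2
  const sameShape = a.length === b.length && a.every((d, i) => d === b[i]); // condition 3
  const lastDimDivisibleBy4 =
    (a[a.length - 1] ?? 1) % 4 === 0 || (b[b.length - 1] ?? 1) % 4 === 0; // condition 4 (new)
  return oneElement || sharedDivisibleBy4 || sameShape || lastDimDivisibleBy4;
}
```

For example, under these assumptions `[5, 4]` combined with `[5, 1]` qualifies only through the new fourth condition: the shapes differ, neither has one element, and they share no trailing dimension.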
Yulong Wang	247ce21859	[js] optimize eslint config (#18460 ) ### Description optimize eslint config to: - set parserOptions.project to `true` to allow @typescript-eslint/parser to find the nearest tsconfig.json file to each source file. This avoids parsing extra files and may help to: - reduce the possibility of seeing OOM or stack overflow with "npm run lint" - speed up processing - enforce rule "no-underscore-dangle" with a list of exceptions.	2023-11-20 12:00:56 -08:00
Yulong Wang	34c5424456	[js] update a few packages (#18499 ) ### Description [js] update a few packages - update semver - update reference of onnx_proto to local folder in order to upgrade protobufjs@7.2.4 Resolve AB#18513	2023-11-17 22:40:51 -08:00
Arthur Islamov	fac3e33da5	[js/web] JSEP Attention & MultiHeadAttention (#17742 ) ### Description This is a narrow implementation of Attention/MultiHeadAttention as it does not support: a. inputs 5-7 for MHA b. packed QKV/KV c. past/present d. attention mask But it works well for StableDiffusion and can be extended later. It reduces VRAM usage as it combines many ops into few I've updated demo here https://islamov.ai/stable-diffusion-webgpu/ it takes ~13sec for 1 image with 20 steps on RTX3090Ti and about 25s on M1 Pro VRAM usage is about 8gb if you don't use img2img Going to focus on SDXL now --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-11-17 12:23:52 -08:00
satyajandhyala	b291b20fa0	[JS/Web]Added uniforms support to Slice op. (#18422 ) ### Description Support uniforms in Slice op ### Motivation and Context Improve performance	2023-11-16 09:44:13 -08:00
Yulong Wang	586f06f5a1	[js/web] set noUnusedParameters to true and fix a few bugs (#18404 ) ### Description - set tsconfig "noUnusedParameters" to `true` and fix a few bugs discovered by TypeScript. How unused parameters are fixed: - for most code (webgl), add an underscore as prefix, which is the standard ignore pattern for the TypeScript check. - remove the unused parameter from the function and modify the corresponding function calls (jsep) - fix a bug in ArgMinMax: these 2 operators do not have more than one input, so `createArgMinMaxAttributesFromInputs()` is removed. - add proxy main.ts to the TypeScript check and fix a bug in parameter passing - fix the `run()` function call and add a typecheck fix (hack)	2023-11-15 09:16:29 -08:00
dependabot[bot]	5aeed62630	Bump axios from 1.3.4 to 1.6.1 in /js/node (#18400 ) Bumps [axios](https://github.com/axios/axios) from 1.3.4 to 1.6.1. Release notes sourced from [axios's releases](https://github.com/axios/axios/releases): - v1.6.1: fixed content-type header normalization for non-standard browser environments (#6056); fixed emulated browser detection in node.js environment (#6055). - v1.6.0: fixed CSRF vulnerability CVE-2023-45857 (#6028), a critical vulnerability fix, see https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459; fixed lookup function decorator to work properly in node v20 (#6011); fix AxiosHeaders types (#5931). - v1.5.1: improved adapters loading logic to have clear error messages (#5919); fixed automatic addition of the `Content-Type` header for FormData in non-browser environments (#5917); allow `content-encoding` header to handle case-insensitive values (#5890, #5892); removed duplicated code in types. Additional commits viewable in [compare view](https://github.com/axios/axios/compare/v1.3.4...v1.6.1). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-14 00:38:00 -08:00
Xu Xing	949ac4b7ce	[js/webgpu] Support uniforms for gather (#18312 )	2023-11-13 11:24:34 -08:00
Wanming Lin	73ed34ac4b	[WebNN EP] Support numThreads option for WebNN CPU device (#18054 )	2023-11-12 16:45:10 -08:00
Xu Xing	0c8c0014f6	[js/webgpu] Use builtin num_workgroups to fix shader key conflict (#18387 ) This fixes conformance failure of tinyyolov2-8 and potential shader key conflict issues.	2023-11-10 17:37:45 -08:00
Yulong Wang	6b0c97b43f	[js/web] fix typescript type check (#18343 ) ### Description This PR fixes the TypeScript type check. Previously, when esbuild replaced webpack (#17745), the TypeScript type check was disabled. This caused a few TypeScript type errors to be checked in to the code base. This PR fixes the following: - Use "Node16" as default "module" value in tsconfig.json, because in TypeScript v5, `(module == "ES2015" && moduleResolution == "Node16")` is an invalid combination. - Set `noUnusedParameters` to true as default. In web, override it to false because a lot of code needs to be updated (a follow-up PR will do this) - set correct project file for 'web/lib/*/.ts' for ESLint (otherwise WebGPU types are not populated correctly) - fix type error in file js/web/lib/wasm/jsep/webgpu/program-manager.ts - upgrade "@webgpu/types" to latest to fix type error in file js/web/lib/wasm/jsep/backend-webgpu.ts - add package script "prebuild" for web to run tsc type check - add type check in CI yml file	2023-11-10 16:03:38 -08:00
Xu Xing	8dba6efd61	[js/webgpu] Add uniforms support to concat op (#18238 )	2023-11-10 13:46:03 -08:00
Jiajia Qin	28c23aed04	[js/webgpu] Fix conv2d with activation (#18388 ) ### Description Fix #18297 With PR #17766, conv2d activation in mobilenetv2-12 will not be empty. However, activation is not supported yet in [biasActivationSnippet](https://github.com/microsoft/onnxruntime/blob/main/js/web/lib/wasm/jsep/webgpu/ops/3rd-party/activation_util.ts#L48C14-L48C36). This PR unifies all places to use [getActivationSnippet](https://github.com/microsoft/onnxruntime/blob/main/js/web/lib/wasm/jsep/webgpu/ops/fuse-utils.ts#L13) to fix this issue.	2023-11-10 12:54:35 -08:00
Xu Xing	dd1bb760eb	[js/webgpu] Fix scalar uniform (#18318 )	2023-11-10 10:12:22 -08:00
Xu Xing	829d802337	[js/webgpu] Support uniform for softmax (#18345 )	2023-11-09 11:19:23 -08:00
Guenther Schmuelling	25fbc2b0ab	fix fused relu activation (#18303 )	2023-11-09 08:18:21 -08:00
Yulong Wang	10df847baf	[js] fix linter out-of-memory issue (#18307 ) ### Description fix linter out-of-memory issue by ignoring file pattern 'test/data/'.	2023-11-07 17:12:22 -08:00
Jiajia Qin	606356d0b1	[js/webgpu] Simplify the Resize shader when noScale is true (#18321 ) ### Description For Resize, when `noScale` is true, the shader can become very simple and no longer depends on `attributes.mode`. So we should remove those parts of the shader code for simplification. This PR also fixes #18311, since `noScale` is true everywhere in that model. However, #18311 also exposes a bug in the Resize implementation for `linear` mode. It seems that the current implementation always treats the input as either a 2D or 4D tensor, but the actual input is a 3D tensor, which is why the shader compilation fails. We may need to fix it in a separate PR.	2023-11-07 12:54:20 -08:00
satyajandhyala	a16d528399	[JS/Web] Added Uniforms support to binary ops. (#18260 ) ### Description Added Uniform support to binary ops ### Motivation and Context To improve performance	2023-11-07 08:41:52 -08:00
satyajandhyala	e207060ac9	[JS/Web] Added Uniforms support to unary ops. (#18223 ) ### Description Added uniforms support to unary ops. ### Motivation and Context Improve performance	2023-11-03 09:30:54 -07:00
Scott McKay	4f2096be38	Update XNNPACK to latest version (#18038 ) ### Description Update XNNPACK to latest version - adds fp16 kernels and various other improvements - requires pthreadpool update as well Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API - 'setup' is split into 'reshape' and 'setup' - some ops use a workspace buffer - copied workspace allocation from XNNPACK unit test code - some suffixes changed Added wrapper for XNNPACK caches to base XNNPACK EP kernel - simplifies usage - XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API - we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build. - using XNNPACK internals would also mean we would not be able to support using a pre-built XNNPACK package - not an issue currently Fixed opset registration for internal NHWC domain - was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset - a number of other places needed updating once this issue was fixed Remove support for NCHW Resize from XNNPACK EP so it's NHWC only - we only supported NCHW for fp32, - doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization) - unclear if that complexity provides any benefit. can add back if required by production scenario ### Motivation and Context We're looking at enabling fp16 support for CoreML and NNAPI. If we do that we need a good fallback story if the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that. NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That can be done as required in separate EPs and should be relatively simple to do.	2023-11-03 09:04:28 -07:00
xhcao	8d48d3e9cc	[js/web] optimize reduce related operators (#17957 )	2023-11-02 12:51:48 -07:00
Caroline Zhu	e3b043ba17	[js/web/training] runTrainStep implementation (#18006 ) ### Description * based on design document & following InferenceSession's run implementation, implemented TrainingSession.runTrainStep ### Motivation and Context * Adding web bindings for training #### Related work * #16521 allowed for training artifacts to be built * #17333 added interfaces for training * #17474 allowed for training package to be built + added training backend to web package * #17891 implementation for createTrainingSession on the TypeScript side [SHOULD BE MERGED IN BEFORE THIS PR] --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-11-02 08:32:50 -07:00
satyajandhyala	a2e9ba72d5	[JS/Web]Added FusedConv. (#17766 ) ### Description Added FusedConv and FusedConvTranspose ### Motivation and Context Improve performance	2023-11-01 15:34:51 -07:00
Jiajia Qin	785e2b1eae	[js/webgpu] Optimize softmax by vector (#18153 ) ### Description This PR lets each `softmax` thread output the maximum supported number of components (a vector) instead of a single scalar. Softmax with input[0]: [12,4096,4096] takes 47.86 ms, down from 55.11 ms	2023-10-30 16:05:35 -07:00
Yulong Wang	9bba990871	[js/web] fix a few package consuming problems (#18109 ) ### Description This PR tries to fix a part of the NPM package consuming problems for onnxruntime-web (ES module) as described in #10913: - reduce the package size to fit the 150MB restriction in jsdelivr, by removing dev build targets for uncommon exports - add default export to support `import ort from 'onnxruntime-web';` (currently only `import * as ort from 'onnxruntime-web';` is supported)	2023-10-30 08:11:43 -07:00
Yang Gu	52f4968359	[js/webgpu] Change timestamp-query-in-passes to timestamp-query (#18108 ) Timestamp-query has broader support than timestamp-query-in-passes across platforms, including macOS. Note that to enable timestamp-query, you still need to add the switch "--enable-dawn-features=allow_unsafe_apis" to Chrome. By default, the lowest 16 bits are masked with 0 (a granularity of about 0.1 ms) for privacy. To get the highest precision, you need to add another switch, "--enable-webgpu-developer-features".	2023-10-26 16:33:03 -07:00
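To make the granularity figure concrete: WebGPU timestamp query values are expressed in nanoseconds, so zeroing the lowest 16 bits rounds each value down to a multiple of 2^16 ns = 65,536 ns, roughly 0.066 ms, which matches the "about 0.1 ms" order of magnitude quoted in the commit. The helper below is a hypothetical illustration of that arithmetic, not browser code. It uses plain numbers, so it is only exact for values below 2^53.

```typescript
// Number of low bits zeroed by default, per the commit message.
const MASKED_BITS = 16;
const STEP_NS = 2 ** MASKED_BITS; // 65536 ns

// Round a nanosecond timestamp down to the masked granularity.
function maskTimestampNs(ns: number): number {
  return Math.floor(ns / STEP_NS) * STEP_NS;
}

// Resulting granularity in milliseconds.
const granularityMs = STEP_NS / 1e6; // 0.065536
```

Two events closer together than one step can therefore report the same timestamp unless the developer-features switch is enabled.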
Caroline Zhu	64de71c5e2	[js/web/training] Add CreateTrainingSession (#17891 ) ### Description * Adds TrainingSession.create() functionality following the web bindings for training design doc * Added 2 new training APIs to wasm/api.h: * OrtTrainingGetInputOutputName * OrtTrainingGetInputOutputCount * Moved isOrtEnvInitialized boolean to the wasm-core-impl and added a method that references it ### Motivation and Context * Adding web bindings for training #### Related work * #16521 allowed for training artifacts to be built * #17333 added interfaces for training * #17474 allows for training package to be built + adds training backend to web package [MUST BE MERGED IN BEFORE THIS ONE] --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-10-26 09:22:10 -07:00
satyajandhyala	f3cfe08c42	[JS/Web] Enabled 1D spatial input to GlobalAveragePool (#17973 ) ### Description Enable one-dimensional spatial input for GlobalAveragePool ### Motivation and Context Currently only 2D input is supported.	2023-10-23 16:02:50 -07:00
Jiajia Qin	8a12b2cea6	[js/webgpu] Fix the transpose error when dims > 4D (#18027 ) ### Description Currently, the uniform support has bugs when the dims rank is larger than 4. See https://github.com/microsoft/onnxruntime/issues/17860 item 1. So this PR only enables shape uniforms when the shape rank is <= 4 for transpose. Otherwise, the compilation errors below are thrown: ``` 1 error(s) generated while compiling the shader: :3:50 error: uniform storage requires that array elements are aligned to 16 bytes, but array element of type 'u32' has a stride of 4 bytes. Consider using a vector or struct as the element type instead. struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> }; ^^^^^^^^^^^^^ :3:7 note: see layout of struct: /* align(4) size(84) */ struct Uniforms { /* offset( 0) align(4) size( 4) */ output_size : u32; /* offset( 4) align(4) size(20) */ a_shape : array<u32, 5>; /* offset(24) align(4) size(20) */ a_strides : array<u32, 5>; /* offset(44) align(4) size(20) */ output_shape : array<u32, 5>; /* offset(64) align(4) size(20) */ output_strides : array<u32, 5>; /* */ }; struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> }; ^^^^^^ :4:42 note: 'Uniforms' used in address space 'uniform' here @group(0) @binding(2) var<uniform> uniforms: Uniforms; ^^^^^^^^ ```	2023-10-23 11:02:19 -07:00
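The error arises because WGSL requires array elements in the `uniform` address space to have a stride that is a multiple of 16 bytes, and `array<u32, 5>` has a stride of 4. The PR above sidesteps this by not using shape uniforms for rank > 4. A different workaround, following the compiler's own hint to use a vector element type, is to pack the dimensions into `vec4<u32>` elements. The sketch below shows that alternative. The helper names are hypothetical, and this is not claimed to be what onnxruntime does.

```typescript
// WGSL type that can legally hold `rank` u32 values in a uniform struct.
function uniformShapeType(rank: number): string {
  if (!Number.isInteger(rank) || rank < 1) throw new Error('rank must be a positive integer');
  if (rank === 1) return 'u32';
  if (rank <= 4) return `vec${rank}<u32>`;
  // vec4<u32> has a 16-byte stride, which satisfies the uniform layout rule.
  return `array<vec4<u32>, ${Math.ceil(rank / 4)}>`;
}

// Host-side data for the type above: pad to a multiple of 4 when rank > 4.
function packShape(dims: readonly number[]): Uint32Array {
  const length = dims.length > 4 ? Math.ceil(dims.length / 4) * 4 : dims.length;
  const packed = new Uint32Array(length); // padding entries stay 0
  packed.set(dims);
  return packed;
}

// WGSL expression that reads dimension `i` from a uniform named `name`.
function shapeAccessor(name: string, rank: number, i: number): string {
  if (rank === 1) return name;
  if (rank <= 4) return `${name}[${i}]`;
  return `${name}[${Math.floor(i / 4)}][${i % 4}]`;
}
```

For a rank-5 shape this yields `array<vec4<u32>, 2>`, at the cost of three padding words per shape and a two-level index in the shader.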
liqun Fu	020824ed50	Update ONNX to 1.15.0rc1 (#17914 )	2023-10-20 15:08:25 -07:00
Arthur Islamov	22947109f2	[js/web] FP16 LayerNorm, InstanceNorm, SkipLayerNorm (#17630 ) ### Description This PR includes fixes for Norm operations to support FP16 and also some optimizations to use vec2/vec4 if possible	2023-10-18 10:47:41 -07:00
dependabot[bot]	f9694c5b97	Bump @babel/traverse from 7.18.5 to 7.23.2 in /js/react_native/e2e (#17963 )	2023-10-18 05:25:27 +00:00
dependabot[bot]	cf974f0905	Bump @babel/traverse from 7.18.2 to 7.23.2 in /js/react_native (#17962 )	2023-10-16 18:24:01 +00:00
Yulong Wang	ad817d0efa	[js/web] optimize tsc for web: split out "npm prepare" (#17955 ) ### Description optimize tsc for web: split out "npm prepare"	2023-10-16 09:04:54 -07:00
Caroline Zhu	c373a808a2	Add "glue" between training WASM artifacts and training web (#17474 ) ### Description * follows the packaging approach according to the design document * adds `ENABLE_TRAINING` boolean flag to `BUILD_DEFS` * modifies `package.json` to include training submodule * modifies build script to handle, validate, and minimize training WASM artifacts * adds the binding for the new backend with training enabled & the new training artifacts * adds training backend * edits `index.ts` to use training backend depending on `BUILD_DEFS` * edits `wasm-factory.ts` to use the training artifacts if necessary ### Motivation and Context * we are in the process of adding web bindings to enable training. * Adding the "glue" to allow onnxruntime-web to use the training WASM artifacts is required for this work. * Since BUILD_DEFS is defined and used at build time, I thought that it made sense to bundle the changes to building in the same PR. #### Related work * #16521 allowed for training artifacts to be built * #17333 must be merged in before this one --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-12 11:16:56 -07:00
Yulong Wang	d532645bed	[js/webgpu] revise uniform support (#17871 ) ### Description work for items (2) and (3) in #17860	2023-10-11 16:41:46 -07:00
Yulong Wang	a441a71e8e	[js/web] support different export format for ort-web (#17878 ) ### Description support different export format for ort-web.	2023-10-11 09:38:51 -07:00
Yulong Wang	5228332c9f	[js] upgrade JS shared dev dependencies (#17831 ) ### Description upgrade JS shared dev dependencies. - webpack: removed - eslint: upgrade to latest. - eslint config upgraded to compatible with latest version - typescript upgrade to v5 - update module "CommonJS" to "Node16" in tsconfig - update deprecated config "importsNotUsedAsValues" to "verbatimModuleSyntax" - remove webpack bundles in onnxruntime-common	2023-10-10 17:44:39 -07:00
Yulong Wang	c6f1a1ce69	update build_jsep.bat to add release build flags (#17471 ) ### Description flags `--enable_wasm_api_exception_catching --disable_rtti` are used in release build, so fix the build_jsep.bat script to make it more consistent with CI.	2023-10-10 17:38:35 -07:00
Yulong Wang	d9b9c5a537	[js/webgpu] support using uniform buffer (#17803 ) ### Description Support using a uniform buffer. This PR allows a shader program to use a uniform buffer, so that some runtime information (e.g. input/output shape) no longer needs to be hardcoded into the shader code. There are 2 commits in this PR: - [667f31c](`667f31c83d`): framework changes to support uniform buffer, as well as updates in program manager, gpu data manager and indices helper. - [09e1d2a](`09e1d2ad1d`): an example change for operator `Transpose` to use the input's rank only, instead of its dims, as the shader key. With this change, model mobilenetv2-12 shader compile times dropped from 71 to 52.	2023-10-10 00:31:12 -07:00
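A minimal sketch of why keying shaders by rank rather than by full dims lets more inputs share one compiled shader. The helper names `shaderKeyByDims`, `shaderKeyByRank` and `countDistinctShaders` are hypothetical and are not taken from onnxruntime-web; the real framework may build its keys differently.

```js
// Hypothetical cache-key helpers; not the actual onnxruntime-web implementation.

// Keying by full dims: every distinct input shape needs its own shader.
function shaderKeyByDims(opName, dims) {
  return `${opName};dims=${dims.join('x')}`;
}

// Keying by rank only: the concrete shape is supplied at run time
// (for example through a uniform buffer), so same-rank inputs can share a shader.
function shaderKeyByRank(opName, dims) {
  return `${opName};rank=${dims.length}`;
}

// Count how many distinct shaders a list of input shapes would require.
function countDistinctShaders(keyFn, opName, shapes) {
  return new Set(shapes.map((dims) => keyFn(opName, dims))).size;
}

const transposeInputs = [[1, 3, 224, 224], [1, 32, 112, 112], [1, 64, 56, 56], [1, 1280]];
console.log(countDistinctShaders(shaderKeyByDims, 'Transpose', transposeInputs)); // 4
console.log(countDistinctShaders(shaderKeyByRank, 'Transpose', transposeInputs)); // 2
```

The shapes above are made up for illustration; only the ratio between the two counts matters.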
Chi Lo	569876fb16	[TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617 ) Two major modifications of this PR: 1. Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field. 2. Make Python API capable of using TensorRT plugins by adding new Python binding api `register_tensorrt_plugins_as_custom_ops`. (It needs to register ep's custom op domain before model load. For C++ API, it's slightly different, when calling SessionOptionsAppendExecutionProvider_TensorRT_XX, it appends custom op domain to session option. Later ORT can register custom op domain from session option before model loading)	2023-10-06 14:12:20 -07:00
Yulong Wang	6ea493571e	[js/web] use esbuild to accelerate bundle build (#17745 ) ### Description Use esbuild to accelerate bundle build. This change uses esbuild to replace webpack for onnxruntime-web. Bundle build time reduced from ~20sec to ~0.6sec on my windows dev box. A few changes applied: - import nodejs modules using "node:" prefix - remove enum declaration inside namespace (EncoderUsage) - use "fs/promises" to replace the old promisify from "util" - separate ort-web and test-runner. Previously they were bundled together; now they are built into 2 files. - optimize karma runner launch time - remove unnecessary sourcemap preprocessor. sourcemaps are handled inside esbuild - remove unnecessary proxies (because ort-web and test-runner are separated now, the paths are correctly inferred) - remove file watcher for test data - optimize special handling as esbuild plugins: - polyfill dummy imports for node.js modules when targeting browser. - load as content string for ort-wasm-.worker.js - load as content string for ./proxy-worker/main.ts - a source patch to ort-wasm-threaded*.js (see details in comments in code) - updated debug configurations for sourcemap mapping to ensure out-of-box good dev experience	2023-10-06 13:37:37 -07:00
Jiajia Qin	db3901ab97	[js/webgpu] Enable the NCHW ConvMatMul path (#17717 ) 1) Enable pointwise NCHW conv2d by MatMul. 2) Enable non-pointwise NCHW conv2d by convMatMul. 3) Fix bug when `sameSize` is true --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-05 00:26:01 -07:00
Xu Xing	992f3e4609	[js/webgpu] Support where (#17544 ) Supported types: float, int32_t, uint32_t, bool. Case where_broadcast.jsonc is not enabled due to https://github.com/microsoft/onnxruntime/issues/17405. --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-03 14:28:21 -07:00
Guenther Schmuelling	f8a8452a6b	[js/webgpu] fix pad operator (#17775 ) fix pad operator	2023-10-03 13:39:50 -07:00
Arthur Islamov	d0519a7603	[js/web] BiasSplitGelu and BiasAdd kernels (#17161 ) ### Description Two contrib kernels that are supposed to speed up StableDiffusion according to this doc https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md However, there is no noticeable effect on speed or memory consumption. So I guess the only way to make it faster is to implement MultiHeadAttention, but I'm not capable of doing that right now. So I'll focus on existing PRs and on finding the JSEP kernel that produces incorrect results. It should be one of the old ones (I suspect Conv or ConvTranspose), as SD was not generating images correctly on webgpu since I started working on it. I hoped someone else would fix that by the time I finish with kernels/optimizations 😅 --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-03 12:20:20 -07:00
Yulong Wang	451c02543a	[js/webgpu] allow specify preferredLayout (#17756 ) ### Description Allow WebGPU backend to specify `preferredLayout`. Default is NHWC. ```js const options = {executionProviders: [{name:'webgpu', preferredLayout: 'NCHW'}]}; sess1 = await ort.InferenceSession.create('./mobilenetv2-12.onnx', options); ``` ### Motivation and Context - implement @qjia7's requirement for an easier way to do performance comparison between NCHW vs NHWC. - It's possible that NCHW does better on some models and NHWC on others. So offer user the capability to switch.	2023-10-02 21:25:12 -07:00
Scott McKay	ac4e726046	Add bytes model loading test to react native e2e (#17749 ) ### Description Update E2E test to also check InferenceSession.create with bytes. ### Motivation and Context Add tests to validate #17739	2023-10-02 12:25:28 +10:00
xhcao	0d60604638	[JS/WebGPU] support Range operator (#17233 ) The patch also introduces a method that copies data from GPU to CPU synchronously.	2023-09-30 02:05:32 -07:00
Arthur Islamov	a941dd583e	[js/web] FP16 Conv, ConvTranspose and MatMul (#17514 ) ### Description Another three ops for fp16 --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-09-30 00:00:23 -07:00
Caroline Zhu	6a5f469d44	Add training interfaces to js/common (#17333 ) ### Description Following the design document: * Added CreateTrainingSessionHandler to the Backend interface * All existing Backend implementations throw an error for the new method createTrainingSessionHandler * Created TrainingSession namespace, interface, and TrainingSessionFactory interface * Created TrainingSessionImpl class implementation As methods are implemented, the TrainingSession interface will be added to or modified. ### Motivation and Context Adding the public-facing interfaces to the onnxruntime-common package is one of the first steps to support ORT training for web bindings. --------- Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com>	2023-09-29 19:05:10 -07:00
Rachel Guo	e106b1eb8f	Fix react native load from Uint8Array buffer bug (#17739 ) ### Description Use `.buffer` of Uint8Array to get ArrayBuffer. TODO: Add E2E React Native test case to cover JS level testing to avoid future breakage. ### Motivation and Context #17732 Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-09-29 18:03:28 -07:00
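A small sketch of getting an `ArrayBuffer` out of a `Uint8Array` through `.buffer`. The helper name `toArrayBuffer` is hypothetical, and the handling of views that cover only part of a larger buffer is an assumption added here for illustration; the commit message does not say whether the actual fix deals with that case.

```js
// Return an ArrayBuffer holding exactly the bytes visible through `bytes`.
// Hypothetical helper; not taken from the onnxruntime-react-native sources.
function toArrayBuffer(bytes) {
  if (bytes.byteOffset === 0 && bytes.byteLength === bytes.buffer.byteLength) {
    // The view covers the whole buffer, so it can be handed over without a copy.
    return bytes.buffer;
  }
  // The view is a window into a larger buffer: copy only the visible range.
  return bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
}

const whole = new Uint8Array([1, 2, 3, 4]);
const part = whole.subarray(1, 3); // shares memory with `whole`, starts at offset 1

console.log(toArrayBuffer(whole) === whole.buffer); // true
console.log(toArrayBuffer(part).byteLength); // 2
console.log(part.buffer.byteLength); // 4, i.e. `.buffer` alone would expose all four bytes
```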
Yulong Wang	561aca97cf	[js/webgpu] support IO binding (#17480 ) <del> This PR is based on a few prerequisites PRs. They are listed as below: - #17465 - #17469 - #17470 - #17472 - #17473 - #17484 Please review the current change by only looking at commit e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later. </del> ### Description This PR introduces WebGPU IO binding. This new feature allows onnxruntime-web users to use tensors created from GPU as model input/output so that a model inferencing can be done without unnecessary data copy between CPU and GPU for model input/output. ### Examples An E2E demo/example is being worked on. Following is some simple demo with code snippet. Let's first check today how we do: ```js // STEP.1 - create an inference session: const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] }); // STEP.2 - create model input: (supposing myImageCpuData is a Float32Array) const feeds = { 'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3]) }; // STEP.3 - run model const myResults = await mySession.run(feeds); // STEP.4 - get output data const myData = myResults['output_image:0'].data; // Float32Array ``` #### for inputs (GPU tensor): Now, with IO binding, you can create a tensor from a GPU buffer, and feed it to the model: ```js // new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data) const feeds = { 'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] }) }; ``` ### for outputs (pre-allocated GPU tensor) you can also do that for output, if you know the output shape: ```js // new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object) const fetches = { 'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] }) }; // new STEP.3 - run model with 
pre-allocated output (fetches) const myResults = await mySession.run(feeds, fetches); ``` ### for outputs (specify location) if you do not know the output shape, you can specify the output location when creating the session: ```js // new STEP.1 - create an inference session with an option "preferredOutputLocation": const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'], preferredOutputLocation: "gpu-buffer" }); ``` if the model has multiple outputs, you can specify them separately: ```js // new STEP.1 - create an inference session with an option "preferredOutputLocation": const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'], preferredOutputLocation: { "output_image:0": "gpu-buffer" } }); ``` now you don't need to prepare the `fetches` object and onnxruntime-web will prepare the output data at the location that is specified. #### read data when you get the output tensor, you can: ```js // get the gpu buffer object: const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer // get the CPU data asynchronously const cpuData = await myOutputTensor.getData(); // get the CPU data asynchronously and release the underlying GPU resources const cpuData2 = await myOutputTensor.getData(true); // dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called. myOutputTensor.dispose(); ``` #### resource management JavaScript has GC so you don't need to worry about managing JavaScript objects. But there are 2 types of resources that are not managed by GC: - GPU buffers that are used in tensors - Underlying ORT native resources To simplify, most of the unmanaged resources are handled inside ORT web. But there are a few resources that need users to manage: - All external GPU resources, including GPU buffers inside all tensors created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. Users should manage those GPU buffers themselves. 
- When a session is created with `preferredOutputLocation` == "gpu-buffer" specified in session options, and the corresponding output is not pre-allocated, users need to call the output tensor's `dispose()` or `getData(true)` to manually release the underlying GPU buffers. - ORT internal errors (including providing a pre-allocated output tensor with wrong type/dims) will invalidate the whole wasm memory and are not recoverable. An exception is thrown in this situation.	2023-09-29 11:24:42 -07:00
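The resource-management rule above (call `dispose()` on outputs that ORT placed on the GPU) could be wrapped in a small helper. This is a sketch under assumptions: `releaseGpuOutputs` is a hypothetical name, the tensors are duck-typed objects that expose a `location` string and a `dispose()` method, and it is meant only for outputs that ORT allocated itself, not for tensors created from your own buffers.

```js
// Hypothetical helper; `results` maps output names to tensor-like objects.
// Assumes each tensor exposes `location` and `dispose()`.
function releaseGpuOutputs(results) {
  let released = 0;
  for (const tensor of Object.values(results)) {
    if (tensor.location === 'gpu-buffer') {
      tensor.dispose(); // the tensor must not be used after this call
      released++;
    }
  }
  return released;
}

// Mock tensors standing in for the result of `mySession.run(feeds)`.
const disposed = [];
const mockResults = {
  'output_image:0': {location: 'gpu-buffer', dispose: () => disposed.push('output_image:0')},
  'scores:0': {location: 'cpu', dispose: () => disposed.push('scores:0')},
};

console.log(releaseGpuOutputs(mockResults)); // 1
console.log(disposed); // [ 'output_image:0' ]
```

In real code the call would typically sit in a `finally` block after the GPU buffers have been consumed.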
satyajandhyala	b4fbc25b1f	[JS/Web] Add ConvTranspose implementation using MatMul (#17573 ) ### Description Add ConvTranspose implementation using MatMul to increase perf.	2023-09-29 11:00:44 -07:00
Yulong Wang	b2b1408608	[js/web] update browser launch cmd flags (#17658 ) ### Description update Chromium browser launch command line flags Canary already using dxc so no need to specify '--enable-dawn-features=use_dxc' for canary.	2023-09-25 12:24:46 -07:00
Yulong Wang	df15a3a335	[js/web] configure 5GB memory space for webpack build (#17684 ) ### Description In the ort-web build step, webpack consumes an amount of memory close to Node.js (V8)'s default max-old-space-size limit, so increase the memory size to 5GB to avoid this issue.	2023-09-25 09:22:00 -07:00
Jiajia Qin	891fba3b9c	[js/webgpu] Optimize Gather op (#17625 ) ### Description This PR optimizes the Gather op, improving the Segment Anything model by ~6 ms on ADL. The problem with the original algorithm is that it includes a for loop that processes a whole block of data, and the block size may be very large, like `65536`. In a GPU shader, we should avoid large loops and use more threads to do the work in parallel. Before: ``` [profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 6886207 ns ``` After: ``` [profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 11719 ns ```	2023-09-21 21:00:36 -07:00
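The idea of replacing one long per-block loop with one unit of work per output element can be sketched on the CPU. This is an illustration only, assuming a 2-D input gathered along axis 0; `gatherRows` is a hypothetical name and the code is not the WGSL shader from the PR.

```js
// CPU sketch: each loop iteration stands in for one GPU thread that handles
// exactly one output element, instead of one thread looping over a whole row.
function gatherRows(data, rows, cols, indices) {
  const output = new Float32Array(indices.length * cols);
  for (let globalIdx = 0; globalIdx < output.length; globalIdx++) {
    const outRow = Math.floor(globalIdx / cols);
    const col = globalIdx % cols;
    let srcRow = indices[outRow];
    if (srcRow < 0) srcRow += rows; // ONNX Gather accepts negative indices
    output[globalIdx] = data[srcRow * cols + col];
  }
  return output;
}

// Input of shape [4, 3]; gather rows 2 and -1 (the last row).
const data = Float32Array.from([0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32]);
console.log(Array.from(gatherRows(data, 4, 3, [2, -1]))); // [ 20, 21, 22, 30, 31, 32 ]
```

Because every output element is independent, a GPU can spread them across many threads rather than leaving one thread with a 65536-iteration loop.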
Jiajia Qin	cd3fb377ea	[js/webgpu] Allow binary ops with scalar to use the vectorize path (#17589 ) ### Description 1. For binary ops, the components is always 4. So the dispatchGroup should be: `{x: Math.ceil(outputSize / 64 /* workgroup size */ / 4 /* component size */)}` instead of `{x: Math.ceil(outputSize / 64 /* workgroup size */ / (vectorize ? 4 : 1) /* vec size */)}`. 2. If any of a or b only has one element, we still can use the vectorize path since the same value will be broadcasted.	2023-09-21 20:55:08 -07:00
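The dispatch-size formula quoted above, written out as a standalone function. The constants come from the commit message (workgroup size 64, 4 components per invocation); the function name `binaryOpDispatchGroup` is hypothetical.

```js
const WORKGROUP_SIZE = 64; // threads per workgroup, as stated in the commit message
const COMPONENTS = 4; // each invocation handles a vec4, i.e. 4 output elements

// Number of workgroups needed so that every output element is covered.
function binaryOpDispatchGroup(outputSize) {
  return {x: Math.ceil(outputSize / WORKGROUP_SIZE / COMPONENTS)};
}

console.log(binaryOpDispatchGroup(65536).x); // 256
console.log(binaryOpDispatchGroup(257).x); // 2 (one full workgroup plus a partial one)
```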
Arthur Islamov	498b60d8a4	[js/web] fp16 Pool & Reduce (#17512 ) ### Description Two more ops to support fp16	2023-09-21 14:52:13 -07:00
Vincent Wang	e6301eee6a	Bump Up Version to 1.17.0 (#17587 ) Bump up version to 1.17.0 as the 1.16.0 release branch had been branched out.	2023-09-20 11:02:58 +08:00
Hariharan Seshadri	460f17fbb8	[JS/WebGPU] Support If on WebGPU (#17478 )	2023-09-19 12:20:18 -07:00
Arthur Islamov	0f406ca1d3	[js/web] FP16 binary and unary ops (#17515 ) ### Description Binary and unary ops with fp16 support	2023-09-18 15:43:32 -07:00
Yulong Wang	efd416b71f	[js/web] update test to explicitly fail for webnn without proxy (#17554 ) ### Description Update test to explicitly fail for webnn without proxy. I am doing this change because if I test webnn with other backend together, it silently enables proxy. I want to make test runner behave with less implicit flag reset. If proxy is not enabled, webnn test should fail. @Honry please let me know if other places (eg. CI scripts) should change also.	2023-09-15 14:40:22 -07:00
Yulong Wang	155887593d	[js/web] update npm test to load test cases only for required backends (#17555 ) ### Description update npm test to load test cases for required backends. No need to load test case list for the backends that we don't test.	2023-09-15 13:55:25 -07:00
Yulong Wang	9aafbe3feb	[js/web] revise TensorView (#17473 ) ### Description This change: - removes the unused `Tensor` types declared in /js/web/lib/wasm/jsep/tensor.ts - removes duplicated util functions in /js/web/lib/wasm/jsep/tensor.ts - renames /js/web/lib/wasm/jsep/tensor.ts to /js/web/lib/wasm/jsep/tensor-view.ts and update corresponding references. It was kind of confusing that we have multiple `Tensor` types defined in different places also we have multiple `tensor.ts` source files. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: https://github.com/microsoft/onnxruntime/pull/17465 https://github.com/microsoft/onnxruntime/pull/17469 https://github.com/microsoft/onnxruntime/pull/17470 https://github.com/microsoft/onnxruntime/pull/17472 https://github.com/microsoft/onnxruntime/pull/17473 (this one)	2023-09-14 21:14:44 -07:00
Jiajia Qin	41d2ff622c	[js/webgpu] Optimize InstanceNormalization (#17491 ) ### Description In the previous implementation, a single thread runs two loops over H * W elements to calculate the `mean` and `squaredNorm` values, and the same thread also writes H * W output elements. This is very slow when H * W is large, which is usually the case in a model. For example, in the `candy-8` model, the [H, W] shapes for the `InstanceNormalization` op are [224,224], [112,112], [56,56]. On my ADL, `[1,224,224,32]` takes 17 ms. See below: ``` [profiling] kernel "23848328|[InstanceNormalization] 23848328" input[0]: [1,224,224,32] | float32, input[1]: [32] | float32, input[2]: [32] | float32, output[0]: [1,224,224,32] | float32, execution time: 17007914 ns ``` This PR uses workgroup memory to optimize the original algorithm. The advantage is that the 64 (workgroupSize) threads in one workgroup calculate the `mean` and `squaredNorm` values in parallel. Meanwhile, each thread only writes `H * W / workgroupSize` outputs, which greatly reduces the per-thread overhead. With this optimization, `[1,224,224,32]` takes 3 ms, and the main overhead is the two extra `transpose` operations. The `createInstanceNormProgramInfo` program itself only needs `0.64` ms. 
See below: ``` [profiling] kernel "23003600|[InstanceNormalization] 23003600" input[0]: [1,224,224,32] | float32, output[0]: [1,32,224,224] | float32, execution time: 1543792 ns [profiling] kernel "23003600|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] | float32, input[1]: [32] | float32, input[2]: [32] | float32, output[0]: [1,32,224,224] | float32, execution time: 642652 ns [profiling] kernel "23003600|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] | float32, output[0]: [1,224,224,32] | float32, execution time: 991608 ns ``` This PR currently only applies the new algorithm to the NCHW format. For the NHWC format, one way is to transpose the input so that it can use the new algorithm, at the cost of 2 extra transposes. @dakenf also suggests another way to optimize NHWC. For details, see [here](`d45a96616d/js/web/lib/wasm/jsep/webgpu/ops/instance-norm.ts`). I checked @dakenf's method. Its perf is similar to transpose + optimized NCHW, but depending on the GPU, one or the other is slightly faster. So I prefer that this PR only does the NCHW part. @dakenf can submit his optimization for NHWC.	2023-09-14 17:03:18 -07:00
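A CPU simulation of the workgroup-reduction idea described above, for one (batch, channel) slice of H * W values. It is a sketch under assumptions: the function name is hypothetical, the workgroup size must be a power of two, and the variance is derived as E[x²] − mean², which may differ from how the PR combines `mean` and `squaredNorm` in the actual shader.

```js
// Simulates `workgroupSize` threads cooperating on one slice of values.
function workgroupMeanAndVariance(values, workgroupSize = 64) {
  const partialSum = new Float64Array(workgroupSize); // stands in for workgroup memory
  const partialSqSum = new Float64Array(workgroupSize);

  // Phase 1: each "thread" walks the slice with a stride of workgroupSize,
  // so it touches about H * W / workgroupSize elements.
  for (let localIdx = 0; localIdx < workgroupSize; localIdx++) {
    for (let i = localIdx; i < values.length; i += workgroupSize) {
      partialSum[localIdx] += values[i];
      partialSqSum[localIdx] += values[i] * values[i];
    }
  }

  // Phase 2: tree reduction over the shared arrays
  // (on a GPU, a workgroup barrier separates the steps).
  for (let stride = workgroupSize >> 1; stride > 0; stride >>= 1) {
    for (let localIdx = 0; localIdx < stride; localIdx++) {
      partialSum[localIdx] += partialSum[localIdx + stride];
      partialSqSum[localIdx] += partialSqSum[localIdx + stride];
    }
  }

  const mean = partialSum[0] / values.length;
  const variance = partialSqSum[0] / values.length - mean * mean;
  return {mean, variance};
}

console.log(workgroupMeanAndVariance([1, 2, 3, 4, 5, 6, 7, 8])); // mean 4.5, variance 5.25
```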
xhcao	198d468849	[WebGPU/JS] Added Pad operator support (#16928 )	2023-09-14 13:14:11 -07:00
Yulong Wang	7af2f68ef3	[js/web] add a test flag to customize chromium flags (#17545 ) ### Description add a test flag to customize chromium flags. Usage: `npm test -- <other flags> --chromium-flags=<...>`	2023-09-14 10:05:31 -07:00
Hans	ad369a1fad	[js/rn] Support create boolean tensor (#17052 ) ### Description Some use cases need to create a boolean tensor. I've tested this on [this project](https://github.com/hans00/react-native-transformers-example). ### Motivation and Context Adds handling for `ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL`. It requires #15556, which does not seem to be included in the latest release (v1.15.1).	2023-09-14 15:02:27 +10:00
Arthur Islamov	03b56f7a73	[js/webgpu] FP16 extension registration (#17493 ) ### Description First small change to support FP16 --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-09-13 13:11:17 -07:00
Yulong Wang	a2e75114cc	[js/web] add sessionOptions.freeDimensionOverrides (#17488 ) ### Description Allows specifying a fixed size for a dynamic input dimension of a model. Resolves #16707. Pending test.	2023-09-13 09:17:34 -07:00
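A hedged sketch of how such an option might be passed. It assumes `freeDimensionOverrides` is a plain map from a symbolic dimension name to a fixed size; `withFixedDimensions`, the dimension name `batch_size` and the model path are hypothetical, and the `InferenceSession.create` call is left commented out because it needs onnxruntime-web and a real model.

```js
// Build session options that pin symbolic (free) dimensions to fixed sizes.
// Hypothetical helper; the option shape is an assumption, see the note above.
function withFixedDimensions(baseOptions, overrides) {
  return {...baseOptions, freeDimensionOverrides: {...overrides}};
}

const options = withFixedDimensions({executionProviders: ['webgpu']}, {batch_size: 1});
console.log(options.freeDimensionOverrides.batch_size); // 1

// With onnxruntime-web loaded as `ort`, the options would be used like this:
// const session = await ort.InferenceSession.create('./my_model.onnx', options);
```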
Yulong Wang	cdf3e9dba9	[js] update prepack script to use exact version (#17484 ) ### Description Update the prepack script to use an exact version. The prepack script for onnxruntime-node, onnxruntime-web and onnxruntime-react-native is used to update the version of their dependency "onnxruntime-common". Previously "~" (tilde symbol) was used. This may cause NPM to choose an older version (if the old version matches the version requirement and was already installed, so it hits the cache). See also https://semver.npmjs.com/. [This build](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1134671&view=results) was caused by this issue.	2023-09-13 00:07:16 -07:00
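To illustrate the difference between a tilde range and an exact version, here is a minimal sketch. It handles plain `x.y.z` versions only (no pre-release tags), the version numbers are made up, and `satisfiesTilde` and `pinExact` are hypothetical helpers rather than the actual prepack script.

```js
function parse(version) {
  return version.split('.').map(Number);
}

// "~1.16.0" admits every 1.16.z with z >= 0, so several cached versions can match.
function satisfiesTilde(version, range) {
  const [major, minor, patch] = parse(range.slice(1));
  const [vMajor, vMinor, vPatch] = parse(version);
  return vMajor === major && vMinor === minor && vPatch >= patch;
}

// Rewrite a dependency entry to an exact version (no "~" or "^" prefix).
function pinExact(packageJson, dependency, version) {
  return {...packageJson, dependencies: {...packageJson.dependencies, [dependency]: version}};
}

const cached = ['1.16.0', '1.16.1', '1.16.2'];
console.log(cached.filter((v) => satisfiesTilde(v, '~1.16.0')).length); // 3

const pinned = pinExact(
    {name: 'onnxruntime-web', dependencies: {'onnxruntime-common': '~1.16.0'}},
    'onnxruntime-common', '1.16.2');
console.log(cached.filter((v) => v === pinned.dependencies['onnxruntime-common'])); // [ '1.16.2' ]
```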
xhcao	ec94b07f0a	[JS/WebGPU] support Concat.int32 operator (#17003 )	2023-09-13 00:05:00 -07:00
Yulong Wang	41584b2827	[js/web] ensure ORT initialization to run only once (#17529 ) ### Description ensure ORT initialization to run only once	2023-09-12 23:52:08 -07:00
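One common way to make an asynchronous initialization run only once is to cache the pending promise. The sketch below shows that general pattern; `once` and `initOrt` are hypothetical names, and the commit message does not say whether the actual change is implemented this way.

```js
// Wrap an async initializer so that repeated calls share one pending promise.
function once(initializer) {
  let pending;
  return () => {
    if (pending === undefined) {
      pending = initializer(); // started on the first call only
    }
    return pending;
  };
}

let calls = 0;
const initOrt = once(async () => {
  calls++;
  return 'ready';
});

const first = initOrt();
const second = initOrt();
console.log(first === second, calls); // true 1
```

Callers that arrive while initialization is still in flight simply await the same promise.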
Yulong Wang	f923eec28b	[js/web] release session after use in npm test (#17470 ) ### Description release session after use in npm test. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: #17465 #17469 #17470 (this one)	2023-09-12 16:59:13 -07:00
Arthur Islamov	65249f42e4	[js/web] FP16 Gemm, Softmax & Transpose (#17494 ) ### Description First three OPs to support fp16. Will add more once this gets merged since others depend on changes in js_data_types	2023-09-11 21:09:37 -07:00
satyajandhyala	bf6d6961cc	[JS/Web] Added Einsum operator support. (#17401 ) ### Description Added Einsum operator support to JSEP.	2023-09-11 15:57:15 -07:00
Yulong Wang	89da5a0108	[js/webgpu] exclude WebGPU reduce_log_sum_exp_* float64 test cases (#17472 ) ### Description as explained in the comments, tests "test_reduce_log_sum_exp_*" on opset17/opset18 are excluded because they use float64. They are passing now because they fallback to CPU. WebGPU does not support f64. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: https://github.com/microsoft/onnxruntime/pull/17465 https://github.com/microsoft/onnxruntime/pull/17469 https://github.com/microsoft/onnxruntime/pull/17470 https://github.com/microsoft/onnxruntime/pull/17472 (this one)	2023-09-08 17:03:04 -07:00
Caroline Zhu	dcc93909b4	Add training WASM generation to Web CI pipeline (#17319 ) ### Description [Successful pipeline run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results) Added flag to build the training artifacts & updated the pull-wasm-artifacts script to pull the training artifacts as well. Bundled into this PR are minor formatting fixes + naming fixes. ### Motivation and Context [This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended the WASM API wrapper to build training WASM artifacts as well. The ORT training WASM artifacts are required to support ORT training web bindings.	2023-09-08 15:49:47 -07:00
Yulong Wang	4d753b74a5	[js/common] prepare work for supporting webgpu IO binding implementation (#17465 ) ### Description This PR contains a few changes in /js/common/ to support a coming PR for a full implementation of webgpu IO binding. - allows pass-through if value is already a Tensor instance in return value of `handler.run()` called by `InferenceSession.run()` (inference-session-impl.ts). Specifically, onnxruntime-node and onnxruntime-react-native use native bindings to generate a Tensor-like object so we need to create a real Tensor instance here; for onnxruntime-web the return value is already a Tensor instance. - adds new types for GPU buffer supported types: `'float32'|'int32'` -> `'float32'|'float16'|'int32'|'int64'|'uint32'|'bool'` - exposes types `GpuBufferDataTypes` together with `CpuPinnedDataTypes` and `TextureDataTypes` as exported	2023-09-08 13:49:24 -07:00
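The extended list of GPU-buffer data types can be mirrored at run time with a simple membership check. This is a sketch only: `GPU_BUFFER_DATA_TYPES` and `isGpuBufferDataType` are hypothetical names, and the set simply restates the types listed in the commit message.

```js
// Data types listed in the commit message as supported for GPU buffers.
const GPU_BUFFER_DATA_TYPES = new Set(['float32', 'float16', 'int32', 'int64', 'uint32', 'bool']);

// Hypothetical run-time guard; the package itself exposes these as TypeScript types.
function isGpuBufferDataType(type) {
  return GPU_BUFFER_DATA_TYPES.has(type);
}

console.log(isGpuBufferDataType('float16')); // true
console.log(isGpuBufferDataType('float64')); // false
```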
xhcao	9017ea131b	[js/webgpu] support GreaterOrEqual and LessOrEqual operators (#17310 )	2023-09-07 17:41:16 -07:00
dependabot[bot]	eaef485461	Bump electron from 23.1.2 to 23.3.13 in /js/web (#17436 ) Bumps [electron](https://github.com/electron/electron) from 23.1.2 to 23.3.13. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-07 17:39:49 -07:00
Jian Chen	8914fe687b	[js/webgpu] Include Support for neg.int32 (#17374 ) ### Description Include Support for neg.int32	2023-09-06 12:00:16 -07:00
Yulong Wang	fa868ca9cd	[js/node] release sessions after use in npm test (#17353 ) ### Description Release sessions after use in the npm test.	2023-09-05 23:42:32 -07:00
Yulong Wang	d88406a31b	[js/common] use Map instead of object for backends (#17352 ) ### Description resolved https://github.com/microsoft/onnxruntime/security/code-scanning/1140	2023-09-05 23:14:46 -07:00
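The code-scanning alert referenced by `d88406a31b` is not quoted in this log, so the sketch below only illustrates one general reason to prefer a `Map` over a plain object for a name-keyed registry: an object lookup also answers for inherited keys such as `constructor`, while a `Map` returns only what was explicitly set. The `Backend` interface and both registry names are hypothetical and are not taken from onnxruntime-common.

```typescript
// Hypothetical backend shape, for illustration only.
interface Backend {
  name: string;
}

// Object-based registry: a lookup by an arbitrary string can reach
// properties inherited from Object.prototype.
const objectRegistry: { [name: string]: Backend } = {};

// Map-based registry: only keys that were explicitly set are visible.
const mapRegistry = new Map<string, Backend>();

function lookupInObject(name: string): unknown {
  return objectRegistry[name];
}

function lookupInMap(name: string): Backend | undefined {
  return mapRegistry.get(name);
}

objectRegistry['wasm'] = { name: 'wasm' };
mapRegistry.set('wasm', { name: 'wasm' });
```

With these definitions, `lookupInObject('constructor')` returns the inherited `Object` constructor even though no backend with that name was registered, whereas `lookupInMap('constructor')` returns `undefined`.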
Yulong Wang	75710f0006	[js/webgpu] add matmul broadcast tests (#17335 ) ### Description Commit `fffefb1c22` (#16969) optimized matmul and also fixed broadcasting, so #17191 is no longer needed. However, the operator test file added in that PR by @dakenf is helpful, so this change picks it up to strengthen the tests.	2023-09-05 20:41:46 -07:00
Yulong Wang	2cb75420ac	[js/common] clean up JSDoc (#17408 ) ### Description Clean up JSDoc for onnxruntime-common: - replace "@internal" with "@ignore", as JSDoc does not use "@internal". Using "@ignore" keeps the content out of the generated doc.	2023-09-05 20:40:23 -07:00
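For readers unfamiliar with what "broadcasting" means for matmul: the ONNX `MatMul` operator follows `numpy.matmul` semantics, where the last two dimensions are the matrix dimensions and any leading batch dimensions are broadcast against each other. The sketch below computes the output shape under that rule. It is an illustration only, restricted to inputs of rank 2 or higher; the function names are hypothetical and this is not the helper used in the WebGPU backend.

```typescript
// Broadcast two batch-shape prefixes, aligning from the trailing dimension.
function broadcastBatchDims(a: number[], b: number[]): number[] {
  const rank = Math.max(a.length, b.length);
  const out: number[] = [];
  for (let i = 0; i < rank; i++) {
    // Missing leading dimensions behave like size 1.
    const da = i < a.length ? a[a.length - 1 - i] : 1;
    const db = i < b.length ? b[b.length - 1 - i] : 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error('cannot broadcast batch dims [' + a.join(',') + '] and [' + b.join(',') + ']');
    }
    out.unshift(da === 1 ? db : da);
  }
  return out;
}

// Output shape of A x B for rank >= 2 inputs: broadcast(batch dims) + [M, N].
function matMulOutputShape(aDims: number[], bDims: number[]): number[] {
  if (aDims.length < 2 || bDims.length < 2) {
    throw new Error('this sketch only handles inputs of rank >= 2');
  }
  const m = aDims[aDims.length - 2];
  const k = aDims[aDims.length - 1];
  const k2 = bDims[bDims.length - 2];
  const n = bDims[bDims.length - 1];
  if (k !== k2) {
    throw new Error('inner dimensions do not match: ' + k + ' vs ' + k2);
  }
  const batch = broadcastBatchDims(aDims.slice(0, -2), bDims.slice(0, -2));
  return batch.concat([m, n]);
}
```

For example, `matMulOutputShape([2, 1, 3, 4], [5, 4, 6])` yields `[2, 5, 3, 6]`: the batch prefixes `[2, 1]` and `[5]` broadcast to `[2, 5]`, and the matrix part is `[3, 4] x [4, 6]`.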
xhcao	026672e947	[js/webgpu] Support slice int32 (#16968 ) Co-authored-by: Xing Xu <xing.xu@intel.com>	2023-09-05 18:05:47 -07:00
Jiajia Qin	5e747071be	[js/webgpu] Fix bug in conv2dByMatMul path (#17369 ) ### Description For the conv2dByMatMul path, the simulated matmul output shape is a reshape of the original conv2d output shape, so we should pass this information to `createMatmulProgramInfo` so that it can process it correctly.	2023-09-02 00:16:28 -07:00
Jian Chen	e60493525f	[js/webgpu] Adding support for abs with int32 type (#17359 )	2023-08-31 08:13:54 -07:00
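The commit message does not spell out when a convolution can be computed as a matmul, so the following is a generic illustration rather than the condition used in the WebGPU backend. It assumes the simplest case: a 1x1 kernel, stride 1, no padding, batch size 1, NHWC input. Under those assumptions the input `[1, H, W, Cin]` can be viewed as an `[H*W, Cin]` matrix, and the matmul output `[H*W, Cout]` is a reshape of the conv2d output `[1, H, W, Cout]`. All function names are hypothetical.

```typescript
// Direct 1x1 convolution: NHWC input [1,h,w,cin], weights [cin,cout], stride 1, no padding.
function conv1x1(x: number[], wt: number[], h: number, w: number, cin: number, cout: number): number[] {
  const y: number[] = [];
  for (let row = 0; row < h; row++) {
    for (let col = 0; col < w; col++) {
      for (let oc = 0; oc < cout; oc++) {
        let sum = 0;
        for (let ic = 0; ic < cin; ic++) {
          sum += x[(row * w + col) * cin + ic] * wt[ic * cout + oc];
        }
        y[(row * w + col) * cout + oc] = sum;
      }
    }
  }
  return y;
}

// Plain row-major matmul: [m,k] x [k,n] -> [m,n].
function matmul(a: number[], b: number[], m: number, k: number, n: number): number[] {
  const c: number[] = [];
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let p = 0; p < k; p++) {
        sum += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = sum;
    }
  }
  return c;
}

// Same convolution via matmul; the two shapes describe the same buffer.
function conv1x1ByMatMul(x: number[], wt: number[], h: number, w: number, cin: number, cout: number) {
  return {
    data: matmul(x, wt, h * w, cin, cout),
    matmulDims: [h * w, cout],   // shape the matmul kernel sees
    convDims: [1, h, w, cout],   // shape the conv2d caller expects
  };
}
```

The flat data is identical in both views; only the shape metadata differs, which is why the matmul program needs to be told about the reshaped output.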
Jiajia Qin	352b745deb	[js/webgpu] Add input/output shapes information to profiling (#17342 )

### Description

This PR enhances the profiling information. With this PR, the profiling result looks like this:

```
[profiling] kernel "[Split] 51288384" input[0]: 1,256,64,64, output[0]: 1,256,64,64, execution time: 37135 ns
program-manager.ts:114 [profiling] kernel "[Concat] 52361040" input[0]: 1,256,64,64, output[0]: 1,256,64,64, execution time: 50833 ns
program-manager.ts:114 [profiling] kernel "[Transpose] 52375264" input[0]: 1,256,64,64, output[0]: 1,64,64,256, execution time: 99791 ns
program-manager.ts:114 [profiling] kernel "[Sub] 51098472" input[0]: , input[1]: 1, output[0]: 1, execution time: 7448 ns
program-manager.ts:114 [profiling] kernel "[Mul] 51344440" input[0]: 1, input[1]: 1,256,1,1, output[0]: 1,256,1,1, execution time: 8334 ns
```

Without this PR, the profiling result looks like this:

```
[profiling] kernel "52097928|[Split] 52097928" execution time: 37760 ns
program-manager.ts:105 [profiling] kernel "41898328|[Concat] 41898328" execution time: 51666 ns
program-manager.ts:105 [profiling] kernel "41915648|[Transpose] 41915648" execution time: 95416 ns
program-manager.ts:105 [profiling] kernel "49757856|[Sub] 49757856" execution time: 7969 ns
program-manager.ts:105 [profiling] kernel "51680504|[Mul] 51680504" execution time: 8906 ns
```

With the new information, we can easily tell which ops, and at which shapes, perform poorly. It also helps us check whether ops with very small shapes are running on the GPU.	2023-08-31 08:12:28 -07:00
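To act on the new log format from `352b745deb`, the lines can be parsed back into structured records. The sketch below is one way to do that, assuming the lines look exactly like the samples quoted in that commit message; the `KernelProfile` interface and the function names are hypothetical and not part of onnxruntime.

```typescript
// One parsed "[profiling]" line. Field names are illustrative only.
interface KernelProfile {
  op: string;
  kernelId: string;
  inputs: number[][];
  outputs: number[][];
  timeNs: number;
}

// An empty shape (a scalar) is logged as an empty string.
function parseDims(text: string): number[] {
  return text === '' ? [] : text.split(',').map(Number);
}

// Returns undefined for lines that do not match the assumed format.
function parseProfilingLine(line: string): KernelProfile | undefined {
  const m = /\[profiling\] kernel "\[(\w+)\] (\d+)" (.*), execution time: (\d+) ns/.exec(line);
  if (!m) {
    return undefined;
  }
  const profile: KernelProfile = { op: m[1], kernelId: m[2], inputs: [], outputs: [], timeNs: Number(m[4]) };
  // Each shape runs until the next ", input[" / ", output[" or the end of the text.
  const shapeRe = /(input|output)\[\d+\]: ([\d,]*?)(?=, (?:input|output)\[|$)/g;
  let s: RegExpExecArray | null = shapeRe.exec(m[3]);
  while (s !== null) {
    (s[1] === 'input' ? profile.inputs : profile.outputs).push(parseDims(s[2]));
    s = shapeRe.exec(m[3]);
  }
  return profile;
}

// Picks the slowest kernel among the lines that parse.
function slowestKernel(lines: string[]): KernelProfile | undefined {
  let worst: KernelProfile | undefined;
  for (const line of lines) {
    const p = parseProfilingLine(line);
    if (p && (!worst || p.timeNs > worst.timeNs)) {
      worst = p;
    }
  }
  return worst;
}
```

Fed the five sample lines from the commit message, `slowestKernel` would return the `Transpose` record, whose input shape is `[1, 256, 64, 64]`.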
Yulong Wang	e5ca3f3dcb	[js/api] introducing IO binding for tensor (#16452 )

### Description

This PR adds a few properties, methods and factories to the `Tensor` type to support the IO-binding feature. This allows users to create a tensor from GPU- or CPU-bound data without forcing a data transfer between CPU and GPU.

This change is a way to resolve #15312.

### Change Summary

1. Add properties to the `Tensor` type:
   a. `location`: indicates where the data resides. Valid values are `cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
   b. `texture`: sits alongside `data`; a readonly property of type `WebGLTexture`, available only when `location === 'texture'`.
   c. `gpuBuffer`: sits alongside `data`; a readonly property of type `GPUBuffer`, available only when `location === 'gpu-buffer'`.
2. Add methods to the `Tensor` type (usually for dealing with inference outputs):
   - async function `getData()` allows users to download data from GPU to CPU manually.
   - function `dispose()` allows users to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
   a. `fromTexture()` to create a tensor bound to a WebGL texture
   b. `fromGpuBuffer()` to create a tensor bound to a WebGPU buffer
   c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer

### Examples:

Create tensors from a texture and pass them to an inference session as inputs:

```js
// Assumes the onnxruntime-web package provides these exports.
import { InferenceSession, Tensor } from 'onnxruntime-web';

// When creating the session, specify that 'image_output:0' should be stored on the GPU as a texture.
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: [ 'webgl' ],
  preferredOutputLocation: { 'image_output:0': 'texture' }
});

// ...

const myImageTexture = getTexture(); // user's function to get a texture
// Shape [1, 224, 224, 4], RGBA format.
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) };
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
	2023-08-29 12:58:26 -07:00
Jiajia Qin	fffefb1c22	[js/webgpu] Optimize matmul (#16969 ) ### Description Changes in this PR: 1) use the optimized version `makeMatMulPacked[Vec4]Source` to support matmul. 2) enable the conv2dByMatMul path. 3) support broadcast 4) use IndicesHelper. MatMul with M = 512, K = 512, N = 512 drops from 15 ms to 2 ms with profilingMode enabled on my ADL.	2023-08-29 12:40:57 -07:00
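To put the quoted timings in perspective, they can be converted to arithmetic throughput. The sketch below uses the common convention of counting one multiply and one add per inner-product term, so an `M x K` by `K x N` matmul costs `2 * M * K * N` floating-point operations. That convention and the function name are assumptions for illustration; the commit message reports only the two timings.

```typescript
// Throughput in GFLOP/s for an [m,k] x [k,n] matmul that took `ms` milliseconds,
// counting 2 floating-point operations (multiply + add) per inner-product term.
function matmulGflops(m: number, k: number, n: number, ms: number): number {
  const flops = 2 * m * k * n;
  const seconds = ms * 1e-3;
  return flops / seconds / 1e9;
}

const before = matmulGflops(512, 512, 512, 15); // about 17.9 GFLOP/s
const after = matmulGflops(512, 512, 512, 2);   // about 134.2 GFLOP/s
const speedup = after / before;                 // about 7.5x
```

With M = K = N = 512 the operation count is 268,435,456, so the change from 15 ms to 2 ms corresponds to roughly 17.9 GFLOP/s before and 134.2 GFLOP/s after.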
Caroline	228db24317	Add training API functions to WASM API (#16521 ) ### Description * Created `wasm/training_api` source and header files & modified WebAssembly CMake to include training flags * The `wasm/training_api` files use an `OrtTrainingManager` handle which is a struct of an OrtCheckpointState and an OrtTrainingSession, rather than creating a CheckpointState handle & a separate TrainingSession handle. * This is so that the TypeScript side only has to manage one handle that will be passed between TrainingSession & CheckpointState representations, rather than the TypeScript side managing separate CheckpointStateHandle and TrainingSessionHandle. ### Motivation and Context WASM API needs to be updated with ORT training API function calls so that ORT training web bindings can be added for on-device training. --------- Co-authored-by: Baiju Meswani <bmeswani@microsoft.com> Co-authored-by: carzh <carolinezhu@microsoft.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-08-28 11:05:02 -07:00

502 commits