onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-26 22:35:43 +00:00

Author	SHA1	Message	Date
Arthur Islamov	498b60d8a4	[js/web] fp16 Pool & Reduce (#17512 ) ### Description Two more ops to support fp16	2023-09-21 14:52:13 -07:00
Vincent Wang	e6301eee6a	Bump Up Version to 1.17.0 (#17587 ) Bump up version to 1.17.0 as the 1.16.0 release branch had been branched out.	2023-09-20 11:02:58 +08:00
Hariharan Seshadri	460f17fbb8	[JS/WebGPU] Support If on WebGPU (#17478 )	2023-09-19 12:20:18 -07:00
Arthur Islamov	0f406ca1d3	[js/web] FP16 binary and unary ops (#17515 ) ### Description Binary and unary ops with fp16 support	2023-09-18 15:43:32 -07:00
Yulong Wang	efd416b71f	[js/web] update test to explicitly fail for webnn without proxy (#17554 ) ### Description Update the test to explicitly fail for webnn without proxy. I am making this change because if I test webnn together with other backends, it silently enables proxy. I want the test runner to behave with fewer implicit flag resets. If proxy is not enabled, the webnn test should fail. @Honry please let me know if other places (e.g. CI scripts) should also change.	2023-09-15 14:40:22 -07:00
Yulong Wang	155887593d	[js/web] update npm test to load test cases only for required backends (#17555 ) ### Description update npm test to load test cases for required backends. No need to load test case list for the backends that we don't test.	2023-09-15 13:55:25 -07:00
Yulong Wang	9aafbe3feb	[js/web] revise TensorView (#17473 ) ### Description This change: - removes the unused `Tensor` types declared in /js/web/lib/wasm/jsep/tensor.ts - removes duplicated util functions in /js/web/lib/wasm/jsep/tensor.ts - renames /js/web/lib/wasm/jsep/tensor.ts to /js/web/lib/wasm/jsep/tensor-view.ts and updates the corresponding references. It was confusing to have multiple `Tensor` types defined in different places as well as multiple `tensor.ts` source files. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. List of prerequisite PRs: https://github.com/microsoft/onnxruntime/pull/17465 https://github.com/microsoft/onnxruntime/pull/17469 https://github.com/microsoft/onnxruntime/pull/17470 https://github.com/microsoft/onnxruntime/pull/17472 https://github.com/microsoft/onnxruntime/pull/17473 (this one)	2023-09-14 21:14:44 -07:00
Jiajia Qin	41d2ff622c	[js/webgpu] Optimize InstanceNormalization (#17491 ) ### Description In the previous implementation, a single thread runs two loops over H * W elements to calculate the `mean` and `squaredNorm` values, and it also writes H * W output elements. This makes the op very slow when H * W is large, which is usually the case in a model. For example, in the `candy-8` model, the [H, W] shapes for the `InstanceNormalization` op are [224,224], [112,112] and [56,56]. On my ADL, `[1,224,224,32]` takes 17 ms. See below: ``` [profiling] kernel "23848328\|[InstanceNormalization] 23848328" input[0]: [1,224,224,32] \| float32, input[1]: [32] \| float32, input[2]: [32] \| float32, output[0]: [1,224,224,32] \| float32, execution time: 17007914 ns ``` This PR uses workgroup memory to optimize the original algorithm. The advantage is that the 64 (workgroupSize) threads in one workgroup calculate the `mean` and `squaredNorm` values in parallel. Each thread also writes only `H * W / workgroupSize` outputs, which greatly reduces the per-thread overhead. With this optimization, `[1,224,224,32]` takes 3 ms, and the main overhead is the two extra `transpose` ops. The `createInstanceNormProgramInfo` program itself needs only `0.64` ms. See below: ``` [profiling] kernel "23003600\|[InstanceNormalization] 23003600" input[0]: [1,224,224,32] \| float32, output[0]: [1,32,224,224] \| float32, execution time: 1543792 ns program-manager.ts:115 [profiling] kernel "23003600\|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] \| float32, input[1]: [32] \| float32, input[2]: [32] \| float32, output[0]: [1,32,224,224] \| float32, execution time: 642652 ns program-manager.ts:115 [profiling] kernel "23003600\|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] \| float32, output[0]: [1,224,224,32] \| float32, execution time: 991608 ns ``` This PR currently applies the new algorithm only to the NCHW format. For the NHWC format, one option is to transpose the input so that it can use the new algorithm, at the cost of 2 extra transposes. @dakenf also suggests another way to optimize NHWC. For details see [here](`d45a96616d/js/web/lib/wasm/jsep/webgpu/ops/instance-norm.ts`). I checked @dakenf's method. Its perf is similar to transpose + optimized NCHW, but which one is slightly faster depends on the GPU. So I prefer that this PR only does the NCHW part. @dakenf can submit his NHWC optimization separately.	2023-09-14 17:03:18 -07:00
xhcao	198d468849	[WebGPU/JS] Added Pad operator support (#16928 )	2023-09-14 13:14:11 -07:00
Yulong Wang	7af2f68ef3	[js/web] add a test flag to customize chromium flags (#17545 ) ### Description add a test flag to customize chromium flags. Usage: npm test -- \<other flags> --chromium-flags=<...>	2023-09-14 10:05:31 -07:00
Hans	ad369a1fad	[js/rn] Support create boolean tensor (#17052 ) ### Description Some use cases need to create a boolean tensor. I've tested this on [this project](https://github.com/hans00/react-native-transformers-example) ### Motivation and Context Add handling for `ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL`. It requires #15556 (which does not seem to be included in the latest release, v1.15.1).	2023-09-14 15:02:27 +10:00
Arthur Islamov	03b56f7a73	[js/webgpu] FP16 extension registration (#17493 ) ### Description First small change to support FP16 --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-09-13 13:11:17 -07:00
Yulong Wang	a2e75114cc	[js/web] add sessionOptions.freeDimensionOverrides (#17488 ) ### Description Allows specifying a fixed size for a dynamic input of a model. Resolves #16707. Test pending.	2023-09-13 09:17:34 -07:00
Yulong Wang	cdf3e9dba9	[js] update prepack script to use exact version (#17484 ) ### Description Update the prepack script to use an exact version. The prepack script for onnxruntime-node, onnxruntime-web and onnxruntime-react-native updates the version of the "onnxruntime-common" dependency that they reference. Previously "~" (the tilde symbol) was used. This may cause NPM to choose an older version (if the old version matches the version requirement and was already installed, so it hits the cache). See also https://semver.npmjs.com/. [This build](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1134671&view=results) is caused by this issue.	2023-09-13 00:07:16 -07:00
xhcao	ec94b07f0a	[JS/WebGPU] support Concat.int32 operator (#17003 )	2023-09-13 00:05:00 -07:00
Yulong Wang	41584b2827	[js/web] ensure ORT initialization to run only once (#17529 ) ### Description ensure ORT initialization to run only once	2023-09-12 23:52:08 -07:00
Yulong Wang	f923eec28b	[js/web] release session after use in npm test (#17470 ) ### Description release session after use in npm test. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: #17465 #17469 #17470 (this one)	2023-09-12 16:59:13 -07:00
Arthur Islamov	65249f42e4	[js/web] FP16 Gemm, Softmax & Transpose (#17494 ) ### Description First three OPs to support fp16. Will add more once this gets merged since others depend on changes in js_data_types	2023-09-11 21:09:37 -07:00
satyajandhyala	bf6d6961cc	[JS/Web] Added Einsum operator support. (#17401 ) ### Description Added Einsum operator support to JSEP.	2023-09-11 15:57:15 -07:00
Yulong Wang	89da5a0108	[js/webgpu] exclude WebGPU reduce_log_sum_exp_* float64 test cases (#17472 ) ### Description As explained in the comments, tests "test_reduce_log_sum_exp_*" on opset17/opset18 are excluded because they use float64. They are passing now only because they fall back to CPU. WebGPU does not support f64. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. List of prerequisite PRs: https://github.com/microsoft/onnxruntime/pull/17465 https://github.com/microsoft/onnxruntime/pull/17469 https://github.com/microsoft/onnxruntime/pull/17470 https://github.com/microsoft/onnxruntime/pull/17472 (this one)	2023-09-08 17:03:04 -07:00
Caroline Zhu	dcc93909b4	Add training WASM generation to Web CI pipeline (#17319 ) ### Description [Successful pipeline run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results) Added flag to build the training artifacts & updated the pull-wasm-artifacts script to pull the training artifacts as well. Bundled into this PR are minor formatting fixes + naming fixes. ### Motivation and Context [This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended the WASM API wrapper to build training WASM artifacts as well. The ORT training WASM artifacts are required to support ORT training web bindings.	2023-09-08 15:49:47 -07:00
Yulong Wang	4d753b74a5	[js/common] prepare work for supporting webgpu IO binding implementation (#17465 ) ### Description This PR contains a few changes in /js/common/ to support a coming PR for a full implementation of webgpu IO binding. - allows pass-through if value is already a Tensor instance in return value of `handler.run()` called by `InferenceSession.run()` (inference-session-impl.ts). Specifically, onnxruntime-node and onnxruntime-react-native uses native bindings to generate a Tensor-like object so we need to create a real Tensor instance here; for onnxruntime-web the return value is already a Tensor instance. - adds new types for GPU buffer supported types: `'float32'\|'int32'` -> `'float32'\|'float16'\|'int32'\|'int64'\|'uint32'\|'bool'` - exposes types `GpuBufferDataTypes` together with `CpuPinnedDataTypes` and `TextureDataTypes` as exported	2023-09-08 13:49:24 -07:00
xhcao	9017ea131b	[js/webgpu] support GreaterOrEqual and LessOrEqual operators (#17310 )	2023-09-07 17:41:16 -07:00
dependabot[bot]	eaef485461	Bump electron from 23.1.2 to 23.3.13 in /js/web (#17436 ) Bumps [electron](https://github.com/electron/electron) from 23.1.2 to 23.3.13. The quoted release notes (truncated) cover v23.3.8 through v23.3.13: backported security fixes (CVE-2023-3215, CVE-2023-3216, CVE-2023-3420, CVE-2023-3421, CVE-2023-3422, CVE-2023-3728, CVE-2023-3730, CVE-2023-3732, and issues 1450536 and 1454860), crash fixes for screen sharing and desktop capture on Wayland with PipeWire, fixes for `preload` scripts in child windows opened by `window.open` and for the minimize button, and the end-of-support notice for Electron 23.x.y. Additional commits are viewable in the [compare view](https://github.com/electron/electron/compare/v23.1.2...v23.3.13). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-07 17:39:49 -07:00
Jian Chen	8914fe687b	[js/webgpu] Include Support for neg.int32 (#17374 ) ### Description Include Support for neg.int32	2023-09-06 12:00:16 -07:00
Yulong Wang	fa868ca9cd	[js/node] release sessions after use in npm test (#17353 ) ### Description Release sessions after use in the NPM test.	2023-09-05 23:42:32 -07:00
Yulong Wang	d88406a31b	[js/common] use Map instead of object for backends (#17352 ) ### Description resolved https://github.com/microsoft/onnxruntime/security/code-scanning/1140	2023-09-05 23:14:46 -07:00
Yulong Wang	75710f0006	[js/webgpu] add matmul broadcast tests (#17335 ) ### Description Commit `fffefb1c22` (#16969) optimized matmul and also fixes broadcasting. So #17191 is no longer needed. However, the newly added operator test file from the PR by @dakenf is helpful so pick and add it to enhance the tests.	2023-09-05 20:41:46 -07:00
Yulong Wang	2cb75420ac	[js/common] clean up JSDoc (#17408 ) ### Description Clean up JSDoc for onnxruntime-common: - replace "@internal" with "@ignore", as JSDoc does not use "@internal". Using "@ignore" keeps the content out of the generated doc.	2023-09-05 20:40:23 -07:00
xhcao	026672e947	[js/webgpu] Support slice int32 (#16968 ) Co-authored-by: Xing Xu <xing.xu@intel.com>	2023-09-05 18:05:47 -07:00
Jiajia Qin	5e747071be	[js/webgpu] Fix bug in conv2dByMatMul path (#17369 ) ### Description For the conv2dByMatMul path, the simulated matmul output shape is a reshape of the original conv2d output. So we should pass this information to `createMatmulProgramInfo` so that it can process it correctly.	2023-09-02 00:16:28 -07:00
Jian Chen	e60493525f	[js/webgpu] Adding support for abs with int32 type (#17359 )	2023-08-31 08:13:54 -07:00
Jiajia Qin	352b745deb	[js/webgpu] Add input/output shapes information to profiling (#17342 ) ### Description This PR enhances the profiling information. With this PR, the profiling result looks like this: ``` [profiling] kernel "[Split] 51288384" input[0]: 1,256,64,64, output[0]: 1,256,64,64, execution time: 37135 ns program-manager.ts:114 [profiling] kernel "[Concat] 52361040" input[0]: 1,256,64,64, output[0]: 1,256,64,64, execution time: 50833 ns program-manager.ts:114 [profiling] kernel "[Transpose] 52375264" input[0]: 1,256,64,64, output[0]: 1,64,64,256, execution time: 99791 ns program-manager.ts:114 [profiling] kernel "[Sub] 51098472" input[0]: , input[1]: 1, output[0]: 1, execution time: 7448 ns program-manager.ts:114 [profiling] kernel "[Mul] 51344440" input[0]: 1, input[1]: 1,256,1,1, output[0]: 1,256,1,1, execution time: 8334 ns ``` Without this PR, the profiling result looks like this: ``` [profiling] kernel "52097928\|[Split] 52097928" execution time: 37760 ns program-manager.ts:105 [profiling] kernel "41898328\|[Concat] 41898328" execution time: 51666 ns program-manager.ts:105 [profiling] kernel "41915648\|[Transpose] 41915648" execution time: 95416 ns program-manager.ts:105 [profiling] kernel "49757856\|[Sub] 49757856" execution time: 7969 ns program-manager.ts:105 [profiling] kernel "51680504\|[Mul] 51680504" execution time: 8906 ns ``` With the new information, we can easily tell which ops and shapes perform poorly. It also helps us check whether ops with very small shapes are running on the GPU.	2023-08-31 08:12:28 -07:00
Yulong Wang	e5ca3f3dcb	[js/api] introducing IO binding for tensor (#16452 ) ### Description This PR adds a few properties, methods and factories to the `Tensor` type to support the IO-binding feature. This allows users to create a tensor from GPU/CPU-bound data without forcing a data transfer between CPU and GPU. This change is a way to resolve #15312 ### Change Summary 1. Add properties to `Tensor` type: a. `location`: indicates where the data resides. Valid values are `cpu`, `cpu-pinned`, `texture`, `gpu-buffer`. b. `texture`: sits alongside `data`; a readonly property of `WebGLTexture` type, available only when `location === 'texture'` c. `gpuBuffer`: sits alongside `data`; a readonly property of `GPUBuffer` type, available only when `location === 'gpu-buffer'` 2. Add methods to `Tensor` type (usually dealing with inference outputs): - async function `getData()` allows the user to download data from GPU to CPU manually. - function `dispose()` allows the user to release GPU resources manually. 3. Add factories for creating `Tensor` instances: a. `fromTexture()` to create tensor data bound to a WebGL texture b. `fromGpuBuffer()` to create tensor data bound to a WebGPU buffer c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer ### Examples: create tensors from texture and pass to inference session as inputs ```js // when create session, specify we prefer 'image_output:0' to be stored on GPU as texture const session = await InferenceSession.create('./my_model.onnx', { executionProviders: [ 'webgl' ], preferredOutputLocation: { 'image_output:0': 'texture' } }); ... const myImageTexture = getTexture(); // user's function to get a texture const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format. const results = await session.run(myFeeds); const myOutputTexture = results['image_output:0'].texture; ```	2023-08-29 12:58:26 -07:00
Jiajia Qin	fffefb1c22	[js/webgpu] Optimize matmul (#16969 ) ### Description Changes in this PR: 1) use the optimized version `makeMatMulPacked[Vec4]Source` to support matmul. 2) enable the conv2dByMatMul path. 3) support broadcast. 4) use IndicesHelper. MatMul with M = 512, K = 512, N = 512 drops from 15 ms to 2 ms with profilingMode enabled on my ADL.	2023-08-29 12:40:57 -07:00
Caroline	228db24317	Add training API functions to WASM API (#16521 ) ### Description * Created `wasm/training_api` source and header files & modified WebAssembly CMake to include training flags * The `wasm/training_api` files use an `OrtTrainingManager` handle which is a struct of an OrtCheckpointState and an OrtTrainingSession, rather than creating a CheckpointState handle & a separate TrainingSession handle. * This is so that the TypeScript side only has to manage one handle that will be passed between TrainingSession & CheckpointState representations, rather than the TypeScript side managing separate CheckpointStateHandle and TrainingSessionHandle. ### Motivation and Context WASM API needs to be updated with ORT training API function calls so that ORT training web bindings can be added for on-device training. --------- Co-authored-by: Baiju Meswani <bmeswani@microsoft.com> Co-authored-by: carzh <carolinezhu@microsoft.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-08-28 11:05:02 -07:00
Hariharan Seshadri	cbd97515cd	[JS/WebGPU] Support GatherElements kernel (#17243 ) ### Description As title ### Motivation and Context Improve WebGPU kernel coverage	2023-08-28 09:55:25 -07:00
Yulong Wang	bb1871332f	[js/webgpu] add kernel Not and Equal (#17306 ) ### Description This PR adds kernel implementations for the operators "Not" and "Equal". It also removes the download cache in the gpu data manager. Why remove the download cache? The following test case failed. ("Or" is on CPU, "Greater" and "Equal" are on JSEP) ![image](https://github.com/microsoft/onnxruntime/assets/7679871/8d9798ad-2703-4fb9-907e-ff716c67d0b2) After debugging, I found that both "Equal" and "Greater" use the same output GPU Data ID. This is because when ORT executes the graph, it first runs "Equal", allowing its shader to write into GPU Data ID 2; then a Gpu2Cpu copy for it is issued (because "Or" is currently on the CPU EP); at this point, ORT considers GPU Data ID=2 free to use, so it reuses it as the output for "Greater". This means there is no allocation for the output of the "Greater" kernel, and both kernels write to GPU Data ID=2. For the gpu data manager, there will be 2 downloads from the same GPU buffer. Previously I thought this was a waste of resources, so I cached the data. But now it shows that we need to perform 2 downloads, because the GPU data has already changed between them. The download data cache should be removed.	2023-08-27 19:50:17 -07:00
Yulong Wang	ddcd46174e	[js/webgpu] fix jsepOnRunEnd (#17300 ) ### Description fix jsepOnRunEnd: jsepOnRunEnd() needs to run after runPromise is resolved.	2023-08-26 00:30:28 -07:00
Arthur Islamov	c262879214	Added DML and CUDA provider support in onnxruntime-node (#16050 ) ### Description I've added changes to support CUDA and DML (DML only on Windows; on other platforms it will throw an error) ### Motivation and Context It fixes the feature request https://github.com/microsoft/onnxruntime/issues/14127, which is tracked in https://github.com/microsoft/onnxruntime/issues/14529. I was working on a StableDiffusion implementation for node.js and it is very slow on CPU, so GPU support is essential. Here is a working demo with a patched and precompiled version: https://github.com/dakenf/stable-diffusion-nodejs	2023-08-25 16:57:06 -07:00
Jiajia Qin	873ef8b8f0	[js/webgpu] add label for some webgpu APIs (#17291 ) ### Description With the label, it's easier to identify which op causes the error. Without the label, the error message looks like this: ``` Tint WGSL reader failure: :12:5 error: return statement type must match its function return type, returned 'vec4<f32>', expected 'f32' return W[i2o_W(indices)]; ^^^^^^ - While validating [ShaderModuleDescriptor] - While calling [Device].CreateShaderModule([ShaderModuleDescriptor]). ``` With the label, the error message looks like this: ``` Tint WGSL reader failure: :12:5 error: return statement type must match its function return type, returned 'vec4<f32>', expected 'f32' return W[i2o_W(indices)]; ^^^^^^ - While validating [ShaderModuleDescriptor "ConvTranspose2D"] - While calling [Device].CreateShaderModule([ShaderModuleDescriptor "ConvTranspose2D"]). ``` ### Motivation and Context This change is mainly for debugging. With this change, the message above makes it clear that the `ConvTranspose2D` shader has the problem.	2023-08-25 12:12:56 -07:00
xhcao	5e8d94cec8	[js/webgpu] support Greater and Less operators (#17296 )	2023-08-25 12:11:25 -07:00
Yulong Wang	79c4ed9a45	[js/webgpu] support error pop and kernel name (#17260 ) ### Description This PR contains changes to support error pop and kernel name. - Add a function `JsepGetNodeName` to allow reading the kernel name from JS to C++ - In debug mode ( `env.debug = true;` ) or in profiling mode ( `env.webgpu.profilingMode = 'default';` ), the kernel name is read from ORT; otherwise the kernel pointer ( a number ) is used as the kernel name to save calls from JS to C++. - In debug mode, WebGPU validation errors are recorded, and if any error occurs, `inferenceSession.run()` fails (the Promise is rejected). Behavior outside debug mode is unchanged. This is because recording errors is not zero-overhead, and GPU validation errors should occur consistently whether or not debug mode is on. - Add `jsepOnRunStart()` and `jsepOnRunEnd()` hooks to: - allow implementation of the features mentioned above. - pass the session ID to the backend.	2023-08-25 08:08:15 -07:00
satyajandhyala	da180b20fa	[JS/Web] Fix ConvTranspose shader code compilation errors. (#17232 ) ### Description Fix JSEP ConvTranspose shader code errors.	2023-08-25 06:25:54 -07:00
Changming Sun	3e934030f4	nodejs: Release Ort Env before main function returns (#17288 ) ### Description Release OrtEnv before main function returns. Before this change, OrtEnv is deleted when C/C++ runtime destructs all global variables in ONNX Runtime's core framework. The callstack is like this: ``` * frame #0: 0x00007fffee39f5a6 libonnxruntime.so.1.16.0`onnxruntime::Environment::~Environment(this=0x00007fffee39fbf2) at environment.h:20:7 frame #1: 0x00007fffee39f614 libonnxruntime.so.1.16.0`std::default_delete<onnxruntime::Environment>::operator()(this=0x00007ffff4c30e50, __ptr=0x0000000005404b00) const at unique_ptr.h:85:2 frame #2: 0x00007fffee39edca libonnxruntime.so.1.16.0`std::unique_ptr<onnxruntime::Environment, std::default_delete<onnxruntime::Environment>>::~unique_ptr(this=0x5404b00) at unique_ptr.h:361:17 frame #3: 0x00007fffee39e2ab libonnxruntime.so.1.16.0`OrtEnv::~OrtEnv(this=0x00007ffff4c30e50) at ort_env.cc:43:1 frame #4: 0x00007fffee39fa96 libonnxruntime.so.1.16.0`std::default_delete<OrtEnv>::operator()(this=0x00007fffefff8f78, __ptr=0x00007ffff4c30e50) const at unique_ptr.h:85:2 frame #5: 0x00007fffee39f394 libonnxruntime.so.1.16.0`std::unique_ptr<OrtEnv, std::default_delete<OrtEnv>>::~unique_ptr(this=0x7ffff4c30e50) at unique_ptr.h:361:17 frame #6: 0x00007ffff78574b5 libc.so.6`__run_exit_handlers + 261 frame #7: 0x00007ffff7857630 libc.so.6`exit + 32 frame #8: 0x00007ffff783feb7 libc.so.6`__libc_start_call_main + 135 frame #9: 0x00007ffff783ff60 libc.so.6`__libc_start_main@@GLIBC_2.34 + 128 frame #10: 0x0000000000abbdee node`_start + 46 ``` After this change, OrtEnv will be deleted before the main function returns and nodejs is still alive.	2023-08-24 23:07:02 -07:00
Yulong Wang	fb51faea64	[js/webgpu] fix 2 build breaks introduced in merge (#17273 ) ### Description fix 2 build breaks introduced in merge. Fixes web build	2023-08-23 18:09:50 -07:00
Yulong Wang	8b18d48c7c	[js/webgpu] make IndicesHelper implementation implicit (#17193 ) ### Description This change makes it no longer required to call indicesHelper.impl() in shader code.	2023-08-23 14:41:35 -07:00
Arthur Islamov	5842144d98	[js/web] JSEP Gemm for opset 13 (#16936 ) ### Description Added JSEP Gemm registration for opset 13. It was falling back to CPU provider as CPU has it for 13 --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-22 18:13:20 -07:00
Guenther Schmuelling	d3d3dde844	fix webgpu split (#17258 ) fix webgpu split for the case of split_sizes coming from input[1]	2023-08-22 16:49:22 -07:00
Yulong Wang	6fc3fd9ece	[js/webgpu] support Cast operator (#16489 ) ### Description support `Cast` operator for webgpu backend. Cast operator for webgpu backend currently only supports f32, u32, i32 and bool.	2023-08-18 23:51:03 -07:00
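
A note on the InstanceNormalization change (#17491) listed above: the workgroup scheme it describes can be sketched on the CPU as follows. This is a minimal TypeScript simulation, not the WGSL shader from the PR. The function names, the strided split of work across threads, and the reading of `squaredNorm` as the sum of squared deviations from the mean are assumptions made for illustration.

```typescript
// CPU-side sketch of a workgroup reduction for one channel of H * W values.
// Hypothetical names; this only mirrors the idea described in #17491.
const WORKGROUP_SIZE = 64; // the workgroupSize quoted in the PR description

// Tree reduction over simulated workgroup (shared) memory.
// Assumes shared.length is a power of two. Mutates `shared`.
function treeReduce(shared: Float64Array): number {
  for (let stride = shared.length >> 1; stride > 0; stride >>= 1) {
    for (let t = 0; t < stride; t++) {
      shared[t] += shared[t + stride];
    }
  }
  return shared[0];
}

function workgroupMeanAndSquaredNorm(
  channel: Float32Array,
  workgroupSize: number = WORKGROUP_SIZE,
): { mean: number; squaredNorm: number } {
  const n = channel.length;

  // Pass 1: each simulated thread sums a strided slice of about
  // n / workgroupSize elements, then the partial sums are reduced.
  const partialSum = new Float64Array(workgroupSize);
  for (let t = 0; t < workgroupSize; t++) {
    for (let i = t; i < n; i += workgroupSize) {
      partialSum[t] += channel[i];
    }
  }
  const mean = treeReduce(partialSum) / n;

  // Pass 2: same split, accumulating squared deviations from the mean.
  const partialSq = new Float64Array(workgroupSize);
  for (let t = 0; t < workgroupSize; t++) {
    for (let i = t; i < n; i += workgroupSize) {
      const d = channel[i] - mean;
      partialSq[t] += d * d;
    }
  }
  const squaredNorm = treeReduce(partialSq);

  return { mean, squaredNorm };
}
```

On a GPU the outer loop over `t` runs as parallel threads with a barrier between reduction steps, so each thread touches roughly `H * W / workgroupSize` elements instead of all `H * W`.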


388 commits