onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-22 22:01:08 +00:00

Author	SHA1	Message	Date
Xu Xing	afd642a194	[js/webgpu] Replace array with string in transpose perm (#21930 ) Perf test data(100000 times) Array: 12.599999997764826ms String: 1.6000000014901161ms Perf test case: ``` const permFunctionBodyArray = (rank: number, input: string): string => { const reverseFunc = []; reverseFunc.push(`fn perm(i: int) -> int { var a: int};`); for (let i = 0; i < rank; ++i) { reverseFunc.push(input); } reverseFunc.push('return a;}'); return reverseFunc.join('\n'); }; const permFunctionBodyString = (rank: number, input: string): string => { let reverseFunc= `fn perm(i: int}) -> int { var a: int;`; for (let i = 0; i < rank; ++i) { reverseFunc+=input; } reverseFunc+='return a;}'; return reverseFunc;//.join('\n'); }; const count = 100000; let start, end console.time('array'); start = performance.now(); for(let i =0 ; i < count; i ++) { permFunctionBodyArray(3, 'input'); } end = performance.now(); console.timeEnd('array'); console.log("Array: "+ (end-start)); console.time('string'); start = performance.now(); for(let i =0 ; i < count; i ++) { permFunctionBodyString(3, 'input'); } end = performance.now(); console.log("String: " +(end-start)); console.timeEnd('string'); ``` ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-09-16 23:17:46 -07:00
Yang Gu	2db6b734f5	[js/webgpu] Fix issue to run model demucs (#22074 ) This is to fix issue #22031 to run model demucs. For conv-transpose, outputPadding.length could be 1, while spatialRank is 2. The fix is to append enough 0s to outputPadding. For conv, the issue is similar. kernelShape.length sometimes could be 1, while inputs[1].dims.length is 4. The fix is also to append enough 0s to kernelShape.	2024-09-16 23:17:10 -07:00
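The padding fix described in the commit above can be illustrated with a minimal sketch. `padWithZeros` is a hypothetical helper written for this note, not the function used in the PR; it assumes the missing trailing entries should simply be zero.

```typescript
// Hypothetical helper (not from the PR): append zeros until the
// attribute has one entry per spatial dimension.
function padWithZeros(values: readonly number[], targetLength: number): number[] {
  const padded = values.slice();
  while (padded.length < targetLength) {
    padded.push(0);
  }
  return padded;
}

// conv-transpose case: outputPadding has 1 entry, spatialRank is 2.
const outputPadding = padWithZeros([1], 2);

// conv case: kernelShape has 1 entry, the weight tensor has 4 dims,
// so the spatial rank is assumed to be 4 - 2 = 2.
const kernelShape = padWithZeros([3], 4 - 2);

console.log(outputPadding, kernelShape);
```

Values that are already long enough are returned unchanged, so the helper is safe to apply unconditionally.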
Yulong Wang	291a5352b2	[js/web] remove training release (#22103 ) ### Description Remove training from onnxruntime-web Following up of #22082	2024-09-16 10:56:22 -07:00
Bin Miao	4d82404544	[WebNN EP] Support GRU operator (#20405 ) This PR adds support for the GRU operator in the WebNN EP. @Honry , @fdwr thanks!	2024-09-11 14:16:36 -07:00
dependabot[bot]	19954decaf	Bump body-parser from 1.20.2 to 1.20.3 in /js/web (#22044 )	2024-09-10 23:05:44 +00:00
Jiajia Qin	3580e01348	[js/webgpu] Optimize grouped conv (#21892 ) ### Description <!-- Describe your changes. --> #21618 This PR optimizes grouped conv by 1) more sequential memory access in gpu 2) reusing input's data to reduce global memory access times. See `Conv\|GroupedConv` op in [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) becomes 92 ms from 1058 ms on iGPUs with 32 EU. For the whole model on my iGPUs with 32 EU, wav2vec2 model becomes 982ms from 1942 ms. squeezebert-uncased model becomes 71.86ms from 431.77ms. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-09-04 17:16:35 -07:00
Jiajia Qin	a80bfed5b4	[js/webgpu] Optimize transpose (#21964 ) ### Description Fix bugs in the previous implementation and route more cases to the optimized path. The following cases now take the optimized path: 1. 2d inputs or squeezed 2d inputs 2. channels last or channels first transpose. For example, channel last transpose: [1, 256, 512, 512] -> [1, 512, 512, 256] For this case, the transpose becomes [256, 512x512] -> [512x512, 256] ### Motivation and Context For SD Turbo demo, the total transpose time becomes 39.98ms from 122.09ms. And the corresponding percentage becomes 3.89% from 11.05% in this demo. This PR will also help #21618, the total transpose time in that demo becomes 17.32 ms from 70.25 ms on my iGPUs.	2024-09-04 12:04:04 -07:00
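The channels-last example in the commit above amounts to viewing the 4D tensor as a 2D matrix. The sketch below shows only that shape arithmetic; `collapseChannelsLast` is a hypothetical name, and the batch-size-1 restriction is an assumption taken from the example rather than from the PR's code.

```typescript
// Hypothetical sketch: view an NCHW -> NHWC transpose with batch size 1
// as a plain 2D transpose [C, H*W] -> [H*W, C].
function collapseChannelsLast(
  shape: readonly number[],
): { input2d: [number, number]; output2d: [number, number] } {
  if (shape.length < 3 || shape[0] !== 1) {
    // Assumption: only the batch-1 case from the commit message is handled.
    throw new Error("expected a shape of the form [1, C, ...spatial]");
  }
  const channels = shape[1];
  // Multiply all spatial dimensions together (512 * 512 in the example).
  const spatial = shape.slice(2).reduce((acc, dim) => acc * dim, 1);
  return { input2d: [channels, spatial], output2d: [spatial, channels] };
}

const collapsed = collapseChannelsLast([1, 256, 512, 512]);
console.log(collapsed.input2d, collapsed.output2d);
```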
Guenther Schmuelling	4fece0430f	remove duplicate function definition (#21903 )	2024-08-28 16:18:56 -07:00
xhcao	3bfb5e4f62	[js/webgpu] support float16 for Clip (#21584 )	2024-08-28 13:19:20 -07:00
Jiajia Qin	252222034f	[js/webgpu] Support Reshape/Shape 21+ on jsep (#21871 ) ### Description #21618 With this PR, the cross-device copy (`MemcpyToHost`) can be removed entirely for the model `wav2vec2`, and the overall time becomes 48ms from 604ms.	2024-08-27 09:02:39 -07:00
Satya Kumar Jandhyala	af18824f43	[JS/WebGPU] Add GatherBlockQuantized op support (#21734 ) ### Description Add GatherBlockQuantized operator to JSEP. ### Motivation and Context Gemma model requires this.	2024-08-26 14:46:04 -07:00
Xu Xing	d9c57ac7db	[js/webgpu] Enable pad f16 uniform (#21691 ) --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-08-26 07:58:48 -07:00
Jiajia Qin	87165b92e9	[js/webgpu] optimize MatmulNBits (#21747 ) ### Description This optimization gives a 2x speedup for phi3 on the integrated Intel GPU. It mainly stores input A's data in a local variable instead of loading it from global memory each time it is combined with B's data.	2024-08-23 16:36:00 -07:00
Jiajia Qin	27a6890529	[js/webgpu] Optimize conv1d by conv2d (#19388 ) ### Description <!-- Describe your changes. --> Optimize conv1d to go to the conv2d path to utilize the conv2d's optimization path. See whisper-tiny-encoder model becomes 158.66 ms from 532.28 ms. Conv goes to Conv2DMatMul(8 ms) instead of GroupedConv(382 ms). Old profiling result: Kernel \| Time (ms) \| Percentage (%) -- \| -- \| -- Conv\\|GroupedConv \| 382.99 \| 71.95 MatMul \| 126.16 \| 23.70 Softmax \| 7.01 \| 1.32 Transpose \| 4.59 \| 0.86 Add \| 4.39 \| 0.82 Mul \| 2.36 \| 0.44 Div \| 1.44 \| 0.27 ReduceMean\\|ReduceMeanShared \| 1.25 \| 0.23 Erf \| 0.85 \| 0.16 Sub \| 0.72 \| 0.14 Pow \| 0.46 \| 0.09 Sqrt \| 0.07 \| 0.01 Sum \| 532.28 \| New profiling result with this PR: Kernel \| Time (ms) \| Percentage (%) -- \| -- \| -- MatMul \| 127.07 \| 80.09 Conv\\|Conv2DMatMul \| 8.00 \| 5.04 Softmax \| 6.95 \| 4.38 Transpose \| 4.65 \| 2.93 Add \| 4.26 \| 2.68 Mul \| 2.56 \| 1.61 Div \| 1.51 \| 0.95 ReduceMean\\|ReduceMeanShared \| 1.31 \| 0.83 Erf \| 0.85 \| 0.54 Sub \| 0.79 \| 0.50 Pow \| 0.46 \| 0.29 Conv\\|Transpose \| 0.26 \| 0.17 Sqrt \| 0.00 \| 0.00 Sum \| 158.66 \| --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-08-22 22:56:07 -07:00
Satya Kumar Jandhyala	1fb2e71ddc	[JS/WebGPU] Avoid producing presentKey/presentValue outputs if pastKey/pastValue … (#21782 ) Avoid producing presentKey/presentValue outputs if pastKey/pastValue don't exist.	2024-08-19 18:02:19 -07:00
Wanming Lin	7ae0b4ce64	[WebNN EP] Support Erf and Trilu for CPU backend (#21768 )	2024-08-19 07:56:16 -07:00
xhcao	417aa00406	[js/webgpu] fix conv1d error (#21585 )	2024-08-18 15:45:13 -07:00
Jiajia Qin	c4ade796d6	[js/webgpu] Fix attention shader recompilation issue (#21770 ) ### Description This PR fixes the `AttentionProbsSoftmax` recompilation issue when executing the phi3 model. The fix further improves phi3 performance.	2024-08-17 17:15:15 -07:00
Yang Gu	49fc168eed	[js/webgpu] Handle negative axis in op Split (#21771 ) This is to fix issue #21703, where the axis is a negative value in the model. According to the spec (https://onnx.ai/onnx/operators/onnx__Split.html), negative axis means counting dimensions from the back.	2024-08-17 16:41:23 -07:00
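The rule cited from the ONNX spec in the commit above (a negative axis counts dimensions from the back) can be sketched as follows. `normalizeAxis` is a hypothetical helper for illustration, not necessarily the function used in the PR; the accepted range `[-rank, rank - 1]` follows the ONNX operator documentation.

```typescript
// Hypothetical helper: map an ONNX axis in [-rank, rank - 1]
// to its non-negative equivalent.
function normalizeAxis(axis: number, rank: number): number {
  if (!Number.isInteger(axis) || axis < -rank || axis >= rank) {
    throw new RangeError(`axis ${axis} is out of range for rank ${rank}`);
  }
  // A negative axis counts from the last dimension.
  return axis < 0 ? axis + rank : axis;
}

// For a rank-4 tensor, axis -1 refers to the last dimension (index 3).
console.log(normalizeAxis(-1, 4));
```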
Tianlei Wu	d79e3c5791	Extend Attention Bias Broadcast Support (#21710 ) ### Description Previously, MultiHeadAttention supports relative position bias of shape [1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention supports [1, N, S, T]. This will extend the support to allow [1, N, S, T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for CUDA and CPU EPs. - [x] Rename the input of "relative position bias" to "attention bias" because it can also be used for other types of bias, like ALiBi (Attention with Linear Biases) or attention mask. - [x] Update unfused kernel to support broadcasting 2nd dimension of attention bias. - [x] Update efficient attention to support broadcasting 2nd dimension of attention bias. - [x] Update operators (MultiHeadAttention, DecoderMaskedMultiHeadAttention, Attention, PackedAttention, PackedMultiHeadAttention) to support broadcast attention bias on CUDA and CPU EPs. - [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that those EPs do not support broadcasting attention_bias for now). - [x] Add attention bias tests for MultiHeadAttention. - [x] Update operator documents - [x] Update benchmark script Other changes: * Fix some checks in multihead-attention.ts * Add helper functions to dump tensors given dimensions.	2024-08-16 15:40:04 -07:00
Yulong Wang	ef2ccc477b	[js/web] Add support for int4/uint4 tensor (#21720 ) ### Description Add support for int4/uint4 tensor.	2024-08-15 21:32:10 -07:00
Yang Gu	f8efc086ce	[js/webgpu] Support Chrome Canary in unit tests (#21750 ) Chrome Canary is helpful to test some new features. With this PR, we can enable Chrome Canary in unit tests with command like "npm test -- op abs.jsonc -b=webgpu -e=chromecanary".	2024-08-15 19:27:54 -07:00
Yulong Wang	abdc31de40	[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728 ) ### Description See `454996d496` for manual changes (excluded auto-generated formatting changes) ### Why Because the toolsets for old clang-format is out-of-date. This reduces the development efficiency. - The NPM package `clang-format` is already in maintenance mode. not updated since 2 years ago. - The VSCode extension for clang-format is not maintained for a while, and a recent Node.js security update made it not working at all in Windows. No one in community seems interested in fixing those. Choose Prettier as it is the most popular TS/JS formatter. ### How to merge It's easy to break the build: - Be careful of any new commits on main not included in this PR. - Be careful that after this PR is merged, other PRs that already passed CI can merge. So, make sure there is no new commits before merging this one, and invalidate js PRs that already passed CI, force them to merge to latest.	2024-08-14 16:51:22 -07:00
Guenther Schmuelling	d82f15d0e3	add Gelu opset-20 to webgpu (#21725 ) https://github.com/microsoft/onnxruntime/issues/21618	2024-08-14 09:45:05 -07:00
Xu Xing	7172aff1cf	[js/webgpu] Fix max pool shape end with 0 (#21698 ) Bug: https://github.com/microsoft/onnxruntime/issues/21386	2024-08-13 20:59:24 -07:00
xhcao	9c6ee89fa7	[js/webgpu] fix two errors of attention operator (#21687 ) Fix two issues: (1) scale shall be fp32 instead of f16 (2) Softmax program does not handle the normalized dispatch group values, so if the sequence length is over 65535, the result is not correct for this program.	2024-08-13 09:42:34 -07:00
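The second issue in the commit above concerns workgroup counts that exceed WebGPU's default per-dimension limit of 65535. One common way to handle this, shown here as a rough sketch and not as the PR's actual code, is to spread a 1D count across two or three dispatch dimensions. `normalizeDispatch` and `MAX_PER_DIM` are names invented for this example, and a shader using such a dispatch would also need to rebuild the linear index and skip out-of-range invocations.

```typescript
// Default WebGPU limit for maxComputeWorkgroupsPerDimension.
const MAX_PER_DIM = 65535;

// Hypothetical sketch: spread a 1D workgroup count over up to three
// dimensions so that no single dimension exceeds the limit.
function normalizeDispatch(count: number): [number, number, number] {
  if (count <= MAX_PER_DIM) {
    return [count, 1, 1];
  }
  const side2d = Math.ceil(Math.sqrt(count));
  if (side2d <= MAX_PER_DIM) {
    return [side2d, side2d, 1];
  }
  const side3d = Math.ceil(Math.cbrt(count));
  if (side3d <= MAX_PER_DIM) {
    return [side3d, side3d, side3d];
  }
  throw new Error("workgroup count is too large to dispatch");
}

// A sequence length above 65535 no longer fits in one dimension.
console.log(normalizeDispatch(70000));
```

The normalized grid can contain more workgroups than requested, which is why the shader-side bounds check matters.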
Satya Kumar Jandhyala	51b2044120	[JS/WebGPU] Add Dequantizelinear operator (#21642 ) ### Description Added DequantizeLinear operator for JSEP.	2024-08-09 14:44:19 -07:00
Yulong Wang	5e66fcc703	[js/web] allow op test to use f16 type for inputs/outputs (#21664 ) ### Description allow op test to use f16 type for inputs/outputs. This PR introduces "@petamoriken/float16" as Float16Array polyfill but restricts it to be only used for test runner.	2024-08-08 09:56:37 -07:00
Prathik Rao	134f47743e	bumps up version in main from 1.19 -> 1.20 (#21588 ) Bump up version in main from 1.19.0 to 1.20.0 since the release branch has been cut.	2024-08-05 15:46:04 -07:00
Wanming Lin	8c641d7182	[WebNN EP] Support Dropout op (#21586 ) ### Description WebNN only supports test mode, so inputs and attributes related to training mode are ignored, and the Dropout op is implemented directly with WebNN's identity op.	2024-08-02 16:25:04 -07:00
Wanming Lin	1d4b161145	[WebNN EP] Support ConvTranspose for TFLite backend (#21291 ) ### Description Chromium supports ConvTranspose for TFLite in https://chromium-review.googlesource.com/c/chromium/src/+/5635194 With constraint that only default dilations and groups are supported. --------- Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>	2024-07-30 17:46:08 -07:00
Yulong Wang	b03c9496aa	[js/web] allow load WebAssembly binary from buffer (#21534 ) ### Description This PR adds a new option `ort.env.wasm.wasmBinary`, which allows user to set to a buffer containing preload .wasm file content. This PR should resolve the problem from latest discussion in #20876.	2024-07-29 13:39:38 -07:00
Xu Xing	0d7cf301a1	[js/webgpu] Add activation Tanh (#21540 ) Bug: https://github.com/microsoft/onnxruntime/issues/21467	2024-07-29 11:05:34 -07:00
Xu Xing	5bc12bf209	[js/webgpu] Add activation for conv3d naive (#21466 )	2024-07-29 08:47:41 -07:00
Wanming Lin	b6b29309a5	[WebNN EP] Update argMax/argMin to adapt to latest spec (#21452 ) WebNN spec recently changes the definition of argMax/argMin: - Remove selectLastIndex option, let backends decide to select the last index or not. - Move axes option to axis input	2024-07-25 17:07:01 -07:00
mindest	5b9369e93c	Fix typos according to reviewdog report. (#21335 ) ### Description Fix typos based on reviewdog report but with some exceptions/corrections.	2024-07-22 13:37:32 -07:00
Yulong Wang	01df8c787d	[js/web] fix vulnerable version of dependencies (#21412 ) ### Description ``` # npm audit report socket.io 3.0.0 - 4.6.2 Severity: high socket.io has an unhandled 'error' event - https://github.com/advisories/GHSA-25hc-qcg6-38wj Depends on vulnerable versions of engine.io fix available via `npm audit fix` node_modules/socket.io ws 8.0.0 - 8.17.0 Severity: high ws affected by a DoS when handling a request with many HTTP headers - https://github.com/advisories/GHSA-3h5v-q93c-6h6q fix available via `npm audit fix` node_modules/ws engine.io 0.7.8 - 0.7.9 \|\| 6.0.0 - 6.5.4 Depends on vulnerable versions of ws node_modules/engine.io socket.io-adapter 2.5.2 - 2.5.4 Depends on vulnerable versions of ws node_modules/socket.io-adapter 4 high severity vulnerabilities ```	2024-07-19 11:11:30 -07:00
Xu Xing	92a8407b39	[js/webgpu] Remove unnecessary initialization of var (#21312 ) This var has been initialized to 0 in tint, so no need extra loop to do it again: ``` float tint_symbol_52[1][4] = (float[1][4])0; { for(int tint_symbol_53 = 0; (tint_symbol_53 < 1); tint_symbol_53 = (tint_symbol_53 + 1)) { { for(int tint_symbol_54 = 0; (tint_symbol_54 < 4); tint_symbol_54 = (tint_symbol_54 + 1)) { tint_symbol_52[min(uint(tint_symbol_53), 0u)][min(uint(tint_symbol_54), 3u)] = 0.0f; } } } } ``` ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-07-12 12:34:34 -07:00
pengwa	88336ffa92	Fix typos - 1st Wave (#21278 ) ### Description There are so many typos reported by the review dog, [Optional Lint] actions (example: https://github.com/microsoft/onnxruntime/actions/runs/9864564489/job/27239732367), this PR is to fix some of them. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-07-11 13:35:08 +08:00
Enrico Galli	4c3c809bdb	[js/webnn] Enable user-supplied MLContext (#20600 ) ### Description This PR enables the API added in #20816 as well as moving context creation to JS. ### Motivation and Context In order to enable I/O Binding with the upcoming [MLBuffer](https://github.com/webmachinelearning/webnn/issues/542) API in the WebNN specification, we need to share the same `MLContext` across multiple sessions. This is because `MLBuffer`s are restricted to the `MLContext` where they were created. This PR enables developers to use the same `MLContext` across multiple sessions.	2024-07-08 10:19:39 -07:00
Wanming Lin	cd516a1677	[WebNN EP] Remove constraint for conv ops on CPU backend (#21237 ) Currently WebNN TFLite backend allows the filter of conv2d/convTranspose2d be an input. Remove the constraint and operate necessary transpose/reshape operations for the filter input.	2024-07-08 10:14:43 -07:00
Guenther Schmuelling	9eb1c2a7a3	support for layernorm in webgpu pre opset-17 (#21121 ) handled the same way cpu does	2024-06-27 10:20:48 -07:00
Wanming Lin	41ad83fb00	[WebNN EP] Support rest Reduction ops for TFLite backend (#21135 ) - reduceLogSum, reduceLogSumExp and reduceSumSquare have been landed in https://chromium-review.googlesource.com/c/chromium/src/+/5575815 - reduceL1 and reduceL2 have been landed in https://chromium-review.googlesource.com/c/chromium/src/+/5606091	2024-06-25 18:30:55 -07:00
Wanming Lin	4743803944	[WebNN EP] Support more Normalization ops for TFLite backend (#21151 ) Following Normalization ops have been supported in Chromium for TFLite backend: - batchNormalization: https://chromium-review.googlesource.com/c/chromium/src/+/5532745 - layerNormalization: https://chromium-review.googlesource.com/c/chromium/src/+/5573326 - instanceNormalization: https://chromium-review.googlesource.com/c/chromium/src/+/5532750	2024-06-24 19:04:23 -07:00
Wanming Lin	3a917e49fb	[WebNN EP] Support 4 more ops for TFLite backend (#21134 ) Recently WebNN TFLite backend supports gelu, expand, softsign, reciprocal.	2024-06-24 09:52:12 -07:00
Wanming Lin	0c80cd2157	[WebNN EP] Update Prelu restriction for CPU backend (#20878 )	2024-06-20 11:04:01 -07:00
Xu Xing	c3076721f3	[js/webgpu] Support conv3d naive (#20706 )	2024-06-19 10:13:50 -07:00
Wanming Lin	40879a2623	[WebNN EP] Enable Cast op for WebNN CPU backend (#20864 ) WebNN TFLite backend supports `cast` op but doesn't support casting to `uint64` data type.	2024-06-19 01:51:19 -07:00
Wanming Lin	35c430a95a	[WebNN EP] Enable several ops for WebNN CPU backend (#20847 ) WebNN CPU implementation has been migrated from XNNPack to TFLite which supports more ops. Turn on partial `cpu` supported ops which just need the change from `false` to `true` firstly.	2024-06-19 01:45:31 -07:00
Yulong Wang	631a2c16be	[js/web] skip default locateFile() when dynamic import is disabled (#21073 ) ### Description skip default `locateFile()` when dynamic import is disabled. This allows the file to work with bundlers to load WebAssembly file correctly if `env.wasm.wasmPaths` is not set.	2024-06-18 12:21:45 -07:00
Yang Gu	1473d66a00	[js/webgpu] Prefer adapter.info to adapter.requestAdapterInfo (#21065 ) WebGPU is deprecating async adapter.requestAdapterInfo, and replacing it with sync adapter.info. Spec change: https://github.com/gpuweb/gpuweb/pull/4662	2024-06-18 12:02:38 -07:00
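The migration described in the commit above can be handled with simple feature detection. The sketch below uses local stand-in interfaces (`AdapterLike`, `AdapterInfoLike`) instead of the real WebGPU typings so that it stays self-contained; `getAdapterInfo` is a hypothetical helper, not the code from the PR.

```typescript
// Minimal stand-ins for the WebGPU types, declared locally for this sketch.
interface AdapterInfoLike {
  vendor: string;
  architecture: string;
}

interface AdapterLike {
  info?: AdapterInfoLike;
  requestAdapterInfo?: () => Promise<AdapterInfoLike>;
}

// Prefer the synchronous `info` attribute and fall back to the
// deprecated asynchronous method on older browsers.
async function getAdapterInfo(adapter: AdapterLike): Promise<AdapterInfoLike> {
  if (adapter.info !== undefined) {
    return adapter.info;
  }
  if (typeof adapter.requestAdapterInfo === "function") {
    return adapter.requestAdapterInfo();
  }
  throw new Error("adapter exposes neither info nor requestAdapterInfo()");
}
```

In a browser, the object passed in would be the result of `navigator.gpu.requestAdapter()`.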
Jian Chen	4e18b0b7ce	Upgrade braces from 3.0.2 to 3.0.3 to fix the vulnerability (#21022 )	2024-06-12 18:02:52 -07:00
Yulong Wang	dd805ff77d	[js/web] ESM: use the bundled target as default export (#20991 ) ### Description ESM: use the bundled target as default export In this change, the default import of the following entries: ``` import from 'onnxruntime-web'; import from 'onnxruntime-web/all'; import from 'onnxruntime-web/webgpu'; ``` will use the "bundled" version, which has no dynamic import. This change should only apply to ESM on web.	2024-06-11 11:14:55 -07:00
Wanming Lin	043ef5c95f	[WebNN EP] Support latest WebNN softmax op (#20827 ) Latest WebNN softmax supports N-D input and axis parameter.	2024-06-11 08:27:14 -07:00
Wanming Lin	52874f628a	[WebNN EP] Remove some constraints for CPU backend (#20900 ) Following constraints have been supported by WebNN TFLite backend: - Concat: supports up to 4 inputs - Matmul: supports broadcasting - Resize: supports nearest mode - Split: supports up to 4 outputs	2024-06-06 08:22:41 -07:00
Wanming Lin	da1f8f9274	[WebNN EP] TFLite backend only supports limit ranges for Clip (#20863 )	2024-06-06 08:22:18 -07:00
Guenther Schmuelling	c749bd997a	webgpu quickgelu (#20939 )	2024-06-06 08:21:33 -07:00
Wanming Lin	9c6481fa2d	[WebNN EP] Enable ArgMax and ArgMin for CPU backend (#20865 ) WebNN TFLite backend supports ArgMax and ArgMin, but only supports 'select_last_index' value is 0.	2024-06-03 14:12:11 -07:00
Wanming Lin	c128132dd8	[WebNN EP] TFLite backend only supports Elu with default alpha (#20862 )	2024-06-03 14:10:22 -07:00
Yulong Wang	ab9f153746	[js/web] allow build target for non dynamic import (#20898 ) ### Description <!-- Describe your changes. --> This PR allows to build ORT web to `ort{.all\|.webgpu}.bundle.min.mjs`, which does not have any dynamic import. This makes it possible to use ort web via static import in service worker. Fixes #20876	2024-06-03 12:33:37 -07:00
Yulong Wang	35697d2421	[js/webnn] update API of session options for WebNN (#20816 ) ### Description This PR is an API-only change to address the requirements being discussed in #20729. There are multiple ways that users may create an ORT session by specifying the session options differently. All the code snippet below will use the variable `webnnOptions` as this: ```js const myWebnnSession = await ort.InferenceSession.create('./model.onnx', { executionProviders: [ webnnOptions ] }); ``` ### The old way (backward-compatibility) ```js // all-default, name only const webnnOptions_0 = 'webnn'; // all-default, properties omitted const webnnOptions_1 = { name: 'webnn' }; // partial const webnnOptions_2 = { name: 'webnn', deviceType: 'cpu' }; // full const webnnOptions_3 = { name: 'webnn', deviceType: 'gpu', numThreads: 1, powerPreference: 'high-performance' }; ``` ### The new way (specify with MLContext) ```js // options to create MLcontext const options = { deviceType: 'gpu', powerPreference: 'high-performance' }; const myMlContext = await navigator.ml.createContext(options); // options for session options const webnnOptions = { name: 'webnn', context: myMlContext, ...options }; ``` This should throw (because no deviceType is specified): ```js const myMlContext = await navigator.ml.createContext({ ... }); const webnnOptions = { name: 'webnn', context: myMlContext }; ``` ### Interop with WebGPU ```js // get WebGPU device const adaptor = await navigator.gpu.requestAdapter({ ... }); const device = await adaptor.requestDevice({ ... 
}); // set WebGPU adaptor and device ort.env.webgpu.adaptor = adaptor; ort.env.webgpu.device = device; const myMlContext = await navigator.ml.createContext(device); const webnnOptions = { name: 'webnn', context: myMlContext, gpuDevice: device }; ``` This should throw (because cannot specify both gpu device and MLContext option at the same time): ```js const webnnOptions = { name: 'webnn', context: myMlContext, gpuDevice: device, deviceType: 'gpu' }; ```	2024-05-31 03:25:14 -07:00
Xu Xing	25ac65375c	[js/webgpu] Fix mha name (#20860 )	2024-05-30 00:01:06 -07:00
Peishen Yan	cfe68e489e	[WebNN EP] Support Trilu op (#20730 ) Adds support for Trilu via WebNN Triangular op	2024-05-24 10:46:54 -07:00
Guenther Schmuelling	33a68d221f	add missing file for pr20791 (#20811 ) this file should have been in pr20791 to allow fp16 in the tile implementation	2024-05-24 09:59:13 -07:00
Satya Kumar Jandhyala	bab5037eab	Eliminate explicit Concat operations in Attention (#20556 ) ### Description Remove the explicit concatenation of pastKey with Key and pastValue with Value.	2024-05-24 09:07:57 -07:00
Wanming Lin	2c39d0c502	[WebNN EP] Disable ConvTranspose for WebNN CPU (#20762 ) The WebNN CPU backend implementation has been migrated from XNNPack to TFLite. TFLite does not support WebNN's convTranspose2d yet, so disable it for now.	2024-05-22 20:59:37 -07:00
Xu Xing	f1fef19b6e	[js/webgpu] Support shared memory for transpose 2d (#19267 ) For 1024x1024: 18.7ms without shared memory, 13.2ms with shared memory.	2024-05-22 08:15:44 -07:00
Wanming Lin	87d49e3dda	[WebNN EP] Add WebNN operators doc to README.md (#20734 )	2024-05-20 14:57:40 -07:00
Wanming Lin	0399d1b12d	[WebNN EP] Update chromium flag (#20732 ) WebNN is currently enabled behind "Enables WebNN API" flag.	2024-05-20 14:57:30 -07:00
Yulong Wang	036fcd93d4	[js/web] optimize module export and deployment (#20165 ) ### Description This PR make numbers of optimizations to onnxruntime-web's module export and deployment. See each section below for more details. #### Preview > [onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21) > ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~ > ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~ > ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~ <details> <summary><h4>Breaking changes</h4></summary> There is no code change required, but there are a few differences regarding code import, flags, bundler config and deployment steps. #### Importing: Import table is changed. See following for details. <details> <summary><h5>Current import table:</h5></summary> \| Target Name \| Path for "import" or "require" \| WebGL \| JSEP \| wasm \| Proxy \| Training \| \|------\|-----\|-----\|-----\|-----\|-----\|-----\| \| `ort` (default) \| `onnxruntime-web` \| ✔️ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| `ort.all` \| `onnxruntime-web/experimental` \| ✔️ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| \| `ort.node` \| `onnxruntime-web` \| ❌ \| ❌ \| ✔️ \| ❌ \| ❌ \| \| `ort.training` \| `onnxruntime-web/training` \| ❌ \| ❌ \| ✔️ \| ✔️<sup>\[1]</sup> \| ✔️ \| \| `ort.wasm` \| `onnxruntime-web/wasm` \| ❌ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| `ort.wasm-core` \| `onnxruntime-web/wasm-core` \| ❌ \| ❌ \| ✔️ \| ❌ \| ❌ \| \| `ort.webgl` \| `onnxruntime-web/webgl` \| ✔️ \| ❌ \| ❌ \| ✔️<sup>\[2]</sup> \| ❌ \| \| `ort.webgpu` \| `onnxruntime-web/webgpu` \| ❌ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| * [1] didn't test. may not actually work. * [2] not working. this is a mistake in build config. 
</details> <details> <summary><h5>Proposed update:</h5></summary> \| Target Name \| Path for "import" or "require" \| WebGL \| JSEP \| wasm \| Proxy \| Training \| \|------\|-----\|-----\|-----\|-----\|-----\|-----\| \| `ort` (default) \| `onnxruntime-web` \| ✔️ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| `ort.all` \| ~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` \| ✔️ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| \| `ort.node` \| `onnxruntime-web` \| ❌ \| ❌ \| ✔️ \| ❌ \| ❌ \| \| `ort.training` \| `onnxruntime-web/training` \| ❌ \| ❌ \| ✔️ \| ✔️ \| ✔️ \| \| `ort.wasm` \| `onnxruntime-web/wasm` \| ❌ \| ❌ \| ✔️ \| ✔️ \| ❌ \| \| ~~`ort.wasm-core`~~ \| ~~`onnxruntime-web/wasm-core`~~ \| ~~❌~~ \| ~~❌~~ \| ~~✔️~~ \| ~~❌~~ \| ~~❌~~ \| \| `ort.webgl` \| `onnxruntime-web/webgl` \| ✔️ \| ❌ \| ❌ \| ~~✔️~~ ❌ \| ❌ \| \| `ort.webgpu` \| `onnxruntime-web/webgpu` \| ❌ \| ✔️ \| ✔️ \| ✔️ \| ❌ \| </details> #### Flags: The following flags are deprecated: - `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in build. The following flags changed their type: - `env.wasm.wasmPaths`: When using this flag as a string ( for the URL prefix ), nothing is changed. When using this flag as an object ( for per-file path override ), the type changed: ```diff - export interface Old_WasmFilePaths{ - 'ort-wasm.wasm'?: string; - 'ort-wasm-threaded.wasm'?: string; - 'ort-wasm-simd.wasm'?: string; - 'ort-training-wasm-simd.wasm'?: string; - 'ort-wasm-simd-threaded.wasm'?: string; - }; + export interface New_WasmFilePaths { + /** + * Specify the override path for the main .wasm file. + * + * This path should be an absolute path. + * + * If not modified, the filename of the .wasm file is: + * - `ort-wasm-simd-threaded.wasm` for default build + * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and WebNN) + * - `ort-training-wasm-simd-threaded.wasm` for training build + / + wasm?: URL\|string; + /* + * Specify the override path for the main .mjs file. 
+ * + * This path should be an absolute path. + * + * If not modified, the filename of the .mjs file is: + * - `ort-wasm-simd-threaded.mjs` for default build + * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and WebNN) + * - `ort-training-wasm-simd-threaded.mjs` for training build + / + mjs?: URL\|string; + } ``` #### Bundler compatibility: Config changes are needed for bundlers. See usage example in /js/web/test/e2e/ for Webpack, parcel and rollup. #### Deployment: - if consuming from a CDN, there is no breaking change. - if consuming from a local server, you need to copy all `ort-.wasm` and `ort-.mjs` files (6 files in total) in the dist folder. (previously you only needed to copy `ort-.wasm` files.) </details> <details> <summary><h4>Problems</h4></summary> There are a few problems with the current module export and deployment: - Script URL cannot be correctly inferred when imported as ESM. - Workers are forcefully encoded using Blob URL, which stops onnxruntime-web from working in CSP environments and Node.js when using the proxy or multi-threading feature. - Generated JS code (by Emscripten) is encoded using `function.toString()`, which is unstable and error-prone. - Running with a different Emscripten build always requires the build step, which makes it difficult to swap artifacts in development/debug. </details> <details> <summary><h4>Goals</h4></summary> - Full ESM support - Support various ways to import. 
Including: - import from HTML's `<script>` tag (IIFE format, exporting to global variable `ort`) ```html <script src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script> ``` - import from source code inside `<script type="module">` tag (ESM) ```html <script type="module"> import * as ort from "https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs"; // using 'ort' </script> ``` - import in a CommonJS project (CJS format, resolve from package.json "exports" field) ```js // myProject/main.js const ort = require('onnxruntime-web'); ``` - import in an ESM project (ESM format, resolve from package.json "exports" field) ```js // myProject/main.js (or main.mjs) import * as ort from 'onnxruntime-web'; ``` - Support popular bundlers when importing onnxruntime-web into a CJS/ESM project. - webpack (esm requires extra post-process step) - rollup - parcel (esm requires extra post-process step) - More bundlers TBD - Multi-threading support for Node.js NOTE: keeping single JavaScript file (the all-in-one bundle) is no longer a goal. This is because technically there is a conflict with the other requirements. </details> <details> <summary><h4>Important Design Decisions</h4></summary> - Drop support of single JavaScript output. - The current onnxruntime-web distribution uses a single JavaScript file to include all code. While there are a few benefits, it also creates problems as mentioned above. Since ESM is being used more and more widely, and browsers are making more restricted security checks and requirement, the old Blob based solution is going to be replaced. - To achieve the requirement, specifically, the CSP environment support, we have to offer a non Blob based solution. Therefore, we have to distribute multiple files and drop the single file solution. - Do not run parser/postprocess on Emscripten generated JavaScript. 
- Emscripten is evolving quickly, so we should depend only on what's in its documentation rather than on particular implementation details. (For example, we currently patch its code to deal with a special variable `_scriptDir`.) - Keeping the generated files as-is also helps to: - reduce the size of ort.min.js - make it easier to replace build artifacts when in development/debug - Drop support for non-SIMD and non-MultiThread. This helps to reduce the number of artifacts in distribution. - (fixed-sized) SIMD is supported in any mainstream JS environment. - Multi-thread as WebAssembly feature is supported in any mainstream JS environment. In some environments the feature is guarded by cross-origin policy, but it can still work as long as no worker is created. - Use ESM output for Emscripten generated JavaScript. - There are 2 ways to dynamically import classic (umd) modules, and neither of them is recommended: - dynamically creating a <script> tag. This changes the HTML structure and has quite a lot of compatibility issues - use `fetch()` and `eval()`. However, avoiding `eval` is strongly recommended because it incurs a significant perf hit. - importing ESM is easy: just use the `import()` call. Considering that ESM is widely supported in modern browsers and Node.js, this is the better option. - Add Blob based solution as a fallback for cross-origin workers. - Importing onnxruntime-web from a CDN is still a widespread use case. In this usage, a worker can be created by using `fetch()`+`Blob` to create a same-origin Blob URL. </details> <details> <summary><h4>Distribution File Manifest</h4></summary> The distribution folder contains the following files: - WebAssembly artifacts. These files are the result of compiling the ONNX Runtime C++ code to WebAssembly by Emscripten. 
\| File Name \| Build Flags \| \|------\|-----\| \| ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm \| `--enable_wasm_simd` <br/> `--enable_wasm_threads` \| \| ort-training-wasm-simd-threaded.mjs <br/> ort-training-wasm-simd-threaded.wasm \| `--enable_training_apis` <br/> `--enable_wasm_simd` <br/> `--enable_wasm_threads` \| \| ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm \| `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep` <br/> `--use_webnn` \| - onnxruntime-web JavaScript artifacts. These files are generated by ESBuild as the entry point for onnxruntime-web. There are multiple build targets for different use cases: \| Target Name \| Path for "import" or "require" \| Description \| \|------\|-----\|-----\| \| `ort` \| `onnxruntime-web` \| The default target. \| \| `ort.all` \| `onnxruntime-web/all` \| The target including webgl. \| \| `ort.node` \| `onnxruntime-web` \| The default target for Node.js. \| \| `ort.training` \| `onnxruntime-web/training` \| The target including training APIs \| \| `ort.wasm` \| `onnxruntime-web/wasm` \| The target including only WebAssembly (CPU) EP \| \| `ort.webgl` \| `onnxruntime-web/webgl` \| The target including only WebGL EP \| For each target, there are multiple files generated: \| File Name \| Description \| \|------\|-----\| \| [target].js \| The entry point for the target. IIFE and CommonJS format. \| \| [target].mjs \| The entry point for the target. ESM format. \| \| [target].min.js <br/> [target].min.js.map \| The entry point for the target. Minimized with sourcemap. IIFE and CommonJS format. \| \| [target].min.mjs <br/> [target].min.mjs.map \| The entry point for the target. Minimized with sourcemap. ESM format. \| \| [target].proxy.mjs \| (if applicable) The proxy ESM module for the target. \| \| [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map \| (if applicable) The proxy ESM module for the target. Minimized with sourcemap. 
\| </details> <details> <summary><h4>Dynamic Import Explained</h4></summary> - Local Served \| No Proxy: ``` [Bundle or ort.min.js] \| + import()--> [ort-wasm-simd-threaded.mjs] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` - Local Served \| Proxy: ``` [Bundle or ort.min.js] \| + import()--> [ort.proxy.min.mjs] \| + new Worker()--> [ort.proxy.min.mjs (worker)] \| + import()--> [ort-wasm-simd-threaded.mjs] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` - Cross Origin \| No Proxy: ``` [Bundle or ort.min.js] \| + fetch('ort-wasm-simd-threaded.mjs') \| + URL.createObjectURL(res.blob()) \| + import()--> [blob:... (ort-wasm-simd-threaded)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` - Cross Origin \| Proxy ``` [Bundle or ort.min.js] \| + fetch('ort.proxy.min.mjs') \| + URL.createObjectURL(res.blob()) \| + import()--> [blob:... (ort.proxy)] \| + new Worker()--> [blob:... (ort.proxy) (worker)] \| + fetch('ort-wasm-simd-threaded.mjs') \| + URL.createObjectURL(res.blob()) \| + import()--> [blob:... (ort-wasm-simd-threaded)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] \| + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)] \| + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm] ``` </details>	2024-05-20 09:51:16 -07:00
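The four loading paths listed under "Dynamic Import Explained" in the commit message above differ only in two switches: whether the scripts are cross-origin, and whether the proxy is used. The following is a minimal sketch of that decision, not code from onnxruntime-web: the names `importSteps` and `planLoad` are hypothetical, the steps are returned as descriptive strings rather than executed, and the nested multi-threading worker from the diagrams is left out for brevity.

```typescript
// Hypothetical sketch; these names are illustrative and not part of onnxruntime-web.
const WASM_MODULE = 'ort-wasm-simd-threaded.mjs';
const PROXY_MODULE = 'ort.proxy.min.mjs';

// Steps needed to import one ESM file: directly when same-origin,
// or through a same-origin Blob URL when the file is cross-origin.
function importSteps(file: string, crossOrigin: boolean): string[] {
  return crossOrigin
    ? [`fetch('${file}')`, 'URL.createObjectURL(res.blob())', `import(blob of ${file})`]
    : [`import('${file}')`];
}

// Ordered list of steps for one of the four scenarios in the commit message.
function planLoad(crossOrigin: boolean, useProxy: boolean): string[] {
  let steps: string[] = [];
  if (useProxy) {
    // The proxy module is imported first, then started as a worker.
    steps = steps.concat(importSteps(PROXY_MODULE, crossOrigin), ['new Worker(proxy)']);
  }
  steps = steps.concat(importSteps(WASM_MODULE, crossOrigin));
  steps.push('WebAssembly.instantiateStreaming(ort-wasm-simd-threaded.wasm)');
  return steps;
}
```

For example, `planLoad(false, false)` yields the two-step "Local Served | No Proxy" path, while `planLoad(true, true)` yields the fetch + Blob URL sequence for both the proxy and the WebAssembly module.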
Xu Xing	8c59cd4fce	[js/webgpu] Support GroupQueryAttention (#20237 ) TODOs: 1. Handle H * params.kvNumHeads greater than work group size limit. 2. Support BNSH kv cache.	2024-05-13 09:43:37 -07:00
Guenther Schmuelling	55a6986d38	optimize skiplayernorm (#20551 ) SkipSimplifiedLayerNormalization used in phi3 comes down from 222usec to 14usec	2024-05-08 08:40:03 -07:00
Yi-Hong Lyu	b2481e3602	Bump up version in main from 1.18.0 to 1.19.0 (#20489 ) Bump up version in main from 1.18.0 to 1.19.0 since the release branch has been cut. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-04-29 20:21:41 -07:00
Yulong Wang	b1085b51ca	[js/web] update README (#20492 ) ### Description Update README.md in /js/web/ - update compatibility table - update links to onnxruntime.ai	2024-04-29 17:56:23 -07:00
Satya Kumar Jandhyala	99b0e19f11	[JS/WebGPU] MatMulNBits remove unnecessary condition (#20396 ) Distribute writing-to-output work over all threads in MatMulNBits.	2024-04-29 14:27:21 -07:00
Satya Kumar Jandhyala	736cbb3925	[JS/WebGU] Support fp16 in Attention by performing the computation in fp32. (#20486 ) ### Description Perform computation in fp32 and convert finally to fp16.	2024-04-27 08:30:26 -07:00
Satya Kumar Jandhyala	21b3cbc3af	[WIP][JS/WebGPU] Inputs Key and Value could be 4-dims. (#20470 ) ### Description The Key and Value inputs could be 4-dims	2024-04-25 13:33:46 -07:00
Yulong Wang	a5182a2ef3	[js/web] update test condition for '--force-localhost' (#20450 ) ### Description Fixes the NPM packaging pipeline failure.	2024-04-24 12:14:03 -07:00
Satya Kumar Jandhyala	ae78cdb5d7	[JS/WebGPU] MultiheadAttention bugfix (#20447 ) ### Description Fixed pastkey, key and pastvalue, value concatenation condition and fixed index error. Added new test cases.	2024-04-24 08:43:14 -07:00
Guenther Schmuelling	33d5ea39b3	[js/webgpu] fixes for fp16 attention (#20440 )	2024-04-24 08:01:28 -07:00
Yulong Wang	8f53957bcf	[js/web] add "browser" field to support parcel v2 (#20422 ) ### Description As described in the latest discussion in #19915, parcel v2 without the [new resolver](https://parceljs.org/blog/v2-9-0/#new-resolver) does not work correctly with onnxruntime-web. Some users still use parcel with the default resolver, so this PR adds the deprecated "browser" field back for backward compatibility. This PR also corrects the "main" field, which is used by the old resolver for Node.js.	2024-04-23 13:10:11 -07:00
Satya Kumar Jandhyala	d42ac7f0c6	[JS/WebGPU] Multihead attention improvements (#20286 ) ### Description Enabled more usecases	2024-04-23 12:39:49 -07:00
Guenther Schmuelling	b8e6684313	more conservitive gpu-buffer cache algo (#20312 ) tuned based on 80 models to keep performance impact minimal	2024-04-23 09:07:04 -07:00
Yulong Wang	4385602386	[js/web] fix test runner with optional input/output (#20399 ) ### Description Fix test runner with optional input/output. This change fixes the OP test runner (.jsonc format test) with optional input(s) and/or output(s). The fix reveals a problem in dealing with optional outputs: > Take SkipSimplifiedLayerNorm as an example: > > if, in the ONNX model, the node's outputs are [ 'output_0', '' ] instead of [ 'output_0' ], the current implementation will fail. The difference is that in the first case context.outputCount == 2, so the TypeScript implementation will try to create a tensor for output[1]. It will eventually call the C++ function (OpKernelContext::Output), and the output.DataRaw() will be nullptr. The WebGPU backend will fail because it cannot deal with a TensorView with data == 0. > This problem may need to be fixed or worked around in a separate PR. This PR does not fix this problem. Failed test cases are modified to work. Note that this PR does not break those test cases, as they never worked.	2024-04-22 12:53:10 -07:00
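The optional-output problem described in the commit above comes down to the difference between the number of declared outputs and the outputs that are actually requested. The sketch below only illustrates that distinction and is not the fix (the PR explicitly leaves the problem open). The helper name `requestedOutputIndices` is hypothetical; it relies on the ONNX convention that an empty string in a node's output list marks an omitted optional output.

```typescript
// Hypothetical helper, not part of onnxruntime: returns the indices of outputs
// that are actually requested. In ONNX, an empty name marks an omitted optional output.
function requestedOutputIndices(outputNames: string[]): number[] {
  const indices: number[] = [];
  outputNames.forEach((name, i) => {
    if (name !== '') {
      indices.push(i);
    }
  });
  return indices;
}
```

With outputs `['output_0', '']`, the declared count is 2 but `requestedOutputIndices` returns only index 0, which is the case the commit message says the current implementation mishandles.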
Guenther Schmuelling	497a627a69	fix fp16 for skiplayernorm (#20381 )	2024-04-19 12:12:02 -07:00
Guenther Schmuelling	7b017cf9f8	fix web ci: csum tests need fp64 which is not supported on webgpu (#20374 )	2024-04-18 12:30:26 -07:00
Wanming Lin	da86f6f408	[WebNN EP] Add operators support table (#20253 )	2024-04-17 21:19:46 -07:00
Guenther Schmuelling	a8a77ddfdc	fix csum and enable ut (#20355 )	2024-04-17 15:01:06 -07:00
Wanming Lin	fe1c3a45c1	[WebNN EP] Support NPU deviceType (#20278 )	2024-04-15 18:43:46 -07:00
Satya Kumar Jandhyala	b33216be4c	[JS/WebGPU] Improve MatMulNBits perf (#19974 ) ### Description Improve performance using shared memory	2024-04-12 11:03:05 -07:00
liqun Fu	cd7112f800	Integration with ONNX 1.16.0 (#19745 ) ### Description update with ONNX 1.16.0 branch according to https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md ONNX 1.16.0 release notes: https://github.com/onnx/onnx/releases/tag/v1.16.0 #### Updated ops for CPU EP: - DequantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block dequantization support - QuantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block quantization support - Cast(21) - Missing int4 and uint4 support - CastLike(21) - Missing int4 and uint4 support - ConstantOfShape(21) - Missing int4 and uint4 support - Identity(21) - Missing int4 and uint4 support - If(21) - Missing int4 and uint4 support - Loop(21) - Missing int4 and uint4 support - Reshape(21) - Missing int4 and uint4 support - Scan(21) - Missing int4 and uint4 support - Shape(21) - Missing int4 and uint4 support - Size(21) - Missing int4 and uint4 support - Flatten(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Pad(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Squeeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Transpose(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Unsqueeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support #### Unimplemented opset 21 features/ops - int4 and uint4 data type - QLinearMatMul(21) - GroupNormalization(21) - ai.onnx.ml.TreeEnsemble(5) 
### Disabled tests #### ORT Training orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py - test_ort_custom_ops: Potential shape inference bug for custom ops #### Python quantization unit tests test/onnx/python/quantization (shape inference bug) - test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16 - test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16 - test_op_gemm.py: test_quantize_qop_gemm_s8s8 - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3 - test_op_matmul.py: test_quantize_matmul_u8u8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy - test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile - test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution - test_op_relu.py: test_quantize_qop_relu_s8s8 #### ONNX tests - test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a maxpool output size bug and added this test. Enable this test when [ORT PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged. Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741). 
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op ai.onnx.ml.TreeEnsemble - test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same - test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same - test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same - test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4 yet - test_cast_INT4_to_INT8_cpu: same - test_cast_UINT4_to_FLOAT_cpu: same - test_cast_UINT4_to_UINT8_cpu: same - test_cast_INT4_to_FLOAT_cuda - test_cast_INT4_to_INT8_cuda - test_cast_UINT4_to_FLOAT_cuda - test_cast_UINT4_to_UINT8_cuda - test_constantofshape_float_ones_cuda: ConstantOfShape(21) not implemented for cuda - test_constantofshape_int_shape_zero_cuda: same - test_constantofshape_int_zeros_cuda: same - test_flatten_axis0_cuda: Flatten(21) not implemented for cuda - test_flatten_axis1_cuda: same - test_flatten_axis2_cuda: same - test_flatten_axis3_cuda: same - test_flatten_default_axis_cuda: same - test_flatten_negative_axis1_cuda: same - test_flatten_negative_axis2_cuda: same - test_flatten_negative_axis3_cuda: same - test_flatten_negative_axis4_cuda: same - test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not implemented in ORT yet - test_qlinearmatmul_2D_int8_float32_cpu: same - test_qlinearmatmul_2D_uint8_float16_cpu: same - test_qlinearmatmul_2D_uint8_float32_cpu: same - test_qlinearmatmul_3D_int8_float16_cpu: same - test_qlinearmatmul_3D_int8_float32_cpu: same - test_qlinearmatmul_3D_uint8_float16_cpu: same - test_qlinearmatmul_3D_uint8_float32_cpu: same - test_qlinearmatmul_2D_int8_float16_cuda: same - test_qlinearmatmul_2D_int8_float32_cuda: same - test_qlinearmatmul_2D_uint8_float16_cuda: same - test_qlinearmatmul_2D_uint8_float32_cuda: same - test_qlinearmatmul_3D_int8_float16_cuda: same - test_qlinearmatmul_3D_int8_float32_cuda: same - test_qlinearmatmul_3D_uint8_float16_cuda: same - test_qlinearmatmul_3D_uint8_float32_cuda: same - test_size_cuda: Size(21) not implemented for cuda - 
test_size_example_cuda: same - test_dequantizelinear_blocked: Missing implementation for block dequant for DequantizeLinear(21) - test_quantizelinear_blocked_asymmetric: Missing implementation for block quant for QuantizeLinear(21) - test_quantizelinear_blocked_symmetric: Missing implementation for block quant for QuantizeLinear(21) --------- Signed-off-by: liqunfu <liqun.fu@microsoft.com> Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-04-12 09:46:49 -07:00
dependabot[bot]	9ca1afa25c	Bump protobufjs from 7.2.4 to 7.2.5 in /js/web (#20270 ) Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.5. Release notes, sourced from [protobufjs's releases](https://github.com/protobufjs/protobuf.js/releases): protobufjs v7.2.5 (2023-08-21), see the [compare view](https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5). Bug fixes: - crash in comment parsing ([#1890](https://redirect.github.com/protobufjs/protobuf.js/issues/1890)) (eaf9f0a) - deprecation warning for new Buffer ([#1905](https://redirect.github.com/protobufjs/protobuf.js/issues/1905)) (e93286e) - possible infinite loop when parsing option ([#1923](https://redirect.github.com/protobufjs/protobuf.js/issues/1923)) (f2a8620) Commits: - `4436cc7` chore: release master ([#1925](https://redirect.github.com/protobufjs/protobuf.js/issues/1925)) - `e93286e` fix: deprecation warning for new Buffer - `eaf9f0a` fix: crash in comment parsing - `f2a8620` fix: possible infinite loop when parsing option Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-11 22:07:08 -07:00
Yulong Wang	50bd4571ac	[js/web] support SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20277 ) ### Description Support operator `SimplifiedLayerNorm` and `SkipSimplifiedLayerNorm` for WebGPU backend.	2024-04-11 14:08:50 -07:00
MasayoshiTsutsui	6a9d8a9030	[js/webgpu] implement DepthToSpace operator in webgpu (#19948 ) ### Description This PR supports [DepthToSpace](https://onnx.ai/onnx/operators/onnx__DepthToSpace.html#depthtospace) operator in webgpu backend. ### Test We followed the steps described on [this page](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce) to build, tested with the following commands, and confirmed that it passed the Model and Op tests that already existed. (Probably, these test cases were prepared in the past for WebGL backend) ``` ~/onnxruntime/js/web> % npm test -- suite0 -b=webgpu --wasm-number-threads=1 --debug ``` ##### NOTE I want to tell you that the main branch version failed 5 tests for the resize_upsample_sizes_nearest operator. Since I didn't touch this issue, those test cases still fail in my branch as well. Should I post an issue for this? ### Motivation and Context Though the DepthToSpace operator plays a crucial role in super-resolution domains, it was not supported in webgpu backend.	2024-04-10 12:13:46 -07:00
Jiajie Hu	23d3afd4fe	[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209 ) ### Description https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding ### Motivation and Context As per customer request, this helps Phi-2 and Gemma.	2024-04-08 09:11:26 -07:00
Guenther Schmuelling	c529e05e38	fix ConvTranspose 1D (#20194 )	2024-04-05 10:05:32 -07:00
Yulong Wang	fa1917b81b	[js/webgpu] add validation to workgroup size (#20110 ) ### Description add validation to workgroup size in `shaderHelper.mainStart()`.	2024-04-02 19:29:20 -07:00
Xu Xing	a2998e5d42	[js/webgpu] Use global id in attention and instance-norm (#20008 )	2024-04-02 01:42:39 -07:00
Nanashi	ca465dc087	[js] Make error friendly when isOrtFormat is undefined (#19958 ) ### Description Make the error friendlier when isOrtFormat is undefined (`onnxruntime.InferenceSession.create` is called with ArrayBuffer or Uint8Array). ### Motivation and Context I was trying to run my onnx model with the WebGL EP, but it gave me the error "Cannot read properties of null (reading 'irVersion')". Using a debugger, I found that the actual error is `int64 is not supported`, but that error was invisible to me. So I made it show both errors when isOrtFormat is undefined. <s>I haven't written unit test yet, so I'm making it draft. (I have no idea about how do I test this though...)</s> [d62d942](`d62d9425ba`)	2024-03-27 02:07:00 -07:00
Yulong Wang	28907d8c59	[js/web] workaround NPM test fetch failure (#20020 ) ### Description Sometimes `npm test` fails with the error "TypeError: Failed to fetch". I checked the callback entry of the localhost server started by karma. When the "Failed to fetch" error happens, no request is reflected on the server side. The root cause is still not identified. However, as this issue only happens occasionally, right after the browser is launched by the karma runner, retrying works around it most of the time.	2024-03-26 21:35:49 -07:00

600 commits