onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-26 22:35:43 +00:00

Author	SHA1	Message	Date
Satya Kumar Jandhyala	ae78cdb5d7	[JS/WebGPU] MultiheadAttention bugfix (#20447 ) ### Description Fixed pastkey, key and pastvalue, value concatenation condition and fixed index error. Added new test cases. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-24 08:43:14 -07:00
Guenther Schmuelling	33d5ea39b3	[js/webgpu] fixes for fp16 attention (#20440 )	2024-04-24 08:01:28 -07:00
Yulong Wang	8f53957bcf	[js/web] add "browser" field to support parcel v2 (#20422 ) ### Description As described in latest discussion in #19915, parcel v2 without using the [new resolver](https://parceljs.org/blog/v2-9-0/#new-resolver) will not work correctly with onnxruntime-web. There are still users who uses parcel with default resolver, so add this deprecated field "browser" back for backward compatibility. This PR also corrects the "main" field, which is for old resolver for Node.js.	2024-04-23 13:10:11 -07:00
Yulong Wang	13bda11583	[Node.js binding] Fix install script (#20416 ) ### Description Fix a few bugs of the install script of onnxruntime-node package. This change is integrated from branch `rel-1.17.3` (#20397)	2024-04-23 13:01:16 -07:00
Satya Kumar Jandhyala	d42ac7f0c6	[JS/WebGPU] Multihead attention improvements (#20286 ) ### Description Enabled more usecases ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-23 12:39:49 -07:00
Guenther Schmuelling	b8e6684313	more conservitive gpu-buffer cache algo (#20312 ) tuned based on 80 models to keep performance impact minimal	2024-04-23 09:07:04 -07:00
Yulong Wang	4385602386	[js/web] fix test runner with optional input/output (#20399 ) ### Description fix test runner with optional input/output. This change fixes the OP test runner (.jsonc format test) with optional input(s) and/or output(s). this fix reveals a problem of dealing with optional outputs: > Take SkipSimplifiedLayerNorm as example: > > if in the ONNX model, the node's outputs are: [ 'output_0', '' ] instead of [ 'output_0' ], the current implementation will fail. The difference is, in the first case, context.outputCount == 2, and then the typescript implementation will try to create a tensor for output[1]. It will eventually call to C++ function (OpKernelContext::Output), and the output.DataRaw() will be nullptr. WebGPU backend will fail because it cannot deal with a TensorView with data == 0. > This problem may need to be fixed or workaround in separated PR. This PR does not fix this problem. Failed test cases are modified to work - please note this PR does not break those test cases as they never work.	2024-04-22 12:53:10 -07:00
Guenther Schmuelling	497a627a69	fix fp16 for skiplayernorm (#20381 )	2024-04-19 12:12:02 -07:00
Yulong Wang	3577a4bd02	[Node.js binding] Allow installation to download CUDA binaries via script (#20364 ) ### Description Currently we try to include all prebuilt binaries into the NPM packages. This was working until we added libonnxruntime_providers_cuda.so (>400MB) into the NPM package. The NPM registry refuses to accept new package publishment because the file is too large. To make the new NPM package working, we have to remove the large file from the package, and add a new script on package installation. This script will try to dynamically install onnxruntime CUDA dynamic library for Linux/x64.	2024-04-18 13:44:42 -07:00
Guenther Schmuelling	7b017cf9f8	fix web ci: csum tests need fp64 which is not supported on webgpu (#20374 )	2024-04-18 12:30:26 -07:00
Wanming Lin	da86f6f408	[WebNN EP] Add operators support table (#20253 )	2024-04-17 21:19:46 -07:00
Guenther Schmuelling	a8a77ddfdc	fix csum and enable ut (#20355 )	2024-04-17 15:01:06 -07:00
Wanming Lin	fe1c3a45c1	[WebNN EP] Support NPU deviceType (#20278 )	2024-04-15 18:43:46 -07:00
Satya Kumar Jandhyala	b33216be4c	[JS/WebGPU] Improve MatMulNBits perf (#19974 ) ### Description <!-- Describe your changes. --> Improve performance using shared memory ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-12 11:03:05 -07:00
liqun Fu	cd7112f800	Integration with ONNX 1.16.0 (#19745 ) ### Description update with ONNX 1.16.0 branch according to https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md ONNX 1.16.0 release notes: https://github.com/onnx/onnx/releases/tag/v1.16.0 #### Updated ops for CPU EP: - DequantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block dequantization support - QuantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block quantization support - Cast(21) - Missing int4 and uint4 support - CastLike(21) - Missing int4 and uint4 support - ConstantOfShape(21) - Missing int4 and uint4 support - Identity(21) - Missing int4 and uint4 support - If(21) - Missing int4 and uint4 support - Loop(21) - Missing int4 and uint4 support - Reshape(21) - Missing int4 and uint4 support - Scan(21) - Missing int4 and uint4 support - Shape(21) - Missing int4 and uint4 support - Size(21) - Missing int4 and uint4 support - Flatten(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Pad(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Squeeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Transpose(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Unsqueeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support #### Unimplemented opset 21 features/ops - int4 and uint4 data type - QLinearMatMul(21) - GroupNormalization(21) - ai.onnx.ml.TreeEnsemble(5) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ### Disabled tests #### ORT Training orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py - test_ort_custom_ops: Potential shape inference bug for custom ops #### Python quantization unit tests test/onnx/python/quantization (shape inference bug) - test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16 - test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16 - test_op_gemm.py: test_quantize_qop_gemm_s8s8 - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3 - test_op_matmul.py: test_quantize_matmul_u8u8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy - test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile - test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution - test_op_relu.py: test_quantize_qop_relu_s8s8 #### ONNX tests - test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a maxpool output size bug and added this test. Enable this test when [ORT PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged. Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741). - test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op ai.onnx.ml.TreeEnsemble - test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same - test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same - test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same - test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4 yet - test_cast_INT4_to_INT8_cpu: same - test_cast_UINT4_to_FLOAT_cpu: same - test_cast_UINT4_to_UINT8_cpu: same - test_cast_INT4_to_FLOAT_cuda - test_cast_INT4_to_INT8_cuda - test_cast_UINT4_to_FLOAT_cuda - test_cast_UINT4_to_UINT8_cuda - test_constantofshape_float_ones_cuda: ConstantOfShape(21) not implemented for cuda - test_constantofshape_int_shape_zero_cuda: same - test_constantofshape_int_zeros_cuda: same - test_flatten_axis0_cuda: Flatten(21) not implemented for cuda - test_flatten_axis1_cuda: same - test_flatten_axis2_cuda: same - test_flatten_axis3_cuda: same - test_flatten_default_axis_cuda: same - test_flatten_negative_axis1_cuda: same - test_flatten_negative_axis2_cuda: same - test_flatten_negative_axis3_cuda: same - test_flatten_negative_axis4_cuda: same - test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not implemented in ORT yet - test_qlinearmatmul_2D_int8_float32_cpu: same - test_qlinearmatmul_2D_uint8_float16_cpu: same - test_qlinearmatmul_2D_uint8_float32_cpu: same - test_qlinearmatmul_3D_int8_float16_cpu: same - test_qlinearmatmul_3D_int8_float32_cpu: same - test_qlinearmatmul_3D_uint8_float16_cpu: same - test_qlinearmatmul_3D_uint8_float32_cpu: same - test_qlinearmatmul_2D_int8_float16_cuda: same - test_qlinearmatmul_2D_int8_float32_cuda: same - test_qlinearmatmul_2D_uint8_float16_cuda: same - test_qlinearmatmul_2D_uint8_float32_cuda: same - test_qlinearmatmul_3D_int8_float16_cuda: same - test_qlinearmatmul_3D_int8_float32_cuda: same - test_qlinearmatmul_3D_uint8_float16_cuda: same - test_qlinearmatmul_3D_uint8_float32_cuda: same - test_size_cuda: Size(21) not implemented for cuda - test_size_example_cuda: same - test_dequantizelinear_blocked: Missing implementation for block dequant for DequantizeLinear(21) - test_quantizelinear_blocked_asymmetric: Missing implementation for block quant for QuantizeLinear(21) - test_quantizelinear_blocked_symmetric: Missing implementation for block quant for QuantizeLinear(21) --------- Signed-off-by: liqunfu <liqun.fu@microsoft.com> Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-04-12 09:46:49 -07:00
dependabot[bot]	9ca1afa25c	Bump protobufjs from 7.2.4 to 7.2.5 in /js/web (#20270 ) Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.5. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/protobufjs/protobuf.js/releases">protobufjs's releases</a>.</em></p> <blockquote> <h2>protobufjs: v7.2.5</h2> <h2><a href="https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5">7.2.5</a> (2023-08-21)</h2> <h3>Bug Fixes</h3> <ul> <li>crash in comment parsing (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1890">#1890</a>) (<a href="`eaf9f0a5a4`">eaf9f0a</a>)</li> <li>deprecation warning for new Buffer (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1905">#1905</a>) (<a href="`e93286ef70`">e93286e</a>)</li> <li>possible infinite loop when parsing option (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1923">#1923</a>) (<a href="`f2a8620179`">f2a8620</a>)</li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md">protobufjs's changelog</a>.</em></p> <blockquote> <h2><a href="https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5">7.2.5</a> (2023-08-21)</h2> <h3>Bug Fixes</h3> <ul> <li>crash in comment parsing (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1890">#1890</a>) (<a href="`eaf9f0a5a4`">eaf9f0a</a>)</li> <li>deprecation warning for new Buffer (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1905">#1905</a>) (<a href="`e93286ef70`">e93286e</a>)</li> <li>possible infinite loop when parsing option (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1923">#1923</a>) (<a href="`f2a8620179`">f2a8620</a>)</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`4436cc748c`"><code>4436cc7</code></a> chore: release master (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1925">#1925</a>)</li> <li><a href="`e93286ef70`"><code>e93286e</code></a> fix: deprecation warning for new Buffer (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1905">#1905</a>)</li> <li><a href="`eaf9f0a5a4`"><code>eaf9f0a</code></a> fix: crash in comment parsing (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1890">#1890</a>)</li> <li><a href="`f2a8620179`"><code>f2a8620</code></a> fix: possible infinite loop when parsing option (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1923">#1923</a>)</li> <li>See full diff in <a href="https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=protobufjs&package-manager=npm_and_yarn&previous-version=7.2.4&new-version=7.2.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-11 22:07:08 -07:00
Yulong Wang	50bd4571ac	[js/web] support SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20277 ) ### Description Support operator `SimplifiedLayerNorm` and `SkipSimplifiedLayerNorm` for WebGPU backend.	2024-04-11 14:08:50 -07:00
MasayoshiTsutsui	6a9d8a9030	[js/webgpu] implement DepthToSpace operator in webgpu (#19948 ) ### Description This PR supports [DepthToSpace](https://onnx.ai/onnx/operators/onnx__DepthToSpace.html#depthtospace) operator in webgpu backend. ### Test We followed the steps described on [this page](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce) to build, tested with the following commands, and confirmed that it passed the Model and Op tests that already existed. (Probably, these test cases were prepared in the past for WebGL backend) ``` ~/onnxruntime/js/web> % npm test -- suite0 -b=webgpu --wasm-number-threads=1 --debug ``` ##### NOTE I want to tell you that the main branch version failed 5 tests for the resize_upsample_sizes_nearest operator. Since I didn't touch this issue, those test cases still fail in my branch as well. Should I post an issue for this? ### Motivation and Context Though the DepthToSpace operator plays a crucial role in super-resolution domains, it was not supported in webgpu backend.	2024-04-10 12:13:46 -07:00
Jiajie Hu	23d3afd4fe	[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209 ) ### Description https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding ### Motivation and Context As per customer request, this helps Phi-2 and Gemma.	2024-04-08 09:11:26 -07:00
Guenther Schmuelling	c529e05e38	fix ConvTranspose 1D (#20194 )	2024-04-05 10:05:32 -07:00
Hans	6abfb6b928	[js/rn] Support load external data (#20090 ) Support load external data by passing local model path	2024-04-05 05:55:03 -07:00
Yi Zhang	dae77e6014	Support building Windows CUDA with Ninja (#20176 ) ### How to run it locally 1. conda install ninja 2. "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64 3. python.exe {ort_repo}\tools\ci_build\build.py --config RelWithDebInfo --build_dir {ort_repo}\build_cuda --skip_submodule_sync --build_csharp --update --parallel --cmake_generator "Ninja" --build_shared_lib --enable_onnx_tests --enable_pybind --build_java --build_nodejs --use_cuda "--cuda_home=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --enable_cuda_profiling --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=60 4. cd build_cuda\RelWithDebInfo 5. cmake --build . j16 ### Motivation and Context In packaging pipelines, we often come across a random issue that the building with CUDA on Windows takes too much time. Although it has been reduced much by moving the building to the CPU machine. We're planning to build with Ninja instead of msbuild in Packaging pipelines, thus, nvcc can run parallelly. It's the first step to support it locally.	2024-04-03 11:19:31 +08:00
Yulong Wang	fa1917b81b	[js/webgpu] add validation to workgroup size (#20110 ) ### Description add validation to workgroup size in `shaderHelper.mainStart()`.	2024-04-02 19:29:20 -07:00
Xu Xing	a2998e5d42	[js/webgpu] Use global id in attention and instance-norm (#20008 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-02 01:42:39 -07:00
Nanashi	ca465dc087	[js] Make error friendly when isOrtFormat is undefined (#19958 ) ### Description Make error friendly when isOrtFormat is undefined (`onnxruntime.InferenceSession.create` is called with ArrayBuffer or Uint8Array). ### Motivation and Context I was trying to run my onnx model in WebGL EP, but it gave me the error "Cannot read properties of null (reading 'irVersion')". I used debugger to find that actual error is `int64 is not supported`, but the error was invisible for me. So I made it to show both error when isOrtFormat is undefined. <s>I haven't written unit test yet, so I'm making it draft. (I have no idea about how do I test this though...)</s> [d62d942](`d62d9425ba`)	2024-03-27 02:07:00 -07:00
Yulong Wang	28907d8c59	[js/web] workaround NPM test fetch failure (#20020 ) ### Description Sometimes the `npm test` failed with an error of "TypeError: Failed to fetch". I checked the callback entry of the localhost server started by karma. When the "Failed to fetch" happens, no request is reflected on the server side. The root cause is still not identified. However, as this issue only happens sometimes when the browser is just launched by karma runner, doing retry can workaround this issue for most of the time.	2024-03-26 21:35:49 -07:00
Yulong Wang	473434c73f	[js/webgpu] perform uniform consistency check (#20019 ) ### Description This PR makes a change in WebGPU backend to validate program uniforms. It compares the uniform data that comes from the result of `getRunData()` callback from the program info, with the `ShaderHelper`'s maintained list of uniform variables. Fixes a few bugs that found by this check as well.	2024-03-26 17:14:43 -07:00
Yulong Wang	050085a7fb	[js/web] remove "browser" field in package.json (#20021 ) ### Description Field "browser" is deprecated in favor of "exports". Removes the unused field. Some bundler may read from "browser" and generate errors. Removing this field should let bundler to look up "exports". Fixes #19915	2024-03-26 13:57:11 -07:00
Yulong Wang	0313dd1f65	Update Web CI to use data dir under Agent.TempDirectory (#20074 ) ### Description Update Web CI to use data dir under Agent.TempDirectory This change fixes the random failure caused by unstable access to karma temp directory (which is under AppData\Local\Temp) on CI pipeline	2024-03-26 13:16:59 -07:00
Satya Kumar Jandhyala	5b64d7c32b	[JS/WebGPU] Use non-matmul implementation for ConvTranspose in channel-first case. (#20022 ) ### Description Avoid using vec4 Matmul implementation for ConvTranspose with channel-last ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-23 11:19:14 -07:00
Guenther Schmuelling	c45cff60cf	[js/webgpu] fix maxpool / fp16 (#19981 )	2024-03-19 16:15:49 -07:00
Yulong Wang	01c7aaf6aa	[js/webgpu] allow setting env.webgpu.adapter (#19940 ) ### Description Allow user to set `env.webgpu.adapter` before creating the first inference session. Feature request: https://github.com/microsoft/onnxruntime/pull/19857#issuecomment-1999984753 @xenova	2024-03-19 12:55:00 -07:00
Xu Xing	4c6a6a37f7	[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387 ) The added case will be NAN because of the un-initialized buffer.	2024-03-18 22:59:32 -07:00
Guenther Schmuelling	7e0d424934	accumulate in fp32 for Reduce* (#19868 )	2024-03-18 08:28:43 -07:00
dependabot[bot]	28ad6c3955	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="`35a517c586`"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="`c4f847f851`"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="`8526b4a1b2`"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="`b1677ce001`"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="`d8914f7982`"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:54:53 -07:00
dependabot[bot]	afdab62f53	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="`35a517c586`"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="`c4f847f851`"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="`8526b4a1b2`"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="`b1677ce001`"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="`d8914f7982`"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:53:17 -07:00
Yulong Wang	b29849a287	[js/common] fix typedoc warnings (#19933 ) ### Description Fix a few warnings in typedoc (for generating JS API): ``` [warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used. [warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation. [warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation. [warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation. [warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation. [warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device. ``` Changes highlighted: - Merge `CoreMlExecutionProviderOption` and `CoreMLExecutionProviderOption`. They expose 2 set of different options for React-native and ORT nodejs binding. This should be fixed in future. - Fix a few inconsistency of names between JSDoc and parameters - Fix broken type links - Exclude trace functions	2024-03-15 19:01:50 -07:00
Belem Zhang	acb0df2280	Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932 ) ### Description Fix #19931 broken Get Started link HTTP 404 for "Get Started" link in "ONNX Runtime JavaScript API" page Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-03-15 19:00:30 -07:00
Yulong Wang	79e50aeef3	[js/web] rewrite backend resolve to allow multiple EPs (#19735 ) ### Description This PR rewrite the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web actually carried some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes the "backend" concept. The original "backend" in ONNX.js is designed in a way assuming there is only one backend from user's backend hint list will be used. For example, in ONNX.js, if user specify a backend hint as `['webgl', 'wasm']`, ONNX.js will first try to use WebGL backend - if it loads successfully (the browser supports webgl), then "webgl" backend will be used and "wasm" will be ignored; otherwise, "webgl" will be ignored and try to load "wasm" backend. In short: only one backend will be used when initializing a session. #### Execution Provider Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allow to specify multiple EPs, and if one does not support a particular kernel, it can fallback to other EP. This is a very common case when using a GPU EP in ONNX Runtime. #### Current Status: Backend v.s. EP Because of the history reasons mentioned above, the current status is quite confusing. There are real backends, which means it's different implementation in code; and there are backend hints, which are used as string names for backend hint; and there are EPs of the ONNX Runtime concepts. currently there are only 2 backends in our code base: The "onnxjs backend", and the "wasm backend". The "onnxjs backend" currently only powers backend hint "webgl", which go into the old onnx.js code path. All other backend hints including "wasm", "cpu"(alias to wasm), "webgpu" and "webnn" are all powered by "wasm backend". And because ORT Web treat "backend" as an internal concept and want to align with ONNX Runtime, so those names of backend hints are becoming EP names. The following table shows today's status: \| Execution Provider Name (public) / Backend Hint (internal) \| Backend \| EP in ORT \| -------- \| ------- \| ------- \| \| "wasm"/"cpu" \| WasmBackend \| CPU EP \| "webgl" \| OnnxjsBackend \| \* technically not an EP \| "webgpu" \| WasmBackend \| JSEP \| "webnn" \| WasmBackend \| WebNN EP #### Problem While the API allows to specify multiple EPs, the backend resolving only allows one backend. This causes issues when user specify multiple EP names in session options, the backend resolve behavior and EP registration behavior is inconsistent. Specifically, in this issue: https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908: EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of only do that for the first successful one. - for the first resolved backend, filter all EP using the exact same backend. Remove all EPs not using this backend from session options - for every explicitly specified EP, if it's removed, show a warning message in console	2024-03-15 11:47:45 -07:00
Yulong Wang	e771a763c3	[js/test] align web test runner flags with ort.env (#19790 ) ### Description the `npm test` flags are difficult to memorize, because they are different to the `ort.env` flags. This change makes those flags align with ort JS API. eg. `--wasm-enable-proxy` became `--wasm.proxy`. Old flags are marked as deprecated except `-x` (as a shortcut of `--wasm.numThreads`)	2024-03-13 12:00:36 -07:00
Satya Kumar Jandhyala	ed250b88c3	[JS/WebGPU] Optimize MatMulNBits (#19852 ) ### Description Use vec<2> or vec<4>, operands in MatMulNBits ### Motivation and Context Improve performance	2024-03-13 10:33:14 -07:00
Yang Gu	53de2d8cb0	[js/webgpu] Enable GroupedConvVectorize path (#19791 ) Vectorize met 2 failed cases in a CI bot with NVIDIA GPU, but we couldn't repro with all the GPUs at hand, including NVIDIA GPUs. This PR introduces GPUAdapterInfo and enables this opt on non-NVIDIA GPUs to make the bots happy. No obivous perf gain can be seen if we enable vectorize on NVIDIA. However, it shows big perf improvement on Intel. On my Gen12 Intel GPU, mobilenetv2-12 perf was improved from 11.14ms to 7.1ms.	2024-03-12 22:25:07 -07:00
Yulong Wang	4538d31a8b	[js/webgpu] expose a few properties in WebGPU API (#19857 ) ### Description This change exposes a few properties in `ort.env.webgpu` to resolve feature requirement mentioned in properties in https://github.com/microsoft/onnxruntime/pull/14579#discussion_r1519612619. - Add `powerPreference` and `forceFallbackAdapter` in `ort.env.webgpu`, to allow users to set the value of the properties before the first inference session is created. - Add readonly property `adapter` in `ort.env.webgpu` to allow users to get the adapter instance. Now users can access `ort.env.webgpu.device` and `ort.env.webgpu.adapter`. @xenova @beaufortfrancois	2024-03-12 19:50:51 -07:00
Satya Kumar Jandhyala	24b72d2613	[JS/WebGPU] Preserve zero size input tensor dims. (#19737 ) ### Description For Concat operation, the zero-size input tensor shape need to be preserved and, unlike non-zero tensors, the dims are not constrained to match other input tensors' dims. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-07 19:07:49 -08:00
Yulong Wang	a788514027	[js/web] dump debug logs for karma for diagnose purpose (#19785 ) ### Description dump debug logs for karma for diagnose purpose. This is for debugging the CI issue of Chrome launch failure and considered temporary.	2024-03-05 18:27:26 -08:00
Yulong Wang	f06164ef8b	[js/web] transfer input buffer back to caller thread (#19677 ) ### Description When using proxy worker, input buffers should be transferred back to the caller thread after `run()` call is done. Fixes #19488	2024-03-01 14:50:06 -08:00
Yulong Wang	e30618d055	[js/webgpu] use Headless for webgpu test by default (#19702 ) ### Description use Chromium Headless for webgpu test by default. Still use normal Chromium with window when debug=true or perfMode=true. Use the [`--headless=new`](https://developer.chrome.com/docs/chromium/new-headless) mode. ### Motivation and Context try to use a more stable way to launch npm tests to avoid a "chrome not found" issue in pipeline, which may potentially caused by windowed application.	2024-02-28 16:05:08 -08:00
Yulong Wang	3cb81cdde2	[js/common] move 'env.wasm.trace' to 'env.trace' (#19617 ) ### Description Try to move 'env.wasm.trace' to 'env.trace' to make it less confusing, because it also works in webgpu. Marked 'env.wasm.trace' as deprecated.	2024-02-27 11:07:15 -08:00
Yulong Wang	0edb035808	[js/web] fix suite test list for zero sized tensor (#19638 ) ### Description Fixes build break brought by #19614 Currently WebGL backend does not support zero sized tensor. This change split test data into 2 parts, and only enable zero sized tensor tests for WebGPU.	2024-02-24 10:09:07 -08:00
Guenther Schmuelling	bb43a0f133	[js/webgpu] minor fixes to make tinyllama work (#19564 )	2024-02-23 15:45:30 -08:00
Yulong Wang	aec2389ad0	[js/webgpu] allows a ProgramInfo's RunData to use zero sized output (#19614 ) ### Description This PR allows zero-sized output. To make the implementation simple, it does not support partial zero-sized tensor. Which means, either all outputs are zero-sized, or an error will be reported. added 2 tests: - op test of `Add` with input T[2,0] T[2,1], and - test_split_zero_size_splits	2024-02-23 12:52:47 -08:00
satyajandhyala	ae3d73c981	[JS/WebGPU] Fix Split and Where to handle corner cases. (#19613 ) ### Description <!-- Describe your changes. --> 1. Fix Where operator to handle Boolean input less than 4 bytes. 2. Fix JSEP test harness to use tensor names consistently. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-23 00:21:15 -08:00
Segev Finer	29b1106033	[node] Switch to setImmediate to avoid starving the Node.js event loop (#19610 ) ### Description <!-- Describe your changes. --> Switch to setImmediate to avoid starving the Node.js event loop There should really be a true async version though, running computationally intensive things on the event loop will stop everything else from happening while it is running, e.g. a web server from answering requests. This can be done by wrapping `RunAsync` behind a [`napi::Promise`](https://github.com/nodejs/node-addon-api/blob/main/doc/promises.md) to run on the onnxruntime thread pool or [`AsyncWorker`]( https://github.com/nodejs/node-addon-api/blob/main/doc/async_worker.md) for the Node.js/libuv thread pool. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Without this, if you run inference in a tight loop, without anything else in between that is async/deferred, `process.nextTick` will lead to starving the event loop and not letting anything else run, `setImmediate` at least lets the event loop spin between calls to `run`. See https://dev.to/ynmanware/setimmediate-settimeout-and-process-nexttick-3mfd Contributed on behalf of [Swimm](https://swimm.io/)	2024-02-22 18:53:50 -08:00
dependabot[bot]	76a2a487a1	Bump ip from 1.1.8 to 1.1.9 in /js/react_native/e2e (#19583 ) Bumps [ip](https://github.com/indutny/node-ip) from 1.1.8 to 1.1.9. <details> <summary>Commits</summary> <ul> <li><a href="`1ecbf2fd8c`"><code>1ecbf2f</code></a> 1.1.9</li> <li><a href="`6a3ada9b47`"><code>6a3ada9</code></a> lib: fixed CVE-2023-42282 and added unit test</li> <li>See full diff in <a href="https://github.com/indutny/node-ip/compare/v1.1.8...v1.1.9">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ip&package-manager=npm_and_yarn&previous-version=1.1.8&new-version=1.1.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) Dependabot will merge this PR once CI passes on it, as requested by @fs-eire. [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-02-22 13:58:17 -08:00
Xu Xing	fe82fccf1a	[js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596 ) This is used in sam-h-decoder-f16. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-22 13:09:28 -08:00
dependabot[bot]	38c3432393	Bump ip from 1.1.8 to 1.1.9 in /js/react_native (#19582 ) Bumps [ip](https://github.com/indutny/node-ip) from 1.1.8 to 1.1.9. <details> <summary>Commits</summary> <ul> <li><a href="`1ecbf2fd8c`"><code>1ecbf2f</code></a> 1.1.9</li> <li><a href="`6a3ada9b47`"><code>6a3ada9</code></a> lib: fixed CVE-2023-42282 and added unit test</li> <li>See full diff in <a href="https://github.com/indutny/node-ip/compare/v1.1.8...v1.1.9">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ip&package-manager=npm_and_yarn&previous-version=1.1.8&new-version=1.1.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) Dependabot will merge this PR once CI passes on it, as requested by @fs-eire. [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-02-21 13:58:53 -08:00
Matttttt	ebd220b073	Misspelling in README.md (#19433 ) Fixed a misspelling.	2024-02-21 13:38:18 -08:00
Xu Xing	57d6819212	[js/web] Fix fused-conv is not included in npm test (#19581 ) BUG: https://github.com/microsoft/onnxruntime/issues/18855 ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-21 08:08:47 -08:00
Yulong Wang	58f4921686	[js] changes to allow Float16Array if any polyfill is available (#19305 ) ### Description This change adds only necessary code to enable ort-web works with any Float16Array polyfill. Unlike #19302, in this PR, ort-web does not include any specific polyfill; instead, it's user's choice for how to use a polyfill. ORT-web uses Float16Array if it's available; otherwise, fallback to use Uint16Array. ```js // case 1: user does not use polyfill: import * as ort from 'onnxruntime-web'; const myF16Data = new Uint16Array(...); // need to use Uint16Array const myF16tensor = new ort.Tensor('float16', myF16Data, dims); ``` ```js // case 2: user use polyfill: import * as ort from 'onnxruntime-web'; import { Float16Array, isFloat16Array, isTypedArray, getFloat16, setFloat16, f16round, } from "@petamoriken/float16"; globalThis.Float16Array = Float16Array; // ort-web will pick the global Float16Array const myF16Data = new Float16Array(...); // Use the polyfilled Float16Array type const myF16tensor = new ort.Tensor('float16', myF16Data, dims); ```	2024-02-21 00:31:06 -08:00
Yulong Wang	6e04e36e3f	[js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317 ) ### Description upgrade tsc in common from 4.9.5 to 5.2.2	2024-02-20 17:33:37 -08:00
Yulong Wang	70567a4b3a	[js/web] use ApiTensor insteadof onnxjs Tensor in TensorResultValidator (#19358 ) ### Description use ApiTensor insteadof onnxjs Tensor in TensorResultValidator. Make test runner less depend on onnxjs classes.	2024-02-20 17:33:21 -08:00
Yulong Wang	3fe2c137ee	[js] small fix to workaround formatter (#19400 ) ### Description Rename shader variable names to snake_case naming and also to avoid formatter behaving inconsistently in win/linux.	2024-02-20 17:23:01 -08:00
Jiajie Hu	1b48054e1b	[js/webgpu] Create Split indices helpers by rank, not by shape (#19554 ) ### Description This is required to make shape uniforms really work. ### Motivation and Context The bug was unveiled in a model with multiple Split nodes. The later nodes would try to reuse a previous pipeline cache, while the old shapes were hardcoded as constants in cache.	2024-02-20 09:24:34 -08:00
satyajandhyala	dfeda9019c	[JS/WebGPU] Add MatMulNBits (#19446 ) ### Description Add MatMulNBits to support MatMul using 4-bit quantized weights ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-17 09:19:17 -08:00
Yulong Wang	06269a3952	[js/webgpu] allow uint8 tensors for webgpu (#19545 ) ### Description allow uint8 tensors for webgpu	2024-02-16 18:28:27 -08:00
Yulong Wang	03be65e064	[js/web] fix types exports in package.json (#19458 ) ### Description Since TypeScript v4.7, types need to specify inside "exports" field when it is available. This PR appends types just before each "default" (which is required by spec to be the last item). Fixes #19403.	2024-02-08 15:56:48 -08:00
Yulong Wang	5ff27ef02a	[js/webgpu] support customop FastGelu (#19392 ) ### Description Support WebGPU custom operator FastGelu.	2024-02-06 09:07:31 -08:00
Jiajia Qin	ccbe264a39	[js/webgpu] Add LeakyRelu activation for fusedConv (#19369 ) ### Description This PR 1) adds LeakyRelu activation for fusedConv; 2) makes `vec4<f16>` value work with `float32` uniforms attributes. For example: `clamp(value, vec4<f16>(uniforms.clip_min), vec4<f16>(uniforms.clip_max)` will throw compilation errors since `uniforms.clip_min` and `uniforms.clip_min` are `f32` not `f16`. So we need to change it to `clamp(value, vec4<f16>(f16(uniforms.clip_min)), vec4<f16>(f16(uniforms.clip_max))` And above problem was introduced when we make activation attributes as uniforms instead of constant. BTW, after adding LeakyRelu, `realesrgan-t256` model can pass.	2024-02-02 09:06:38 -08:00
Yulong Wang	50806a7dd5	[js/web] support external data in npm test (#19377 ) ### Description support external data in npm test. This allows test runner to detect whether an external data is available in the test folder, and if it is, load it as external data automatically. this feature does not parse every model to figure out whether the model has external data. the following comments in code explained how to determine whether should parse the model file. ```js // for performance consideration, we do not parse every model. when we think it's likely to have external // data, we will parse it. We think it's "likely" when one of the following conditions is met: // 1. any file in the same folder has the similar file name as the model file // (e.g., model file is "model_abc.onnx", and there is a file "model_abc.pb" or "model_abc.onnx.data") // 2. the file size is larger than 1GB ```	2024-02-02 09:05:57 -08:00
Jiajia Qin	efc17e79de	[js/webgpu] Fix the undefined push error (#19366 ) ### Description This PR fixes below errors when enable webgpu profiling: ``` TypeError: Cannot read properties of undefined (reading 'push') ```	2024-02-02 02:04:06 -08:00
Xu Xing	3a2ab1963a	[js/webgpu] Refactor createTensorShapeVariables (#18883 )	2024-02-01 17:59:00 -08:00
Yulong Wang	dd1f6ccc45	[js/webgpu] resolve codescan alert (#19343 ) ### Description resolve codescan alert: https://github.com/microsoft/onnxruntime/security/code-scanning/17687	2024-01-30 21:06:21 -08:00
Xu Xing	d73131cf0f	[js/webgpu] Use DataType as uniform cpu type (#19281 ) This saves turning data type to string by tensorDataTypeEnumToString.	2024-01-30 21:05:08 -08:00
Jiajia Qin	85cef0af8c	[js/webgpu] Support capture and replay for jsep (#18989 ) ### Description This PR expands the graph capture capability to JS EP, which is similar to #16081. But for JS EP, we don't use the CUDA Graph, instead, we records all gpu commands and replay them, which removes most of the cpu overhead to avoid the the situation that gpu waiting for cpu. mobilenetv2-12 becomes 3.7ms from 6ms on NV 3090 and becomes 3.38ms from 4.58ms on Intel A770. All limitations are similar with CUDA EP: 1. Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported. 2. Usage of graph capture is limited to models where-in all ops in the model can be partitioned to the JS EP or CPU EP and no memory copy between them. 3. Shapes of inputs/outputs cannot change across inference calls. 4. IObinding is required. The usage is like below: Method 1: specify outputs buffers explicitly. ``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer/outputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; const fetches = { 'output': ort.Tensor.fromGpuBuffer(outputBuffer, { dataType: 'float32', dims: [1, 1000] }) }; let results = await session.run(feeds, fetches); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = = await session.run(feeds, fetches); // The 2ed run and after will directly call replay to execute the graph. ... ... session.release(); ``` Method 2: Don't specify outputs buffers explicitly. Internally, when graph capture is enabled, it will set all outputs location to 'gpu-buffer'. ``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; let results = await session.run(feeds); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = = await session.run(feeds); // The 2ed run and after will directly call replay to execute the graph. ... ... session.release();	2024-01-30 18:28:03 -08:00
Jiajia Qin	90883a366a	[js/webgpu] Add hardSigmoid activation for fusedConv (#19233 ) ### Description Add hardSigmoid activation for fusedConv. It will be used by mobilenetv3-small-100 model.	2024-01-30 16:28:53 -08:00
Xu Xing	624b4e2063	[js/webgpu] Remove enableShapesUniforms (#19279 )	2024-01-29 17:49:06 -08:00
Guenther Schmuelling	9e69606360	fix f16 for attention, enable slice and flatten for more types (#19262 )	2024-01-29 10:13:46 -08:00
Xu Xing	a3f0e2422b	[js/webgpu] Support f16 uniform (#19098 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-25 16:58:22 -08:00
Xu Xing	656ca66186	[js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753 )	2024-01-25 15:37:05 -08:00
Jiajie Hu	5b06505073	[js/webgpu] Fix Tanh explosion (#19201 ) ### Description ```math \tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}= \left\{ \begin{array}{cc} -\frac{1-e^{-2\cdot(-x)}}{1+e^{-2\cdot(-x)}}, & x<0 \\ 0, & x=0 \\ \frac{1-e^{-2x}}{1+e^{-2x}}, & x>0 \end{array} \right. ``` ### Motivation and Context On some platforms, $$\tanh(1000)=\frac{e^{1000}-e^{-1000}}{e^{1000}+e^{-1000}}$$ would produce NaN instead of 0.999... or 1 (imagine $e^{1000}=\infty$ and $\frac{\infty}{\infty}$ explodes).	2024-01-25 08:25:35 -08:00
Wanming Lin	7252c6e747	[WebNN EP] Support WebNN async API with Asyncify (#19145 )	2024-01-24 15:37:35 -08:00
Yang Gu	591f90c0b9	[js/webgpu] Fix issue of timestamp query (#19258 ) When we enable webgpu profiling mode between session.create and session.run, current implementation has a problem to create querySet (and also queryResolveBuffer) if we share the commandEncoder with inputs upload. This PR fixes this by moving the querySet creation to the place we set queryType.	2024-01-24 14:49:37 -08:00
satyajandhyala	a33b5bd1fa	[JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788 ) ### Description Added Uniforms to SkipLayerNorm ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-01-25 01:12:21 +05:30
Xu Xing	61610ff986	[js/webgpu] Add FusedConv clip test case (#18900 ) Bug: https://github.com/microsoft/onnxruntime/issues/18899	2024-01-23 08:25:05 -08:00
Jiajia Qin	d226e40856	[js/webgpu] set query type in onRunStart (#19202 ) ### Description <!-- Describe your changes. --> `env.webgpu.profiling` is a global flag. It may change before each session.run. So the best place is to update it in `onRunStart` event. After this, we can directly check `this.queryType`'s value. Without this pr, we need to make sure that `getCommandEncoder()` is called before checking `this.queryType`. Otherwise, it may happen that `pendingKernels`'s length is not equal to `pendingDispatchNumber`'s length. See the two ugly workarounds [1)](`e630dbf528 (diff-006fc84d3997f96a29b8033bd2075d6a0a9509211bd5812a6b934fc74fedfd9dR267-R268)`) and [2)](`e630dbf528 (diff-618fe297fbe7a1da586380163b8fd2627311ccc217640a3c5cdc9c17a33472c1R73-R80)`) if we don't introduce `onRunStart`. Or we need to call `setQueryType` in each kernel run.	2024-01-22 16:08:55 -08:00
Jiajia Qin	2e0a388c36	[js/webgpu] Add HardSigmoid support (#19215 ) ### Description This op is required in mobilenetv3-small-100. With this PR, mobilenetv3-small-100 model becomes less than 10 ms from over 100 ms on ADL.	2024-01-22 15:53:26 -08:00
Yulong Wang	d69b622ef4	[js/web] upgrade dependency packages version (#19193 ) ### Description upgrade packages version. ``` # npm audit report electron 23.0.0-alpha.1 - 23.3.13 Severity: moderate ASAR Integrity bypass via filetype confusion in electron - https://github.com/advisories/GHSA-7m48-wc93-9g85 fix available via `npm audit fix --force` Will install electron@28.1.4, which is a breaking change node_modules/electron get-func-name <2.0.1 Severity: high Chaijs/get-func-name vulnerable to ReDoS - https://github.com/advisories/GHSA-4q6p-r6v2-jvc5 fix available via `npm audit fix` node_modules/get-func-name semver <=5.7.1 \|\| 6.0.0 - 6.3.0 \|\| 7.0.0 - 7.5.1 Severity: moderate semver vulnerable to Regular Expression Denial of Service - https://github.com/advisories/GHSA-c2qf-rxjj-qqgw semver vulnerable to Regular Expression Denial of Service - https://github.com/advisories/GHSA-c2qf-rxjj-qqgw semver vulnerable to Regular Expression Denial of Service - https://github.com/advisories/GHSA-c2qf-rxjj-qqgw fix available via `npm audit fix` node_modules/cross-spawn/node_modules/semver node_modules/global-agent/node_modules/semver node_modules/semver ```	2024-01-18 13:45:42 -08:00
Yulong Wang	f87e69801f	[js/web] show warning when numThreads is set but threads is not supported (#19179 ) ### Description show warning when numThreads is set but threads is not supported. Resolves #19148, #18933 for web: when crossOriginIsolated is false. for node: always disable.	2024-01-17 15:04:22 -08:00
Yulong Wang	146ebaf91e	[js/web] allow proxy to load model with 1GB <= size < 2GB (#19178 ) ### Description allow proxy to load model with 1GB <= size < 2GB resolves #19157.	2024-01-17 15:03:43 -08:00
Rachel Guo	bd9d8fb2a5	[ORT 1.17.0 release] Bump up version to 1.18.0 (#19170 ) ### Description <!-- Describe your changes. --> Bump up version to 1.18.0 since the release branch has been cut. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2024-01-17 11:18:32 -08:00
Guenther Schmuelling	9dee543bed	fix gemm beta for fp16 (#19153 ) per onnx spec beta is always fp32 so we need to cast it	2024-01-15 18:40:38 -08:00
Yulong Wang	f917dde717	[web] remove xnnpack from web backends (#19116 ) ### Description XNNPACK is already disabled in web assembly build. This change removes the xnnpack backend registration in JS.	2024-01-13 23:04:02 -08:00
Yang Gu	e803f8eb0f	[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894 ) We submit kernels in a batch (a fixed number 16 is used except for the last batch) for better performance. However, timestamp query support is at pass level so we disable the batch execution in profiling mode in previous implementation. Actually we can have multiple passes in a batch so that we don't have to disable batch execution, which is the first enhancement of this PR. Furthermore, WebGPU has an extension to support timestamp query inside passes, which isn't supported by all the platforms (e.g., Windows supports it, while macOS doesn't). This is expected to have lower cost compared with multiple passes solution. So this PR also introduce this support when available. This PR also refactors some implementation related to kernelInfo, and try to unify the related kernel names.	2024-01-13 00:23:17 -08:00
Yulong Wang	07cfc56538	[js] enable external data loading for ort-web (#19087 ) ### Description enable external data loading for ort-web. ### Why The ORT external data design is highly depending on the file system, especially synchronous file I/O APIs. Those are not available in web platforms. We need to have extra code to make external data working on web. ### How Considering there is no file system in web, an implementation for web to support external data is to use pre-loaded data. Assume model file a.onnx includes initializers that linked to ./b.bin, we require users to pass a full data file list when creating the session. The user code will be look like: ```js const mySess = await ort.InferenceSession.create('./path/model/a.onnx', { // session options externalData: [ { // relative or absolute path/URL of the file, // or a pre-loaded Uint8Array containing the data of the external data file data: './path/data/b.bin', // the relative path of the external data. Should match initializers' "location" value defined in the model file path: './b.bin' }, // { } if multiple external data file ] }); ``` Currently, this feature only works with JSEP build enabled.	2024-01-12 19:24:24 -08:00
Guenther Schmuelling	a756017e9f	[js/webgpu] more fixes for access above 2GB (#19065 ) when jsep calls javascript with an index to HEAP8 or HEAP32 the index is negative when the heap is above 2GB, even if we pass it as uint32_t it remains negative. So in javascript use >>> 0 to make it unsigned.	2024-01-12 17:47:37 -08:00
Jiangzhuo	a503561d0c	[js] using OffscreenCanvas when DOM is not available (#19033 ) ### Description when DOM API is not avaiable, using OffscreenCanvas ### Motivation and Context In some environment like service worker or web worker, the DOM API is not avaiable, we can use OffscreenCanvas API to replace `document.createElement('canvas')`. Most of the APIs of OffscreenCanvas and HTMLCanvasElement are the same, except that `toDataUrl` is missing. It fix this issues #19032	2024-01-12 13:54:05 -08:00
Guenther Schmuelling	4a5f13b681	fix resize for fp16 (#19110 ) resize for fp16 has 2 issues: scales are always f32 and roi can be f32 or f16. scales: this is fixed. roi this is fixed for the case where roi is not passed as optional input with f16. To fix this it requires a much larger change and I did not want to risk this short before a release. For all practical purpose passing roi as input with f16 should be rare and we can fix it in the near future.	2024-01-12 13:44:28 -08:00
Caroline Zhu	4dbaa73738	[js/web/training] added end-to-end tests (#18700 ) ## Summary * following inference's [set-up for end-to-end tests](https://github.com/microsoft/onnxruntime/tree/main/js/web/test/e2e), created an end-to-end test runner for training * this test runner copies testdata from the [trainingapi folder](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/training_api) * then runs two tests (training session with evalModel & optimizer model, and training session with the minimum options), and tests if the ORT-web training package encompasses inference * these tests check * createTrainingSession * runTrainStep * runOptimizerStep if applicable * the parameters methods (getParametersSize, loadParametersBuffer, and getContiguousParameters) ## TL;DR * [`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for setting up and running the end to end tests * [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8) contains the test function definitions (`testInferenceFunction`, `testTrainingFunctionMin`, `testTrainingFunctionAll`) ## Flow * entrypoint: user runs the following command in the terminal: `npm run test:training:e2e` * [`js/web/package.json`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-79275844e75c3c410bb3a71c7f59b2b633e5a3e975c804ffc47220025084da28) was modified to include an npm script that will run `run.js` which will run the end to end tests * [`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for * detecting and installing local tarball packages of ORT-web * copying training data to the `js/web/training/e2e/data` folder * starting two Karma processes. Karma is a test runner framework that simulates testing in the browser. * In this case, the tests happen in Chrome. We can configure the tests to run in Edge and other browsers in the future. * one of these karma processes is self-hosted, meaning it pulls the ORT-web package from local * the other karma process is not self-hosted, meaning it pulls the ORT-web package from another source. In this case, we start an http server that serves the ORT-web binaries. * [`js/web/test/training/e2e/simple-http-server.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-f798ab485f3ec26c299fe5b2923574c9e4b090200ba20d490bbf6c183286993c) is responsible for starting the HTTP server and serving the ORT binary files. This code almost identical to the same code in the inference E2E tests. * [`js/web/test/training/e2e/karma.conf.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-436cfe8f670c768a04895bd4a1874a5e033f85e0e2d84941c62ff1f7c30a9f28) Karma configuration file that specifies what happens when a karma process is started. The config specifies Mocha as the testing framework, which will go through all the loaded files and run any tests that exist * [`js/web/test/training/e2e/browser-test-wasm.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-13b6155e106dddc7b531ef671186e69b2aadb8a0f4b2f3001db0991567d78221) File that contains the tests that Mocha will pick up on and run. * The test functions (such as testInference and testTrainingFunctionAll) are defined in [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8). ## Notes * I followed the [tests for training core](`b023de0bfc/orttraining/orttraining/test/training_api/core/training_api_tests.cc`) where they randomly generated input for the training session * E2E tests are triggered by running `npm run test:training:e2e` -- suggestions for alternative script names are appreciated!!! ## Motivation and Context - adding training bindings for web	2024-01-12 13:33:33 -08:00
zesongw	3eec1592bd	[WebNN EP] Update WebNN unit test list (#19103 ) Update WebNN test list in suite-test-list.jsonc so all test cases are passed behind WebNN CPU backend on Chrome Stable (Although some cases may fall back to CPU EP). Enable int64 support for WebNN in unit tests.	2024-01-12 10:22:38 -08:00
Jiajie Hu	acba63c36a	[js/webgpu] Change A/sqrt(B) to AinverseSqrt(B) in normalization ops (#19101 ) ### Description Change `A / sqrt(B)` to `A inverseSqrt(B)` in BatchNormalization, InstanceNormalization, LayerNormalization and SkipLayerNormalization. ### Motivation and Context For the same reason as the existence of the `inverseSqrt` built-in in WebGPU spec.	2024-01-12 00:08:16 -08:00
dependabot[bot]	5373c0c730	Bump follow-redirects from 1.15.2 to 1.15.4 in /js/web (#19068 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.2 to 1.15.4. <details> <summary>Commits</summary> <ul> <li><a href="`65858205e5`"><code>6585820</code></a> Release version 1.15.4 of the npm package.</li> <li><a href="`7a6567e16d`"><code>7a6567e</code></a> Disallow bracketed hostnames.</li> <li><a href="`05629af696`"><code>05629af</code></a> Prefer native URL instead of deprecated url.parse.</li> <li><a href="`1cba8e85fa`"><code>1cba8e8</code></a> Prefer native URL instead of legacy url.resolve.</li> <li><a href="`72bc2a4229`"><code>72bc2a4</code></a> Simplify _processResponse error handling.</li> <li><a href="`3d42aecdca`"><code>3d42aec</code></a> Add bracket tests.</li> <li><a href="`bcbb096b32`"><code>bcbb096</code></a> Do not directly set Error properties.</li> <li><a href="`192dbe7ce6`"><code>192dbe7</code></a> Release version 1.15.3 of the npm package.</li> <li><a href="`bd8c81e4f3`"><code>bd8c81e</code></a> Fix resource leak on destroy.</li> <li><a href="`9c728c314b`"><code>9c728c3</code></a> Split linting and testing.</li> <li>Additional commits viewable in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.2...v1.15.4">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.2&new-version=1.15.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-01-11 22:25:50 -08:00
Guenther Schmuelling	d0bac8216d	[js/webgpu] fix bcast in where (#19009 )	2024-01-11 12:13:24 -08:00
Jiajia Qin	a89db01fce	[js/webgpu] disable GroupedConvVectorize path (#19090 ) Disable createGroupedConvVectorizeProgramInfo path due to bots failures on below two cases: [webgpu]Conv - conv - vectorize group - B [webgpu]Conv - conv - vectorize group - D	2024-01-11 08:13:14 -08:00
dependabot[bot]	f11713702f	Bump follow-redirects from 1.15.2 to 1.15.4 in /js/node (#19070 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.2 to 1.15.4. <details> <summary>Commits</summary> <ul> <li><a href="`65858205e5`"><code>6585820</code></a> Release version 1.15.4 of the npm package.</li> <li><a href="`7a6567e16d`"><code>7a6567e</code></a> Disallow bracketed hostnames.</li> <li><a href="`05629af696`"><code>05629af</code></a> Prefer native URL instead of deprecated url.parse.</li> <li><a href="`1cba8e85fa`"><code>1cba8e8</code></a> Prefer native URL instead of legacy url.resolve.</li> <li><a href="`72bc2a4229`"><code>72bc2a4</code></a> Simplify _processResponse error handling.</li> <li><a href="`3d42aecdca`"><code>3d42aec</code></a> Add bracket tests.</li> <li><a href="`bcbb096b32`"><code>bcbb096</code></a> Do not directly set Error properties.</li> <li><a href="`192dbe7ce6`"><code>192dbe7</code></a> Release version 1.15.3 of the npm package.</li> <li><a href="`bd8c81e4f3`"><code>bd8c81e</code></a> Fix resource leak on destroy.</li> <li><a href="`9c728c314b`"><code>9c728c3</code></a> Split linting and testing.</li> <li>Additional commits viewable in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.2...v1.15.4">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.2&new-version=1.15.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-01-10 22:08:14 -08:00
Jiajia Qin	fd6bab4250	[js/webgpu] Provide a vectorized algorithm for GroupedConv (#18884 ) ### Description This PR provides a vectorized algorithm for NHWC GroupedConv to improve performance. The aggregate time of GroupedConv in mobilenetv2-12 becomes ~1ms from ~4ms on Intel Alder Lake machine. About 20% improvement for the whole model.	2024-01-10 16:12:43 -08:00
Xu Xing	ed0f26d3d4	[js/webgpu] Revert parse norm attributes (#19074 ) This resolves the below build errors: ``` lib/wasm/jsep/webgpu/op-resolve-rules.ts:19:23 - error TS2724: '"./ops/instance-norm"' has no exported member named 'parseInstanceNormAttributes'. Did you mean 'InstanceNormAttributes'? 19 import {instanceNorm, parseInstanceNormAttributes} from './ops/instance-norm'; ~~~~~~~~~~~~~~~~~~~~~~~~~~~ lib/wasm/jsep/webgpu/op-resolve-rules.ts:19:23 - error TS6133: 'parseInstanceNormAttributes' is declared but its value is never read. 19 import {instanceNorm, parseInstanceNormAttributes} from './ops/instance-norm'; ~~~~~~~~~~~~~~~~~~~~~~~~~~~ lib/wasm/jsep/webgpu/op-resolve-rules.ts:20:20 - error TS2305: Module '"./ops/layer-norm"' has no exported member 'parseLayerNormAttributes'. 20 import {layerNorm, parseLayerNormAttributes} from './ops/layer-norm'; ~~~~~~~~~~~~~~~~~~~~~~~~ lib/wasm/jsep/webgpu/op-resolve-rules.ts:20:20 - error TS6133: 'parseLayerNormAttributes' is declared but its value is never read. 20 import {layerNorm, parseLayerNormAttributes} from './ops/layer-norm'; ```	2024-01-09 20:58:50 -08:00
Xu Xing	76dfe5347c	[js/webgpu] Support uniforms for instance-norm (#18929 ) Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2024-01-09 14:56:00 -08:00
Changming Sun	a2afd92093	Format TS code (#19066 ) ### Description Format code	2024-01-09 13:41:10 -08:00
zesongw	ad6dd0a597	[WebNN] Enable npm unit tests (#18486 ) ### Description - Support more test cases for WebNN EP in suite-test-list.jsonc - Add DISABLE_WEBNN flag in build.ts as preparing for WebNN EP release - Add test option: '--webnn-device-type' in test-runner-args-cli.ts to support running WebNN 'gpu' deviceType - Use Chrome Stable as default browser for WebNN testing to unblock the CI limitation.	2024-01-09 10:10:57 -08:00
Xu Xing	557ac74c05	[js/webgpu] Support gemm uniforms (#19056 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-09 09:57:06 -08:00
Xu Xing	42ba2aed54	[js/webgpu] Support pad uniforms (#19057 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-09 09:34:56 -08:00
Xu Xing	eb92681bfb	[js/webgpu] Support range uniforms (#19055 )	2024-01-09 09:33:57 -08:00
Xu Xing	dee6a5b371	[js/webgpu] Support uniforms for attention and multihead attention (#18903 )	2024-01-09 07:46:30 -08:00
Xu Xing	8f024b7394	[js/webgpu] Support uniforms for layer-norm (#18755 )	2024-01-08 18:16:25 -08:00
Jiajie Hu	447a3a7c70	[js/webgpu] Fix Expand/Gather when input type is bool (#18999 ) ### Description Also update the op test suite. ### Motivation and Context Previously the total size in case `Expand - last dim is not divisible by 4` was a multiple of 4, even though the last dimension was not, so the bug has never been caught.	2024-01-05 08:16:15 -08:00
Yulong Wang	b18abaaa2c	[js/web] wait for threadpool initialization (#18952 ) ### Description a replacement of #18683. try to resolve #18689. By specifying "-s PTHREAD_POOL_SIZE" flag in emscripten, it forces the threadpool to initialize before the webassembly instance is available.	2024-01-04 08:06:55 -08:00
xhcao	867b9d8f04	[js/webgpu] Fix f16 errors for ConvTranspose2D (#18986 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-04 08:06:01 -08:00
Steven Roussey	d5628f52df	link to docs incorrect for js/web/node (#18960 ) ### Description link to docs incorrect for js/web/node ### Motivation and Context Trying to build myself and not yet succeeding.	2024-01-03 17:30:24 -08:00
Jiajie Hu	3b8b9147fa	[js/webgpu] Mitigate floating point accuracy issue in Resize (#18956 ) ### Description The patch fixes a floating point accuracy issue in Resize by preferring integer indices and integer arithmetic where possible. ### Motivation and Context Model test `test_resize_upsample_sizes_nearest_floor_align_corners` was observed to be failing on certain platforms. The root cause is the inaccurate floating point evaluation of 21 / 7 (2.999... vs 3), which results in the wrong input element to be indexed (floor(2.999...) vs floor(3)).	2024-01-03 14:15:26 -08:00
Yang Gu	c5f3952b68	[js/webgpu] Introduce trace support (#18928 ) This is to leverage console.timeStamp to add a single marker to browsers' (only Chromium and Firefox support it) performance tool. With this support, we can dump both CPU and GPU timestamps, and use post-processing tool to clearly understand the calibrated timeline. A demo tool can be found at https://github.com/webatintel/ort-test, and more detailed info can be found at https://docs.google.com/document/d/1TuVxjE8jnELBXdhI4QGFgMnUqQn6Q53QA9y4a_dH688/edit.	2024-01-03 10:13:17 -08:00
satyajandhyala	780fc3611b	[JS/Web] Sajandhy/webgpu resize scales rank check (#18954 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-29 09:23:27 -08:00
Jiajia Qin	44584c3ebe	[js/webgpu] only declare shape and strides in shader when necessary (#18940 ) ### Description Previously, shape and strides were added unconditionally even they are not used. This PR fixes this issue and only adds shape and strides when they are required. It's useful when some shapes are not used as uniform if the program depends on type instead of rank.	2023-12-28 15:43:08 -08:00
Jiajia Qin	c613cc58a9	[js/webgpu] Fix shader compilation errors in Resize (#18947 ) ### Description An extra right parenthesis was added by accidentally, which results some resize cases fail. This PR fixes it.	2023-12-28 13:15:26 -08:00
satyajandhyala	3bbe4fe2ff	[JS/WebGPU] Add trilinear interpolation to Resize; activation_params attribute is optional for FusedConv also. (#18842 ) ### Description Add trilinear interpolation to Resize and changed activation_params attribute as optional for FuseConv. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-27 16:21:29 -08:00
Guenther Schmuelling	31d4a21c4b	[js/webgpu] fix heap access > 2GB (#18914 )	2023-12-27 15:22:05 -08:00
Xu Xing	0bc71b0c9b	[js/webgpu] Refactor attributes of pool (#18728 )	2023-12-26 17:23:52 -08:00
Yulong Wang	9a61388f0a	[js/web] revise backend registration (#18715 ) ### Description This PR revises the backend registration. The following describes the expected behavior after this change: (bolded are changed behavior) - (ort.min.js - built without webgpu support) - loading: do not register 'webgpu' backend - creating session without EP list: use default EP list ['webnn', 'cpu', 'wasm'] - creating session with ['webgpu'] as EP list: should fail with backend not available - (ort.webgpu.min.js - built with webgpu support) - loading: always register 'webgpu' backend ( previous behavior: only register 'webgpu' backend when `navigator.gpu` is available) - creating session without EP list: use default EP list ['webgpu', 'webnn', 'cpu', 'wasm'] - when WebGPU is available (win): use WebGPU backend - when WebGPU is unavailable (android): should fail backend init, and try to use next backend in the list, 'webnn' (previous behavior: does not fail backend init, but fail in JSEP init, which was too late to switch to next backend) - creating session with ['webgpu'] as EP list - when WebGPU is available (win): use WebGPU backend - when WebGPU is unavailable (android): **should fail backend init, and because no more EP listed, fail. related PRs: #18190 #18144	2023-12-20 14:45:55 -08:00
Yulong Wang	ffa6602686	[js/node] support manually dispose session (#18655 ) ### Description support manually dispose session in onnxruntime-node feature request: #16796	2023-12-19 16:20:00 -08:00
satyajandhyala	98510fb8fb	[JS/WebGPU] fix an error in Clip (#18799 ) ### Description <!-- Describe your changes. --> Check whether the min/max inputs are provided and use default values if not provided. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-12-19 13:51:01 -08:00
Frank	63b47ceaf8	[REACT NATIVE] Bugfix -> casing Podfile (#18861 ) ### Description The casing of Podfile is incorrect in the plugin. This causes issues when building iOS on case-sensitive systems such as Linux. ### Motivation and Context because cannot build ios on case sensitive systems	2023-12-19 10:20:46 +10:00
Jiajia Qin	8f7b89bd5b	[js/webgpu] Optimize NCHW layout for InstanceNormalization (#18123 ) ### Description The changes in this PR includes: 1) Fix f16 errors in InstanceNormalization with NCHW format. 2) Use vec to further optimize the original algorithm. 3) (Removed) Don't do layout conversion for InstanceNormalization for JSEP since InstanceNormalization itself is suitable for NCHW layout and has better performance in our current implementation. Tested on sd-vae-decoder-f16.onnx, it becomes 285 ms from 314 ms. The aggregate gpu profiling data can be found as below (Note the data is based change 3).): Before: <html> <body> <!--StartFragment--><span><span class="ui-provider ef bbg bbh bbi bbj bbk bbl bbm bbn bbo bbp bbq bbr bbs bbt bbu bbv bbw bbx bby bbz bca bcb bcc bcd bce bcf bcg bch bci bcj bck bcl bcm bcn" dir="ltr"> Kernel \| Time (Ms) \| Percentage (%) -- \| -- \| -- Conv \| 201.55 \| 69.56 InstanceNormalization \| 42.49 \| 14.67 Transpose \| 28.95 \| 9.99 Mul \| 5.69 \| 1.96 Add \| 3.82 \| 1.32 MatMul \| 3.27 \| 1.13 Sigmoid \| 2.24 \| 0.77 Resize \| 1.16 \| 0.40 Softmax \| 0.34 \| 0.12 Cast \| 0.24 \| 0.08 Sum \| 289.75 <br class="Apple-interchange-newline"><!--EndFragment--> </body> </html> After: <html> <body> <!--StartFragment--><span><span class="ui-provider ef bbg bbh bbi bbj bbk bbl bbm bbn bbo bbp bbq bbr bbs bbt bbu bbv bbw bbx bby bbz bca bcb bcc bcd bce bcf bcg bch bci bcj bck bcl bcm bcn" dir="ltr"> Kernel \| Time (Ms) \| Percentage (%) -- \| -- \| -- Conv \| 205.44 \| 79.43 InstanceNormalization \| 18.24 \| 7.05 Transpose \| 17.64 \| 6.82 Mul \| 5.69 \| 2.20 Add \| 3.81 \| 1.47 MatMul \| 3.56 \| 1.38 Sigmoid \| 2.24 \| 0.86 Resize \| 1.19 \| 0.46 Softmax \| 0.59 \| 0.23 Cast \| 0.24 \| 0.09 Sum \| 258.65 \| </span></span><!--EndFragment--> </body> </html> From above table, we can see that two ops time are greatly reduced. One is InstanceNormalization and the other is Transpose. The reason that the transpose time is reduced is because each InstanceNormalization is surrounded with two reshape ops in sd-vae-decoder-f16.onnx. Due to JSEP is prefer NHWC and InstanceNormalization is layout sensitive op, so two extra transpose ops are inserted dynamically when executing this model. After this change, those inserted transpose ops are not needed anymore. So the overall transpose time is reduced.	2023-12-15 11:26:15 -08:00
Jiajia Qin	4bbed4c71a	[js/webgpu] Fix f16 errors in unary (#18839 ) ### Description This PR fixes below errors: ``` no matching overload for operator > (vec4<f16>, vec4<f32>)	2023-12-15 11:25:12 -08:00
Yang Gu	81ad1e6ac3	[js/webgpu] Fix typo of outputShapes in profiling message (#18837 )	2023-12-15 08:57:48 -08:00
Jiajia Qin	b30e721dc8	[js/webgpu] Provide a naive vectorized matmul algorithm (#18758 ) ### Description This PR provided a vectorized matmul algorithm. In most situations, we still go to the workgroup memory optimized matmul. But for some situations, like N and K are very small, using workgroup optimized matmul can't fully utilize the underlying hardware due to the 32x32 tile size. So for very small N/K, we switch to the naive vectorized matmul algorithm to improve the hardware execution unit usage. With this PR, matmul with input0: [1, 36864, 3], input1: [1, 3, 3], input2: [3] becomes less than 1 ms from 4.34 ms on Intel Gen9 GPUs.	2023-12-13 09:03:23 -08:00
satyajandhyala	0ca84549ab	[JS/Web] Added uniforms to Reduce, Resize and Split Ops. (#18727 ) ### Description <!-- Describe your changes. --> Added uniforms to Reduce op ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve perforamnce.	2023-12-12 11:12:23 -08:00
satyajandhyala	d673e39ad8	[JS/WebGPU] Added uniforms to Tile and Where Ops (#18768 ) ### Description <!-- Describe your changes. --> Added uniforms to Tile and Where Ops ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance.	2023-12-11 20:58:52 -08:00
Jiajia Qin	b4be9e1bbb	[js/webgpu] Fix shader compilation errors in cumsum (#18779 ) ### Description This PR fixes below shader compilation errors: ``` Tint WGSL reader failure: :39:31 error: no matching overload for operator + (f32, i32) 5 candidate operators: operator + (T, T) -> T where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (vecN<T>, T) -> vecN<T> where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (T, vecN<T>) -> vecN<T> where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (vecN<T>, vecN<T>) -> vecN<T> where: T is abstract-float, abstract-int, f32, i32, u32 or f16 operator + (matNxM<T>, matNxM<T>) -> matNxM<T> where: T is abstract-float, f32 or f16 sum = sum + get_inputByIndices(inputIndices); ^ - While validating [ShaderModuleDescriptor "CumSum"] - While calling [Device].CreateShaderModule([ShaderModuleDescriptor "CumSum"]).	2023-12-11 18:11:38 -08:00
Caroline Zhu	eb03032925	[js/web/training] lazyResetGrad implementation (#18711 ) ### Description * implemented lazyResetGrad function ### Motivation and Context * we are in the process of adding language bindings to enable training on web * lazyresetgrad ensures that the gradients are calculated correctly after the first runTrainStep call --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-12-11 17:36:54 -08:00
Yulong Wang	efbef5f611	[js/webgpu] allow to specify callback for profiling data (#18732 ) ### Description This PR is a replacement of #17820. allow to specify callback for profiling data Previous: ```js ort.env.webgpu.profilingMode = 'default'; // enable profiling // profiling data will output to console. ``` Now: ```js ort.env.webgpu.profiling = { mode: 'default'; // enable profiling ondata: (data) => { // .. process the profiling data } }; //for each kernel, "ondata" will be called once. only output to console if ondata is not specified. ```	2023-12-07 14:10:28 -08:00
Guenther Schmuelling	9aa7284351	fix lint error (#18708 )	2023-12-05 10:37:03 -08:00
satyajandhyala	70816001cc	[JS/Web] AddedUniforms in GatherElements. (#18670 ) ### Description Use Uniforms in GatherElements and clean-up ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance	2023-12-05 09:19:53 -08:00
Xu Xing	f949e0580b	[js/webgpu] Support uniforms for pool (#18656 )	2023-12-05 07:54:30 -08:00
satyajandhyala	10c547516d	[JS/Web] Added CumSum operator to JSEP (#18637 ) ### Description Added CumSum operator ### Motivation and Context Reduce CPU <->GPU data movement.	2023-12-05 07:51:53 -08:00
Caroline Zhu	c02a386145	[js/web/training] Implemented runEvalStep & runOptimizerStep (#18259 ) ### Description * implemented runEvalStep and runOptimizerStep * added hasEvalModel and hasOptimizerModel boolean fields in TrainingSession representation * added evalInputNames and evalOutputNames fields to TrainingSessionHandler & TrainingSession * removed the inputNamesEncoded and outputNamesEncoded fields from TrainingSessionHandler -- since none of the training methods require the input names and output names as parameters, there's no need to store them. ### Motivation and Context * part of the work for implementing web bindings for training * previous PR: #18250 --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-12-04 13:37:14 -08:00
Jiajia Qin	5353adcde3	[js/webgpu] Use the naive convTranspose when in/out channels are both 1 (#18658 ) ### Description With this change, convTranspose with input0 [1, 18, 32, 1], input1 [1, 1, 16, 16] becomes 0.59ms from 6.64ms.	2023-12-04 13:18:37 -08:00
Jiajia Qin	92ee664f64	[js/webgpu] Fix shader errors in indicesGet/Set when rank > 4 (#18661 ) ### Description Currently, for non-uniform variables, we still use `array<u32, N>` type instead of array<vec4<u32>, N1>`. So we can't always treat all variables with rank > 4 as uniforms to index. This PR fixes below errors: ``` error(s) generated while compiling the shader: :5:44 error: index 4 out of bounds [0..1] return uniforms.input_strides[4] * (outputIndices[4] % uniforms.input_shape[4])+uniforms.input_strides[3] * (outputIndices[3] % uniforms.input_shape[3])+uniforms.input_strides[2] * (outputIndices[2] % uniforms.input_shape[2])+uniforms.input_strides[1] * (outputIndices[1] % uniforms.input_shape[1])+uniforms.input_strides[0] * (outputIndices[0] % uniforms.input_shape[0]); ^ FAILED #OpTest# - expand.jsonc [webgpu]Expand - Expand 5D - float32 Expand 5 - float32 FAILED #OpTest# - expand.jsonc [webgpu]Expand - Expand 5D - float32 Expand 5 - shape < input.size()	2023-12-01 15:35:35 -08:00
Xu Xing	73d9b03509	[js/webgpu] Add multidimensional(>4) uniform support (#18546 ) This change removes the check of enableShapesUniforms. When all uses of this are removed, enableShapesUniforms can be removed too.	2023-11-30 17:10:33 -08:00
Jiajia Qin	6781b6cf3d	[js/webgpu] add bool type for Expand/Gather (#18615 ) ### Description In [detr-resnet-50](https://huggingface.co/Xenova/detr-resnet-50) model, it uses expand with bool type running on cpu ep. \| Kernel \| Shape \| Provider \| \| -------- \| ------- \| ------- \| \| Expand \| "input_type_shape" : [{"bool":[1,1,1,625]},{"int64":[4]}],"activation_size" : "657","output_type_shape" : [{"bool":[1,1,625,625]}] \| CPUExecutionProvider \| After this change, it will run on jsep. \| Kernel \| Shape \| Provider \| \| -------- \| ------- \| ------- \| \| Expand \| "input_type_shape" : [{"bool":[1,1,1,625]},{"int64":[4]}],"activation_size" : "657","output_type_shape" : [{"bool":[1,1,625,625]}] \| JsExecutionProvider \|	2023-11-30 15:47:08 -08:00
Jiajia Qin	b1e749e3be	[js/webgpu] Add program name into webgpuProfiling info (#18640 ) ### Description Currently, we only print the kernelName, which is hard to distinguish which shader we actually used. For example, GroupedConv/Conv2DMatMul both belong to Conv kernel. It's not intuitive for profiling.	2023-11-30 12:57:29 -08:00
Yulong Wang	e7f64f4510	[js/web] fix ESLint by excluding generated .js from tsconfig.json (#18634 ) ### Description ESLint will went into error sometimes. The root cause is because some large generated JavaScript file in the tsconfig's include path will cause TypeScript parser fail in a line of `string.match()` with a regex on a huge string (~8MB), causing the following error: ``` RangeError: Maximum call stack size exceeded ``` The solution is to remove the large files from the tsconfig's include path. Previously I excluded the `web/dist/` folder and this PR excludes `web/test/ort.test[.min].js`.	2023-11-30 09:50:47 -08:00
Yang Gu	227dcb3a88	[js/webgpu] Log the key and program info for artifact (#18365 ) With uniform support, ideally we may just keep one artifact for each program to save the compilation time. This PR just logs the related info, including key and program name, so that we may understand better the situation.	2023-11-29 18:01:12 -08:00
satyajandhyala	7335760424	[JS/Web] Add uniforms to Einsum (#18531 ) ### Description Add uinforms to Einsum ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance.	2023-11-29 15:30:33 -08:00
Yulong Wang	50e6235af1	[js/web] allow ShaderHelper to use internal (non-I/O) variables (#18525 ) ### Description This PR includes a change that inspired from #18452 to resolve a requirement: a shader may depend on an instance of `IndicesHelper` to generate WGSL code snippet, but the IndicesHelper instance is not necessarily an input/output of the program. So the existing `declareVariables()` function does not work with this scenario. In order to support this requirement, I added this "use" function to `interface ShaderHelper`, which takes a helper-like object as parameter. The hidden implementation `ShaderHelperImpl` class will iterate the helpers and call `impl()` for each. @axinging @qjia7	2023-11-28 15:15:59 -08:00
Rachel Guo	288b80d363	Add MacOS build to ORT C Pod (#18550 ) ### Description <!-- Describe your changes. --> As title. 1. Add macos build as an optionally enabled arch for pod and changes to exsiting build_ios_framework/assemble_c_pod scripts. 2. Enable macos build arch in ios packaging pipeline (currently for variants other than Mobile) and check the output artifacts are correct. 3. Write MacOS Test Target scheme in the test app and integrate into ios packaging CI testing pipeline. Currently the changes only apply to onnxruntime-c pod. as the original request was from ORT SPM which consumes the onnxruntime-c pod only as the binary target. TODO: could look into adding macos platform to objc pod as well. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Enable macos platform support in cocoapods. and also potentially produce binary target for enabling macos platform in SPM as well. Replace https://github.com/microsoft/onnxruntime/pull/18334 --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-11-28 10:11:53 -08:00
Jiajia Qin	fc8631e2f1	[js/web] Fix conv2dMatmul errors due to #18452 (#18562 ) ### Description Currently, all conv2dMatmul with inChannels = 3 and outChannels % 4 = 0 will report compilation errors. Models, which include this kind of shape will be impacted, like mobilenetv2-12, resnet50 . The errors is introduced by #18452 https://github.com/microsoft/onnxruntime/pull/18452/files#diff-8b24ea43aa11b1346c0c9e327f9bce6b37a93bd8f2bf8a6392b2b263972b7ea2R200, which accidentally pass `components` to `x`. But `x`'s components is `innerElementSize` not `components `. And when `innerElementSize` is 3, we should use `1` in current design.	2023-11-27 21:21:47 -08:00
Caroline Zhu	dd355e39a0	[js/web/training] Added parameters methods (#18250 ) ### Description * Implemented: `getParametersSize`, `getContiguousParameters` (equivalent to copyParametersToBuffer), and `loadParametersBuffer` (equivalent to copyParametersFromBuffer) * as part of these changes, getParametersSize was added to the TrainingSession interface so that users know what size buffer to create for loadParametersBuffer * The parameters methods in the interface were modified to take in a Float32Array instead ### Motivation and Context * part of the work for implementing web bindings for training * enables federated learning in the web * previous PR: #18006 --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-11-27 10:30:13 -08:00
Jiajia Qin	64dacc2892	[js/webgpu] Add BatchNormalization Op (#18468 ) ### Description This PR adds `BatchNormalization` with `float` support. Some Todos: 1. all inputs don't have same data type. For example, x/y is float16, but bias/scale is float32 or double. 2. training mode support. We see many models are using `BatchNormalization` ops. However, due to the missing in jsep, all of them run on cpu, which result very poor performance. With this PR's support, densenet-9 model becomes 20.29 ms from 250.69 ms.	2023-11-22 15:58:06 -08:00
Xu Xing	fa106942a7	[js/webgpu] Refactor matmul conv to support uniforms for matmul (#18452 ) This change refactored matmul/conv related programs to support shape uniforms. Currently only matmul shape uniforms are fully enabled. TODOs: add input dependencies for conv related programs, turn clipMax and clipMin to uniforms.	2023-11-22 14:42:55 -08:00
satyajandhyala	841f7ed3e0	[[JS/Web]Added uniform to Expand op. (#18558 ) ### Description <!-- Describe your changes. --> Added Uniforms to Expand operator kernel ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance	2023-11-22 14:14:24 -08:00
Arthur Islamov	1c555c5fc1	[JS/Web] Resize & BiasSplitGelu fp16 support (#18536 ) ### Description Resize and BiasSplitGelu fp16 support on WebGPU	2023-11-22 12:12:07 -08:00
Yulong Wang	c7fd930330	[js/web] unify resolve rules for "Clip" (#18527 ) ### Description It was a mistake to use 2 different names for Clip operator in op-resolve-rules.ts for different opset. An optimized implementation can handle both cases (opset < 11 and opset >=11). Remove "ClipV10" as an entry from the table.	2023-11-20 23:18:06 -08:00
Jiajia Qin	abdf8b7c3f	[js/webgpu] Optimize broadcast binary. (#18185 ) ### Description Currently, the binary algorithms are divided into the vectorize one (efficient) and non-vectorize one (less efficient). Below situations will go to the vectorize one: 1) A or B's shape length is 1. 2) The shared dimensions length of A and B are divisible by 4. 3) A and B have same shape. This PR adds another situation as below to go to the vectorize algorithm. 4. A or B's last dimension is divisible by 4. With this change, the aggerate time of Add in sam-b-encoder becomes 309.65 ms from 409.12 ms on Intel ADL.	2023-11-20 16:52:17 -08:00
Yulong Wang	247ce21859	[js] optimize eslint config (#18460 ) ### Description optimize eslint config to: - set parserOptions.project to `true` to allow @typescript-eslint/parser to find the nearest tsconfig.json file to that source file. This helps to avoid parsing extra files, may helps with: - reduce the possibility of seeing OOM or stackoverflow with "npm run lint" - faster processing - enforce rule "no-underscore-dangle" with a list of exceptions.	2023-11-20 12:00:56 -08:00
Yulong Wang	34c5424456	[js] update a few packages (#18499 ) ### Description [js] update a few packages - update semver - update reference of onnx_proto to local folder in order to upgrade protobufjs@7.2.4 Resolve AB#18513	2023-11-17 22:40:51 -08:00
Arthur Islamov	fac3e33da5	[js/web] JSEP Attention & MultiHeadAttention (#17742 ) ### Description This is a narrow implementation of Attention/MultiHeadAttention as it does not support: a. inputs 5-7 for MHA b. packed QKV/KV c. past/present d. attention mask But it works well for StableDiffusion and can be extended later. It reduces VRAM usage as it combines many ops into few I've updated demo here https://islamov.ai/stable-diffusion-webgpu/ it takes ~13sec for 1 image with 20 steps on RTX3090Ti and about 25s on M1 Pro VRAM usage is about 8gb if you don't use img2img Going to focus on SDXL now --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-11-17 12:23:52 -08:00
satyajandhyala	b291b20fa0	[JS/Web]Added uniforms support to Slice op. (#18422 ) ### Description Support uniforms in Slice op ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve ferformance	2023-11-16 09:44:13 -08:00
Yulong Wang	586f06f5a1	[js/web] set noUnusedParameters to true and fix a few bugs (#18404 ) ### Description - set tsconfig "noUnusedParameters" to `true` and fix a few bugs discovered by typescript. how unused parameter is fixed: - for most code (webgl), add underscore as prefix, which is the standard ignore pattern for typescript check. - remove unused parameter from function and modify corresponding function calls (jsep) - fix a bug in ArgMinMax: this 2 operators do not have more than one input(s) so the `createArgMinMaxAttributesFromInputs()` is removed. - add proxy main.ts into typescript check and fix a bug in parameter passing - fixed `run()` function call and add typecheck fix (hack)	2023-11-15 09:16:29 -08:00
dependabot[bot]	5aeed62630	Bump axios from 1.3.4 to 1.6.1 in /js/node (#18400 ) Bumps [axios](https://github.com/axios/axios) from 1.3.4 to 1.6.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/axios/axios/releases">axios's releases</a>.</em></p> <blockquote> <h2>Release v1.6.1</h2> <h2>Release notes:</h2> <h3>Bug Fixes</h3> <ul> <li><strong>formdata:</strong> fixed content-type header normalization for non-standard browser environments; (<a href="https://redirect.github.com/axios/axios/issues/6056">#6056</a>) (<a href="`dd465ab22b`">dd465ab</a>)</li> <li><strong>platform:</strong> fixed emulated browser detection in node.js environment; (<a href="https://redirect.github.com/axios/axios/issues/6055">#6055</a>) (<a href="`3dc8369e50`">3dc8369</a>)</li> </ul> <h3>Contributors to this release</h3> <ul> <li><!-- raw HTML omitted --> <a href="https://github.com/DigitalBrainJS" title="+432/-65 ([#6059](https://github.com/axios/axios/issues/6059) [#6056](https://github.com/axios/axios/issues/6056) [#6055](https://github.com/axios/axios/issues/6055) )">Dmitriy Mozgovoy</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/meyfa" title="+5/-2 ([#5835](https://github.com/axios/axios/issues/5835) )">Fabian Meyer</a></li> </ul> <h2>Release v1.6.0</h2> <h2>Release notes:</h2> <h3>Bug Fixes</h3> <ul> <li><strong>CSRF:</strong> fixed CSRF vulnerability CVE-2023-45857 (<a href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>) (<a href="`96ee232bd3`">96ee232</a>)</li> <li><strong>dns:</strong> fixed lookup function decorator to work properly in node v20; (<a href="https://redirect.github.com/axios/axios/issues/6011">#6011</a>) (<a href="`5aaff532a6`">5aaff53</a>)</li> <li><strong>types:</strong> fix AxiosHeaders types; (<a href="https://redirect.github.com/axios/axios/issues/5931">#5931</a>) (<a href="`a1c8ad008b`">a1c8ad0</a>)</li> </ul> <h3>PRs</h3> <ul> <li>CVE 2023 45857 ( <a href="https://api.github.com/repos/axios/axios/pulls/6028">#6028</a> )</li> </ul> <pre><code> ⚠️ Critical vulnerability fix. See https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459 </code></pre> <h3>Contributors to this release</h3> <ul> <li><!-- raw HTML omitted --> <a href="https://github.com/DigitalBrainJS" title="+449/-114 ([#6032](https://github.com/axios/axios/issues/6032) [#6021](https://github.com/axios/axios/issues/6021) [#6011](https://github.com/axios/axios/issues/6011) [#5932](https://github.com/axios/axios/issues/5932) [#5931](https://github.com/axios/axios/issues/5931) )">Dmitriy Mozgovoy</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/valentin-panov" title="+4/-4 ([#6028](https://github.com/axios/axios/issues/6028) )">Valentin Panov</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/therealrinku" title="+1/-1 ([#5889](https://github.com/axios/axios/issues/5889) )">Rinku Chaudhari</a></li> </ul> <h2>Release v1.5.1</h2> <h2>Release notes:</h2> <h3>Bug Fixes</h3> <ul> <li><strong>adapters:</strong> improved adapters loading logic to have clear error messages; (<a href="https://redirect.github.com/axios/axios/issues/5919">#5919</a>) (<a href="`e4107797a7`">e410779</a>)</li> <li><strong>formdata:</strong> fixed automatic addition of the <code>Content-Type</code> header for FormData in non-browser environments; (<a href="https://redirect.github.com/axios/axios/issues/5917">#5917</a>) (<a href="`bc9af51b18`">bc9af51</a>)</li> <li><strong>headers:</strong> allow <code>content-encoding</code> header to handle case-insensitive values (<a href="https://redirect.github.com/axios/axios/issues/5890">#5890</a>) (<a href="https://redirect.github.com/axios/axios/issues/5892">#5892</a>) (<a href="`4c89f25196`">4c89f25</a>)</li> <li><strong>types:</strong> removed duplicated code (<a href="`9e6205630e`">9e62056</a>)</li> </ul> <h3>Contributors to this release</h3> <ul> <li><!-- raw HTML omitted --> <a href="https://github.com/DigitalBrainJS" title="+89/-18 ([#5919](https://github.com/axios/axios/issues/5919) [#5917](https://github.com/axios/axios/issues/5917) )">Dmitriy Mozgovoy</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/DavidJDallas" title="+11/-5 ()">David Dallas</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/fb-sean" title="+2/-8 ()">Sean Sattler</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/0o001" title="+4/-4 ()">Mustafa Ateş Uzun</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/sfc-gh-pmotacki" title="+2/-1 ([#5892](https://github.com/axios/axios/issues/5892) )">Przemyslaw Motacki</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/Cadienvan" title="+1/-1 ()">Michael Di Prisco</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/axios/axios/blob/v1.x/CHANGELOG.md">axios's changelog</a>.</em></p> <blockquote> <h2><a href="https://github.com/axios/axios/compare/v1.6.0...v1.6.1">1.6.1</a> (2023-11-08)</h2> <h3>Bug Fixes</h3> <ul> <li><strong>formdata:</strong> fixed content-type header normalization for non-standard browser environments; (<a href="https://redirect.github.com/axios/axios/issues/6056">#6056</a>) (<a href="`dd465ab22b`">dd465ab</a>)</li> <li><strong>platform:</strong> fixed emulated browser detection in node.js environment; (<a href="https://redirect.github.com/axios/axios/issues/6055">#6055</a>) (<a href="`3dc8369e50`">3dc8369</a>)</li> </ul> <h3>Contributors to this release</h3> <ul> <li><!-- raw HTML omitted --> <a href="https://github.com/DigitalBrainJS" title="+432/-65 ([#6059](https://github.com/axios/axios/issues/6059) [#6056](https://github.com/axios/axios/issues/6056) [#6055](https://github.com/axios/axios/issues/6055) )">Dmitriy Mozgovoy</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/meyfa" title="+5/-2 ([#5835](https://github.com/axios/axios/issues/5835) )">Fabian Meyer</a></li> </ul> <h1><a href="https://github.com/axios/axios/compare/v1.5.1...v1.6.0">1.6.0</a> (2023-10-26)</h1> <h3>Bug Fixes</h3> <ul> <li><strong>CSRF:</strong> fixed CSRF vulnerability CVE-2023-45857 (<a href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>) (<a href="`96ee232bd3`">96ee232</a>)</li> <li><strong>dns:</strong> fixed lookup function decorator to work properly in node v20; (<a href="https://redirect.github.com/axios/axios/issues/6011">#6011</a>) (<a href="`5aaff532a6`">5aaff53</a>)</li> <li><strong>types:</strong> fix AxiosHeaders types; (<a href="https://redirect.github.com/axios/axios/issues/5931">#5931</a>) (<a href="`a1c8ad008b`">a1c8ad0</a>)</li> </ul> <h3>PRs</h3> <ul> <li>CVE 2023 45857 ( <a href="https://api.github.com/repos/axios/axios/pulls/6028">#6028</a> )</li> </ul> <pre><code> ⚠️ Critical vulnerability fix. See https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459 </code></pre> <h3>Contributors to this release</h3> <ul> <li><!-- raw HTML omitted --> <a href="https://github.com/DigitalBrainJS" title="+449/-114 ([#6032](https://github.com/axios/axios/issues/6032) [#6021](https://github.com/axios/axios/issues/6021) [#6011](https://github.com/axios/axios/issues/6011) [#5932](https://github.com/axios/axios/issues/5932) [#5931](https://github.com/axios/axios/issues/5931) )">Dmitriy Mozgovoy</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/valentin-panov" title="+4/-4 ([#6028](https://github.com/axios/axios/issues/6028) )">Valentin Panov</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/therealrinku" title="+1/-1 ([#5889](https://github.com/axios/axios/issues/5889) )">Rinku Chaudhari</a></li> </ul> <h2><a href="https://github.com/axios/axios/compare/v1.5.0...v1.5.1">1.5.1</a> (2023-09-26)</h2> <h3>Bug Fixes</h3> <ul> <li><strong>adapters:</strong> improved adapters loading logic to have clear error messages; (<a href="https://redirect.github.com/axios/axios/issues/5919">#5919</a>) (<a href="`e4107797a7`">e410779</a>)</li> <li><strong>formdata:</strong> fixed automatic addition of the <code>Content-Type</code> header for FormData in non-browser environments; (<a href="https://redirect.github.com/axios/axios/issues/5917">#5917</a>) (<a href="`bc9af51b18`">bc9af51</a>)</li> <li><strong>headers:</strong> allow <code>content-encoding</code> header to handle case-insensitive values (<a href="https://redirect.github.com/axios/axios/issues/5890">#5890</a>) (<a href="https://redirect.github.com/axios/axios/issues/5892">#5892</a>) (<a href="`4c89f25196`">4c89f25</a>)</li> <li><strong>types:</strong> removed duplicated code (<a href="`9e6205630e`">9e62056</a>)</li> </ul> <h3>Contributors to this release</h3> <ul> <li><!-- raw HTML omitted --> <a href="https://github.com/DigitalBrainJS" title="+89/-18 ([#5919](https://github.com/axios/axios/issues/5919) [#5917](https://github.com/axios/axios/issues/5917) )">Dmitriy Mozgovoy</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/DavidJDallas" title="+11/-5 ()">David Dallas</a></li> <li><!-- raw HTML omitted --> <a href="https://github.com/fb-sean" title="+2/-8 ()">Sean Sattler</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`f6d2cf9763`"><code>f6d2cf9</code></a> chore(ci): fix publish action content permission; (<a href="https://redirect.github.com/axios/axios/issues/6061">#6061</a>)</li> <li><a href="`a22f4b918a`"><code>a22f4b9</code></a> chore(release): v1.6.1 (<a href="https://redirect.github.com/axios/axios/issues/6060">#6060</a>)</li> <li><a href="`cb8bb2beb2`"><code>cb8bb2b</code></a> chore(ci): Publish to NPM with provenance (<a href="https://redirect.github.com/axios/axios/issues/5835">#5835</a>)</li> <li><a href="`37cbf9214a`"><code>37cbf92</code></a> chore(ci): added labeling and notification for published PRs; (<a href="https://redirect.github.com/axios/axios/issues/6059">#6059</a>)</li> <li><a href="`dd465ab22b`"><code>dd465ab</code></a> fix(formdata): fixed content-type header normalization for non-standard brows...</li> <li><a href="`3dc8369e50`"><code>3dc8369</code></a> fix(platform): fixed emulated browser detection in node.js environment; (<a href="https://redirect.github.com/axios/axios/issues/6055">#6055</a>)</li> <li><a href="`f7adacdbaa`"><code>f7adacd</code></a> chore(release): v1.6.0 (<a href="https://redirect.github.com/axios/axios/issues/6031">#6031</a>)</li> <li><a href="`9917e67cbb`"><code>9917e67</code></a> chore(ci): fix release-it arg; (<a href="https://redirect.github.com/axios/axios/issues/6032">#6032</a>)</li> <li><a href="`96ee232bd3`"><code>96ee232</code></a> fix(CSRF): fixed CSRF vulnerability CVE-2023-45857 (<a href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>)</li> <li><a href="`7d45ab2e2a`"><code>7d45ab2</code></a> chore(tests): fixed tests to pass in node v19 and v20 with <code>keep-alive</code> enabl...</li> <li>Additional commits viewable in <a href="https://github.com/axios/axios/compare/v1.3.4...v1.6.1">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=axios&package-manager=npm_and_yarn&previous-version=1.3.4&new-version=1.6.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-14 00:38:00 -08:00
Xu Xing	949ac4b7ce	[js/webgpu] Support uniforms for gather (#18312 )	2023-11-13 11:24:34 -08:00
Wanming Lin	73ed34ac4b	[WebNN EP] Support numThreads option for WebNN CPU device (#18054 )	2023-11-12 16:45:10 -08:00
Xu Xing	0c8c0014f6	[js/webgpu] Use builtin num_workgroups to fix shader key conflict (#18387 ) This fixes conformance failure of tinyyolov2-8 and potential shader key conflict issues.	2023-11-10 17:37:45 -08:00
Yulong Wang	6b0c97b43f	[js/web] fix typescript type check (#18343 ) ### Description This PR fixes the TypeScript type check. Previously, when I use esbuild to replace webpack (#17745), typescript typecheck was disabled. This causes a few TypeScript type error checked in into the code base. This PR fixes the followings: - Use "Node16" as default "module" value in tsconfig.json, because in TypeScript v5, `(module == "ES2015" && moduleResolution == "Node16")` is an invalid combination. - Set `noUnusedParameters` to true as default. in web override it to false because multiple code need to be updated ( a following-up PR will do this ) - set correct project file for 'web/lib/*/.ts' for ESLint (otherwise WebGPU types are not populated correctly) - fix type error in file js/web/lib/wasm/jsep/webgpu/program-manager.ts - upgrade "@webgpu/types" to latest to fix type error in file js/web/lib/wasm/jsep/backend-webgpu.ts - add package script "prebuild" for web to run tsc type check - add type check in CI yml file	2023-11-10 16:03:38 -08:00
Xu Xing	8dba6efd61	[js/webgpu] Add uniforms support to concat op (#18238 )	2023-11-10 13:46:03 -08:00
Jiajia Qin	28c23aed04	[js/webgpu] Fix conv2d with activation (#18388 ) ### Description Fix #18297 With PR #17766, conv2d activation in mobilenetv2-12 will not be empty. However, activation is not supported yet in [biasActivationSnippet](https://github.com/microsoft/onnxruntime/blob/main/js/web/lib/wasm/jsep/webgpu/ops/3rd-party/activation_util.ts#L48C14-L48C36). This PR makes all places unify to use [getActivationSnippet](https://github.com/microsoft/onnxruntime/blob/main/js/web/lib/wasm/jsep/webgpu/ops/fuse-utils.ts#L13) to fix this issue.	2023-11-10 12:54:35 -08:00
Xu Xing	dd1bb760eb	[js/webgpu] Fix scalar uniform (#18318 )	2023-11-10 10:12:22 -08:00
Xu Xing	829d802337	[js/webgpu] Support uniform for softmax (#18345 )	2023-11-09 11:19:23 -08:00
Guenther Schmuelling	25fbc2b0ab	fix fused relu activation (#18303 )	2023-11-09 08:18:21 -08:00
Yulong Wang	10df847baf	[js] fix linter out-of-memory issue (#18307 ) ### Description fix linter out-of-memory issue by ignoring file pattern 'test/data/'.	2023-11-07 17:12:22 -08:00
Jiajia Qin	606356d0b1	[js/webgpu] Simplify the Resize shader when noScale is true (#18321 ) ### Description For Resize, when `noScale` is true, the shader can become very simple, which is not related with `attributes.mode` anymore. So we should remove those parts of shader code for simplification. This PR can also fix #18311 since the `noScale` are all true in that model. However, #18311 also exposes that the Resize implementation for `linear` mode has bug. It seems that the currently implementation always treat the input as either 2d or 4d tensor, however, the actual input is 3d tensor, that's why the shader compilation is failed. We may need to fix it in a separate PR.	2023-11-07 12:54:20 -08:00
satyajandhyala	a16d528399	[JS/Web] Added Uniforms support to binary ops. (#18260 ) ### Description Added Uniform support to binary ops ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> To improve performance	2023-11-07 08:41:52 -08:00
satyajandhyala	e207060ac9	[JS/Web] Added Unifroms support to unary ops. (#18223 ) ### Description Added uniforms support to unary ops. ### Motivation and Context Improve performance	2023-11-03 09:30:54 -07:00
Scott McKay	4f2096be38	Update XNNPACK to latest version (#18038 ) ### Description <!-- Describe your changes. --> Update XNNPACK to latest version - adds fp16 kernels and various other improvements - requires pthreadpool update as well Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API - 'setup' is split into 'reshape' and 'setup' - some ops use a workspace buffer - copied workspace allocation from XNNPACK unit test code - some suffixes changed Added wrapper for XNNPACK caches to base XNNPACK EP kernel - simplifies usage - XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API - we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build. - using XNNPACK internals would also mean we would not be able to support using a pre-build XNNPACK package - not an issue currently Fixed opset registration for internal NHWC domain - was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset - a number of other places needed updating once this issue was fixed Remove support for NCHW Resize from XNNPACK EP so it's NHWC only - we only supported NCHW for fp32, - doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization) - unclear if that complexity provides any benefit. can add back if required by production scenario ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> We're looking at enabling fp16 support for CoreML and NNAPI. If we do that we need a good fallback story if the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that. NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That can be done as required in separate EPs and should be relatively simple to do.	2023-11-03 09:04:28 -07:00
xhcao	8d48d3e9cc	[js/web] optimize reduce related operators (#17957 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-11-02 12:51:48 -07:00
Caroline Zhu	e3b043ba17	[js/web/training] runTrainStep implementation (#18006 ) ### Description * based on design document & following InferenceSession's run implementation, implemented TrainingSession.runTrainStep ### Motivation and Context * Adding web bindings for training #### Related work * #16521 allowed for training artifacts to be built * #17333 added interfaces for training * #17474 allowed for training package to be built + added training backend to web package * #17891 implementation for createTrainingSession on the TypeScript side [SHOULD BE MERGED IN BEFORE THIS PR] --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-11-02 08:32:50 -07:00
satyajandhyala	a2e9ba72d5	[JS/Web]Added FusedConv. (#17766 ) ### Description Added FusedConv and FusedConvTranspose ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance	2023-11-01 15:34:51 -07:00
Jiajia Qin	785e2b1eae	[js/webgpu] Optimize softmax by vector (#18153 ) ### Description This PR enables `softmax` outputs max supported components instead of scalar for each thread. Softmax with input[0]: [12,4096,4096] becomes 47.86 ms from 55.11 ms	2023-10-30 16:05:35 -07:00
Yulong Wang	9bba990871	[js/web] fix a few package consuming problems (#18109 ) ### Description This PR tries to fix a part of the NPM package consuming problems for onnxruntime-web (ES module) as described in #10913: - reduce the package size to fit the 150MB restriction in jsdelivr, by removing dev build targets for uncommon exports - add default export to support `import ort from 'onnxruntime-web';` (currently only support `import * as ort from 'onnxruntime-web';`	2023-10-30 08:11:43 -07:00
Yang Gu	52f4968359	[js/webgpu] Change timestamp-query-in-passes to timestamp-query (#18108 ) Timestamp-query has a broader support than timestamp-query-in-passes on all the platforms, including macOS. Note that to enable timestamp-query, you still need to add switch "--enable-dawn-features=allow_unsafe_apis" to Chrome. By default, the lowest 16 bits are masked with 0 (at a granularity about 0.1ms) for privacy. To get the highest precision, you need to add another switch "--enable-webgpu-developer-features".	2023-10-26 16:33:03 -07:00
Caroline Zhu	64de71c5e2	[js/web/training] Add CreateTrainingSession (#17891 ) ### Description * Adds TrainingSession.create() functionality following the web bindings for training design doc * Added 2 new training APIs to wasm/api.h: * OrtTrainingGetInputOutputName * OrtTrainingGetInputOutputCount * Moved isOrtEnvInitialized boolean to the wasm-core-impl and added a method that references it ### Motivation and Context * Adding web bindings for training #### Related work * #16521 allowed for training artifacts to be built * #17333 added interfaces for training * #17474 allows for training package to be built + adds training backend to web package [MUST BE MERGED IN BEFORE THIS ONE] --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-10-26 09:22:10 -07:00
satyajandhyala	f3cfe08c42	[JS/Web] Enabled 1d spacial input to GlobalAveragePool (#17973 ) ### Description Enable one-dim special input to GlobalAveragePoll input ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Currently only 2D input is supported.	2023-10-23 16:02:50 -07:00
Jiajia Qin	8a12b2cea6	[js/webgpu] Fix the transpose error when dims > 4D (#18027 ) ### Description <!-- Describe your changes. --> Currently, the uniform support has bugs when dims rank is larger than 4. See https://github.com/microsoft/onnxruntime/issues/17860 item 1. So this PR only enables shapes uniforms when shape rank is <= 4 for transpose. Otherwise, below compilation errors are thrown: ``` 1 error(s) generated while compiling the shader: :3:50 error: uniform storage requires that array elements are aligned to 16 bytes, but array element of type 'u32' has a stride of 4 bytes. Consider using a vector or struct as the element type instead. struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> }; ^^^^^^^^^^^^^ :3:7 note: see layout of struct: /* align(4) size(84) / struct Uniforms { / offset( 0) align(4) size( 4) / output_size : u32; / offset( 4) align(4) size(20) / a_shape : array<u32, 5>; / offset(24) align(4) size(20) / a_strides : array<u32, 5>; / offset(44) align(4) size(20) / output_shape : array<u32, 5>; / offset(64) align(4) size(20) / output_strides : array<u32, 5>; / */ }; struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> }; ^^^^^^ :4:42 note: 'Uniforms' used in address space 'uniform' here @group(0) @binding(2) var<uniform> uniforms: Uniforms; ^^^^^^^^ ```	2023-10-23 11:02:19 -07:00
liqun Fu	020824ed50	Update ONNX to 1.15.0rc1 (#17914 )	2023-10-20 15:08:25 -07:00
Arthur Islamov	22947109f2	[js/web] FP16 LayerNorm, InstanceNorm, SkipLayerNorm (#17630 ) ### Description This PR includes fixes for Norm operations to support FP16 and also some optimizations to use vec2/vec4 if possible	2023-10-18 10:47:41 -07:00
dependabot[bot]	f9694c5b97	Bump @babel/traverse from 7.18.5 to 7.23.2 in /js/react_native/e2e (#17963 )	2023-10-18 05:25:27 +00:00
dependabot[bot]	cf974f0905	Bump @babel/traverse from 7.18.2 to 7.23.2 in /js/react_native (#17962 )	2023-10-16 18:24:01 +00:00
Yulong Wang	ad817d0efa	[js/web] optimize tsc for web: split out "npm prepare" (#17955 ) ### Description optimize tsc for web: split out "npm prepare"	2023-10-16 09:04:54 -07:00
Caroline Zhu	c373a808a2	Add "glue" between training WASM artifacts and training web (#17474 ) ### Description * follows the packaging approach according to the design document * adds `ENABLE_TRAINING` boolean flag to `BUILD_DEFS` * modifies `package.json` to include training submodule * modifies build script to handle, validate, and minimize training WASM artifacts * adds the binding for the new backend with training enabled & the new training artifacts * adds training backend * edits `index.ts` to use training backend depending on `BUILD_DEFS` * edits `wasm-factory.ts` to use the training artifacts if necessary ### Motivation and Context * we are in the process of adding web bindings to enable training. * Adding the "glue" to allow onnxruntime-web to use the training WASM artifacts is required for this work. * Since BUILD_DEFS is defined and used at build time, I thought that it made sense to bundle the changes to building in the same PR. #### Related work * #16521 allowed for training artifacts to be built * #17333 must be merged in before this one --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-12 11:16:56 -07:00
Yulong Wang	d532645bed	[js/webgpu] revise uniform support (#17871 ) ### Description <!-- Describe your changes. --> work for items (2) and (3) in #17860	2023-10-11 16:41:46 -07:00
Yulong Wang	a441a71e8e	[js/web] support different export format for ort-web (#17878 ) ### Description support different export format for ort-web.	2023-10-11 09:38:51 -07:00
Yulong Wang	5228332c9f	[js] upgrade JS shared dev dependencies (#17831 ) ### Description upgrade JS shared dev dependencies. - webpack: removed - eslint: upgrade to latest. - eslint config upgraded to compatible with latest version - typescript upgrade to v5 - update module "CommonJS" to "Node16" in tsconfig - update deprecated config "importsNotUsedAsValues" to "verbatimModuleSyntax" - remove webpack bundles in onnxruntime-common	2023-10-10 17:44:39 -07:00

... 2 3 4 5 6 ...

758 commits