
## Operators Support Table

The following table shows the ONNX operators and the opset domains and versions
that the WebGPU execution provider (EP) in ONNX Runtime Web supports. For example,
`4-6, 8+` means ONNX Runtime Web currently supports opset versions 4 to 6, and 8 and above.
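
As a rough illustration of how this range notation reads, the sketch below checks whether a given opset version falls inside a range list such as `4-6, 8+`. The function `isOpsetSupported` is a hypothetical helper written for this example. It is not part of the ONNX Runtime Web API, and it assumes the range list is well formed.

```typescript
// Hypothetical helper for illustration only; not an ONNX Runtime Web API.
// Returns true if `version` is covered by a range list such as "4-6, 8+".
function isOpsetSupported(ranges: string, version: number): boolean {
  return ranges.split(",").some((part) => {
    const range = part.trim();
    if (range.endsWith("+")) {
      // "8+" means version 8 and above.
      return version >= Number(range.slice(0, -1));
    }
    if (range.includes("-")) {
      // "4-6" means versions 4 through 6, inclusive.
      const [low, high] = range.split("-").map(Number);
      return version >= low && version <= high;
    }
    // A bare number such as "13" matches exactly that version.
    return version === Number(range);
  });
}

console.log(isOpsetSupported("4-6, 8+", 5));  // true
console.log(isOpsetSupported("4-6, 8+", 7));  // false
console.log(isOpsetSupported("4-6, 8+", 12)); // true
```

Under this reading, version 7 is the only gap in `4-6, 8+`.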

*This file is automatically generated from the
def files via [this script](../script/generate-webgpu-operator-md.ts).
Do not modify directly.*

| Operator | Opset | Comments |
|:--------:|:-------------:|-----|
| Abs | ai.onnx(6-12,13+) |  |
| Acos | ai.onnx(7+) |  |
| Acosh | ai.onnx(9+) |  |
| Add | ai.onnx(7-12,13,14+) |  |
| ArgMax | ai.onnx(1-10,11-12,13+) |  |
| ArgMin | ai.onnx(1-10,11-12,13+) |  |
| Asin | ai.onnx(7+) |  |
| Asinh | ai.onnx(9+) |  |
| Atan | ai.onnx(7+) |  |
| Atanh | ai.onnx(9+) |  |
| AveragePool | ai.onnx(7-9,10,11+); com.ms.internal.nhwc(11+) | need perf optimization; need implementing activation |
| Ceil | ai.onnx(6-12,13+) |  |
| Clip | ai.onnx(6-10,11,12,13+) |  |
| Concat | ai.onnx(1-3,4-10,11-12,13+) |  |
| Conv | ai.onnx(1-10,11+); com.ms.internal.nhwc(11+) | need perf optimization; conv3d is not supported; need implementing activation |
| ConvTranspose | ai.onnx(1-10,11+); com.ms.internal.nhwc(11+) | need perf optimization; ConvTranspose3d is not supported; need implementing activation |
| Cos | ai.onnx(7+) |  |
| Cosh | ai.onnx(9+) |  |
| Div | ai.onnx(7-12,13,14+) |  |
| Elu | ai.onnx(6+) |  |
| Erf | ai.onnx(9-12,13+) |  |
| Exp | ai.onnx(6-12,13+) |  |
| Expand | ai.onnx(8-12,13+) |  |
| Flatten | ai.onnx(1-8,9-10,11-12,13+) |  |
| Floor | ai.onnx(6-12,13+) |  |
| Gather | ai.onnx(1-10,11-12,13+) |  |
| Gelu | com.microsoft(1+) |  |
| Gemm | ai.onnx(7-8,9-10,11+) |  |
| GlobalAveragePool | ai.onnx(1+); com.ms.internal.nhwc(1+) |  |
| GlobalMaxPool | ai.onnx(1+); com.ms.internal.nhwc(1+) |  |
| InstanceNormalization | ai.onnx(6+); com.ms.internal.nhwc(6+) |  |
| LayerNormalization | ai.onnx(17+) |  |
| LeakyRelu | ai.onnx(6-15,16+) |  |
| Log | ai.onnx(6-12,13+) |  |
| MatMul | ai.onnx(1-12,13+) |  |
| MaxPool | ai.onnx(1-7,8-9,10,11,12+); com.ms.internal.nhwc(11,12+) | need perf optimization; need implementing activation |
| MemcpyFromHost | ai.onnx(1+) |  |
| MemcpyToHost | ai.onnx(1+) |  |
| Mul | ai.onnx(7-12,13,14+) |  |
| Neg | ai.onnx(6-12,13+) |  |
| Pow | ai.onnx(7-11,12,13-14,15+) |  |
| Reciprocal | ai.onnx(6-12,13+) |  |
| ReduceL1 | ai.onnx(1-10,11-12,13-17,18+) |  |
| ReduceL2 | ai.onnx(1-10,11-12,13-17,18+) |  |
| ReduceLogSum | ai.onnx(1-10,11-12,13-17,18+) |  |
| ReduceLogSumExp | ai.onnx(1-10,11-12,13-17,18+) |  |
| ReduceMax | ai.onnx(1-10,11,12,13-17,18+) |  |
| ReduceMean | ai.onnx(1-10,11-12,13-17,18+) |  |
| ReduceMin | ai.onnx(1-10,11,12,13-17,18+) |  |
| ReduceProd | ai.onnx(1-10,11-12,13-17,18+) |  |
| ReduceSum | ai.onnx(1-10,11-12,13+) |  |
| ReduceSumSquare | ai.onnx(1-10,11-12,13-17,18+) |  |
| Relu | ai.onnx(6-12,13,14+) |  |
| Reshape | ai.onnx(5-12,13,14+) | no GPU kernel |
| Resize | ai.onnx(10,11-12,13-17,18,19+); com.ms.internal.nhwc(11-12,13-17,18,19+) | CoordinateTransformMode align_corners is not supported with downsampling |
| Shape | ai.onnx(1-12,13-14,15+) | no GPU kernel; an ORT warning is generated - need to fix |
| Sigmoid | ai.onnx(6-12,13+) |  |
| Sin | ai.onnx(7+) |  |
| Sinh | ai.onnx(9+) |  |
| Slice | ai.onnx(1-9,10,11-12,13+) |  |
| Softmax | ai.onnx(1-10,11-12,13+) |  |
| Split | ai.onnx(1,2-10,11-12,13-17,18+) |  |
| Sqrt | ai.onnx(6-12,13+) |  |
| Squeeze | ai.onnx(1-10,11-12,13+) |  |
| Sub | ai.onnx(7-12,13,14+) |  |
| Tan | ai.onnx(7+) |  |
| Tanh | ai.onnx(6-12,13+) |  |
| ThresholdedRelu | ai.onnx(10+) |  |
| Transpose | ai.onnx(1-12,13+) | need perf optimization |
| Unsqueeze | ai.onnx(1-10,11-12,13+) |  |
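
Judging from the rows above, each Opset cell lists one or more `domain(ranges)` entries separated by semicolons, for example `ai.onnx(7-9,10,11+); com.ms.internal.nhwc(11+)`. The following sketch shows one way such a cell could be split into domains and their version ranges. `parseOpsetCell` is a hypothetical helper for illustration, not something provided by ONNX Runtime Web or by the generator script, and the cell format it assumes is inferred only from this table.

```typescript
// Hypothetical helper for illustration only; not an ONNX Runtime Web API.
// Splits an Opset cell such as "ai.onnx(7-9,10,11+); com.ms.internal.nhwc(11+)"
// into a map from domain name to its list of version ranges.
function parseOpsetCell(cell: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const entry of cell.split(";")) {
    // Each entry is assumed to look like "domain(range,range,...)".
    const match = /^\s*([^()\s]+)\(([^()]*)\)\s*$/.exec(entry);
    if (match === null) {
      throw new Error(`Unrecognized opset entry: "${entry}"`);
    }
    result.set(match[1], match[2].split(",").map((r) => r.trim()));
  }
  return result;
}

const averagePool = parseOpsetCell("ai.onnx(7-9,10,11+); com.ms.internal.nhwc(11+)");
console.log(averagePool.get("ai.onnx"));              // [ '7-9', '10', '11+' ]
console.log(averagePool.get("com.ms.internal.nhwc")); // [ '11+' ]
```

Keeping the ranges as strings leaves the choice of how to interpret them (for example, as inclusive intervals) to the caller.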