onnxruntime/js/web/docs/webgpu-operators.md
Update XNNPACK to latest version (#18038)
### Description
Update XNNPACK to the latest version.
- adds fp16 kernels and various other improvements
- requires a pthreadpool update as well

Most code updates in the XNNPACK EP adjust to the new XNNPACK API (see the sketch after this list):
- 'setup' is split into 'reshape' and 'setup'
- some ops use a workspace buffer
  - the workspace allocation was copied from XNNPACK unit test code
- some function suffixes changed
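
For context, a minimal sketch of the new call sequence, using XNNPACK's fp32 NHWC convolution as the example. Exact argument lists differ per operator and XNNPACK version, and the `RunConv` helper here is hypothetical, so treat this as illustrative rather than the EP's actual code:

```cpp
#include <vector>

#include "xnnpack.h"
#include "pthreadpool.h"

// Hypothetical helper showing the reshape/setup split. 'conv_op' is assumed
// to have been created earlier with xnn_create_convolution2d_nhwc_f32.
xnn_status RunConv(xnn_operator_t conv_op,
                   size_t batch, size_t in_h, size_t in_w,
                   const float* input, float* output,
                   pthreadpool_t threadpool) {
  size_t workspace_size = 0;
  size_t workspace_alignment = 0;
  size_t out_h = 0;
  size_t out_w = 0;

  // 'reshape' binds the input shape and reports the scratch space the op needs.
  xnn_status status = xnn_reshape_convolution2d_nhwc_f32(
      conv_op, batch, in_h, in_w,
      &workspace_size, &workspace_alignment,
      &out_h, &out_w, threadpool);
  if (status != xnn_status_success) return status;

  // The caller owns the workspace. The XNNPACK unit tests allocate it with an
  // aligned allocator; a plain vector is shown here for brevity.
  std::vector<char> workspace(workspace_size);

  // 'setup' now only binds the data pointers.
  status = xnn_setup_convolution2d_nhwc_f32(conv_op, workspace.data(),
                                            input, output);
  if (status != xnn_status_success) return status;

  return xnn_run_operator(conv_op, threadpool);
}
```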

Added a wrapper for the XNNPACK caches to the base XNNPACK EP kernel (a rough sketch follows this list):
- simplifies usage
- XNNPACK split the code and weights caches apart, but the code cache isn't currently usable via the public API
- we could use the internal types if we think it's required for performance reasons; that's non-trivial though, as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build
- using XNNPACK internals would also mean we could not support using a pre-built XNNPACK package
  - not an issue currently
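
As a rough illustration of the wrapper idea, here is a minimal RAII sketch around XNNPACK's public weights-cache API. The class name is hypothetical and error handling is elided; the real wrapper lives in the base XNNPACK EP kernel:

```cpp
#include "xnnpack.h"

// Hypothetical sketch of a caches wrapper. Only the weights cache is held,
// since the code cache isn't currently usable via the public XNNPACK API.
class XnnpackCaches {
 public:
  XnnpackCaches() {
    // Error handling elided for brevity.
    xnn_create_weights_cache(&weights_cache_);
  }

  ~XnnpackCaches() {
    if (weights_cache_ != nullptr) {
      xnn_delete_weights_cache(weights_cache_);
    }
  }

  // Passed to each kernel's xnn_create_* call so packed weights can be
  // shared between operators created from the same weights.
  xnn_weights_cache_t weights_cache() const { return weights_cache_; }

  XnnpackCaches(const XnnpackCaches&) = delete;
  XnnpackCaches& operator=(const XnnpackCaches&) = delete;

 private:
  xnn_weights_cache_t weights_cache_ = nullptr;
};
```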
  
Fixed opset registration for the internal NHWC domain
- it was not tied to the ONNX version, so nodes inserted by layout transformation had an incorrect opset
- a number of other places needed updating once this issue was fixed

Removed support for NCHW Resize from the XNNPACK EP so it's NHWC only.
- we only supported NCHW for fp32
- supporting NCHW adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation, and transpose optimization)
- it's unclear whether that complexity provides any benefit; it can be added back if required by a production scenario

### Motivation and Context
We're looking at enabling fp16 support for CoreML and NNAPI. If we do that, we need a good fallback story when the CPU EP is used. The XNNPACK fp16 kernels will hopefully provide that.

NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That can be done as required in separate PRs and should be relatively simple to do.


## Operators Support Table

The following table shows the ONNX operators and the supported opset domains/versions in the WebGPU EP of ONNX Runtime Web. For example, `4-6, 8+` means ONNX Runtime Web currently supports opset versions 4 to 6, and 8 and above.

This file is automatically generated from the def files via this script. Do not modify directly.

| Operator | Opset | Comments |
|:--------:|:-----:|----------|
| Abs | ai.onnx(6-12,13+) | |
| Acos | ai.onnx(7+) | |
| Acosh | ai.onnx(9+) | |
| Add | ai.onnx(7-12,13,14+) | |
| ArgMax | ai.onnx(1-10,11-12,13+) | |
| ArgMin | ai.onnx(1-10,11-12,13+) | |
| Asin | ai.onnx(7+) | |
| Asinh | ai.onnx(9+) | |
| Atan | ai.onnx(7+) | |
| Atanh | ai.onnx(9+) | |
| AveragePool | ai.onnx(7-9,10,11+); com.ms.internal.nhwc(7-9,10,11+) | need perf optimization; need implementing activation |
| BiasAdd | com.microsoft(1+) | |
| BiasSplitGelu | com.microsoft(1+) | |
| Cast | ai.onnx(6-8,9-12,13-18,19+) | |
| Ceil | ai.onnx(6-12,13+) | |
| Clip | ai.onnx(6-10,11,12,13+) | |
| Concat | ai.onnx(1-3,4-10,11-12,13+) | |
| Conv | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; conv3d is not supported; need implementing activation |
| ConvTranspose | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; ConvTranspose3d is not supported; need implementing activation |
| Cos | ai.onnx(7+) | |
| Cosh | ai.onnx(9+) | |
| Div | ai.onnx(7-12,13,14+) | |
| Einsum | ai.onnx(12+) | |
| Elu | ai.onnx(6+) | |
| Equal | ai.onnx(7-10,11-12,13-18,19+) | |
| Erf | ai.onnx(9-12,13+) | |
| Exp | ai.onnx(6-12,13+) | |
| Expand | ai.onnx(8-12,13+) | |
| Flatten | ai.onnx(1-8,9-10,11-12,13+) | |
| Floor | ai.onnx(6-12,13+) | |
| FusedConv | com.microsoft(1+) | |
| Gather | ai.onnx(1-10,11-12,13+) | |
| GatherElements | ai.onnx(11-12,13+) | |
| Gelu | com.microsoft(1+) | |
| Gemm | ai.onnx(7-8,9-10,11-12,13+) | |
| GlobalAveragePool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| GlobalMaxPool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| Greater | ai.onnx(7-8,9-12,13+) | |
| GreaterOrEqual | ai.onnx(12-15,16+) | |
| If | ai.onnx(1-10,11-12,13-18,19+) | |
| InstanceNormalization | ai.onnx(6+); com.ms.internal.nhwc(6+) | |
| LayerNormalization | ai.onnx(17+) | |
| LeakyRelu | ai.onnx(6-15,16+) | |
| Less | ai.onnx(7-8,9-12,13+) | |
| LessOrEqual | ai.onnx(12-15,16+) | |
| Log | ai.onnx(6-12,13+) | |
| MatMul | ai.onnx(1-12,13+) | |
| MaxPool | ai.onnx(1-7,8-9,10,11,12+); com.ms.internal.nhwc(1-7,8-9,10,11,12+) | need perf optimization; need implementing activation |
| MemcpyFromHost | ai.onnx(1+) | |
| MemcpyToHost | ai.onnx(1+) | |
| Mul | ai.onnx(7-12,13,14+) | |
| Neg | ai.onnx(6-12,13+) | |
| Not | ai.onnx(1+) | |
| Pad | ai.onnx(2-10,11-12,13-17,18,19+) | |
| Pow | ai.onnx(7-11,12,13-14,15+) | |
| Range | ai.onnx(11+) | |
| Reciprocal | ai.onnx(6-12,13+) | |
| ReduceL1 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceL2 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSum | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSumExp | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMax | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceMean | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMin | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceProd | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceSum | ai.onnx(1-10,11-12,13+) | |
| ReduceSumSquare | ai.onnx(1-10,11-12,13-17,18+) | |
| Relu | ai.onnx(6-12,13,14+) | |
| Reshape | ai.onnx(5-12,13,14+) | no GPU kernel |
| Resize | ai.onnx(10,11-12,13-17,18,19+); com.ms.internal.nhwc(10,11-12,13-17,18,19+) | CoordinateTransformMode align_corners is not supported with downsampling |
| Shape | ai.onnx(1-12,13-14,15+) | no GPU kernel; an ORT warning is generated - need to fix |
| Sigmoid | ai.onnx(6-12,13+) | |
| Sin | ai.onnx(7+) | |
| Sinh | ai.onnx(9+) | |
| SkipLayerNormalization | com.microsoft(1+) | |
| Slice | ai.onnx(1-9,10,11-12,13+) | |
| Softmax | ai.onnx(1-10,11-12,13+) | |
| Split | ai.onnx(1,2-10,11-12,13-17,18+) | |
| Sqrt | ai.onnx(6-12,13+) | |
| Squeeze | ai.onnx(1-10,11-12,13+) | |
| Sub | ai.onnx(7-12,13,14+) | |
| Tan | ai.onnx(7+) | |
| Tanh | ai.onnx(6-12,13+) | |
| ThresholdedRelu | ai.onnx(10+) | |
| Tile | ai.onnx(6-12,13+) | |
| Transpose | ai.onnx(1-12,13+) | need perf optimization |
| Unsqueeze | ai.onnx(1-10,11-12,13+) | |
| Where | ai.onnx(9-15,16+) | |