onnxruntime/js/web/docs/webgpu-operators.md at 4f2096be38bb04b103c2577d5f132c92419b26ad

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

Scott McKay 4f2096be38

Update XNNPACK to latest version (#18038)

### Description
<!-- Describe your changes. -->
Update XNNPACK to latest version
- adds fp16 kernels and various other improvements
- requires pthreadpool update as well

Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API
- 'setup' is split into 'reshape' and 'setup'
- some ops use a workspace buffer
  - copied workspace allocation from XNNPACK unit test code
- some suffixes changed

Added wrapper for XNNPACK caches to base XNNPACK EP kernel
- simplifies usage
- XNNPACK split out the code and weights caches, but the code cache
isn't currently usable via the public API
- we could use the internal types if we think it's required for
performance reasons. non-trivial though as we'd need to propagate ifdef
values from the XNNPACK build up to the ORT build.
- using XNNPACK internals would also mean we would not be able to
support using a pre-built XNNPACK package
    - not an issue currently
  
Fixed opset registration for internal NHWC domain
- was not being tied to the ONNX version, so nodes inserted by layout
transformation had the incorrect opset
- a number of other places needed updating once this issue was fixed

Remove support for NCHW Resize from XNNPACK EP so it's NHWC only
- we only supported NCHW for fp32
- doing so adds complexity in multiple places (XNNPACK EP kernel
implementation, layout transformation and transpose optimization)
- unclear if that complexity provides any benefit. can add back if
required by production scenario

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We're looking at enabling fp16 support for CoreML and NNAPI. If we do
that we need a good fallback story if the CPU EP will be used. The
XNNPACK fp16 kernels will hopefully provide that.

NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That
can be done as required in separate PRs and should be relatively simple
to do.

2023-11-03 09:04:28 -07:00


Operators Support Table

The following table lists the ONNX operators supported by the WebGPU EP in ONNX Runtime Web, together with the supported opset domains and versions. For example, `4-6, 8+` means ONNX Runtime Web currently supports opset versions 4 to 6 and 8 and above.

This file is automatically generated from the def files via this script. Do not modify directly.

| Operator | Opset | Comments |
| --- | --- | --- |
| Abs | ai.onnx(6-12,13+) | |
| Acos | ai.onnx(7+) | |
| Acosh | ai.onnx(9+) | |
| Add | ai.onnx(7-12,13,14+) | |
| ArgMax | ai.onnx(1-10,11-12,13+) | |
| ArgMin | ai.onnx(1-10,11-12,13+) | |
| Asin | ai.onnx(7+) | |
| Asinh | ai.onnx(9+) | |
| Atan | ai.onnx(7+) | |
| Atanh | ai.onnx(9+) | |
| AveragePool | ai.onnx(7-9,10,11+); com.ms.internal.nhwc(7-9,10,11+) | need perf optimization; need implementing activation |
| BiasAdd | com.microsoft(1+) | |
| BiasSplitGelu | com.microsoft(1+) | |
| Cast | ai.onnx(6-8,9-12,13-18,19+) | |
| Ceil | ai.onnx(6-12,13+) | |
| Clip | ai.onnx(6-10,11,12,13+) | |
| Concat | ai.onnx(1-3,4-10,11-12,13+) | |
| Conv | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; conv3d is not supported; need implementing activation |
| ConvTranspose | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; ConvTranspose3d is not supported; need implementing activation |
| Cos | ai.onnx(7+) | |
| Cosh | ai.onnx(9+) | |
| Div | ai.onnx(7-12,13,14+) | |
| Einsum | ai.onnx(12+) | |
| Elu | ai.onnx(6+) | |
| Equal | ai.onnx(7-10,11-12,13-18,19+) | |
| Erf | ai.onnx(9-12,13+) | |
| Exp | ai.onnx(6-12,13+) | |
| Expand | ai.onnx(8-12,13+) | |
| Flatten | ai.onnx(1-8,9-10,11-12,13+) | |
| Floor | ai.onnx(6-12,13+) | |
| FusedConv | com.microsoft(1+) | |
| Gather | ai.onnx(1-10,11-12,13+) | |
| GatherElements | ai.onnx(11-12,13+) | |
| Gelu | com.microsoft(1+) | |
| Gemm | ai.onnx(7-8,9-10,11-12,13+) | |
| GlobalAveragePool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| GlobalMaxPool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| Greater | ai.onnx(7-8,9-12,13+) | |
| GreaterOrEqual | ai.onnx(12-15,16+) | |
| If | ai.onnx(1-10,11-12,13-18,19+) | |
| InstanceNormalization | ai.onnx(6+); com.ms.internal.nhwc(6+) | |
| LayerNormalization | ai.onnx(17+) | |
| LeakyRelu | ai.onnx(6-15,16+) | |
| Less | ai.onnx(7-8,9-12,13+) | |
| LessOrEqual | ai.onnx(12-15,16+) | |
| Log | ai.onnx(6-12,13+) | |
| MatMul | ai.onnx(1-12,13+) | |
| MaxPool | ai.onnx(1-7,8-9,10,11,12+); com.ms.internal.nhwc(1-7,8-9,10,11,12+) | need perf optimization; need implementing activation |
| MemcpyFromHost | ai.onnx(1+) | |
| MemcpyToHost | ai.onnx(1+) | |
| Mul | ai.onnx(7-12,13,14+) | |
| Neg | ai.onnx(6-12,13+) | |
| Not | ai.onnx(1+) | |
| Pad | ai.onnx(2-10,11-12,13-17,18,19+) | |
| Pow | ai.onnx(7-11,12,13-14,15+) | |
| Range | ai.onnx(11+) | |
| Reciprocal | ai.onnx(6-12,13+) | |
| ReduceL1 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceL2 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSum | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSumExp | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMax | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceMean | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMin | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceProd | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceSum | ai.onnx(1-10,11-12,13+) | |
| ReduceSumSquare | ai.onnx(1-10,11-12,13-17,18+) | |
| Relu | ai.onnx(6-12,13,14+) | |
| Reshape | ai.onnx(5-12,13,14+) | no GPU kernel |
| Resize | ai.onnx(10,11-12,13-17,18,19+); com.ms.internal.nhwc(10,11-12,13-17,18,19+) | CoordinateTransformMode align_corners is not supported with downsampling |
| Shape | ai.onnx(1-12,13-14,15+) | no GPU kernel; an ORT warning is generated - need to fix |
| Sigmoid | ai.onnx(6-12,13+) | |
| Sin | ai.onnx(7+) | |
| Sinh | ai.onnx(9+) | |
| SkipLayerNormalization | com.microsoft(1+) | |
| Slice | ai.onnx(1-9,10,11-12,13+) | |
| Softmax | ai.onnx(1-10,11-12,13+) | |
| Split | ai.onnx(1,2-10,11-12,13-17,18+) | |
| Sqrt | ai.onnx(6-12,13+) | |
| Squeeze | ai.onnx(1-10,11-12,13+) | |
| Sub | ai.onnx(7-12,13,14+) | |
| Tan | ai.onnx(7+) | |
| Tanh | ai.onnx(6-12,13+) | |
| ThresholdedRelu | ai.onnx(10+) | |
| Tile | ai.onnx(6-12,13+) | |
| Transpose | ai.onnx(1-12,13+) | need perf optimization |
| Unsqueeze | ai.onnx(1-10,11-12,13+) | |
| Where | ai.onnx(9-15,16+) | |
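
The opset notation used in the table can be checked mechanically. The following is a minimal sketch, not part of ONNX Runtime Web: the names `OpsetRange`, `parseOpsetEntry`, and `isOpsetSupported` are hypothetical, and the parser assumes every cell has the form `domain(range,range,...)`, with multiple domains separated by `;`, as in the rows above.

```typescript
// Hypothetical helpers for reading the "Opset" column; not an ORT API.
interface OpsetRange {
  start: number;
  end: number; // Infinity for open-ended ranges such as "13+"
}

// Parse a single domain entry, e.g. "ai.onnx(7-12,13,14+)".
function parseOpsetEntry(entry: string): { domain: string; ranges: OpsetRange[] } {
  const match = /^\s*([\w.]+)\(([^)]*)\)\s*$/.exec(entry);
  if (match === null) {
    throw new Error(`Unrecognized opset entry: ${entry}`);
  }
  const ranges = match[2].split(',').map((part): OpsetRange => {
    const text = part.trim();
    let start: number;
    let end: number;
    if (text.endsWith('+')) {
      // "8+" means version 8 and above.
      start = Number(text.slice(0, -1));
      end = Infinity;
    } else {
      // "4-6" is an inclusive range; a bare "13" is a single version.
      const bounds = text.split('-').map(Number);
      start = bounds[0];
      end = bounds.length > 1 ? bounds[1] : start;
    }
    if (text === '' || Number.isNaN(start) || Number.isNaN(end)) {
      throw new Error(`Unrecognized opset range: ${part}`);
    }
    return { start, end };
  });
  return { domain: match[1], ranges };
}

// Return true if `version` of `domain` falls inside any range listed in the cell.
function isOpsetSupported(cell: string, domain: string, version: number): boolean {
  return cell
    .split(';')
    .map((entry) => parseOpsetEntry(entry))
    .some(
      (entry) =>
        entry.domain === domain &&
        entry.ranges.some((r) => version >= r.start && version <= r.end),
    );
}
```

For instance, `isOpsetSupported('ai.onnx(4-6, 8+)', 'ai.onnx', 7)` evaluates to `false` because version 7 falls in the gap between the two ranges, while the same call with version 8 evaluates to `true`.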
