onnxruntime/js/web/docs/webgpu-operators.md
Arthur Islamov fac3e33da5
[js/web] JSEP Attention & MultiHeadAttention (#17742)
### Description
This is a narrow implementation of Attention/MultiHeadAttention as it
does not support:
a. inputs 5-7 for MHA
b. packed QKV/KV
c. past/present
d. attention mask

But it works well for StableDiffusion and can be extended later. It
reduces VRAM usage because it combines many ops into a few.
I've updated the demo at https://islamov.ai/stable-diffusion-webgpu/; it
takes ~13 s for one image with 20 steps on an RTX 3090 Ti and about 25 s on
an M1 Pro.
VRAM usage is about 8 GB if you don't use img2img.

Going to focus on SDXL now

---------

Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-11-17 12:23:52 -08:00


## Operators Support Table

The following table shows the ONNX operators and the opset domains/versions supported by the WebGPU EP in ONNX Runtime Web. For example, `4-6, 8+` means ONNX Runtime Web currently supports opset versions 4 to 6, and 8 and above.

This file is automatically generated from the def files via this script. Do not modify directly.

| Operator | Opset | Comments |
|:--------:|:-----:|----------|
| Abs | ai.onnx(6-12,13+) | |
| Acos | ai.onnx(7+) | |
| Acosh | ai.onnx(9+) | |
| Add | ai.onnx(7-12,13,14+) | |
| ArgMax | ai.onnx(1-10,11-12,13+) | |
| ArgMin | ai.onnx(1-10,11-12,13+) | |
| Asin | ai.onnx(7+) | |
| Asinh | ai.onnx(9+) | |
| Atan | ai.onnx(7+) | |
| Atanh | ai.onnx(9+) | |
| Attention | com.microsoft(1+) | need implementing mask and past/present |
| AveragePool | ai.onnx(7-9,10,11+); com.ms.internal.nhwc(7-9,10,11+) | need perf optimization; need implementing activation |
| BiasAdd | com.microsoft(1+) | |
| BiasSplitGelu | com.microsoft(1+) | |
| Cast | ai.onnx(6-8,9-12,13-18,19+) | |
| Ceil | ai.onnx(6-12,13+) | |
| Clip | ai.onnx(6-10,11,12,13+) | |
| Concat | ai.onnx(1-3,4-10,11-12,13+) | |
| Conv | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; conv3d is not supported; need implementing activation |
| ConvTranspose | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; ConvTranspose3d is not supported; need implementing activation |
| Cos | ai.onnx(7+) | |
| Cosh | ai.onnx(9+) | |
| Div | ai.onnx(7-12,13,14+) | |
| Einsum | ai.onnx(12+) | |
| Elu | ai.onnx(6+) | |
| Equal | ai.onnx(7-10,11-12,13-18,19+) | |
| Erf | ai.onnx(9-12,13+) | |
| Exp | ai.onnx(6-12,13+) | |
| Expand | ai.onnx(8-12,13+) | |
| Flatten | ai.onnx(1-8,9-10,11-12,13+) | |
| Floor | ai.onnx(6-12,13+) | |
| FusedConv | com.microsoft(1+) | |
| Gather | ai.onnx(1-10,11-12,13+) | |
| GatherElements | ai.onnx(11-12,13+) | |
| Gelu | com.microsoft(1+) | |
| Gemm | ai.onnx(7-8,9-10,11-12,13+) | |
| GlobalAveragePool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| GlobalMaxPool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| Greater | ai.onnx(7-8,9-12,13+) | |
| GreaterOrEqual | ai.onnx(12-15,16+) | |
| If | ai.onnx(1-10,11-12,13-18,19+) | |
| InstanceNormalization | ai.onnx(6+); com.ms.internal.nhwc(6+) | |
| LayerNormalization | ai.onnx(17+) | |
| LeakyRelu | ai.onnx(6-15,16+) | |
| Less | ai.onnx(7-8,9-12,13+) | |
| LessOrEqual | ai.onnx(12-15,16+) | |
| Log | ai.onnx(6-12,13+) | |
| MatMul | ai.onnx(1-12,13+) | |
| MaxPool | ai.onnx(1-7,8-9,10,11,12+); com.ms.internal.nhwc(1-7,8-9,10,11,12+) | need perf optimization; need implementing activation |
| MemcpyFromHost | ai.onnx(1+) | |
| MemcpyToHost | ai.onnx(1+) | |
| Mul | ai.onnx(7-12,13,14+) | |
| MultiHeadAttention | com.microsoft(1+) | need implementing mask and past/present |
| Neg | ai.onnx(6-12,13+) | |
| Not | ai.onnx(1+) | |
| Pad | ai.onnx(2-10,11-12,13-17,18,19+) | |
| Pow | ai.onnx(7-11,12,13-14,15+) | |
| Range | ai.onnx(11+) | |
| Reciprocal | ai.onnx(6-12,13+) | |
| ReduceL1 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceL2 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSum | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSumExp | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMax | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceMean | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMin | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceProd | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceSum | ai.onnx(1-10,11-12,13+) | |
| ReduceSumSquare | ai.onnx(1-10,11-12,13-17,18+) | |
| Relu | ai.onnx(6-12,13,14+) | |
| Reshape | ai.onnx(5-12,13,14+) | no GPU kernel |
| Resize | ai.onnx(10,11-12,13-17,18,19+); com.ms.internal.nhwc(10,11-12,13-17,18,19+) | CoordinateTransformMode align_corners is not supported with downsampling |
| Shape | ai.onnx(1-12,13-14,15+) | no GPU kernel; an ORT warning is generated - need to fix |
| Sigmoid | ai.onnx(6-12,13+) | |
| Sin | ai.onnx(7+) | |
| Sinh | ai.onnx(9+) | |
| SkipLayerNormalization | com.microsoft(1+) | |
| Slice | ai.onnx(1-9,10,11-12,13+) | |
| Softmax | ai.onnx(1-10,11-12,13+) | |
| Split | ai.onnx(1,2-10,11-12,13-17,18+) | |
| Sqrt | ai.onnx(6-12,13+) | |
| Squeeze | ai.onnx(1-10,11-12,13+) | |
| Sub | ai.onnx(7-12,13,14+) | |
| Tan | ai.onnx(7+) | |
| Tanh | ai.onnx(6-12,13+) | |
| ThresholdedRelu | ai.onnx(10+) | |
| Tile | ai.onnx(6-12,13+) | |
| Transpose | ai.onnx(1-12,13+) | need perf optimization |
| Unsqueeze | ai.onnx(1-10,11-12,13+) | |
| Where | ai.onnx(9-15,16+) | |
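
The opset notation described at the top of this page (for example `ai.onnx(7-12,13,14+)`) can be read mechanically. The following is a minimal TypeScript sketch of how an Opset entry might be parsed and queried. The names `OpsetRange`, `parseVersions`, `parseOpsetCell`, and `isSupported` are hypothetical helpers introduced for illustration only; they are not part of ONNX Runtime Web or of the script that generates this file.

```typescript
// Hypothetical helper type: one contiguous run of supported opset versions.
// `max` is Infinity for an open-ended "N+" entry.
interface OpsetRange {
  min: number;
  max: number;
}

// Parse a version list such as "7-12,13,14+" into numeric ranges.
function parseVersions(spec: string): OpsetRange[] {
  return spec
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      // "14+" means version 14 and above.
      if (part.charAt(part.length - 1) === "+") {
        return { min: Number(part.slice(0, -1)), max: Number.POSITIVE_INFINITY };
      }
      // "7-12" is an inclusive range; a bare "13" is a single version.
      const [lo, hi] = part.split("-");
      const min = Number(lo);
      return { min, max: hi === undefined ? min : Number(hi) };
    });
}

// Parse a full Opset cell such as
// "ai.onnx(7-9,10,11+); com.ms.internal.nhwc(7-9,10,11+)"
// into an object keyed by domain name.
function parseOpsetCell(cell: string): Record<string, OpsetRange[]> {
  const result: Record<string, OpsetRange[]> = {};
  const entry = /([\w.]+)\(([^)]*)\)/g;
  let match: RegExpExecArray | null;
  while ((match = entry.exec(cell)) !== null) {
    result[match[1]] = parseVersions(match[2]);
  }
  return result;
}

// Check whether `version` of `domain` falls inside any listed range.
function isSupported(cell: string, domain: string, version: number): boolean {
  const ranges = parseOpsetCell(cell)[domain] || [];
  return ranges.some((r) => version >= r.min && version <= r.max);
}
```

With these helpers, `isSupported("ai.onnx(7-12,13,14+)", "ai.onnx", 13)` evaluates to `true`, while the same call with version `6` evaluates to `false`. Note that this only interprets the notation; whether a given model actually runs on the WebGPU EP also depends on the limitations listed in the Comments column.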