mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-16 21:00:14 +00:00
### Description This is a narrow implementation of Attention/MultiHeadAttention as it does not support: a. inputs 5-7 for MHA b. packed QKV/KV c. past/present d. attention mask But it works well for StableDiffusion and can be extended later. It reduces VRAM usage as it combines many ops into few I've updated demo here https://islamov.ai/stable-diffusion-webgpu/ it takes ~13sec for 1 image with 20 steps on RTX3090Ti and about 25s on M1 Pro VRAM usage is about 8gb if you don't use img2img Going to focus on SDXL now --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
4.5 KiB
4.5 KiB
Operators Support Table
The following table shows ONNX
operators and the supported opset domain/versions in WebGPU EP by ONNX Runtime Web. For example,
4-6, 8+ means ONNX Runtime Web currently support opset version 4 to 6, 8 and above.
This file is automatically generated from the def files via this script. Do not modify directly.
| Operator | Opset | Comments |
|---|---|---|
| Abs | ai.onnx(6-12,13+) | |
| Acos | ai.onnx(7+) | |
| Acosh | ai.onnx(9+) | |
| Add | ai.onnx(7-12,13,14+) | |
| ArgMax | ai.onnx(1-10,11-12,13+) | |
| ArgMin | ai.onnx(1-10,11-12,13+) | |
| Asin | ai.onnx(7+) | |
| Asinh | ai.onnx(9+) | |
| Atan | ai.onnx(7+) | |
| Atanh | ai.onnx(9+) | |
| Attention | com.microsoft(1+) | need implementing mask and past/present |
| AveragePool | ai.onnx(7-9,10,11+); com.ms.internal.nhwc(7-9,10,11+) | need perf optimization; need implementing activation |
| BiasAdd | com.microsoft(1+) | |
| BiasSplitGelu | com.microsoft(1+) | |
| Cast | ai.onnx(6-8,9-12,13-18,19+) | |
| Ceil | ai.onnx(6-12,13+) | |
| Clip | ai.onnx(6-10,11,12,13+) | |
| Concat | ai.onnx(1-3,4-10,11-12,13+) | |
| Conv | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; conv3d is not supported; need implementing activation |
| ConvTranspose | ai.onnx(1-10,11+); com.ms.internal.nhwc(1-10,11+) | need perf optimization; ConvTranspose3d is not supported; need implementing activation |
| Cos | ai.onnx(7+) | |
| Cosh | ai.onnx(9+) | |
| Div | ai.onnx(7-12,13,14+) | |
| Einsum | ai.onnx(12+) | |
| Elu | ai.onnx(6+) | |
| Equal | ai.onnx(7-10,11-12,13-18,19+) | |
| Erf | ai.onnx(9-12,13+) | |
| Exp | ai.onnx(6-12,13+) | |
| Expand | ai.onnx(8-12,13+) | |
| Flatten | ai.onnx(1-8,9-10,11-12,13+) | |
| Floor | ai.onnx(6-12,13+) | |
| FusedConv | com.microsoft(1+) | |
| Gather | ai.onnx(1-10,11-12,13+) | |
| GatherElements | ai.onnx(11-12,13+) | |
| Gelu | com.microsoft(1+) | |
| Gemm | ai.onnx(7-8,9-10,11-12,13+) | |
| GlobalAveragePool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| GlobalMaxPool | ai.onnx(1+); com.ms.internal.nhwc(1+) | |
| Greater | ai.onnx(7-8,9-12,13+) | |
| GreaterOrEqual | ai.onnx(12-15,16+) | |
| If | ai.onnx(1-10,11-12,13-18,19+) | |
| InstanceNormalization | ai.onnx(6+); com.ms.internal.nhwc(6+) | |
| LayerNormalization | ai.onnx(17+) | |
| LeakyRelu | ai.onnx(6-15,16+) | |
| Less | ai.onnx(7-8,9-12,13+) | |
| LessOrEqual | ai.onnx(12-15,16+) | |
| Log | ai.onnx(6-12,13+) | |
| MatMul | ai.onnx(1-12,13+) | |
| MaxPool | ai.onnx(1-7,8-9,10,11,12+); com.ms.internal.nhwc(1-7,8-9,10,11,12+) | need perf optimization; need implementing activation |
| MemcpyFromHost | ai.onnx(1+) | |
| MemcpyToHost | ai.onnx(1+) | |
| Mul | ai.onnx(7-12,13,14+) | |
| MultiHeadAttention | com.microsoft(1+) | need implementing mask and past/present |
| Neg | ai.onnx(6-12,13+) | |
| Not | ai.onnx(1+) | |
| Pad | ai.onnx(2-10,11-12,13-17,18,19+) | |
| Pow | ai.onnx(7-11,12,13-14,15+) | |
| Range | ai.onnx(11+) | |
| Reciprocal | ai.onnx(6-12,13+) | |
| ReduceL1 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceL2 | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSum | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceLogSumExp | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMax | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceMean | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceMin | ai.onnx(1-10,11,12,13-17,18+) | |
| ReduceProd | ai.onnx(1-10,11-12,13-17,18+) | |
| ReduceSum | ai.onnx(1-10,11-12,13+) | |
| ReduceSumSquare | ai.onnx(1-10,11-12,13-17,18+) | |
| Relu | ai.onnx(6-12,13,14+) | |
| Reshape | ai.onnx(5-12,13,14+) | no GPU kernel |
| Resize | ai.onnx(10,11-12,13-17,18,19+); com.ms.internal.nhwc(10,11-12,13-17,18,19+) | CoordinateTransformMode align_corners is not supported with downsampling |
| Shape | ai.onnx(1-12,13-14,15+) | no GPU kernel; an ORT warning is generated - need to fix |
| Sigmoid | ai.onnx(6-12,13+) | |
| Sin | ai.onnx(7+) | |
| Sinh | ai.onnx(9+) | |
| SkipLayerNormalization | com.microsoft(1+) | |
| Slice | ai.onnx(1-9,10,11-12,13+) | |
| Softmax | ai.onnx(1-10,11-12,13+) | |
| Split | ai.onnx(1,2-10,11-12,13-17,18+) | |
| Sqrt | ai.onnx(6-12,13+) | |
| Squeeze | ai.onnx(1-10,11-12,13+) | |
| Sub | ai.onnx(7-12,13,14+) | |
| Tan | ai.onnx(7+) | |
| Tanh | ai.onnx(6-12,13+) | |
| ThresholdedRelu | ai.onnx(10+) | |
| Tile | ai.onnx(6-12,13+) | |
| Transpose | ai.onnx(1-12,13+) | need perf optimization |
| Unsqueeze | ai.onnx(1-10,11-12,13+) | |
| Where | ai.onnx(9-15,16+) |