onnxruntime/js/web/test/data/ops
Jiajia Qin 41d2ff622c
[js/webgpu] Optimize InstanceNormalization (#17491)
### Description
In the previous implementation, a single thread ran two loops over all
H * W elements to compute the `mean` and `squaredNorm` values, and that
same thread also wrote all H * W output elements. As a result, the kernel
is very slow when H * W is large, which is usually the case in real
models. For example, in the `candy-8` model, the [H, W] shapes for the
`InstanceNormalization` op are [224,224], [112,112], and [56,56]. On my
ADL machine, `[1,224,224,32]` takes 17 ms. See below:
```
[profiling] kernel "23848328|[InstanceNormalization] 23848328" input[0]: [1,224,224,32] | float32, input[1]: [32] | float32, input[2]: [32] | float32, output[0]: [1,224,224,32] | float32, execution time: 17007914 ns
```

This PR uses workgroup memory to optimize the original algorithm.
The advantage is that the 64 (workgroupSize) threads in one workgroup
compute the `mean` and `squaredNorm` values in parallel.
In addition, each thread writes only `H * W / workgroupSize` outputs,
which greatly reduces the work per thread. With this
optimization, `[1,224,224,32]` takes about 3 ms, and the main overhead is
the two extra `transpose` operations. `createInstanceNormProgramInfo`
itself needs only `0.64` ms. See below:
```
[profiling] kernel "23003600|[InstanceNormalization] 23003600" input[0]: [1,224,224,32] | float32, output[0]: [1,32,224,224] | float32, execution time: 1543792 ns
program-manager.ts:115 
[profiling] kernel "23003600|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] | float32, input[1]: [32] | float32, input[2]: [32] | float32, output[0]: [1,32,224,224] | float32, execution time: 642652 ns
program-manager.ts:115 
[profiling] kernel "23003600|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] | float32, output[0]: [1,224,224,32] | float32, execution time: 991608 ns
```
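
The workgroup-based reduction described above can be sketched on the CPU as follows. This is only an illustration, not the WGSL shader from the PR: `instanceNormOneChannel`, `WORKGROUP_SIZE`, and the `shared` array are hypothetical names, the inner loops over `t` stand in for threads that would run concurrently between barriers, and `squaredNorm` is assumed to mean the sum of squared deviations from the mean.

```typescript
// CPU simulation of a workgroup-style reduction for one (n, c) instance.
// All names here are hypothetical; this is not the shader code from the PR.
const WORKGROUP_SIZE = 64;

function instanceNormOneChannel(
  x: Float32Array, scale: number, bias: number, epsilon = 1e-5,
): Float32Array {
  const hw = x.length; // H * W elements of a single channel
  const shared = new Float64Array(WORKGROUP_SIZE); // stands in for workgroup memory

  // Tree reduction over `shared`; on a GPU each stride step would be
  // separated by a workgroup barrier.
  const reduceShared = (): number => {
    for (let stride = WORKGROUP_SIZE >> 1; stride > 0; stride >>= 1) {
      for (let t = 0; t < stride; t++) shared[t] += shared[t + stride];
    }
    return shared[0];
  };

  // Pass 1: each "thread" t sums a strided slice of the H * W elements.
  for (let t = 0; t < WORKGROUP_SIZE; t++) {
    let sum = 0;
    for (let i = t; i < hw; i += WORKGROUP_SIZE) sum += x[i];
    shared[t] = sum;
  }
  const mean = reduceShared() / hw;

  // Pass 2: same pattern for the squared deviations (assumed `squaredNorm`).
  for (let t = 0; t < WORKGROUP_SIZE; t++) {
    let sq = 0;
    for (let i = t; i < hw; i += WORKGROUP_SIZE) {
      const d = x[i] - mean;
      sq += d * d;
    }
    shared[t] = sq;
  }
  const squaredNorm = reduceShared();
  const invStdDev = 1 / Math.sqrt(squaredNorm / hw + epsilon);

  // Pass 3: each "thread" writes roughly H * W / WORKGROUP_SIZE outputs.
  const y = new Float32Array(hw);
  for (let t = 0; t < WORKGROUP_SIZE; t++) {
    for (let i = t; i < hw; i += WORKGROUP_SIZE) {
      y[i] = (x[i] - mean) * invStdDev * scale + bias;
    }
  }
  return y;
}
```

For reference, the three kernel times in the log above add up to 1543792 + 642652 + 991608 = 3178052 ns, which is where the figure of about 3 ms comes from.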
This PR currently applies the new algorithm only to the NCHW format. For
the NHWC format, one option is to transpose the input so that it can use
the new algorithm, but the disadvantage is that two extra transposes are
added. @dakenf also proposes another way to optimize NHWC. For details, see
[here](d45a96616d/js/web/lib/wasm/jsep/webgpu/ops/instance-norm.ts).
I checked @dakenf's method. Its performance is similar to transpose +
optimized NCHW, but which of the two is slightly faster varies from GPU
to GPU. So I prefer that this PR only handle the NCHW part.
@dakenf can submit his NHWC optimization separately.
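
To make the cost of the transpose approach concrete, here is a minimal CPU sketch of the two layout conversions that would wrap an NCHW kernel. The function names `nhwcToNchw` and `nchwToNhwc` and the `Dims4` type are hypothetical and chosen for illustration; the actual implementation uses a WebGPU `transpose` program rather than loops like these.

```typescript
// Hypothetical helpers illustrating the two extra transposes for NHWC input.
type Dims4 = [number, number, number, number];

// Convert a flat NHWC tensor with dims [N, H, W, C] into NCHW order.
function nhwcToNchw(src: Float32Array, dims: Dims4): Float32Array {
  const [n, h, w, c] = dims;
  const dst = new Float32Array(src.length);
  for (let ni = 0; ni < n; ni++)
    for (let hi = 0; hi < h; hi++)
      for (let wi = 0; wi < w; wi++)
        for (let ci = 0; ci < c; ci++) {
          const from = ((ni * h + hi) * w + wi) * c + ci; // NHWC offset
          const to = ((ni * c + ci) * h + hi) * w + wi;   // NCHW offset
          dst[to] = src[from];
        }
  return dst;
}

// Convert a flat NCHW tensor back to NHWC; `dims` is still given as [N, H, W, C].
function nchwToNhwc(src: Float32Array, dims: Dims4): Float32Array {
  const [n, h, w, c] = dims;
  const dst = new Float32Array(src.length);
  for (let ni = 0; ni < n; ni++)
    for (let ci = 0; ci < c; ci++)
      for (let hi = 0; hi < h; hi++)
        for (let wi = 0; wi < w; wi++) {
          const from = ((ni * c + ci) * h + hi) * w + wi; // NCHW offset
          const to = ((ni * h + hi) * w + wi) * c + ci;   // NHWC offset
          dst[to] = src[from];
        }
  return dst;
}
```

Both conversions read and write every element once, which is consistent with the two transposes accounting for most of the measured time in the profiling output.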
2023-09-14 17:03:18 -07:00
..
_example.jsonc [js/web] allow optional input/output in operator test (#17184) 2023-08-16 11:50:11 -07:00
abs-int32.jsonc [js/webgpu] Include Support for neg.int32 (#17374) 2023-09-06 12:00:16 -07:00
abs.jsonc
absr.jsonc
abss.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
acos.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
add.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
add_int32.jsonc
and.jsonc
asin.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
cast.jsonc [js/webgpu] support Cast operator (#16489) 2023-08-18 23:51:03 -07:00
ceil.jsonc
concat.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
concat_int32.jsonc [JS/WebGPU] support Concat.int32 operator (#17003) 2023-09-13 00:05:00 -07:00
conv-transpose.jsonc [JS/Web] Fix ConvTranspose shader code compilation errors. (#17232) 2023-08-25 06:25:54 -07:00
conv.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
cos.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
depth-to-space.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
div.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
div_int32.jsonc [js/webgpu] Support int32 type for binary (#16901) 2023-08-18 12:19:01 -07:00
einsum.jsonc [JS/Web] Added Einsum operator support. (#17401) 2023-09-11 15:57:15 -07:00
equal.jsonc
exp.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
expand.jsonc [JS/WebGPU] Expand operator fixes (#17137) 2023-08-16 11:24:26 -07:00
floor.jsonc
gather-elements.jsonc [JS/WebGPU] Support GatherElements kernel (#17243) 2023-08-28 09:55:25 -07:00
gather.jsonc
gelu.jsonc [js/web] update op test schema (#16921) 2023-08-03 14:20:20 -07:00
gemm.jsonc
global-average-pool.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
greater.jsonc
identity.jsonc
image-scaler.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
instance-norm.jsonc [js/webgpu] Optimize InstanceNormalization (#17491) 2023-09-14 17:03:18 -07:00
layer-norm.jsonc [JS/Web] The bias input is optional, not required, for LayerNormalization operator (#17143) 2023-08-16 10:41:20 -07:00
leaky-relu.jsonc
less.jsonc
log.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
matmul-broadcast.jsonc [js/webgpu] add matmul broadcast tests (#17335) 2023-09-05 20:41:46 -07:00
matmul.jsonc [js/webgpu] Optimize matmul (#16969) 2023-08-29 12:40:57 -07:00
mul.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
mul_int32.jsonc
neg-int32.jsonc [js/webgpu] Include Support for neg.int32 (#17374) 2023-09-06 12:00:16 -07:00
neg.jsonc
not.jsonc
or.jsonc
pad-big.jsonc [js/web] update op test schema (#16921) 2023-08-03 14:20:20 -07:00
pad.jsonc [js/web] update op test schema (#16921) 2023-08-03 14:20:20 -07:00
pow-big-number.jsonc [js/web] update op test schema (#16921) 2023-08-03 14:20:20 -07:00
pow.jsonc
pow_int32.jsonc [js/webgpu] Support int32 type for binary (#16901) 2023-08-18 12:19:01 -07:00
reduce-min.jsonc
relu.jsonc
reshape-int32.jsonc
reshape-pack.jsonc
reshape.jsonc [js/webgpu] Fix reshape int32 test case (#17113) 2023-08-15 21:18:13 -07:00
resize-pack.jsonc [js/web] update op test schema (#16921) 2023-08-03 14:20:20 -07:00
shape.jsonc
sin.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
skip-layer-norm.jsonc [JS/Web] Added SkipLayerNormalization operator. (#17102) 2023-08-18 09:59:03 -07:00
slice.jsonc [js/webgpu] Support slice int32 (#16968) 2023-09-05 18:05:47 -07:00
softmax.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
split.jsonc [js/web] update op test schema (#16921) 2023-08-03 14:20:20 -07:00
sqrt.jsonc
sub.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
sub_int32.jsonc
tan.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
tile.jsonc [JS/WebGPU] Support Tile operator (#17123) 2023-08-18 10:07:21 -07:00
transpose.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
transpose_int32_uint32.jsonc [js/WebGPU] Support int32 Transpose in WebGPU (#16952) 2023-08-02 16:27:24 -07:00
upsample.jsonc [js] enable formatter for more file types (#16888) 2023-07-28 15:46:58 -07:00
xor.jsonc