onnxruntime/js/web/test/data/ops
Jiajia Qin 3580e01348
[js/webgpu] Optimize grouped conv (#21892)
### Description
#21618

This PR optimizes grouped conv by 1) making memory access on the GPU
more sequential and 2) reusing input data to reduce the number of global
memory accesses.

The `Conv|GroupedConv` op in
[Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) drops
from 1058 ms to 92 ms on iGPUs with 32 EUs.

For whole-model runs on my iGPUs with 32 EUs,
the wav2vec2 model drops from 1942 ms to 982 ms, and
the squeezebert-uncased model drops from 431.77 ms to 71.86 ms.


2024-09-04 17:16:35 -07:00
_example.jsonc
abs-int32.jsonc
abs.jsonc
absr.jsonc
abss.jsonc
acos.jsonc
add.jsonc
add_int32.jsonc
add_zero-sized.jsonc
and.jsonc
asin.jsonc
attention.jsonc
batch-norm.jsonc
bias-add.jsonc
bias-split-gelu.jsonc
cast.jsonc
ceil.jsonc
clip.jsonc [js/webgpu] support float16 for Clip (#21584) 2024-08-28 13:19:20 -07:00
concat.jsonc
concat_int32.jsonc
concat_zero-sized.jsonc
conv-transpose.jsonc
conv.jsonc [js/webgpu] Optimize grouped conv (#21892) 2024-09-04 17:16:35 -07:00
conv1d.jsonc [js/webgpu] Optimize conv1d by conv2d (#19388) 2024-08-22 22:56:07 -07:00
conv3dncdhw.jsonc
cos.jsonc
cumsum.jsonc
depth-to-space.jsonc
dequantize-linear-int4.jsonc [JS/WebGPU] Add GatherBlockQuantized op support (#21734) 2024-08-26 14:46:04 -07:00
dequantizelinear.jsonc [JS/WebGPU] Add Dequantizelinear operator (#21642) 2024-08-09 14:44:19 -07:00
div.jsonc
div_int32.jsonc
einsum.jsonc
equal.jsonc
exp.jsonc
expand.jsonc
fast-gelu.jsonc
floor.jsonc
fused-conv.jsonc
fused-conv3dncdhw.jsonc
gather-block-quantized.jsonc [JS/WebGPU] Add GatherBlockQuantized op support (#21734) 2024-08-26 14:46:04 -07:00
gather-elements.jsonc
gather.jsonc
gelu.jsonc
gemm.jsonc
global-average-pool.jsonc
greater.jsonc
group-query-attention.jsonc
identity.jsonc
image-scaler.jsonc
instance-norm.jsonc
layer-norm.jsonc
leaky-relu.jsonc
less.jsonc
log.jsonc
matmul-broadcast.jsonc
matmul.jsonc
matmulnbits.jsonc
max-pool.jsonc [js/webgpu] Fix max pool shape end with 0 (#21698) 2024-08-13 20:59:24 -07:00
mul.jsonc
mul_int32.jsonc
multihead-attention.jsonc [JS/WebGPU] Avoid producing presentKey/presentValue outputs if pastKey/pastValue … (#21782) 2024-08-19 18:02:19 -07:00
neg-int32.jsonc
neg.jsonc
not.jsonc
or.jsonc
pad-big.jsonc
pad.jsonc
pad_f16.jsonc [js/webgpu] Enable pad f16 uniform (#21691) 2024-08-26 07:58:48 -07:00
pow-big-number.jsonc
pow.jsonc
pow_int32.jsonc
quick-gelu.jsonc
reduce-min.jsonc
relu.jsonc
reshape-int32.jsonc
reshape-pack.jsonc
reshape.jsonc
resize-pack.jsonc
resize.jsonc
rotary-embedding.jsonc
shape.jsonc
simplified-layer-norm.jsonc
sin.jsonc
skip-layer-norm.jsonc
skip-simplified-layer-norm.jsonc
slice.jsonc
softmax.jsonc
split.jsonc [js/webgpu] Handle negative axis in op Split (#21771) 2024-08-17 16:41:23 -07:00
sqrt.jsonc
sub.jsonc
sub_int32.jsonc
tan.jsonc
tanh.jsonc
tile.jsonc
transpose.jsonc [js/webgpu] Optimize transpose (#21964) 2024-09-04 12:04:04 -07:00
transpose_int32_uint32.jsonc
upsample.jsonc
where.jsonc
where_broadcast.jsonc
xor.jsonc