onnxruntime/js/web/lib/wasm/jsep/webgpu/ops
Jiajia Qin 8fbbf2fd4f
[js/webgpu] Optimize MatMul with M = 1 (#22577)
### Description
BUG #22031

In the demucs model, there are many MatMul ops with shapes like the following:
`input[0]: [3448,1,512] | float32, input[1]: [512,1536] | float32, output[0]: [3448,1,1536] | float32`

For shapes like this, the batch size is large but M = 1. Our current algorithm partitions tiles based on [M, N], which is inefficient for such shapes. This PR reshapes the inputs to improve MatMul performance:
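One rough way to see why [M, N] tiling struggles here is to count output tiles. The sketch below is a simplified model, not the kernel's actual dispatch logic. It assumes a hypothetical 32×32 output tile laid out over [M, N] independently for each batch entry. `TILE`, `tileStats` and `TileStats` are illustrative names only. The real WebGPU MatMul tile and workgroup sizes may differ, so the numbers only illustrate the trend.

```typescript
// Simplified tile-count model. TILE is a hypothetical output tile size,
// not necessarily the one used by the WebGPU MatMul kernel.
const TILE = 32;

interface TileStats {
  tiles: number;       // number of output tiles dispatched
  utilization: number; // fraction of computed tile cells that are real outputs
}

// Tiles are laid out over [M, N] separately for every batch entry.
function tileStats(batch: number, m: number, n: number): TileStats {
  const tilesPerBatch = Math.ceil(m / TILE) * Math.ceil(n / TILE);
  const tiles = batch * tilesPerBatch;
  const utilization = (batch * m * n) / (tiles * TILE * TILE);
  return { tiles, utilization };
}

const before = tileStats(3448, 1, 1536); // [3448,1,512] x [512,1536]
const after = tileStats(1, 3448, 1536);  // [1,3448,512] x [512,1536]
console.log(before, after);
```

Under this model, the original shape dispatches 165504 tiles that are each only 1/32 full. The reshaped problem needs 5184 tiles that are almost completely full.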
- Before: `[3448,1,512] x [512,1536] = [3448,1,1536]`
- After: `[1,3448,512] x [512,1536] = [1,3448,1536]`, then the output is reshaped back to `[3448,1,1536]`
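The reshape above can be sketched as a pure shape rewrite plus a check that it does not change the result. This is a hypothetical helper, not the onnxruntime source. It assumes the first input is `[...batchDims, 1, K]` and the second is a plain 2-D `[K, N]` matrix shared by every batch entry, as in the demucs shapes. `foldBatchIntoM`, `matmul2d` and `perBatchMatMul` are illustrative names.

```typescript
// Hypothetical sketch, not the onnxruntime implementation. Assumes A is
// [...batchDims, M, K] with M = 1 and B is a plain 2-D [K, N] matrix
// shared by every batch entry.
type Shape = number[];

interface FoldedMatMul {
  aShape: Shape;     // A viewed as [1, batch, K]
  bShape: Shape;     // B unchanged: [K, N]
  outShape: Shape;   // shape the kernel computes: [1, batch, N]
  finalShape: Shape; // shape handed back to the graph: [...batchDims, 1, N]
}

// Returns the folded shapes, or null when the rewrite does not apply.
function foldBatchIntoM(aShape: Shape, bShape: Shape): FoldedMatMul | null {
  if (aShape.length < 3 || bShape.length !== 2) return null;
  const m = aShape[aShape.length - 2];
  const k = aShape[aShape.length - 1];
  const [kB, n] = bShape;
  if (m !== 1 || k !== kB) return null;
  const batchDims = aShape.slice(0, -2);
  const batch = batchDims.reduce((acc, d) => acc * d, 1);
  return {
    aShape: [1, batch, k],
    bShape: [k, n],
    outShape: [1, batch, n],
    finalShape: [...batchDims, 1, n],
  };
}

// Naive row-major product of a [rows, k] buffer with a [k, n] buffer.
function matmul2d(a: Float32Array, b: Float32Array, rows: number, k: number, n: number): Float32Array {
  const out = new Float32Array(rows * n);
  for (let r = 0; r < rows; r++) {
    for (let j = 0; j < n; j++) {
      let acc = 0;
      for (let t = 0; t < k; t++) acc += a[r * k + t] * b[t * n + j];
      out[r * n + j] = acc;
    }
  }
  return out;
}

// Unfolded version: one [1, k] x [k, n] product per batch entry.
function perBatchMatMul(a: Float32Array, b: Float32Array, batch: number, k: number, n: number): Float32Array {
  const out = new Float32Array(batch * n);
  for (let i = 0; i < batch; i++) {
    out.set(matmul2d(a.subarray(i * k, (i + 1) * k), b, 1, k, n), i * n);
  }
  return out;
}
```

For the demucs shapes, `foldBatchIntoM([3448, 1, 512], [512, 1536])` yields an `aShape` of `[1, 3448, 512]`, an `outShape` of `[1, 3448, 1536]` and a `finalShape` of `[3448, 1, 1536]`. A row-major `[batch, 1, K]` buffer has the same memory layout as `[1, batch, K]`, so both reshapes only change metadata. For the same reason, `perBatchMatMul` and a single `matmul2d` over `batch` rows produce the same output buffer.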

The total MatMul time in the demucs model drops from 4418.17 ms to 1778.45 ms (about 2.5x faster) on my iGPUs.

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-11-01 08:04:42 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| 3rd-party | [js/webgpu] Optimize conv1d by conv2d (#19388) | 2024-08-22 22:56:07 -07:00 |
| argminmax.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| attention.ts | [JS/WebGPU] GroupQueryAttention rewrite (#20946) | 2024-10-23 10:14:09 -07:00 |
| batch-norm.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| bias-add.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| bias-split-gelu.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| binary-op.ts | [JS/WebGPU] Support WASM64 (#21836) | 2024-10-24 20:21:51 -07:00 |
| common.ts | [JS/WebGPU] Support WASM64 (#21836) | 2024-10-24 20:21:51 -07:00 |
| concat.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| conv-grouped.ts | [js/webgpu] Optimize grouped conv (#21892) | 2024-09-04 17:16:35 -07:00 |
| conv-transpose.ts | [js/webgpu] Fix issue to run model demucs (#22074) | 2024-09-16 23:17:10 -07:00 |
| conv.ts | [js/webgpu] Fix issue to run model demucs (#22074) | 2024-09-16 23:17:10 -07:00 |
| cumsum.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| depth-to-space.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| einsum.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| expand.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| fast-gelu.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| fuse-utils.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| gather-block-quantized.ts | [JS/WebGPU] Add GatherBlockQuantized op support (#21734) | 2024-08-26 14:46:04 -07:00 |
| gather-elements.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| gather.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| gemm.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| group-query-attention.ts | [JS/WebGPU] GroupQueryAttention rewrite (#20946) | 2024-10-23 10:14:09 -07:00 |
| instance-norm.ts | [js/webgpu] Optimize InstanceNorm in some shapes (#22637) | 2024-10-29 17:10:14 -07:00 |
| layer-norm.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| matmul.ts | [js/webgpu] Optimize MatMul with M = 1 (#22577) | 2024-11-01 08:04:42 -07:00 |
| matmulnbits.ts | [js/webgpu] Optimize matmulnbits (#22360) | 2024-10-14 15:49:29 -07:00 |
| multihead-attention.ts | [JS/WebGPU] GroupQueryAttention rewrite (#20946) | 2024-10-23 10:14:09 -07:00 |
| pad.ts | [js/webgpu] Enable pad f16 uniform (#21691) | 2024-08-26 07:58:48 -07:00 |
| pool.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| quantize-linear.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| range.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| reduce-shared.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| reduce.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| resize.ts | [JS/WebGPU] Fixed bugs in inputs validation of Resize (#21955) | 2024-10-04 18:29:53 -07:00 |
| rotary-embedding.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| skip-layer-norm.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| slice.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| softmax.ts | [js/webgpu] Remove the limitation on axis in softmax (#22231) | 2024-09-30 18:27:11 -07:00 |
| split.ts | [JS/WebGPU] GroupQueryAttention rewrite (#20946) | 2024-10-23 10:14:09 -07:00 |
| tile.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| transpose.ts | [js/webgpu] Replace array with string in transpose perm (#21930) | 2024-09-16 23:17:46 -07:00 |
| unary-op.ts | [js/webgpu] support float16 for Clip (#21584) | 2024-08-28 13:19:20 -07:00 |
| where.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |