onnxruntime/js/web/lib
Jiajia Qin 8fbbf2fd4f
[js/webgpu] Optimize MatMul with M = 1 (#22577)
### Description
BUG #22031

In the demucs model, there are many MatMul ops with shapes like the following:
`input[0]: [3448,1,512] | float32, input[1]: [512,1536] | float32, output[0]: [3448,1,1536] | float32`

For shapes like this, the batch size is large but M = 1. The current algorithm
partitions tiles over [M, N], which is inefficient for such shapes. This PR
reshapes the inputs to improve MatMul performance.
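One rough way to see why [M, N] tiling struggles when M = 1 is to count output tiles. The sketch below assumes a hypothetical 32 x 32 output tile and that each batch entry's output is tiled independently. The real WebGPU kernel's tile sizes and dispatch layout may differ, and `TILE` and `tileStats` are illustrative names, not onnxruntime code.

```typescript
// Assumption: a hypothetical 32 x 32 output tile; the real kernel may use
// a different tile shape and dispatch layout.
const TILE = 32;

interface TileStats {
  tiles: number; // number of output tiles dispatched
  utilization: number; // fraction of tile cells that hold a real output element
}

// Partition each batch entry's [m, n] output independently into tile x tile blocks.
function tileStats(batch: number, m: number, n: number, tile = TILE): TileStats {
  const tilesPerBatch = Math.ceil(m / tile) * Math.ceil(n / tile);
  const tiles = batch * tilesPerBatch;
  const usefulCells = batch * m * n;
  return { tiles, utilization: usefulCells / (tiles * tile * tile) };
}

// demucs shape: batch = 3448, M = 1, N = 1536.
const before = tileStats(3448, 1, 1536);
// Batch folded into M: batch = 1, M = 3448, N = 1536.
const after = tileStats(1, 3448, 1536);

console.log(before); // { tiles: 165504, utilization: 0.03125 }
console.log(after.tiles, after.utilization.toFixed(3)); // 5184 0.998
```

Under these assumptions, each tile in the original layout holds only one useful row out of 32, while the folded layout dispatches far fewer tiles and fills nearly all of them.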
Before: `[3448,1,512] x [512,1536] = [3448,1,1536]`
After: `[1,3448,512] x [512,1536] = [1,3448,1536]`, then the output is reshaped back
to `[3448,1,1536]`.
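The reshape is cheap because `[3448,1,512]` and `[1,3448,512]` have the same row-major memory layout, so only the shape metadata changes. The sketch below illustrates the idea on the CPU under that assumption. `foldBatchIntoM`, `matmul2d`, and `batchedM1` are hypothetical helpers, not the onnxruntime API. The sketch folds A's batch dimensions into M only when M = 1 and B has no batch dimension larger than 1, then checks that the per-batch and folded computations agree on a small input.

```typescript
// Hypothetical helper names throughout; this is not the onnxruntime API.
interface FoldedShapes {
  a: number[]; // A viewed as [1, batchA, K]
  b: number[]; // B viewed as [1, K, N]
  out: number[]; // MatMul output: [1, batchA, N]
  finalOut: number[]; // output reshaped back: [...batch, 1, N]
}

const product = (dims: number[]): number => dims.reduce((p, d) => p * d, 1);

// Fold A's batch dims into M when A is [...batch, 1, K] and B has no batch
// dims larger than 1; return null when the rewrite does not apply.
function foldBatchIntoM(aDims: number[], bDims: number[]): FoldedShapes | null {
  if (aDims.length < 3 || bDims.length < 2) return null;
  const m = aDims[aDims.length - 2];
  const k = aDims[aDims.length - 1];
  const n = bDims[bDims.length - 1];
  const batchA = product(aDims.slice(0, -2));
  const batchB = product(bDims.slice(0, -2));
  if (m !== 1 || batchA === 1 || batchB !== 1 || bDims[bDims.length - 2] !== k) {
    return null;
  }
  return {
    a: [1, batchA, k],
    b: [1, k, n],
    out: [1, batchA, n],
    finalOut: [...aDims.slice(0, -2), 1, n],
  };
}

// Naive row-major MatMul: a is [rows, k], b is [k, n], result is [rows, n].
function matmul2d(a: Float32Array, b: Float32Array, rows: number, k: number, n: number): Float32Array {
  const out = new Float32Array(rows * n);
  for (let r = 0; r < rows; r++) {
    for (let j = 0; j < n; j++) {
      let acc = 0;
      for (let p = 0; p < k; p++) acc += a[r * k + p] * b[p * n + j];
      out[r * n + j] = acc;
    }
  }
  return out;
}

// Original formulation: `batch` separate [1, k] x [k, n] products.
function batchedM1(a: Float32Array, b: Float32Array, batch: number, k: number, n: number): Float32Array {
  const out = new Float32Array(batch * n);
  for (let i = 0; i < batch; i++) {
    out.set(matmul2d(a.subarray(i * k, (i + 1) * k), b, 1, k, n), i * n);
  }
  return out;
}

const shapes = foldBatchIntoM([3448, 1, 512], [512, 1536]);
console.log(shapes);

// Small numeric check: the same A buffer read as [batch, 1, k] or as [batch, k].
const batch = 4, k = 3, n = 5;
const a = new Float32Array(batch * k).map((_, i) => (i % 7) - 3);
const b = new Float32Array(k * n).map((_, i) => (i % 5) - 2);
const perBatch = batchedM1(a, b, batch, k, n);
const folded = matmul2d(a, b, batch, k, n);
const same = perBatch.every((v, i) => v === folded[i]);
console.log(same); // true
```

In this sketch, requiring a batch-free B matters: every row of A is multiplied by the same B, which is what allows the batch dimension to be folded into M. A B with its own batch dimensions would pair each row with a different matrix and could not be folded this way.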

On my iGPUs, the overall MatMul time in the demucs model drops from 4418.17 ms
to 1778.45 ms.
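For reference, these two timings correspond to the following speedup and reduction. This is plain arithmetic on the reported numbers, not an additional measurement.

$$
\frac{4418.17\ \text{ms}}{1778.45\ \text{ms}} \approx 2.48\times,
\qquad
\frac{4418.17 - 1778.45}{4418.17} \approx 59.7\%\ \text{less MatMul time}
$$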

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-11-01 08:04:42 -07:00
onnxjs [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) 2024-08-14 16:51:22 -07:00
wasm [js/webgpu] Optimize MatMul with M = 1 (#22577) 2024-11-01 08:04:42 -07:00
backend-onnxjs.ts [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) 2024-08-14 16:51:22 -07:00
backend-wasm.ts [js/web] remove training release (#22103) 2024-09-16 10:56:22 -07:00
build-def.d.ts [js/web] allow build target for non dynamic import (#20898) 2024-06-03 12:33:37 -07:00
index.ts [js/web] remove training release (#22103) 2024-09-16 10:56:22 -07:00
version.ts bumps up version in main from 1.20 -> 1.21 (#22482) 2024-10-17 12:32:35 -07:00