onnxruntime/js/web/test
Jiajia Qin 8fbbf2fd4f
[js/webgpu] Optimize MatMul with M = 1 (#22577)
### Description
BUG #22031

In the demucs model there are many MatMul ops with shapes like the following:
`input[0]: [3448,1,512] | float32, input[1]: [512,1536] | float32, output[0]: [3448,1,1536] | float32`

For this kind of shape the batch size is large but M = 1. Our current algorithm partitions tiles over [M, N], which is inefficient for such shapes. This PR reshapes the inputs to improve MatMul performance.
Before: [3448,1,512] x [512,1536] = [3448,1,1536]
After: [1,3448,512] x [512,1536] = [1,3448,1536], and the output is then reshaped to [3448,1,1536].
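The reshape is valid because when M = 1 every batch of A contributes exactly one row, and all batches multiply the same B, so folding the batch dimension into M changes only the dimension labels, not the data layout. A minimal sketch of the idea (the function name and plain-array layout are illustrative, not the actual onnxruntime kernel):

```typescript
// Sketch of the M = 1 reshape trick. A has logical shape [batch, 1, K]
// (row-major) and B has shape [K, N]. Viewing A as [1, batch, K] (i.e.
// a single [batch, K] matrix) requires no data movement, because the
// underlying buffer is identical; only the dim interpretation changes.
function matmulM1(
  a: Float32Array, // logical shape [batch, 1, K]
  b: Float32Array, // logical shape [K, N]
  batch: number,
  k: number,
  n: number,
): Float32Array {
  // One [batch, K] x [K, N] matmul instead of `batch` [1, K] x [K, N]
  // matmuls, so tiles are partitioned over a large M rather than M = 1.
  const out = new Float32Array(batch * n); // reshaped back to [batch, 1, N]
  for (let m = 0; m < batch; ++m) {
    for (let j = 0; j < n; ++j) {
      let sum = 0;
      for (let i = 0; i < k; ++i) {
        sum += a[m * k + i] * b[i * n + j];
      }
      out[m * n + j] = sum;
    }
  }
  return out;
}
```

On the GPU the win comes from the tiling: with M = 1 each tile covers a single output row per batch, while after the fold a tile can cover many rows of the merged [batch, N] output.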

With this change, the overall MatMul time in the demucs model drops from 4418.17 ms to 1778.45 ms on my iGPU.

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-11-01 08:04:42 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| data/ops | [js/webgpu] Optimize MatMul with M = 1 (#22577) | 2024-11-01 08:04:42 -07:00 |
| e2e | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| unittests | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| op-test-schema.json | [js/web] Add support for int4/uint4 tensor (#21720) | 2024-08-15 21:32:10 -07:00 |
| suite-test-list.jsonc | [WebNN] Support And, Or and Xor ops (#22598) | 2024-10-30 17:52:10 -07:00 |
| test-main.ts | [js/webgpu] Manage model download with a specific unittest option (#22214) | 2024-09-30 18:27:43 -07:00 |
| test-runner.ts | [WebNN EP] Use boolean flags instead of MLTensorUsage (#22497) | 2024-10-22 17:20:36 -07:00 |
| test-shared.ts | [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) | 2024-08-14 16:51:22 -07:00 |
| test-types.ts | [js/webgpu] Manage model download with a specific unittest option (#22214) | 2024-09-30 18:27:43 -07:00 |