mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-18 21:21:17 +00:00
### Description This PR adds a vectorized matmul algorithm. In most cases we still use the workgroup-memory-optimized matmul. But in some cases, such as when N and K are very small, the workgroup-optimized matmul cannot fully utilize the underlying hardware because of its 32x32 tile size. So for very small N/K, we switch to the naive vectorized matmul algorithm to improve hardware execution unit usage. With this PR, a matmul with input0: [1, 36864, 3], input1: [1, 3, 3], input2: [3] drops from 4.34 ms to less than 1 ms on Intel Gen9 GPUs. |
||
|---|---|---|
| onnxjs | ||
| wasm | ||
| backend-onnxjs.ts | ||
| backend-wasm-inference.ts | ||
| backend-wasm-training.ts | ||
| backend-wasm.ts | ||
| build-def.d.ts | ||
| index.ts | ||
| version.ts | ||
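
The trade-off in the commit description above can be sketched in plain TypeScript. This is an illustration only, not the code from the PR: `tileUtilization`, `chooseKernel`, `naiveMatMul`, and the `smallThreshold` value are all hypothetical, and treating "very small N/K" as "both N and K at or below a threshold" is an assumption. Only the 32x32 tile size and the example shapes come from the description.

```typescript
// Hypothetical sketch; names and the threshold are assumptions, not onnxruntime APIs.
const TILE = 32; // tile edge length mentioned in the commit description

/** Fraction of the tiles covering one dimension that holds real data. */
function tileUtilization(extent: number, tile: number = TILE): number {
  const tiles = Math.ceil(extent / tile);
  return extent / (tiles * tile);
}

type Kernel = 'naive-vectorized' | 'workgroup-tiled';

/** Assumed heuristic: use the naive path only when both N and K are small. */
function chooseKernel(n: number, k: number, smallThreshold: number = 8): Kernel {
  return n <= smallThreshold && k <= smallThreshold ? 'naive-vectorized' : 'workgroup-tiled';
}

/** Reference CPU matmul with bias: out[M,N] = a[M,K] * b[K,N] + bias[N], row-major. */
function naiveMatMul(
  a: Float32Array, b: Float32Array, bias: Float32Array,
  m: number, k: number, n: number,
): Float32Array {
  const out = new Float32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let sum = bias[j];
      for (let p = 0; p < k; p++) {
        sum += a[i * k + p] * b[p * n + j];
      }
      out[i * n + j] = sum;
    }
  }
  return out;
}
```

For the shapes in the description (N = 3, K = 3), `tileUtilization(3)` is 3/32, so under this simple model a 32-wide tile would leave most of its columns idle, which is the motivation the description gives for the fallback path.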