onnxruntime/js/web/lib/wasm/jsep/webgpu
Jiajia Qin 8159723ba7
[js/webgpu] Optimize matmulnbits (#22360)
### Description
This PR further optimizes matmulnbits, specifically for iGPUs. The phi3 demo
improves from ~8 tokens/second to ~12 tokens/second on iGPUs.

Some todos:
1. Make the optimization more general and remove the blockSize = 32
limitation.
2. Tune parameters such as workgroupSize and components size (currently
only components = 1 is supported) to see how performance changes.
2024-10-14 15:49:29 -07:00
..
ops [js/webgpu] Optimize matmulnbits (#22360) 2024-10-14 15:49:29 -07:00
attribute-with-cache-key.ts
gpu-data-manager.ts [js/webgpu] fix external buffer registration (#22254) 2024-09-28 10:36:40 -07:00
op-resolve-rules.ts [JS/WebGPU] Add GatherBlockQuantized op support (#21734) 2024-08-26 14:46:04 -07:00
program-manager.ts
types.ts [js/webgpu] Optimize matmulnbits (#22360) 2024-10-14 15:49:29 -07:00