mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-18 01:54:05 +00:00
### Description

This PR further optimizes MatMulNBits, specifically for iGPUs. The phi3 demo improves from ~8 tokens/second to ~12 tokens/second on iGPUs.

Some TODOs:

1. Make the optimization more general and remove the blockSize = 32 limitation.
2. Tune parameters such as workgroupSize and the components size (currently only components = 1 is supported) to see how performance changes.

| | | |
|---|---|---|
| webgpu | ||
| webnn | ||
| backend-webgpu.ts | ||
| backend-webnn.ts | ||
| init.ts | ||
| log.ts | ||
| tensor-view.ts | ||
| util.ts | ||
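
To make the blockSize = 32 restriction in the description more concrete, here is a minimal CPU reference sketch of a 4-bit, block-quantized matmul. It is not the WebGPU shader from this PR. The function name `matMulNBitsReference`, the packing order (two 4-bit weights per byte, low nibble first), the `[N][blocksPerCol][blockSize / 2]` layout, and the fixed zero point of 8 are all assumptions made for illustration.

```typescript
// Hypothetical CPU reference: Y[M,N] = A[M,K] x dequant(B)^T with 4-bit weights.
// Assumptions (not taken from the PR): two weights per byte, low nibble first;
// B laid out as [N][blocksPerCol][blockSize / 2]; one scale per (column, block);
// a fixed zero point of 8; k is a multiple of blockSize.
function matMulNBitsReference(
  a: Float32Array,
  packedB: Uint8Array,
  scales: Float32Array,
  m: number,
  n: number,
  k: number,
  blockSize = 32,
): Float32Array {
  const blocksPerCol = Math.ceil(k / blockSize);
  const blobSize = blockSize / 2; // bytes per block at 4 bits per weight
  const zeroPoint = 8;
  const y = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let sum = 0;
      for (let kk = 0; kk < k; kk++) {
        const block = Math.floor(kk / blockSize);
        const inBlock = kk % blockSize;
        const byte = packedB[(col * blocksPerCol + block) * blobSize + (inBlock >> 1)];
        // Even positions read the low nibble, odd positions the high nibble.
        const q = (inBlock & 1) === 0 ? byte & 0x0f : byte >> 4;
        // Dequantize with the scale shared by every weight in this block.
        const w = (q - zeroPoint) * scales[col * blocksPerCol + block];
        sum += a[row * k + kk] * w;
      }
      y[row * n + col] = sum;
    }
  }
  return y;
}
```

Every weight in a block shares one scale, so a kernel specialized for blockSize = 32 can hard-code how many packed bytes each invocation reads. That is one plausible reason the first TODO calls out generalizing beyond that size.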