Mirror of https://github.com/saymrwulf/onnxruntime.git (synced 2026-05-26 22:35:43 +00:00)
### Description

This PR further optimizes `MatMulNBits`, especially for iGPUs. With this change, the phi3 demo runs at ~12 tokens/second on iGPUs, up from ~8 tokens/second.

Some TODOs:
1. Make the optimization more general by removing the `blockSize = 32` limitation.
2. Tune parameters such as `workgroupSize` and the components size (currently only `components = 1` is supported) and measure how performance changes.
| Name |
|---|
| jsep |
| proxy-worker |
| proxy-messages.ts |
| proxy-wrapper.ts |
| run-options.ts |
| session-handler-inference.ts |
| session-options.ts |
| wasm-common.ts |
| wasm-core-impl.ts |
| wasm-factory.ts |
| wasm-types.ts |
| wasm-utils-env.ts |
| wasm-utils-import.ts |
| wasm-utils-load-file.ts |
| wasm-utils.ts |
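The description above is about speeding up `MatMulNBits`. To show what that operator computes, here is a minimal CPU reference sketch in TypeScript. It is not the WebGPU shader this change optimizes, and it makes the following assumptions:

* `bits = 4` and `blockSize = 32`, the case the optimization is currently limited to.
* No zero-point input, so the default zero point of 8 applies.
* B is packed per output column as `[N, nBlocks, blockSize / 2]` bytes.
* The lower-indexed weight in each byte sits in the low nibble.

These layout details are assumptions modeled on the ONNX Runtime contrib operator documentation and should be checked against it. The function name `matMulNBitsRef` is hypothetical.

```typescript
// Hypothetical CPU reference for a 4-bit block-quantized MatMul (MatMulNBits-style).
// Assumptions: 4-bit weights, blockSize = 32, no zero-point input (default zero point 8),
// B laid out per output column as [N, nBlocks, BLOCK_SIZE / 2] bytes,
// and the lower-indexed weight of each byte stored in the low nibble.

const BITS = 4;
const BLOCK_SIZE = 32;
const DEFAULT_ZERO_POINT = 1 << (BITS - 1); // 8 for 4-bit weights

/**
 * Computes Y[M x N] = A[M x K] * W[K x N], where W is dequantized on the fly.
 * a:       Float32Array of length M * K (row-major)
 * bPacked: Uint8Array of length N * nBlocks * (BLOCK_SIZE / 2)
 * scales:  Float32Array of length N * nBlocks (one scale per block)
 */
function matMulNBitsRef(
  a: Float32Array,
  bPacked: Uint8Array,
  scales: Float32Array,
  M: number,
  N: number,
  K: number,
): Float32Array {
  const nBlocks = Math.ceil(K / BLOCK_SIZE);
  const blobSize = BLOCK_SIZE / 2; // bytes per block at 4 bits per weight
  const y = new Float32Array(M * N);
  for (let m = 0; m < M; m++) {
    for (let n = 0; n < N; n++) {
      let acc = 0;
      for (let blk = 0; blk < nBlocks; blk++) {
        const scale = scales[n * nBlocks + blk];
        const base = (n * nBlocks + blk) * blobSize;
        for (let i = 0; i < BLOCK_SIZE; i++) {
          const k = blk * BLOCK_SIZE + i;
          if (k >= K) break; // the last block may be only partially used
          const byte = bPacked[base + (i >> 1)];
          const q = (i & 1) === 0 ? byte & 0x0f : byte >> 4; // unpack one 4-bit value
          const w = (q - DEFAULT_ZERO_POINT) * scale; // dequantize with the block's scale
          acc += a[m * K + k] * w;
        }
      }
      y[m * N + n] = acc;
    }
  }
  return y;
}

// Usage: one row of A (K = 32, all ones) times one column whose 32 weights all quantize to 9.
const demoK = 32;
const demoA = new Float32Array(demoK).fill(1);
const demoB = new Uint8Array(demoK / 2).fill(0x99); // each byte packs two 4-bit values of 9
const demoScales = new Float32Array([0.5]);
const demoY = matMulNBitsRef(demoA, demoB, demoScales, 1, 1, demoK);
console.log(demoY[0]); // (9 - 8) * 0.5 * 32 = 16
```

In this sketch, every run of `BLOCK_SIZE` weights along K shares a single scale. That sharing is why the block size shapes how the K loop is partitioned.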