onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-21 19:18:55 +00:00


Jiajia Qin 8159723ba7 [js/webgpu] Optimize matmulnbits (#22360)	2024-10-14 15:49:29 -07:00

### Description

This PR further optimizes matmulnbits, specifically for iGPUs. The phi3 demo improves from ~8 tokens/second to ~12 tokens/second on iGPUs.

Some todos:
1. Make the optimization more general and remove the blockSize = 32 limitation.
2. Tune parameters such as workgroupSize and components size (currently only components = 1 is supported) to see how performance changes.
ops	[js/webgpu] Optimize matmulnbits (#22360)	2024-10-14 15:49:29 -07:00
attribute-with-cache-key.ts	[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728)	2024-08-14 16:51:22 -07:00
gpu-data-manager.ts	[js/webgpu] fix external buffer registration (#22254)	2024-09-28 10:36:40 -07:00
op-resolve-rules.ts	[JS/WebGPU] Add GatherBlockQuantized op support (#21734)	2024-08-26 14:46:04 -07:00
program-manager.ts	[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728)	2024-08-14 16:51:22 -07:00
types.ts	[js/webgpu] Optimize matmulnbits (#22360)	2024-10-14 15:49:29 -07:00