Mirror of https://github.com/saymrwulf/onnxruntime.git (synced 2026-06-26 03:00:54 +00:00)
Latest commit: Use components = 4 if possible. llama3.2-1B goes from 18 tokens/s to 20 tokens/s on my iGPUs.

| Name | | |
|---|---|---|
| ops | ||
| attribute-with-cache-key.ts | ||
| gpu-data-manager.ts | ||
| op-resolve-rules.ts | ||
| program-manager.ts | ||
| types.ts | ||