Mirror of https://github.com/saymrwulf/onnxruntime.git (synced 2026-05-22 22:01:08 +00:00)
### Description

This PR uses subgroups to implement MatMulNBits when tile_m > 1 on Intel devices. With this change, prefill for a 500-token Phi-3 prompt on Intel Meteor Lake drops from 8.5 s to 3.5 s, roughly a 2.4x speedup.

| Directory |
|---|
| common |
| dll |
| dlpack |
| eager |
| flatbuffers |
| framework |
| graph |
| mickey |
| mlas |
| optimizer |
| platform |
| providers |
| quantization |
| session |
| util |