mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
Previously we were using:

* mv kernel for M == 1
* mm kernel for 1 < M < 4
* llama.cpp inspired mm kernel for M >= 4

This PR consolidates them into only 2 kernels, using the same mv kernel for M < 12.

Benchmarked with https://github.com/malfet/llm_experiments/blob/main/metal-perf/int8mm.mm on a Mac M1 Max, input size M x 4128 x 4096.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128632
Approved by: https://github.com/malfet

| | | |
|---|---|---|
| .. | | |
| conda | | |
| src | | |
| tools | | |
| CMakeLists.txt | | |
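
The kernel selection described in the commit message can be sketched as a simple dispatch on M. This is an illustrative Python sketch, not the actual Metal code from the PR: the function names, the constant name, and the string labels are hypothetical. It also assumes that the second remaining kernel is the llama.cpp inspired mm kernel, used for M >= 12, which the commit message implies but does not state.

```python
# Hypothetical constant name; the value 12 comes from the commit message
# ("use the same mv kernel for M < 12").
MV_KERNEL_MAX_M = 12


def select_kernel_before(m: int) -> str:
    """Dispatch as described for the code before the PR (three kernels)."""
    if m < 1:
        raise ValueError("M must be a positive integer")
    if m == 1:
        return "mv"
    if m < 4:
        return "mm"
    return "llama_mm"  # llama.cpp inspired mm kernel


def select_kernel_after(m: int) -> str:
    """Dispatch after the PR (two kernels).

    Assumption: M >= 12 falls through to the llama.cpp inspired mm kernel.
    """
    if m < 1:
        raise ValueError("M must be a positive integer")
    if m < MV_KERNEL_MAX_M:
        return "mv"
    return "llama_mm"


if __name__ == "__main__":
    for m in (1, 2, 4, 11, 12, 32):
        print(m, select_kernel_before(m), select_kernel_after(m))
```

The only behavioral difference in this sketch is for 1 < M < 12, where the mv kernel replaces the plain mm kernel and part of the range previously handled by the llama.cpp inspired kernel.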