pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Mengwei Liu 6564d63e69 Use mv kernel for small M (#128632)

Previously we were using:
* mv kernel for M == 1
* mm kernel for 1 < M < 4
* llama.cpp inspired mm kernel for M >= 4

This PR consolidates these into only 2 kernels, using the same mv kernel for M < 12.

Benchmarked on https://github.com/malfet/llm_experiments/blob/main/metal-perf/int8mm.mm

Mac M1 Max, input size M x 4128 x 4096

![llama cpp shader and ATen shader (2)](https://github.com/pytorch/pytorch/assets/8188269/9e2e3024-c5ea-4303-88bf-ff3646296396)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128632
Approved by: https://github.com/malfet

2024-06-14 01:06:53 +00:00
conda
src	Use mv kernel for small M (#128632)	2024-06-14 01:06:53 +00:00
tools
CMakeLists.txt