**Motivation**

Enable SVE vectorization with `torch.compile`. Extends PR #119571.

* This PR enables vectorization in the codegen part using SVE-256 (vector length).
* The changes can be extended to other SVE vector lengths.

I've compared the existing NEON implementation against the SVE-enabled vectorization route for `torch.compile`. Test results are for 8 cores on an ARM Neoverse V1.

<img width="359" alt="Screenshot 2024-08-28 at 16 02 07" src="https://github.com/user-attachments/assets/6961fbea-8285-4ca3-b92e-934a2db50ee2">

It's worth mentioning that the standalone `SiLU` op sees a ~1.8x speedup with `torch.compile`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134672
Approved by: https://github.com/jgong5, https://github.com/malfet
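As a hedged illustration (not code from this PR), here is one way such a standalone `SiLU` comparison could be reproduced. The tensor shape, dtype, and iteration count are arbitrary assumptions, not values taken from the PR:

```python
import time
import torch
import torch.nn.functional as F

# Arbitrary input; the PR does not specify the benchmark shape or dtype.
x = torch.randn(1024, 1024)

compiled_silu = torch.compile(F.silu)
compiled_silu(x)  # warm-up call so one-time compilation cost is excluded

def bench(fn, iters=100):
    """Average wall-clock time per call, in seconds."""
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

print(f"eager:    {bench(F.silu) * 1e6:.1f} us/iter")
print(f"compiled: {bench(compiled_silu) * 1e6:.1f} us/iter")
```

On an SVE-capable core with this PR applied, the compiled path would use the SVE-256 vectorized codegen; on other hardware it falls back to the existing vectorization routes.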
from torch.types import _bool, _int

# Defined in torch/csrc/cpu/Module.cpp

# x86 SIMD/AMX feature detection and initialization
def _is_avx2_supported() -> _bool: ...
def _is_avx512_supported() -> _bool: ...
def _is_avx512_vnni_supported() -> _bool: ...
def _is_avx512_bf16_supported() -> _bool: ...
def _is_amx_tile_supported() -> _bool: ...
def _init_amx() -> _bool: ...

# ARM SVE feature detection
def _is_arm_sve_supported() -> _bool: ...

# CPU cache sizes
def _L1d_cache_size() -> _int: ...
def _L2_cache_size() -> _int: ...
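For reference, a minimal sketch of querying these bindings at runtime. `torch._C._cpu` is a private module, so treat this as an illustration of the interface typed above rather than stable API; the `getattr` guards assume a given probe may be absent on older PyTorch builds:

```python
import torch

_cpu = torch._C._cpu  # private bindings typed by the stub above

# Each probe may be missing on older builds, hence the getattr guard.
for name in (
    "_is_avx2_supported",
    "_is_avx512_supported",
    "_is_arm_sve_supported",
):
    probe = getattr(_cpu, name, None)
    print(name, probe() if probe is not None else "unavailable")

# Cache sizes as reported by the bindings (units are not documented in the stub).
print("L1d cache:", _cpu._L1d_cache_size())
print("L2 cache: ", _cpu._L2_cache_size())
```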