pytorch/torch/_C/_cpu.pyi
Aditya Tewari 575f260229 Extend vectorization with SVE(ARM) with Torch Compile (Inductor) (#134672)
**Motivation**
Enable SVE vectorization with `torch.compile`
Extends PR: #119571

* This PR enables vectorization in the codegen path using SVE-256 (256-bit vector length)
* The changes can be extended to other SVE vector lengths
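To make "vector length" concrete: an SVE vector register is a multiple of 128 bits wide (at most 2048 bits), and the number of elements one vector holds is the vector length divided by the element width. The sketch below is illustrative arithmetic only, not Inductor's codegen. The helper name `sve_lanes` is hypothetical.

```python
# Illustrative arithmetic only (hypothetical helper, not Inductor code):
# how many elements fit in one SVE vector for a given vector length (VL).
SVE_MIN_BITS = 128
SVE_MAX_BITS = 2048


def sve_lanes(vl_bits: int, elem_bytes: int) -> int:
    """Return the number of elem_bytes-sized elements in a vl_bits SVE vector."""
    # SVE vector lengths are multiples of 128 bits, up to 2048 bits.
    if vl_bits % 128 != 0 or not (SVE_MIN_BITS <= vl_bits <= SVE_MAX_BITS):
        raise ValueError(f"invalid SVE vector length: {vl_bits}")
    if elem_bytes not in (1, 2, 4, 8):
        raise ValueError(f"unsupported element size: {elem_bytes}")
    return vl_bits // (8 * elem_bytes)


if __name__ == "__main__":
    dtypes = [("float32", 4), ("bfloat16/float16", 2), ("float64", 8), ("int8", 1)]
    for name, size in dtypes:
        print(f"SVE-256 {name}: {sve_lanes(256, size)} lanes")
```

With SVE-256, a float32 vector therefore holds 8 elements. Moving to another vector length changes only the lane count.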

I've compared the existing NEON implementation against the SVE-vectorization-enabled route for `torch.compile`.
Test results are for 8 cores on an ARM Neoverse_V1.

<img width="359" alt="Screenshot 2024-08-28 at 16 02 07" src="https://github.com/user-attachments/assets/6961fbea-8285-4ca3-b92e-934a2db50ee2">

Notably, the standalone `SiLU` op gets a `~1.8x` speedup with `torch.compile`.
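For reference, SiLU (also called swish) computes x · sigmoid(x). PyTorch exposes it as `torch.nn.functional.silu`. The scalar sketch below only pins down the per-element math that a vectorized kernel evaluates. It is not how Inductor generates the kernel, and the name `silu_ref` is hypothetical.

```python
import math


def silu_ref(x: float) -> float:
    """Scalar SiLU reference: x * sigmoid(x), using a numerically stable sigmoid."""
    if x >= 0:
        # exp(-x) <= 1 here, so there is no overflow.
        return x / (1.0 + math.exp(-x))
    # For negative x, rewrite sigmoid(x) = e^x / (1 + e^x) to avoid exp overflow.
    e = math.exp(x)
    return x * e / (1.0 + e)


if __name__ == "__main__":
    for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"silu({v}) = {silu_ref(v):.6f}")
```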

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134672
Approved by: https://github.com/jgong5, https://github.com/malfet
2024-10-10 13:20:40 +00:00

from torch.types import _bool, _int
# Defined in torch/csrc/cpu/Module.cpp
def _is_avx2_supported() -> _bool: ...
def _is_avx512_supported() -> _bool: ...
def _is_avx512_vnni_supported() -> _bool: ...
def _is_avx512_bf16_supported() -> _bool: ...
def _is_amx_tile_supported() -> _bool: ...
def _init_amx() -> _bool: ...
def _is_arm_sve_supported() -> _bool: ...
def _L1d_cache_size() -> _int: ...
def _L2_cache_size() -> _int: ...