pytorch/cmake/Modules
maajidkhann 5a6ddbcc3b Extending the Pytorch vec backend for SVE (ARM) (#119571)
**Motivation:**
In PyTorch, ATen vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of a Vector (Vec) type that lets the programmer write code that packs various primitives (such as floats) into 256-bit and 512-bit registers. It can easily be extended to support other ISAs by adding more VecISA subclasses.

**Reference Link:** https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec

**This PR:**

* Our goal with this contribution is to add an SVE backend for Vec in ATen's CPU vectorization, which can benefit any Arm CPU that supports SVE.

* More about the SVE ISA for ARM: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions)

* We are using the ARM C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics) to accelerate various operators in the SVE backend for Vec.

* Currently, we are adding support only for SVE with a vector length of 256 bits (SVE 256). In the future, we plan to extend SVE support to other vector lengths as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571
Approved by: https://github.com/malfet, https://github.com/snadampal

Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>
2024-09-18 18:59:10 +00:00
FindARM.cmake Extending the Pytorch vec backend for SVE (ARM) (#119571) 2024-09-18 18:59:10 +00:00
FindAtlas.cmake
FindAVX.cmake
FindBenchmark.cmake
FindBLAS.cmake Fix FindBLAS.cmake (#129713) 2024-06-28 02:15:16 +00:00
FindBLIS.cmake
FindCUB.cmake
FindCUDAToolkit.cmake [Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489) 2024-08-15 17:11:52 +00:00
FindCUDSS.cmake SparseCsrCUDA: cuDSS backend for linalg.solve (#129856) 2024-08-22 07:57:30 +00:00
FindCUSPARSELT.cmake
FindFlexiBLAS.cmake
FindGloo.cmake
FindITT.cmake
FindLAPACK.cmake
FindMAGMA.cmake
FindMKL.cmake Fix mkl-static issue for Windows. (#130697) 2024-07-15 19:28:11 +00:00
FindMKLDNN.cmake Add oneDNN BRGEMM support on CPU (#131878) 2024-09-07 13:22:30 +00:00
FindNCCL.cmake
FindNuma.cmake
FindOpenBLAS.cmake
FindOpenMP.cmake
FindOpenTelemetryApi.cmake
Findpybind11.cmake
FindSanitizer.cmake
FindSYCLToolkit.cmake xpu: fix 3rd party builds on systems with cmake<3.25 (#135767) 2024-09-12 05:31:01 +00:00
FindvecLib.cmake
FindVSX.cmake
FindZVECTOR.cmake
README.md

This folder contains various custom CMake modules for finding libraries and packages. Details about some of them are listed below.

FindOpenMP.cmake

This is modified from the file included in the CMake 3.13 release, with the following changes:

  • Replace VERSION_GREATER_EQUAL with NOT ... VERSION_LESS as VERSION_GREATER_EQUAL is not supported in CMake 3.5 (our min supported version).

  • Update the separate_arguments commands to avoid NATIVE_COMMAND, which is not supported in CMake 3.5 (our min supported version).

  • Make it respect the QUIET flag so that, when it is set, try_compile failures are not reported.

  • For AppleClang compilers, use -Xpreprocessor instead of -Xclang, as the latter is not documented.

  • For AppleClang compilers, an extra flag option is tried: -Xpreprocessor -fopenmp -I${DIR_OF_omp_h}, where ${DIR_OF_omp_h} is obtained by running find_path on omp.h with brew's default include directory as a hint. Without this, the compiler complains about missing headers, because they are not natively included in Apple's LLVM.

  • For non-GNU compilers, whenever we try a candidate OpenMP flag, first try it while linking directly against MKL's libomp, if MKL provides one. Otherwise, we may end up linking two OpenMP runtimes and hit this nasty error:

    OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already
    initialized.
    
    OMP: Hint This means that multiple copies of the OpenMP runtime have been
    linked into the program. That is dangerous, since it can degrade performance
    or cause incorrect results. The best thing to do is to ensure that only a
    single OpenMP runtime is linked into the process, e.g. by avoiding static
    linking of the OpenMP runtime in any library. As an unsafe, unsupported,
    undocumented workaround you can set the environment variable
    KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but
    that may cause crashes or silently produce incorrect results. For more
    information, please see http://openmp.llvm.org/
    

    See NOTE [ Linking both MKL and OpenMP ] for details.
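The CMake 3.5 workarounds, the QUIET handling, and the omp.h lookup described in the list above can be illustrated with a small script-mode sketch. It is hypothetical and is not copied from FindOpenMP.cmake. The variable names (_flags_str, _flags_list, _try_compile_ok, _omp_h_dir) are made up, and a fixed FALSE stands in for a real try_compile result.

```cmake
# Hypothetical sketch of the idioms described above; not taken from FindOpenMP.cmake.
# Run in script mode: cmake -P openmp_compat_sketch.cmake
cmake_minimum_required(VERSION 3.5)

# 1. VERSION_GREATER_EQUAL needs CMake >= 3.7, so write "version >= 3.7"
#    as "NOT (version < 3.7)". Binary tests are evaluated before NOT.
if(NOT CMAKE_VERSION VERSION_LESS 3.7)
  message(STATUS "CMake ${CMAKE_VERSION} is 3.7 or newer")
else()
  message(STATUS "CMake ${CMAKE_VERSION} is older than 3.7")
endif()

# 2. separate_arguments(<var> NATIVE_COMMAND <args>) needs CMake >= 3.9.
#    NATIVE_COMMAND behaves like WINDOWS_COMMAND on a Windows host and
#    like UNIX_COMMAND elsewhere, so make that choice explicitly.
set(_flags_str "-fopenmp -O2")
if(CMAKE_HOST_WIN32)
  separate_arguments(_flags_list WINDOWS_COMMAND "${_flags_str}")
else()
  separate_arguments(_flags_list UNIX_COMMAND "${_flags_str}")
endif()
message(STATUS "Split flags: ${_flags_list}")

# 3. find_package(OpenMP QUIET) sets OpenMP_FIND_QUIETLY to TRUE;
#    only report a failed candidate flag when it is not set.
set(_try_compile_ok FALSE)  # stand-in for a real try_compile() result
if(NOT _try_compile_ok AND NOT OpenMP_FIND_QUIETLY)
  message(STATUS "OpenMP flag candidate failed to compile")
endif()

# 4. AppleClang: look for omp.h, hinting at Homebrew's default include
#    directory (/usr/local/include on Intel Macs, /opt/homebrew/include on Apple Silicon).
find_path(_omp_h_dir NAMES omp.h HINTS "/usr/local/include" "/opt/homebrew/include")
if(_omp_h_dir)  # false when the result is "_omp_h_dir-NOTFOUND"
  message(STATUS "Found omp.h in ${_omp_h_dir}")
else()
  message(STATUS "omp.h not found")
endif()
```

The explicit CMAKE_HOST_WIN32 branch mirrors how NATIVE_COMMAND picks its parsing mode, so older CMake versions split flag strings the same way newer ones would.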