Commit graph

205 commits

Author SHA1 Message Date
Yu, Guangye
891ba2ec8a Fix xpu cmake typo (#140374)
# Motivation
This PR aims to fix a typo in the CMake build. The typo impacts the XPU Windows build and results in PyTorch being built without XPU, which is unexpected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140374
Approved by: https://github.com/EikanWang, https://github.com/ezyang, https://github.com/atalman
2024-11-13 00:26:35 +00:00
Yu, Guangye
8051ee802c Add XPU compiler version control in cmake to keep BC (#139258)
# Motivation
This PR aims to maintain backward compatibility when building PyTorch XPU with the old and new compilers.

# Additional Context
The details are described here. The new compiler (2025.0.0) has some breaking changes compared with the old compiler(2024.1), for examples:
1. On Windows, sycl library is named `sycl7.lib` in the old compiler but is named `sycl.lib` in the new compiler.
2. On Linux, in order to support ABI=0, we have to link `libsycl-preview.so` in the old compiler but we could link `libsycl.so` in the new compiler to have the same ABI compatibility.
3. We added a macro `SYCL_COMPILER_VERSION` to support our new code has good backward compatibility with the old compiler. Now the new feature(Event elapsed_time, memory summary, and device architecture property) introduced by the new compiler will be controlled within the macro `SYCL_COMPILER_VERSION`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139258
Approved by: https://github.com/EikanWang, https://github.com/atalman, https://github.com/gujinghui
2024-11-09 13:31:21 +00:00
Irem Yuksel
b021486405 Enable Windows Arm64 (#133088)
This PR enables Pytorch for Windows on Arm64 - CPU only.
Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible.
We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a BLAS option, which is introduced in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088
Approved by: https://github.com/malfet

Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com>
Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com>
Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
2024-10-24 16:10:44 +00:00
maajidkhann
5a6ddbcc3b Extending the Pytorch vec backend for SVE (ARM) (#119571)
**Motivation:**
In Pytorch, Aten vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of Vector (Vec) type that allows the programmer to write code packing various primitives (such as floats) within 256bit & 512bits registers. It can be extended to support other ISAs easily by adding more VecISA sub-classes.

**Reference Link:** https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec

**This PR:**

* Our goal with this contribution is to add support for SVE backend for Vec in the Aten vectorization for CPU backend which can be benefitted by any ARM architecture supported CPU's that supports SVE.

* More about SVE ISA for ARM: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions)

* We are using the ARM C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics ) to accelerate performance for various operators in the SVE backend for Vec.

* Currently we are adding support only for SVE ISA with the vector length of 256 bits (SVE 256). In future, we plan to extend this SVE support for other vector lengths as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571
Approved by: https://github.com/malfet, https://github.com/snadampal

Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>
2024-09-18 18:59:10 +00:00
Dmitry Rogozhkin
9852c6d236 xpu: fix 3rd party builds on systems with cmake<3.25 (#135767)
Cmake LINUX variable is available on starting from cmake 3.25. Better to use CMAKE_SYSTEM_NAME instead to relax cmake version requirement.

See: https://cmake.org/cmake/help/v3.25/variable/LINUX.html
Fixes: #135766
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135767
Approved by: https://github.com/malfet, https://github.com/guangyey
2024-09-12 05:31:01 +00:00
CaoE
f7c0c06692 Add oneDNN BRGEMM support on CPU (#131878)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131878
Approved by: https://github.com/jgong5, https://github.com/peterbell10
2024-09-07 13:22:30 +00:00
min-jean-cho
ecbd715363 [Intel GPU][Windows] Fix overriding default CMAKE_CXX_FLAGS (#135093)
The root cause is that `/EHsc` is part of the default `CMAKE_CXX_FLAGS` in CMake.
Fix to not override the default `CMAKE_CXX_FLAGS`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135093
Approved by: https://github.com/EikanWang, https://github.com/atalman
2024-09-05 12:52:43 +00:00
Edward Z. Yang
a258844a32 Properly handle empty CPUINFO variable (#134916)
Fixes https://github.com/pytorch/pytorch/issues/134915

But I did not root cause why CPUINFO is totally empty to begin with...

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134916
Approved by: https://github.com/Skylion007
2024-09-03 15:59:59 +00:00
Yu, Guangye
3402a5d865 fix windows xpu build issue (#133845)
# Motivation
If build XPU via oneAPI 2024.2, it will fail because `sycl-preview.lib` exists in windows. And linking the unexpected lib results in `error LNK2019: unresolved external symbol`.

# Solution
Use explicitly `sycl-preview` in linux build only.

# Additional Context
For `find_library`, please note that the variable will not be updated if it has been stored.
```
If the library is found the result is stored in the variable and the search will not be repeated unless the variable is cleared.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133845
Approved by: https://github.com/min-jean-cho, https://github.com/EikanWang, https://github.com/atalman, https://github.com/malfet
2024-08-29 23:53:32 +00:00
Zitong Zhan
90c821814e SparseCsrCUDA: cuDSS backend for linalg.solve (#129856)
This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve.
Fixes #69538

Minimum example of usage:
```
import torch

if __name__ == '__main__':
    spd = torch.rand(4, 3)
    A = spd.T @ spd
    b = torch.rand(3).to(torch.float64).cuda()
    A = A.to_sparse_csr().to(torch.float64).cuda()

    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856
Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn

Co-authored-by: Zihang Fang <zhfang1108@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2024-08-22 07:57:30 +00:00
Mikayla Gawarecki
018e48c337 [Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489)
Reland #130633

USE_CUFILE turned off by default in this version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489
Approved by: https://github.com/albanD
2024-08-15 17:11:52 +00:00
Yu, Guangye
92bebb46fa Support XPU ABI=0 build (#130110)
# Motivation
This PR intends to support ABI=0 build for XPU backend.

# Additional Context
The major change is adding a compilation option `-D__INTEL_PREVIEW_BREAKING_CHANGES` for the host compiler(gcc) and `-fpreview-breaking-changes` for XPU device kernel code compiler(icpx), why?
Because we use
- gcc to compile host code and link SYCL runtime. So we need to pass `-D__INTEL_PREVIEW_BREAKING_CHANGES` to tell the host compiler invoking the ABI-neutral API included in SYCL. And
- use icpx to compile device kernel code and link SYCL runtime. So we need to pass `-fpreview-breaking-changes` to tell the device kernel compiler building ABI-neutral code. Besides,
- `libsycl-preview.so` is an ABI-neutral library but `libsycl.so` is not.

This PR depends on https://github.com/pytorch/pytorch/pull/131643.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130110
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
2024-08-01 21:42:14 +00:00
PyTorch MergeBot
e191b83462 Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633)"
This reverts commit 709ddf7a9d.

Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607))
2024-07-26 18:08:20 +00:00
Mikayla Gawarecki
709ddf7a9d Add wrappers for synchronous GPUDirect Storage APIs (#130633)
Based in part on https://github.com/NVIDIA/apex/pull/1774

Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-25 22:23:38 +00:00
PyTorch MergeBot
e4b5645f83 Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633)"
This reverts commit 5b5e0698a5.

Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738))
2024-07-23 17:19:34 +00:00
Mikayla Gawarecki
5b5e0698a5 Add wrappers for synchronous GPUDirect Storage APIs (#130633)
Based in part on https://github.com/NVIDIA/apex/pull/1774

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-22 14:51:24 +00:00
Xu Han
f1456c74a0 Fix mkl-static issue for Windows. (#130697)
Background:
We found the pytorch Windows release/2.4 performance regression: https://github.com/pytorch/pytorch/issues/130619

After some debug works, I found the pytorch Windows static mkl build options are wrong:
<img width="1049" alt="image" src="https://github.com/user-attachments/assets/38692142-bfca-4c98-8092-6e105c82bb13">
1. Thread lib is wrong.
2. Miss `openmp` lib and config.
> Debug history: https://github.com/pytorch/pytorch/issues/130619#issuecomment-2226782504 and https://github.com/pytorch/pytorch/issues/130619#issuecomment-2226418611

This PR will fix `mkl-static` build options issue.
<img width="863" alt="image" src="https://github.com/user-attachments/assets/834f6cee-7e6d-4d74-b2bc-8a270f05e429">

Reference:
<img width="482" alt="image" src="https://github.com/user-attachments/assets/8184dadb-f230-4062-a49f-51df1d7285f5">

https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.c6izlg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130697
Approved by: https://github.com/jgong5, https://github.com/atalman
2024-07-15 19:28:11 +00:00
Nikita Shulga
fe4032fe20 [BE][CMake] Do not use EXEC_PROGRAM (#129714)
It was deprecated since CMake-3.0 in favor of `execute_process`, see https://cmake.org/cmake/help/v3.18/command/exec_program.html

This makes the following warning disappear:
```
CMake Warning (dev) at cmake/Modules/FindARM.cmake:5 (EXEC_PROGRAM):
  Policy CMP0153 is not set: The exec_program command should not be called.
  Run "cmake --help-policy CMP0153" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.

  Use execute_process() instead.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129714
Approved by: https://github.com/kit1980
2024-06-28 13:29:52 +00:00
Nikita Shulga
4b598d87d3 Fix FindBLAS.cmake (#129713)
Fixes regression introduced by https://github.com/pytorch/pytorch/pull/125227 by adding `INCLUDE(CheckFunctionExists)` that fixes
```
CMake Error at cmake/Modules/FindBLAS.cmake:413 (check_function_exists):
  Unknown CMake command "check_function_exists".
```

Fixes https://github.com/pytorch/pytorch/issues/129693

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129713
Approved by: https://github.com/kit1980
2024-06-28 02:15:16 +00:00
vinithakv
f8db12a538 Fix logic to find sbgemm in BLAS library (#125227)
Current logic to set the HAS_SBGEMM flag is ignored in case the BLAS libraries are found already, ie, if set from environment variable BLAS=OpenBLAS . If BLAS_LIBRARIES are already set the code to find if BLAS_LIBRARY has sbgemm is never executed. The following commit brings out this logic outside unconditionally.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125227
Approved by: https://github.com/malfet
2024-06-25 16:34:38 +00:00
sdp
b4a0161449 Build SYCL kernels for ATen XPU ops on Native Windows (take 2) (#127390)
Original PR https://github.com/pytorch/pytorch/pull/126725 is closed due to bad rebase.

-------
As proposed in https://github.com/pytorch/pytorch/issues/126719, we are enabling PyTorch XPU on Native Windows on Intel GPU.

This PR  enables XPU build on Windows as the first step of #126719:

- Enable `USE_XPU` build on Windows using MSVC as host compiler. The use of MSVC as host compiler seamlessly aligns with the existing PyTorch build on Windows.
- Build oneDNN GPU library on Windows.

Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127390
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/ezyang
2024-06-06 01:41:06 +00:00
cyy
3d617333e7 Simplify CMake code (#127683)
Due to the recent adoption of find(python), it is possible to further simplify some CMake code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127683
Approved by: https://github.com/ezyang
2024-06-05 15:17:31 +00:00
cyy
d44daebdbc [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-31 01:20:45 +00:00
cyy
8777443d73 Remove FindMatlabMex.cmake (#127414)
It is not used anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127414
Approved by: https://github.com/ezyang
2024-05-30 16:26:35 +00:00
Dmitry Rogozhkin
9f73c65b8f xpu: pass MAX_JOBS building xpu_mkldnn_proj (#126562)
mkldnn is quite big project and MAX_JOBS support is essential when building on a system with big number of cpus and limited memory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126562
Approved by: https://github.com/jgong5, https://github.com/guangyey, https://github.com/albanD
2024-05-30 12:10:33 +00:00
PyTorch MergeBot
67739d8c6f Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 699db7988d.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))
2024-05-30 01:16:57 +00:00
cyy
8ea1dc8748 Use Python::NumPy target (#127399)
Now that we use FindPython, use it again for numpy detection.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127399
Approved by: https://github.com/malfet
2024-05-29 23:17:58 +00:00
Nikita Shulga
0910429d72 [BE][CMake] Use FindPython module (#124613)
As FindPythonInterp and FindPythonLibs has been deprecated since cmake-3.12

Replace `PYTHON_EXECUTABLE` with `Python_EXECUTABLE` everywhere (CMake variable names are case-sensitive)

This makes PyTorch buildable with python3 binary shipped with XCode on MacOS

TODO: Get rid of `FindNumpy` as its part of Python package
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124613
Approved by: https://github.com/cyyever, https://github.com/Skylion007
2024-05-29 13:17:35 +00:00
cyy
699db7988d [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-29 11:58:03 +00:00
PyTorch MergeBot
cdbb2c9acc Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 4fdbaa794f.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))
2024-05-29 03:02:35 +00:00
cyy
4fdbaa794f [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-27 03:54:03 +00:00
Bas Zalmstra
a8eac0efa8 fix: unknown CMake command "check_function_exists" (#126165)
When building pytorch with OpenBLAS on windows I ran into this CMake issue:

```
CMake Error at cmake/Modules/FindLAPACK.cmake:137 (check_function_exists):
  Unknown CMake command "check_function_exists".
Call Stack (most recent call first):
  cmake/Dependencies.cmake:1745 (find_package)
  CMakeLists.txt:708 (include)
```

Similarly described here: https://discuss.pytorch.org/t/cmake-with-error-by-compiling-on-windows-with-mingw32-make/159140

This PR fixes this issue by adding:

```
include(CheckFunctionExists)
```

To the offending CMake file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126165
Approved by: https://github.com/ezyang
2024-05-14 20:54:06 +00:00
cyy
83845a7c78 [1/2] Remove caffe2 db and distributed from build system (#125092)
This PR tries to decompose https://github.com/pytorch/pytorch/pull/122527 into a smaller one. Caffe2 db, distributed and some binaries have been removed.
To be noted, this was inspired and is co-dev with @r-barnes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125092
Approved by: https://github.com/malfet
2024-05-04 06:48:46 +00:00
Aleksei Nikiforov
2b5ae2611e s390x: use runtime detection for vectorization support (#123936)
s390x: use runtime detection for vectorization support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123936
Approved by: https://github.com/malfet, https://github.com/jansel, https://github.com/xuhancn
2024-05-03 21:34:37 +00:00
aaitzhan
e3627d05e7 [CMake] Add NVPL BLAS/LAPACK option (#125268)
This PR add a [NVPL](https://docs.nvidia.com/nvpl/introduction.html) BLAS/LAPACK option to CMake for `aarch64` (ARM) machines.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125268
Approved by: https://github.com/albanD
2024-05-01 17:26:28 +00:00
cyy
04c6424fbf Remove caffe2 image and video (#125045)
This PR tries to decompose https://github.com/pytorch/pytorch/pull/122527 into a smaller one. Caffe2 image and video folders are removed along with the related CMake code.
To be noted, this was inspired and is co-dev with @r-barnes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125045
Approved by: https://github.com/eqy, https://github.com/albanD
2024-04-30 17:31:57 +00:00
Xu Han
44bb5da529 Fix mkl cmake not support static mkl on Windows. (#124925)
Fixes #124869

Fix mkl not support static library on Windows.
# Local test:
## MKL static:
![image](https://github.com/pytorch/pytorch/assets/8433590/9c6ee5f8-9844-4383-acbd-6b22aff06daa)
MKL backend check:
<img width="724" alt="Image" src="https://github.com/pytorch/pytorch/assets/8433590/e45e12a5-2dfc-47a1-ad94-32a667bd4799">

## MKL shared, original path:
![image](https://github.com/pytorch/pytorch/assets/8433590/27a822c7-c4ab-4e5f-bbdb-8c4b085140e5)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124925
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-04-25 14:21:15 +00:00
Chirag Pandya
fd90991790 [rfc] opentelemetry in pytorch (#122999)
1. Add current latest version (opentelemetry-cpp version v1.14.2) to PyTorch library.
Steps:
```
$cd pytorch
$git submodule add https://github.com/open-telemetry/opentelemetry-cpp.git third_party/opentelemetry-cpp
$cd third_party/opentelemetry-cpp
$git checkout v1.14.2
$git add third_party/opentelemetry-cpp .gitmodules
$git commit
```
Expected change in checkout size:
```
(/home/cpio/local/a/pytorch-env) [cpio@devvm17556.vll0 ~/local/pytorch (gh/c-p-i-o/otel)]$ git count-objects -vH
count: 654
size: 3.59 MiB
in-pack: 1229701
packs: 17
size-pack: 1.17 GiB
prune-packable: 76
garbage: 0
size-garbage: 0 bytes
```

2.

TODO
- [x] Figure out how dynamic linking works. App builders will somehow need to `target_include` opentelemetry-cpp at runtime.
- [ ] Examples on how to use opentelemetry + pytorch
- [ ] Tests + documentation (e.g. using null opentelemetry implementation).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122999
Approved by: https://github.com/ezyang
2024-04-21 15:20:21 +00:00
ZhiweiYan-96
9875a834e4 [Intel GPU] oneDNN GPU GEMM support (#117202)
# Motivation

This PR is a part of RFC #114848, and it  is a successor PR of #116249 and #116019. This PR would depend on oneDNN compilation in #116249. Some runtime support is needed in #116019.

Aten operators like `addmm`, `baddmm` is defined in `Blas.cpp` in `aten/src/ATen/native/mkldnn/xpu/`.

Accompanied with these files provide core functionaliy, `BlasImpl.h`, `Utils.h` and other file provide basic utilities for them. For instance, `Utils.h` provide common memory descriptor query utils for `Matmul.h` and these utility function will also be used in other primitive, like `convolution`.  `BlasImpl.h` is a header file that provide helper for handling shape info processing in matmul related operators. It would not only help basic GEMM operator like `addmm, baddmm` but also help fusion operators used in `torch.compile` like `linear_pointwise` in #117824.

In next stage, we would continually complete the oneDNN support through enabling  `matmul fusion`  and `convolution` related code.

Co-authored-by: xiaolil1 <xiaoli.liu@intel.com>
Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117202
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/malfet
ghstack dependencies: #117098, #117112
2024-04-17 23:06:38 +00:00
ZhiweiYan-96
1cdde98df4 Intel GPU oneDNN upstreaming for library compilation (#117098)
# Motivation

As proposed  in https://github.com/pytorch/pytorch/issues/114848 and https://github.com/pytorch/pytorch/issues/114723, oneDNN library is an important component for Intel GPU software ecosystem.

This PR is intended to enable oneDNN compilation for Intel GPU.  It is the first step for we enabling any operators like `at::baddmm`.
With this PR, a static library `libdnnl.a` for GPU would be compiled in directory `/build/xpumkldnn_proj-prefix`.  It can be further linked to `libtorch_xpu.so` in future. The compilation would  depend on `USE_XPU` bool variables and runtime check like SYCL, which is defined in https://github.com/pytorch/pytorch/pull/116019 for runtime support. Once the #116019 merged, the compilation should be able to be triggered.

The modification is independent to oneDNN CPU compilation, hence no modification would be introduced for CPU Cmakefiles(e.g. FindMKLDNN.cmake)

Co-authored-by: xiaolil1 <xiaoli.liu@intel.com>
Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117098
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/atalman
2024-04-12 13:46:22 +00:00
Nikita Shulga
291848bf30 [Build] Fix AVX detection logic (#122708)
`CXX_AVX[2|512]_FOUND` flags should indicate whether compiler supports generating code  for given instruction set, rather than whether host machine can run the generated code.

This fixes a weird problem that surfaced after https://github.com/pytorch/pytorch/pull/122503 when builder can sometimes be dispatched to an old CPU architecture, that can not run AVX512 instructions, but can compile for those just fine

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122708
Approved by: https://github.com/jeanschmidt
2024-03-26 20:37:35 +00:00
CaoE
6bd1807ae9 enable mkl_gemm_f16f16f32 in cpublas::gemm (#118367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118367
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2024-01-31 18:37:42 +00:00
yanbing-j
4b4e6550f2 Update oneDNN build option for older systems (#118057)
Fixes [#116623](https://github.com/pytorch/pytorch/issues/116623).

As we discussed in https://github.com/pytorch/pytorch/issues/116623#issuecomment-1900406773 and https://github.com/pytorch/pytorch/issues/116623#issuecomment-1900825829, we update oneDNN build option to support older systems and document we only support CPUs with SSE4.1+.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118057
Approved by: https://github.com/malfet
2024-01-25 11:34:51 +00:00
mantaionut
6784594532 Fix sparse windows on CPU with MKL (#102604)
Fix https://github.com/pytorch/pytorch/issues/97352.
This PR changes the way the linking to intel MKL is done and updating MKL on Windows to mkl-2021.4.0 .
There are for both conda and pip packages MKL  version with which you can link dynamically. mkl-devel contains the static versions of the dlls and MKL contains the needed dlls for the runtime. MKL dlls and static libs starting with  2021.4.0 have the version in their names( for MKL 2023 we have mkl_core.2.dll and for 2021.4.0 we have mkl_core.1.dll) so its possible to have multiple versions installed and it will work properly.
For the wheel build, I added dependency for whell MKL and on conda a dependecy for the conda MKL  and on libtorch I copied the MKL binaries in libtorch.
In order to test this PR I have to use custom builder https://github.com/pytorch/builder/pull/1467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102604
Approved by: https://github.com/IvanYashchuk, https://github.com/malfet
2024-01-23 17:41:18 +00:00
Yu, Guangye
50049cfaa0 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), The first runtime component we would like to upstream is `Device` which contains the device management functions of Intel GPU's runtime. To facilitate the code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
Intel GPU device is a wrapper of sycl device on which kernels can be executed. In our design, we will maintain a sycl device pool containing all the GPU devices of the current machine, and manage the status of the device pool by PyTorch. The thread local safe is considered in this design. The corresponding C++ files related to `Device` will be placed in c10/xpu folder. And we provide the c10 device runtime APIs, like
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-12 07:36:25 +00:00
PyTorch MergeBot
9ac0e6971a Revert "[1/4] Intel GPU Runtime Upstreaming for Device (#116019)"
This reverts commit b4cebe2c34.

Reverted https://github.com/pytorch/pytorch/pull/116019 on behalf of https://github.com/malfet due to Broke internal and periodic buck builds, see https://github.com/pytorch/pytorch/actions/runs/7414664129/job/20176215868 ([comment](https://github.com/pytorch/pytorch/pull/116019#issuecomment-1879030285))
2024-01-05 17:36:39 +00:00
Yu, Guangye
b4cebe2c34 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), The first runtime component we would like to upstream is `Device` which contains the device management functions of Intel GPU's runtime. To facilitate the code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
Intel GPU device is a wrapper of sycl device on which kernels can be executed. In our design, we will maintain a sycl device pool containing all the GPU devices of the current machine, and manage the status of the device pool by PyTorch. The thread local safe is considered in this design. The corresponding C++ files related to `Device` will be placed in c10/xpu folder. And we provide the c10 device runtime APIs, like
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-04 17:35:04 +00:00
Sunita Nadampalli
db8f9686a7 [cmake] set 'mcpu=generic' as the default build flag for mkldnn on aarch64 (#113820)
This is to remove the dependencies on mkldnn cmake default definitions

Fixes https://github.com/pytorch/pytorch/issues/109312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113820
Approved by: https://github.com/malfet
2023-11-22 02:49:33 +00:00
Alin Pahontu
21d77bcf80 added path to correct directory containing headers (#110063)
After make install the headers are placed in include/openblas/ folder instead of include/ folder. Updated FindOpenBLAS.cmake to make that change clear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110063
Approved by: https://github.com/Blackhex, https://github.com/kit1980
2023-10-04 21:56:36 +00:00
Andrei Gheorghe
2028987bf7 Fix finding Intel MKL on Windows, as well as LAPACK, cuDNN and cuSPARSELt (#108040)
Fixes #108039

Intel MKL is now found correctly:

-- MKL libraries: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_intel_lp64.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_sequential.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_core.lib
-- MKL include directory: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/include

and LAPACK too (excerpt from build.ninja):

LINK_LIBRARIES = lib\c10.lib  lib\pthreadpool.lib  lib\cpuinfo.lib  lib\XNNPACK.lib  lib\fbgemm.lib  lib\libittnotify.lib  lib\gloo.lib  lib\foxi_loader.lib  lib\kineto.lib  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\**mkl_lapack95_lp64.lib**"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"

cuSPARSELt is also found correctly:

-- Found CUSPARSELT: C:/Program Files/NVIDIA cuSPARSELt/v0.4/lib/cusparseLt.lib

Also cuDNN include directory is properly added for the test target cuda_cudnn_test:

build caffe2\CMakeFiles\cuda_cudnn_test.dir\__\aten\src\ATen\test\cuda_cudnn_test.cpp.obj: CXX_COMPILER__cuda_cudnn_test_RelWithDebInfo C$:\work\Repos\pytorch\aten\src\ATen\test\cuda_cudnn_test.cpp || cmake_object_order_depends_target_cuda_cudnn_test
  DEFINES = ....
  FLAGS = ....
  INCLUDES = -IC:\work\Repos\pytorch\build\aten\src -IC:\work\Repos\pytorch\aten\src ........... -external:IC:\work\Repos\pytorch\third_party\ittapi\include -external:IC:\work\Repos\pytorch\cmake\..\third_party\eigen -external:I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include" -external:IC:\work\Repos\pytorch\torch\include -external:IC:\work\Repos\pytorch\third_party\ideep\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest **-external:I"C:\Program Files\NVIDIA cuDNN\include"** -external:IC:\work\Repos\pytorch\cmake\..\third_party\cudnn_frontend\include -external:W0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108040
Approved by: https://github.com/ezyang
2023-09-08 14:41:00 +00:00