.. role:: hidden
    :class: hidden-section

torch.backends
==============

.. automodule:: torch.backends

`torch.backends` controls the behavior of various backends that PyTorch supports.

These backends include:

- ``torch.backends.cpu``
- ``torch.backends.cuda``
- ``torch.backends.cudnn``
- ``torch.backends.cusparselt``
- ``torch.backends.mha``
- ``torch.backends.mps``
- ``torch.backends.mkl``
- ``torch.backends.mkldnn``
- ``torch.backends.nnpack``
- ``torch.backends.openmp``
- ``torch.backends.opt_einsum``
- ``torch.backends.xeon``

torch.backends.cpu
^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.cpu

.. autofunction:: torch.backends.cpu.get_cpu_capability

torch.backends.cuda
^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.cuda

.. autofunction:: torch.backends.cuda.is_built

.. currentmodule:: torch.backends.cuda.matmul
.. attribute:: allow_tf32

    A :class:`bool` that controls whether TensorFloat-32 tensor cores may be used in matrix
    multiplications on Ampere or newer GPUs. See :ref:`tf32_on_ampere`.

.. attribute:: allow_fp16_reduced_precision_reduction

    A :class:`bool` that controls whether reduced precision reductions (e.g., with fp16 accumulation type) are allowed with fp16 GEMMs.

.. attribute:: allow_bf16_reduced_precision_reduction

    A :class:`bool` that controls whether reduced precision reductions are allowed with bf16 GEMMs.
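
A minimal sketch of how these matmul flags are toggled; they are plain module attributes that may be set at any point and affect subsequent CUDA matmuls:

```python
import torch

# Trade a little float32 precision for speed on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = True

# Keep full accumulation precision in fp16/bf16 GEMM reductions.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False

print(torch.backends.cuda.matmul.allow_tf32)  # True
```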

.. currentmodule:: torch.backends.cuda
.. attribute:: cufft_plan_cache

    ``cufft_plan_cache`` contains the cuFFT plan caches for each CUDA device.
    Query a specific device `i`'s cache via `torch.backends.cuda.cufft_plan_cache[i]`.

.. currentmodule:: torch.backends.cuda.cufft_plan_cache
.. attribute:: size

    A readonly :class:`int` that shows the number of plans currently in a cuFFT plan cache.

.. attribute:: max_size

    An :class:`int` that controls the capacity of a cuFFT plan cache.

.. method:: clear()

    Clears a cuFFT plan cache.
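
Putting the pieces together, a guarded sketch (the cache is only meaningful when a CUDA device is present):

```python
import torch

if torch.cuda.is_available():
    cache = torch.backends.cuda.cufft_plan_cache[0]  # cache for device 0
    cache.max_size = 32                              # cap the number of cached plans
    torch.fft.fft(torch.randn(64, device="cuda"))    # creates and caches a plan
    print(cache.size)                                # plans currently cached
    cache.clear()                                    # drop all cached plans
```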

.. autofunction:: torch.backends.cuda.preferred_blas_library

.. autofunction:: torch.backends.cuda.preferred_rocm_fa_library

.. autofunction:: torch.backends.cuda.preferred_linalg_library

.. autoclass:: torch.backends.cuda.SDPAParams

.. autofunction:: torch.backends.cuda.flash_sdp_enabled

.. autofunction:: torch.backends.cuda.enable_mem_efficient_sdp

.. autofunction:: torch.backends.cuda.mem_efficient_sdp_enabled

.. autofunction:: torch.backends.cuda.enable_flash_sdp

.. autofunction:: torch.backends.cuda.math_sdp_enabled

.. autofunction:: torch.backends.cuda.enable_math_sdp

.. autofunction:: torch.backends.cuda.fp16_bf16_reduction_math_sdp_allowed

.. autofunction:: torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp

.. autofunction:: torch.backends.cuda.cudnn_sdp_enabled

.. autofunction:: torch.backends.cuda.enable_cudnn_sdp

.. autofunction:: torch.backends.cuda.is_flash_attention_available

.. autofunction:: torch.backends.cuda.can_use_flash_attention

.. autofunction:: torch.backends.cuda.can_use_efficient_attention

.. autofunction:: torch.backends.cuda.can_use_cudnn_attention

.. autofunction:: torch.backends.cuda.sdp_kernel
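
Each scaled-dot-product-attention backend above pairs an ``enable_*`` setter with a ``*_enabled`` query; a short sketch:

```python
import torch

# Globally allow or forbid individual SDPA backends; the dispatcher then
# picks among the enabled ones for scaled_dot_product_attention.
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_math_sdp(True)  # always-available fallback

print(torch.backends.cuda.flash_sdp_enabled())  # True
print(torch.backends.cuda.math_sdp_enabled())   # True
```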

torch.backends.cudnn
^^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.cudnn

.. autofunction:: torch.backends.cudnn.version

.. autofunction:: torch.backends.cudnn.is_available

.. attribute:: enabled

    A :class:`bool` that controls whether cuDNN is enabled.

.. attribute:: allow_tf32

    A :class:`bool` that controls whether TensorFloat-32 tensor cores may be used in cuDNN
    convolutions on Ampere or newer GPUs. See :ref:`tf32_on_ampere`.

.. attribute:: deterministic

    A :class:`bool` that, if True, causes cuDNN to only use deterministic convolution algorithms.
    See also :func:`torch.are_deterministic_algorithms_enabled` and
    :func:`torch.use_deterministic_algorithms`.

.. attribute:: benchmark

    A :class:`bool` that, if True, causes cuDNN to benchmark multiple convolution algorithms
    and select the fastest.

.. attribute:: benchmark_limit

    An :class:`int` that specifies the maximum number of cuDNN convolution algorithms to try when
    `torch.backends.cudnn.benchmark` is True. Set `benchmark_limit` to zero to try every
    available algorithm. Note that this setting only affects convolutions dispatched via the
    cuDNN v8 API.
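
The cuDNN flags above are commonly set together; a sketch of the two usual configurations (speed vs. reproducibility):

```python
import torch

# Speed: benchmark conv algorithms per input shape and cache the fastest.
torch.backends.cudnn.benchmark = True

# Reproducibility: restrict cuDNN to deterministic algorithms instead
# (typically paired with torch.use_deterministic_algorithms(True)).
# torch.backends.cudnn.deterministic = True

if torch.cuda.is_available():
    # benchmark_limit only matters with benchmark=True and the cuDNN v8 API.
    torch.backends.cudnn.benchmark_limit = 10

print(torch.backends.cudnn.benchmark)  # True
```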

.. py:module:: torch.backends.cudnn.rnn

torch.backends.cusparselt
^^^^^^^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.cusparselt

.. autofunction:: torch.backends.cusparselt.version

.. autofunction:: torch.backends.cusparselt.is_available

torch.backends.mha
^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.mha

.. autofunction:: torch.backends.mha.get_fastpath_enabled

.. autofunction:: torch.backends.mha.set_fastpath_enabled

torch.backends.mps
^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.mps

.. autofunction:: torch.backends.mps.is_available

.. autofunction:: torch.backends.mps.is_built

torch.backends.mkl
^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.mkl

.. autofunction:: torch.backends.mkl.is_available

.. autoclass:: torch.backends.mkl.verbose
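
``torch.backends.mkl.verbose`` is used as a context manager, so oneMKL verbose output is emitted only around the region of interest; a guarded sketch (verbose messages only appear when PyTorch is built with oneMKL and an MKL kernel is actually hit):

```python
import torch

x = torch.randn(64, 64)
if torch.backends.mkl.is_available():
    # Dump oneMKL verbose messages (kernel names, shapes, timings) to
    # stdout only for the ops inside this block.
    with torch.backends.mkl.verbose(torch.backends.mkl.VERBOSE_ON):
        torch.mm(x, x)
```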

torch.backends.mkldnn
^^^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.mkldnn

.. autofunction:: torch.backends.mkldnn.is_available

.. autoclass:: torch.backends.mkldnn.verbose

torch.backends.nnpack
^^^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.nnpack

.. autofunction:: torch.backends.nnpack.is_available

.. autofunction:: torch.backends.nnpack.flags

.. autofunction:: torch.backends.nnpack.set_flags

torch.backends.openmp
^^^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.openmp

.. autofunction:: torch.backends.openmp.is_available

.. Docs for other backends need to be added here.
.. Automodules are just here to ensure checks run but they don't actually
.. add anything to the rendered page for now.
.. py:module:: torch.backends.quantized
.. py:module:: torch.backends.xnnpack
.. py:module:: torch.backends.kleidiai

torch.backends.opt_einsum
^^^^^^^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.opt_einsum

.. autofunction:: torch.backends.opt_einsum.is_available

.. autofunction:: torch.backends.opt_einsum.get_opt_einsum

.. attribute:: enabled

    A :class:`bool` that controls whether opt_einsum is enabled (``True`` by default). If so,
    torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html)
    if available to calculate an optimal path of contraction for faster performance.

    If opt_einsum is not available, torch.einsum will fall back to the default contraction path
    of left to right.

.. attribute:: strategy

    A :class:`str` that specifies which strategies to try when ``torch.backends.opt_einsum.enabled``
    is ``True``. By default, torch.einsum will try the "auto" strategy, but the "greedy" and "optimal"
    strategies are also supported. Note that the "optimal" strategy is factorial on the number of
    inputs as it tries all possible paths. See more details in opt_einsum's docs
    (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html).
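
A sketch of selecting a strategy (guarded, since the flags are only meaningful when the ``opt_einsum`` package is installed); the strategy changes how the contraction path is searched, not the numerical result:

```python
import torch

if torch.backends.opt_einsum.is_available():
    torch.backends.opt_einsum.enabled = True
    torch.backends.opt_einsum.strategy = "greedy"  # or "auto", "optimal"

a, b = torch.randn(4, 5), torch.randn(5, 6)
c = torch.einsum("ij,jk->ik", a, b)  # same values whichever path is chosen
assert torch.allclose(c, a @ b)
```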

torch.backends.xeon
^^^^^^^^^^^^^^^^^^^
.. automodule:: torch.backends.xeon
.. py:module:: torch.backends.xeon.run_cpu