pytorch/docs/source
Benjamin Glass 5aa5a5763e [inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684)
Triton 2.2 and later have a bug where enabling TF32 generation for a GPU that does not support TF32 causes code-generation errors. Work around this problem by:

1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format.
2. Using that function to explicitly disable TF32 generation when calling Triton, where needed.
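The capability check in step 1 can be sketched in terms of the device's CUDA compute capability, which `torch.cuda.get_device_capability()` reports as a `(major, minor)` tuple; TF32 requires Ampere-class hardware (major version 8) or newer. The helper name below is hypothetical and is not necessarily the API this PR adds to `torch.cuda` — it is a minimal sketch of the logic, written over a plain capability tuple so it runs without a GPU:

```python
def tf32_capable(capability):
    """Return True if a device with this (major, minor) CUDA compute
    capability supports the TF32 format.

    TF32 was introduced with Ampere (compute capability 8.0), so any
    major version >= 8 qualifies; earlier architectures (e.g. Turing's
    7.5) do not.  Hypothetical helper -- not the actual PR API.
    """
    major, _minor = capability
    return major >= 8


# In real code the tuple would come from PyTorch, e.g.:
#   capability = torch.cuda.get_device_capability()
#   allow_tf32 = allow_tf32 and tf32_capable(capability)
```

Inductor would then pass `allow_tf32=False` (or the equivalent Triton kernel option) whenever this check fails, rather than relying on Triton to reject the unsupported format gracefully.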

To reproduce the original failure, run `test/inductor/test_max_autotune.py` without this fix on a GPU with CUDA compute capability < 8 (e.g. a pre-Ampere consumer GPU such as an RTX 20-series card).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684
Approved by: https://github.com/eqy
2025-01-28 22:01:08 +00:00
_static Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
_templates Add an option for classic search (#142018) 2024-12-06 01:24:52 +00:00
community
elastic
notes change the test wheel to release wheel when release wheel available (#145252) 2025-01-28 21:23:53 +00:00
rpc
scripts [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
accelerator.rst [BE][accelerator] formalize API name {current,set}_device_{idx => index} (#140542) 2024-12-12 10:53:48 +00:00
amp.rst
autograd.rst
backends.rst Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) 2025-01-23 18:50:59 +00:00
benchmark_utils.rst
bottleneck.rst
checkpoint.rst
complex_numbers.rst
cond.rst
conf.py Revert "Add flop formula for _scaled_mm (#144872)" 2025-01-16 15:16:18 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst
cuda._sanitizer.rst
cuda.rst [inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684) 2025-01-28 22:01:08 +00:00
cuda.tunable.rst [ROCm] Fix TunableOp UTs: Rotating Buffer (#143172) 2024-12-14 06:18:11 +00:00
cuda_environment_variables.rst
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst
ddp_comm_hooks.rst
debugging_environment_variables.rst
deploy.rst
deterministic.rst
distributed.algorithms.join.rst
distributed.checkpoint.rst
distributed.elastic.rst
distributed.fsdp.fully_shard.rst [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-07 01:24:28 +00:00
distributed.optim.rst
distributed.pipelining.rst [pipelining] Update tutorials and documentation (#143045) 2024-12-12 18:42:17 +00:00
distributed.rst [C10D] Update docs for wait() (#143305) 2024-12-17 00:41:11 +00:00
distributed.tensor.parallel.rst
distributed.tensor.rst [dtensor] expose the __create_chunk_list__ in the doc (#144100) 2025-01-03 20:06:23 +00:00
distributions.rst
dlpack.rst
docutils.conf
export.ir_spec.rst [export] Update docs (#142011) 2024-12-05 03:44:46 +00:00
export.programming_model.rst fix formatting in programming model doc (#143587) 2024-12-20 07:09:19 +00:00
export.rst torch export programming model (#143546) 2024-12-19 16:56:13 +00:00
fft.rst
fsdp.rst
func.api.rst
func.batch_norm.rst
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
future_mod.rst
futures.rst
fx.experimental.rst Add truediv support in export serializer (#136364) 2024-12-05 17:33:33 +00:00
fx.rst
hub.rst
index.rst Add Torchao docs link to Pytorch libraries (#145412) 2025-01-24 17:11:20 +00:00
jit.rst
jit_builtin_functions.rst
jit_language_reference.rst
jit_language_reference_v2.rst
jit_python_reference.rst
jit_unsupported.rst
jit_utils.rst
library.rst [Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588) 2025-01-27 19:22:43 +00:00
linalg.rst
logging.rst
masked.rst
math-quantizer-equation.png
meta.rst
miscellaneous_environment_variables.rst
mobile_optimizer.rst
model_zoo.rst
module_tracker.rst
monitor.rst
mps.rst [MPS] Expose MPSProfiler::start/stopCapture to Python (#144561) 2025-01-11 02:05:36 +00:00
mps_environment_variables.rst
mtia.memory.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
mtia.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
multiprocessing.rst
name_inference.rst
named_tensor.rst
nested.rst Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
nn.attention.bias.rst
nn.attention.experimental.rst
nn.attention.flex_attention.rst
nn.attention.rst
nn.functional.rst
nn.init.rst
nn.rst
onnx.rst
onnx_dynamo.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx_dynamo_memory_usage.rst Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139) 2025-01-03 20:41:36 +00:00
onnx_dynamo_onnxruntime_backend.rst
onnx_torchscript.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx_torchscript_supported_aten_ops.rst
optim.rst
package.rst
profiler.rst
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst Add support for prototype affine quantization in pt2e flow (#141421) 2024-12-24 04:22:18 +00:00
quantization.rst [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505) 2024-12-13 22:26:22 +00:00
random.rst
rpc.rst
signal.rst
size.rst
sparse.rst
special.rst
storage.rst
tensor_attributes.rst
tensor_view.rst
tensorboard.rst
tensors.rst
testing.rst
threading_environment_variables.rst
torch.ao.ns._numeric_suite.rst
torch.ao.ns._numeric_suite_fx.rst
torch.compiler.config.rst
torch.compiler.rst
torch.compiler_aot_inductor.rst [AOTI][doc] Update tutorial (#143390) 2024-12-17 18:35:40 +00:00
torch.compiler_aot_inductor_minifier.rst Aoti minifier flatten (#141156) 2024-12-06 07:12:45 +00:00
torch.compiler_api.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_best_practices_for_backends.rst
torch.compiler_cudagraph_trees.rst [CUDAGraph][Docs] add cuda to torch.randn (#144793) 2025-01-15 18:02:10 +00:00
torch.compiler_custom_backends.rst
torch.compiler_dynamic_shapes.rst
torch.compiler_dynamo_deepdive.rst
torch.compiler_dynamo_overview.rst
torch.compiler_fake_tensor.rst
torch.compiler_faq.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_fine_grain_apis.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_get_started.rst
torch.compiler_inductor_profiling.rst
torch.compiler_ir.rst
torch.compiler_nn_module.rst
torch.compiler_performance_dashboard.rst
torch.compiler_profiling_torch_compile.rst
torch.compiler_transformations.rst
torch.compiler_troubleshooting.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_troubleshooting_old.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.overrides.rst
torch.rst Transform unbacked int expressions into a fresh unbacked int. (#141917) 2024-12-05 16:53:44 +00:00
torch_cuda_memory.rst
torch_environment_variables.rst
torch_nccl_environment_variables.rst
type_info.rst
utils.rst
xpu.rst Add get_stream_from_external API for XPU backend (#141123) 2024-12-31 11:15:52 +00:00