pytorch/docs/source
Benjamin Glass 5aa5a5763e [inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684)
Triton 2.2 and later have a bug where enabling TF32 generation for a GPU that does not support TF32 causes code-generation errors. Work around this problem by:

1. Adding a function to `torch.cuda` that determines whether CUDA hardware is capable of using the TF32 format.
2. Using that function to explicitly disable TF32 generation when calling Triton, where needed.
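The capability check in step 1 can be sketched in terms of the device's CUDA compute capability, which `torch.cuda.get_device_capability()` reports as a `(major, minor)` tuple; TF32 requires Ampere-class hardware (major version 8) or newer. The helper name below is hypothetical and is not necessarily the API this PR adds to `torch.cuda` — it is a minimal sketch of the logic, written over a plain capability tuple so it runs without a GPU:

```python
def tf32_capable(capability):
    """Return True if a device with this (major, minor) CUDA compute
    capability supports the TF32 format.

    TF32 was introduced with Ampere (compute capability 8.0), so any
    major version >= 8 qualifies; earlier architectures (e.g. Turing's
    7.5) do not.  Hypothetical helper -- not the actual PR API.
    """
    major, _minor = capability
    return major >= 8


# In real code the tuple would come from PyTorch, e.g.:
#   capability = torch.cuda.get_device_capability()
#   allow_tf32 = allow_tf32 and tf32_capable(capability)
```

Inductor would then pass `allow_tf32=False` (or the equivalent Triton kernel option) whenever this check fails, rather than relying on Triton to reject the unsupported format gracefully.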

To reproduce the original failure, run `test/inductor/test_max_autotune.py` without this fix on a GPU with CUDA compute capability < 8 (e.g. a pre-Ampere consumer GPU such as an RTX 20-series card).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145684
Approved by: https://github.com/eqy
2025-01-28 22:01:08 +00:00
_static Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
_templates Add an option for classic search (#142018) 2024-12-06 01:24:52 +00:00
community
elastic
notes change the test wheel to release wheel when release wheel available (#145252) 2025-01-28 21:23:53 +00:00
rpc
scripts [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
accelerator.rst [BE][accelerator] formalize API name {current,set}_device_{idx => index} (#140542) 2024-12-12 10:53:48 +00:00
amp.rst
autograd.rst
backends.rst Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) 2025-01-23 18:50:59 +00:00
benchmark_utils.rst
bottleneck.rst
checkpoint.rst
complex_numbers.rst
cond.rst
conf.py Revert "Add flop formula for _scaled_mm (#144872)" 2025-01-16 15:16:18 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cpu.rst
cuda._sanitizer.rst
cuda.rst [inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684) 2025-01-28 22:01:08 +00:00
cuda.tunable.rst [ROCm] Fix TunableOp UTs: Rotating Buffer (#143172) 2024-12-14 06:18:11 +00:00
cuda_environment_variables.rst
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst
ddp_comm_hooks.rst
debugging_environment_variables.rst
deploy.rst
deterministic.rst
distributed.algorithms.join.rst
distributed.checkpoint.rst
distributed.elastic.rst
distributed.fsdp.fully_shard.rst [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-07 01:24:28 +00:00
distributed.optim.rst
distributed.pipelining.rst [pipelining] Update tutorials and documentation (#143045) 2024-12-12 18:42:17 +00:00
distributed.rst [C10D] Update docs for wait() (#143305) 2024-12-17 00:41:11 +00:00
distributed.tensor.parallel.rst
distributed.tensor.rst [dtensor] expose the __create_chunk_list__ in the doc (#144100) 2025-01-03 20:06:23 +00:00
distributions.rst
dlpack.rst
docutils.conf
export.ir_spec.rst [export] Update docs (#142011) 2024-12-05 03:44:46 +00:00
export.programming_model.rst fix formatting in programming model doc (#143587) 2024-12-20 07:09:19 +00:00
export.rst torch export programming model (#143546) 2024-12-19 16:56:13 +00:00
fft.rst
fsdp.rst
func.api.rst
func.batch_norm.rst
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
future_mod.rst
futures.rst
fx.experimental.rst Add truediv support in export serializer (#136364) 2024-12-05 17:33:33 +00:00
fx.rst
hub.rst
index.rst Add Torchao docs link to Pytorch libraries (#145412) 2025-01-24 17:11:20 +00:00
jit.rst
jit_builtin_functions.rst
jit_language_reference.rst
jit_language_reference_v2.rst
jit_python_reference.rst
jit_unsupported.rst
jit_utils.rst
library.rst [Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588) 2025-01-27 19:22:43 +00:00
linalg.rst
logging.rst
masked.rst
math-quantizer-equation.png
meta.rst
miscellaneous_environment_variables.rst
mobile_optimizer.rst
model_zoo.rst
module_tracker.rst
monitor.rst
mps.rst [MPS] Expose MPSProfiler::start/stopCapture to Python (#144561) 2025-01-11 02:05:36 +00:00
mps_environment_variables.rst
mtia.memory.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
mtia.rst Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" 2024-12-21 04:04:16 +00:00
multiprocessing.rst
name_inference.rst
named_tensor.rst
nested.rst Update OSS nested tensor docs to focus on NJT (#145402) 2025-01-25 04:08:19 +00:00
nn.attention.bias.rst
nn.attention.experimental.rst
nn.attention.flex_attention.rst
nn.attention.rst
nn.functional.rst
nn.init.rst
nn.rst
onnx.rst
onnx_dynamo.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx_dynamo_memory_usage.rst Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139) 2025-01-03 20:41:36 +00:00
onnx_dynamo_onnxruntime_backend.rst
onnx_torchscript.rst [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) 2025-01-08 21:44:43 +00:00
onnx_torchscript_supported_aten_ops.rst
optim.rst
package.rst
profiler.rst
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst Add support for prototype affine quantization in pt2e flow (#141421) 2024-12-24 04:22:18 +00:00
quantization.rst [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505) 2024-12-13 22:26:22 +00:00
random.rst
rpc.rst
signal.rst
size.rst
sparse.rst
special.rst
storage.rst
tensor_attributes.rst
tensor_view.rst
tensorboard.rst
tensors.rst
testing.rst
threading_environment_variables.rst
torch.ao.ns._numeric_suite.rst
torch.ao.ns._numeric_suite_fx.rst
torch.compiler.config.rst
torch.compiler.rst
torch.compiler_aot_inductor.rst [AOTI][doc] Update tutorial (#143390) 2024-12-17 18:35:40 +00:00
torch.compiler_aot_inductor_minifier.rst Aoti minifier flatten (#141156) 2024-12-06 07:12:45 +00:00
torch.compiler_api.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_best_practices_for_backends.rst
torch.compiler_cudagraph_trees.rst [CUDAGraph][Docs] add cuda to torch.randn (#144793) 2025-01-15 18:02:10 +00:00
torch.compiler_custom_backends.rst
torch.compiler_dynamic_shapes.rst
torch.compiler_dynamo_deepdive.rst
torch.compiler_dynamo_overview.rst
torch.compiler_fake_tensor.rst
torch.compiler_faq.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_fine_grain_apis.rst [export] add is_exporting flag (#142425) 2024-12-18 21:36:28 +00:00
torch.compiler_get_started.rst
torch.compiler_inductor_profiling.rst
torch.compiler_ir.rst
torch.compiler_nn_module.rst
torch.compiler_performance_dashboard.rst
torch.compiler_profiling_torch_compile.rst
torch.compiler_transformations.rst
torch.compiler_troubleshooting.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.compiler_troubleshooting_old.rst Rename cache limit to recompile limit in configs (#143709) 2024-12-22 10:03:57 +00:00
torch.overrides.rst
torch.rst Transform unbacked int expressions into a fresh unbacked int. (#141917) 2024-12-05 16:53:44 +00:00
torch_cuda_memory.rst
torch_environment_variables.rst
torch_nccl_environment_variables.rst
type_info.rst
utils.rst
xpu.rst Add get_stream_from_external API for XPU backend (#141123) 2024-12-31 11:15:52 +00:00