pytorch/docs/source
eqy 790763b0fe Add an option to disable reduced precision reductions for FP16 GEMM (#67946)
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has a substantial performance impact for common GEMM shapes (e.g., those found in popular instantiations of multi-headed attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions,
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`,
rather than making that the default behavior.

CC ngimel ptrblck stas00

Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
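
For reference, a minimal sketch of using the toggle (the tensor shapes are arbitrary, and a CUDA-enabled build is assumed):

```python
import torch

# Setting the flag to False forbids reduced-precision (FP16) reductions
# inside FP16 GEMMs, matching the stricter behavior introduced by #67578.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

if torch.cuda.is_available():
    a = torch.randn(128, 256, device="cuda", dtype=torch.float16)
    b = torch.randn(256, 64, device="cuda", dtype=torch.float16)
    # This GEMM now accumulates without reduced-precision reductions,
    # trading some throughput for accuracy on architectures like Volta.
    c = a @ b

# Re-enable the faster reduced-precision reductions (the default).
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
```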

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946

Reviewed By: zou3519

Differential Revision: D32289896

Pulled By: ngimel

fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
2021-11-09 17:27:20 -08:00
_static clarify the documentation of torch.meshgrid (#62977) 2021-08-18 04:01:22 -07:00
_templates
community Update contribution_guide.rst (#64142) 2021-08-30 19:26:59 -07:00
elastic (torchelastic) make --max_restarts explicit in the quickstart and runner docs (#65838) 2021-09-29 19:29:01 -07:00
notes Add an option to disable reduced precision reductions for FP16 GEMM (#67946) 2021-11-09 17:27:20 -08:00
rpc Support Union in TorchScript (#64234) 2021-09-03 06:12:24 -07:00
scripts [docs] Add images to some activation functions (#65415) 2021-09-22 11:05:29 -07:00
__config__.rst
amp.rst rebase for autocast updates to include device_type and dtype flags (#61002) 2021-08-10 20:03:12 -07:00
autograd.rst Update extending doc to cover forward mode AD (#66962) 2021-10-27 14:18:38 -07:00
backends.rst Add an option to disable reduced precision reductions for FP16 GEMM (#67946) 2021-11-09 17:27:20 -08:00
benchmark_utils.rst
bottleneck.rst
checkpoint.rst
complex_numbers.rst Grammatical update of tech docs (#61547) 2021-07-14 14:01:59 -07:00
conf.py [Quant] Add dynamic QAT Linear module (#67325) 2021-11-08 10:24:25 -08:00
cpp_extension.rst
cpp_index.rst
cuda.rst [CUDA graphs] Beta, not prototype (#65247) 2021-09-20 13:32:36 -07:00
cudnn_persistent_rnn.rst Remove orphan from cuDNN persistent note (#65160) 2021-09-21 11:09:47 -07:00
cudnn_rnn_determinism.rst
data.rst Add a warning about DataLoader num_workers > 0 "memory leak" (#64337) 2021-09-01 21:49:41 -07:00
ddp_comm_hooks.rst [DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352) 2021-09-01 17:37:19 -07:00
distributed.algorithms.join.rst Add tutorial link (#62785) 2021-08-05 17:28:02 -07:00
distributed.elastic.rst
distributed.optim.rst [distributed][docs] Delete distributed optimizer section from RPC and add reference to namespace docs page (#68068) 2021-11-09 15:01:54 -08:00
distributed.rst Update distributed.rst to show that CUDA send/recv on GPU is supported (#65601) 2021-09-24 12:30:10 -07:00
distributions.rst
dlpack.rst
docutils.conf
fft.rst C++ API and docs for hfftn (#66127) 2021-10-07 12:48:36 -07:00
futures.rst
fx.rst fx: Update fx.rst (#68043) 2021-11-09 14:00:45 -08:00
hub.rst
index.rst Make _Join, _Joinable, _JoinHook public (#62605) 2021-08-03 12:20:11 -07:00
jit.rst Back out "D30740897 Add fusion enabled apis" (#64500) 2021-09-04 20:55:58 -07:00
jit_builtin_functions.rst
jit_language_reference.rst Document torch.jit.is_tracing() (#67326) 2021-10-28 09:56:09 -07:00
jit_language_reference_v2.rst Document torch.jit.is_tracing() (#67326) 2021-10-28 09:56:09 -07:00
jit_python_reference.rst
jit_unsupported.rst
linalg.rst Create linalg.matrix_exp (#62715) 2021-10-19 09:07:15 -07:00
math-quantizer-equation.png
mobile_optimizer.rst
model_zoo.rst
multiprocessing.rst
name_inference.rst
named_tensor.rst
nn.functional.rst
nn.init.rst
nn.rst Implements the orthogonal parametrization (#62089) 2021-08-30 13:12:07 -07:00
onnx.rst [Doc] [ONNX]Fix a broken url for ONNXRuntime custom op (#67944) 2021-11-08 15:51:02 -08:00
optim.rst To add SequentialLR to PyTorch Core Schedulers (#64037) 2021-09-09 09:36:32 -07:00
package.rst [package] add some docs describing how to debug dependencies (#65704) 2021-09-27 12:14:23 -07:00
pipeline.rst fixed comments referring to the fairscale master branch (#65531) 2021-09-23 14:37:58 -07:00
profiler.rst
quantization-support.rst [Quant] Add dynamic QAT Linear module (#67325) 2021-11-08 10:24:25 -08:00
quantization.rst pytorch quantization: document the custom module APIs (#67449) 2021-10-29 05:22:17 -07:00
random.rst
rpc.rst [distributed][docs] Delete distributed optimizer section from RPC and add reference to namespace docs page (#68068) 2021-11-09 15:01:54 -08:00
sparse.rst
special.rst [special] special alias for softmax (#62251) 2021-10-01 03:55:32 -07:00
storage.rst
tensor_attributes.rst
tensor_view.rst Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179) 2021-10-13 07:44:43 -07:00
tensorboard.rst
tensors.rst [numpy] add torch.argwhere (#64257) 2021-10-30 15:26:11 -07:00
testing.rst [Doc] make_tensor to torch.testing module (#63925) 2021-08-30 12:25:40 -07:00
torch.ao.ns._numeric_suite.rst Quantization docs: add pages for Numeric Suite (Eager and FX) (#66380) 2021-10-11 18:47:58 -07:00
torch.ao.ns._numeric_suite_fx.rst Quantization docs: add pages for Numeric Suite (Eager and FX) (#66380) 2021-10-11 18:47:58 -07:00
torch.overrides.rst
torch.rst [numpy] add torch.argwhere (#64257) 2021-10-30 15:26:11 -07:00
type_info.rst clarify that torch.finfo.tiny is the smallest normal number (#63241) 2021-08-18 13:44:52 -07:00