[CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no explicit sync: the old semantics synced the default stream with all leaf streams
optimizer.step()  # consumes .grads on the default stream (optimizer assumed to exist)
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    optimizer.step()  # would consume .grads; NOT safe here under the old semantics
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any CUDA op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe and the benign-looking pattern should be safe. Implementation-wise, this means backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.
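Under the new semantics the benign-looking pattern behaves as expected. A minimal sketch, assuming `loss` comes from a forward pass already enqueued on the default stream and a hypothetical `optimizer`:
```python
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())  # make s wait for the forward pass
with torch.cuda.stream(s):
    loss.backward()
    # backward() syncs s (the thread's current stream) with all leaf streams,
    # so it IS safe to consume .grads here
    optimizer.step()
```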

After https://github.com/pytorch/pytorch/pull/57833, backward() syncs the calling thread's current stream AND the default stream with all leaf streams at the end of backward. The default-stream syncs were retained temporarily for backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.
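Code that relied on the old default-stream syncs (the weird pattern above) now needs an explicit sync before touching grads on the default stream. A minimal sketch, under the same assumptions as above:
```python
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    loss.backward()
# the default stream is no longer synced with the leaf streams automatically,
# so it must wait on s (which backward() did sync with the leaf streams)
torch.cuda.current_stream().wait_stream(s)
optimizer.step()
```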

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).
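For illustration, a sketch of capturing forward and the entire backward() in one graph, assuming the `torch.cuda.graph` context manager (a convenience wrapper around `CUDAGraph.capture_begin`/`capture_end` added in releases after this PR):
```python
import torch

model = torch.nn.Linear(64, 64).cuda()
static_input = torch.randn(16, 64, device="cuda")

# warm up on a side stream; capture requires a few eager iterations first
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model.zero_grad(set_to_none=True)
        model(static_input).sum().backward()
torch.cuda.current_stream().wait_stream(s)

# capture forward AND the full backward() in a single graph
g = torch.cuda.CUDAGraph()
model.zero_grad(set_to_none=True)
with torch.cuda.graph(g):
    static_loss = model(static_input).sum()
    static_loss.backward()

# replays recompute grads from the current contents of static_input
static_input.copy_(torch.randn(16, 64, device="cuda"))
g.replay()
```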

** The first paragraph of the linked note has a formatting error, which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D29342234

Pulled By: ngimel

fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63