mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
Summary: For similar reasons as documented in the `[Sync Streams]` note. For a current example, `ProcessGroupNCCL::allgather` must also call `recordStream` and does so already. The output tensor is created on the default stream (by the application). NCCL/RCCL internally uses another stream (i.e., ncclStream). If we do not record the output tensor on the ncclStream, there is a chance that the output tensor might be deallocated while NCCL/RCCL is using it. The application is not aware of the ncclStream since it's internal to ProcessGroupNCCL. So, the application cannot record the output tensor on the ncclStream. Patch originally developed by sarunyap. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46603 Reviewed By: srinivas212 Differential Revision: D24458530 fbshipit-source-id: b02e74d1c3a176ea1b9bbdd7dc671b221fcadaef |
||
|---|---|---|
| .. | ||
| _C | ||
| autograd | ||
| backends | ||
| contrib | ||
| csrc | ||
| cuda | ||
| distributed | ||
| distributions | ||
| fft | ||
| for_onnx | ||
| futures | ||
| fx | ||
| jit | ||
| legacy | ||
| lib | ||
| linalg | ||
| multiprocessing | ||
| nn | ||
| onnx | ||
| optim | ||
| package | ||
| quantization | ||
| sparse | ||
| testing | ||
| utils | ||
| __config__.py | ||
| __future__.py | ||
| __init__.py | ||
| _appdirs.py | ||
| _classes.py | ||
| _jit_internal.py | ||
| _linalg_utils.py | ||
| _lobpcg.py | ||
| _lowrank.py | ||
| _namedtensor_internals.py | ||
| _ops.py | ||
| _six.py | ||
| _storage_docs.py | ||
| _tensor_docs.py | ||
| _tensor_str.py | ||
| _torch_docs.py | ||
| _utils.py | ||
| _utils_internal.py | ||
| _VF.py | ||
| _vmap_internals.py | ||
| abi-check.cpp | ||
| CMakeLists.txt | ||
| custom_class.h | ||
| custom_class_detail.h | ||
| extension.h | ||
| functional.py | ||
| hub.py | ||
| library.h | ||
| overrides.py | ||
| py.typed | ||
| quasirandom.py | ||
| random.py | ||
| README.txt | ||
| script.h | ||
| serialization.py | ||
| storage.py | ||
| tensor.py | ||
| types.py | ||
Note [TH abstraction violation] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ TH/THC provide some hpp headers, which are proper C++ headers rather than C headers. These headers serve double duty as *internal implementation detail* headers, whose contents should largely not be used by external clients. Ideally, we would not install these headers at all; instead, you should use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`) to manipulate these structs. However, there are a few places in torch/csrc where we violate this abstraction. They are marked with a pointer to this note. Each of those sites will have to be refactored when we refactor the guts of THTensor and related structures.