pytorch/torch
Pritam Damania bf85642c4c Remove lock from GraphTask::set_exception_without_signal. (#45867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45867

In most cases the lock ordering was: hold a lock in local autograd, then
hold a lock in DistAutogradContext.

In the case of `set_exception_without_signal` the lock order was reversed,
and as a result we saw potential deadlock issues in our TSAN tests. To fix
this, I removed the lock and instead used a std::atomic exchange.

In addition, I fixed TestE2E to ensure that it uses the appropriate
timeout.

TestE2EProcessGroup was flaky for these two reasons and is now fixed.
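A minimal sketch of the pattern described above (not the actual PyTorch
code; the type, field, and member names here are hypothetical): setting the
error flag via std::atomic::exchange takes no GraphTask lock, so the call is
safe even while a DistAutogradContext lock is already held.

```cpp
#include <atomic>
#include <exception>
#include <utility>

// Hypothetical stand-in for GraphTask; names are illustrative, not the
// real ones in torch/csrc/autograd.
struct GraphTaskSketch {
  std::atomic<bool> has_error_{false};
  std::exception_ptr exception_;

  // Marks the task as errored without signaling and without taking any
  // lock. exchange() atomically sets the flag and returns its previous
  // value, so exactly one caller wins the race and records the exception.
  bool set_exception_without_signal(std::exception_ptr e) {
    if (!has_error_.exchange(true)) {
      exception_ = std::move(e);
      return true;  // this caller recorded the exception
    }
    return false;  // an exception was already set by another thread
  }
};
```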
ghstack-source-id: 113592709

Test Plan: waitforbuildbot.

Reviewed By: albanD

Differential Revision: D24120962

fbshipit-source-id: 12447b84ceae772b91e9a183c90d1e6340f44e66
2020-10-05 20:02:29 -07:00
_C Change type inferred from empty annotation (#45360) 2020-10-05 15:16:56 -07:00
autograd fixed formatting in function rstrings in torch.autograd.functional (#45849) 2020-10-05 13:39:01 -07:00
backends
contrib
csrc Remove lock from GraphTask::set_exception_without_signal. (#45867) 2020-10-05 20:02:29 -07:00
cuda Use MTA for amp grad unscaling, enforce op math type in MTA functors, and allow op lambdas (#44778) 2020-10-01 07:51:16 -07:00
distributed [NCCL] Support NCCL Send/Recv (#44921) 2020-10-05 18:27:57 -07:00
distributions [docs] Update docs for NegativeBinomial (#45693) 2020-10-01 23:20:34 -07:00
fft
for_onnx
futures
fx [FX][WIP] Mutable Graph APIs (#45227) 2020-10-05 17:07:08 -07:00
jit Change type inferred from empty annotation (#45360) 2020-10-05 15:16:56 -07:00
legacy
lib [NCCL] create NCCL communicator for send/recv on demand (#44922) 2020-10-05 18:33:03 -07:00
linalg
multiprocessing
nn [docs] Fix EmbeddingBag docs (#45763) 2020-10-05 15:56:35 -07:00
onnx Revert D23398534: [pytorch][PR] [ONNX] Improve error handling for adaptive_pool 2020-10-05 15:16:59 -07:00
optim Add more tests for mt optimizers (#45475) 2020-09-28 23:59:58 -07:00
package [package] Add dependency viz (#45214) 2020-09-28 15:38:41 -07:00
quantization [FX][WIP] Mutable Graph APIs (#45227) 2020-10-05 17:07:08 -07:00
sparse
testing [NCCL] Support NCCL Send/Recv (#44921) 2020-10-05 18:27:57 -07:00
utils CUDA BFloat16 infrastructure (#44925) 2020-10-02 16:21:30 -07:00
__config__.py
__future__.py
__init__.py [quant] creating quint4x2 dtype for quantized tensors (#44678) 2020-10-01 23:53:34 -07:00
_appdirs.py
_classes.py
_jit_internal.py [JIT] Enable @unused syntax for ignoring properties (#45261) 2020-09-29 10:24:25 -07:00
_linalg_utils.py
_lobpcg.py Backward support for generalized eigenvalue solver with LOBPCG in forward [only k-rank SYMEIG case] (#43002) 2020-09-28 07:22:35 -07:00
_lowrank.py
_namedtensor_internals.py
_ops.py
_six.py
_storage_docs.py
_tensor_docs.py [numpy] Add torch.nan_to_num (#44592) 2020-10-05 01:38:56 -07:00
_tensor_str.py
_torch_docs.py [numpy] Add torch.nan_to_num (#44592) 2020-10-05 01:38:56 -07:00
_utils.py
_utils_internal.py
_VF.py
_vmap_internals.py
abi-check.cpp
CMakeLists.txt Enable TorchBind tests on ROCm (#45426) 2020-10-05 09:38:12 -07:00
custom_class.h
custom_class_detail.h
extension.h
functional.py [docs] Add 3D reduction example to tensordot docs (#45697) 2020-10-05 15:36:59 -07:00
hub.py
library.h
overrides.py [numpy] Add torch.nan_to_num (#44592) 2020-10-05 01:38:56 -07:00
py.typed
quasirandom.py Type check quasirandom (#45434) 2020-09-28 16:49:38 -07:00
random.py
README.txt
script.h
serialization.py
storage.py
tensor.py Makes rdiv consistent with div (#45407) 2020-09-29 08:34:01 -07:00
types.py

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.
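A hedged illustration of the distinction this note draws. The accessor
shown is from the legacy TH C API (exact names vary across TH versions),
and the struct-field access is a deliberately bad example of the kind of
code the note warns against:

```cpp
// Illustrative only; exact TH function names vary across versions.
#include <TH/THTensor.h>

void use_tensor(THFloatTensor* t) {
  // Preferred: go through the public C API declared in THTensor.h.
  THFloatTensor_resize2d(t, 3, 4);

  // Abstraction violation (Note [TH abstraction violation]): reaching
  // into fields defined in THTensor.hpp, e.g. `t->sizes_` or similar
  // internals. Such code must be updated whenever THTensor's guts are
  // refactored, which is why those sites are marked with this note.
}
```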