pytorch/test/cpp
Pritam Damania bf85642c4c Remove lock from GraphTask::set_exception_without_signal. (#45867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45867

In most cases, the lock order was to acquire a lock in local autograd first
and then acquire a lock in DistAutogradContext.

In `set_exception_without_signal`, the lock order was reversed, and as a
result we saw potential deadlock issues in our TSAN tests. To fix this, I
removed the lock and instead used a `std::atomic` exchange.
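
As a rough illustration of the approach described above, here is a minimal sketch of recording an error with an atomic exchange instead of a mutex. The `GraphTaskSketch` struct and its members are hypothetical stand-ins, not the actual PyTorch `GraphTask` code; the sketch only shows how `exchange` lets exactly one caller win without holding a lock, so this path no longer takes part in any lock ordering.

```cpp
#include <atomic>
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for GraphTask; names and layout are assumptions
// made for illustration, not the real PyTorch class.
struct GraphTaskSketch {
  std::atomic<bool> has_error{false};
  // Written only by the single caller that wins the exchange below.
  // Readers are assumed to synchronize separately before reading it.
  std::exception_ptr error;

  // Returns true if this call recorded the exception, false if an
  // earlier call had already recorded one.
  bool set_exception_without_signal(std::exception_ptr e) {
    // exchange() atomically stores true and returns the previous value,
    // so only the first caller observes false. No mutex is acquired.
    if (!has_error.exchange(true)) {
      error = std::move(e);
      return true;
    }
    return false;
  }
};
```

In this sketch the first call records its exception and returns true; any later call leaves the stored exception untouched and returns false.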

In addition to this, I fixed TestE2E to ensure that we use the appropriate
timeout.

TestE2EProcessGroup was flaky for these two reasons and is now fixed.
ghstack-source-id: 113592709

Test Plan: waitforbuildbot.

Reviewed By: albanD

Differential Revision: D24120962

fbshipit-source-id: 12447b84ceae772b91e9a183c90d1e6340f44e66
2020-10-05 20:02:29 -07:00
..
api [c++] Distance-agnostic triplet margin loss (#45377) 2020-09-30 12:37:35 -07:00
common
dist_autograd Fix Windows build failure after DDP PR merged (#45335) 2020-09-25 12:37:50 -07:00
jit Enable TorchBind tests on ROCm (#45426) 2020-10-05 09:38:12 -07:00
rpc Remove lock from GraphTask::set_exception_without_signal. (#45867) 2020-10-05 20:02:29 -07:00
tensorexpr Enable TorchBind tests on ROCm (#45426) 2020-10-05 09:38:12 -07:00
__init__.py remediation of S205607 2020-07-17 17:19:47 -07:00