pytorch/test/cpp/rpc
Pritam Damania bf85642c4c Remove lock from GraphTask::set_exception_without_signal. (#45867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45867

In most cases the lock ordering was to acquire a lock in local autograd first
and then acquire a lock in DistAutogradContext.

In the case of `set_exception_without_signal`, the lock order was reversed, and
as a result we saw potential deadlock issues in our TSAN tests. To fix this, I
removed the lock and used `std::atomic` exchange instead.
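
As a rough illustration of the lock-free approach described above, the sketch
below shows how an atomic exchange can record "an error was set" exactly once
without taking a mutex. `ToyGraphTask` and its members are hypothetical
stand-ins invented for this example, not the actual `GraphTask` code, and the
return value is added only to make the behavior observable.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a graph task; NOT the real GraphTask class.
struct ToyGraphTask {
  std::atomic<bool> has_error{false};
  std::exception_ptr error;  // written only by the caller that wins the exchange

  // Returns true if this call recorded the error, false if one was already set.
  bool set_exception_without_signal(std::exception_ptr e) {
    // exchange(true) atomically stores true and returns the previous value,
    // so exactly one caller sees `false`. No mutex is acquired here, which
    // means this function cannot take part in a lock-order inversion.
    if (!has_error.exchange(true)) {
      error = std::move(e);
      return true;
    }
    return false;
  }
};

// Small sequential demo: the first call wins, the second is ignored.
inline void demo_first_writer_wins() {
  ToyGraphTask task;
  bool first = task.set_exception_without_signal(
      std::make_exception_ptr(std::runtime_error("first")));
  bool second = task.set_exception_without_signal(
      std::make_exception_ptr(std::runtime_error("second")));
  assert(first);
  assert(!second);
  assert(task.has_error.load());
}
```

In this sketch only the flag is synchronized; any thread that later reads
`error` would still need its own synchronization with the writer.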

In addition to this, I fixed TestE2E to ensure that we use the appropriate
timeout.

TestE2EProcessGroup was flaky for these two reasons and is now fixed.
ghstack-source-id: 113592709

Test Plan: waitforbuildbot.

Reviewed By: albanD

Differential Revision: D24120962

fbshipit-source-id: 12447b84ceae772b91e9a183c90d1e6340f44e66
2020-10-05 20:02:29 -07:00
..
CMakeLists.txt Build test_e2e_tensorpipe only if Gloo is enabled (#43041) 2020-08-14 09:24:47 -07:00
e2e_test_base.cpp Enroll TensorPipe agent in C++-only E2E test (#42680) 2020-08-13 07:07:30 -07:00
e2e_test_base.h Enroll TensorPipe agent in C++-only E2E test (#42680) 2020-08-13 07:07:30 -07:00
test_e2e_process_group.cpp Remove lock from GraphTask::set_exception_without_signal. (#45867) 2020-10-05 20:02:29 -07:00
test_e2e_tensorpipe.cpp [Codemod][GleanFbcode] Remove dead includes in caffe2/test (#43953) 2020-09-01 21:48:28 -07:00
test_tensorpipe_serialization.cpp Make Channel API accept buffer structs rather than raw pointers. (#45014) 2020-09-21 10:18:45 -07:00
test_wire_serialization.cpp Back out "Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count" (#37893) 2020-05-05 22:43:15 -07:00