pytorch/test/cpp/rpc
Pritam Damania fd41ed1cce Fix flaky TestTrainingLoop - TestE2ETensorPipe (#51939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51939

TestTrainingLoop - TestE2ETensorPipe was flaky because background RPCs could
still be in flight while the assertions ran. Since the test did not wait for
all RPCs on the agent to finish, these assertions could fail.

To resolve this issue, this PR calls join() and shutdown() on the RPC agent so
that no further RPCs are in flight, and only then asserts on the map sizes to
verify that no entries leaked.
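
The join-then-assert ordering can be illustrated with a small standalone sketch. `FakeAgent`, `send()`, and `pendingCount()` are hypothetical stand-ins and not the TensorPipe agent's API. Background threads play the role of in-flight RPCs, and the assertion on the pending set is only safe once join() has waited for all of them.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an RPC agent (not the TensorPipe agent's API).
// Each "RPC" is recorded in a pending set and completed later by a
// background thread, mimicking a response that arrives asynchronously.
class FakeAgent {
 public:
  // Must be called from a single thread; workers_ is not synchronized.
  void send(std::uint64_t messageId) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.insert(messageId);
    }
    workers_.emplace_back([this, messageId] {
      // Simulate latency before the response is handled.
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.erase(messageId);
    });
  }

  // Wait for every in-flight RPC to finish. Checking pendingCount()
  // before this call would race with the background threads.
  void join() {
    for (auto& worker : workers_) {
      if (worker.joinable()) {
        worker.join();
      }
    }
    workers_.clear();
  }

  std::size_t pendingCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return pending_.size();
  }

  ~FakeAgent() { join(); }

 private:
  std::mutex mutex_;
  std::unordered_set<std::uint64_t> pending_;
  std::vector<std::thread> workers_;
};
```

Without the join() call, the same assertion would pass or fail depending on thread scheduling, which is the kind of flakiness described above.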

In addition, this PR adds a messageIdToTimeout map to look up the appropriate
timeout for a messageId. This ensures the correct entry is removed from the
map. The previous solution passed the expirationTime through the lambda, but
there was no guarantee that the lambda would be reading the response to the
request that had just been sent.
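
The messageIdToTimeout lookup can be sketched as follows, assuming a hypothetical `TimeoutTracker` that keeps an ordered map from expiration time to message IDs alongside a messageIdToTimeout map. Only the name messageIdToTimeout comes from the description above; the class, its methods, and the layout of the expiration-time map are illustrative assumptions, not the agent's actual implementation. When a response arrives, the expiration time is looked up by message ID instead of being captured in the callback, so the entry removed always belongs to the message being completed.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical timeout bookkeeping, loosely following the description above.
class TimeoutTracker {
 public:
  using Clock = std::chrono::steady_clock;

  // Record that messageId expires at expirationTime.
  void add(std::uint64_t messageId, Clock::time_point expirationTime) {
    std::lock_guard<std::mutex> lock(mutex_);
    timeoutMap_[expirationTime].push_back(messageId);
    messageIdToTimeout_[messageId] = expirationTime;
  }

  // Called when the response for messageId arrives. The expiration time is
  // found via messageIdToTimeout_ rather than passed in by the caller.
  void remove(std::uint64_t messageId) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto idIt = messageIdToTimeout_.find(messageId);
    if (idIt == messageIdToTimeout_.end()) {
      return;  // Unknown or already removed.
    }
    auto bucketIt = timeoutMap_.find(idIt->second);
    if (bucketIt != timeoutMap_.end()) {
      auto& ids = bucketIt->second;
      ids.erase(std::remove(ids.begin(), ids.end(), messageId), ids.end());
      if (ids.empty()) {
        timeoutMap_.erase(bucketIt);  // Drop empty buckets to avoid leaks.
      }
    }
    messageIdToTimeout_.erase(idIt);
  }

  std::size_t timeoutMapSize() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return timeoutMap_.size();
  }

  std::size_t messageIdToTimeoutSize() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return messageIdToTimeout_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<Clock::time_point, std::vector<std::uint64_t>> timeoutMap_;
  std::unordered_map<std::uint64_t, Clock::time_point> messageIdToTimeout_;
};
```

Because remove() returns early for IDs it no longer tracks, calling it twice for the same message is harmless, and once every response has been handled both maps are empty, which is what a post-shutdown size assertion would check.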
ghstack-source-id: 121412604

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D26331585

fbshipit-source-id: a41e0534d7d4dfd240446e661e5541311931c7d7
2021-02-10 22:14:06 -08:00
CMakeLists.txt Build test_e2e_tensorpipe only if Gloo is enabled (#43041) 2020-08-14 09:24:47 -07:00
e2e_test_base.cpp
e2e_test_base.h Support device map for distributed autograd while using TensorPipe. (#44859) 2021-01-27 13:01:44 -08:00
test_e2e_process_group.cpp [c10d] switch ProcessGroup to be managed by intrusive_ptr (#47343) 2020-11-12 07:36:23 -08:00
test_e2e_tensorpipe.cpp Fix flaky TestTrainingLoop - TestE2ETensorPipe (#51939) 2021-02-10 22:14:06 -08:00
test_tensorpipe_serialization.cpp Make Channel API accept buffer structs rather than raw pointers. (#45014) 2020-09-21 10:18:45 -07:00
test_wire_serialization.cpp