mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-15 21:00:47 +00:00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50564

When an RPC was sent, the associated future was stored in two maps: pendingResponseMessage_ and timeoutMap_. Once the response was received, the entry was removed only from pendingResponseMessage_ and not from timeoutMap_. The pollTimedoudRpcs method eventually removed the entry from timeoutMap_, but only after the timeout duration had passed. With a large timeout and a large number of RPCs in use, timeoutMap_ could therefore easily grow without bound. This was discovered in https://github.com/pytorch/pytorch/issues/50522.

To fix this issue, I've added code to clean up timeoutMap_ as well once we receive a response.

ghstack-source-id: 119925182

Test Plan:
1) Unit test added.
2) Tested with repro in https://github.com/pytorch/pytorch/issues/50522

#Closes: https://github.com/pytorch/pytorch/issues/50522

Reviewed By: mrshenli

Differential Revision: D25919650

fbshipit-source-id: a0a42647e706d598fce2ca2c92963e540b9d9dbb
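
The bookkeeping problem described in the commit message can be illustrated with a small, self-contained sketch. This is not the PyTorch implementation: the class `RpcTracker`, its methods, and the helper index `idToExpiry_` are hypothetical names chosen for illustration, and the sketch assumes message ids are unique. It only shows the general idea of removing an RPC from the timeout structure as soon as its response arrives, rather than waiting for the timeout poller to reach it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for an RPC agent's bookkeeping (illustration only).
class RpcTracker {
 public:
  using Clock = std::chrono::steady_clock;

  // Record an outgoing RPC in both structures.
  void onSend(std::uint64_t id, Clock::time_point expiry) {
    pending_.insert(id);
    timeoutMap_[expiry].insert(id);
    idToExpiry_.emplace(id, expiry);  // assumed helper index: id -> expiry
  }

  // A response arrived: clean up the pending set AND the timeout map.
  void onResponse(std::uint64_t id) {
    if (pending_.erase(id) == 0) {
      return;  // unknown id, or it already timed out
    }
    auto it = idToExpiry_.find(id);
    assert(it != idToExpiry_.end());
    auto bucket = timeoutMap_.find(it->second);
    assert(bucket != timeoutMap_.end());
    bucket->second.erase(id);
    if (bucket->second.empty()) {
      timeoutMap_.erase(bucket);  // drop empty buckets so the map can shrink
    }
    idToExpiry_.erase(it);
  }

  // Expire every RPC whose deadline is at or before `now`.
  // Returns the number of RPCs that timed out.
  std::size_t pollTimedOut(Clock::time_point now) {
    std::size_t expired = 0;
    auto end = timeoutMap_.upper_bound(now);
    for (auto it = timeoutMap_.begin(); it != end; ++it) {
      for (std::uint64_t id : it->second) {
        pending_.erase(id);
        idToExpiry_.erase(id);
        ++expired;
      }
    }
    timeoutMap_.erase(timeoutMap_.begin(), end);
    return expired;
  }

  std::size_t pendingCount() const { return pending_.size(); }
  std::size_t timeoutEntries() const { return idToExpiry_.size(); }

 private:
  std::unordered_set<std::uint64_t> pending_;
  std::map<Clock::time_point, std::unordered_set<std::uint64_t>> timeoutMap_;
  std::unordered_map<std::uint64_t, Clock::time_point> idToExpiry_;
};
```

Without the cleanup in `onResponse`, every completed RPC would stay in the timeout structure until its deadline passed, which is the unbounded-growth pattern the commit message describes for long timeouts. The extra id-to-expiry index is one possible way to locate the entry without scanning the map; the actual fix may locate it differently.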
||
|---|---|---|
| CMakeLists.txt | ||
| e2e_test_base.cpp | ||
| e2e_test_base.h | ||
| test_e2e_process_group.cpp | ||
| test_e2e_tensorpipe.cpp | ||
| test_tensorpipe_serialization.cpp | ||
| test_wire_serialization.cpp | ||