Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-15 21:00:47 +00:00
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45867

In most cases the lock ordering was to acquire a lock in local autograd first and then a lock in DistAutogradContext. In `set_exception_without_signal` the order was reversed, and as a result our TSAN tests reported potential deadlocks. To fix this, I removed the lock and used a `std::atomic` exchange instead.

In addition, I fixed TestE2E so that it uses the appropriate timeout. TestE2EProcessGroup was flaky for these two reasons and is now fixed.

ghstack-source-id: 113592709

Test Plan: waitforbuildbot.

Reviewed By: albanD

Differential Revision: D24120962

fbshipit-source-id: 12447b84ceae772b91e9a183c90d1e6340f44e66

| Name |
|---|
| api |
| common |
| dist_autograd |
| jit |
| rpc |
| tensorexpr |
| __init__.py |
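
The fix described in the commit summary (replacing a lock with a `std::atomic` exchange) can be sketched as follows. This is a minimal illustration and not the actual PyTorch code: the class `GraphTaskSketch`, its members, the boolean return value, and `errorMessage` are hypothetical names chosen for the example. Only the function name `set_exception_without_signal` (written here in camel case) and the use of an atomic exchange come from the summary.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a graph task; not the real PyTorch class.
class GraphTaskSketch {
 public:
  // Records only the first exception. Returns true if this call was the
  // first to report an error. No mutex is acquired to pick the winner, so
  // this call cannot take part in a lock-order inversion with other locks.
  bool setExceptionWithoutSignal(std::exception_ptr e) {
    assert(e != nullptr);  // precondition: a real exception is passed in
    // exchange() atomically stores true and returns the previous value.
    if (has_error_.exchange(true)) {
      return false;  // an earlier caller already recorded an error
    }
    // Only the single winning caller reaches this write. Readers of error_
    // still need their own synchronization (assumed to exist elsewhere).
    error_ = std::move(e);
    return true;
  }

  bool hasError() const { return has_error_.load(); }

  // Returns the stored exception's message, or an empty string if none.
  std::string errorMessage() const {
    if (!error_) {
      return "";
    }
    try {
      std::rethrow_exception(error_);
    } catch (const std::exception& ex) {
      return ex.what();
    } catch (...) {
      return "unknown";
    }
  }

 private:
  std::atomic<bool> has_error_{false};
  std::exception_ptr error_;
};
```

In this sketch, a second call to `setExceptionWithoutSignal` returns `false` and leaves the first exception in place. The design choice it illustrates is that a single atomic flag decides which caller records the error, so the function never has to hold a lock that other code paths acquire in the opposite order.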