mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
Summary: Split out `seq_id` into `collective_seq_id` and `p2p_seq_id`. The main idea here is that collectives that go to all machines should have identical `collective_seq_id` and therefore it makes it easier to spot if one of machines isn't handling a collective operation. Next, we can attempt to match up p2p operations to ensure that the sender(s)/receivers(s) are in sync. Resolves issue: https://github.com/pytorch/pytorch/issues/125173 Test Plan: Unit tests. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/125727 Approved by: https://github.com/zdevito |
||
|---|---|---|
| .. | ||
| aoti_abi_check | ||
| aoti_inference | ||
| api | ||
| c10d | ||
| common | ||
| dist_autograd | ||
| jit | ||
| lazy | ||
| lite_interpreter_runtime | ||
| monitor | ||
| profiler | ||
| rpc | ||
| tensorexpr | ||
| __init__.py | ||