Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-14 20:57:59 +00:00
Summary: Reland of https://github.com/pytorch/pytorch/pull/72578.

**Overview**

Windows CI was failing due to the multi-rank single-GPU case (see [here](https://github.com/pytorch/pytorch/runs/5204906995?check_suite_focus=true)). To address this, I

- added `common_distributed.skip_if_no_gpu` to `test_multiple_param_groups()` to ensure that each rank can safely call `to(self.device)` -- this targets the expected SPSD use case where each rank has its own GPU;
- moved `test_constructor()` back to `TestZeroRedundancyOptimizerSingleRank` to check that the multiple-parameter-group method of construction works even on a single rank.

**Test Plan**

- I checked both tests on CPU, 1 GPU, 2 GPUs, 4 GPUs, and 8 GPUs.
- I added the `ciflow/win` label to run the failing Windows CI test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72932
Reviewed By: rohan-varma
Differential Revision: D34281482
Pulled By: awgu
fbshipit-source-id: c4fe604ddd9d2c123c3071249741e6b8a6454b6e
(cherry picked from commit 6bea9bcc6349ff1aad403563206fb170a3af0c70)
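The gating described in the first bullet can be sketched as a decorator. Note this is a hypothetical re-creation of the idea behind `common_distributed.skip_if_no_gpu`, not PyTorch's actual implementation; the `num_gpus` and `world_size` attributes are assumptions standing in for the real test-case state:

```python
import functools
import unittest


def skip_if_no_gpu(fn):
    # Hypothetical sketch: skip the test unless at least one GPU is
    # visible per rank, so every rank can safely call to(self.device).
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        gpus = getattr(self, "num_gpus", 0)     # assumed attribute
        ranks = getattr(self, "world_size", 1)  # assumed attribute
        if gpus < ranks:
            raise unittest.SkipTest(f"need {ranks} GPUs, found {gpus}")
        return fn(self, *args, **kwargs)
    return wrapper


class FakeZeroTest:
    # Stand-in for a distributed test case: 2 ranks but no GPUs,
    # mirroring the multi-rank single-GPU/no-GPU Windows CI setup.
    world_size = 2
    num_gpus = 0

    @skip_if_no_gpu
    def test_multiple_param_groups(self):
        return "ran"
```

With `num_gpus < world_size` the test raises `unittest.SkipTest` instead of failing mid-run, which is the behavior the PR relies on for the Windows multi-rank single-GPU case.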
**Directory listing**

Subdirectories: `_shard`, `algorithms`, `bin`, `elastic`, `fsdp`, `launcher`, `nn/jit`, `optim`, `pipeline/sync`, `rpc`

Files:

- `argparse_util_test.py`
- `test_c10d_common.py`
- `test_c10d_gloo.py`
- `test_c10d_nccl.py`
- `test_c10d_spawn.py`
- `test_c10d_spawn_gloo.py`
- `test_c10d_spawn_nccl.py`
- `test_data_parallel.py`
- `test_distributed_spawn.py`
- `test_launcher.py`
- `test_nccl.py`
- `test_pg_wrapper.py`
- `test_store.py`