Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-15 21:00:47 +00:00
This PR shows that we can use FSDP solely for CPU offloading when composing with N-way TP; each FSDP mesh is just 1 rank. This was motivated by an ask on Slack :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127024
Approved by: https://github.com/weifengpy, https://github.com/wanchaol
ghstack dependencies: #127004
| Name |
|---|
| fsdp |
| fully_shard |
| test_checkpoint.py |
| test_compose.py |
| test_contract.py |
| test_replicate.py |
| test_replicate_with_compiler.py |