Currently, the pre- and post-division steps in `FullyShardedDataParallel._post_backward_hook` carry the following comment:

> Average grad by world_size for consistency with PyTorch DDP.

This does not match what actually happens: the pre-divide factor may or may not equal `world_size`. For example, for `world_size = 3`, `predivide_factor = 2`. This PR clarifies the pre- and post-division comments in the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80456
Approved by: https://github.com/rohan-varma
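For context, here is a minimal sketch of why the pre-divide factor can differ from `world_size`: the division by `world_size` is split into a pre-divide before the all-reduce and a post-divide after it. The helper name `get_gradient_predivide_factor` below is illustrative rather than a verbatim copy of the FSDP source, but the loop mirrors the factor logic the PR refers to and reproduces the `world_size = 3` example from the message.

```python
import torch

def get_gradient_predivide_factor(world_size: int) -> float:
    # Double the factor while it still divides world_size and the
    # remaining post-divide share is larger, roughly balancing the
    # two division steps.
    factor = 1
    while world_size % factor == 0 and world_size / factor > factor:
        factor *= 2
    return float(factor)

world_size = 3
predivide_factor = get_gradient_predivide_factor(world_size)  # 2.0, not 3.0
postdivide_factor = world_size / predivide_factor             # 1.5

grad = torch.ones(4)
grad.div_(predivide_factor)
# ... all-reduce of grad across ranks would happen here ...
grad.div_(postdivide_factor)
# Net effect: grad / world_size, i.e. DDP-style gradient averaging.
print(grad)  # tensor([0.3333, 0.3333, 0.3333, 0.3333])
```

Because the two factors always multiply to `world_size`, the final gradient still matches DDP's averaging; splitting the division merely keeps intermediate values in a safer numeric range for reduced-precision (e.g. fp16) all-reduce. Hence "average grad by world_size" describes the net effect, not either individual step.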
Directory contents:

- `_shard/`
- `_sharded_tensor/`
- `_sharding_spec/`
- `algorithms/`
- `autograd/`
- `benchmarks/`
- `elastic/`
- `fsdp/`
- `launcher/`
- `nn/`
- `optim/`
- `pipeline/`
- `rpc/`
- `__init__.py`
- `argparse_util.py`
- `constants.py`
- `CONTRIBUTING.md`
- `distributed_c10d.py`
- `launch.py`
- `remote_device.py`
- `rendezvous.py`
- `run.py`
- `utils.py`