pytorch/torch/distributed
Olga Andreeva a48f3059b7 Corrected comments in fsdp (#80456)
Currently, the comments on the pre- and post-division steps in `FullyShardedDataParallel._post_backward_hook` state the following:
>  Average grad by world_size for consistency with PyTorch DDP.

This does not match what actually happens: the pre-divide factor may or may not equal `world_size`.
For example, for `world_size = 3`, `predivide_factor = 2`.

This PR clarifies the pre- and post-division comments in the code.
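
As a minimal, hypothetical sketch of the idea (the `get_gradient_predivide_factor` helper and its factor-selection loop below are illustrative assumptions, not the actual FSDP source): the gradient is divided by a pre-divide factor before the reduction and by the remaining factor afterwards, so the combined effect is still an average over `world_size` even though neither step alone divides by `world_size`.

```python
import torch

def get_gradient_predivide_factor(world_size: int) -> float:
    # Illustrative assumption: pick a power-of-two factor roughly "half-way"
    # into world_size, so neither division step is too aggressive.
    factor = 1
    while world_size % factor == 0 and world_size / factor > factor:
        factor *= 2
    return float(factor)

world_size = 3
predivide = get_gradient_predivide_factor(world_size)  # 2.0 here, not 3.0
postdivide = world_size / predivide                     # 1.5

local_grad = torch.ones(4)        # each rank's local gradient
local_grad.div_(predivide)        # pre-divide before the reduction
summed = local_grad * world_size  # stand-in for the cross-rank sum
summed.div_(postdivide)           # post-divide after the reduction

# Net effect: division by predivide * postdivide == world_size,
# i.e. the gradient is still averaged over all ranks.
assert torch.allclose(summed, torch.ones(4))
```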

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80456
Approved by: https://github.com/rohan-varma
2022-06-28 18:46:05 +00:00
_shard                [shard] make state_dict hook be consistent   2022-06-17 22:08:06 +00:00
_sharded_tensor
_sharding_spec
algorithms            FSDP communication hook interface for NO_SHARD strategy (#79833)   2022-06-28 08:03:11 +00:00
autograd
benchmarks
elastic               Add __all__ to torch.distributed and tensorboard submodules (#80444)   2022-06-28 16:33:22 +00:00
fsdp                  Corrected comments in fsdp (#80456)   2022-06-28 18:46:05 +00:00
launcher              Add __all__ to torch.distributed and tensorboard submodules (#80444)   2022-06-28 16:33:22 +00:00
nn                    Ensure tensors are contiguous in functional all_gather.   2022-06-17 01:27:11 +00:00
optim                 [CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862)   2022-06-13 01:56:47 +00:00
pipeline
rpc                   Add __all__ to various submodules in torch.fx, distributions, distributed, package (#80367)   2022-06-27 21:27:30 +00:00
__init__.py
argparse_util.py
constants.py
CONTRIBUTING.md       Fix some links in torch/distributed/CONTRIBUTING.md (#79855)   2022-06-21 00:48:30 +00:00
distributed_c10d.py   Revert "Revert "[distributed] Handle object collectives and NCCL. (#79034)""   2022-06-15 10:04:37 -07:00
launch.py
remote_device.py
rendezvous.py
run.py
utils.py