pytorch/torch/distributed
Pritam Damania 0d6fa1adc5 Introduce ChunkShardingSpec as a model sharding specification. (#55728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55728

Full design: https://github.com/pytorch/pytorch/issues/55207

This PR introduces ChunkShardingSpec (called SingleShardingSpec in the design).
The name ChunkShardingSpec was chosen because the spec splits a Tensor in much
the same way as `torch.chunk`, and it reads more clearly than SingleShardingSpec.
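To make the `torch.chunk` analogy concrete, here is a small sketch of the splitting behavior the spec is named after. The `torch.chunk` call below is standard PyTorch; the commented-out `ChunkShardingSpec` construction and its placement strings are assumptions based on the linked design issue, not a confirmed API.

```python
import torch

# torch.chunk splits a tensor into roughly equal contiguous blocks along
# one dimension; the last chunk may be smaller. ChunkShardingSpec shards
# a tensor across ranks following the same splitting scheme.
t = torch.arange(10)
chunks = torch.chunk(t, 4)  # 4 blocks along dim 0
print([c.numel() for c in chunks])  # → [3, 3, 3, 1]

# Hypothetical usage sketch (placement string format is an assumption
# taken from the design discussion in issue #55207):
# spec = ChunkShardingSpec(
#     dim=0,
#     placements=["rank:0/cuda:0", "rank:1/cuda:1"],
# )
```

With such a spec, a tensor sharded on `dim=0` across two ranks would be divided into the same contiguous blocks `torch.chunk` would produce.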
ghstack-source-id: 129603318

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27694108

fbshipit-source-id: c8764abe6a4d5fc56d023fda29b74b5af2a73b49
2021-05-23 16:04:57 -07:00
path | last commit | date
_sharding_spec | Introduce ChunkShardingSpec as a model sharding specification. (#55728) | 2021-05-23 16:04:57 -07:00
algorithms | [Gradient Compression] Update the docstring of fp16_compress_hook (#58168) | 2021-05-12 14:28:41 -07:00
autograd | |
benchmarks | Add lint for unqualified type: ignore (#56290) | 2021-04-21 08:07:23 -07:00
elastic | [torch/elastic] Add logging to the sanitize function of RendezvousStateHolder (#58169) | 2021-05-12 18:53:55 -07:00
launcher | Add lint for unqualified type: ignore (#56290) | 2021-04-21 08:07:23 -07:00
nn | Introduce ChunkShardingSpec as a model sharding specification. (#55728) | 2021-05-23 16:04:57 -07:00
optim | [doc] update distributed optimizer doc (#58084) | 2021-05-13 23:37:00 -07:00
pipeline | Convert assert -> cast. (#57458) | 2021-05-12 13:54:16 -07:00
rpc | Introduce ChunkShardingSpec as a model sharding specification. (#55728) | 2021-05-23 16:04:57 -07:00
__init__.py | [torch distributed] Implementing all_gather_base (#56315) | 2021-04-23 14:16:47 -07:00
argparse_util.py | [19/n][torch/elastic][upstream] Replace pytorch.distributed.launch with torchelastic launcher (#56214) | 2021-04-16 13:38:23 -07:00
constants.py | make ProcessGroupDefaultTimeout the same as python (#56549) | 2021-04-21 17:56:05 -07:00
CONTRIBUTING.md | Split test_c10d.py to test_c10d_common.py, test_c10d_gloo.py, test_c10d_nccl.py (#56598) | 2021-04-21 22:10:41 -07:00
distributed_c10d.py | Document monitored barrier (#58322) | 2021-05-21 19:04:57 -07:00
launch.py | [23/n][torch/elastic][upstream] Rename torch.distributed.elastic_launch to torch.distributed.run (#56831) | 2021-04-29 11:06:20 -07:00
rendezvous.py | Fix path handling on Win32 in rendezvous.py (#57000) | 2021-04-29 13:55:11 -07:00
run.py | [tsm] add support for jetter to Role (base_image) for mast launches (#58252) | 2021-05-14 17:39:18 -07:00
utils.py | Introduce ChunkShardingSpec as a model sharding specification. (#55728) | 2021-05-23 16:04:57 -07:00