pytorch/test/distributed
Pritam Damania 4709fdb117 Add GenericShardingSpec for generic tensor sharding. (#57409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57409

Full design: https://github.com/pytorch/pytorch/issues/55207

In https://github.com/pytorch/pytorch/issues/55207, we proposed
`MeshShardingSpec` as a generic sharding mechanism. However, that proposal does
not provide the flexibility to specify shards which have uneven
sizes/partitions and assumes even partitioning. Uneven partitioning is one of
the requirements of an internal use case.

Instead, we introduce a `GenericShardingSpec`, which allows specifying an
arbitrary partitioning of a multidimensional tensor. It specifies the start
offsets of each shard and the length of the shard along each dimension,
allowing for greater flexibility.
ghstack-source-id: 129604155
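
To make the offsets-plus-lengths idea concrete, here is a minimal sketch in
plain Python. It is not the `GenericShardingSpec` API from this PR: the names
`Shard`, `overlaps`, and `check_partition` are hypothetical, and the rule that
the shards must tile the whole tensor is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence


@dataclass(frozen=True)
class Shard:
    """Hypothetical shard description (not the PyTorch class)."""
    offsets: Sequence[int]  # start index of the shard in each dimension
    lengths: Sequence[int]  # extent of the shard in each dimension


def overlaps(a: Shard, b: Shard) -> bool:
    # Two boxes overlap only if their index ranges intersect in every dimension.
    return all(
        ao < bo + bl and bo < ao + al
        for ao, al, bo, bl in zip(a.offsets, a.lengths, b.offsets, b.lengths)
    )


def check_partition(shards: List[Shard], tensor_shape: Sequence[int]) -> None:
    """Raise ValueError unless the shards tile tensor_shape exactly.

    Assumption for this sketch: shards must be in bounds, must not overlap,
    and must together cover every element of the tensor.
    """
    rank = len(tensor_shape)
    for s in shards:
        if len(s.offsets) != rank or len(s.lengths) != rank:
            raise ValueError(f"{s} does not match tensor rank {rank}")
        for off, length, size in zip(s.offsets, s.lengths, tensor_shape):
            if off < 0 or length <= 0 or off + length > size:
                raise ValueError(f"{s} is out of bounds for {tuple(tensor_shape)}")
    for a, b in combinations(shards, 2):
        if overlaps(a, b):
            raise ValueError(f"{a} overlaps {b}")
    # In-bounds, non-overlapping shards cover the tensor exactly when their
    # element counts add up to the tensor's element count.
    covered = sum(math.prod(s.lengths) for s in shards)
    if covered != math.prod(tensor_shape):
        raise ValueError("shards do not cover the whole tensor")


# An uneven split of a 10 x 15 tensor into three shards of different sizes.
shards = [
    Shard(offsets=(0, 0), lengths=(5, 10)),
    Shard(offsets=(0, 10), lengths=(5, 5)),
    Shard(offsets=(5, 0), lengths=(5, 15)),
]
check_partition(shards, (10, 15))  # passes silently
```

Because each shard carries its own offsets and lengths, nothing forces the
shards to share a size, which is the kind of uneven layout the summary
describes.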

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28137616

fbshipit-source-id: 61255762485fb8fa3ec3a43c27bbb222ca25abff
2021-05-23 16:06:05 -07:00
..
_sharding_spec Add GenericShardingSpec for generic tensor sharding. (#57409) 2021-05-23 16:06:05 -07:00
algorithms/ddp_comm_hooks
bin Pytorch resolve bug around incorrect rdzv handler resolution (#56386) 2021-04-19 23:50:28 -07:00
elastic [torch/elastic] Revise distributed run script (#58159) 2021-05-12 16:53:31 -07:00
launcher [tsm] add support for jetter to Role (base_image) for mast launches (#58252) 2021-05-14 17:39:18 -07:00
nn/jit
optim Add lint for unqualified type: ignore (#56290) 2021-04-21 08:07:23 -07:00
pipeline/sync combine consecutive layes on the same device (#55973) 2021-05-11 08:04:08 -07:00
rpc
argparse_util_test.py
test_c10d_common.py Do not use TF32 matmul in linalg and DDP tests (#56114) 2021-05-20 14:01:19 -07:00
test_c10d_gloo.py Add Futures to ProcessGroupGloo (#57818) 2021-05-11 14:47:09 -07:00
test_c10d_nccl.py [DDP] Support not all outputs used in loss calculation (#57081) 2021-05-20 08:34:33 -07:00
test_c10d_spawn.py Split test_c10d.py to test_c10d_common.py, test_c10d_gloo.py, test_c10d_nccl.py (#56598) 2021-04-21 22:10:41 -07:00
test_c10d_spawn_gloo.py Add Python-3.9 CI testing (#50992) 2021-05-10 10:51:39 -07:00
test_c10d_spawn_nccl.py Add Python-3.9 CI testing (#50992) 2021-05-10 10:51:39 -07:00
test_data_parallel.py
test_distributed_fork.py
test_distributed_spawn.py
test_jit_c10d.py Fix misleading messages in test_jit_c10d (#57256) 2021-04-29 09:17:41 -07:00
test_launcher.py Pytorch resolve bug around incorrect rdzv handler resolution (#56386) 2021-04-19 23:50:28 -07:00
test_nccl.py