pytorch/test/distributed
Rohan Varma ab3c039910 Fix FSDP device_id when CPU offloading (#82892)
See https://github.com/pytorch/pytorch/issues/82891 for full context.

When FSDP is initialized with device_id and CPU offloading, initialization could crash if an outer FSDP unit does not manage any parameters of its own. In that case the outer unit ended up picking the flat param of a child FSDP module, checked its device, found it on CPU, and threw an error.

The fix is to skip this device check when the parameter is a flat param. This PR also fixes up the documentation of the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82892
Approved by: https://github.com/awgu
2022-08-07 19:06:32 +00:00
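The failure mode and fix in the commit message above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `Param`, `FlatParam`, and `check_param_devices` are stand-ins invented for illustration, not PyTorch's FSDP API. The validation rule (every parameter a unit manages must already be on device_id) is also a simplification chosen for clarity. The sketch only shows the idea behind the fix: a unit's device check should ignore flat params that belong to already-wrapped child units.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Param:
    """Hypothetical stand-in for a parameter; only its device matters here."""
    device: str


@dataclass
class FlatParam(Param):
    """Hypothetical stand-in for the flat param owned by a child FSDP unit."""


def check_param_devices(
    params: Iterable[Param], device_id: str, skip_flat: bool = True
) -> List[Param]:
    """Hypothetical, simplified device check for one FSDP unit.

    Returns the params this unit manages. A flat param belongs to an
    already-wrapped child unit and, with CPU offload, may sit on CPU, so it
    is skipped when skip_flat is True.
    """
    managed = []
    for p in params:
        if skip_flat and isinstance(p, FlatParam):
            continue  # owned by a child unit, not validated here
        if p.device != device_id:
            raise RuntimeError(f"param on {p.device}, expected {device_id}")
        managed.append(p)
    return managed


# An outer unit with no params of its own: the only parameter it sees is
# its child's flat param, which CPU offload has placed on CPU.
outer_params = [FlatParam(device="cpu")]

# Without the skip, the check trips over the child's flat param.
try:
    check_param_devices(outer_params, device_id="cuda:0", skip_flat=False)
except RuntimeError as e:
    print("without skip:", e)

# With the skip, the outer unit simply has nothing to validate.
print("with skip:", check_param_devices(outer_params, device_id="cuda:0"))
```

Skipping by type keeps ownership clear in this sketch: each unit validates only the parameters it will actually flatten, and child units have already validated their own.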
_shard [_shard] only check shard metadata for copy_ (#82655) 2022-08-04 06:12:19 +00:00
algorithms Enable test: distributed/algorithms/quantization/test_quantization (#80097) 2022-07-01 01:32:33 +00:00
bin
elastic [ci] remove IN_CI env var 2022-06-11 17:16:30 +00:00
fsdp Fix FSDP device_id when CPU offloading (#82892) 2022-08-07 19:06:32 +00:00
launcher
nn/jit
optim [CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862) 2022-06-13 01:56:47 +00:00
pipeline/sync
rpc [ci] remove IN_CI env var 2022-06-11 17:16:30 +00:00
argparse_util_test.py
defs.bzl
test_c10d_common.py
test_c10d_gloo.py
test_c10d_nccl.py [ROCm]: Enable test_grad_layout_1devicemodule_1replicaperprocess (#82005) 2022-07-24 16:30:56 +00:00
test_c10d_object_collectives.py Revert "Revert "[distributed] Handle object collectives and NCCL. (#79034)"" 2022-06-15 10:04:37 -07:00
test_c10d_pypg.py [distributed] Make DDP work with python process group (#79176) 2022-06-28 17:14:21 +00:00
test_c10d_spawn.py
test_c10d_spawn_gloo.py
test_c10d_spawn_nccl.py Ensure tensors are contiguous in functional all_gather. 2022-06-17 01:27:11 +00:00
test_data_parallel.py
test_distributed_spawn.py
test_launcher.py
test_nccl.py
test_pg_wrapper.py Add _reduce_scatter_base to ProcessGroupWrapper. (#79633) 2022-06-29 15:32:42 +00:00
test_store.py [rpc/distributed] eliminate code duplication in distributed/rendezvou… (#81577) 2022-07-22 16:21:00 +00:00