pytorch/test/distributed
Rohan Varma 242fc29c96 [FSDP] Refactor optimizer in backward (#104813)
1) Use zero_grad(set_to_none=True) to set the gradient to None, 2) call
prepare_grad_for_optim() before the call to .step(), and 3) use
_reset_flat_param_grad_info to set the flat parameter's gradient back to
None. These changes are purely refactors and should be equivalent to how
gradient memory was managed before. (A sketch of this sequence follows the
commit metadata below.)

Differential Revision: [D47310761](https://our.internmc.facebook.com/intern/diff/D47310761/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104813
Approved by: https://github.com/awgu
2023-07-13 06:42:53 +00:00
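
For orientation, here is a minimal, self-contained sketch of the gradient-memory sequence the commit message describes. The two helper functions are illustrative stand-ins named after the FSDP internals mentioned above; the real prepare_grad_for_optim / _reset_flat_param_grad_info are private FSDP machinery, not public API, so only the ordering of the steps is meant to be representative:

    import torch
    from torch import nn

    # Hypothetical stand-ins for the FSDP-internal helpers named in the
    # commit message; defined here only to make the sequence runnable.
    def prepare_grad_for_optim(flat_param: nn.Parameter) -> None:
        # In FSDP this exposes the flat parameter's gradient to the
        # optimizer before .step(); a no-op in this toy sketch.
        pass

    def _reset_flat_param_grad_info(flat_param: nn.Parameter) -> None:
        # Set the flat parameter's gradient back to None so the
        # gradient memory can be freed.
        flat_param.grad = None

    def optim_in_backward_step(flat_param: nn.Parameter,
                               optim: torch.optim.Optimizer) -> None:
        # (2) prepare the gradient before the call to .step()
        prepare_grad_for_optim(flat_param)
        optim.step()
        # (1) set gradients to None rather than zeroing them in place
        optim.zero_grad(set_to_none=True)
        # (3) drop the flat-parameter gradient as well
        _reset_flat_param_grad_info(flat_param)

    # Usage on a toy parameter standing in for an FSDP flat parameter:
    p = nn.Parameter(torch.randn(8))
    opt = torch.optim.SGD([p], lr=0.1)
    (p * p).sum().backward()
    optim_in_backward_step(p, opt)
    assert p.grad is None  # gradient memory released after the step

In the actual change this sequence runs inside the backward pass (the "optimizer in backward" path), which is why releasing gradient memory promptly via set_to_none matters.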
_composable [FSDP][Easy] Rename streams; add back stream sharing test (#104966) 2023-07-13 00:24:41 +00:00
_shard [distributed][sharded_tensor] Move local_shards check from ShardedTensorBase to ShardedTensor (#100197) 2023-05-02 12:42:24 +00:00
_spmd [SPMD] Disable all SPMD tests (#104784) 2023-07-07 23:31:54 +00:00
_tensor [PTD][TP] Add BWD support for colwise embedding sharding (#104820) 2023-07-10 22:33:20 +00:00
_tools
algorithms [BE] Fix all B022 useless-contextlib-suppress (#100335) 2023-04-30 18:47:40 +00:00
bin
checkpoint [DCP][fsspec] Consolidate OSS FsspecWriter/Reader and internal FsspecWriter/Reader (#104724) 2023-07-10 19:31:01 +00:00
elastic [RFC] Allow elastic agent to fail fast (#99051) 2023-04-25 23:51:20 +00:00
fsdp [FSDP] Refactor optimizer in backward (#104813) 2023-07-13 06:42:53 +00:00
launcher
nn/jit
optim [BE] Fix all B022 useless-contextlib-suppress (#100335) 2023-04-30 18:47:40 +00:00
pipeline/sync
rpc Revert "Revert "Expandable blocks in allocator (#96995)"" (#99275) 2023-04-17 23:46:08 +00:00
tensor/parallel [DTensor][TP][Random] Introduce TensorParallelRNGTracker to integrate parallel RNG state with Tensor Parallel (#103910) 2023-06-30 08:06:41 +00:00
argparse_util_test.py
test_c10d_common.py Back out "Revert "[DDP] multiple forward support for static graph (#103487)" (#103873)" (#103938) 2023-06-22 21:55:58 +00:00
test_c10d_gloo.py Enable test sparse allreduce basics Windows (#103317) 2023-06-14 07:37:50 +00:00
test_c10d_logger.py [c10d] Record time spent for init_process_group, new_group, _store_based_barrier (#101912) 2023-05-24 09:36:34 +00:00
test_c10d_nccl.py Add back in reduce_scatter_tensor_coalesced (#104345) 2023-06-29 22:53:26 +00:00
test_c10d_object_collectives.py [c10d] Remove test for init barrier (#103223) 2023-06-08 16:56:40 +00:00
test_c10d_pypg.py
test_c10d_spawn.py
test_c10d_spawn_gloo.py
test_c10d_spawn_nccl.py
test_c10d_spawn_ucc.py
test_c10d_ucc.py [CI] Enable UCC in CI (#100395) 2023-06-08 19:01:22 +00:00
test_collective_utils.py Initial commit of collective_utils (#101037) 2023-06-27 02:15:16 +00:00
test_data_parallel.py
test_distributed_spawn.py Back out "Revert "[DDP] multiple forward support for static graph (#103487)" (#103873)" (#103938) 2023-06-22 21:55:58 +00:00
test_dynamo_distributed.py [ROCm] enable additional inductor/dynamo UTs (#104624) 2023-07-11 20:44:02 +00:00
test_fake_pg.py [c10d] add fake pg necessary collectives (#102238) 2023-05-25 05:01:16 +00:00
test_functional_api.py [c10d] Adopt allgather_into_tensor_coalesced for NCCL. (#103086) 2023-07-06 15:05:55 +00:00
test_inductor_collectives.py [ROCm] enable additional inductor/dynamo UTs (#104624) 2023-07-11 20:44:02 +00:00
test_launcher.py
test_multi_threaded_pg.py [MTPG] Use TLS propagation to enable MTPG from bwd. (#104735) 2023-07-12 18:47:02 +00:00
test_nccl.py
test_pg_wrapper.py [c10d] Figure out device to use for object collectives (#100954) 2023-05-11 01:49:09 +00:00
test_store.py [C10D] Reimplement TCPStore wait timeout logic. (#100594) 2023-07-11 00:36:41 +00:00