pytorch/test/distributed
weifengpy 4bc846c101 [FSDP] Ignore buffer type casting in ignored modules (#106766)
issue resolved: https://github.com/pytorch/pytorch/issues/97791

Before this PR, mixed_precision was applied to buffers from ignored modules. To reproduce, see ```test_state_dict_with_ignored_modules(mixed_precision=True)```.

After this PR, mixed_precision semantics are no longer applied to buffers from ignored modules:
* step 1, initialization: state._ignored_buffer_names holds the names of all buffers from ignored modules
* step 2, lazy init at runtime: skip ignored buffers in ```_get_buffers_and_dtypes_for_computation```
* step 3, state_dict hook: skip upcasting for ignored buffers in ```_get_buffers_and_dtypes_for_computation```
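
The three steps above can be sketched in plain Python. This is only an illustration of the idea, not the FSDP implementation: `Buffer`, `collect_ignored_buffer_names`, and `cast_buffers` are hypothetical names, dtypes are represented as strings, and the module names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """Hypothetical stand-in for a module buffer (not a torch type)."""
    name: str   # fully qualified name, e.g. "encoder.running_mean"
    dtype: str  # string stand-in for a torch.dtype


def collect_ignored_buffer_names(buffers_by_module, ignored_modules):
    """Step 1: gather the names of all buffers owned by ignored modules."""
    return {
        buf.name
        for module in ignored_modules
        for buf in buffers_by_module.get(module, [])
    }


def cast_buffers(buffers, ignored_names, target_dtype):
    """Steps 2 and 3: cast every buffer except the ignored ones."""
    return [
        buf if buf.name in ignored_names else Buffer(buf.name, target_dtype)
        for buf in buffers
    ]


# Made-up model layout: "frozen_head" plays the role of an ignored module.
buffers_by_module = {
    "encoder": [Buffer("encoder.running_mean", "float32")],
    "frozen_head": [Buffer("frozen_head.mask", "float32")],
}
all_buffers = [b for bufs in buffers_by_module.values() for b in bufs]

ignored = collect_ignored_buffer_names(buffers_by_module, ["frozen_head"])
low_precision = cast_buffers(all_buffers, ignored, "float16")  # computation
restored = cast_buffers(low_precision, ignored, "float32")     # state_dict
```

In this sketch the ignored buffer keeps its original dtype through both the downcast for computation and the upcast for the state dict, which is the behavior the PR describes.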
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106766
Approved by: https://github.com/awgu
2023-08-09 23:09:43 +00:00
..
_composable [FSDP][9/N] Introduce CustomPolicy (#104986) 2023-08-03 12:46:36 +00:00
_shard [distributed][sharded_tensor] Move local_shards check from ShardedTensorBase to ShardedTensor (#100197) 2023-05-02 12:42:24 +00:00
_spmd [device_mesh][BE] remove allgather from DM (#105614) 2023-07-27 01:33:05 +00:00
_tensor [device_mesh][BE] reduce_scatter fallback to funcol and remove from DM (#105642) 2023-07-27 01:33:05 +00:00
_tools
algorithms [BE] Fix all B022 useless-contextlib-suppress (#100335) 2023-04-30 18:47:40 +00:00
bin
checkpoint [DCP] Modify tensor saving logic in DCP (#106415) 2023-08-09 00:16:10 +00:00
elastic [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
fsdp [FSDP] Ignore buffer type casting in ignored modules (#106766) 2023-08-09 23:09:43 +00:00
launcher [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
nn/jit
optim Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743) 2023-08-08 15:27:34 +00:00
pipeline/sync [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
rpc
tensor/parallel Clean up unsed MHA code to avoid confusion (#105956) 2023-07-27 17:10:17 +00:00
argparse_util_test.py
test_c10d_common.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
test_c10d_gloo.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
test_c10d_logger.py [c10d] Record time spent for init_process_group, new_group, _store_based_barrier (#101912) 2023-05-24 09:36:34 +00:00
test_c10d_nccl.py Fixes netName assignment for NCCL Config (#105776) 2023-07-26 21:13:56 +00:00
test_c10d_object_collectives.py [c10d] Remove test for init barrier (#103223) 2023-06-08 16:56:40 +00:00
test_c10d_pypg.py
test_c10d_spawn.py [BE] f-stringify torch/ and scripts (#105538) 2023-07-21 19:35:24 +00:00
test_c10d_spawn_gloo.py
test_c10d_spawn_nccl.py
test_c10d_spawn_ucc.py
test_c10d_ucc.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
test_collective_utils.py Initial commit of collective_utils (#101037) 2023-06-27 02:15:16 +00:00
test_data_parallel.py Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743) 2023-08-08 15:27:34 +00:00
test_distributed_spawn.py Back out "Revert "[DDP] multiple forward support for static graph (#103487)" (#103873)" (#103938) 2023-06-22 21:55:58 +00:00
test_dynamo_distributed.py Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743) 2023-08-08 15:27:34 +00:00
test_fake_pg.py [c10d] add fake pg necessary collectives (#102238) 2023-05-25 05:01:16 +00:00
test_functional_api.py [device_mesh][BE] reduce_scatter fallback to funcol and remove from DM (#105642) 2023-07-27 01:33:05 +00:00
test_inductor_collectives.py [ROCm] enable additional inductor/dynamo UTs (#104624) 2023-07-11 20:44:02 +00:00
test_launcher.py
test_multi_threaded_pg.py [C10D] Improve MTPG autograd test. Fixes #105106 (#105356) 2023-07-20 13:51:21 +00:00
test_nccl.py
test_pg_wrapper.py [c10d] Figure out device to use for object collectives (#100954) 2023-05-11 01:49:09 +00:00
test_store.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00