pytorch/test/distributed/checkpoint
Ke Wen 762a05b3b3 [DCP] Remove all-gather of state dict keys (#145998)
The original `_all_gather_keys` call was a safety check, but it can become costly at scale, and it blocks the CPU.

Instead, we make it clear in the documentation that the `state_dict` passed to the `load` API should have the same set of keys on every rank; otherwise the API may hang.

In addition, we move the check to a utility function: `utils.assert_same_keys`. Users unsure whether their state dicts have matching keys can optionally call this API to check.
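
To illustrate the kind of check described above, here is a minimal sketch in plain Python. It does not reproduce `utils.assert_same_keys`; the helper names `flatten_keys` and `find_key_mismatches` are hypothetical, and the sketch assumes the per-rank key sets have already been collected in one place.

```python
from collections.abc import Iterable, Mapping, Sequence


def flatten_keys(state_dict: Mapping, prefix: str = "") -> set[str]:
    """Collect dotted key paths from a possibly nested state dict (hypothetical helper)."""
    keys: set[str] = set()
    for name, value in state_dict.items():
        path = f"{prefix}{name}"
        if isinstance(value, Mapping):
            # Recurse into nested dicts, e.g. {"model": {"w": ...}} -> "model.w"
            keys |= flatten_keys(value, prefix=f"{path}.")
        else:
            keys.add(path)
    return keys


def find_key_mismatches(keys_per_rank: Sequence[Iterable[str]]) -> dict[int, set[str]]:
    """Return, for each rank that disagrees with rank 0, the keys that differ.

    Assumes keys_per_rank[i] holds the keys seen by rank i (hypothetical input
    format); an empty result means every rank has the same key set.
    """
    reference = set(keys_per_rank[0])
    mismatches: dict[int, set[str]] = {}
    for rank, keys in enumerate(keys_per_rank):
        diff = reference ^ set(keys)  # keys present on only one of the two ranks
        if diff:
            mismatches[rank] = diff
    return mismatches


if __name__ == "__main__":
    rank0 = {"model": {"w": 1, "b": 2}, "step": 3}
    rank1 = {"model": {"w": 1}, "step": 3}  # missing "model.b"
    print(find_key_mismatches([flatten_keys(rank0), flatten_keys(rank1)]))
```

In a real distributed job, each rank would first need to exchange its key list, for example with `torch.distributed.all_gather_object`. That exchange is the very collective this commit removes from the default `load` path, so a check like this is best run only when debugging a suspected mismatch.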

Resolves #145965 (as a workaround).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145998
Approved by: https://github.com/mhorowitz, https://github.com/fegin
2025-02-04 03:16:13 +00:00
e2e PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
fsdp
test_checkpoint.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_compatibility.py
test_dedup_tensors.py
test_dtensor_checkpoint.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_dtensor_resharding.py [dcp] Add ZStandard transformer (#143360) 2025-01-25 00:14:07 +00:00
test_file_system_checkpoint.py [dcp] Add ZStandard transformer (#143360) 2025-01-25 00:14:07 +00:00
test_file_system_checkpoint_cpu.py [dcp] Add ZStandard transformer (#143360) 2025-01-25 00:14:07 +00:00
test_format_utils.py
test_fsdp_model_state.py
test_fsdp_optim_state.py
test_fsdp_tp_checkpoint_conversion.py
test_fsspec.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_hsdp_checkpoint.py
test_nested_dict.py
test_planner.py
test_save_load_api.py [DCP] Remove all-gather of state dict keys (#145998) 2025-02-04 03:16:13 +00:00
test_state_dict.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_state_dict_utils.py
test_tp_checkpoint.py
test_traverse.py
test_utils.py