pytorch/test/distributed/checkpoint
Ke Wen 762a05b3b3 [DCP] Remove all-gather of state dict keys (#145998)
The original `_all_gather_keys` call served as a safety check, but it can become costly at scale, and it blocks the CPU.

Instead, we make it clear in the documentation that the `state_dict` passed to the `load` API should have the same set of keys on every rank; otherwise the API may hang.

In addition, we move the check to a utility function: `utils.assert_same_keys`. Users who are unsure whether their state dict keys match across ranks can optionally call this API to check.
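
To illustrate the kind of check described above, here is a minimal pure-Python sketch. It is not the code in `utils.assert_same_keys`; the names `find_key_mismatches` and `check_same_keys` are hypothetical, and the per-rank key lists are passed in directly rather than gathered with a collective, so the example runs in a single process.

```python
from collections.abc import Iterable, Sequence


def find_key_mismatches(
    keys_per_rank: Sequence[Iterable[str]],
) -> dict[int, set[str]]:
    """Compare every rank's keys against rank 0 (hypothetical helper).

    Returns a mapping from rank index to the keys that are present on
    exactly one side of the comparison with rank 0. An empty mapping
    means all ranks hold the same set of keys.
    """
    if not keys_per_rank:
        return {}
    reference = set(keys_per_rank[0])
    mismatches: dict[int, set[str]] = {}
    for rank, keys in enumerate(keys_per_rank):
        diff = reference.symmetric_difference(keys)
        if diff:
            mismatches[rank] = diff
    return mismatches


def check_same_keys(keys_per_rank: Sequence[Iterable[str]]) -> None:
    """Raise AssertionError if any rank's key set differs from rank 0's."""
    mismatches = find_key_mismatches(keys_per_rank)
    if mismatches:
        details = ", ".join(
            f"rank {rank}: {sorted(diff)}"
            for rank, diff in sorted(mismatches.items())
        )
        raise AssertionError(f"state dict keys differ from rank 0 ({details})")


# Assumption: in a real job, each rank would contribute list(state_dict.keys())
# through a collective; here the gathered result is written out by hand.
gathered = [
    ["model.weight", "model.bias"],  # rank 0
    ["model.bias", "model.weight"],  # rank 1: same keys, different order
]
check_same_keys(gathered)  # passes silently because the key sets match
```

Running such a check once before `load`, rather than on every call, keeps the cost of the collective out of the hot path while still catching a mismatch that could otherwise cause a hang.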

Resolves #145965 (as a workaround).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145998
Approved by: https://github.com/mhorowitz, https://github.com/fegin
2025-02-04 03:16:13 +00:00
e2e PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
fsdp [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-07 01:24:28 +00:00
test_checkpoint.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_compatibility.py
test_dedup_tensors.py
test_dtensor_checkpoint.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_dtensor_resharding.py [dcp] Add ZStandard transformer (#143360) 2025-01-25 00:14:07 +00:00
test_file_system_checkpoint.py [dcp] Add ZStandard transformer (#143360) 2025-01-25 00:14:07 +00:00
test_file_system_checkpoint_cpu.py [dcp] Add ZStandard transformer (#143360) 2025-01-25 00:14:07 +00:00
test_format_utils.py
test_fsdp_model_state.py
test_fsdp_optim_state.py
test_fsdp_tp_checkpoint_conversion.py Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
test_fsspec.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_hsdp_checkpoint.py Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
test_nested_dict.py Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
test_planner.py [checkpointing][oss] Throw an error when loading a different size than saved tensor (#141571) 2024-12-11 15:35:48 +00:00
test_save_load_api.py [DCP] Remove all-gather of state dict keys (#145998) 2025-02-04 03:16:13 +00:00
test_state_dict.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_state_dict_utils.py Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
test_tp_checkpoint.py
test_traverse.py
test_utils.py [distributed] Fix _ReaderView.read() and readinto() to stop reading at the end of the slice (#143357) 2025-01-11 00:22:10 +00:00