mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
For the distributed state dict API [migration](https://github.com/pytorch/torchtune/pull/2138), make the following changes here:

1. `load_from_full_model_state_dict` in TorchTune calls `set_model_state_dict` with options controlling whether to use cpu_offload. Add cpu_offload to `_load_model_state_dict` to process tensors on CPU when the config is True.
2. Relax the device check: lora_finetune may involve 2 device types, so accept that as valid.
3. Optimize memory performance:
   1. Use `.detach().clone()` instead of returning a view directly.
   2. If the local tensor is not on the meta device, copy `full_tensor[slices]` into `ret.to_local()`.
4. Add related unit tests.

Memory performance calling from TorchTune with llama2/7B_full:

1. cpu_offload = True
<img width="555" alt="Screenshot 2024-12-18 at 1 36 47 PM" src="https://github.com/user-attachments/assets/429261f5-1107-4592-b295-de3944a2614b" />
2. cpu_offload = False
<img width="555" alt="Screenshot 2024-12-18 at 1 36 52 PM" src="https://github.com/user-attachments/assets/40bf281a-236a-4218-826b-b1192a10c806" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142845
Approved by: https://github.com/fegin
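The memory optimizations in item 3 can be sketched with plain tensors instead of DTensors (variable names are illustrative, not the actual implementation): an indexing view pins the whole gathered tensor's storage, while `.detach().clone()` materializes only the local shard, and an existing (non-meta) local tensor can be filled with an in-place `copy_` rather than a fresh allocation.

```python
import torch

# Full (unsharded) tensor as produced when gathering a sharded
# parameter, plus the slice this rank owns (illustrative values).
full_tensor = torch.arange(16.0).reshape(4, 4)
slices = (slice(0, 2), slice(None))

# A plain indexing view shares storage with full_tensor, keeping the
# entire gathered tensor alive as long as the shard is referenced.
shard_view = full_tensor[slices]
assert shard_view.data_ptr() == full_tensor.data_ptr()

# .detach().clone() copies only the shard into its own storage, so
# full_tensor can be freed once every rank has taken its slice.
shard_owned = full_tensor[slices].detach().clone()

# If the local tensor already exists (i.e. it is not on the meta
# device), copy the slice into it in place instead of allocating:
local = torch.empty(2, 4)
local.copy_(full_tensor[slices])
```

In the actual change the in-place target is `ret.to_local()` on a DTensor; `local` here stands in for that local shard under the stated assumptions.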