pytorch/test/distributed/fsdp
Alexander Zinoviev ee713f80ed Enable channels_last format for FSDP (#137382)
Enable FSDP to handle tensors in channels_last memory format. Preserving the channels_last memory format lets FSDP-wrapped modules use the best kernels cuDNN offers.

Summary of changes:
1) Store stride information along with shapes
2) For flattening, replace calls to flatten() with as_strided(size=(param.numel(),), stride=(1,))
3) For unflattening, replace calls to view() with as_strided() using the stored sizes and strides
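
The three changes above can be illustrated with a minimal sketch. It is not the FSDP implementation: the helper names `flatten_preserving_layout` and `unflatten_with_strides` are hypothetical, and the sketch assumes a dense, non-overlapping tensor with storage offset 0, so that a 1-D `as_strided` view with stride 1 covers exactly the tensor's memory.

```python
import torch


def flatten_preserving_layout(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: view the underlying memory as 1-D without copying.
    # Assumes t is dense and non-overlapping (true for channels_last tensors).
    return t.as_strided(size=(t.numel(),), stride=(1,))


def unflatten_with_strides(flat: torch.Tensor, size, stride) -> torch.Tensor:
    # Hypothetical helper: rebuild the original logical view from the
    # stored size and stride instead of calling view(size).
    return flat.as_strided(size=size, stride=stride)


# A 4-D tensor in channels_last memory format (NHWC order in memory).
x = (
    torch.arange(2 * 3 * 4 * 5, dtype=torch.float32)
    .reshape(2, 3, 4, 5)
    .contiguous(memory_format=torch.channels_last)
)

# 1) Store the stride along with the shape.
size, stride = tuple(x.size()), tuple(x.stride())

# 2) Flatten with as_strided: no copy, memory order is kept.
flat = flatten_preserving_layout(x)

# 3) Unflatten with the stored size and stride.
restored = unflatten_with_strides(flat, size, stride)

print(stride)
print(restored.is_contiguous(memory_format=torch.channels_last))
print(torch.equal(restored, x))
```

By contrast, `x.flatten()` on a channels_last tensor has to copy the data into row-major order, and a later `view(size)` would yield a default-contiguous tensor, so the channels_last layout would be lost.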

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137382
Approved by: https://github.com/awgu
2024-10-11 03:47:16 +00:00
test_checkpoint_wrapper.py
test_distributed_checkpoint.py
test_fsdp_apply.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_backward_prefetch.py
test_fsdp_checkpoint.py
test_fsdp_clip_grad_norm.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_comm.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_comm_hooks.py
test_fsdp_core.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_dtensor_state_dict.py
test_fsdp_exec_order.py
test_fsdp_fine_tune.py
test_fsdp_flatten_params.py Enable channels_last format for FSDP (#137382) 2024-10-11 03:47:16 +00:00
test_fsdp_freezing_weights.py
test_fsdp_fx.py
test_fsdp_grad_acc.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_hybrid_shard.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_ignored_modules.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_input.py
test_fsdp_memory.py
test_fsdp_meta.py
test_fsdp_misc.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_mixed_precision.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_multiple_forward.py
test_fsdp_multiple_wrapping.py
test_fsdp_optim_state.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_overlap.py
test_fsdp_pure_fp16.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_sharded_grad_scaler.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_state_dict.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_tp_integration.py
test_fsdp_traversal.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_uneven.py
test_fsdp_unshard_params.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00
test_fsdp_use_orig_params.py Add Triton CPU as an Inductor backend (#133408) 2024-09-30 20:24:52 +00:00
test_hsdp_dtensor_state_dict.py
test_shard_utils.py
test_utils.py
test_wrap.py Generalization of FSDP common for non-cuda execution (#133209) 2024-09-27 00:38:10 +00:00