pytorch/torch/distributed
Howard Huang e8e65764d1 [pipelining] Improve schedule csv loading (#142009)
Add small changes based on feedback from Less when testing out https://github.com/pytorch/torchtitan/pull/707
- expose `validate_schedule` as a function
- handle spaces around actions in csv file
- add error arrow to `_format_pipeline_schedule()` to better show where the step errored

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142009
Approved by: https://github.com/lessw2020
2024-12-04 04:15:34 +00:00
_composable Remove old FSDP1 fully_shard (#141875) 2024-12-03 17:00:47 +00:00
_shard
_sharded_tensor
_sharding_spec
_symmetric_memory
_tensor
_tools Revert "ILP for auto FSDP wrapping (#140298)" 2024-12-02 14:08:04 +00:00
algorithms [BE]: Update mypy to 1.13.0 (#140808) 2024-12-03 02:50:10 +00:00
autograd
benchmarks
checkpoint Initialize lr as a tensor if it is originally a tensor (#141620) 2024-12-03 18:10:23 +00:00
elastic
examples
fsdp [BE]: Update mypy to 1.13.0 (#140808) 2024-12-03 02:50:10 +00:00
launcher
nn
optim
pipelining [pipelining] Improve schedule csv loading (#142009) 2024-12-04 04:15:34 +00:00
rpc
tensor [BE]: Update mypy to 1.13.0 (#140808) 2024-12-03 02:50:10 +00:00
__init__.py
_checkpointable.py
_composable_state.py
_functional_collectives.py
_functional_collectives_impl.py
_state_dict_utils.py
argparse_util.py
c10d_logger.py [c10d] Switch all timer logging in c10d to wait_counter (#141154) 2024-11-21 01:10:11 +00:00
collective_utils.py
constants.py
CONTRIBUTING.md
device_mesh.py
distributed_c10d.py [BE]: Update mypy to 1.13.0 (#140808) 2024-12-03 02:50:10 +00:00
launch.py
logging_handlers.py
remote_device.py
rendezvous.py
run.py
utils.py