pytorch/torch/distributed
Will Constable 84416618a6 [Pipelining] Update schedules to use I, B actions. (#138886)
Also, update tests to use I (BACKWARD_INPUT) vs B (FULL_BACKWARD)
consistently.

Previously, schedules would issue a 'B' operation and leave it ambiguous
whether that operation should be BACKWARD_INPUT or FULL_BACKWARD,
depending on a separate flag (use_full_backward) passed to the schedule
class, which would determine which behavior was taken at runtime.

Now, use_full_backward is removed and the schedule class is required to
produce unambiguous IR.  The logic for 'use_full_backward' is removed
from the runtime.
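
As a rough illustration of the change described above, the sketch below
shows how a legacy, ambiguous 'B' marker plus a flag could be rewritten
into explicit I / B actions. It is a toy model, not the torch.distributed
pipelining code: the `Op` enum and `resolve_legacy` helper are
hypothetical names introduced here, and only the letters F, I, B and the
flag name use_full_backward come from the description above.

```python
from enum import Enum


class Op(Enum):
    # Hypothetical stand-in for schedule action types; not the real torch enum.
    FORWARD = "F"
    BACKWARD_INPUT = "I"
    FULL_BACKWARD = "B"


def resolve_legacy(ops, use_full_backward):
    """Rewrite legacy action letters into explicit, unambiguous Op values.

    In the legacy form, 'B' means FULL_BACKWARD only when use_full_backward
    is True; otherwise it means BACKWARD_INPUT. After resolution the flag is
    no longer needed to interpret the schedule.
    """
    resolved = []
    for op in ops:
        if op == "B":
            resolved.append(
                Op.FULL_BACKWARD if use_full_backward else Op.BACKWARD_INPUT
            )
        else:
            resolved.append(Op(op))
    return resolved


legacy = ["F", "F", "B", "B"]
explicit_full = resolve_legacy(legacy, use_full_backward=True)
explicit_input = resolve_legacy(legacy, use_full_backward=False)
```

Once the schedule emits the explicit form directly, a consumer can
dispatch on the action alone, which is the property the commit asks
schedule classes to guarantee.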

_validate_pipeline_order is replaced with _simulate_comms_compute. Both
offer similar functionality: validating the correctness of a schedule
IR.  'validate' operates on compute-only IR, while 'simulate' operates
on compute + comm IR.  To convert from using 'validate' to 'simulate',
you first have to insert comm actions via '_add_send_recv'.

'simulate' was inefficiently written before this PR and needed to be
optimized to run quickly on the extra-large schedules (>32 ranks and
microbatches per rank) used in some unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138886
Approved by: https://github.com/H-Huang
2024-11-01 03:54:06 +00:00
_composable [Device] Replace hardcoded devices with 'torch._C._get_accelerator()' (#139032) 2024-10-29 04:51:47 +00:00
_shard
_sharded_tensor
_sharding_spec
_symmetric_memory get_symm_mem_workspace(): print helpful error during graph capture (#138028) 2024-10-30 18:11:09 +00:00
_tensor
_tools
algorithms Make DDP Quantization hooks backend Agnostic (#138816) 2024-10-29 15:02:45 +00:00
autograd
benchmarks
checkpoint [DCP] Unit Test to validate the stateful and non-stateful loads (#139251) 2024-10-31 01:12:51 +00:00
elastic
examples
fsdp
launcher
nn
optim
pipelining [Pipelining] Update schedules to use I, B actions. (#138886) 2024-11-01 03:54:06 +00:00
rpc
tensor [DTensor][Bug Fix]Fix 2D DTensor mm with mesh_shape (1, n) or (n, 1) (#139134) 2024-10-30 08:09:39 +00:00
__init__.py
_checkpointable.py
_composable_state.py
_functional_collectives.py [c10d][Partial-Graph Overlap] Support calling .wait_tensor() on output tensor of eager async_op=True collective if under allow_inflight_collective_as_graph_input_ctx() context manager (#137763) 2024-10-29 03:31:19 +00:00
_functional_collectives_impl.py
_state_dict_utils.py
argparse_util.py
c10d_logger.py
collective_utils.py
constants.py
CONTRIBUTING.md
device_mesh.py [DeviceMesh] fix sub mesh size calculation in create_sub_mesh() (#138945) 2024-10-29 17:56:56 +00:00
distributed_c10d.py [c10d] allow sub group to be eagerly inited even if default one is not (#138665) 2024-10-24 23:51:28 +00:00
launch.py
logging_handlers.py
remote_device.py
rendezvous.py
run.py
utils.py