pytorch/torch/distributed
Latest commit: e363a8a222 by PyTorch MergeBot — Revert "[pipelining] Add pipeline stage test (#126721)" (2024-05-21 04:40:05 +00:00)

This reverts commit b948b1ad7a.

Reverted https://github.com/pytorch/pytorch/pull/126721 on behalf of https://github.com/clee2000 due to "The test_public_bindings failure is real, you just got unlucky since it was also broken on trunk for a different reason" ([comment](https://github.com/pytorch/pytorch/pull/126721#issuecomment-2121725408))
| Name | Latest commit | Date |
| --- | --- | --- |
| _composable | [Traceable FSDP2] Change from register_multi_grad_hook to per-tensor backward hook (#126350) | 2024-05-18 04:44:29 +00:00 |
| _shard | Revert "[FX] Update type hints in torch.fx._compatibility.py (#125469)" | 2024-05-06 18:36:43 +00:00 |
| _sharded_tensor | | |
| _sharding_spec | | |
| _spmd | [dtensor] refactor view ops to use OpStrategy (#126011) | 2024-05-17 05:39:21 +00:00 |
| _tensor | [DTensor] Turn on foreach implementation for clip_grad_norm_ for DTensor by default (#126423) | 2024-05-17 06:57:52 +00:00 |
| _tools | [BE]: Try TCH autofixes on torch/ (#125536) | 2024-05-05 23:13:59 +00:00 |
| algorithms | | |
| autograd | | |
| benchmarks | | |
| checkpoint | Fix strict default value in StateDictOptions (#125998) | 2024-05-16 21:42:53 +00:00 |
| elastic | [torch-distributed] Make log directory creation idempotent (#126496) | 2024-05-18 00:17:13 +00:00 |
| examples | | |
| fsdp | [BE][FSDP] Remove unnecessary warnings (#126365) | 2024-05-16 17:34:01 +00:00 |
| launcher | torchelastic: change monitor_interval default to 0.1 (#124692) | 2024-04-24 01:44:41 +00:00 |
| nn | | |
| optim | [optim] add fused_adagrad support for CPU device (#124905) | 2024-05-16 01:11:51 +00:00 |
| pipeline | [BE]: Update ruff to v0.4.4 (#125031) | 2024-05-12 20:02:37 +00:00 |
| pipelining | Revert "[pipelining] Add pipeline stage test (#126721)" | 2024-05-21 04:40:05 +00:00 |
| rpc | | |
| tensor | [DeviceMesh] Make _validate_tp_mesh_dim support 3D (#125763) | 2024-05-08 21:22:11 +00:00 |
| __init__.py | Revert "c10d: add Collectives abstraction (#125978)" | 2024-05-20 07:40:41 +00:00 |
| _composable_state.py | | |
| _functional_collectives.py | [Traceable FSDP2] Add all_gather_into_tensor out variant (#126334) | 2024-05-16 10:27:06 +00:00 |
| _functional_collectives_impl.py | Make c10d_functional ops call into _c10d_functional ops (#124979) | 2024-04-27 08:08:02 +00:00 |
| _state_dict_utils.py | [DSD] Implement broadcast_from_rank0 option for optim state_dict (#125339) | 2024-05-08 07:22:20 +00:00 |
| argparse_util.py | | |
| c10d_logger.py | [DCP] Adds better handling in logging of specific kwargs (#123658) | 2024-04-11 21:09:38 +00:00 |
| collective_utils.py | | |
| constants.py | | |
| CONTRIBUTING.md | | |
| device_mesh.py | [DeviceMesh] Supported N groups in from_group (#126258) | 2024-05-17 01:03:21 +00:00 |
| distributed_c10d.py | [C10D] make get_node_local_rank() accept fallback_rank (#126737) | 2024-05-21 03:38:02 +00:00 |
| launch.py | [Docs][Distributed] Add migration notes for --local-rank option style change for torchrun in PyTorch 2.0 (#109480) | 2024-04-16 05:51:57 +00:00 |
| logging_handlers.py | | |
| remote_device.py | | |
| rendezvous.py | Fix public binding to actually traverse modules (#126103) | 2024-05-15 19:36:03 +00:00 |
| run.py | [BE]: Improve exception typing. Remove NOQAs (#125535) | 2024-05-08 14:07:13 +00:00 |
| utils.py | | |