pytorch/test/distributed
Brian Hirsh 471017cbc9 avoid specializing strides with DDPOptimizer + inductor (#140751)
Fixes https://github.com/pytorch/pytorch/issues/140229

Fixes https://github.com/pytorch/pytorch/issues/139474

The issue was that:

(1) DDPOptimizer has some logic to partition the dynamo graph into buckets, and run AOTAutograd/inductor on each bucket

(2) doing so requires knowing the **exact** strides of the outputs of each subgraph, so that each later subgraph can be compiled with example inputs that have the correct strides

(3) there is some existing logic to do this today: we have a `fakify_first_call` flag in AOTAutograd that lets you run it with fake tensor inputs (to handle the calling convention changes that AOTAutograd performs at runtime). During this process, we query inductor for the output strides that it compiled with

(4) these output strides are stored in the FX graph cache as raw strings of sympy expressions. We have a function, `evaluate_symexpr`, which, given the sympy string and the ShapeEnv's `var_to_val` mapping, evaluates the sympy string to produce concrete strides

(5) evaluating the expression this way, however, specializes on the exact values of any variables in our ShapeEnv. In DDPOptimizer we instead want to know inductor's output strides **symbolically**, which requires converting the (string) sympy expression back into actual `SymInt`s that we can return.
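The difference between steps (4) and (5) can be sketched with plain sympy. This is a hedged, standalone illustration, not PyTorch's actual implementation: the name `evaluate_symexpr` and the `var_to_val` mapping follow the text above, but the function body here is an assumption built on public sympy APIs only.

```python
# Sketch: evaluating a cached stride string two ways (assumed example,
# standalone sympy only -- not the real AOTAutograd/inductor code paths).
import sympy


def evaluate_symexpr(expr_str, var_to_val):
    """Substitute the ShapeEnv's concrete hints into the sympy stride
    string. This yields an int, but specializes on those hint values."""
    expr = sympy.sympify(expr_str)
    return int(expr.subs(var_to_val))


s0, s1 = sympy.symbols("s0 s1")

# (4): concrete evaluation -- a stride cached as "s0*s1" with hints
# s0=4, s1=8 becomes the plain integer 32, baking in those sizes.
concrete = evaluate_symexpr("s0*s1", {s0: 4, s1: 8})
print(concrete)  # 32

# (5), conceptually: parse the string back into a symbolic expression
# over the shape symbols *without* substituting hints, so downstream
# subgraphs can keep compiling with symbolic strides.
symbolic = sympy.sympify("s0*s1")
print(symbolic)  # s0*s1
```

In PyTorch itself the symbolic expression would then be wrapped into `SymInt`s tied to the ShapeEnv rather than left as a raw sympy object; the sketch stops at the sympy level to stay self-contained.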

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140751
Approved by: https://github.com/eellison
2024-12-05 03:41:12 +00:00
_composable [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
_shard Flip default on weights_only (#137602) 2024-11-04 18:30:29 +00:00
_tensor [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
_tools [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
algorithms
bin
checkpoint [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
elastic
flight_recorder [FR] Polish the log message for dtype mismatch and don't exit when too many mismatch (#140451) 2024-11-13 07:24:53 +00:00
fsdp Remove old FSDP1 fully_shard (#141875) 2024-12-03 17:00:47 +00:00
launcher
nn/jit
optim
pipelining [pipelining] Improve schedule csv loading (#142009) 2024-12-04 04:15:34 +00:00
rpc
tensor/parallel [Inductor] improve the stride preservation logic of user-visible outputs (#136732) 2024-10-26 18:49:14 +00:00
argparse_util_test.py
test_backends.py API to retrieve default distributed backend from device (#140536) 2024-11-22 11:01:53 +00:00
test_c10d_common.py Enforce contiguity for alltoall (#141816) 2024-12-04 10:17:39 +00:00
test_c10d_functional_native.py [c10d][Partial-Graph Overlap] Support calling .wait_tensor() on output tensor of eager async_op=True collective if under allow_inflight_collective_as_graph_input_ctx() context manager (#137763) 2024-10-29 03:31:19 +00:00
test_c10d_gloo.py
test_c10d_logger.py [c10d] Switch all timer logging in c10d to wait_counter (#141154) 2024-11-21 01:10:11 +00:00
test_c10d_nccl.py [c10d] Test needs abort; otherwise will hang (#141509) 2024-11-27 05:47:17 +00:00
test_c10d_object_collectives.py [c10d][CI] Improve world size setting in some tests (#138846) 2024-10-25 23:02:17 +00:00
test_c10d_ops_nccl.py [c10d][CI] Improve world size setting in some tests (#138846) 2024-10-25 23:02:17 +00:00
test_c10d_pypg.py PyProcessGroup: support rank, world size, group name/desc overrides (#141529) 2024-11-26 20:56:57 +00:00
test_c10d_spawn.py
test_c10d_spawn_gloo.py
test_c10d_spawn_nccl.py
test_c10d_spawn_ucc.py
test_c10d_ucc.py
test_collective_utils.py
test_compute_comm_reordering.py [CI] Add Compiled DDP / Compiled FSDP2 / compute-comm reordering tests to test_inductor_distributed (#138178) 2024-10-20 19:38:18 +00:00
test_control_collectives.py
test_data_parallel.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_device_mesh.py [DeviceMesh] fix sub mesh size calculation in create_sub_mesh() (#138945) 2024-10-29 17:56:56 +00:00
test_distributed_spawn.py
test_dynamo_distributed.py avoid specializing strides with DDPOptimizer + inductor (#140751) 2024-12-05 03:41:12 +00:00
test_fake_pg.py
test_functional_api.py Generalization of distributed test cases for non-CUDA devices (#138216) 2024-11-18 09:38:00 +00:00
test_inductor_collectives.py Move Sympy printers to torch/utils/_sympy/printers.py (#140597) 2024-11-26 18:11:00 +00:00
test_launcher.py
test_multi_threaded_pg.py
test_nccl.py [Pytorch][ATEN] Enable FP8 NCCL in Pytorch ATEN (#138776) 2024-10-25 21:56:47 +00:00
test_pg_wrapper.py
test_store.py
test_symmetric_memory.py [torch/distributed] Make _SymmetricMemory.has_multicast_support() ret… (#141598) 2024-11-26 23:36:32 +00:00