pytorch/test/distributed
Brian Hirsh 471017cbc9 avoid specializing strides with DDPOptimizer + inductor (#140751)
Fixes https://github.com/pytorch/pytorch/issues/140229

Fixes https://github.com/pytorch/pytorch/issues/139474

The issue was that:

(1) DDPOptimizer has some logic to partition the dynamo graph into buckets, and run AOTAutograd/inductor on each bucket

(2) doing so requires knowing the **exact** strides of the outputs of each subgraph, so that each later subgraph can be compiled with example inputs that have the correct strides

(3) there is some existing logic to do this today: we have a `fakify_first_call` flag in AOTAutograd that lets you run it with fake tensor inputs (to handle the calling convention changes that AOTAutograd performs at runtime). During this process, we query inductor for the output strides that it compiled with

(4) these output strides are stored in the FX graph cache as raw strings of sympy expressions. We have a function, `evaluate_symexpr`, which, given the sympy string and the ShapeEnv's `var_to_val` mapping, evaluates the sympy string to produce concrete strides

(5) evaluating the expression this way, however, specializes on the exact values of any variables in our ShapeEnv. In DDPOptimizer we instead want to know inductor's output strides **symbolically**, which requires converting the (string) sympy expression back into actual `SymInt`s that we can return.
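The difference between steps (4) and (5) can be sketched with plain sympy. This is a hedged, standalone illustration, not PyTorch's actual implementation: the name `evaluate_symexpr` and the `var_to_val` mapping follow the text above, but the function body here is an assumption built on public sympy APIs only.

```python
# Sketch: evaluating a cached stride string two ways (assumed example,
# standalone sympy only -- not the real AOTAutograd/inductor code paths).
import sympy


def evaluate_symexpr(expr_str, var_to_val):
    """Substitute the ShapeEnv's concrete hints into the sympy stride
    string. This yields an int, but specializes on those hint values."""
    expr = sympy.sympify(expr_str)
    return int(expr.subs(var_to_val))


s0, s1 = sympy.symbols("s0 s1")

# (4): concrete evaluation -- a stride cached as "s0*s1" with hints
# s0=4, s1=8 becomes the plain integer 32, baking in those sizes.
concrete = evaluate_symexpr("s0*s1", {s0: 4, s1: 8})
print(concrete)  # 32

# (5), conceptually: parse the string back into a symbolic expression
# over the shape symbols *without* substituting hints, so downstream
# subgraphs can keep compiling with symbolic strides.
symbolic = sympy.sympify("s0*s1")
print(symbolic)  # s0*s1
```

In PyTorch itself the symbolic expression would then be wrapped into `SymInt`s tied to the ShapeEnv rather than left as a raw sympy object; the sketch stops at the sympy level to stay self-contained.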

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140751
Approved by: https://github.com/eellison
2024-12-05 03:41:12 +00:00
_composable [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
_shard Flip default on weights_only (#137602) 2024-11-04 18:30:29 +00:00
_tensor [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
_tools [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
algorithms
bin
checkpoint [FSDP2] Move to public torch.distributed.fsdp (#141868) 2024-12-05 03:04:01 +00:00
elastic
flight_recorder [FR] Polish the log message for dtype mismatch and don't exit when too many mismatch (#140451) 2024-11-13 07:24:53 +00:00
fsdp Remove old FSDP1 fully_shard (#141875) 2024-12-03 17:00:47 +00:00
launcher
nn/jit
optim
pipelining [pipelining] Improve schedule csv loading (#142009) 2024-12-04 04:15:34 +00:00
rpc
tensor/parallel [Inductor] improve the stride preservation logic of user-visible outputs (#136732) 2024-10-26 18:49:14 +00:00
argparse_util_test.py
test_backends.py API to retrieve default distributed backend from device (#140536) 2024-11-22 11:01:53 +00:00
test_c10d_common.py Enforce contiguity for alltoall (#141816) 2024-12-04 10:17:39 +00:00
test_c10d_functional_native.py [c10d][Partial-Graph Overlap] Support calling .wait_tensor() on output tensor of eager async_op=True collective if under allow_inflight_collective_as_graph_input_ctx() context manager (#137763) 2024-10-29 03:31:19 +00:00
test_c10d_gloo.py
test_c10d_logger.py [c10d] Switch all timer logging in c10d to wait_counter (#141154) 2024-11-21 01:10:11 +00:00
test_c10d_nccl.py [c10d] Test needs abort; otherwise will hang (#141509) 2024-11-27 05:47:17 +00:00
test_c10d_object_collectives.py [c10d][CI] Improve world size setting in some tests (#138846) 2024-10-25 23:02:17 +00:00
test_c10d_ops_nccl.py [c10d][CI] Improve world size setting in some tests (#138846) 2024-10-25 23:02:17 +00:00
test_c10d_pypg.py PyProcessGroup: support rank, world size, group name/desc overrides (#141529) 2024-11-26 20:56:57 +00:00
test_c10d_spawn.py
test_c10d_spawn_gloo.py
test_c10d_spawn_nccl.py
test_c10d_spawn_ucc.py
test_c10d_ucc.py
test_collective_utils.py
test_compute_comm_reordering.py [CI] Add Compiled DDP / Compiled FSDP2 / compute-comm reordering tests to test_inductor_distributed (#138178) 2024-10-20 19:38:18 +00:00
test_control_collectives.py
test_data_parallel.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_device_mesh.py [DeviceMesh] fix sub mesh size calculation in create_sub_mesh() (#138945) 2024-10-29 17:56:56 +00:00
test_distributed_spawn.py
test_dynamo_distributed.py avoid specializing strides with DDPOptimizer + inductor (#140751) 2024-12-05 03:41:12 +00:00
test_fake_pg.py
test_functional_api.py Generalization of distributed test cases for non-CUDA devices (#138216) 2024-11-18 09:38:00 +00:00
test_inductor_collectives.py Move Sympy printers to torch/utils/_sympy/printers.py (#140597) 2024-11-26 18:11:00 +00:00
test_launcher.py
test_multi_threaded_pg.py
test_nccl.py [Pytorch][ATEN] Enable FP8 NCCL in Pytorch ATEN (#138776) 2024-10-25 21:56:47 +00:00
test_pg_wrapper.py
test_store.py
test_symmetric_memory.py [torch/distributed] Make _SymmetricMemory.has_multicast_support() ret… (#141598) 2024-11-26 23:36:32 +00:00