mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-14 20:57:59 +00:00

Fixes https://github.com/pytorch/pytorch/issues/140229

Fixes https://github.com/pytorch/pytorch/issues/139474

The issue was that:

(1) DDPOptimizer has some logic to partition the dynamo graph into buckets, and run AOTAutograd/inductor on each bucket.

(2) Doing so requires knowing the **exact** strides of the outputs of each subgraph, so that we have example inputs (with correct strides) to compile each of the later subgraphs with.

(3) There is some existing logic to do this today: we have a `fakify_first_call` flag in AOTAutograd that lets you run it with fake tensor inputs (to handle the calling convention changes that AOTAutograd performs at runtime). During this process, we query inductor for the output strides that it compiled with.

(4) These output strides are stored in the FX graph cache as raw strings of sympy expressions. We have a function, `evaluate_symexpr`, which, given the sympy string and the ShapeEnv's `var_to_val` mapping, evaluates the sympy string to generate concrete strides.

(5) Evaluating this expression, however, specializes on the exact values of any variables in our shape env.

In DDPOptimizer, we want to know what inductor's stride outputs are symbolically. This requires converting the (string) sympy expression into actual `SymInts` that we can return.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140751

Approved by: https://github.com/eellison

| Name | | |
|---|---|---|
| _composable | ||
| _shard | ||
| _tensor | ||
| _tools | ||
| algorithms | ||
| bin | ||
| checkpoint | ||
| elastic | ||
| flight_recorder | ||
| fsdp | ||
| launcher | ||
| nn/jit | ||
| optim | ||
| pipelining | ||
| rpc | ||
| tensor/parallel | ||
| argparse_util_test.py | ||
| test_backends.py | ||
| test_c10d_common.py | ||
| test_c10d_functional_native.py | ||
| test_c10d_gloo.py | ||
| test_c10d_logger.py | ||
| test_c10d_nccl.py | ||
| test_c10d_object_collectives.py | ||
| test_c10d_ops_nccl.py | ||
| test_c10d_pypg.py | ||
| test_c10d_spawn.py | ||
| test_c10d_spawn_gloo.py | ||
| test_c10d_spawn_nccl.py | ||
| test_c10d_spawn_ucc.py | ||
| test_c10d_ucc.py | ||
| test_collective_utils.py | ||
| test_compute_comm_reordering.py | ||
| test_control_collectives.py | ||
| test_data_parallel.py | ||
| test_device_mesh.py | ||
| test_distributed_spawn.py | ||
| test_dynamo_distributed.py | ||
| test_fake_pg.py | ||
| test_functional_api.py | ||
| test_inductor_collectives.py | ||
| test_launcher.py | ||
| test_multi_threaded_pg.py | ||
| test_nccl.py | ||
| test_pg_wrapper.py | ||
| test_store.py | ||
| test_symmetric_memory.py | ||
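
Points (4) and (5) of the commit message contrast two ways of evaluating a stored stride expression: substituting concrete values (which specializes) versus rebuilding a symbolic value. The following is a minimal, toy sketch of that distinction only. `ToySym`, `evaluate_concrete`, and `evaluate_symbolic` are hypothetical names invented for illustration; they are not PyTorch's `evaluate_symexpr` or `SymInt`, and the sketch uses Python's built-in `eval` on simple arithmetic strings rather than sympy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySym:
    """Toy stand-in for a symbolic int: an expression string plus a hint value."""
    expr: str
    hint: int

    def __mul__(self, other):
        o = _lift(other)
        return ToySym(f"({self.expr}*{o.expr})", self.hint * o.hint)

    def __rmul__(self, other):
        o = _lift(other)
        return ToySym(f"({o.expr}*{self.expr})", o.hint * self.hint)

    def __add__(self, other):
        o = _lift(other)
        return ToySym(f"({self.expr}+{o.expr})", self.hint + o.hint)

    def __radd__(self, other):
        o = _lift(other)
        return ToySym(f"({o.expr}+{self.expr})", o.hint + self.hint)


def _lift(x):
    """Wrap a plain int as a constant ToySym; pass ToySym through unchanged."""
    return x if isinstance(x, ToySym) else ToySym(str(x), x)


def evaluate_concrete(stride_expr: str, var_to_val: dict) -> int:
    """Substitute the current values: the result is a plain int (specialized)."""
    return eval(stride_expr, {"__builtins__": {}}, dict(var_to_val))


def evaluate_symbolic(stride_expr: str, var_to_val: dict) -> ToySym:
    """Bind each variable to a symbolic object, so the result stays symbolic."""
    syms = {name: ToySym(name, val) for name, val in var_to_val.items()}
    return _lift(eval(stride_expr, {"__builtins__": {}}, syms))


if __name__ == "__main__":
    var_to_val = {"s0": 8, "s1": 16}      # assumed example shape-env values
    stride_exprs = ["s0*s1", "s1", "1"]   # assumed example stored stride strings
    print([evaluate_concrete(e, var_to_val) for e in stride_exprs])
    print([evaluate_symbolic(e, var_to_val) for e in stride_exprs])
```

In this toy model, the concrete path discards the relationship between the stride and `s0`/`s1`, whereas the symbolic path keeps the expression alongside a hint value, which is roughly the property the commit message says DDPOptimizer needs when building example inputs for later subgraphs.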