pytorch/test
Brian Hirsh 471017cbc9 avoid specializing strides with DDPOptimizer + inductor (#140751)
Fixes https://github.com/pytorch/pytorch/issues/140229

Fixes https://github.com/pytorch/pytorch/issues/139474

The issue was that:

(1) DDPOptimizer has some logic to partition the dynamo graph into buckets, and run AOTAutograd/inductor on each bucket

(2) doing so requires knowing the **exact** strides of the outputs of each subgraph, so we can have example inputs (with correct strides) to each of the later subgraphs to compile with

(3) there is some existing logic to do this today: we have a `fakify_first_call` flag in AOTAutograd that lets you run it with fake tensor inputs (to handle the calling convention changes that AOTAutograd performs at runtime). During this process, we query inductor for the output strides that it compiled with

(4) these output strides are stored in the FX graph cache as raw strings of sympy expressions. We have a function, `evaluate_symexpr`, which, given the sympy string and the ShapeEnv's `var_to_val` mapping, evaluates the string to produce concrete strides

(5) however, evaluating this expression specializes on the exact values of any variables in our shape env. In DDPOptimizer, we want to know inductor's output strides symbolically. This requires converting the (string) sympy expression into actual `SymInts` that we can return.
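
The difference between steps (4) and (5) can be illustrated with a small toy sketch. It is not the PyTorch implementation: `ToySymInt` and `evaluate_stride_expr` are hypothetical names invented for this example, and the standard-library `ast` module stands in for sympy. The same stride strings are evaluated twice: once against concrete values (which bakes in the current sizes), and once against symbolic placeholders (which keeps the expression).

```python
import ast
import operator

# Only the operators needed for this illustration.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _text(value):
    """Render an int or a ToySymInt as expression text."""
    return value.expr if isinstance(value, ToySymInt) else str(value)


class ToySymInt:
    """Hypothetical stand-in for a symbolic int: records an expression, not a value."""

    def __init__(self, expr):
        self.expr = expr

    def _combine(self, other, op, reflected=False):
        lhs, rhs = (other, self) if reflected else (self, other)
        return ToySymInt(f"({_text(lhs)} {op} {_text(rhs)})")

    def __add__(self, other):
        return self._combine(other, "+")

    def __radd__(self, other):
        return self._combine(other, "+", reflected=True)

    def __sub__(self, other):
        return self._combine(other, "-")

    def __rsub__(self, other):
        return self._combine(other, "-", reflected=True)

    def __mul__(self, other):
        return self._combine(other, "*")

    def __rmul__(self, other):
        return self._combine(other, "*", reflected=True)

    def __repr__(self):
        return f"ToySymInt({self.expr})"


def evaluate_stride_expr(expr, env):
    """Evaluate an expression string, looking up variable names in `env`.

    If `env` maps names to ints the result is a concrete int (specialized);
    if it maps names to ToySymInt objects the result stays symbolic.
    """

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


# Assumed example data: stride strings as they might be cached, plus a
# var_to_val-style mapping of symbol names to their current values.
stride_exprs = ["s0*s1", "s1", "1"]
var_to_val = {"s0": 8, "s1": 16}

# Step (4): substituting concrete values specializes the strides.
specialized = [evaluate_stride_expr(e, var_to_val) for e in stride_exprs]

# Step (5): substituting symbolic placeholders keeps the strides symbolic.
symbols = {name: ToySymInt(name) for name in var_to_val}
symbolic = [evaluate_stride_expr(e, symbols) for e in stride_exprs]
```

In the specialized case the strides become plain integers tied to the current values of `s0` and `s1`; in the symbolic case the first stride remains an expression over `s0` and `s1`, which is the property DDPOptimizer needs when building example inputs for later subgraphs.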

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140751
Approved by: https://github.com/eellison
2024-12-05 03:41:12 +00:00
ao/sparsity Run only listed tests on s390x (#140265) 2024-11-20 22:53:09 +00:00
autograd Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
backends/xeon
benchmark_utils
bottleneck_test
cpp [functional autograd] Refactor validate_outputs into a functional variant (#141348) 2024-12-04 18:06:31 +00:00
cpp_api_parity
cpp_extensions OpenReg: Fix releasing tensor issue when exiting process (#140936) 2024-11-22 13:50:35 +00:00
custom_backend
custom_operator
distributed avoid specializing strides with DDPOptimizer + inductor (#140751) 2024-12-05 03:41:12 +00:00
distributions fix test_save_load_transform. (#140494) 2024-11-19 04:36:06 +00:00
dynamo Refactor optional graph module into CompiledFxGraphConstants (#141897) 2024-12-05 00:34:14 +00:00
dynamo_expected_failures [dynamo] Fix VariableBuilder._wrap on frozenset and enforce invariants on ConstantVariable (#141504) 2024-11-27 21:58:35 +00:00
dynamo_skips config: Add env_name_default and env_name_force to Config (#138956) 2024-11-06 21:20:42 +00:00
edge Set RUNPATH so installed tests can find the required shared libraries (#136627) 2024-10-25 09:38:08 +00:00
error_messages
expect [aotd] capture rrelu_with_noise noise mutation in compile (#141867) 2024-12-04 12:18:58 +00:00
export [export] Generate compatible thrift schema out of schema.py (#141611) 2024-11-29 20:09:49 +00:00
forward_backward_compatibility [aotd] capture rrelu_with_noise noise mutation in compile (#141867) 2024-12-04 12:18:58 +00:00
functorch Refactor optional graph module into CompiledFxGraphConstants (#141897) 2024-12-05 00:34:14 +00:00
fx [export] Change fx graph _replace_hook to a list of Callable (#142006) 2024-12-05 03:26:48 +00:00
higher_order_ops Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
inductor [AOTInductor] Option to not include weight in .so (#141997) 2024-12-05 03:35:54 +00:00
jit
jit_hooks
lazy
mobile
nn [BE] Replace skipIfMPS with expectedFailureMPS (#139940) 2024-11-15 03:48:37 +00:00
onnx [ONNX] Remove special handling of torchvision.ops imports in onnx export (#141569) 2024-11-28 18:05:40 +00:00
optim
package [ci, 3.13] skip failing torch.package dynamo-wrapped test (#141886) 2024-12-05 00:33:26 +00:00
profiler [ci, 3.13] disable segfaulting dynamo-wrapped profiler test (#141951) 2024-12-05 00:33:26 +00:00
quantization [ci, 3.13] disable some quantization tests affected by numpy 2.1 overflow error (#141621) 2024-12-05 00:24:29 +00:00
scripts
test_img
torch_np [ci, 3.13] fix/skip failing numpy 2.0+ dynamo-wrapped tests (#141950) 2024-12-05 00:33:26 +00:00
typing
xpu
_test_bazel.py
allowlist_for_publicAPI.json Refactor ShapeGuardPrinter for future C++ addition (#140968) 2024-11-27 20:09:58 +00:00
conftest.py
create_dummy_torchscript_model.py
delete.py
hi.py
HowToWriteTestsUsingFileCheck.md
linear.py
load_torchscript_model.py Flip default on weights_only (#137602) 2024-11-04 18:30:29 +00:00
minioptest_failures_dict.json
mkl_verbose.py
mkldnn_verbose.py
pytest_shard_custom.py
run_doctests.sh
run_test.py [BE] Remove Model Dump utility (#141540) 2024-11-27 22:52:55 +00:00
simulate_nccl_errors.py
slow_tests.json Update slow tests (#139051) 2024-11-04 11:49:06 +00:00
test_ao_sparsity.py
test_autocast.py [MPS] Add support for bf16 autocast (#139390) 2024-11-20 19:52:28 +00:00
test_autograd.py Run only listed tests on s390x (#140265) 2024-11-20 22:53:09 +00:00
test_autograd_fallback.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_autoload.py
test_binary_ufuncs.py Fix unit test failures with SciPy 1.13+ (#141986) 2024-12-04 21:41:38 +00:00
test_bundled_images.py
test_bundled_inputs.py
test_ci_sanity_check_fail.py
test_comparison_utils.py [export] Add device and dtype fields to assert_tensor_metadata (#141071) 2024-11-22 20:54:55 +00:00
test_compile_benchmark_util.py
test_complex.py
test_content_store.py
test_cpp_api_parity.py
test_cpp_extensions_aot.py
test_cpp_extensions_jit.py Avoid file encoding issues when loading cpp extensions (#138565) 2024-10-28 14:06:34 +00:00
test_cpp_extensions_mtia_backend.py
test_cpp_extensions_open_device_registration.py [ci, 3.13] disable another failing cpp_extension test in 3.13 (#141673) 2024-12-05 00:24:42 +00:00
test_cpp_extensions_stream_and_event.py
test_cuda.py Add API query for available per-process CUDA memory (#140620) 2024-12-03 00:24:03 +00:00
test_cuda_expandable_segments.py
test_cuda_multigpu.py
test_cuda_nvml_based_avail.py
test_cuda_primary_ctx.py
test_cuda_sanitizer.py [BE]: Add better optional typing (#138426) 2024-10-27 14:19:00 +00:00
test_cuda_trace.py
test_custom_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_dataloader.py [BE][Ez]: Update ruff to 0.7.4 (#140806) 2024-11-15 17:04:32 +00:00
test_datapipe.py
test_decomp.py [aotd] capture rrelu_with_noise noise mutation in compile (#141867) 2024-12-04 12:18:58 +00:00
test_deploy.py
test_determination.py
test_dispatch.py
test_dlpack.py Use DLPack for creating tensors out of custom classes, when available. (#138697) 2024-10-26 01:27:05 +00:00
test_dynamic_shapes.py Try to simplify FloorDiv axioms implications when needed during evaluations. (#141267) 2024-11-28 15:35:35 +00:00
test_expanded_weights.py
test_fake_tensor.py Allow Fakified subclass to have different device for inner and outer tensor (#141839) 2024-12-03 00:09:41 +00:00
test_file_check.py
test_flop_counter.py FlopCounterMode: Decompose ops for inference mode (#138508) 2024-11-25 16:53:10 +00:00
test_foreach.py pow: fix meta function output argument dtype check. (#140287) 2024-11-20 13:28:47 +00:00
test_function_schema.py
test_functional_autograd_benchmark.py
test_functional_optim.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_functionalization.py
test_functionalization_of_rng_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_futures.py
test_fx.py Revert "Refactor FxGraphDrawer to use HTML-like labels (#137726)" 2024-11-04 17:44:44 +00:00
test_fx_experimental.py [fx] make split_module work with keep_original_order=True and no-op graph (#141340) 2024-11-24 06:41:30 +00:00
test_fx_passes.py
test_fx_reinplace_pass.py
test_hub.py
test_import_stats.py
test_indexing.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_itt.py
test_jit.py [ci, 3.13] skip some parts of a failing jit test in 3.13 (#141605) 2024-12-05 00:24:22 +00:00
test_jit_autocast.py
test_jit_disabled.py
test_jit_fuser.py
test_jit_fuser_legacy.py
test_jit_fuser_te.py
test_jit_legacy.py
test_jit_llga_fuser.py [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [2/N] (#140238) 2024-11-13 05:13:39 +00:00
test_jit_profiling.py
test_jit_simple.py
test_jit_string.py
test_jiterator.py
test_kernel_launch_checks.py
test_legacy_vmap.py
test_license.py
test_linalg.py [ROCM] Support Multi-GPU offline tuning in TunableOp (#139673) 2024-11-26 19:07:41 +00:00
test_logging.py
test_masked.py
test_maskedtensor.py Correctly specify size of sparse_csr tensors in maskedtensor binary ops (#134335) 2024-12-03 02:55:57 +00:00
test_matmul_cuda.py Revert "[ROCm] port CK rowwise F8 from fbgemm (#140856)" 2024-12-05 01:51:40 +00:00
test_meta.py pow: fix meta function output argument dtype check. (#140287) 2024-11-20 13:28:47 +00:00
test_metal.py
test_mkl_verbose.py
test_mkldnn.py
test_mkldnn_fusion.py
test_mkldnn_verbose.py
test_mobile_optimizer.py
test_model_exports_to_core_aten.py
test_module_tracker.py [ci, 3.13] skip failing module tracker dynamo-wrapped test (#141887) 2024-12-05 00:33:26 +00:00
test_modules.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_monitor.py [ci, 3.13] update tensorboard version for 3.13 to fix broken tests (#141572) 2024-12-05 00:24:07 +00:00
test_mps.py [MPS] Add scatter_reduce.two (#141948) 2024-12-04 04:56:43 +00:00
test_multiprocessing.py
test_multiprocessing_spawn.py
test_namedtensor.py
test_namedtuple_return_api.py
test_native_functions.py
test_native_mha.py
test_nestedtensor.py Switch to using Python nested int (#141166) 2024-12-02 19:17:30 +00:00
test_nn.py softshrink nan fixes (#138421) 2024-11-21 23:06:08 +00:00
test_nnapi.py
test_numba_integration.py
test_numpy_interop.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_openmp.py [1/N] Don't skip ASAN on some tests (#138571) 2024-10-23 02:38:45 +00:00
test_ops.py [2/N] Enable UBSAN tests (#141740) 2024-12-03 20:52:26 +00:00
test_ops_fwd_gradients.py
test_ops_gradients.py Run only listed tests on s390x (#140265) 2024-11-20 22:53:09 +00:00
test_ops_jit.py
test_optim.py [MPS] Expand fused forloop to bfloat16 (#141104) 2024-11-22 01:07:15 +00:00
test_out_dtype_op.py
test_overrides.py Introduce torch.sym_add, variadic add (#138660) 2024-10-23 17:42:41 +00:00
test_package.py
test_per_overload_api.py
test_prims.py
test_proxy_tensor.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_pruning_op.py
test_public_bindings.py [BE] Remove Model Dump utility (#141540) 2024-11-27 22:52:55 +00:00
test_python_dispatch.py [torchgen] Improve schema parsing with regex for numeric ranges (#140210) 2024-11-14 23:28:27 +00:00
test_pytree.py When serializing treespec context, support enum as well (#141525) 2024-12-04 03:08:50 +00:00
test_quantization.py
test_reductions.py use more elements per thread for narrow dtypes (#139449) 2024-11-14 22:50:16 +00:00
test_scatter_gather_ops.py Fix the check for can_use_expanded_index_path (#140351) 2024-11-15 05:52:23 +00:00
test_schema_check.py
test_segment_reductions.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_serialization.py Fix tests in test/test_serialization that were failing if run individually (#141300) 2024-11-22 02:40:37 +00:00
test_set_default_mobile_cpu_allocator.py
test_shape_ops.py Add size param check of unfold (#139965) 2024-11-09 17:12:53 +00:00
test_show_pickle.py
test_sort_and_select.py Support torch.bool in torch.sort + CUDA (#139409) 2024-11-06 00:02:54 +00:00
test_sparse.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_sparse_csr.py Add out_dtype kw argument to optimize_bsr_dense_addmm (#136626) 2024-10-22 09:52:25 +00:00
test_sparse_semi_structured.py Enable CI on SM89 (#140305) 2024-12-03 04:49:46 +00:00
test_spectral_ops.py
test_stateless.py
test_static_runtime.py
test_subclass.py
test_sympy_utils.py [dynamo] add SymNode bitwise and/or (#138777) 2024-11-22 23:36:16 +00:00
test_tensor_creation_ops.py fix test_float_to_int_conversion_nonfinite for NumPy 2 (#138131) 2024-11-14 04:19:19 +00:00
test_tensorboard.py [ci, 3.13] update tensorboard version for 3.13 to fix broken tests (#141572) 2024-12-05 00:24:07 +00:00
test_tensorexpr.py
test_tensorexpr_pybind.py
test_testing.py [ci, 3.13] update test_testing.py usage of locals() for 3.13 (#141577) 2024-12-05 00:24:14 +00:00
test_throughput_benchmark.py
test_torch.py Implement deterministic scan (#140887) 2024-11-19 23:43:26 +00:00
test_transformers.py Lint: switch oncall owner for test_transformers (#141722) 2024-11-27 21:45:43 +00:00
test_type_hints.py
test_type_info.py
test_type_promotion.py
test_typing.py
test_unary_ufuncs.py Implement nonzero for large inputs (#141592) 2024-11-27 10:19:53 +00:00
test_utils.py
test_utils_config_module.py config: Throw if justknobs value is not a boolean (#139488) 2024-11-20 23:52:17 +00:00
test_view_ops.py Replace clone.detach with detach.clone (#140264) 2024-11-13 07:01:02 +00:00
test_vulkan.py
test_weak.py
test_xnnpack_integration.py
test_xpu.py Use default context on Windows for Intel GPU (#138049) 2024-11-28 02:49:46 +00:00