mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-14 20:57:59 +00:00
## Description
Fixes https://github.com/pytorch/pytorch/issues/114450. This PR builds upon the work by @imzhuhl in https://github.com/pytorch/pytorch/pull/114451.
This PR requires https://github.com/pytorch/pytorch/pull/122472 to land first.
We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during compilation and restore it when loading the compiled .so.
The ideep version is updated so that no pipeline breaks even if third_party/ideep is not updated at the same time.
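The save-at-compile-time / restore-at-load-time flow can be illustrated with a minimal, library-agnostic sketch. This is not the actual PyTorch or oneDNN API: `pickle` stands in for the oneDNN serialization routines, the dict stands in for an opaque MKLDNN tensor, and the function names are hypothetical.

```python
import pickle

def serialize_opaque(payload: dict) -> bytes:
    # Compile time: flatten the opaque handle (here a plain dict standing in
    # for an MKLDNN tensor) into a byte blob stored alongside the compiled .so.
    return pickle.dumps(payload)

def deserialize_opaque(blob: bytes) -> dict:
    # Load time: rebuild the opaque handle from the stored blob, so the frozen
    # constant is available without re-running the original conversion.
    return pickle.loads(blob)

frozen_weight = {"layout": "blocked", "data": [1.0, 2.0, 3.0]}
blob = serialize_opaque(frozen_weight)        # happens during compilation
restored = deserialize_opaque(blob)           # happens when the .so is loaded
assert restored == frozen_weight
```

The key property, mirrored from the PR, is that the blob round-trips exactly: whatever opaque layout the tensor had at compile time is reproduced at load time.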
### Test plan:
```sh
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu
```
### TODOs in follow-up PRs
1. We found that using `AOTI_TORCH_CHECK` causes a performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor, which uses `TORCH_CHECK`. How to address this may need further discussion (`AOTI_TORCH_CHECK` was introduced in https://github.com/pytorch/pytorch/pull/119220).
2. Freezing in non-ABI-compatible mode works with the support in this PR. For ABI-compatible mode, we first need to address this issue: `AssertionError: None, i.e. optional output is not supported`.