pytorch/test/inductor
Wu, Chunyuan 4a997de8b9 [AOTI] support freezing for MKLDNN (#124350)
## Description
Fixes https://github.com/pytorch/pytorch/issues/114450. This PR builds upon the work from @imzhuhl done in https://github.com/pytorch/pytorch/pull/114451.

This PR requires https://github.com/pytorch/pytorch/pull/122472 to land first.

We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during compilation and restore the opaque tensor when loading the compiled .so.
The ideep version is updated so that we won't break any pipelines even if third_party/ideep is not updated at the same time.
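For illustration, the compile-time-save / load-time-restore flow can be sketched as below. This is a minimal, stdlib-only Python sketch of the pattern, not the actual oneDNN or AOTInductor API; the function names are hypothetical, and a raw byte blob stands in for the opaque MKLDNN tensor.

```python
import base64

# Hypothetical sketch of the pattern, NOT the real oneDNN/AOTInductor API:
# at compile time the opaque MKLDNN tensor is serialized to a byte blob that
# can be embedded as a constant in the generated .so; at load time the blob
# is deserialized back into the opaque tensor.

def serialize_opaque_tensor(opaque: bytes) -> str:
    """Compile time: encode the opaque tensor as an embeddable text constant."""
    return base64.b64encode(opaque).decode("ascii")

def deserialize_opaque_tensor(blob: str) -> bytes:
    """Load time: reconstruct the opaque tensor from the embedded constant."""
    return base64.b64decode(blob.encode("ascii"))

# Round trip: what is saved during compilation is restored at .so load time.
payload = b"\x02opaque-mkldnn-blocked-layout\x00"
blob = serialize_opaque_tensor(payload)
assert deserialize_opaque_tensor(blob) == payload
```

The key property the PR relies on is exactly this round trip: the blocked-layout weight produced by freezing survives the compile/load boundary without being re-packed at runtime.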

### Test plan:
```sh
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu
```

### TODOs in follow-up PRs
1. We found that using `AOTI_TORCH_CHECK` causes a performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor, which uses `TORCH_CHECK`. How to address this needs further discussion (`AOTI_TORCH_CHECK` was introduced in https://github.com/pytorch/pytorch/pull/119220).
2. Freezing in non-ABI compatible mode works with the support added in this PR. For ABI compatible mode, we first need to address this issue: `AssertionError: None, i.e. optional output is not supported`.
6c4f43f826/torch/_inductor/codegen/cpp_wrapper_cpu.py (L2023-L2024)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124350
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-05-25 07:15:36 +00:00
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| cpp | | |
| extension_backends | [codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901) | 2024-05-24 00:26:15 +00:00 |
| __init__.py | | |
| indirect_assert_helper.py | | |
| minifier_smoke.py | | |
| opinfo_harness.py | | |
| test_aot_inductor.py | [AOTI] support freezing for MKLDNN (#124350) | 2024-05-25 07:15:36 +00:00 |
| test_aot_inductor_utils.py | | |
| test_benchmark_fusion.py | Enable epilogue fusion benchmarking internally (#125455) | 2024-05-14 23:06:29 +00:00 |
| test_binary_folding.py | | |
| test_codecache.py | [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1) (#122866) | 2024-05-09 00:51:35 +00:00 |
| test_codegen_triton.py | | |
| test_compiled_autograd.py | [compiled autograd] Better cache miss logging (#126602) | 2024-05-19 23:49:52 +00:00 |
| test_compiled_optimizers.py | Revert "Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure (#125127)" | 2024-05-20 12:14:22 +00:00 |
| test_config.py | | |
| test_control_flow.py | Fix flexattention not realizing inputs before lowering (also refactored runtime estimation) (#126615) | 2024-05-22 17:28:46 +00:00 |
| test_coordinate_descent_tuner.py | Remove removed ruff rule TRY200 (#126256) | 2024-05-17 16:31:05 +00:00 |
| test_cpu_cpp_wrapper.py | [Quant][Inductor] Enable lowering of qlinear-binary(-unary) fusion for X86Inductor (#122593) | 2024-05-17 07:46:48 +00:00 |
| test_cpu_repro.py | Unify the dtype to `VecMask<float, N>` in ops.masked (#126662) | 2024-05-21 20:52:25 +00:00 |
| test_cpu_select_algorithm.py | [inductor][cpp] support bf16/fp16 gemm template epilogue fusion (#126545) | 2024-05-24 12:29:06 +00:00 |
| test_cuda_cpp_wrapper.py | [AOTI] Fix an int array codegen issue (#126801) | 2024-05-24 19:10:33 +00:00 |
| test_cuda_repro.py | [inductor] Fix ops.scan for non-commutative operators (#126633) | 2024-05-20 10:27:17 +00:00 |
| test_cudacodecache.py | | |
| test_cudagraph_trees.py | Implement native support for float inputs in Dynamo and ShapeEnv (#125325) | 2024-05-14 04:10:01 +00:00 |
| test_custom_lowering.py | | |
| test_custom_post_grad_passes.py | | |
| test_cutlass_backend.py | | |
| test_debug_trace.py | Add mode to MemoryDep to track atomic accumulates (#123223) | 2024-05-16 04:34:09 +00:00 |
| test_decompose_mem_bound_mm.py | | |
| test_dependencies.py | | |
| test_distributed_patterns.py | functionalize storage resizing, minimal ppFSDP traceable forward (#122434) | 2024-05-10 18:09:10 +00:00 |
| test_efficient_conv_bn_eval.py | | |
| test_extension_backend.py | | |
| test_flex_attention.py | Made some minor improvements to flexattention perf + added more autotune configs (#126811) | 2024-05-25 05:03:31 +00:00 |
| test_foreach.py | | |
| test_fp8.py | | |
| test_fused_attention.py | [Inductor] Add SDPA pattern for OOB GPT2 models (#125562) | 2024-05-09 01:21:09 +00:00 |
| test_fx_fusion.py | | |
| test_group_batch_fusion.py | Fix flexattention not realizing inputs before lowering (also refactored runtime estimation) (#126615) | 2024-05-22 17:28:46 +00:00 |
| test_halide.py | [halide-backend] Add HalideCodeCache (#126416) | 2024-05-22 06:52:50 +00:00 |
| test_indexing.py | | |
| test_inductor_freezing.py | | |
| test_inductor_utils.py | | |
| test_inplacing_pass.py | | |
| test_kernel_benchmark.py | Revert "update pointwise cat heuristics (#125772)" | 2024-05-11 15:27:44 +00:00 |
| test_layout_optim.py | | |
| test_max_autotune.py | [inductor] make conv lowering work with dynamic shapes (#126823) | 2024-05-22 23:15:29 +00:00 |
| test_memory_planning.py | Make 'pytest test/inductor/test_memory_planning.py' work (#126397) | 2024-05-16 20:28:20 +00:00 |
| test_metrics.py | [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1) (#122866) | 2024-05-09 00:51:35 +00:00 |
| test_minifier.py | | |
| test_minifier_isolate.py | | |
| test_mkldnn_pattern_matcher.py | Revert "[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat (#122667)" | 2024-05-21 13:45:07 +00:00 |
| test_mmdecomp.py | | |
| test_move_constructors_to_cuda.py | | |
| test_multi_kernel.py | | |
| test_pad_mm.py | dont pad 0 dim mm inputs (#126475) | 2024-05-17 05:03:27 +00:00 |
| test_padding.py | cprofile every compile id [x/y] to keep consistent with tlparse (#125659) | 2024-05-14 17:09:28 +00:00 |
| test_pattern_matcher.py | Fix flexattention not realizing inputs before lowering (also refactored runtime estimation) (#126615) | 2024-05-22 17:28:46 +00:00 |
| test_perf.py | Fix flexattention not realizing inputs before lowering (also refactored runtime estimation) (#126615) | 2024-05-22 17:28:46 +00:00 |
| test_profiler.py | | |
| test_select_algorithm.py | | |
| test_smoke.py | | |
| test_snode_runtime.py | Fix flexattention not realizing inputs before lowering (also refactored runtime estimation) (#126615) | 2024-05-22 17:28:46 +00:00 |
| test_split_cat_fx_passes.py | | |
| test_standalone_compile.py | | |
| test_torchbind.py | [torchbind] Add inductor support (#123709) | 2024-05-13 18:18:17 +00:00 |
| test_torchinductor.py | [AOTI][refactor] Update DTYPE_TO_CPP mapping (#126915) | 2024-05-24 21:03:12 +00:00 |
| test_torchinductor_codegen_dynamic_shapes.py | [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 1) (#122866) | 2024-05-09 00:51:35 +00:00 |
| test_torchinductor_dynamic_shapes.py | [inductor] [FX graph cache] Ignore unbacked symints in guards expression (#126251) | 2024-05-16 01:35:41 +00:00 |
| test_torchinductor_opinfo.py | Unify the dtype to `VecMask<float, N>` in ops.masked (#126662) | 2024-05-21 20:52:25 +00:00 |
| test_triton_extension_backend.py | | |
| test_triton_heuristics.py | Remove removed ruff rule TRY200 (#126256) | 2024-05-17 16:31:05 +00:00 |
| test_triton_kernels.py | [User-Written Triton] Handle the scf.for and scf.while case (#127065) | 2024-05-24 21:01:13 +00:00 |
| test_triton_wrapper.py | | |
| test_unbacked_symints.py | [inductor] fix unbacked case in pointwise + reduction vertical fusion (#125982) | 2024-05-17 17:06:24 +00:00 |
| test_utils.py | | |
| test_xpu_basic.py | | |