pytorch/test
Feng Shi bb6eef8ed1 [2/2] PT2 Inductor ComboKernels - automatic horizontal fusing (#131675)
Summary:
A ComboKernel combines independent Inductor Triton kernels into a single one.
This is the second of two pull requests. It 1) adds automatic horizontal fusion at the end of the Inductor operator fusion process and 2) adds type annotations to triton_combo_kernel.py.

ComboKernel is used in two cases: 1) For existing foreach kernels, combo kernels are used as the backend kernel; the front-end kernel generation logic remains the same. 2) An extra optimization phase at the end of the scheduler generates additional combo kernels if combo_kernels is True in config.py.

This pull request deals with the second case above:

- The combo kernel generation in the added optimization phase is done in two steps: 1) In the front end, inside the scheduler, it topologically sorts the schedule nodes to find all nodes with no data dependency on one another and creates a front-end schedule node for them. We currently limit the maximum number of sub-nodes per combo kernel to 8 (the optimal number still needs to be determined). 2) These sub-nodes are then combined in the codegen phase to generate the combo kernel code based on a few rules. For example, 1d and 2d kernels are separated into different combo kernels, as mixing them is not supported yet. Note that the algorithms we provide are very basic, and users can register their own combo kernel generation algorithms for both steps.

- Performance-wise, combining small kernels almost always yields a performance gain. However, combining very large kernels may yield no gain and sometimes even a regression, possibly due to improper block sizes. Thus, a benchmark function is implemented to avoid such regressions, and it is recommended to turn it on by setting benchmark_combo_kernels to True whenever combo_kernels is True.
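
The two steps and the benchmark guard described above can be illustrated with a small, self-contained sketch. This is not the Inductor implementation: the names topo_levels, group_independent_nodes and keep_combo are hypothetical, nodes are plain strings, and the rule "keep the combo kernel only if it is faster than the sum of its parts" is an assumption about how such a benchmark check could work. Only the limit of 8 sub-nodes and the 1d/2d separation come from the summary.

```python
from collections import defaultdict
from typing import Dict, List, Set

MAX_SUBKERNELS = 8  # limit quoted in the summary above


def topo_levels(deps: Dict[str, Set[str]]) -> Dict[str, int]:
    """Map each node to its topological level (0 = no dependencies)."""
    level: Dict[str, int] = {}

    def visit(node: str, stack: tuple = ()) -> int:
        if node in level:
            return level[node]
        if node in stack:
            raise ValueError(f"dependency cycle at {node}")
        # A node sits one level above its deepest dependency.
        level[node] = 1 + max(
            (visit(d, stack + (node,)) for d in deps.get(node, ())), default=-1
        )
        return level[node]

    for node in deps:
        visit(node)
    return level


def group_independent_nodes(
    deps: Dict[str, Set[str]],
    dims: Dict[str, int],
    max_size: int = MAX_SUBKERNELS,
) -> List[List[str]]:
    """Group nodes that share a level (hence are independent) and a
    dimensionality (1d vs 2d), splitting groups larger than max_size."""
    level = topo_levels(deps)
    buckets: Dict[tuple, List[str]] = defaultdict(list)
    for node in sorted(deps, key=lambda n: (level[n], n)):
        buckets[(level[node], dims[node])].append(node)

    groups: List[List[str]] = []
    for key in sorted(buckets):
        nodes = buckets[key]
        for i in range(0, len(nodes), max_size):
            chunk = nodes[i : i + max_size]
            if len(chunk) > 1:  # a single node gains nothing from combining
                groups.append(chunk)
    return groups


def keep_combo(separate_ms: List[float], combined_ms: float) -> bool:
    """Hypothetical benchmark guard: keep the combo kernel only if it
    beats launching the sub-kernels one by one."""
    return combined_ms < sum(separate_ms)


# "d" reads from "a" and "b"; "e" reads from "c"; "c" is a 2d kernel.
deps = {"a": set(), "b": set(), "c": set(), "d": {"a", "b"}, "e": {"c"}}
dims = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1}
groups = group_independent_nodes(deps, dims)
```

Two nodes on the same topological level can never depend on each other, even transitively, which is why grouping by level is a safe (if conservative) way to find horizontally fusible candidates in this sketch.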

Please refer to the part 1 pull request, https://github.com/pytorch/pytorch/pull/124969, for more details.

Test Plan: buck2 test mode/dev-nosan caffe2/test/inductor:combo_kernels

Differential Revision: D60067757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131675
Approved by: https://github.com/mlazos
2024-08-09 03:14:16 +00:00
ao/sparsity Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
autograd [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758) 2024-07-31 10:54:03 +00:00
backends/xeon
benchmark_utils
bottleneck_test Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
cpp [BE] rename testHelperPrefix test (#132916) 2024-08-08 20:54:52 +00:00
cpp_api_parity [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758) 2024-07-31 10:54:03 +00:00
cpp_extensions [13/N] Use std::optional (#132527) 2024-08-08 03:16:28 +00:00
custom_backend
custom_operator Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
distributed [dtensor] multi-dim mesh redistribute follow up (#133023) 2024-08-09 02:26:23 +00:00
distributions Revert "Temp disable MKL in DistributionKernels.cpp (#132532)" 2024-08-06 20:57:09 +00:00
dynamo Make debugging backends accept and ignore options kwargs from torch.compile (#132892) 2024-08-09 00:49:45 +00:00
dynamo_expected_failures [dynamo][user_defined][stable-diffusion] Raise ObservedAttributeError on UserDefinedObject var_getattr (#132806) 2024-08-07 18:19:49 +00:00
dynamo_skips Add fallback() to torch.library (#131707) 2024-07-27 18:02:35 +00:00
edge [Reland] [11/N] Use std::nullopt and std::optional (#132622) 2024-08-05 20:36:33 +00:00
error_messages
expect Conversions between strided and jagged layouts for Nested Tensors (#115749) 2024-08-07 14:18:53 +00:00
export dynamic shapes mismatch errors (#132982) 2024-08-09 02:22:32 +00:00
forward_backward_compatibility
functorch Add a private _safe_softmax (#131060) 2024-08-08 23:09:38 +00:00
fx [export][fx] More robust DCE pass (#132764) 2024-08-06 22:27:22 +00:00
higher_order_ops Make the __module__ name of HOO to be always "torch.ops.higher_order" (#132775) 2024-08-08 16:55:09 +00:00
inductor [2/2] PT2 Inductor ComboKernels - automatic horizontal fusing (#131675) 2024-08-09 03:14:16 +00:00
jit Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
jit_hooks [BE][Easy][13/19] enforce style for empty lines in import segments in test/j*/ (#129764) 2024-08-01 12:13:42 +00:00
lazy Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
mobile Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
nn Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884) 2024-08-07 15:53:07 +00:00
onnx move torch._functionalize APIs to pybind. add one for marking storage mutations (#132337) 2024-08-05 21:28:59 +00:00
optim Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
package Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
profiler [Profiler] Test Logging for Empty Traces (#132444) 2024-08-02 22:04:15 +00:00
quantization [export] Remove Proxy from exported programs and modules (#132956) 2024-08-09 00:00:20 +00:00
scripts [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758) 2024-07-31 10:54:03 +00:00
test_img
torch_np [test/torch_np] Fix usages of deprecated NumPy 2.0 APIs in numpy_tests (#131909) 2024-08-05 16:21:08 +00:00
typing [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758) 2024-07-31 10:54:03 +00:00
xpu Populate submodules of torch._C to sys.modules recursively (#132216) 2024-08-08 10:20:25 +00:00
_test_bazel.py
allowlist_for_publicAPI.json [BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770) 2024-08-08 23:07:23 +00:00
conftest.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
create_dummy_torchscript_model.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
delete.py
hi.py
HowToWriteTestsUsingFileCheck.md
linear.py
load_torchscript_model.py
minioptest_failures_dict.json
mkl_verbose.py
mkldnn_verbose.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
pytest_shard_custom.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
run_doctests.sh
run_test.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
simulate_nccl_errors.py
slow_tests.json Move slow tests to be in repo (#132379) 2024-08-07 18:42:56 +00:00
test_ao_sparsity.py
test_autocast.py Revert "[MPS] Add support for autocast in MPS (#99272)" 2024-08-05 19:59:04 +00:00
test_autograd.py torch.autograd.graph.increment_version: accept List[Tensor], use in AOTDispatcher (#132652) 2024-08-06 17:46:48 +00:00
test_autograd_fallback.py
test_autoload.py
test_binary_ufuncs.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_bundled_images.py
test_bundled_inputs.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_ci_sanity_check_fail.py
test_comparison_utils.py
test_compile_benchmark_util.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_complex.py
test_content_store.py
test_cpp_api_parity.py
test_cpp_extensions_aot.py
test_cpp_extensions_jit.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_cpp_extensions_mtia_backend.py
test_cpp_extensions_open_device_registration.py [Intel GPU] Dispatch Stub support (#130019) 2024-07-29 02:18:52 +00:00
test_cpp_extensions_stream_and_event.py Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376) 2024-07-23 01:44:15 +00:00
test_cuda.py Loads .pyd instead of .so in MemPool test for windows (#132749) 2024-08-08 14:29:56 +00:00
test_cuda_expandable_segments.py
test_cuda_multigpu.py Allow torch.cuda.memory.mem_get_info to take a device str argument with an unspecified device index. (#132616) 2024-08-06 13:19:46 +00:00
test_cuda_nvml_based_avail.py
test_cuda_primary_ctx.py
test_cuda_sanitizer.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_cuda_trace.py
test_custom_ops.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_dataloader.py Support IPC for Expandable Segments (#130890) 2024-08-05 18:48:13 +00:00
test_datapipe.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_decomp.py Add a private _safe_softmax (#131060) 2024-08-08 23:09:38 +00:00
test_deploy.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_determination.py
test_dispatch.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_dlpack.py
test_dynamic_shapes.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_expanded_weights.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_fake_tensor.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_flop_counter.py [NJT][flop counter] attention: if offsets are fake, use max seqlen (#132356) 2024-08-02 20:42:29 +00:00
test_foreach.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_function_schema.py
test_functional_autograd_benchmark.py
test_functional_optim.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_functionalization.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_functionalization_of_rng_ops.py
test_futures.py
test_fx.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_fx_experimental.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_fx_passes.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_fx_reinplace_pass.py Fix py codegen to delete values that don't have any users (#131028) 2024-08-01 03:18:37 +00:00
test_hub.py
test_import_stats.py
test_indexing.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_itt.py
test_jit.py Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884) 2024-08-07 15:53:07 +00:00
test_jit_autocast.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_jit_disabled.py
test_jit_fuser.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_jit_fuser_legacy.py
test_jit_fuser_te.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_jit_legacy.py
test_jit_llga_fuser.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_jit_profiling.py
test_jit_simple.py
test_jit_string.py
test_jiterator.py
test_kernel_launch_checks.py
test_legacy_vmap.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_license.py
test_linalg.py TunableOp more unit test follow-up (#130065) 2024-08-08 22:42:16 +00:00
test_logging.py
test_masked.py
test_maskedtensor.py
test_matmul_cuda.py
test_meta.py [TEST] Fix _scaled_mm tests (#130897) 2024-07-18 02:15:00 +00:00
test_metal.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_mkl_verbose.py
test_mkldnn.py
test_mkldnn_fusion.py improve mkldnn_linear_pointwise_binary performance for contiguous tensor with non default contiguous strides (#132019) 2024-07-30 05:02:38 +00:00
test_mkldnn_verbose.py
test_mobile_optimizer.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_model_dump.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_model_exports_to_core_aten.py
test_module_tracker.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_modules.py
test_monitor.py [pytorch][counters] Pybind for WaitCounter (#132357) 2024-08-02 16:08:10 +00:00
test_mps.py Revert "[MPS] Add support for autocast in MPS (#99272)" 2024-08-05 19:59:04 +00:00
test_multiprocessing.py Support IPC for Expandable Segments (#130890) 2024-08-05 18:48:13 +00:00
test_multiprocessing_spawn.py
test_namedtensor.py
test_namedtuple_return_api.py
test_native_functions.py
test_native_mha.py
test_nestedtensor.py Support 'non-contiguous with holes' NJTs for contiguous clone() (#132776) 2024-08-08 17:08:11 +00:00
test_nn.py added persistent option to buffers and namedbuffers (#132994) 2024-08-08 21:39:01 +00:00
test_nnapi.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_numba_integration.py
test_numpy_interop.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_openmp.py
test_ops.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_ops_fwd_gradients.py
test_ops_gradients.py
test_ops_jit.py
test_optim.py Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)" 2024-08-07 00:05:20 +00:00
test_out_dtype_op.py
test_overrides.py Change deprecate warning on dispatch_on_subclass to warn once (#132374) 2024-08-05 20:02:33 +00:00
test_package.py
test_per_overload_api.py
test_prims.py properly register conjugate/neg fallthroughs to prim ops (#132699) 2024-08-06 17:57:04 +00:00
test_proxy_tensor.py Only thunkify proxies in some situations (#132421) 2024-08-08 12:03:06 +00:00
test_pruning_op.py
test_public_bindings.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_python_dispatch.py [Doc] fix some typos (found by codespell and typos) (#132544) 2024-08-05 17:21:56 +00:00
test_pytree.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_quantization.py
test_reductions.py
test_scatter_gather_ops.py
test_schema_check.py
test_segment_reductions.py
test_serialization.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_set_default_mobile_cpu_allocator.py
test_shape_ops.py
test_show_pickle.py
test_sort_and_select.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_sparse.py [traced-graph][sparse] propagate sparsity in fx graph (#131920) 2024-08-05 15:49:53 +00:00
test_sparse_csr.py Add tests to bsr_dense_addmm_meta. Tune bsr_dense_addmm kernel for ViT shapes. (#132646) 2024-08-05 20:22:33 +00:00
test_sparse_semi_structured.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_spectral_ops.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_stateless.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_static_runtime.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_subclass.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_sympy_utils.py
test_tensor_creation_ops.py
test_tensorboard.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_tensorexpr.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_tensorexpr_pybind.py
test_testing.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_throughput_benchmark.py
test_torch.py [CUDA] is_bf16_supported() should not crash if there are no GPUs (#132313) 2024-08-02 02:50:43 +00:00
test_transformers.py Grouped Query Attention (#132689) 2024-08-07 05:35:36 +00:00
test_type_hints.py
test_type_info.py
test_type_promotion.py
test_typing.py
test_unary_ufuncs.py
test_utils.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_view_ops.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_vulkan.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_weak.py [BE] Format uncategorized Python files with ruff format (#132576) 2024-08-04 17:13:31 +00:00
test_xnnpack_integration.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_xpu.py