mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-16 01:33:39 +00:00
Split `IsTunbaleOpEnable` semantics into **enable tunable op for using** and **enable tunable op for tuning**. Both remain disabled by default for safety. However, if

- a session is created from an ONNX model with embedded tuning results, and
- the embedded tuning results are set to the EP without an error `Status`,

then using is enabled automatically, while tuning remains disabled.

The planned options are:

- `tunable_op_enable`: The top-level switch of `TunableOp`. It indicates whether we run into `TunableOp`-related logic. **NOTE:** most of our impls have a bottom impl that acts as a fallback and is set as the default. In this case, we still call into the `TunableOp`, but no kernel selection, kernel tuning, or caching is involved. This spares us the maintenance burden of a duplicate code path.
- `tunable_op_tuning_enable`: The secondary switch of `TunableOp`. It indicates whether we run into the tuning-related logic of `TunableOp`.

Possible future options:

- `tunable_op_tuning_max_iteration`: blahblah
- `tunable_op_tuning_max_duration_ms`: blahblah
- `tunable_op_flash_attention_enable`: blahblah, for example only, we will not have this.

The developer-oriented envvars exist for developers' convenience when inspecting the performance impact of tuning. So there are only `ORT_ROCM_TUNABLE_OP_ENABLE` and `ORT_ROCM_TUNABLE_OP_TUNING_ENABLE`, which give fine-grained control over the combinations.

| Name | | |
|---|---|---|
| rocm | ||
| _kernel_explorer.pyi | ||
| batched_gemm_test.py | ||
| fast_gelu_test.py | ||
| gemm_fast_gelu_test.py | ||
| gemm_softmax_gemm_permute_test.py | ||
| gemm_test.py | ||
| kernel_explorer.py | ||
| skip_layer_norm_test.py | ||
| softmax_test.py | ||
| strided_batched_gemm_test.py | ||
| utils.py | ||
| vector_add.cu | ||
| vector_add_kernel.cuh | ||
| vector_add_test.py | ||
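
The two-switch semantics described in the commit message can be sketched in a few lines of Python. This is only an illustrative model, not ONNX Runtime's implementation: the class `TunableOpSwitches`, its methods, the helper `switches_from_env`, and the mode names are hypothetical, and treating the value `"1"` as "on" for the envvars is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class TunableOpSwitches:
    """Hypothetical model of the two switches (not an ONNX Runtime class)."""

    enable: bool = False         # mirrors `tunable_op_enable` (top-level switch)
    tuning_enable: bool = False  # mirrors `tunable_op_tuning_enable` (secondary switch)

    def on_tuning_results_loaded(self, status_ok: bool) -> None:
        """Model of setting embedded tuning results on the EP.

        Using is enabled only if the results were set without error;
        tuning is deliberately left unchanged.
        """
        if status_ok:
            self.enable = True

    def mode(self) -> str:
        """Return which path a TunableOp call would take in this model."""
        if not self.enable:
            # Fallback impl only: no kernel selection, tuning, or caching.
            return "default"
        if self.tuning_enable:
            return "tune"
        return "use"


def switches_from_env(env: Mapping[str, str]) -> TunableOpSwitches:
    """Build switches from the developer envvars (assumes "1" means on)."""
    return TunableOpSwitches(
        enable=env.get("ORT_ROCM_TUNABLE_OP_ENABLE") == "1",
        tuning_enable=env.get("ORT_ROCM_TUNABLE_OP_TUNING_ENABLE") == "1",
    )
```

In this model, a fresh `TunableOpSwitches()` reports `"default"`, loading tuning results successfully moves it to `"use"`, and the tuning switch alone has no effect while the top-level switch is off.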