onnxruntime/include/onnxruntime/core
cloudhan 71a4e7eb97
Automatically enable tunable op usage for production models (#15156)
Split the `IsTunableOpEnable` semantics into **enable tunable op for using**
and **enable tunable op for tuning**.

Both remain disabled by default for safety. However, if
- the session is created from an ONNX model with embedded tuning results, and
- the embedded tuning results are set to the EP without an error `Status`,

then usage is enabled automatically, while tuning remains disabled.
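
The rule above can be sketched as a small decision function. This is an illustrative model only, not the ONNX Runtime implementation: `TunableOpFlags`, `flags_after_session_load`, and both parameter names are hypothetical, and the sketch assumes the tuning switch is simply left at whatever value it already had.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunableOpFlags:
    """Hypothetical stand-in for the two switches; both default to off."""
    enable: bool = False         # "enable tunable op for using"
    tuning_enable: bool = False  # "enable tunable op for tuning"


def flags_after_session_load(flags: TunableOpFlags,
                             model_has_tuning_results: bool,
                             results_set_without_error: bool) -> TunableOpFlags:
    """Turn on usage only when embedded results were applied successfully.

    The tuning switch is never changed here (assumption: it keeps its
    previous value, which is off by default).
    """
    if model_has_tuning_results and results_set_without_error:
        return TunableOpFlags(enable=True, tuning_enable=flags.tuning_enable)
    return flags
```

With the defaults, a model that carries tuning results which load cleanly ends up with usage on and tuning off; any other case leaves both switches untouched.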

The planned options will be
- `tunable_op_enable`: The top-level switch of `TunableOp`. It indicates whether we run into `TunableOp`-related logic. **NOTE:** most of our implementations have a bottom implementation that acts as a fallback and is set as the default. In this case, we still call into the `TunableOp`, but no kernel selection, kernel tuning, or caching is involved. This spares us the maintenance burden of a duplicate code path.
- `tunable_op_tuning_enable`: The secondary switch of `TunableOp`. It indicates whether we run into the tuning-related logic of `TunableOp`.
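
How the two switches might interact at dispatch time can be sketched as follows. This is a simplified model under stated assumptions, not the actual `TunableOp` code: `select_kernel`, its parameters, the dictionary cache, and the convention that index 0 is the default fallback are all hypothetical.

```python
from typing import Callable, Dict, List

DEFAULT_IMPL = 0  # assumption: the fallback implementation sits at index 0


def select_kernel(key: str,
                  candidates: List[str],
                  cache: Dict[str, int],
                  enable: bool,
                  tuning_enable: bool,
                  cost: Callable[[str], float]) -> int:
    """Return the index of the implementation to run for `key`."""
    if not enable:
        # Top-level switch off: no selection, no tuning, no caching.
        return DEFAULT_IMPL
    if key in cache:
        # Usage on: reuse a previously stored (or preloaded) result.
        return cache[key]
    if not tuning_enable:
        # Usage on, tuning off: nothing stored for this key, so fall back.
        return DEFAULT_IMPL
    # Both switches on: measure every candidate and remember the cheapest.
    best = min(range(len(candidates)), key=lambda i: cost(candidates[i]))
    cache[key] = best
    return best
```

In this model, preloading `cache` from embedded tuning results and setting only `enable` corresponds to the production case described earlier: stored selections are used, and nothing new is tuned.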

Then for the possible future options:
- `tunable_op_tuning_max_iteration`: blahblah
- `tunable_op_tuning_max_duration_ms`: blahblah
- `tunable_op_flash_attention_enable`: blahblah, for example only, we will not have this.

The developer-oriented environment variables exist for developers' convenience when inspecting the performance impact of tuning. There are only two, `ORT_ROCM_TUNABLE_OP_ENABLE` and `ORT_ROCM_TUNABLE_OP_TUNING_ENABLE`, which give fine-grained control over the combinations.
2023-04-06 13:52:47 +08:00
..
common Graph transformer to ensure unique DQ nodes for QDQ node units (#15145) 2023-03-31 08:39:43 +10:00
eager
framework FasterTransformer model wrapper using custom op (#15013) 2023-03-20 09:05:30 -07:00
graph Graph transformer to ensure unique DQ nodes for QDQ node units (#15145) 2023-03-31 08:39:43 +10:00
optimizer Graph transformer to ensure unique DQ nodes for QDQ node units (#15145) 2023-03-31 08:39:43 +10:00
platform
providers Automatically enable tunable op usage for production models (#15156) 2023-04-06 13:52:47 +08:00
session Automatically enable tunable op usage for production models (#15156) 2023-04-06 13:52:47 +08:00