onnxruntime/cmake
Tianlei Wu 6ffaaebb60
[CUDA] Attention kernel provider option (#21344)
### Description
* Add a cuda provider option `sdpa_kernel` to choose which attention kernel to run for testing purpose. 
* Allow dump which attention kernel is used per node.
* Reserve  a flag for cudnn flash attention which will be added soon.

#### CUDA provider option sdpa_kernel
Instead of setting environment variable, we also support setting it in
provider option. Note that the setting is global per session. That could
help performance testing of each kernel.

#### Attention Kernel Debug Info
Set an environment variable `ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1`,
and ORT will print sdpa kernel used in each node:

For example 
```
ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1 ./onnxruntime_test_all --gtest_filter=MultiHeadAttentionTest*
```
It will show debug information of kernel used in testing:
```
[ RUN      ] MultiHeadAttentionTest.SelfAttention_Batch2_HeadSize32_NoBias_NoMask_PackedQKV
AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=0 TRT_FUSED_ATTENTION=1 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=1 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1
Operator=MultiHeadAttention Node=node1 DataType=fp16 TRT_FUSED_ATTENTION=1
AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=1 TRT_FUSED_ATTENTION=0 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=0 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1
Operator=MultiHeadAttention Node=node1 DataType=fp16 EFFICIENT_ATTENTION=1
```
In this test case, the debug info shows that one session uses trt fused
attention and another session use efficient attention.
2024-07-19 13:58:54 -07:00
..
external Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133) 2024-07-17 12:37:06 -07:00
patches Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133) 2024-07-17 12:37:06 -07:00
tensorboard
adjust_global_compile_flags.cmake tools: build: fix typo (#21052) 2024-06-19 16:14:58 -07:00
arm64x.cmake Dev/mookerem/arm64x update (#20536) 2024-05-07 12:50:38 -07:00
CMakeLists.txt Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133) 2024-07-17 12:37:06 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt Update absl (#21300) 2024-07-10 11:14:15 -07:00
deps_update_and_upload.py Update google benchmark to 1.8.3. (#19734) 2024-03-01 11:01:58 -08:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
linux_arm64_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
maccatalyst_prepare_objects_for_prelink.py Support xcframework for mac catalyst builds. (#19534) 2024-03-20 10:55:19 -07:00
nuget_helpers.cmake
onnxruntime.cmake Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133) 2024-07-17 12:37:06 -07:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake Enable QNN HTP support for Node (#20576) 2024-05-09 13:11:07 -07:00
onnxruntime_compile_triton_kernel.cmake [CUDA] Add SparseAttention operator for Phi-3-small (#20216) 2024-04-30 09:06:29 -07:00
onnxruntime_config.h.in
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133) 2024-07-17 12:37:06 -07:00
onnxruntime_framework.natvis
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake [Apple framework] Fix minimal build with training enabled. (#19858) 2024-03-12 11:33:30 -07:00
onnxruntime_ios.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_java.cmake Remove deprecated "mobile" packages (#20941) 2024-06-07 16:20:32 -05:00
onnxruntime_java_unittests.cmake
onnxruntime_kernel_explorer.cmake [ROCm] Update ck to use ck_tile (#21030) 2024-06-19 14:06:10 +08:00
onnxruntime_mlas.cmake Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133) 2024-07-17 12:37:06 -07:00
onnxruntime_nodejs.cmake Enable QNN HTP support for Node (#20576) 2024-05-09 13:11:07 -07:00
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake Flash attention recompute (#20603) 2024-05-21 13:38:19 +08:00
onnxruntime_providers.cmake [VSINPU]Code improvement && Slice/Dropout OP support (#21217) 2024-07-09 20:14:46 -07:00
onnxruntime_providers_acl.cmake
onnxruntime_providers_armnn.cmake
onnxruntime_providers_azure.cmake
onnxruntime_providers_cann.cmake
onnxruntime_providers_coreml.cmake Fix Objective-C static analysis warnings. (#20417) 2024-04-24 11:48:29 -07:00
onnxruntime_providers_cpu.cmake Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133) 2024-07-17 12:37:06 -07:00
onnxruntime_providers_cuda.cmake [Build] Propagate build option for CUDA minimal to TRT (#20695) 2024-07-09 14:40:04 -07:00
onnxruntime_providers_dml.cmake Delay load dxcore.dll in addition to ext-ms-win-dxcore-l1-1-0.dll (#18913) 2023-12-26 12:33:42 -08:00
onnxruntime_providers_dnnl.cmake
onnxruntime_providers_js.cmake
onnxruntime_providers_migraphx.cmake Migraphx ep windows build (#21284) 2024-07-11 21:21:38 -07:00
onnxruntime_providers_nnapi.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_openvino.cmake Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339) 2024-07-15 14:21:34 -07:00
onnxruntime_providers_qnn.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_rknpu.cmake
onnxruntime_providers_rocm.cmake [ROCm] fix: obtain AMD GPU memory info through rocm_smi library (#21190) 2024-07-09 20:35:26 -07:00
onnxruntime_providers_tensorrt.cmake [Build] Propagate build option for CUDA minimal to TRT (#20695) 2024-07-09 14:40:04 -07:00
onnxruntime_providers_tvm.cmake
onnxruntime_providers_vitisai.cmake [VitisAI] Solve the problem that gsl cannot be found when compiling under linux (#20466) 2024-04-28 20:56:16 -07:00
onnxruntime_providers_vsinpu.cmake [VSINPU]Code improvement && Slice/Dropout OP support (#21217) 2024-07-09 20:14:46 -07:00
onnxruntime_providers_webnn.cmake
onnxruntime_providers_xnnpack.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_python.cmake onnxruntime shared lib inside python package (#21223) 2024-07-02 15:37:50 -07:00
onnxruntime_rocm_hipify.cmake [CUDA] Attention kernel provider option (#21344) 2024-07-19 13:58:54 -07:00
onnxruntime_session.cmake
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake Delete pyop (#21094) 2024-06-19 16:21:33 -07:00
onnxruntime_unittests.cmake [CUDA] Attention kernel provider option (#21344) 2024-07-19 13:58:54 -07:00
onnxruntime_util.cmake
onnxruntime_visionos.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_webassembly.cmake [js/web] optimize module export and deployment (#20165) 2024-05-20 09:51:16 -07:00
precompiled_header.cmake
riscv64.toolchain.cmake Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238) 2024-01-24 16:27:05 -08:00
Sdl.ruleset
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h
wcos_rules_override.cmake Stop using apiset in OneCore build: use onecoreuap.lib instead of onecoreuap_apiset.lib (#19632) 2024-02-23 22:31:57 -08:00
winml.cmake Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339) 2024-07-15 14:21:34 -07:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake