onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-16 21:00:14 +00:00

Tianlei Wu 6ffaaebb60 [CUDA] Attention kernel provider option (#21344)

### Description
* Add a CUDA provider option `sdpa_kernel` to choose which attention kernel to run, for testing purposes.
* Allow dumping which attention kernel is used per node.
* Reserve a flag for cuDNN flash attention, which will be added soon.

#### CUDA provider option sdpa_kernel
Instead of setting an environment variable, the kernel can also be selected through a provider option. Note that the setting is global per session. This can help with performance testing of each kernel.

#### Attention Kernel Debug Info
Set the environment variable `ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1`, and ORT will print the SDPA kernel used in each node. For example:
```
ORT_ENABLE_ATTENTION_KERNEL_DEBUG_INFO=1 ./onnxruntime_test_all --gtest_filter=MultiHeadAttentionTest*
```
This shows debug information about the kernel used in testing:
```
[ RUN ] MultiHeadAttentionTest.SelfAttention_Batch2_HeadSize32_NoBias_NoMask_PackedQKV
AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=0 TRT_FUSED_ATTENTION=1 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=1 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1
Operator=MultiHeadAttention Node=node1 DataType=fp16 TRT_FUSED_ATTENTION=1
AttentionKernelOptions: FLASH_ATTENTION=0 EFFICIENT_ATTENTION=1 TRT_FUSED_ATTENTION=0 CUDNN_FLASH_ATTENTION=0 TRT_FLASH_ATTENTION=0 TRT_CROSS_ATTENTION=0 TRT_CAUSAL_ATTENTION=0 MATH=1
Operator=MultiHeadAttention Node=node1 DataType=fp16 EFFICIENT_ATTENTION=1
```
In this test case, the debug info shows that one session uses TRT fused attention and another session uses efficient attention.

2024-07-19 13:58:54 -07:00
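The debug lines in the commit message above are plain `KEY=VALUE` tokens, so they are easy to post-process when comparing kernels across runs. The following is a minimal sketch, assuming the log format matches the sample output shown in the commit message. The helper names `parse_pairs` and `kernel_used` are hypothetical and are not part of ONNX Runtime.

```python
import re

# Matches tokens such as "FLASH_ATTENTION=0" or "DataType=fp16".
_PAIR = re.compile(r"(\w+)=(\S+)")

# Fields on an "Operator=..." line that describe the node, not the kernel.
# Assumption: these three names are taken from the sample output only.
_NODE_FIELDS = {"Operator", "Node", "DataType"}


def parse_pairs(line):
    """Return all KEY=VALUE tokens on a debug line as a dict of strings."""
    return dict(_PAIR.findall(line))


def kernel_used(node_line):
    """Return the kernel name flagged with 1 on an 'Operator=...' line, or None."""
    for key, value in parse_pairs(node_line).items():
        if key not in _NODE_FIELDS and value == "1":
            return key
    return None


if __name__ == "__main__":
    sample = "Operator=MultiHeadAttention Node=node1 DataType=fp16 EFFICIENT_ATTENTION=1"
    print(kernel_used(sample))
```

Applied to each `Operator=...` line of a captured log, `kernel_used` gives a per-node record of which attention kernel ran, while `parse_pairs` on the `AttentionKernelOptions:` line shows which kernels were enabled for that session.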
external	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)	2024-07-17 12:37:06 -07:00
patches	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)	2024-07-17 12:37:06 -07:00
tensorboard
adjust_global_compile_flags.cmake	tools: build: fix typo (#21052)	2024-06-19 16:14:58 -07:00
arm64x.cmake	Dev/mookerem/arm64x update (#20536)	2024-05-07 12:50:38 -07:00
CMakeLists.txt	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)	2024-07-17 12:37:06 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt	Update absl (#21300)	2024-07-10 11:14:15 -07:00
deps_update_and_upload.py	Update google benchmark to 1.8.3. (#19734)	2024-03-01 11:01:58 -08:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake	Add a build validation for Linux ARM64 cross-compile (#18200)	2023-11-08 13:03:18 -08:00
linux_arm64_crosscompile_toolchain.cmake	Add a build validation for Linux ARM64 cross-compile (#18200)	2023-11-08 13:03:18 -08:00
maccatalyst_prepare_objects_for_prelink.py	Support xcframework for mac catalyst builds. (#19534)	2024-03-20 10:55:19 -07:00
nuget_helpers.cmake
onnxruntime.cmake	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)	2024-07-17 12:37:06 -07:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake	Enable QNN HTP support for Node (#20576)	2024-05-09 13:11:07 -07:00
onnxruntime_compile_triton_kernel.cmake	[CUDA] Add SparseAttention operator for Phi-3-small (#20216)	2024-04-30 09:06:29 -07:00
onnxruntime_config.h.in
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)	2024-07-17 12:37:06 -07:00
onnxruntime_framework.natvis
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake	[Apple framework] Fix minimal build with training enabled. (#19858)	2024-03-12 11:33:30 -07:00
onnxruntime_ios.toolchain.cmake	Support visionos build (#20365)	2024-04-23 18:15:07 -07:00
onnxruntime_java.cmake	Remove deprecated "mobile" packages (#20941)	2024-06-07 16:20:32 -05:00
onnxruntime_java_unittests.cmake
onnxruntime_kernel_explorer.cmake	[ROCm] Update ck to use ck_tile (#21030)	2024-06-19 14:06:10 +08:00
onnxruntime_mlas.cmake	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)	2024-07-17 12:37:06 -07:00
onnxruntime_nodejs.cmake	Enable QNN HTP support for Node (#20576)	2024-05-09 13:11:07 -07:00
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake	Flash attention recompute (#20603)	2024-05-21 13:38:19 +08:00
onnxruntime_providers.cmake	[VSINPU]Code improvement && Slice/Dropout OP support (#21217)	2024-07-09 20:14:46 -07:00
onnxruntime_providers_acl.cmake
onnxruntime_providers_armnn.cmake
onnxruntime_providers_azure.cmake
onnxruntime_providers_cann.cmake
onnxruntime_providers_coreml.cmake	Fix Objective-C static analysis warnings. (#20417)	2024-04-24 11:48:29 -07:00
onnxruntime_providers_cpu.cmake	Enablement of onnxruntime for AIX and fixing issues related to big-endian platform. (#21133)	2024-07-17 12:37:06 -07:00
onnxruntime_providers_cuda.cmake	[Build] Propagate build option for CUDA minimal to TRT (#20695)	2024-07-09 14:40:04 -07:00
onnxruntime_providers_dml.cmake	Delay load dxcore.dll in addition to ext-ms-win-dxcore-l1-1-0.dll (#18913)	2023-12-26 12:33:42 -08:00
onnxruntime_providers_dnnl.cmake
onnxruntime_providers_js.cmake
onnxruntime_providers_migraphx.cmake	Migraphx ep windows build (#21284)	2024-07-11 21:21:38 -07:00
onnxruntime_providers_nnapi.cmake	Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723)	2024-03-12 10:55:49 +10:00
onnxruntime_providers_openvino.cmake	Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339)	2024-07-15 14:21:34 -07:00
onnxruntime_providers_qnn.cmake	Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723)	2024-03-12 10:55:49 +10:00
onnxruntime_providers_rknpu.cmake
onnxruntime_providers_rocm.cmake	[ROCm] fix: obtain AMD GPU memory info through rocm_smi library (#21190)	2024-07-09 20:35:26 -07:00
onnxruntime_providers_tensorrt.cmake	[Build] Propagate build option for CUDA minimal to TRT (#20695)	2024-07-09 14:40:04 -07:00
onnxruntime_providers_tvm.cmake
onnxruntime_providers_vitisai.cmake	[VitisAI] Solve the problem that gsl cannot be found when compiling under linux (#20466)	2024-04-28 20:56:16 -07:00
onnxruntime_providers_vsinpu.cmake	[VSINPU]Code improvement && Slice/Dropout OP support (#21217)	2024-07-09 20:14:46 -07:00
onnxruntime_providers_webnn.cmake
onnxruntime_providers_xnnpack.cmake	Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723)	2024-03-12 10:55:49 +10:00
onnxruntime_python.cmake	onnxruntime shared lib inside python package (#21223)	2024-07-02 15:37:50 -07:00
onnxruntime_rocm_hipify.cmake	[CUDA] Attention kernel provider option (#21344)	2024-07-19 13:58:54 -07:00
onnxruntime_session.cmake
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake	Delete pyop (#21094)	2024-06-19 16:21:33 -07:00
onnxruntime_unittests.cmake	[CUDA] Attention kernel provider option (#21344)	2024-07-19 13:58:54 -07:00
onnxruntime_util.cmake
onnxruntime_visionos.toolchain.cmake	Support visionos build (#20365)	2024-04-23 18:15:07 -07:00
onnxruntime_webassembly.cmake	[js/web] optimize module export and deployment (#20165)	2024-05-20 09:51:16 -07:00
precompiled_header.cmake
riscv64.toolchain.cmake	Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238)	2024-01-24 16:27:05 -08:00
Sdl.ruleset
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h
wcos_rules_override.cmake	Stop using apiset in OneCore build: use onecoreuap.lib instead of onecoreuap_apiset.lib (#19632)	2024-02-23 22:31:57 -08:00
winml.cmake	Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339)	2024-07-15 14:21:34 -07:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake