onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-05 04:17:53 +00:00

Tianlei Wu de93f40240 [CUDA] Lean Attention (#22352)

### Description

Add [Lean Attention](https://arxiv.org/abs/2405.10480) and its integration with the MultiHeadAttention operator for LLMs on GPU.

LeanAttention speeds up self-attention for the token-generation phase (decode phase) of decoder-only transformer models, especially at long context lengths.

- [x] Initial implementation of Lean Attention (by Srikant Bharadwaj)
- [x] Integration with MultiHeadAttention operator
- [x] Add parity tests
- [x] Add benchmark

#### Implementation Details

(1) Lean Attention is enabled in the build for Linux and disabled for Windows.
(2) Lean Attention is disabled by default. Enable it through the CUDA provider option sdpa_kernel, or with the environment variable `ORT_ENABLE_LEAN_ATTENTION=1`.
(3) It only works for token generation (sequence_length==1, past_sequence_length > 0).
(4) Like flash attention, it only works on Ampere or newer GPUs.

We can revisit items (1) and (2) after comparing with the DecoderMaskedMultiHeadAttention and XQA kernels.

#### Benchmark

```
cd onnxruntime/test/python/transformers
/bin/bash benchmark_mha.sh lean
```

Example outputs on H100:

Note that past and present do not share a buffer for MHA for now, so the reported tflops are low. The relative ratio will change after buffer sharing is enabled, but we expect the order (kernel A is faster than B) to remain the same.

Note that the common settings `sequence_length=1; causal=True;attn_bias=None;cuda_graph=False` are not shown in the table below.

batch_size | past_sequence_length | num_heads | head_size | average_latency | tflops | kernel
-- | -- | -- | -- | -- | -- | --
1 | 512 | 16 | 64 | 0.000059 | 0.0178 | ort:flash
1 | 512 | 16 | 64 | 0.000068 | 0.0155 | ort:efficient
1 | 512 | 16 | 64 | 0.000065 | 0.0161 | ort:math
1 | 512 | 16 | 64 | 0.000060 | 0.0176 | ort:lean
1 | 512 | 32 | 128 | 0.000062 | 0.0674 | ort:flash
1 | 512 | 32 | 128 | 0.000064 | 0.0661 | ort:efficient
1 | 512 | 32 | 128 | 0.000067 | 0.0625 | ort:math
1 | 512 | 32 | 128 | 0.000062 | 0.0678 | ort:lean
1 | 1024 | 16 | 64 | 0.000061 | 0.0345 | ort:flash
1 | 1024 | 16 | 64 | 0.000086 | 0.0244 | ort:efficient
1 | 1024 | 16 | 64 | 0.000065 | 0.0322 | ort:math
1 | 1024 | 16 | 64 | 0.000063 | 0.0332 | ort:lean
1 | 1024 | 32 | 128 | 0.000075 | 0.1125 | ort:flash
1 | 1024 | 32 | 128 | 0.000088 | 0.0951 | ort:efficient
1 | 1024 | 32 | 128 | 0.000079 | 0.1068 | ort:math
1 | 1024 | 32 | 128 | 0.000072 | 0.1171 | ort:lean
1 | 2048 | 16 | 64 | 0.000069 | 0.0606 | ort:flash
1 | 2048 | 16 | 64 | 0.000125 | 0.0336 | ort:efficient
1 | 2048 | 16 | 64 | 0.000064 | 0.0655 | ort:lean
1 | 2048 | 32 | 128 | 0.000098 | 0.1720 | ort:flash
1 | 2048 | 32 | 128 | 0.000132 | 0.1270 | ort:efficient
1 | 2048 | 32 | 128 | 0.000092 | 0.1828 | ort:lean
1 | 4096 | 16 | 64 | 0.000076 | 0.1097 | ort:flash
1 | 4096 | 16 | 64 | 0.000207 | 0.0406 | ort:efficient
1 | 4096 | 16 | 64 | 0.000069 | 0.1209 | ort:lean
1 | 4096 | 32 | 128 | 0.000140 | 0.2394 | ort:flash
1 | 4096 | 32 | 128 | 0.000213 | 0.1575 | ort:efficient
1 | 4096 | 32 | 128 | 0.000139 | 0.2419 | ort:lean
1 | 8192 | 16 | 64 | 0.000104 | 0.1609 | ort:flash
1 | 8192 | 16 | 64 | 0.000392 | 0.0428 | ort:efficient
1 | 8192 | 16 | 64 | 0.000093 | 0.1809 | ort:lean
1 | 8192 | 32 | 128 | 0.000212 | 0.3160 | ort:flash
1 | 8192 | 32 | 128 | 0.000360 | 0.1866 | ort:efficient
1 | 8192 | 32 | 128 | 0.000212 | 0.3162 | ort:lean
1 | 16384 | 16 | 64 | 0.000139 | 0.2410 | ort:flash
1 | 16384 | 16 | 64 | 0.000731 | 0.0459 | ort:efficient
1 | 16384 | 16 | 64 | 0.000136 | 0.2465 | ort:lean
1 | 16384 | 32 | 128 | 0.000361 | 0.3722 | ort:flash
1 | 16384 | 32 | 128 | 0.000667 | 0.2014 | ort:efficient
1 | 16384 | 32 | 128 | 0.000357 | 0.3765 | ort:lean
1 | 32768 | 16 | 64 | 0.000210 | 0.3194 | ort:flash
1 | 32768 | 16 | 64 | 0.001428 | 0.0470 | ort:efficient
1 | 32768 | 16 | 64 | 0.000209 | 0.3211 | ort:lean
1 | 32768 | 32 | 128 | 0.000659 | 0.4074 | ort:flash
1 | 32768 | 32 | 128 | 0.001270 | 0.2114 | ort:efficient
1 | 32768 | 32 | 128 | 0.000651 | 0.4123 | ort:lean
1 | 65536 | 16 | 64 | 0.000355 | 0.3785 | ort:flash
1 | 65536 | 16 | 64 | 0.002736 | 0.0491 | ort:efficient
1 | 65536 | 16 | 64 | 0.000349 | 0.3845 | ort:lean
1 | 65536 | 32 | 128 | 0.001251 | 0.4290 | ort:flash
1 | 65536 | 32 | 128 | 0.002480 | 0.2165 | ort:efficient
1 | 65536 | 32 | 128 | 0.001239 | 0.4333 | ort:lean
4 | 512 | 16 | 64 | 0.000063 | 0.0665 | ort:flash
4 | 512 | 16 | 64 | 0.000069 | 0.0607 | ort:efficient
4 | 512 | 16 | 64 | 0.000066 | 0.0634 | ort:math
4 | 512 | 16 | 64 | 0.000062 | 0.0674 | ort:lean
4 | 512 | 32 | 128 | 0.000100 | 0.1677 | ort:flash
4 | 512 | 32 | 128 | 0.000099 | 0.1703 | ort:efficient
4 | 512 | 32 | 128 | 0.000108 | 0.1557 | ort:math
4 | 512 | 32 | 128 | 0.000092 | 0.1818 | ort:lean
4 | 1024 | 16 | 64 | 0.000077 | 0.1094 | ort:flash
4 | 1024 | 16 | 64 | 0.000099 | 0.0850 | ort:efficient
4 | 1024 | 16 | 64 | 0.000081 | 0.1038 | ort:math
4 | 1024 | 16 | 64 | 0.000072 | 0.1161 | ort:lean
4 | 1024 | 32 | 128 | 0.000143 | 0.2343 | ort:flash
4 | 1024 | 32 | 128 | 0.000137 | 0.2447 | ort:efficient
4 | 1024 | 32 | 128 | 0.000150 | 0.2245 | ort:math
4 | 1024 | 32 | 128 | 0.000135 | 0.2496 | ort:lean
4 | 2048 | 16 | 64 | 0.000096 | 0.1757 | ort:flash
4 | 2048 | 16 | 64 | 0.000156 | 0.1078 | ort:efficient
4 | 2048 | 16 | 64 | 0.000089 | 0.1892 | ort:lean
4 | 2048 | 32 | 128 | 0.000223 | 0.3010 | ort:flash
4 | 2048 | 32 | 128 | 0.000217 | 0.3101 | ort:efficient
4 | 2048 | 32 | 128 | 0.000209 | 0.3209 | ort:lean
4 | 4096 | 16 | 64 | 0.000137 | 0.2448 | ort:flash
4 | 4096 | 16 | 64 | 0.000256 | 0.1312 | ort:efficient
4 | 4096 | 16 | 64 | 0.000133 | 0.2530 | ort:lean
4 | 4096 | 32 | 128 | 0.000389 | 0.3450 | ort:flash
4 | 4096 | 32 | 128 | 0.000376 | 0.3574 | ort:efficient
4 | 4096 | 32 | 128 | 0.000354 | 0.3794 | ort:lean
4 | 8192 | 16 | 64 | 0.000210 | 0.3198 | ort:flash
4 | 8192 | 16 | 64 | 0.000453 | 0.1480 | ort:efficient
4 | 8192 | 16 | 64 | 0.000206 | 0.3260 | ort:lean
4 | 8192 | 32 | 128 | 0.000725 | 0.3705 | ort:flash
4 | 8192 | 32 | 128 | 0.000693 | 0.3874 | ort:efficient
4 | 8192 | 32 | 128 | 0.000653 | 0.4114 | ort:lean
4 | 16384 | 16 | 64 | 0.000355 | 0.3782 | ort:flash
4 | 16384 | 16 | 64 | 0.000849 | 0.1581 | ort:efficient
4 | 16384 | 16 | 64 | 0.000346 | 0.3874 | ort:lean
4 | 16384 | 32 | 128 | 0.001395 | 0.3848 | ort:flash
4 | 16384 | 32 | 128 | 0.001337 | 0.4017 | ort:efficient
4 | 16384 | 32 | 128 | 0.001252 | 0.4288 | ort:lean
4 | 32768 | 16 | 64 | 0.000647 | 0.4146 | ort:flash
4 | 32768 | 16 | 64 | 0.001649 | 0.1628 | ort:efficient
4 | 32768 | 16 | 64 | 0.000639 | 0.4204 | ort:lean
4 | 32768 | 32 | 128 | 0.002721 | 0.3947 | ort:flash
4 | 32768 | 32 | 128 | 0.002601 | 0.4128 | ort:efficient
4 | 32768 | 32 | 128 | 0.002434 | 0.4411 | ort:lean
4 | 65536 | 16 | 64 | 0.001231 | 0.4361 | ort:flash
4 | 65536 | 16 | 64 | 0.003238 | 0.1658 | ort:efficient
4 | 65536 | 16 | 64 | 0.001217 | 0.4412 | ort:lean
4 | 65536 | 32 | 128 | 0.005357 | 0.4009 | ort:flash
4 | 65536 | 32 | 128 | 0.005118 | 0.4196 | ort:efficient
4 | 65536 | 32 | 128 | 0.004781 | 0.4492 | ort:lean
16 | 512 | 16 | 64 | 0.000098 | 0.1724 | ort:flash
16 | 512 | 16 | 64 | 0.000104 | 0.1616 | ort:efficient
16 | 512 | 16 | 64 | 0.000118 | 0.1420 | ort:math
16 | 512 | 16 | 64 | 0.000087 | 0.1926 | ort:lean
16 | 512 | 32 | 128 | 0.000220 | 0.3062 | ort:flash
16 | 512 | 32 | 128 | 0.000208 | 0.3237 | ort:efficient
16 | 512 | 32 | 128 | 0.000237 | 0.2838 | ort:math
16 | 512 | 32 | 128 | 0.000209 | 0.3216 | ort:lean
16 | 1024 | 16 | 64 | 0.000136 | 0.2465 | ort:flash
16 | 1024 | 16 | 64 | 0.000150 | 0.2235 | ort:efficient
16 | 1024 | 16 | 64 | 0.000148 | 0.2266 | ort:math
16 | 1024 | 16 | 64 | 0.000129 | 0.2611 | ort:lean
16 | 1024 | 32 | 128 | 0.000367 | 0.3663 | ort:flash
16 | 1024 | 32 | 128 | 0.000351 | 0.3829 | ort:efficient
16 | 1024 | 32 | 128 | 0.000400 | 0.3357 | ort:math
16 | 1024 | 32 | 128 | 0.000349 | 0.3853 | ort:lean
16 | 2048 | 16 | 64 | 0.000209 | 0.3206 | ort:flash
16 | 2048 | 16 | 64 | 0.000243 | 0.2762 | ort:efficient
16 | 2048 | 16 | 64 | 0.000201 | 0.3338 | ort:lean
16 | 2048 | 32 | 128 | 0.000671 | 0.4002 | ort:flash
16 | 2048 | 32 | 128 | 0.000645 | 0.4163 | ort:efficient
16 | 2048 | 32 | 128 | 0.000642 | 0.4185 | ort:lean
16 | 4096 | 16 | 64 | 0.000360 | 0.3732 | ort:flash
16 | 4096 | 16 | 64 | 0.000425 | 0.3162 | ort:efficient
16 | 4096 | 16 | 64 | 0.000341 | 0.3933 | ort:lean
16 | 4096 | 32 | 128 | 0.001292 | 0.4156 | ort:flash
16 | 4096 | 32 | 128 | 0.001251 | 0.4291 | ort:efficient
16 | 4096 | 32 | 128 | 0.001241 | 0.4327 | ort:lean
16 | 8192 | 16 | 64 | 0.000666 | 0.4030 | ort:flash
16 | 8192 | 16 | 64 | 0.000804 | 0.3339 | ort:efficient
16 | 8192 | 16 | 64 | 0.000627 | 0.4283 | ort:lean
16 | 8192 | 32 | 128 | 0.002541 | 0.4226 | ort:flash
16 | 8192 | 32 | 128 | 0.002454 | 0.4376 | ort:efficient
16 | 8192 | 32 | 128 | 0.002438 | 0.4405 | ort:lean
16 | 16384 | 16 | 64 | 0.001292 | 0.4156 | ort:flash
16 | 16384 | 16 | 64 | 0.001571 | 0.3417 | ort:efficient
16 | 16384 | 16 | 64 | 0.001217 | 0.4411 | ort:lean
16 | 16384 | 32 | 128 | 0.005042 | 0.4260 | ort:flash
16 | 16384 | 32 | 128 | 0.004859 | 0.4420 | ort:efficient
16 | 16384 | 32 | 128 | 0.004827 | 0.4449 | ort:lean
16 | 32768 | 16 | 64 | 0.002537 | 0.4233 | ort:flash
16 | 32768 | 16 | 64 | 0.003103 | 0.3461 | ort:efficient
16 | 32768 | 16 | 64 | 0.002385 | 0.4501 | ort:lean
16 | 32768 | 32 | 128 | 0.009961 | 0.4312 | ort:flash
16 | 32768 | 32 | 128 | 0.009605 | 0.4472 | ort:efficient
16 | 32768 | 32 | 128 | 0.009524 | 0.4510 | ort:lean
16 | 65536 | 16 | 64 | 0.005019 | 0.4279 | ort:flash
16 | 65536 | 16 | 64 | 0.006133 | 0.3502 | ort:efficient
16 | 65536 | 16 | 64 | 0.004703 | 0.4566 | ort:lean
16 | 65536 | 32 | 128 | 0.019746 | 0.4350 | ort:flash
16 | 65536 | 32 | 128 | 0.019027 | 0.4515 | ort:efficient
16 | 65536 | 32 | 128 | 0.018864 | 0.4554 | ort:lean

### Motivation and Context

<!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->

2024-10-14 14:49:37 -07:00
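
The commit description above says Lean Attention is off by default, can be turned on with the environment variable `ORT_ENABLE_LEAN_ATTENTION=1`, and only applies to token generation (sequence_length==1, past_sequence_length > 0). A minimal Python sketch of that opt-in follows. The helper names `enable_lean_attention` and `is_token_generation` are hypothetical and not part of onnxruntime, and it is an assumption that the variable should be set before the inference session is created.

```python
import os

# Variable name taken from the commit description.
LEAN_ATTENTION_ENV = "ORT_ENABLE_LEAN_ATTENTION"


def enable_lean_attention(env=None):
    """Set the opt-in variable in `env` (defaults to os.environ) and return it.

    Hypothetical helper. Assumption: call this before creating the
    onnxruntime session so the CUDA provider can see the setting.
    """
    env = os.environ if env is None else env
    env[LEAN_ATTENTION_ENV] = "1"
    return env


def is_token_generation(sequence_length, past_sequence_length):
    """Condition (3) from the commit description: decode phase only."""
    return sequence_length == 1 and past_sequence_length > 0


if __name__ == "__main__":
    enable_lean_attention()
    print(LEAN_ATTENTION_ENV, "=", os.environ[LEAN_ATTENTION_ENV])
    print("decode step eligible:", is_token_generation(1, 512))
    print("prompt step eligible:", is_token_generation(128, 0))
```

The description also mentions the CUDA provider option sdpa_kernel as an alternative switch; its value for Lean Attention is not given there, so it is left out of this sketch.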
..
external	[MIGraphX EP/ ROCm EP] add gfx1200, gfx1201 to CMAKE_HIP_ARCHITECTURES (#22348 )	2024-10-11 17:31:36 -07:00
patches	[MIGraphX EP/ ROCm EP] add gfx1200, gfx1201 to CMAKE_HIP_ARCHITECTURES (#22348 )	2024-10-11 17:31:36 -07:00
tensorboard
adjust_global_compile_flags.cmake	Enable Android 16 KB page size support (#22076 )	2024-09-19 18:53:57 +10:00
arm64x.cmake	Dev/mookerem/arm64x update (#20536 )	2024-05-07 12:50:38 -07:00
CMakeLists.txt	[CUDA] Lean Attention (#22352 )	2024-10-14 14:49:37 -07:00
CMakePresets.json	Create CMake option `onnxruntime_USE_VCPKG` (#21348 )	2024-09-10 16:39:27 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt	Upgrade absl to the latest released version (#22365 )	2024-10-09 20:21:40 -07:00
deps_update_and_upload.py	Update google benchmark to 1.8.3. (#19734 )	2024-03-01 11:01:58 -08:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
hip_fatbin_insert	[MIGraphX EP/ ROCm EP] add gfx1200, gfx1201 to CMAKE_HIP_ARCHITECTURES (#22348 )	2024-10-11 17:31:36 -07:00
Info.plist.in
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake
linux_arm64_crosscompile_toolchain.cmake
maccatalyst_prepare_objects_for_prelink.py	Support xcframework for mac catalyst builds. (#19534 )	2024-03-20 10:55:19 -07:00
nuget_helpers.cmake	Update nuget.exe used in WindowsAI nuget packaging so `readme` property is supported. (#22141 )	2024-09-19 19:06:47 +10:00
onnxruntime.cmake	Fix Xcode 16 iOS build issues (#22379 )	2024-10-14 09:24:38 -07:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake	Enable QNN HTP support for Node (#20576 )	2024-05-09 13:11:07 -07:00
onnxruntime_compile_triton_kernel.cmake	[CUDA] Add SparseAttention operator for Phi-3-small (#20216 )	2024-04-30 09:06:29 -07:00
onnxruntime_config.h.in	Get build working on Xcode 16 (#22168 )	2024-09-24 08:33:03 -07:00
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake	Adding CUDNN Frontend and use for CUDA NN Convolution (#19470 )	2024-08-02 15:16:42 -07:00
onnxruntime_framework.natvis
onnxruntime_fuzz_test.cmake	[Fuzzer] Add two new ORT libfuzzer (Linux clang support for now) (#22055 )	2024-09-12 11:50:34 -07:00
onnxruntime_graph.cmake	[Apple framework] Fix minimal build with training enabled. (#19858 )	2024-03-12 11:33:30 -07:00
onnxruntime_ios.toolchain.cmake	Support visionos build (#20365 )	2024-04-23 18:15:07 -07:00
onnxruntime_java.cmake	Remove deprecated "mobile" packages (#20941 )	2024-06-07 16:20:32 -05:00
onnxruntime_java_unittests.cmake	[Java] Add API for appending QNN EP (#22208 )	2024-10-01 10:18:04 -07:00
onnxruntime_kernel_explorer.cmake	Add Linux ROCm CI Pipeline (#21798 )	2024-08-30 14:50:32 +08:00
onnxruntime_lora.cmake	Multi-Lora support (#22046 )	2024-09-30 15:59:07 -07:00
onnxruntime_mlas.cmake	[ARM64] MatMulNBits: use neon instrinsics to convert between fp16 and fp32 (#22195 )	2024-09-26 13:55:40 -07:00
onnxruntime_nodejs.cmake	Initial WebGPU EP checkin (#22318 )	2024-10-08 16:10:46 -07:00
onnxruntime_objectivec.cmake	Initial WebGPU EP checkin (#22318 )	2024-10-08 16:10:46 -07:00
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake	Flash attention recompute (#20603 )	2024-05-21 13:38:19 +08:00
onnxruntime_providers.cmake	Initial WebGPU EP checkin (#22318 )	2024-10-08 16:10:46 -07:00
onnxruntime_providers_acl.cmake
onnxruntime_providers_armnn.cmake
onnxruntime_providers_azure.cmake
onnxruntime_providers_cann.cmake
onnxruntime_providers_coreml.cmake	Fix Objective-C static analysis warnings. (#20417 )	2024-04-24 11:48:29 -07:00
onnxruntime_providers_cpu.cmake	Initial WebGPU EP checkin (#22318 )	2024-10-08 16:10:46 -07:00
onnxruntime_providers_cuda.cmake	Adding CUDNN Frontend and use for CUDA NN Convolution (#19470 )	2024-08-02 15:16:42 -07:00
onnxruntime_providers_dml.cmake
onnxruntime_providers_dnnl.cmake
onnxruntime_providers_js.cmake
onnxruntime_providers_migraphx.cmake	Migraphx ep windows build (#21284 )	2024-07-11 21:21:38 -07:00
onnxruntime_providers_nnapi.cmake	Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723 )	2024-03-12 10:55:49 +10:00
onnxruntime_providers_openvino.cmake	Ovep develop lnl 1.2 (#22424 )	2024-10-14 12:10:01 -07:00
onnxruntime_providers_qnn.cmake	Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723 )	2024-03-12 10:55:49 +10:00
onnxruntime_providers_rknpu.cmake
onnxruntime_providers_rocm.cmake	[MIGraphX EP/ ROCm EP] add gfx1200, gfx1201 to CMAKE_HIP_ARCHITECTURES (#22348 )	2024-10-11 17:31:36 -07:00
onnxruntime_providers_tensorrt.cmake	Adding CUDNN Frontend and use for CUDA NN Convolution (#19470 )	2024-08-02 15:16:42 -07:00
onnxruntime_providers_tvm.cmake
onnxruntime_providers_vitisai.cmake	[VitisAI] remove wrong error msg, required by Microsoft (#21715 )	2024-08-21 21:10:28 -07:00
onnxruntime_providers_vsinpu.cmake	[VSINPU]Code improvement && Slice/Dropout OP support (#21217 )	2024-07-09 20:14:46 -07:00
onnxruntime_providers_webgpu.cmake	Initial WebGPU EP checkin (#22318 )	2024-10-08 16:10:46 -07:00
onnxruntime_providers_webnn.cmake
onnxruntime_providers_xnnpack.cmake	Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723 )	2024-03-12 10:55:49 +10:00
onnxruntime_python.cmake	Initial WebGPU EP checkin (#22318 )	2024-10-08 16:10:46 -07:00
onnxruntime_rocm_hipify.cmake	[CUDA] cuDNN Flash Attention (#21629 )	2024-08-20 08:50:22 -07:00
onnxruntime_session.cmake	Multi-Lora support (#22046 )	2024-09-30 15:59:07 -07:00
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake	Multi-Lora support (#22046 )	2024-09-30 15:59:07 -07:00
onnxruntime_unittests.cmake	Fix Xcode 16 iOS build issues (#22379 )	2024-10-14 09:24:38 -07:00
onnxruntime_util.cmake
onnxruntime_visionos.toolchain.cmake	Support visionos build (#20365 )	2024-04-23 18:15:07 -07:00
onnxruntime_webassembly.cmake	Multi-Lora support (#22046 )	2024-09-30 15:59:07 -07:00
precompiled_header.cmake
riscv64.toolchain.cmake
Sdl.ruleset
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h
vcpkg-configuration.json	Auto regenerate LORA's fbs files (#22313 )	2024-10-04 10:01:19 -07:00
vcpkg.json	Create CMake option `onnxruntime_USE_VCPKG` (#21348 )	2024-09-10 16:39:27 -07:00
wcos_rules_override.cmake	Stop using apiset in OneCore build: use onecoreuap.lib instead of onecoreuap_apiset.lib (#19632 )	2024-02-23 22:31:57 -08:00
winml.cmake	Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339 )	2024-07-15 14:21:34 -07:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake	Multi-Lora support (#22046 )	2024-09-30 15:59:07 -07:00