onnxruntime/cmake
pengwa 8a98874e7e
Flash attention recompute (#20603)
### Flash attn recompute

1. Allow PythonOp(FlashAttn) can be recomputed correctly.
45879ff5c2
2. Use JSON to pass the selected-to-recompute subgraphs.
3c374da678

#### Better Memory Efficiency 

Customer model can run both PyTorch SPDA and Flash Attn, this PR make it
possible to let the Flash Attn path work with ORTModule layerwise
recompute. The peak drop from 45.xGB to 32.xGB if we only compare the
layers (not including other pieces, BTW there are few more optimization
targeting other pieces as well later).

#### Better Perf

Using Flash ATTN bring additionally 16% end to end time reduction, with
highly aligned loss curve.


![image](https://github.com/microsoft/onnxruntime/assets/10530022/bb63894a-f281-49bc-a8e6-ff818439be38)

#### Use JSON File to pass Recompute Plans

To overcome the limitation of max length of the strings defined in
session options.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-05-21 13:38:19 +08:00
..
external [js/web] optimize module export and deployment (#20165) 2024-05-20 09:51:16 -07:00
patches Remove usage of 'required reason' iOS API from protobuf (#20529) 2024-05-02 08:21:08 +10:00
tensorboard
adjust_global_compile_flags.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
arm64x.cmake Dev/mookerem/arm64x update (#20536) 2024-05-07 12:50:38 -07:00
CMakeLists.txt support --arm64ec for qnn ep build (#20607) 2024-05-08 11:09:15 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt [TensorRT EP] support TensorRT 10-GA (#20506) 2024-05-01 11:10:53 -07:00
deps_update_and_upload.py Update google benchmark to 1.8.3. (#19734) 2024-03-01 11:01:58 -08:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
linux_arm64_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
maccatalyst_prepare_objects_for_prelink.py Support xcframework for mac catalyst builds. (#19534) 2024-03-20 10:55:19 -07:00
nuget_helpers.cmake
onnxruntime.cmake Support xcframework for mac catalyst builds. (#19534) 2024-03-20 10:55:19 -07:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake Enable QNN HTP support for Node (#20576) 2024-05-09 13:11:07 -07:00
onnxruntime_compile_triton_kernel.cmake [CUDA] Add SparseAttention operator for Phi-3-small (#20216) 2024-04-30 09:06:29 -07:00
onnxruntime_config.h.in Enabling c++ 20 in MacOS build (#16187) 2023-09-26 11:27:02 -07:00
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake
onnxruntime_framework.natvis
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake [Apple framework] Fix minimal build with training enabled. (#19858) 2024-03-12 11:33:30 -07:00
onnxruntime_ios.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_java.cmake
onnxruntime_java_unittests.cmake
onnxruntime_kernel_explorer.cmake
onnxruntime_language_interop_ops.cmake
onnxruntime_mlas.cmake Mlas Gemm 4bit avx2, avx512, and avx512vnni kernels (#20163) 2024-04-25 21:30:50 -07:00
onnxruntime_nodejs.cmake Enable QNN HTP support for Node (#20576) 2024-05-09 13:11:07 -07:00
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake Flash attention recompute (#20603) 2024-05-21 13:38:19 +08:00
onnxruntime_providers.cmake Add initial support for CoreML ML Program to the CoreML EP. (#19347) 2024-02-15 08:46:03 +10:00
onnxruntime_providers_acl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_armnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_azure.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cann.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_coreml.cmake Fix Objective-C static analysis warnings. (#20417) 2024-04-24 11:48:29 -07:00
onnxruntime_providers_cpu.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_providers_cuda.cmake Enable CUDA EP unit testing on Windows (#20039) 2024-03-27 13:32:36 -07:00
onnxruntime_providers_dml.cmake Delay load dxcore.dll in addition to ext-ms-win-dxcore-l1-1-0.dll (#18913) 2023-12-26 12:33:42 -08:00
onnxruntime_providers_dnnl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_js.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_migraphx.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_nnapi.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_openvino.cmake Ort openvino npu 1.17 master (#19966) 2024-03-21 18:44:00 -07:00
onnxruntime_providers_qnn.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_rknpu.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_rocm.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_tensorrt.cmake [TensorRT] adapt for TRT lib name change after TRT 10 GA (update) (#20550) 2024-05-06 15:00:13 -07:00
onnxruntime_providers_tvm.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_vitisai.cmake [VitisAI] Solve the problem that gsl cannot be found when compiling under linux (#20466) 2024-04-28 20:56:16 -07:00
onnxruntime_providers_webnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_xnnpack.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_pyop.cmake
onnxruntime_python.cmake [qnn ep] include qnn sdk in onnxruntime-qnn python whl (#20485) 2024-04-29 09:44:54 -07:00
onnxruntime_rocm_hipify.cmake hipify int4 gemv (#20666) 2024-05-18 16:59:03 +08:00
onnxruntime_session.cmake
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake
onnxruntime_unittests.cmake [js/web] optimize module export and deployment (#20165) 2024-05-20 09:51:16 -07:00
onnxruntime_util.cmake
onnxruntime_visionos.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_webassembly.cmake [js/web] optimize module export and deployment (#20165) 2024-05-20 09:51:16 -07:00
precompiled_header.cmake
riscv64.toolchain.cmake Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238) 2024-01-24 16:27:05 -08:00
Sdl.ruleset
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h
wcos_rules_override.cmake Stop using apiset in OneCore build: use onecoreuap.lib instead of onecoreuap_apiset.lib (#19632) 2024-02-23 22:31:57 -08:00
winml.cmake [CP] Fix for xfgcheck and Fix WAI ARM64 build (#19634) (#19644) 2024-03-13 17:54:06 -07:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00