onnxruntime/cmake
Ye Wang 2ee822d483
Extend memory efficient attention coverage in Attention/MHA cuda op (#15064)
### Description
<!-- Describe your changes. -->

1. upgrade cutlass to 3.0 that containing attn_bias support.
2. extend Attention/MHA to use memory efficient attention when
rel_pos_bias with [1, num_head, s, s*] and 1d mask with [2 * batch_size
+ 1] are present.

new mask format introduction:
MASK_1D_KEY_SEQ_LEN_START,  
[3 * batch_size + 2] with [key_len[0], ..., key_len[batch_size - 1],
query_start[0], ..., query_start[batch_size - 1], query_end[batch_size -
1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size -
1]]

e.g
2D mask with [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to this
1D mask is [3, 5, 0, 6, 12, 0, 6, 12]


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

It potentially benefits tnlrv6 and t5(encoder)

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-03-23 11:05:17 -07:00
..
external Extend memory efficient attention coverage in Attention/MHA cuda op (#15064) 2023-03-23 11:05:17 -07:00
patches Extend memory efficient attention coverage in Attention/MHA cuda op (#15064) 2023-03-23 11:05:17 -07:00
tensorboard Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
adjust_global_compile_flags.cmake [js] upgrade dependencies and enable strict mode (#14930) 2023-03-22 15:05:04 -07:00
CMakeLists.txt Fixing CUDA12 build (#15135) 2023-03-23 09:36:51 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt Extend memory efficient attention coverage in Attention/MHA cuda op (#15064) 2023-03-23 11:05:17 -07:00
EnableVisualStudioCodeAnalysis.props Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
gdk_toolchain.cmake Enable building with a GDK (#11126) 2022-04-07 15:06:31 -07:00
Info.plist.in Enable build dynamic framework for macOS/iOS (#7343) 2021-04-15 16:47:53 -07:00
libonnxruntime.pc.cmake.in cmake: support install target with generated pkg-config file (#7076) 2021-03-22 19:36:31 -07:00
nuget_helpers.cmake Fix nuget build error (#6009) 2020-12-03 09:28:39 -08:00
onnxruntime.cmake OnnxRuntime QNN EP (#14791) 2023-03-01 13:48:20 -08:00
onnxruntime_codegen_tvm.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_common.cmake Enabling thread pool to be numa-aware (#13778) 2022-12-12 10:33:55 -08:00
onnxruntime_config.h.in Use safe allocator for JNI code (#13999) 2023-03-08 11:40:55 -08:00
onnxruntime_csharp.cmake Refactor training build options (#13964) 2023-01-03 13:28:16 -08:00
onnxruntime_eager.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_flatbuffers.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_framework.cmake Introduce collective ops to ort inference build (#14399) 2023-02-07 13:47:48 -08:00
onnxruntime_fuzz_test.cmake Fix fuzz test (#14385) 2023-01-22 22:17:43 -08:00
onnxruntime_graph.cmake Create dedicated build for training api (#14136) 2023-01-10 20:58:04 -08:00
onnxruntime_ios.toolchain.cmake Enable build dynamic framework for macOS/iOS (#7343) 2021-04-15 16:47:53 -07:00
onnxruntime_java.cmake Update Gradle version (#14862) 2023-03-08 12:22:06 -08:00
onnxruntime_java_unittests.cmake [Java] Initial on device training support (#14027) 2023-03-08 10:01:08 -08:00
onnxruntime_kernel_explorer.cmake Add TuningContext for TunableOp (#14557) 2023-02-10 14:27:43 +08:00
onnxruntime_language_interop_ops.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_mlas.cmake [AMX] add assembler check (#15055) 2023-03-22 07:57:22 +08:00
onnxruntime_nodejs.cmake [js] upgrade dependencies and enable strict mode (#14930) 2023-03-22 15:05:04 -07:00
onnxruntime_objectivec.cmake Remove SafeInt dependency from Objective-C API. (#13698) 2022-11-18 17:06:12 -08:00
onnxruntime_opschema_lib.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_optimizer.cmake Create dedicated build for training api (#14136) 2023-01-10 20:58:04 -08:00
onnxruntime_providers.cmake fix miopen new API cannot be supported by ROCm5.2.3 (#15077) 2023-03-17 08:40:35 +08:00
onnxruntime_pyop.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_python.cmake Statistics tool for ORTModule convergence parity (#15020) 2023-03-23 20:34:24 +08:00
onnxruntime_rocm_hipify.cmake exclude packed_attention* from rocm (#15161) 2023-03-23 13:58:57 +08:00
onnxruntime_session.cmake fix headers for training apis (#14350) 2023-01-19 10:26:53 -08:00
onnxruntime_snpe_provider.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_training.cmake Create dedicated build for training api (#14136) 2023-01-10 20:58:04 -08:00
onnxruntime_unittests.cmake FasterTransformer model wrapper using custom op (#15013) 2023-03-20 09:05:30 -07:00
onnxruntime_util.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_webassembly.cmake [js] upgrade dependencies and enable strict mode (#14930) 2023-03-22 15:05:04 -07:00
precompiled_header.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
Sdl.ruleset Update Sdl.ruleset to remove C26812 from the rules (#12695) 2022-09-01 20:05:20 -07:00
set_winapi_family_desktop.h
target_delayload.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
uwp_stubs.h Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
wcos_rules_override.cmake Use onecore umbrella lib in onecore builds (#5182) 2020-09-16 10:46:27 -07:00
winml.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
winml_cppwinrt.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
winml_sdk_helpers.cmake
winml_unittests.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00