onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Ye Wang 2ee822d483 Extend memory efficient attention coverage in Attention/MHA cuda op (#15064 )		2023-03-23 11:05:17 -07:00

### Description

1. Upgrade cutlass to 3.0, which contains attn_bias support.
2. Extend Attention/MHA to use memory efficient attention when rel_pos_bias with shape [1, num_head, s, s] and a 1D mask with shape [2 * batch_size + 1] are present.

New mask format: MASK_1D_KEY_SEQ_LEN_START, with shape [3 * batch_size + 2] and layout
[key_len[0], ..., key_len[batch_size - 1], query_start[0], ..., query_start[batch_size - 1], query_end[batch_size - 1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size - 1]].

For example, the 2D mask [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to the 1D mask [3, 5, 0, 6, 12, 0, 6, 12].

### Motivation and Context

It potentially benefits tnlrv6 and t5 (encoder).

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
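The MASK_1D_KEY_SEQ_LEN_START layout from the commit message above can be illustrated with a small sketch. It is not the ONNX Runtime implementation: the function name is hypothetical, and it assumes a right-padded 2D mask whose width serves as both the query and the key sequence length, so that the start offsets are multiples of that width. This is inferred only from the single example given in the commit message.

```python
from typing import List


def mask_2d_to_key_seq_len_start(mask_2d: List[List[int]]) -> List[int]:
    """Build a 1D mask of length 3 * batch_size + 2 from a 2D padding mask.

    Hypothetical helper for illustration only. Assumes every row is
    right-padded (ones first, then zeros) and that query and key sequences
    share the padded length given by the mask width.
    """
    batch_size = len(mask_2d)
    if batch_size == 0:
        raise ValueError("mask_2d must contain at least one row")
    seq_len = len(mask_2d[0])
    if any(len(row) != seq_len for row in mask_2d):
        raise ValueError("all mask rows must have the same length")

    # key_len[i]: number of unmasked key positions in sequence i.
    key_len = [sum(row) for row in mask_2d]

    # query_start[0..batch_size - 1] followed by query_end[batch_size - 1]:
    # offsets of each padded sequence in a flattened batch (assumption).
    query_offsets = [i * seq_len for i in range(batch_size + 1)]

    # key_start[0..batch_size - 1] followed by key_end[batch_size - 1]:
    # same offsets, because keys share the padded length (assumption).
    key_offsets = [i * seq_len for i in range(batch_size + 1)]

    return key_len + query_offsets + key_offsets


example = mask_2d_to_key_seq_len_start(
    [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 1, 1, 0]]
)
print(example)  # [3, 5, 0, 6, 12, 0, 6, 12]
```

With batch_size = 2 the result has 3 * 2 + 2 = 8 entries, matching the example in the commit message.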
..
external	Extend memory efficient attention coverage in Attention/MHA cuda op (#15064 )	2023-03-23 11:05:17 -07:00
patches	Extend memory efficient attention coverage in Attention/MHA cuda op (#15064 )	2023-03-23 11:05:17 -07:00
tensorboard	Improve dependency management (#13523 )	2022-12-01 09:51:59 -08:00
adjust_global_compile_flags.cmake	[js] upgrade dependencies and enable strict mode (#14930 )	2023-03-22 15:05:04 -07:00
CMakeLists.txt	Fixing CUDA12 build (#15135 )	2023-03-23 09:36:51 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt	Extend memory efficient attention coverage in Attention/MHA cuda op (#15064 )	2023-03-23 11:05:17 -07:00
EnableVisualStudioCodeAnalysis.props	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
gdk_toolchain.cmake	Enable building with a GDK (#11126 )	2022-04-07 15:06:31 -07:00
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake	OnnxRuntime QNN EP (#14791 )	2023-03-01 13:48:20 -08:00
onnxruntime_codegen_tvm.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_common.cmake	Enabling thread pool to be numa-aware (#13778 )	2022-12-12 10:33:55 -08:00
onnxruntime_config.h.in	Use safe allocator for JNI code (#13999 )	2023-03-08 11:40:55 -08:00
onnxruntime_csharp.cmake	Refactor training build options (#13964 )	2023-01-03 13:28:16 -08:00
onnxruntime_eager.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_flatbuffers.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_framework.cmake	Introduce collective ops to ort inference build (#14399 )	2023-02-07 13:47:48 -08:00
onnxruntime_fuzz_test.cmake	Fix fuzz test (#14385 )	2023-01-22 22:17:43 -08:00
onnxruntime_graph.cmake	Create dedicated build for training api (#14136 )	2023-01-10 20:58:04 -08:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake	Update Gradle version (#14862 )	2023-03-08 12:22:06 -08:00
onnxruntime_java_unittests.cmake	[Java] Initial on device training support (#14027 )	2023-03-08 10:01:08 -08:00
onnxruntime_kernel_explorer.cmake	Add TuningContext for TunableOp (#14557 )	2023-02-10 14:27:43 +08:00
onnxruntime_language_interop_ops.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_mlas.cmake	[AMX] add assembler check (#15055 )	2023-03-22 07:57:22 +08:00
onnxruntime_nodejs.cmake	[js] upgrade dependencies and enable strict mode (#14930 )	2023-03-22 15:05:04 -07:00
onnxruntime_objectivec.cmake	Remove SafeInt dependency from Objective-C API. (#13698 )	2022-11-18 17:06:12 -08:00
onnxruntime_opschema_lib.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_optimizer.cmake	Create dedicated build for training api (#14136 )	2023-01-10 20:58:04 -08:00
onnxruntime_providers.cmake	fix miopen new API cannot be supported by ROCm5.2.3 (#15077 )	2023-03-17 08:40:35 +08:00
onnxruntime_pyop.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_python.cmake	Statistics tool for ORTModule convergence parity (#15020 )	2023-03-23 20:34:24 +08:00
onnxruntime_rocm_hipify.cmake	exclude packed_attention* from rocm (#15161 )	2023-03-23 13:58:57 +08:00
onnxruntime_session.cmake	fix headers for training apis (#14350 )	2023-01-19 10:26:53 -08:00
onnxruntime_snpe_provider.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
onnxruntime_training.cmake	Create dedicated build for training api (#14136 )	2023-01-10 20:58:04 -08:00
onnxruntime_unittests.cmake	FasterTransformer model wrapper using custom op (#15013 )	2023-03-20 09:05:30 -07:00
onnxruntime_util.cmake	Improve dependency management (#13523 )	2022-12-01 09:51:59 -08:00
onnxruntime_webassembly.cmake	[js] upgrade dependencies and enable strict mode (#14930 )	2023-03-22 15:05:04 -07:00
precompiled_header.cmake
Sdl.ruleset	Update Sdl.ruleset to remove C26812 from the rules (#12695 )	2022-09-01 20:05:20 -07:00
set_winapi_family_desktop.h
target_delayload.cmake	Remove Windows Store specific code	2022-03-17 23:38:14 -07:00
uwp_stubs.h
wcos_rules_override.cmake
winml.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake	Use target name for flatbuffers (#13991 )	2022-12-20 11:44:02 -08:00