onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Ye Wang 2ee822d483 Extend memory efficient attention coverage in Attention/MHA cuda op (#15064)

### Description

1. Upgrade cutlass to 3.0, which contains attn_bias support.
2. Extend Attention/MHA to use memory efficient attention when rel_pos_bias with [1, num_head, s, s] and 1d mask with [2 batch_size + 1] are present.

New mask format introduced: MASK_1D_KEY_SEQ_LEN_START, of length [3 * batch_size + 2], laid out as [key_len[0], ..., key_len[batch_size - 1], query_start[0], ..., query_start[batch_size - 1], query_end[batch_size - 1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size - 1]].

E.g. the 2D mask [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to the 1D mask [3, 5, 0, 6, 12, 0, 6, 12].

### Motivation and Context

It potentially benefits tnlrv6 and t5(encoder).

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

2023-03-23 11:05:17 -07:00
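
The mask layout described in the commit message can be illustrated with a short Python sketch. This is not code from the repository: the function name `to_key_seq_len_start_mask` is hypothetical, and the sketch assumes right-padded sequences and self-attention, so that query and key share the same padded length and the same start offsets. It reproduces the example given in the commit message.

```python
from typing import List


def to_key_seq_len_start_mask(mask_2d: List[List[int]]) -> List[int]:
    """Convert a right-padded 2D attention mask to the 1D layout
    [key_len..., query_start..., query_end, key_start..., key_end].

    Hypothetical helper for illustration only; assumes every row has the
    same padded length and that query and key offsets coincide.
    """
    batch_size = len(mask_2d)
    padded_len = len(mask_2d[0])

    # Number of valid (non-padded) key tokens in each batch entry.
    key_len = [sum(row) for row in mask_2d]

    # Offset of each sequence in the flattened (batch * padded_len) buffer.
    starts = [i * padded_len for i in range(batch_size)]

    # End offset of the last sequence in the batch.
    end = batch_size * padded_len

    return key_len + starts + [end] + starts + [end]


mask = [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]]
print(to_key_seq_len_start_mask(mask))  # [3, 5, 0, 6, 12, 0, 6, 12]
```

The result has 3 * batch_size + 2 entries, matching the length stated in the commit message.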
..
github	Extend memory efficient attention coverage in Attention/MHA cuda op (#15064)	2023-03-23 11:05:17 -07:00
__init__.py
amd_hipify.py	[ROCm] Enable Sampling Op UT on AMD (#14581)	2023-02-06 20:52:06 -08:00
build.py	Fix test_custom_op_get_const_input inference test on Android CI (#15132)	2023-03-22 23:02:42 -07:00
clean_docker_image_cache.py
coverage.py
gen_def.py	OnnxRuntime QNN EP (#14791)	2023-03-01 13:48:20 -08:00
get_docker_image.py	[Fix] Error in Linux_Packaging_combined_GPU of nuget packaing pipeline (#15060)	2023-03-16 08:49:37 +08:00
logger.py
op_registration_utils.py
op_registration_validator.py	Update CUDA ArgMin/ArgMax op kernels to have end version 11 since opset 12+ is not supported yet. (#13983)	2022-12-21 19:01:00 -05:00
patch_manylinux.py	detach patch manylinux from get_docker_image (#14958)	2023-03-09 15:40:58 +08:00
policheck_exclusions.xml	Exculde hipify option from policheck (#13431)	2022-10-25 16:35:16 +08:00
reduce_op_kernels.py	Fix broken and outdated links in documentation (#14092)	2023-02-23 10:48:04 -08:00
replace_urls_in_deps.py	Move C/C++ deps' URLs to deps.txt (#13769)	2022-11-29 18:06:35 -08:00
requirements.txt	upgrade protobuf to 3.20.2 and onnx to 1.13 (#14279)	2023-01-31 12:55:09 -08:00
update_tsaoptions.py	Add license header to some files. (#13074)	2022-09-23 18:46:02 -07:00
upload_python_package_to_azure_storage.py