onnxruntime/cmake/external
Ye Wang 2ee822d483
Extend memory efficient attention coverage in Attention/MHA cuda op (#15064)
### Description

1. Upgrade CUTLASS to 3.0, which contains attn_bias support.
2. Extend Attention/MHA to use memory efficient attention when a
rel_pos_bias of shape [1, num_head, s, s*] and a 1D mask of shape
[3 * batch_size + 2] are present.

New mask format introduction:
MASK_1D_KEY_SEQ_LEN_START, a 1D tensor of shape [3 * batch_size + 2] laid out as
[key_len[0], ..., key_len[batch_size - 1],
 query_start[0], ..., query_start[batch_size - 1], query_end[batch_size - 1],
 key_start[0], ..., key_start[batch_size - 1], key_end[batch_size - 1]]

e.g., the 2D mask [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to
the 1D mask [3, 5, 0, 6, 12, 0, 6, 12]
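The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not ORT code: it assumes right-padded 2D masks and query/key sequences packed back-to-back, and the helper name `to_1d_mask` is hypothetical.

```python
import numpy as np

def to_1d_mask(mask_2d):
    """Convert a right-padded 2D attention mask of shape
    [batch_size, seq_len] into the 1D layout of length
    3 * batch_size + 2 described above:
    [key_len[0..b-1], query_start[0..b-1], query_end[b-1],
     key_start[0..b-1], key_end[b-1]].
    """
    mask_2d = np.asarray(mask_2d)
    batch_size, seq_len = mask_2d.shape
    key_lens = mask_2d.sum(axis=1)            # valid (unmasked) tokens per row
    starts = np.arange(batch_size) * seq_len  # packed back-to-back offsets
    end = batch_size * seq_len                # one-past-the-end of the last row
    return np.concatenate([key_lens, starts, [end], starts, [end]]).tolist()

print(to_1d_mask([[1, 1, 1, 0, 0, 0],
                  [1, 1, 1, 1, 1, 0]]))  # [3, 5, 0, 6, 12, 0, 6, 12]
```

With the example mask, `key_lens` is [3, 5] and both query and key offsets are [0, 6] with end 12, reproducing [3, 5, 0, 6, 12, 0, 6, 12].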


### Motivation and Context

This change potentially benefits TNLRv6 and T5 (encoder).

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-03-23 11:05:17 -07:00
eigen@d10b27fe37
emsdk@0ab19024f0 [wasm] upgrade emsdk from 3.1.19 to 3.1.32 (#14818) 2023-02-28 11:06:09 -08:00
libprotobuf-mutator@7a2ed51a6b
onnx@9b7bca2a72 to work with onnx 1.13 rc, implement ver 18 reduce and optional ops, … (#13765) 2023-01-09 10:26:16 -08:00
onnxruntime-extensions@81e7799c69 pin ort-ext to 81e7799c69044c745239202085eb0a98f102937b (#14044) 2023-01-10 10:10:17 -08:00
protobuf@a20c65f2cd upgrade protobuf to 3.20.2 and onnx to 1.13 (#14279) 2023-01-31 12:55:09 -08:00
abseil-cpp.cmake Let Cmake decide where to place abseil (#14057) 2022-12-23 12:08:13 -08:00
abseil-cpp.natvis Update absl to the latest release (#13990) 2022-12-19 14:25:13 -08:00
composable_kernel.cmake ROCm Flash Attention (#14838) 2023-03-16 10:39:58 +08:00
cutlass.cmake Extend memory efficient attention coverage in Attention/MHA cuda op (#15064) 2023-03-23 11:05:17 -07:00
dml.cmake [DML EP] Upgrade DML to 1.10.1 (#14433) 2023-01-25 21:07:10 -08:00
dnnl.cmake [oneDNN] Update to oneDNN v3.0 (#14267) 2023-02-17 09:56:29 -08:00
eigen.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
extensions.cmake Migrating ORT Extensions from Git submodule to cmake FetchContent (#14298) 2023-02-22 19:42:36 -08:00
find_snpe.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
FindNumPy.cmake
helper_functions.cmake Enable cache for msbuild (#14085) 2023-01-06 11:19:57 +08:00
ipp-crypto.cmake
mimalloc.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnx_minimal.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnx_protobuf.natvis
onnxruntime_external_deps.cmake TensorRT EP - timing cache (#14767) 2023-03-10 09:02:27 -08:00
protobuf_function.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
pybind11.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
pyxir.cmake
triton.cmake CloudEP (#13855) 2023-01-03 10:03:15 -08:00
tvm.cmake [TVM EP] Support zero copying TVM EP output tensor to ONNX Runtime output tensor (#12593) 2023-02-08 10:02:20 -08:00
wil.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
xnnpack.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00