mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-04 23:59:56 +00:00
### Description

1. Upgrade cutlass to 3.0, which contains attn_bias support.
2. Extend Attention/MHA to use memory efficient attention when rel_pos_bias with shape [1, num_head, s, s*] and a 1D mask with shape [2 * batch_size + 1] are present.

New mask format: MASK_1D_KEY_SEQ_LEN_START, with shape [3 * batch_size + 2] and layout [key_len[0], ..., key_len[batch_size - 1], query_start[0], ..., query_start[batch_size - 1], query_end[batch_size - 1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size - 1]].

For example, the 2D mask [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to the 1D mask [3, 5, 0, 6, 12, 0, 6, 12].

### Motivation and Context

It potentially benefits tnlrv6 and t5 (encoder).

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

||
|---|---|---|
| nodejs/templates | ||
| nuget/templates | ||
| templates | ||
| android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml | ||
| android-x86_64-crosscompile-ci-pipeline.yml | ||
| anybuild.yml | ||
| binary-size-checks-pipeline.yml | ||
| build-perf-test-binaries-pipeline.yml | ||
| c-api-noopenmp-packaging-pipelines.yml | ||
| clean-build-docker-image-cache-pipeline.yml | ||
| linux-ci-pipeline.yml | ||
| linux-cpu-aten-pipeline.yml | ||
| linux-cpu-eager-pipeline.yml | ||
| linux-cpu-minimal-build-ci-pipeline.yml | ||
| linux-dnnl-ci-pipeline.yml | ||
| linux-gpu-ci-pipeline.yml | ||
| linux-gpu-tensorrt-ci-pipeline.yml | ||
| linux-gpu-tensorrt-daily-perf-pipeline.yml | ||
| linux-migraphx-ci-pipeline.yml | ||
| linux-multi-gpu-ci-pipeline.yml | ||
| linux-multi-gpu-tensorrt-ci-pipeline.yml | ||
| linux-openvino-ci-pipeline.yml | ||
| linux-openvino-nightly-pipeline.yml | ||
| linux-qnn-ci-pipeline.yml | ||
| mac-ci-pipeline.yml | ||
| mac-coreml-ci-pipeline.yml | ||
| mac-ios-ci-pipeline.yml | ||
| mac-ios-packaging-pipeline.yml | ||
| mac-objc-static-analysis-ci-pipeline.yml | ||
| mac-react-native-ci-pipeline.yml | ||
| npm-packaging-pipeline.yml | ||
| orttraining-linux-ci-pipeline.yml | ||
| orttraining-linux-external-custom-ops.yml | ||
| orttraining-linux-gpu-amd-e2e-test-ci-pipeline.yml | ||
| orttraining-linux-gpu-ci-pipeline.yml | ||
| orttraining-linux-gpu-distributed-e2e-test-pipeline.yml | ||
| orttraining-linux-gpu-docker-release-pipeline.yml | ||
| orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml | ||
| orttraining-linux-gpu-ortmodule-test-clear-cache-pipeline.yml | ||
| orttraining-linux-gpu-training-apis.yml | ||
| orttraining-linux-nightly-ortmodule-test-pipeline.yml | ||
| orttraining-mac-ci-pipeline.yml | ||
| orttraining-pai-ci-pipeline.yml | ||
| orttraining-py-packaging-pipeline-cpu.yml | ||
| orttraining-py-packaging-pipeline-cuda116.yml | ||
| orttraining-py-packaging-pipeline-rocm.yml | ||
| post-merge-jobs.yml | ||
| py-package-build-pipeline.yml | ||
| py-package-test-pipeline.yml | ||
| py-packaging-pipeline.yml | ||
| python-checks-ci-pipeline.yml | ||
| sign_ov_ep_binaries.yml | ||
| snpe-ep-nuget-packaging-pipeline.yml | ||
| web-ci-pipeline.yml | ||
| web-packaging-pipeline.yml | ||
| win-ci-fuzz-testing.yml | ||
| win-ci-pipeline.yml | ||
| win-eager-ci-pipeline.yml | ||
| win-gpu-ci-pipeline.yml | ||
| win-gpu-reduce-op-ci-pipeline.yml | ||
| win-gpu-tensorrt-ci-pipeline.yml | ||
| win-qnn-arm64-ci-pipeline.yml | ||
| win-qnn-ci-pipeline.yml | ||
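
The MASK_1D_KEY_SEQ_LEN_START layout described in the commit message above can be illustrated with a small sketch. This is not the ONNX Runtime implementation; the function name `mask_2d_to_key_seq_len_start` is hypothetical, and the sketch assumes a right-padded 2D mask, equal query and key sequence lengths, and start offsets computed as `i * sequence_length`, which is what the example in the commit message suggests.

```python
from typing import List


def mask_2d_to_key_seq_len_start(mask_2d: List[List[int]]) -> List[int]:
    """Convert a right-padded 2D mask into the 1D layout from the commit message.

    Hypothetical helper for illustration only. Assumptions (not confirmed by
    the source): query and key share the same padded sequence length, and the
    start offsets are i * sequence_length.
    """
    batch_size = len(mask_2d)
    if batch_size == 0:
        raise ValueError("mask_2d must contain at least one row")
    seq_len = len(mask_2d[0])
    if any(len(row) != seq_len for row in mask_2d):
        raise ValueError("all rows of mask_2d must have the same length")

    # key_len[i]: number of unmasked (1) positions in row i.
    key_len = [sum(row) for row in mask_2d]
    # Start offset of each sequence in the flattened [batch, seq] layout.
    starts = [i * seq_len for i in range(batch_size)]
    # End offset of the last sequence.
    end = batch_size * seq_len

    # key_len, then query_start + query_end, then key_start + key_end:
    # batch_size + (batch_size + 1) + (batch_size + 1) = 3 * batch_size + 2.
    return key_len + starts + [end] + starts + [end]


if __name__ == "__main__":
    mask = [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]]
    print(mask_2d_to_key_seq_len_start(mask))  # [3, 5, 0, 6, 12, 0, 6, 12]
```

With `batch_size = 2` the result has `3 * 2 + 2 = 8` entries, matching the example given in the commit message.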