onnxruntime/tools/ci_build
aciddelgado cbb29d80ff
GQA Rotary and Packed QKV with Flash (#18906)
### Description
These changes add rotary embedding and packed QKV input to GQA. For now,
they are supported only with Flash-Attention (SM >= 80), but support for
Memory Efficient Attention should follow soon.

### Motivation and Context
With the fusion of rotary embedding into this Attention op, we hope to
observe some perf gain. The packed QKV should also provide some perf
gain in the context of certain models, like Llama2, that would benefit
from running ops on the fused QKV matrix, rather than the separate Q, K,
and V.

---------

Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
2024-01-23 16:34:26 -08:00
github Replace T4 to A10 in Linux GPU workflow (#19205) 2024-01-23 10:49:24 -08:00
__init__.py
amd_hipify.py undo hipify of __half to rocblas_half (#18573) 2023-11-24 18:04:23 +08:00
build.py GQA Rotary and Packed QKV with Flash (#18906) 2024-01-23 16:34:26 -08:00
clean_docker_image_cache.py
compile_triton.py [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) 2023-07-21 12:53:41 -07:00
coverage.py
gen_def.py [TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617) 2023-10-06 14:12:20 -07:00
get_docker_image.py [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) 2023-07-21 12:53:41 -07:00
logger.py
op_registration_utils.py [CI] Removes type2 in process_registration and fix Windows GPU Reduced Ops CI Pipeline (#16530) 2023-07-07 18:21:06 +02:00
op_registration_validator.py [CI] Removes type2 in process_registration and fix Windows GPU Reduced Ops CI Pipeline (#16530) 2023-07-07 18:21:06 +02:00
patch_manylinux.py [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) 2023-07-21 12:53:41 -07:00
policheck_exclusions.xml
reduce_op_kernels.py Re-organize the transpose optimization and layout transformation files. (#16246) 2023-07-07 08:24:47 +10:00
replace_urls_in_deps.py Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
requirements-transformers-test.txt GQA Rotary and Packed QKV with Flash (#18906) 2024-01-23 16:34:26 -08:00
set-trigger-rules.py PR triggers generated by code (#17247) 2023-08-30 05:57:03 +08:00
update_tsaoptions.py
upload_python_package_to_azure_storage.py [Linter] Bump ruff and remove pylint (#17797) 2023-10-05 21:07:33 -07:00