onnxruntime/docs
aciddelgado cbb29d80ff
GQA Rotary and Packed QKV with Flash (#18906)
### Description
These changes add rotary embedding and packed qkv input to gqa. As of
now, the changes are only supported with Flash-Attention (SM >= 80) but
should soon be supported with Memory Efficient Attention as well.



### Motivation and Context
With the fusion of rotary embedding into this Attention op, we hope to
observe some perf gain. The packed QKV should also provide some perf
gain in the context of certain models, like Llama2, that would benefit
from running ops on the fused QKV matrix, rather than the separate Q, K,
and V.

---------

Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
2024-01-23 16:34:26 -08:00
..
c_cxx Remove extraneous javascript includes (#17558) 2023-09-14 20:43:24 -07:00
execution_providers/images Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225) 2021-02-05 18:09:27 -08:00
images API Documentation (#8948) 2021-09-09 22:04:51 -07:00
python Increment year to 2024 in conf.py (python documentation) (#19107) 2024-01-19 19:36:19 +01:00
ABI_Dev_Notes.md Fix a typo in ABI_Dev_Notes.md (#17832) 2023-10-09 07:51:34 -07:00
Android_testing.md Removed BUILD.md from master as source now lives in gh-pages (#6709) 2021-02-19 11:34:21 -08:00
C_API_Guidelines.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
cmake_guideline.md fix some typo in docs (#13212) 2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md [docs] Specify Objective-C max line length. (#16503) 2023-06-28 16:58:23 -07:00
ContribOperators.md GQA Rotary and Packed QKV with Flash (#18906) 2024-01-23 16:34:26 -08:00
FAQ.md [Technical docs] Fixed a couple of old links in FAQ.md (#17415) 2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md Remove exclusions for ONNX model tests that now pass. (#14337) 2023-01-24 08:04:27 +10:00
Memory_Optimizer.md Allow layer-wise recompute (#18566) 2023-12-12 08:44:05 +08:00
Model_Test.md
NotesOnThreading.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md Update docs/ONNX_Runtime_Server_Usage.md (#7818) 2021-05-26 16:17:20 -07:00
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md Remove the extensions submodule (#17097) 2023-08-14 10:16:33 -07:00
OperatorKernels.md GQA Rotary and Packed QKV with Flash (#18906) 2024-01-23 16:34:26 -08:00
ORT_Format_Update_in_1.13.md Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413) 2023-01-25 08:23:12 -08:00
ORT_Use_Trtion_Kernel.md [ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196) 2023-07-11 13:55:30 +08:00
ORTMobilePackageOperatorTypeSupport.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
ORTModule_Convergence_Notes.md Introduce ZeROOffloadSubscriber for ORTModule (#17006) 2023-08-25 00:15:22 +08:00
ORTModule_ModuleWithLoss_Wrapper.md add steps to write modulewithloss wrapper (#16486) 2023-07-11 09:07:35 +08:00
ORTModule_PythonOp_Notes.md Add document for PythonOp (#17888) 2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md ORTModule memory improvement (#18924) 2024-01-16 08:57:37 +08:00
PR_Guidelines.md
Privacy.md [C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481) 2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md Changes related to the release binaries requiring Visual C++ 2019 runtime (#3871) 2020-05-12 17:07:06 -07:00
Reduced_Operator_Kernel_build.md replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
ReleaseManagement.md Updated TPN for OpenMPI and cleanup (#3932) 2020-05-14 11:42:44 -07:00
Roadmap.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
Server.md Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172) 2020-12-18 02:00:42 -08:00
TVM_EP.md Fix: update hyperlinks to the Jupyter notebooks (#16145) 2023-08-21 09:53:05 -07:00
Versioning.md replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
WinML_principles.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00