onnxruntime/docs
aciddelgado 94c69f55d4
GQA 4 CPU (#20299)
### Description
Support the GQA (GroupQueryAttention) operator on CPU with FP32.

### Motivation and Context
Right now, models generated for CPU and GPU must be different. Supporting
GQA on CPU allows the same model to be used for both.
2024-04-22 19:57:05 -07:00
c_cxx
execution_providers/images
images API Documentation (#8948) 2021-09-09 22:04:51 -07:00
python Bump ruff to 0.3.2 and black to 24 (#19878) 2024-03-13 10:00:32 -07:00
ABI_Dev_Notes.md
Android_testing.md Removed BUILD.md from master as source now lives in gh-pages (#6709) 2021-02-19 11:34:21 -08:00
C_API_Guidelines.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
cmake_guideline.md
Coding_Conventions_and_Standards.md
ContribOperators.md GQA 4 CPU (#20299) 2024-04-22 19:57:05 -07:00
FAQ.md
How_To_Update_ONNX_Dev_Notes.md Integration with ONNX 1.16.0 (#19745) 2024-04-12 09:46:49 -07:00
Memory_Optimizer.md Prompt layer-wise recompute when applicable (#20126) 2024-04-10 11:50:28 +08:00
Model_Test.md Renaming MKL-DNN as DNNL (#2515) 2019-12-03 07:34:23 -08:00
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md
OperatorKernels.md GQA 4 CPU (#20299) 2024-04-22 19:57:05 -07:00
ORT_Format_Update_in_1.13.md
ORT_Use_Trtion_Kernel.md
ORTMobilePackageOperatorTypeSupport.md
ORTModule_Convergence_Notes.md Fix and enable few ORTModule Unit Tests (#19847) 2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md
ORTModule_PythonOp_Notes.md
ORTModule_Training_Guidelines.md Introducing ORTPipelineModule - DeepSpeed Parallel Pipeline Support. (#20287) 2024-04-18 11:30:15 -07:00
PR_Guidelines.md
Privacy.md
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md
Versioning.md
WinML_principles.md