onnxruntime/docs
aciddelgado 509cb54d6f
softcap gqa (#21683)
### Description
Implement softcap for gqa.

### Motivation and Context
Fixes certain models, such as Gemma-2, that need softcap to work so they
don't output NaNs.
2024-08-30 19:11:04 -07:00
c_cxx
execution_providers/images
images
python [Fix] Make python API doc generation in Microsoft-hosted Agent (#21766) 2024-08-20 23:32:38 +08:00
ABI_Dev_Notes.md
Android_testing.md
C_API_Guidelines.md
cmake_guideline.md
Coding_Conventions_and_Standards.md
ContribOperators.md softcap gqa (#21683) 2024-08-30 19:11:04 -07:00
FAQ.md
How_To_Update_ONNX_Dev_Notes.md Update Dockerfile.cuda (#21042) 2024-06-13 23:50:03 -07:00
Memory_Optimizer.md Flash attention recompute (#20603) 2024-05-21 13:38:19 +08:00
Model_Test.md Update docs/Model_Test.md (#11466) 2024-05-15 11:33:11 -07:00
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md
OperatorKernels.md [CUDA] Support CUDA EP blocked quantization in Q/DQ ops. (#21846) 2024-08-30 18:28:00 -07:00
ORT_Format_Update_in_1.13.md
ORT_Use_Triton_Kernel.md Rename a mispelled filename in the documentation (#21066) 2024-06-17 18:18:41 +02:00
ORTModule_Convergence_Notes.md Fix and enable few ORTModule Unit Tests (#19847) 2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md
ORTModule_PythonOp_Notes.md
ORTModule_Training_Guidelines.md Adds ATen fallback for scaled_dot_product_attention (#21107) 2024-07-22 16:37:04 -07:00
PR_Guidelines.md
Privacy.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md
Versioning.md
WinML_principles.md