onnxruntime/docs
Prathik Rao 11ad299451
Adds ATen fallback for scaled_dot_product_attention (#21107)
### Description
<!-- Describe your changes. -->

Introduces an ATen fallback for
`torch.nn.functional.scaled_dot_product_attention`. This operator was
introduced in torch 2.0 and, since then, has had many updates including
the implementation of memory efficient attention for V100 machines. The
current torchscript exporter exports a subgraph for attention which does
not provide the same memory savings that PyTorch's memory efficient
attention kernel provides. Allowing fallback to PyTorch ATen op for
attention helps mitigate memory spike issues for models leveraging
memory efficient attention.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Memory issues arose when integrating ONNX Runtime Training with AML
Stable Diffusion.

---------

Co-authored-by: root <prathikrao@microsoft.com>
2024-07-22 16:37:04 -07:00
..
c_cxx Remove extraneous javascript includes (#17558) 2023-09-14 20:43:24 -07:00
execution_providers/images
images API Documentation (#8948) 2021-09-09 22:04:51 -07:00
python Fix typos according to reviewdog report. (#21335) 2024-07-22 13:37:32 -07:00
ABI_Dev_Notes.md Fix a typo in ABI_Dev_Notes.md (#17832) 2023-10-09 07:51:34 -07:00
Android_testing.md
C_API_Guidelines.md
cmake_guideline.md fix some typo in docs (#13212) 2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md [docs] Specify Objective-C max line length. (#16503) 2023-06-28 16:58:23 -07:00
ContribOperators.md [CPU] SparseAttention op (#21110) 2024-07-03 21:51:57 -07:00
FAQ.md [Technical docs] Fixed a couple of old links in FAQ.md (#17415) 2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md Update Dockerfile.cuda (#21042) 2024-06-13 23:50:03 -07:00
Memory_Optimizer.md Flash attention recompute (#20603) 2024-05-21 13:38:19 +08:00
Model_Test.md Update docs/Model_Test.md (#11466) 2024-05-15 11:33:11 -07:00
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md Remove the extensions submodule (#17097) 2023-08-14 10:16:33 -07:00
OperatorKernels.md [CPU] SparseAttention op (#21110) 2024-07-03 21:51:57 -07:00
ORT_Format_Update_in_1.13.md Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413) 2023-01-25 08:23:12 -08:00
ORT_Use_Triton_Kernel.md Rename a mispelled filename in the documentation (#21066) 2024-06-17 18:18:41 +02:00
ORTMobilePackageOperatorTypeSupport.md
ORTModule_Convergence_Notes.md Fix and enable few ORTModule Unit Tests (#19847) 2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md add steps to write modulewithloss wrapper (#16486) 2023-07-11 09:07:35 +08:00
ORTModule_PythonOp_Notes.md Add document for PythonOp (#17888) 2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md Adds ATen fallback for scaled_dot_product_attention (#21107) 2024-07-22 16:37:04 -07:00
PR_Guidelines.md
Privacy.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md Fix: update hyperlinks to the Jupyter notebooks (#16145) 2023-08-21 09:53:05 -07:00
Versioning.md
WinML_principles.md