onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00


Prathik Rao 11ad299451 Adds ATen fallback for scaled_dot_product_attention (#21107)	2024-07-22 16:37:04 -07:00

### Description

Introduces an ATen fallback for `torch.nn.functional.scaled_dot_product_attention`. This operator was introduced in torch 2.0 and has since received many updates, including a memory-efficient attention implementation for V100 machines. The current TorchScript exporter exports a subgraph for attention that does not provide the same memory savings as PyTorch's memory-efficient attention kernel. Allowing fallback to the PyTorch ATen op for attention helps mitigate memory spikes for models that use memory-efficient attention.

### Motivation and Context

Memory issues arose when integrating ONNX Runtime Training with AML Stable Diffusion.

---------

Co-authored-by: root <prathikrao@microsoft.com>
c_cxx	Remove extraneous javascript includes (#17558)	2023-09-14 20:43:24 -07:00
execution_providers/images
images
python	Fix typos according to reviewdog report. (#21335)	2024-07-22 13:37:32 -07:00
ABI_Dev_Notes.md	Fix a typo in ABI_Dev_Notes.md (#17832)	2023-10-09 07:51:34 -07:00
Android_testing.md
C_API_Guidelines.md
cmake_guideline.md
Coding_Conventions_and_Standards.md
ContribOperators.md	[CPU] SparseAttention op (#21110)	2024-07-03 21:51:57 -07:00
FAQ.md	[Technical docs] Fixed a couple of old links in `FAQ.md` (#17415)	2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md	Update Dockerfile.cuda (#21042)	2024-06-13 23:50:03 -07:00
Memory_Optimizer.md	Flash attention recompute (#20603)	2024-05-21 13:38:19 +08:00
Model_Test.md	Update docs/Model_Test.md (#11466)	2024-05-15 11:33:11 -07:00
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Remove the extensions submodule (#17097)	2023-08-14 10:16:33 -07:00
OperatorKernels.md	[CPU] SparseAttention op (#21110)	2024-07-03 21:51:57 -07:00
ORT_Format_Update_in_1.13.md
ORT_Use_Triton_Kernel.md	Rename a mispelled filename in the documentation (#21066)	2024-06-17 18:18:41 +02:00
ORTMobilePackageOperatorTypeSupport.md
ORTModule_Convergence_Notes.md	Fix and enable few ORTModule Unit Tests (#19847)	2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md
ORTModule_PythonOp_Notes.md	Add document for PythonOp (#17888)	2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md	Adds ATen fallback for scaled_dot_product_attention (#21107)	2024-07-22 16:37:04 -07:00
PR_Guidelines.md
Privacy.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md	Fix: update hyperlinks to the Jupyter notebooks (#16145)	2023-08-21 09:53:05 -07:00
Versioning.md
WinML_principles.md