onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

Tianlei Wu d79e3c5791 Extend Attention Bias Broadcast Support (#21710)

### Description

Previously, MultiHeadAttention supported relative position bias of shape [1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention supported [1, N, S, T]. This change extends the support to allow [1, N, S, T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for CUDA and CPU EPs.

- [x] Rename the input of "relative position bias" to "attention bias" because it can also be used for other types of bias, like ALiBi (Attention with Linear Biases) or attention mask.
- [x] Update unfused kernel to support broadcasting 2nd dimension of attention bias.
- [x] Update efficient attention to support broadcasting 2nd dimension of attention bias.
- [x] Update operators (MultiHeadAttention, DecoderMaskedMultiHeadAttention, Attention, PackedAttention, PackedMultiHeadAttention) to support broadcast attention bias on CUDA and CPU EPs.
- [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that those EPs do not support broadcasting attention_bias for now.)
- [x] Add attention bias tests for MultiHeadAttention.
- [x] Update operator documents
- [x] Update benchmark script

Other changes:
* Fix some checks in multihead-attention.ts
* Add helper functions to dump tensors given dimensions.

2024-08-16 15:40:04 -07:00
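The four attention bias shapes listed in the commit description follow ordinary broadcasting over the first two dimensions. The sketch below illustrates that idea in NumPy only; it is not the ONNX Runtime kernel code. The helper name `add_attention_bias` is hypothetical, and reading B, N, S, T as batch size, number of heads, query sequence length, and key sequence length is an assumption made for this example.

```python
import numpy as np


def add_attention_bias(scores, bias):
    """Add a broadcastable bias to attention scores (illustrative helper, not an ORT API).

    scores: array of shape (B, N, S, T)
    bias:   array of shape (B or 1, N or 1, S, T)
    """
    B, N, S, T = scores.shape
    if bias.ndim != 4 or bias.shape[2:] != (S, T):
        raise ValueError(f"bias must have shape (*, *, {S}, {T}), got {bias.shape}")
    if bias.shape[0] not in (1, B) or bias.shape[1] not in (1, N):
        raise ValueError(f"bias dims 0 and 1 must be 1 or ({B}, {N}), got {bias.shape[:2]}")
    # NumPy broadcasting expands any size-1 leading dimension to B or N.
    return scores + bias


# Assumed example sizes: batch, heads, query length, key length.
B, N, S, T = 2, 4, 3, 5
scores = np.zeros((B, N, S, T), dtype=np.float32)

# The four bias shapes named in the commit description.
for shape in [(1, N, S, T), (B, N, S, T), (B, 1, S, T), (1, 1, S, T)]:
    bias = np.ones(shape, dtype=np.float32)
    out = add_attention_bias(scores, bias)
    assert out.shape == (B, N, S, T)
```

A bias with a size-1 second dimension is shared across all heads, and a size-1 first dimension is shared across the batch, which is why a single [1, 1, S, T] tensor can serve as a common mask-like bias.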
c_cxx	Remove extraneous javascript includes (#17558)	2023-09-14 20:43:24 -07:00
execution_providers/images
images
python	[Fix] Python API doc generation (#21717)	2024-08-14 08:48:29 +08:00
ABI_Dev_Notes.md	Fix a typo in ABI_Dev_Notes.md (#17832)	2023-10-09 07:51:34 -07:00
Android_testing.md
C_API_Guidelines.md
cmake_guideline.md
Coding_Conventions_and_Standards.md	[docs] Specify Objective-C max line length. (#16503)	2023-06-28 16:58:23 -07:00
ContribOperators.md	Extend Attention Bias Broadcast Support (#21710)	2024-08-16 15:40:04 -07:00
FAQ.md	[Technical docs] Fixed a couple of old links in `FAQ.md` (#17415)	2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md	Update Dockerfile.cuda (#21042)	2024-06-13 23:50:03 -07:00
Memory_Optimizer.md	Flash attention recompute (#20603)	2024-05-21 13:38:19 +08:00
Model_Test.md	Update docs/Model_Test.md (#11466)	2024-05-15 11:33:11 -07:00
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Remove the extensions submodule (#17097)	2023-08-14 10:16:33 -07:00
OperatorKernels.md	Extend Attention Bias Broadcast Support (#21710)	2024-08-16 15:40:04 -07:00
ORT_Format_Update_in_1.13.md
ORT_Use_Triton_Kernel.md	Rename a mispelled filename in the documentation (#21066)	2024-06-17 18:18:41 +02:00
ORTModule_Convergence_Notes.md	Fix and enable few ORTModule Unit Tests (#19847)	2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md	add steps to write modulewithloss wrapper (#16486)	2023-07-11 09:07:35 +08:00
ORTModule_PythonOp_Notes.md	Add document for PythonOp (#17888)	2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md	Adds ATen fallback for scaled_dot_product_attention (#21107)	2024-07-22 16:37:04 -07:00
PR_Guidelines.md
Privacy.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md	Fix: update hyperlinks to the Jupyter notebooks (#16145)	2023-08-21 09:53:05 -07:00
Versioning.md
WinML_principles.md