onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-01 03:45:06 +00:00

History

Ye Wang 5eac2c1f41 relational attention bias cuda op (#14149 ) ### Description This cuda op implements the compute_bias() method in T5 Attention including the permutation. note: 1. bias_table needs to be saved in col-major. be careful when implementing fusion script 2. second input(sequence length) is placed on cpu. (using Shape node's output should be good) 3. the first dimension of output is 1, so extra_add_qk in attention should support broadcasting 4. compute_bias() only used in self-attn in t5 TODO: docs change will be applied later ### Motivation and Context It's part of the process of optimizing t5 attention as well as t5 based generation model Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>		2023-01-06 17:32:58 -08:00
..
c_cxx	Document C/C++ API documentation version info conventions. (#10396 )	2022-01-27 10:20:13 -08:00
execution_providers/images
images	API Documentation (#8948 )	2021-09-09 22:04:51 -07:00
python	Add Google Analytics to python apidocs (#13901 )	2022-12-09 15:44:12 -08:00
ABI_Dev_Notes.md	skip windows GPU check if changes only in doc (#13248 )	2022-10-11 13:51:44 +08:00
Android_testing.md
C_API_Guidelines.md	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
cmake_guideline.md	fix some typo in docs (#13212 )	2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md	Fixed a minor typo (#13194 )	2022-10-05 12:10:14 -07:00
ContribOperators.md	relational attention bias cuda op (#14149 )	2023-01-06 17:32:58 -08:00
FAQ.md	Fix typo enviroment => environment (#13195 )	2022-10-03 17:02:26 -07:00
How_To_Update_ONNX_Dev_Notes.md	Update script to find optimizers that potentially need supported opset updates (#12330 )	2022-08-04 07:37:27 +10:00
Memory_Optimizer.md	Add guidelines for ORTModule (#13553 )	2022-11-04 19:42:10 +08:00
Model_Test.md
NotesOnThreading.md	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md	Update docs/ONNX_Runtime_Server_Usage.md (#7818 )	2021-05-26 16:17:20 -07:00
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	replace 'master' branch ref to 'main' for onnx repo (#12678 )	2022-08-30 13:41:42 -07:00
OperatorKernels.md	relational attention bias cuda op (#14149 )	2023-01-06 17:32:58 -08:00
ORT_Format_Update_in_1.13.md	Update kernel matching logic: decouple from op schemas and remove kernel def hashes (#12791 )	2022-09-20 14:24:59 -07:00
ORTMobilePackageOperatorTypeSupport.md	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
ORTModule_Training_Guidelines.md	Refactor training build options (#13964 )	2023-01-03 13:28:16 -08:00
PR_Guidelines.md
Privacy.md
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md	replace 'master' branch ref to 'main' for onnx repo (#12678 )	2022-08-30 13:41:42 -07:00
ReleaseManagement.md
Roadmap.md	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
Server.md
TVM_EP.md	[C#][TVM EP] Fix issues related to using TVM EP in C# front-end (#12958 )	2022-09-16 16:04:59 +02:00
Versioning.md	replace 'master' branch ref to 'main' for onnx repo (#12678 )	2022-08-30 13:41:42 -07:00
WinML_principles.md	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00