onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-05 04:17:53 +00:00

Ye Wang 5eac2c1f41 relational attention bias cuda op (#14149)	2023-01-06 17:32:58 -08:00

### Description

This CUDA op implements the compute_bias() method in T5 Attention, including the permutation.

Notes:

1. bias_table needs to be saved in column-major order. Be careful when implementing the fusion script.
2. The second input (sequence length) is placed on the CPU. (Using a Shape node's output should be good.)
3. The first dimension of the output is 1, so extra_add_qk in attention should support broadcasting.
4. compute_bias() is only used in self-attention in T5.

TODO: docs change will be applied later.

### Motivation and Context

It's part of the process of optimizing T5 attention as well as T5-based generation models.

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
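For readers unfamiliar with what compute_bias() produces, the following is a minimal pure-Python reference sketch of the T5-style relative position bias, modeled on the commonly used T5 formulation rather than on this commit's CUDA kernel. The function names, the `[bucket][head]` table layout, and the default `max_distance=128` are assumptions made for illustration; the CUDA op described above stores bias_table in column-major order, and float rounding at bucket boundaries may differ from a float32 implementation.

```python
import math


def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map a signed offset (key_pos - query_pos) to a bucket index."""
    bucket = 0
    if bidirectional:
        # Half of the buckets are for positive offsets, half for non-positive.
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        relative_position = abs(relative_position)
    else:
        # Unidirectional: only look backwards; future offsets collapse to 0.
        relative_position = max(-relative_position, 0)

    # Small offsets get one bucket each.
    max_exact = num_buckets // 2
    if relative_position < max_exact:
        return bucket + relative_position

    # Larger offsets share logarithmically sized buckets up to max_distance.
    large = max_exact + int(
        math.log(relative_position / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)


def compute_bias(bias_table, seq_len, bidirectional=True, max_distance=128):
    """Return a nested list shaped [1][num_heads][seq_len][seq_len].

    bias_table is assumed (for this sketch only) to be indexed as
    bias_table[bucket][head]. Writing into out[h][q][k] performs the
    (q, k, head) -> (head, q, k) permutation.
    """
    num_buckets = len(bias_table)
    num_heads = len(bias_table[0])
    out = [[[0.0] * seq_len for _ in range(seq_len)] for _ in range(num_heads)]
    for q in range(seq_len):
        for k in range(seq_len):
            b = relative_position_bucket(k - q, bidirectional,
                                         num_buckets, max_distance)
            for h in range(num_heads):
                out[h][q][k] = bias_table[b][h]
    # Leading dimension of 1 so the result can broadcast over the batch.
    return [out]
```

The leading dimension of 1 corresponds to note 3 in the description: a consumer that adds this bias to the attention scores has to broadcast it across the batch.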
..
contrib_ops	relational attention bias cuda op (#14149)	2023-01-06 17:32:58 -08:00
core	relational attention bias cuda op (#14149)	2023-01-06 17:32:58 -08:00
python	Add CrossAttention operator (#14146)	2023-01-06 14:27:40 -08:00
test	relational attention bias cuda op (#14149)	2023-01-06 17:32:58 -08:00
tool/etw
wasm	[wasm] fix session option setting of mem_pattern (#13858)	2022-12-07 13:15:44 -08:00
__init__.py
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings