onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-25 19:48:11 +00:00


Yufeng Li 90d1f537cb optimize SLN with large dimension (#18138) ### Description Optimize SkipLayerNorm for large dimensions (>= 2048) by having each thread handle 8 elements. This avoids writing the sum of input, skip, and bias back to main memory and then reloading it. On A100, it reduces the latency for dimension 4096 with a small batch size from ~18 us to ~3.8 us.		2023-10-30 14:12:17 -07:00
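A minimal Python sketch of the idea follows. It assumes SkipLayerNorm computes LayerNorm(input + skip + bias) with a per-element scale `gamma` and shift `beta`. This is not the CUDA kernel from the commit. The function name `skip_layer_norm_row`, the constant `ELEMS_PER_THREAD`, the `eps` default, and computing the variance as E[v^2] - mean^2 are all illustrative assumptions. Each simulated "thread" keeps its 8 summed values in a local list, which stands in for registers. After the reduction, normalization reads those local values instead of reloading the sum from main memory.

```python
import math

# Hypothetical constant mirroring "8 elements in one thread" from the commit.
ELEMS_PER_THREAD = 8


def skip_layer_norm_row(x, skip, bias, gamma, beta, eps=1e-5):
    """Sketch of SkipLayerNorm for one row (eps default is a hypothetical choice).

    Each simulated thread owns ELEMS_PER_THREAD consecutive elements and keeps
    its slice of (x + skip + bias) in a local list, so the sum is computed once
    and never written out and re-read before normalization.
    """
    n = len(x)
    if n % ELEMS_PER_THREAD != 0:
        raise ValueError("this sketch assumes the row length is a multiple of 8")
    num_threads = n // ELEMS_PER_THREAD

    # Phase 1: each thread loads its inputs once and keeps the sum locally,
    # along with partial sums of values and squared values.
    local, partial_sum, partial_sq = [], [], []
    for t in range(num_threads):
        lo = t * ELEMS_PER_THREAD
        vals = [x[i] + skip[i] + bias[i] for i in range(lo, lo + ELEMS_PER_THREAD)]
        local.append(vals)
        partial_sum.append(sum(vals))
        partial_sq.append(sum(v * v for v in vals))

    # Phase 2: row-wide reduction of the per-thread partials.
    mean = sum(partial_sum) / n
    var = max(sum(partial_sq) / n - mean * mean, 0.0)  # clamp rounding noise
    inv_std = 1.0 / math.sqrt(var + eps)

    # Phase 3: normalize directly from the locally held sums (no reload).
    out = [0.0] * n
    for t, vals in enumerate(local):
        lo = t * ELEMS_PER_THREAD
        for j, v in enumerate(vals):
            i = lo + j
            out[i] = (v - mean) * inv_std * gamma[i] + beta[i]
    return out
```

For a row of width 4096, this sketch would use 4096 / 8 = 512 simulated threads. How a real kernel maps those onto CUDA blocks and warps is not shown here.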
contrib_ops	optimize SLN with large dimension (#18138)	2023-10-30 14:12:17 -07:00
core	[DML EP] Handle non-raw data in dynamic graph compilation (#18160)	2023-10-30 13:48:34 -07:00
python	Enable global TRT timing cache (#17865)	2023-10-27 09:23:19 -07:00
test	Augment blockwise quantization (#18101)	2023-10-30 09:14:37 -07:00
tool/etw
wasm	[js/web/training] Add CreateTrainingSession (#17891)	2023-10-26 09:22:10 -07:00
__init__.py	Python API to check whether collective ops are available or not (#17730)	2023-09-29 14:11:05 -07:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings