onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Tianlei Wu 95c4fc6877 [CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814)	2022-09-13 15:16:12 -07:00
* Add TensorRT fused attention fp16 kernels
* drop sm 72; seq 512 for sm75; and head_size 32 kernels
* Add env variable ORT_DISABLE_FUSED_ATTENTION
* exclude files in hipify
* update AttentionPastState_dynamic test threshold
* fix --use_mask_index in benchmark
android_custom_build	Replace references to onnxruntime 'master' with 'main' in Dockerfiles. (#12550)	2022-08-16 14:13:05 -07:00
ci_build	[CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814)	2022-09-13 15:16:12 -07:00
doc	Format all python files under onnxruntime with black and isort (#11324)	2022-04-26 09:35:16 -07:00
nuget	Refactor python packaging pipeline and nuget packaging pipeline (#12945)	2022-09-13 14:50:31 -07:00
perf_view	fix json format (#11046)	2022-03-30 16:15:33 -07:00
python	Add --output_dir option to convert_onnx_models_to_ort.py. (#12844)	2022-09-12 15:36:03 -07:00