onnxruntime/tools
Tianlei Wu 95c4fc6877
[CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814)
* Add TensorRT fused attention fp16 kernels
* drop sm 72; seq 512 for sm75; and head_size 32 kernels
* Add env variable ORT_DISABLE_FUSED_ATTENTION
* exclude files in hipify
* update AttentionPastState_dynamic test threshold
* fix --use_mask_index in benchmark
2022-09-13 15:16:12 -07:00
android_custom_build Replace references to onnxruntime 'master' with 'main' in Dockerfiles. (#12550) 2022-08-16 14:13:05 -07:00
ci_build [CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814) 2022-09-13 15:16:12 -07:00
doc Format all python files under onnxruntime with black and isort (#11324) 2022-04-26 09:35:16 -07:00
nuget Refactor python packaging pipeline and nuget packaging pipeline (#12945) 2022-09-13 14:50:31 -07:00
perf_view
python Add --output_dir option to convert_onnx_models_to_ort.py. (#12844) 2022-09-12 15:36:03 -07:00