onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-22 22:01:08 +00:00

Chi Lo 4e3cff60fd CUDA graph support for TRT EP (#16081)	2023-06-21 09:36:45 -07:00

The CUDA EP already supports [CUDA graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs), and we observed that some models benefit from using CUDA graph with `trtexec`. This PR therefore enables CUDA graph support for the TRT EP. The implementation is based on https://github.com/microsoft/onnxruntime/pull/9978, with the same [constraints](https://github.com/microsoft/onnxruntime/pull/9978):

- Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported.
- Usage of CUDA Graphs is limited to models in which all the model ops (graph nodes) can be partitioned to the TRT EP.
- The input/output types of models need to be tensors.
- Shapes of inputs/outputs cannot change across inference calls.
- IOBinding is required.

cuda_ops.cu
custom_op_utils.cc	Create a new C API KernelContext_GetAllocator() for Custom Op scenario (#15591)	2023-04-23 21:54:35 -07:00
custom_op_utils.h	Single-schema-multi-kernel (#15184)	2023-04-27 13:39:59 -07:00
fns_candy_style_transfer.c
onnx_protobuf.h
test_allocator.cc	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
test_fixture.h	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
test_inference.cc	CUDA graph support for TRT EP (#16081)	2023-06-21 09:36:45 -07:00
test_io_types.cc	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
test_model_loading.cc	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
test_nontensor_types.cc	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
test_ort_format_models.cc	Changes to support standalone custom ops in a minimal build. (#14497)	2023-03-01 11:22:54 +10:00
test_run_options.cc
test_session_options.cc	[CUDA] Update fused MHA to support flash attention and causal mask (#13953)	2022-12-31 10:33:54 -08:00
utils.cc
utils.h