onnxruntime/include/onnxruntime/core/session
Ye Wang 72ce4de07d
cuda graph enhancement (#19636)
### Description

1. Add a config key in run_options to control CUDA graph at run time.
2. Enhance the CUDA graph class to support saving and retrieving multiple
graphs in one ORT session.
3. Provide a model modification/inference example on Phi2.
4. Benchmarks show an average 13% latency reduction in token
generation.
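
To illustrate points 1 and 2, here is a minimal, self-contained toy model of keeping several captured graphs in one session and choosing among them with a per-run id. It is only a sketch of the idea, not the ORT implementation: `GraphCache`, `CapturedGraph`, `Run`, and the rule that a negative id bypasses the cache are all hypothetical names and conventions invented for this example, and no real CUDA or ORT API is called.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a captured graph: the list of ops recorded on the first run.
struct CapturedGraph {
  std::vector<std::string> ops;
};

// Toy cache that stores one captured graph per annotation id (names are illustrative only).
class GraphCache {
 public:
  using Work = std::function<std::vector<std::string>()>;

  // Executes one "run" under the given id.
  // - Negative id (assumed convention of this sketch): run normally, never capture.
  // - First run for an id: execute the work and save what it recorded.
  // - Later runs for the same id: replay the saved graph without executing the work.
  // Returns true when the run was served by replay.
  bool Run(int annotation_id, const Work& work) {
    if (annotation_id < 0) {
      work();
      return false;
    }
    auto it = graphs_.find(annotation_id);
    if (it != graphs_.end()) {
      ++replays_;
      return true;
    }
    graphs_.emplace(annotation_id, CapturedGraph{work()});
    return false;
  }

  // Number of distinct graphs currently saved.
  std::size_t Size() const { return graphs_.size(); }

  // Number of runs that were served by replay.
  std::size_t Replays() const { return replays_; }

 private:
  std::unordered_map<int, CapturedGraph> graphs_;
  std::size_t replays_ = 0;
};
```

In this sketch, calling `Run(1, work)` twice executes `work` once and replays it once, while `Run(2, work)` captures a second, independent graph in the same cache.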



Limitation: the TRT EP and ROCM EP have not adopted this feature yet. We can
revisit this in the future.

### Motivation and Context
2024-03-07 10:15:18 -08:00
..
environment.h ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833) 2023-06-19 17:44:45 -07:00
experimental_onnxruntime_cxx_api.h
experimental_onnxruntime_cxx_inline.h
onnxruntime_c_api.h [VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058) 2024-01-31 21:08:26 -08:00
onnxruntime_cxx_api.h [VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058) 2024-01-31 21:08:26 -08:00
onnxruntime_cxx_inline.h [VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058) 2024-01-31 21:08:26 -08:00
onnxruntime_float16.h Work on eliminating Internal Compiler Error (#16741) 2023-07-18 10:17:52 -07:00
onnxruntime_lite_custom_op.h Move up members in Lite Custom Op hierarchy for possible memleaks. (#18478) 2023-11-18 15:00:54 -08:00
onnxruntime_run_options_config_keys.h cuda graph enhancement (#19636) 2024-03-07 10:15:18 -08:00
onnxruntime_session_options_config_keys.h [aarch64] Add Sbgemm kernel to accelerate fp32 tensor matmul with bfloat16 (#17031) 2024-01-22 14:43:06 -08:00
snippets.dox