onnxruntime/include
Ye Wang 72ce4de07d
cuda graph enhancement (#19636)
### Description
<!-- Describe your changes. -->

1. add a config key in run_options to control cuda graph in runtime.
2. enhance cuda graph class to support mutiple graph saving and
retrieving in one ORT session
3. provide model modification/inference example on Phi2
4. benchmark shows an average of 13% latency reduction in token
generation.



limitation: TRT ep and ROCM ep hasn't applied this feature. we can
revisit this in the future.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-07 10:15:18 -08:00
..
onnxruntime/core cuda graph enhancement (#19636) 2024-03-07 10:15:18 -08:00