onnxruntime/include/onnxruntime/core
Ye Wang 72ce4de07d
cuda graph enhancement (#19636)
### Description

1. Add a config key in run_options to control CUDA graph usage at runtime.
2. Enhance the CUDA graph class to support saving and retrieving multiple
graphs in one ORT session.
3. Provide a model modification/inference example on Phi2.
4. Benchmarks show an average 13% latency reduction in token
generation.
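
The idea behind item 2 can be illustrated with a small, self-contained sketch. This is not ORT's implementation: `GraphCache`, `run`, and the integer `graph_id` are hypothetical names chosen for illustration, and plain Python callables stand in for captured GPU work. It only shows the general pattern of capturing once per id and replaying on later runs with the same id.

```python
from typing import Callable, Dict, List, Sequence

Op = Callable[[], None]


class GraphCache:
    """Toy per-session store of captured graphs (hypothetical, not ORT's class)."""

    def __init__(self) -> None:
        # Maps a caller-chosen graph id to the recorded sequence of operations.
        self._graphs: Dict[int, List[Op]] = {}

    def is_captured(self, graph_id: int) -> bool:
        return graph_id in self._graphs

    def capture(self, graph_id: int, ops: Sequence[Op]) -> None:
        # Store a copy so later changes to the caller's list do not affect replay.
        self._graphs[graph_id] = list(ops)

    def replay(self, graph_id: int) -> None:
        for op in self._graphs[graph_id]:
            op()


def run(cache: GraphCache, graph_id: int, ops: Sequence[Op]) -> str:
    """Replay the stored graph for graph_id if present; otherwise run and capture."""
    if cache.is_captured(graph_id):
        cache.replay(graph_id)
        return "replayed"
    for op in ops:
        op()
    cache.capture(graph_id, ops)
    return "captured"
```

In this sketch, a caller would pass a different `graph_id` for each distinct workload it wants stored separately, so that several graphs can coexist in one cache.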



Limitation: the TRT EP and ROCm EP do not support this feature yet. We can
revisit this in the future.

### Motivation and Context
2024-03-07 10:15:18 -08:00
..
common ORT ETW dynamic logging that improves ORT diagnosability & performance (#18882) 2024-01-11 12:43:27 -08:00
eager Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
framework cuda graph enhancement (#19636) 2024-03-07 10:15:18 -08:00
graph Add SpaceToDepth and DepthToSpace CUDA NHWC Ops (#19646) 2024-03-06 12:35:55 -08:00
optimizer fix compilation error in no absl build (#15769) 2023-05-02 08:20:49 -07:00
platform Bump linter versions (#18341) 2023-11-08 13:04:40 -08:00
providers Don't reduce warning level for CUDA build on Windows (#19663) 2024-03-06 15:03:55 +10:00
session cuda graph enhancement (#19636) 2024-03-07 10:15:18 -08:00