Mirror of https://github.com/saymrwulf/onnxruntime.git (synced 2026-06-29 03:30:52 +00:00)
### Description

1. Add a config key in run_options to control CUDA graph at runtime.
2. Enhance the CUDA graph class to support saving and retrieving multiple graphs in one ORT session.
3. Provide a model modification/inference example on Phi2.
4. Benchmarks show an average 13% latency reduction in token generation.

Limitation: the TRT EP and ROCM EP have not adopted this feature yet. This can be revisited in the future.

| File | | |
|---|---|---|
| environment.h | ||
| experimental_onnxruntime_cxx_api.h | ||
| experimental_onnxruntime_cxx_inline.h | ||
| onnxruntime_c_api.h | ||
| onnxruntime_cxx_api.h | ||
| onnxruntime_cxx_inline.h | ||
| onnxruntime_float16.h | ||
| onnxruntime_lite_custom_op.h | ||
| onnxruntime_run_options_config_keys.h | ||
| onnxruntime_session_options_config_keys.h | ||
| snippets.dox | ||
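
The commit description above mentions a run-options key that controls CUDA graph use per run, and a graph class that can save and retrieve several graphs within one session. The following is a minimal, self-contained Python sketch of that idea only; it does not call ONNX Runtime. `GraphCache`, `build`, and `SKIP_CAPTURE` are hypothetical names, and the key string `"gpu_graph_id"` and the `-1` "skip capture" convention are assumptions that the description does not state.

```python
from typing import Callable, Dict, List

# Assumed key name and sentinel; neither is given in the commit description.
GRAPH_ID_KEY = "gpu_graph_id"
SKIP_CAPTURE = -1

Graph = Callable[[List[float]], List[float]]


class GraphCache:
    """Toy stand-in for a graph manager that holds several captured graphs."""

    def __init__(self) -> None:
        self._graphs: Dict[int, Graph] = {}
        self.captures = 0  # number of times a new graph was captured
        self.replays = 0   # number of times a stored graph was reused

    def run(self, run_config: Dict[str, str],
            build: Callable[[], Graph],
            inputs: List[float]) -> List[float]:
        # The per-run config selects which stored graph to use.
        graph_id = int(run_config.get(GRAPH_ID_KEY, "0"))
        if graph_id == SKIP_CAPTURE:
            # Run eagerly without storing anything.
            return build()(inputs)
        if graph_id not in self._graphs:
            # First run with this id: "capture" and store the graph.
            self._graphs[graph_id] = build()
            self.captures += 1
        else:
            self.replays += 1
        return self._graphs[graph_id](inputs)


def make_doubler() -> Graph:
    # Hypothetical workload standing in for a captured GPU graph.
    return lambda xs: [2 * x for x in xs]


cache = GraphCache()
first = cache.run({GRAPH_ID_KEY: "1"}, make_doubler, [1.0, 2.0])   # captures id 1
second = cache.run({GRAPH_ID_KEY: "1"}, make_doubler, [3.0])       # replays id 1
third = cache.run({GRAPH_ID_KEY: "2"}, make_doubler, [4.0])        # captures id 2
eager = cache.run({GRAPH_ID_KEY: "-1"}, make_doubler, [5.0])       # no capture
```

Keying the cache on a per-run id is what lets one session hold a separate graph for each distinct input shape or generation phase, instead of being limited to a single captured graph.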