onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-29 03:30:52 +00:00


Ye Wang 72ce4de07d cuda graph enhancement (#19636) 2024-03-07 10:15:18 -08:00

Description:
1. Add a config key in run_options to control CUDA graph at runtime.
2. Enhance the CUDA graph class to support saving and retrieving multiple graphs in one ORT session.
3. Provide a model modification/inference example on Phi2.
4. Benchmarks show an average 13% latency reduction in token generation.

Limitation: the TRT EP and ROCm EP have not applied this feature yet; this can be revisited in the future.
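Items 1 and 2 can be pictured as a cache of captured graphs keyed by an id that each run supplies through its run options. The following is a minimal C++ sketch under stated assumptions, not ONNX Runtime's implementation: `RunConfig`, `CapturedGraph`, `GraphCache`, `DemoSequence` and the key name `graph_id` are all hypothetical, and treating 0 as the default id and -1 as "skip graphs for this run" are assumptions. The sketch only records capture and replay decisions and makes no CUDA calls.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-run options: a plain string key/value map standing in for
// the run_options config entries described in the commit message.
using RunConfig = std::map<std::string, std::string>;

// Placeholder key name (an assumption for this sketch, not necessarily the key
// defined in onnxruntime_run_options_config_keys.h).
const char* const kHypotheticalGraphIdKey = "graph_id";

// Hypothetical stand-in for a captured CUDA graph. A real implementation would
// own CUDA graph handles; here we only remember which id it was captured for.
struct CapturedGraph {
  int id;
};

// Hypothetical cache that keeps several captured graphs in one session,
// keyed by the integer id read from the per-run config.
class GraphCache {
 public:
  // Returns "disabled", "captured" or "replayed" to describe what one run did.
  std::string Run(const RunConfig& cfg) {
    int id = 0;  // assumption: id 0 is used when the key is absent
    auto it = cfg.find(kHypotheticalGraphIdKey);
    if (it != cfg.end()) id = std::stoi(it->second);
    if (id == -1) return "disabled";  // assumption: -1 skips graph use for this run
    if (graphs_.find(id) == graphs_.end()) {
      graphs_.emplace(id, CapturedGraph{id});  // first run with this id: capture
      return "captured";
    }
    return "replayed";  // later runs with the same id: replay the saved graph
  }

  // Number of distinct graphs currently held by the cache.
  std::size_t Size() const { return graphs_.size(); }

 private:
  std::unordered_map<int, CapturedGraph> graphs_;
};

// Runs a short sequence of requests and records what each one did.
std::vector<std::string> DemoSequence(GraphCache& cache) {
  std::vector<std::string> log;
  log.push_back(cache.Run({{kHypotheticalGraphIdKey, "1"}}));
  log.push_back(cache.Run({{kHypotheticalGraphIdKey, "1"}}));
  log.push_back(cache.Run({{kHypotheticalGraphIdKey, "2"}}));
  log.push_back(cache.Run({{kHypotheticalGraphIdKey, "-1"}}));
  return log;
}
```

On a fresh cache, `DemoSequence` yields captured, replayed, captured, disabled and leaves two saved graphs. A keyed map is one plausible design because it lets a caller switch between several saved graphs within one session instead of recapturing a single graph whenever the workload changes.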
environment.h	ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833)	2023-06-19 17:44:45 -07:00
experimental_onnxruntime_cxx_api.h
experimental_onnxruntime_cxx_inline.h
onnxruntime_c_api.h	[VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058)	2024-01-31 21:08:26 -08:00
onnxruntime_cxx_api.h	[VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058)	2024-01-31 21:08:26 -08:00
onnxruntime_cxx_inline.h	[VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058)	2024-01-31 21:08:26 -08:00
onnxruntime_float16.h	Work on eliminating Internal Compiler Error (#16741)	2023-07-18 10:17:52 -07:00
onnxruntime_lite_custom_op.h	Move up members in Lite Custom Op hierarchy for possible memleaks. (#18478)	2023-11-18 15:00:54 -08:00
onnxruntime_run_options_config_keys.h	cuda graph enhancement (#19636)	2024-03-07 10:15:18 -08:00
onnxruntime_session_options_config_keys.h	[aarch64] Add Sbgemm kernel to accelerate fp32 tensor matmul with bfloat16 (#17031)	2024-01-22 14:43:06 -08:00
snippets.dox