onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-25 22:26:24 +00:00

Ye Wang 72ce4de07d cuda graph enhancement (#19636)	2024-03-07 10:15:18 -08:00

### Description

1. Add a config key in run_options to control CUDA graph at runtime.
2. Enhance the CUDA graph class to support saving and retrieving multiple graphs in one ORT session.
3. Provide a model modification/inference example on Phi2.
4. Benchmarks show an average 13% latency reduction in token generation.

Limitation: the TRT EP and ROCM EP have not applied this feature yet. This can be revisited in the future.
..
common	ORT ETW dynamic logging that improves ORT diagnosability & performance (#18882)	2024-01-11 12:43:27 -08:00
eager	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
framework	cuda graph enhancement (#19636)	2024-03-07 10:15:18 -08:00
graph	Add SpaceToDepth and DepthToSpace CUDA NHWC Ops (#19646)	2024-03-06 12:35:55 -08:00
optimizer	fix compilation error in no absl build (#15769)	2023-05-02 08:20:49 -07:00
platform	Bump linter versions (#18341)	2023-11-08 13:04:40 -08:00
providers	Don't reduce warning level for CUDA build on Windows (#19663)	2024-03-06 15:03:55 +10:00
session	cuda graph enhancement (#19636)	2024-03-07 10:15:18 -08:00