onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-29 03:30:52 +00:00

Hector Li 05889b33ef Support loading from model with multiple QNN context binary (#20930)	2024-06-06 14:44:57 -07:00

### Description
Support loading from a model with multiple QNN context binaries.

### Motivation and Context
A context binary model generated by the QNN EP contains only a single QNN context. Because of the QNN PD memory limitation, a large model (>3.5GB) has to be split into 2 smaller models, and a context binary model is then generated for each. Users can load the smaller context binary models, but that requires 2 Ort sessions. Users want to glue the split models into 1 model (with multiple EPContext nodes) so that a single Ort session can do the work. The QNN EP has a limitation: it only supports loading from a single QNN context binary. This PR removes that limitation to unblock this user scenario.

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
contrib_ops	webgpu quickgelu (#20939)	2024-06-06 08:21:33 -07:00
core	Support loading from model with multiple QNN context binary (#20930)	2024-06-06 14:44:57 -07:00
python	[Quant tool] Improve performance of int4 weight quantization (#20935)	2024-06-05 16:48:40 -07:00
test	Support loading from model with multiple QNN context binary (#20930)	2024-06-06 14:44:57 -07:00
tool/etw
wasm	[js/web] optimize module export and deployment (#20165)	2024-05-20 09:51:16 -07:00
__init__.py	Bump up version in main from 1.18.0 to 1.19.0 (#20489)	2024-04-29 20:21:41 -07:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings