onnxruntime/onnxruntime
Hector Li 05889b33ef
Support loading from model with multiple QNN context binary (#20930)
### Description
Support loading from a model that contains multiple QNN context binaries.

### Motivation and Context
A context binary model generated by the QNN EP contains only a single QNN context.
Because of the QNN PD memory limitation, a large model (>3.5GB) has to be split into 2 smaller models, and a context binary model is then generated for each of them. Users can load the smaller context binary models, but doing so requires 2 Ort sessions. Users want to glue the split models into 1 model (with multiple EPContext nodes) so that they can use 1 Ort session to do the work.
The QNN EP has a limitation: it only supports loading from a single QNN context binary. This PR removes that limitation to unblock this user scenario.
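
The single-session scenario described above could look roughly like the sketch below from the Python API. It is an illustration only, not code from this PR: the helper names and the model file name `glued_ctx.onnx` are hypothetical, the `backend_path` value assumes the HTP backend on Windows (`libQnnHtp.so` would be the usual choice on Linux/Android), and it requires an onnxruntime build that includes the QNN EP.

```python
from typing import Any, Dict


def qnn_session_kwargs(model_path: str, backend_path: str = "QnnHtp.dll") -> Dict[str, Any]:
    """Build keyword arguments for onnxruntime.InferenceSession targeting the QNN EP.

    model_path is assumed to point at the glued model, i.e. one ONNX file
    that contains several EPContext nodes (hypothetical file name below).
    """
    return {
        "path_or_bytes": model_path,
        "providers": ["QNNExecutionProvider"],
        # Assumption: HTP backend library; adjust for the target platform.
        "provider_options": [{"backend_path": backend_path}],
    }


def load_glued_model(model_path: str):
    """Create one session for the whole glued model instead of one per split."""
    # Imported lazily so qnn_session_kwargs can be used without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(**qnn_session_kwargs(model_path))


# Example (needs a QNN-enabled onnxruntime build and a real model file):
# session = load_glued_model("glued_ctx.onnx")
# outputs = session.run(None, {"input": input_array})
```

Keeping the argument construction separate from session creation makes the configuration easy to inspect without the QNN runtime present.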

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2024-06-06 14:44:57 -07:00
..
contrib_ops webgpu quickgelu (#20939) 2024-06-06 08:21:33 -07:00
core Support loading from model with multiple QNN context binary (#20930) 2024-06-06 14:44:57 -07:00
python [Quant tool] Improve performance of int4 weight quantization (#20935) 2024-06-05 16:48:40 -07:00
test Support loading from model with multiple QNN context binary (#20930) 2024-06-06 14:44:57 -07:00
tool/etw
wasm [js/web] optimize module export and deployment (#20165) 2024-05-20 09:51:16 -07:00
__init__.py Bump up version in main from 1.18.0 to 1.19.0 (#20489) 2024-04-29 20:21:41 -07:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings