onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-28 22:56:32 +00:00

Hector Li 401d16c671 Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853)	2024-12-06 11:36:52 -08:00

### Description

Enable the QNN HTP spill fill buffer setting to reduce RAM usage. This feature is available after QNN 2.28. The QNN context binary must be regenerated.
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api

Requirements:
1. Regenerate the ONNX model with the QNN context binary by setting the EP option enable_htp_spill_fill_buffer = 1.
2. Works for a model with multiple context binaries. You need to manually merge the 2 ONNX models with context binaries into 1 ONNX model.
3. Generating the context binary offline requires a Linux platform, because the QnnSystem lib is not available for the Windows x86_64 platform.

No extra steps are needed when running model inference. The generated EPContext node will have a max_size attribute holding the maximum spill fill buffer size for the context binary.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
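The first requirement in the commit message could be exercised roughly as in the sketch below. Only the option name `enable_htp_spill_fill_buffer` and its value `1` come from the commit message. The helper names, the `backend_path` default, and the `ep.context_enable` session config entry (used here to ask ONNX Runtime to dump an EPContext model) are assumptions for illustration and should be checked against the QNN execution provider documentation for the ONNX Runtime version in use.

```python
from typing import Dict, Tuple


def build_qnn_spill_fill_config(
    backend_path: str = "libQnnHtp.so",
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (provider_options, session_config) for context-binary generation.

    Hypothetical helper: it only assembles option dictionaries.
    """
    provider_options = {
        # Assumption: path to the QNN HTP backend library on the build host.
        "backend_path": backend_path,
        # From the commit message: enables the HTP spill fill buffer.
        "enable_htp_spill_fill_buffer": "1",
    }
    session_config = {
        # Assumption: requests generation of the EPContext (context binary) model.
        "ep.context_enable": "1",
    }
    return provider_options, session_config


def generate_context_model(model_path: str, backend_path: str = "libQnnHtp.so"):
    """Create a session so that the context binary model is regenerated.

    Requires an onnxruntime build that includes the QNN execution provider.
    """
    import onnxruntime as ort  # imported lazily so the helper above works without it

    provider_options, session_config = build_qnn_spill_fill_config(backend_path)
    session_options = ort.SessionOptions()
    for key, value in session_config.items():
        session_options.add_session_config_entry(key, value)
    return ort.InferenceSession(
        model_path,
        sess_options=session_options,
        providers=["QNNExecutionProvider"],
        provider_options=[provider_options],
    )
```

Calling `generate_context_model("model.onnx")` on a Linux host with the QNN SDK installed would be the regeneration step; per the commit message, nothing extra is needed when the resulting model is later loaded for inference.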
..
common	Remove nsync (#20413)	2024-10-21 15:32:14 -07:00
eager	Fix typos - 1st Wave (#21278)	2024-07-11 13:35:08 +08:00
framework	Revert "enable serialize prepacked weights into data file (#22256)" (#22788)	2024-11-11 09:59:05 -08:00
graph	Revert "enable serialize prepacked weights into data file (#22256)" (#22788)	2024-11-11 09:59:05 -08:00
optimizer	Utilize ext data location to reduce qd matmul memory usage (#21451)	2024-07-30 15:22:46 -07:00
platform	Remove nsync (#20413)	2024-10-21 15:32:14 -07:00
providers	[CoreML] Create EP by AppendExecutionProvider (#22675)	2024-11-27 09:26:31 +08:00
session	Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853)	2024-12-06 11:36:52 -08:00