onnxruntime/include/onnxruntime/core
Hector Li 401d16c671
Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853)
### Description
Enable the QNN HTP spill fill buffer setting to reduce RAM usage.
This feature is available after QNN 2.28 and requires re-generating the
QNN context binary.

https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api

Requirements:
1. Re-generate the ONNX model with the QNN context binary by setting the
EP option enable_htp_spill_fill_buffer = 1.
2. Works for a model with multiple context binaries. The 2 ONNX models
with context binaries must be merged manually into 1 ONNX model.
3. Generating the context binary offline requires a Linux platform, since
the QnnSystem lib is not available for the Windows x86_64 platform.

No extra steps are needed when running model inference.
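
As a rough illustration of requirement 1, the sketch below shows one way the context binary might be re-generated from Python with the option turned on. Only `enable_htp_spill_fill_buffer` comes from this commit message; the `backend_path` value, the `ep.context_enable` / `ep.context_file_path` session config keys, and both helper function names are assumptions for illustration and should be checked against the QNN EP documentation for your ONNX Runtime version.

```python
def build_qnn_context_gen_config(output_path, backend_path="libQnnHtp.so"):
    """Return (provider_options, session_config) for context-binary generation.

    The helper name and the default backend_path are hypothetical; only
    enable_htp_spill_fill_buffer is taken from the commit message.
    """
    provider_options = {
        "backend_path": backend_path,  # assumed HTP backend library name
        "enable_htp_spill_fill_buffer": "1",  # option named in the commit message
    }
    session_config = {
        "ep.context_enable": "1",  # assumed: ask ORT to dump an EPContext model
        "ep.context_file_path": output_path,  # assumed: where to write that model
    }
    return provider_options, session_config


def generate_context_model(model_path, output_path, backend_path="libQnnHtp.so"):
    """Create a session once so the QNN EP compiles and writes the context model."""
    import onnxruntime as ort  # imported lazily; needs a QNN-enabled build

    provider_options, session_config = build_qnn_context_gen_config(
        output_path, backend_path
    )
    so = ort.SessionOptions()
    for key, value in session_config.items():
        so.add_session_config_entry(key, value)
    # Session creation triggers compilation; no inference run is needed here.
    ort.InferenceSession(
        model_path,
        sess_options=so,
        providers=["QNNExecutionProvider"],
        provider_options=[provider_options],
    )
```

Calling `generate_context_model("model.onnx", "model_ctx.onnx")` with placeholder file names would then produce the context model once; the generated model is loaded for inference as usual, without the extra option.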

The generated EPContext node has a max_size attribute that holds the
maximum spill fill buffer size for the context binary.
<img width="353" alt="image"
src="https://github.com/user-attachments/assets/a3bf48be-a8da-4381-8a1d-3f2558eea37d">
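
To check the attribute described above on a generated model, a small helper along these lines could be used. It is a minimal sketch that assumes `max_size` is stored as an integer attribute (read through `attr.i`) and that the graph object follows the `onnx` protobuf layout; the function name is hypothetical.

```python
def epcontext_max_sizes(graph):
    """Map each EPContext node name to its max_size attribute value.

    `graph` is expected to look like an onnx GraphProto: it has a `node`
    sequence whose items expose `op_type`, `name`, and `attribute`.
    """
    sizes = {}
    for node in graph.node:
        if node.op_type != "EPContext":
            continue
        for attr in node.attribute:
            if attr.name == "max_size":
                sizes[node.name] = attr.i  # assumed to be an INT attribute
    return sizes
```

With the `onnx` package installed, `epcontext_max_sizes(onnx.load("model_ctx.onnx").graph)` would list the value per EPContext node, where `model_ctx.onnx` is a placeholder file name.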

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-12-06 11:36:52 -08:00
common Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
eager Fix typos - 1st Wave (#21278) 2024-07-11 13:35:08 +08:00
framework Revert "enable serialize prepacked weights into data file (#22256)" (#22788) 2024-11-11 09:59:05 -08:00
graph Revert "enable serialize prepacked weights into data file (#22256)" (#22788) 2024-11-11 09:59:05 -08:00
optimizer Utilize ext data location to reduce qd matmul memory usage (#21451) 2024-07-30 15:22:46 -07:00
platform Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
providers [CoreML] Create EP by AppendExecutionProvider (#22675) 2024-11-27 09:26:31 +08:00
session Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853) 2024-12-06 11:36:52 -08:00