onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

History

Adrian Lizarraga 84d48b6ad6

[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP (#22436)

### Description
Adds QNN provider option `offload_graph_io_quantization` to offload graph input quantization and graph output dequantization to the CPU EP. The option is disabled by default to maintain current behavior.

### Motivation and Context
Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.

2024-10-16 15:00:53 -07:00
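As a rough illustration of how the `offload_graph_io_quantization` option might be set from the Python API, the sketch below builds a provider list for `onnxruntime.InferenceSession`. Several details are assumptions rather than facts from this commit: the option is taken to accept the strings `"1"` (enabled) and `"0"` (disabled), the backend library is assumed to be `QnnHtp.dll`, and the helper names `build_qnn_providers` and `create_session` are hypothetical.

```python
from typing import Dict, List, Tuple

# A provider entry in the (name, options) form accepted by InferenceSession.
ProviderSpec = Tuple[str, Dict[str, str]]


def build_qnn_providers(
    offload_io_quantization: bool,
    backend_path: str = "QnnHtp.dll",  # assumption: HTP backend on Windows
) -> List[ProviderSpec]:
    """Build a provider list with the QNN EP first and the CPU EP as fallback."""
    qnn_options = {
        "backend_path": backend_path,
        # Assumption: "1" enables the option and "0" keeps the default (disabled).
        "offload_graph_io_quantization": "1" if offload_io_quantization else "0",
    }
    # The CPU EP is listed so it can run the offloaded graph input
    # quantization and graph output dequantization.
    return [("QNNExecutionProvider", qnn_options), ("CPUExecutionProvider", {})]


def create_session(model_path: str, offload_io_quantization: bool = True):
    """Create an inference session; requires an onnxruntime build with the QNN EP."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    return ort.InferenceSession(
        model_path,
        providers=build_qnn_providers(offload_io_quantization),
    )
```

Calling `create_session("model.qdq.onnx")` (a placeholder model path) would then request the offload. Whether it reduces latency for a given model is best checked by timing runs with the option on and off.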
onnxruntime/core	[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP (#22436)	2024-10-16 15:00:53 -07:00
