onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-08 17:17:15 +00:00

Adrian Lizarraga dea425e7c1 [QNN/CPU EP] Add 16-bit Quantize/Dequantize contrib ops (#17015)

### Description

- Adds 16-bit integer support to:
  - Quantization kernel implementations: Intel, Neon, and Power intrinsics
  - DequantizeLinear and QuantizeLinear contrib ops
  - QNN EP Quantize and Dequantize operators
  - Python quantization scripts
- Disables QDQ fusions for most 16-bit QDQ node groups (need to add 16-bit support to QLinear* ops)
- Retains support for dropping QDQ nodes from Split, Gather, Reshape, Transpose, Squeeze, and Unsqueeze node groups.

Sample python code to generate QDQ model with 16-bit activations and 8-bit weights:

```python
quantize_static(
    input_model_path,
    output_model_path,
    data_reader,
    quant_format=args.quant_format,
    per_channel=args.per_channel,
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QUInt8,
    extra_options={"DedicatedQDQPair": True, "ForceQuantizeNoInputCheck": True, "UseQDQContribOps": True},
)
```

Note that enabling the `UseQDQContribOps` extra option is not strictly necessary. If the 16bit types are used without enabling `UseQDQContribOps`, the QDQ ops domains are overridden to 'com.microsoft', and a warning is printed to stdout.

### Automated Tests

MLAS/CPU EP:
- [x] 16-bit QuantizeLinear computation
- [x] 16-bit DequantizeLinear computation

Optimizer:
- [x] Transpose QDQ fusion
- [x] Gather QDQ fusion
- [x] Reshape QDQ fusion
- [x] Squeeze QDQ fusion
- [x] Unsqueeze QDQ fusion
- [x] Split drop QDQ
- [x] DoubleQDQPairRemover
- [x] Transpose optimization
- [x] EnsureUniqueDQForNodeUnit
- [x] Common subexpression elimination (DQ not removed)
- [x] Constant folding

QNN EP:
- [x] Conv 16-bit activations, 8-bit weights
- [x] MatMul 16-bit activations, 8-bit weights
- [x] Unary 16-bit QDQ ops
- [x] Binary 16-bit QDQ ops

Quantization tool:
- [x] Test creation of 16-bit QDQ model

### Motivation and Context

Support mixed precision (8bit weights, 16bit activations) models.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

2023-09-18 09:43:34 -07:00
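
For readers unfamiliar with what the 16-bit QuantizeLinear and DequantizeLinear computations involve, the following is a minimal pure-Python sketch of the arithmetic, assuming the standard ONNX definition (divide by the scale, round half to even, add the zero point, saturate to the integer range). The function names `quantize_linear_u16` and `dequantize_linear_u16` are hypothetical and chosen for illustration; this is not the MLAS kernel code added by the commit.

```python
def quantize_linear_u16(values, scale, zero_point):
    """Quantize floats to unsigned 16-bit integers (illustrative sketch only)."""
    qmin, qmax = 0, 65535  # representable range of uint16
    quantized = []
    for x in values:
        # Python's round() rounds halves to the nearest even integer,
        # which matches the rounding mode in the ONNX operator definition.
        q = round(x / scale) + zero_point
        # Saturate instead of wrapping around on overflow.
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


def dequantize_linear_u16(quantized, scale, zero_point):
    """Map uint16 values back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    q = quantize_linear_u16([0.0, 1.0, -1.0, 1e6], scale=0.5, zero_point=32768)
    print(q)
    print(dequantize_linear_u16(q, scale=0.5, zero_point=32768))
```

With a zero point in the middle of the range, as in the usage above, the unsigned 16-bit type can represent both positive and negative activations; values outside the representable range are clamped rather than wrapped.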
..
c_cxx	Remove extraneous javascript includes (#17558)	2023-09-14 20:43:24 -07:00
execution_providers/images
images
python	Openvino ep ort 5.1 (#17042)	2023-08-09 11:50:10 -07:00
ABI_Dev_Notes.md	skip windows GPU check if changes only in doc (#13248)	2022-10-11 13:51:44 +08:00
Android_testing.md
C_API_Guidelines.md
cmake_guideline.md	fix some typo in docs (#13212)	2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md	[docs] Specify Objective-C max line length. (#16503)	2023-06-28 16:58:23 -07:00
ContribOperators.md	[QNN/CPU EP] Add 16-bit Quantize/Dequantize contrib ops (#17015)	2023-09-18 09:43:34 -07:00
FAQ.md
How_To_Update_ONNX_Dev_Notes.md	Remove exclusions for ONNX model tests that now pass. (#14337)	2023-01-24 08:04:27 +10:00
Memory_Optimizer.md	Add guidelines for ORTModule (#13553)	2022-11-04 19:42:10 +08:00
Model_Test.md
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Remove the extensions submodule (#17097)	2023-08-14 10:16:33 -07:00
OperatorKernels.md	[QNN/CPU EP] Add 16-bit Quantize/Dequantize contrib ops (#17015)	2023-09-18 09:43:34 -07:00
ORT_Format_Update_in_1.13.md	Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413)	2023-01-25 08:23:12 -08:00
ORT_Use_Trtion_Kernel.md	[ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196)	2023-07-11 13:55:30 +08:00
ORTMobilePackageOperatorTypeSupport.md
ORTModule_Convergence_Notes.md	Introduce ZeROOffloadSubscriber for ORTModule (#17006)	2023-08-25 00:15:22 +08:00
ORTModule_ModuleWithLoss_Wrapper.md	add steps to write modulewithloss wrapper (#16486)	2023-07-11 09:07:35 +08:00
ORTModule_Training_Guidelines.md	Use full qualified name for PythonOp export (#17021)	2023-08-09 10:58:33 +08:00
PR_Guidelines.md
Privacy.md
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md	Fix: update hyperlinks to the Jupyter notebooks (#16145)	2023-08-21 09:53:05 -07:00
Versioning.md
WinML_principles.md