onnxruntime/cmake
Adrian Lizarraga e066fca777
[Quantization] Tensor quant overrides and QNN EP quantization configuration (#18465)
### Description
#### 1. Adds `TensorQuantOverrides` extra option
Allows specifying a dictionary of tensor-level quantization overrides:
```
TensorQuantOverrides = dictionary :
    Default is {}. Set tensor quantization overrides. The key is a tensor name and the value is a
    list of dictionaries. For per-tensor quantization, the list contains a single dictionary. For
    per-channel quantization, the list contains a dictionary for each channel in the tensor.
    Each dictionary contains optional overrides with the following keys and values.
          'quant_type' = QuantType : The tensor's quantization data type.
          'scale' =  Float         : The scale value to use. Must also specify `zero_point` if set.
          'zero_point' = Int       : The zero-point value to use. Must also specify `scale` is set.
          'symmetric' = Bool       : If the tensor should use symmetric quantization. Invalid if also
                                     set `scale` or `zero_point`.
          'reduce_range' = Bool    : If the quantization range should be reduced. Invalid if also
                                     set `scale` or `zero_point`.
          'rmax' = Float           : Override the maximum real tensor value in calibration data.
                                     Invalid if also set `scale` or `zero_point`.
          'rmin' = Float           : Override the minimum real tensor value in calibration data.
                                     Invalid if also set `scale` or `zero_point`.
```

- All of the options are optional.
- Some combinations are invalid.
- Ex: `rmax` and `rmin` are unnecessary if the `zero_point` and `scale`
are also specified.

Example for per-tensor quantization overrides:
```Python3
extra_options = {
    "TensorQuantOverrides": {
        "SIG_OUT": [{"scale": 1.0, "zero_point": 127}],
        "WGT": [{"quant_type": quantization.QuantType.QInt8, "symmetric": True, "reduce_range": True}],
        "BIAS": [{"quant_type": quantization.QuantType.QInt8, "symmetric": True, "reduce_range": True}],
    },
}
```

Example for per-channel quantization overrides (Conv weight and bias):
```Python3
extra_options = {
    "TensorQuantOverrides": {
        "WGT": [
            {
                "quant_type": quantization.QuantType.QUInt8,
                "rmin": 0.0,
                "rmax": 2.5,
                "reduce_range": True,
            },
            {
                "quant_type": quantization.QuantType.QUInt8,
                "rmin": 0.2,
                "rmax": 2.55,
                "reduce_range": False,
            },
        ],
        "BIAS": [
            {"zero_point": 0, "scale": 0.000621},
            {"zero_point": 0, "scale": 0.23},
        ],
    },
}
```

#### 2. Adds utilities to get the default QDQ configs for QNN EP
Added a `quantization.execution_providers.qnn.get_qnn_qdq_config` method
that inspects the model and returns suitable quantization
configurations.

Example usage:
```python3
from quantization import quantize, QuantType
from quantization.execution_providers.qnn import get_qnn_qdq_config

qnn_config = get_qnn_qdq_config(input_model_path,
                                data_reader,
                                activation_type=QuantType.QUInt16,
                                weight_type=QuantType.QUInt8)
                                
quantize(input_model_path,
         output_model_path,
         qnn_config)
```

### Motivation and Context
Make it possible to create more QDQ models that run on QNN EP.

---------

Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>
2023-12-04 17:54:58 -08:00
..
external [DNNL] add Arm Compute Library (ACL) backend for dnnl execution provider (#15847) 2023-12-01 09:16:44 -08:00
patches Fix Objective-C static analysis build (#18606) 2023-11-28 17:14:20 -08:00
tensorboard
adjust_global_compile_flags.cmake Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856) 2023-09-05 18:12:10 -07:00
CMakeLists.txt Revert "remove full protobuf requirement for tensorrt ep" (#18626) 2023-11-29 22:27:51 -08:00
CMakeSettings.json
codeconv.runsettings
deps.txt [ROCm] Update ck for GemmFloat8 (#18487) 2023-11-23 12:06:19 +08:00
deps_update_and_upload.py [Linter] Bump ruff and remove pylint (#17797) 2023-10-05 21:07:33 -07:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
linux_arm64_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
nuget_helpers.cmake
onnxruntime.cmake Add MacOS build to ORT C Pod (#18550) 2023-11-28 10:11:53 -08:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00
onnxruntime_compile_triton_kernel.cmake [ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196) 2023-07-11 13:55:30 +08:00
onnxruntime_config.h.in Enabling c++ 20 in MacOS build (#16187) 2023-09-26 11:27:02 -07:00
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake Rework some external targets to ease building with -DFETCHCONTENT_FULLY_DISCONNECTED=ON (#15323) 2023-04-03 17:45:12 -07:00
onnxruntime_framework.cmake [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) 2023-07-14 10:46:52 -07:00
onnxruntime_framework.natvis [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) 2023-07-14 10:46:52 -07:00
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake Pre-link when creating static library for apple framework (#18241) 2023-11-03 23:38:29 +10:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake Update build option for training in java to enable_training_api (#15638) 2023-04-24 11:53:08 -07:00
onnxruntime_java_unittests.cmake Update build option for training in java to enable_training_api (#15638) 2023-04-24 11:53:08 -07:00
onnxruntime_kernel_explorer.cmake [ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657) 2023-07-13 11:20:26 +08:00
onnxruntime_language_interop_ops.cmake
onnxruntime_mlas.cmake MLAS AArch64 quantized int4 Gemm kernel (#18031) 2023-11-15 09:31:54 -08:00
onnxruntime_nodejs.cmake Added DML and CUDA provider support in onnxruntime-node (#16050) 2023-08-25 16:57:06 -07:00
onnxruntime_objectivec.cmake Objective C Training API: TrainingSession (#16374) 2023-06-28 09:13:56 -07:00
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake Memory optimization refactor and refinement (#17481) 2023-11-23 11:39:00 +08:00
onnxruntime_providers.cmake Add API for NPU Device Selection in the DML EP (#17612) 2023-10-11 14:53:00 -07:00
onnxruntime_providers_acl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_armnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_azure.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cann.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_coreml.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cpu.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cuda.cmake Adding unit test for sm80 prepack (#18514) 2023-11-28 10:01:09 -08:00
onnxruntime_providers_dml.cmake Add API for NPU Device Selection in the DML EP (#17612) 2023-10-11 14:53:00 -07:00
onnxruntime_providers_dnnl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_js.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_migraphx.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_nnapi.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_openvino.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_qnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_rknpu.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_rocm.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_tensorrt.cmake [TensorRT EP] Properly set CUDA_INCLUDE_DIR for onnx-tensorrt (#18274) 2023-11-03 20:04:10 -07:00
onnxruntime_providers_tvm.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_vitisai.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_webnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_xnnpack.cmake Update XNNPACK to latest version (#18038) 2023-11-03 09:04:28 -07:00
onnxruntime_pyop.cmake
onnxruntime_python.cmake [Quantization] Tensor quant overrides and QNN EP quantization configuration (#18465) 2023-12-04 17:54:58 -08:00
onnxruntime_rocm_hipify.cmake onboard MoE (#18279) 2023-11-14 16:48:51 -08:00
onnxruntime_session.cmake added support for cmake "find_package" (#8919) 2023-06-19 22:20:31 -07:00
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake Triton Codegen for ORTModule (#15831) 2023-07-13 18:17:58 +08:00
onnxruntime_unittests.cmake Adding unit test for sm80 prepack (#18514) 2023-11-28 10:01:09 -08:00
onnxruntime_util.cmake
onnxruntime_webassembly.cmake Update XNNPACK to latest version (#18038) 2023-11-03 09:04:28 -07:00
precompiled_header.cmake
Sdl.ruleset Add a Github workflow for Prefast (#15763) 2023-05-03 11:42:51 -07:00
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
wcos_rules_override.cmake
winml.cmake Update winml to use #cores - #soc cores by Default as the number of intraopthreads (#18384) 2023-11-28 09:26:48 -08:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00