Mirror of https://github.com/saymrwulf/onnxruntime.git, synced 2026-05-26 22:35:43 +00:00
add domain check for nodes + update documentation (#2831)
parent aa37dea598
commit 8643f3ebbb
2 changed files with 22 additions and 3 deletions
@@ -1,6 +1,6 @@
# Quantization tool Overview
This tool supports 8-bit linear quantization of an ONNX model. quantize() takes a model in ModelProto format and returns the quantized model in ModelProto format.
Today ORT does not guarantee support for end-to-end (E2E) model quantization: not all ONNX ops support 8-bit data types, so only the supported ops in the model are quantized. For the remaining ops, the inputs are converted back to FP32.

List of Supported Quantized Ops:
The following ops were chosen as phase 1 ops because in most CNN models they account for most of the compute and power, so quantizing them gives the most performance benefit.
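The overview above describes 8-bit linear quantization, with the inputs of unsupported ops converted back to FP32. A minimal sketch of that arithmetic follows; it is not the tool's implementation, and `quantize_linear`/`dequantize_linear` are hypothetical helper names chosen to mirror the formulas of ONNX's QuantizeLinear and DequantizeLinear ops (uint8 output, round half to even).

```python
from typing import List


def quantize_linear(values: List[float], scale: float, zero_point: int) -> List[int]:
    """FP32 -> uint8: q = clamp(round(x / scale) + zero_point, 0, 255).

    Python's round() rounds half to even, as ONNX QuantizeLinear does.
    """
    return [max(0, min(255, round(x / scale) + zero_point)) for x in values]


def dequantize_linear(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """uint8 -> FP32: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


# A tensor produced by a quantized op, fed to an op that only accepts FP32.
scale, zero_point = 0.5, 10
activations = [-1.0, 0.0, 0.3, 2.0, 200.0]
q = quantize_linear(activations, scale, zero_point)
fp32_inputs = dequantize_linear(q, scale, zero_point)
print(q)            # [8, 10, 11, 14, 255]
print(fp32_inputs)  # [-1.0, 0.0, 0.5, 2.0, 122.5]
```

The `0.3 -> 0.5` and `200.0 -> 122.5` entries show the two error sources of an 8-bit range: rounding and saturation.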
@@ -25,6 +25,21 @@ The following ops were chosen as phase 1 ops because in most of the CNN models t

Zero point represents zero in the quantization space. It is important that the floating-point value 0 be exactly representable in the quantization space, because many CNNs use zero padding: if 0 cannot be represented exactly after quantization, the padded values no longer equal zero, which leads to accuracy errors.
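One common way to guarantee that 0.0 is exactly representable is to widen the float range to include 0 and round the zero point to an integer. The sketch below assumes asymmetric uint8 quantization; `choose_qparams`, `quantize` and `dequantize` are hypothetical names, and this is a general scheme rather than a copy of the tool's code.

```python
from typing import List, Tuple


def choose_qparams(values: List[float], qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Pick (scale, zero_point) for asymmetric uint8 quantization.

    The float range is widened to include 0.0 and the zero point is an integer,
    so 0.0 quantizes to exactly zero_point and dequantizes back to exactly 0.0.
    """
    rmin = min(min(values), 0.0)
    rmax = max(max(values), 0.0)
    if rmax == rmin:  # all-zero input: any positive scale works
        return 1.0, qmin
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = qmin + round(-rmin / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    return max(0, min(255, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


# Activations in [-0.7, 3.1]; zero padding around them must stay exactly 0.
scale, zp = choose_qparams([-0.7, 0.4, 3.1])
assert dequantize(quantize(0.0, scale, zp), scale, zp) == 0.0
print(scale, zp)
```

Rounding the zero point to an integer is the step that matters: the code `zero_point` is then a valid quantized value whose dequantized value is `(zero_point - zero_point) * scale = 0`.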
## Quantization and model opset versions
Quantization is fairly new in ONNX and ONNX Runtime. Quantization ops were introduced in ONNX opset 10, so the model being quantized should be opset 10 or higher. If the model's opset version is below 10, it is recommended to reconvert the model to ONNX from its original framework using the latest opset.

The quantization tool displays a warning when the model's opset version is below 10, but still quantizes the model and changes its opset version to 10 at the end. It is the model owner's responsibility to run the model checker and make sure the model is valid. If it is not, use the recommended approach above, i.e. reconvert the model from its original framework.
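The opset rule above amounts to reading the version declared for the default ONNX domain and warning when it is below 10. A minimal sketch, assuming only what ONNX's `ModelProto.opset_import` provides (entries with `domain` and `version` fields, with the default domain written as `""` or `"ai.onnx"`); the helper names are hypothetical, and `SimpleNamespace` objects stand in for the protobuf entries so the example runs without the `onnx` package.

```python
import warnings
from types import SimpleNamespace
from typing import Iterable, Optional

MIN_QUANTIZATION_OPSET = 10  # QuantizeLinear/DequantizeLinear arrived in opset 10


def default_domain_opset(opset_import: Iterable) -> Optional[int]:
    """Return the opset version declared for the default ONNX domain, if any."""
    for entry in opset_import:
        if entry.domain in ("", "ai.onnx"):
            return entry.version
    return None


def check_opset_for_quantization(opset_import: Iterable) -> bool:
    """Warn (but do not fail) when the model's ONNX opset is older than 10."""
    version = default_domain_opset(opset_import)
    if version is None or version < MIN_QUANTIZATION_OPSET:
        warnings.warn(
            f"model opset is {version}; quantization ops need opset "
            f"{MIN_QUANTIZATION_OPSET} or higher. Consider reconverting the model "
            "from its original framework and validating the quantized result."
        )
        return False
    return True


# Stand-in for model.opset_import; real OperatorSetIdProto entries have the same fields.
old_model_opsets = [SimpleNamespace(domain="", version=9)]
print(check_opset_for_quantization(old_model_opsets))  # emits a UserWarning, prints False
```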
## Quantization and Graph Optimization
Note that quantization and graph optimizations may not always work together.

### Quantizing an optimized model
If a model is optimized using level 99 (i.e. all available optimizations are applied), the optimizations may transform the graph in a way that quantization can no longer be applied to it. In that case, running the quantization script leaves the model unchanged.

### Optimizing a quantized model
The same applies the other way round: after a model is quantized, some graph optimizations that would otherwise have applied to it may no longer be applicable.

Model owners should be aware of this and run performance evaluations to determine which technique gives the best performance for their model.
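Level 99 is ONNX Runtime's `ORT_ENABLE_ALL` graph optimization level. The evaluation recommended above can be as simple as timing every model variant with the same harness. The sketch below is a generic, hypothetical harness (`median_latency_ms` and `compare_variants` are not part of the tool); with ONNX Runtime, each callable could wrap `session.run(None, feeds)` on an `InferenceSession` created for one model variant, with `feeds` a fixed input dictionary.

```python
import statistics
import time
from typing import Callable, Dict


def median_latency_ms(run: Callable[[], object], warmup: int = 3, iterations: int = 20) -> float:
    """Median wall-clock time of run() in milliseconds, measured after warm-up calls."""
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare_variants(variants: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Time every variant with the same harness so the numbers are comparable."""
    return {name: median_latency_ms(run) for name, run in variants.items()}


# Placeholder workloads: in a real evaluation each callable would run one inference
# of a different model variant (optimized only, quantized only, both) on the same input.
results = compare_variants({
    "variant_a": lambda: sum(i * i for i in range(10_000)),
    "variant_b": lambda: sum(range(10_000)),
})
for name, ms in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {ms:.3f} ms")
```

The median is used rather than the mean so that a single slow run (for example, a first call that allocates memory) does not dominate the comparison.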
## Quantize an ONNX model
```python
@@ -262,8 +262,12 @@ class ONNXQuantizer:
        # Create a new topologically sorted list for quantizing a model
        new_list = []
        for node in self.model.graph.node:
            # if a list of ops to be quantized is provided then only quantize those ops
            if self.nodes_to_quantize is not None and node.name not in self.nodes_to_quantize:
                new_list += self._handle_other_ops(node, new_list)
            # only ONNX-domain ops can be quantized today
            elif node.domain != "ai.onnx" and node.domain != "":
                new_list += self._handle_other_ops(node, new_list)
            else:
                if node.op_type == 'Conv':
                    new_list += self._quantize_convolution(node, new_list)
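The domain check must combine its two comparisons with `and`. Written with `or`, as `node.domain != "ai.onnx" or node.domain != ""`, the condition is true for every node, because no domain string equals both `"ai.onnx"` and `""`, so every node would be sent to `_handle_other_ops` and nothing would be quantized. The sketch below contrasts the two forms; `route` and its return labels are hypothetical stand-ins for the `_quantize_convolution`, `_handle_activation_ops` and `_handle_other_ops` calls, and the op branches between the two hunks (not shown in this diff) are omitted.

```python
from types import SimpleNamespace
from typing import Optional, Set

ONNX_DOMAINS = ("", "ai.onnx")  # "" and "ai.onnx" both name the standard ONNX domain


def is_onnx_domain(domain: str) -> bool:
    """True for standard ONNX ops; same as: domain == "" or domain == "ai.onnx"."""
    return domain in ONNX_DOMAINS


def buggy_is_other_domain(domain: str) -> bool:
    """The `or` form: always True, since no string equals both "ai.onnx" and ""."""
    return domain != "ai.onnx" or domain != ""


def route(node, nodes_to_quantize: Optional[Set[str]] = None) -> str:
    """Follow the branch order of the loop above and name the path a node takes."""
    if nodes_to_quantize is not None and node.name not in nodes_to_quantize:
        return "other"  # not in the caller's list of nodes to quantize
    if not is_onnx_domain(node.domain):
        return "other"  # e.g. a contrib op: handled as a non-quantized op
    if node.op_type == "Conv":
        return "conv"
    if node.op_type in ("Relu", "Clip"):
        return "activation"
    return "other"


nodes = [
    SimpleNamespace(name="conv1", domain="", op_type="Conv"),
    SimpleNamespace(name="relu1", domain="ai.onnx", op_type="Relu"),
    SimpleNamespace(name="fused1", domain="com.microsoft", op_type="FusedConv"),
]
print([route(n) for n in nodes])  # ['conv', 'activation', 'other']
print([buggy_is_other_domain(n.domain) for n in nodes])  # [True, True, True]
```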
@@ -274,7 +278,7 @@ class ONNXQuantizer:
                elif node.op_type == 'Relu' or node.op_type == 'Clip':
                    new_list += self._handle_activation_ops(node, new_list)
                else:
                    new_list += self._handle_other_ops(node, new_list)

        # extend is used to append to a protobuf repeated field
        # https://developers.google.com/protocol-buffers/docs/reference/python-generated?csw=1#fields
@@ -284,7 +288,7 @@ class ONNXQuantizer:
        # Remove weights which are already quantized from graph.
        self._remove_quantized_weights()

        # update opset.
        opset_info = next((opset for opset in self.model.opset_import if opset.domain == '' or opset.domain == onnx_domain), None)
        if opset_info is not None:
            self.model.opset_import.remove(opset_info)
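The hunk above only removes the default-domain entry from `opset_import`; the README says the tool ends by setting the opset version to 10. A minimal sketch of that step on plain Python objects, assuming `onnx_domain` is `"ai.onnx"` and that a version already above 10 should be kept (the diff does not show what the tool does in that case). With the `onnx` package the same steps would be `opset_import.remove(...)`, as in the diff, followed by `opset_import.extend([onnx.helper.make_opsetid("", version)])`; `update_default_opset` itself is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import List

QUANTIZATION_OPSET = 10
ONNX_DOMAINS = ("", "ai.onnx")  # assumes onnx_domain in the diff is "ai.onnx"


def update_default_opset(opset_import: List[SimpleNamespace]) -> List[SimpleNamespace]:
    """Return a copy of the opset list with the default ONNX domain at opset >= 10.

    Entries for other domains (e.g. com.microsoft) are kept unchanged. Keeping a
    version that is already above 10 is an assumption of this sketch.
    """
    current = next((o for o in opset_import if o.domain in ONNX_DOMAINS), None)
    others = [o for o in opset_import if o is not current]
    old_version = current.version if current is not None else 0
    return others + [SimpleNamespace(domain="", version=max(old_version, QUANTIZATION_OPSET))]


opsets = [SimpleNamespace(domain="", version=9), SimpleNamespace(domain="com.microsoft", version=1)]
updated = update_default_opset(opsets)
print([(o.domain, o.version) for o in updated])  # [('com.microsoft', 1), ('', 10)]
```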