add domain check for nodes + update documentation (#2831)

Ashwini Khade 2020-01-14 11:15:50 -08:00 committed by GitHub
parent aa37dea598
commit 8643f3ebbb
2 changed files with 22 additions and 3 deletions


@@ -1,6 +1,6 @@
# Quantization tool Overview
This tool supports 8-bit linear quantization of an ONNX model. quantize() takes a model in ModelProto format and returns the quantized model in ModelProto format.
Today ORT does not guarantee support for E2E model quantization, meaning all ONNX ops do not have support for 8 bit data types therefore only the supported ops in the model are quantized. For rest of the ops inputs are reconverted to FP32.
Today ORT does not guarantee support for end-to-end (E2E) model quantization: not all ONNX ops support 8-bit data types, so only the supported ops in the model are quantized. For the rest of the ops, inputs are converted back to FP32.
List of Supported Quantized Ops:
The following ops were chosen as phase 1 ops because in most CNN models they consume the largest share of compute and power, so quantizing them yields the greatest performance benefit.
@@ -25,6 +25,21 @@ The following ops were chosen as phase 1 ops because in most of the CNN models t
Zero point represents zero in quantization space. It is important that the floating point value zero be exactly representable in quantization space. This is because many CNNs use zero padding, and if 0 cannot be represented exactly after quantization, accuracy errors result.
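As a rough illustration of why the zero point matters, the sketch below derives a scale and zero point for an asymmetric uint8 range and checks that floating point 0.0 round-trips exactly. The helper names and the min/max based range selection are assumptions made for this example, not the tool's actual implementation.

```python
def compute_scale_zero_point(rmin, rmax, qmin=0, qmax=255):
    """Hypothetical helper: linear quantization parameters for the range [rmin, rmax]."""
    # Extend the range so that 0.0 always lies inside it.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, qmin  # degenerate all-zero range; any scale works
    scale = (rmax - rmin) / (qmax - qmin)
    # Rounding to an integer is what makes 0.0 exactly representable.
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to its 8-bit code, saturating at the range limits."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an 8-bit code back to a float."""
    return (q - zero_point) * scale


scale, zero_point = compute_scale_zero_point(-1.0, 3.0)
q_zero = quantize(0.0, scale, zero_point)
print(zero_point, q_zero, dequantize(q_zero, scale, zero_point))
```

Because 0.0 quantizes to the zero point itself, zero padding contributes no error after dequantization.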
## Quantization and model opset versions
Quantization is fairly new in ONNX and ONNXRuntime. Quantization ops were introduced in ONNX opset version 10, so the model being quantized must be opset 10 or higher. If the model opset version is < 10, it is recommended to reconvert the model to ONNX from its original framework using the latest opset.
The quantization tool displays a warning when the model opset version is < 10, but still quantizes the model and changes the opset version to 10 at the end. It is the responsibility of the model owner to run the model checker and make sure the model is valid. If the model is not valid, use the recommended approach above, i.e. reconvert the model from its original framework.
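A model owner could check the opset before quantizing along these lines. This is a minimal sketch: the function names are hypothetical, the stand-in objects mimic only the opset_import entries of a ModelProto, and treating a missing default-domain entry as "needs reconversion" is an assumption of this example.

```python
from types import SimpleNamespace

MIN_QUANT_OPSET = 10  # quantization ops were introduced in ONNX opset 10


def default_domain_opset(model):
    """Return the opset version imported for the default ONNX domain, or None."""
    for opset in model.opset_import:
        # The default ONNX domain is spelled either "" or "ai.onnx".
        if opset.domain in ("", "ai.onnx"):
            return opset.version
    return None


def needs_reconversion(model):
    """True when the model predates opset 10 and should ideally be re-exported."""
    version = default_domain_opset(model)
    return version is None or version < MIN_QUANT_OPSET


# Stand-in objects exposing only the fields used above; a real ModelProto
# loaded with onnx.load(...) has the same opset_import structure.
old_model = SimpleNamespace(opset_import=[SimpleNamespace(domain="", version=9)])
new_model = SimpleNamespace(opset_import=[SimpleNamespace(domain="ai.onnx", version=11)])

print(needs_reconversion(old_model))
print(needs_reconversion(new_model))
```

With a real model, onnx.checker.check_model(model) can then be run on the quantized result to confirm it is still valid.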
## Quantization and Graph Optimization
Please note that quantization and graph optimizations may not always work together.
### Quantizing an optimized model
If a model is optimized using level 99 (i.e. all possible optimizations are run on that model), the optimizations may transform the model in a way that quantization can no longer be applied. In that case, running the quantization script leaves the model unchanged.
### Optimizing a quantized model
The same holds the other way round. After quantizing a model, some graph optimizations that would otherwise have applied may no longer be applicable.
It is advised that the model owner be aware of this and run perf evaluations to understand which technique gives the best performance for their model.
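One way to run such a comparison is a small timing harness like the sketch below. The function names are hypothetical and the two workloads are placeholders; in practice each callable would run inference on one model variant (original, optimized only, quantized only, or both).

```python
import time


def median_latency_ms(run, warmup=3, repeats=20):
    """Median wall-clock time of run() in milliseconds, after a few warm-up calls."""
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]


def compare_variants(variants):
    """Map each variant name to its median latency; variants maps name -> zero-argument callable."""
    return {name: median_latency_ms(run) for name, run in variants.items()}


# Placeholder workloads standing in for inference calls on each model variant.
results = compare_variants({
    "baseline": lambda: sum(range(10_000)),
    "candidate": lambda: sum(range(1_000)),
})
fastest = min(results, key=results.get)
print(results, fastest)
```

Latency is only one axis; accuracy on a validation set should be compared alongside it before choosing a variant.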
## Quantize an ONNX model
```python


@@ -262,8 +262,12 @@ class ONNXQuantizer:
        # Create a new topologically sorted list for quantizing a model
        new_list = []
        for node in self.model.graph.node:
            # if a list of ops to be quantized is provided then only quantize those ops
            if self.nodes_to_quantize is not None and node.name not in self.nodes_to_quantize:
                new_list += self._handle_other_ops(node, new_list)
            # only onnx domain ops can be quantized today; the default domain is
            # spelled either "ai.onnx" or "", so both spellings must be excluded
            # ("or" here would be true for every node and skip all quantization)
            elif node.domain != "ai.onnx" and node.domain != "":
                new_list += self._handle_other_ops(node, new_list)
            else:
                if node.op_type == 'Conv':
                    new_list += self._quantize_convolution(node, new_list)
@@ -274,7 +278,7 @@ class ONNXQuantizer:
                elif node.op_type == 'Relu' or node.op_type == 'Clip':
                    new_list += self._handle_activation_ops(node, new_list)
                else:
                    new_list += self._handle_other_ops(node, new_list)
        # extend is used to append to the list for protobuf fields
        # https://developers.google.com/protocol-buffers/docs/reference/python-generated?csw=1#fields
@@ -284,7 +288,7 @@ class ONNXQuantizer:
        # Remove weights which are already quantized from graph.
        self._remove_quantized_weights()
        # update opset.
        opset_info = next((opset for opset in self.model.opset_import if opset.domain == '' or opset.domain == onnx_domain), None)
        if opset_info is not None:
            self.model.opset_import.remove(opset_info)