#!/usr/bin/env python3
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import argparse
import os
import pathlib

import onnx
def optimize_qdq_model():
    """Command-line entry point: update a QDQ format ONNX model for ONNX Runtime.

    Parses two positional arguments — the path of the ONNX model to read and
    the path to write the updated model to — then loads the model and saves it
    back out. No optimizations are applied at present; the fixup of DQ nodes
    with multiple consumers that used to happen here is performed by an ORT
    graph transformer as of ORT 1.15.

    Raises:
        FileNotFoundError: if the input model path does not exist
            (``resolve(strict=True)``).
    """
    arg_parser = argparse.ArgumentParser(
        os.path.basename(__file__),
        description="Update a QDQ format ONNX model to ensure optimal performance when executed using ONNX Runtime.",
    )
    arg_parser.add_argument("input_model", type=pathlib.Path, help="Provide path to ONNX model to update.")
    arg_parser.add_argument("output_model", type=pathlib.Path, help="Provide path to write updated ONNX model to.")
    parsed = arg_parser.parse_args()

    # Fail fast with FileNotFoundError when the input path is missing.
    model = onnx.load(str(parsed.input_model.resolve(strict=True)))

    # run QDQ model optimizations here
    #
    # Originally, the fixing up of DQ nodes with multiple consumers was
    # implemented as one such optimization. That was moved to an ORT graph
    # transformer.
    print("As of ORT 1.15, the fixing up of DQ nodes with multiple consumers is done by an ORT graph transformer.")

    # There are no optimizations being run currently but we expect that there
    # may be in the future, so the load/save round trip is kept in place.
    onnx.save(model, str(parsed.output_model.resolve()))
if __name__ == "__main__":
    # Script entry point: run the QDQ model update CLI.
    optimize_qdq_model()