onnxruntime/docs/python/notebooks/onnxruntime-tvm-tutorial.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "72476497",
   "metadata": {},
   "source": [
    "# ONNX Runtime: Tutorial for TVM execution provider\n",
    "\n",
    "This notebook shows a simple example for model inference with TVM EP.\n",
    "\n",
    "\n",
    "#### Tutorial Roadmap:\n",
    "1. Prerequistes\n",
    "2. Accuracy check for TVM EP\n",
    "3. Configuration options\n",
    "4. Support precompiled model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9345cbab",
   "metadata": {},
   "source": [
    "## 1. Prerequistes\n",
    "\n",
    "Make sure that you have installed all the necessary dependencies described in the corresponding paragraph of the documentation.\n",
    "\n",
    "Also, make sure you have the `tvm` and `onnxruntime-tvm` packages in your pip environment. \n",
    "\n",
    "If you are using `PYTHONPATH` variable expansion, make sure it contains the following paths: `<path_to_msft_onnxrt>/onnxruntime/cmake/external/tvm_update/python` and `<path_to_msft_onnxrt>/onnxruntime/build/Linux/Release`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da4ca21f",
   "metadata": {},
   "source": [
    "### Common import\n",
    "\n",
    "These packages can be delivered from standard `pip`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0f072875",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import onnx\n",
    "import tempfile\n",
    "import numpy as np\n",
    "from typing import List, AnyStr\n",
    "from onnx import ModelProto, helper, checker, mapping"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "118670aa",
   "metadata": {},
   "source": [
    "### Specialized import\n",
    "\n",
    "It is better to collect these packages from source code in order to clearly understand what is available to you right now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a5502966",
   "metadata": {},
   "outputs": [],
   "source": [
    "import onnxruntime\n",
    "\n",
    "import tvm\n",
    "import tvm.relay\n",
    "import tvm.testing\n",
    "import tvm.runtime\n",
    "import tvm.runtime.vm\n",
    "import tvm.relay.backend.vm\n",
    "import tvm.contrib.download"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7313183",
   "metadata": {},
   "source": [
    "### Helper functions for working with ONNX ModelProto\n",
    "\n",
    "This set of helper functions allows you to recognize the meta information of the models. This information is needed for more versatile processing of ONNX models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7d0a36e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_onnx_input_names(model: ModelProto) -> List[AnyStr]:\n",
    "    inputs = [node.name for node in model.graph.input]\n",
    "    initializer = [node.name for node in model.graph.initializer]\n",
    "    inputs = list(set(inputs) - set(initializer))\n",
    "    return sorted(inputs)\n",
    "\n",
    "\n",
    "def get_onnx_output_names(model: ModelProto) -> List[AnyStr]:\n",
    "    return [node.name for node in model.graph.output]\n",
    "\n",
    "\n",
    "def get_onnx_input_types(model: ModelProto) -> List[np.dtype]:\n",
    "    input_names = get_onnx_input_names(model)\n",
    "    return [\n",
    "        mapping.TENSOR_TYPE_TO_NP_TYPE[node.type.tensor_type.elem_type]\n",
    "        for node in sorted(model.graph.input, key=lambda node: node.name) if node.name in input_names\n",
    "    ]\n",
    "\n",
    "\n",
    "def get_onnx_input_shapes(model: ModelProto) -> List[List[int]]:\n",
    "    input_names = get_onnx_input_names(model)\n",
    "    return [\n",
    "        [dv.dim_value for dv in node.type.tensor_type.shape.dim]\n",
    "        for node in sorted(model.graph.input, key=lambda node: node.name) if node.name in input_names\n",
    "    ]\n",
    "\n",
    "\n",
    "def get_random_model_inputs(model: ModelProto) -> List[np.ndarray]:\n",
    "    input_shapes = get_onnx_input_shapes(model)\n",
    "    input_types = get_onnx_input_types(model)\n",
    "    assert len(input_types) == len(input_shapes)\n",
    "    inputs = [np.random.uniform(size=shape).astype(dtype) for shape, dtype in zip(input_shapes, input_types)]\n",
    "    return inputs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0de1682",
   "metadata": {},
   "source": [
    "### Wrapper helper functions for Inference\n",
    "\n",
    "Wrapper helper functions for running model inference using ONNX Runtime EP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "258ce9e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_onnxruntime_output(model: ModelProto, inputs: List, provider_name: AnyStr) -> np.ndarray:\n",
    "    output_names = get_onnx_output_names(model)\n",
    "    input_names = get_onnx_input_names(model)\n",
    "    assert len(input_names) == len(inputs)\n",
    "    input_dict = {input_name: input_value for input_name, input_value in zip(input_names, inputs)}\n",
    "\n",
    "    inference_session = onnxruntime.InferenceSession(model.SerializeToString(), providers=[provider_name])\n",
    "    output = inference_session.run(output_names, input_dict)\n",
    "\n",
    "    # Unpack output if there's only a single value.\n",
    "    if len(output) == 1:\n",
    "        output = output[0]\n",
    "    return output\n",
    "\n",
    "\n",
    "def get_cpu_onnxruntime_output(model: ModelProto, inputs: List) -> np.ndarray:\n",
    "    return get_onnxruntime_output(model, inputs, \"CPUExecutionProvider\")\n",
    "\n",
    "\n",
    "def get_tvm_onnxruntime_output(model: ModelProto, inputs: List) -> np.ndarray:\n",
    "    return get_onnxruntime_output(model, inputs, \"TvmExecutionProvider\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc17d3b2",
   "metadata": {},
   "source": [
    "### Helper function for checking accuracy\n",
    "\n",
    "This function uses the TVM API to compare two output tensors. The tensor obtained using the `CPUExecutionProvider` is used as a reference.\n",
    "\n",
    "If a mismatch is found between tensors, an appropriate exception will be thrown."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4e598907",
   "metadata": {},
   "outputs": [],
   "source": [
    "def verify_outputs(\n",
    "    lhs: List[np.ndarray],\n",
    "    rhs: List[np.ndarray],\n",
    "    rtol: float = 5e-5,\n",
    "    atol: float = 5e-5\n",
    ") -> None:\n",
    "    for lhs_tensor, rhs_tensor in zip(lhs, rhs):\n",
    "        tvm.testing.assert_allclose(lhs_tensor, rhs_tensor, rtol=rtol, atol=atol)\n",
    "        assert lhs_tensor.dtype == rhs_tensor.dtype\n",
    "    print(\"Same output, congratulations!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "f33a372b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def verify_with_ort_with_inputs(\n",
    "    model,\n",
    "    inputs,\n",
    "    out_shape=None,\n",
    "    opset=None,\n",
    "    freeze_params=False,\n",
    "    dtype=\"float32\",\n",
    "    rtol=1e-5,\n",
    "    atol=1e-5,\n",
    "    opt_level=1,\n",
    "):\n",
    "    if opset is not None:\n",
    "        model.opset_import[0].version = opset\n",
    "\n",
    "    ort_out = get_cpu_onnxruntime_output(model, inputs)\n",
    "    tvm_out = get_tvm_onnxruntime_output(model, inputs)\n",
    "    verify_outputs(ort_out, tvm_out, rtol, atol)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c62b01a",
   "metadata": {},
   "source": [
    "### Helper functions for download models\n",
    "\n",
    "These functions use the TVM API to download models from the ONNX Model Zoo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "324c00e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "BASE_MODEL_URL = \"https://github.com/onnx/models/raw/master/\"\n",
    "MODEL_URL_COLLECTION = {\n",
    "    \"ResNet50-v1\": \"vision/classification/resnet/model/resnet50-v1-7.onnx\",\n",
    "    \"ResNet50-v2\": \"vision/classification/resnet/model/resnet50-v2-7.onnx\",\n",
    "    \"SqueezeNet-v1.1\": \"vision/classification/squeezenet/model/squeezenet1.1-7.onnx\",\n",
    "    \"SqueezeNet-v1.0\": \"vision/classification/squeezenet/model/squeezenet1.0-7.onnx\",\n",
    "    \"Inception-v1\": \"vision/classification/inception_and_googlenet/inception_v1/model/inception-v1-7.onnx\",\n",
    "    \"Inception-v2\": \"vision/classification/inception_and_googlenet/inception_v2/model/inception-v2-7.onnx\",\n",
    "}\n",
    "\n",
    "\n",
    "def get_model_url(model_name):\n",
    "    return BASE_MODEL_URL + MODEL_URL_COLLECTION[model_name]\n",
    "\n",
    "\n",
    "def get_name_from_url(url):\n",
    "    return url[url.rfind(\"/\") + 1 :].strip()\n",
    "\n",
    "\n",
    "def find_of_download(model_name):\n",
    "    model_url = get_model_url(model_name)\n",
    "    model_file_name = get_name_from_url(model_url)\n",
    "    return tvm.contrib.download.download_testdata(model_url, model_file_name, module=\"models\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90fb7c5c",
   "metadata": {},
   "source": [
    "## 2. Accuracy check for TVM EP \n",
    "\n",
    "This section will check the accuracy. The check will be to compare the output tensors for `CPUExecutionProvider` and `TvmExecutionProvider`. See the description of `verify_with_ort_with_inputs` function used above.\n",
    "\n",
    "\n",
    "### Check for simple architectures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c739ed5c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_two_input_model(op_name: AnyStr) -> ModelProto:\n",
    "    dtype = \"float32\"\n",
    "    in_shape = [1, 2, 3, 3]\n",
    "    in_type = mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype(dtype)]\n",
    "    out_shape = in_shape\n",
    "    out_type = in_type\n",
    "\n",
    "    layer = helper.make_node(op_name, [\"in1\", \"in2\"], [\"out\"])\n",
    "    graph = helper.make_graph(\n",
    "        [layer],\n",
    "        \"two_input_test\",\n",
    "        inputs=[\n",
    "            helper.make_tensor_value_info(\"in1\", in_type, in_shape),\n",
    "            helper.make_tensor_value_info(\"in2\", in_type, in_shape),\n",
    "        ],\n",
    "        outputs=[\n",
    "            helper.make_tensor_value_info(\n",
    "                \"out\", out_type, out_shape\n",
    "            )\n",
    "        ],\n",
    "    )\n",
    "    model = helper.make_model(graph, producer_name=\"two_input_test\")\n",
    "    checker.check_model(model, full_check=True)\n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "7048ee6d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Same output, congratulations!\n",
      "****************** Success! ******************\n"
     ]
    }
   ],
   "source": [
    "onnx_model = get_two_input_model(\"Add\")\n",
    "inputs = get_random_model_inputs(onnx_model)\n",
    "verify_with_ort_with_inputs(onnx_model, inputs)\n",
    "print(\"****************** Success! ******************\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52c880f4",
   "metadata": {},
   "source": [
    "### Check for DNN architectures "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f5d465dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_onnx_model(model_name):\n",
    "    model_path = find_of_download(model_name)\n",
    "    onnx_model = onnx.load(model_path)\n",
    "    return onnx_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "68daac7e",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Same output, congratulations!\n",
      "****************** Success! ******************\n"
     ]
    }
   ],
   "source": [
    "model_name = \"ResNet50-v1\"\n",
    "\n",
    "onnx_model = get_onnx_model(model_name)\n",
    "inputs = get_random_model_inputs(onnx_model)\n",
    "verify_with_ort_with_inputs(onnx_model, inputs)\n",
    "print(\"****************** Success! ******************\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e27f64a2",
   "metadata": {},
   "source": [
    "## 3. Configuration options\n",
    "\n",
    "This section shows how you can configure TVM EP using custom options. For more details on the options used, see the corresponding section of the documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a053f59f",
   "metadata": {},
   "outputs": [],
   "source": [
    "provider_name = \"TvmExecutionProvider\"\n",
    "provider_options = dict(\n",
    "    target=\"llvm -mtriple=x86_64-linux-gnu\",\n",
    "    target_host=\"llvm -mtriple=x86_64-linux-gnu\",\n",
    "    opt_level=3,\n",
    "    freeze_weights=True,\n",
    "    tuning_file_path=\"\",\n",
    "    tuning_type=\"Ansor\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "3f6e6f01",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = \"ResNet50-v1\"\n",
    "onnx_model = get_onnx_model(model_name)\n",
    "input_dict = {\n",
    "    input_name: input_value for input_name, input_value in zip(\n",
    "        get_onnx_input_names(onnx_model),\n",
    "        get_random_model_inputs(onnx_model),\n",
    "    )\n",
    "}\n",
    "output_names = get_onnx_output_names(onnx_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "85ab83f2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "****************** Output shape: (1, 1000) ******************\n"
     ]
    }
   ],
   "source": [
    "tvm_session = onnxruntime.InferenceSession(\n",
    "    onnx_model.SerializeToString(),\n",
    "    providers=[provider_name],\n",
    "    provider_options=[provider_options],\n",
    ")\n",
    "output = tvm_session.run(output_names, input_dict)[0]\n",
    "print(f\"****************** Output shape: {output.shape} ******************\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b704374b",
   "metadata": {},
   "source": [
    "## 4. Support precompiled model\n",
    "\n",
    "Wrapper functions that allow you to compile the model and save it in the desired format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "8150942b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compile_virtual_machine(model: onnx.ModelProto, target_str: AnyStr) -> tvm.runtime.vm.Executable:\n",
    "    ir_mod, params = tvm.relay.frontend.from_onnx(\n",
    "        model,\n",
    "        opset=model.opset_import[0].version,\n",
    "        freeze_params=True,\n",
    "    )\n",
    "    target = tvm.target.Target(target=target_str, host=target_str)\n",
    "    return tvm.relay.backend.vm.compile(ir_mod, target)\n",
    "\n",
    "\n",
    "def serialize_virtual_machine(vm_exec: tvm.runtime.vm.Executable) -> AnyStr:\n",
    "    temp_directory = tempfile.mkdtemp()\n",
    "    path_consts = os.path.join(temp_directory, \"consts\")\n",
    "    vm_exec.move_late_bound_consts(path_consts, byte_limit=256)\n",
    "    lib_path = os.path.join(temp_directory, f\"model.so\")\n",
    "    code_path = os.path.join(temp_directory, f\"model.ro\")\n",
    "    code, lib = vm_exec.save()\n",
    "    lib.export_library(lib_path)\n",
    "    with open(code_path, \"wb\") as fo:\n",
    "        fo.write(code)\n",
    "    return temp_directory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cbb987e",
   "metadata": {},
   "source": [
    "Preparation of the ONNX model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "febb9d72",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = \"ResNet50-v1\"\n",
    "onnx_model = get_onnx_model(model_name)\n",
    "input_dict = {\n",
    "    input_name: input_value for input_name, input_value in zip(\n",
    "        get_onnx_input_names(onnx_model),\n",
    "        get_random_model_inputs(onnx_model),\n",
    "    )\n",
    "}\n",
    "output_names = get_onnx_output_names(onnx_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b05b251a",
   "metadata": {},
   "source": [
    "Compiling the ONNX model using `VirtualMachine` (TVM)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "b4b999ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "compiled_vm_exec = compile_virtual_machine(onnx_model, target_str=\"llvm\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e3408c15",
   "metadata": {},
   "outputs": [],
   "source": [
    "so_folder = serialize_virtual_machine(compiled_vm_exec)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "311405e8",
   "metadata": {},
   "source": [
    "Preparing `ProviderOptions` and launching `TVM EP` inference.\n",
    "\n",
    "In order to use the precompiled model, you only need to pass two options:\n",
    "* **executor** - `vm` (`VirtualMachine`) must be used as a value (this functionality is not supported for `GraphExecutor`);\n",
    "* **so_folder** - as a value, you must pass the path to the directory where the files of the precompiled model are located."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "8927293c",
   "metadata": {},
   "outputs": [],
   "source": [
    "provider_name = \"TvmExecutionProvider\"\n",
    "provider_options = dict(\n",
    "    executor=\"vm\",\n",
    "    so_folder=so_folder,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "d7532863",
   "metadata": {},
   "outputs": [],
   "source": [
    "tvm_session = onnxruntime.InferenceSession(\n",
    "    onnx_model.SerializeToString(),\n",
    "    providers=[provider_name],\n",
    "    provider_options=[provider_options],\n",
    ")\n",
    "tvm_output = tvm_session.run(output_names, input_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c0b983e",
   "metadata": {},
   "source": [
    "Let's make sure that the output values match those that can be obtained through `CPUExecutionProvider`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "c3de2299",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Same output, congratulations!\n"
     ]
    }
   ],
   "source": [
    "verify_outputs(\n",
    "    tvm_output[0],\n",
    "    get_cpu_onnxruntime_output(\n",
    "        onnx_model,\n",
    "        input_dict.values()\n",
    "    ),\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}