===
API
===
.. contents::
    :local:
API Overview
============
*ONNX Runtime* loads and runs inference on a model in ONNX graph format, or ORT format (for memory- and disk-constrained environments).
The data consumed and produced by the model can be specified and accessed in the way that best matches your scenario.
Load and run a model
--------------------
InferenceSession is the main class of ONNX Runtime. It is used to load and run an ONNX model,
as well as specify environment and application configuration options.
.. code-block:: python

    session = onnxruntime.InferenceSession('model.onnx')
    outputs = session.run([output names], inputs)
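If the input and output names of the model are not known in advance, they can be queried from the session instead of hard-coded. A minimal sketch, where the input shape and dtype are assumptions for illustration only:

.. code-block:: python

    import numpy as np
    import onnxruntime

    session = onnxruntime.InferenceSession('model.onnx')

    # Discover the input and output names instead of hard-coding them
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name

    # The shape [1, 3] is an assumption; adapt it to your model
    x = np.random.rand(1, 3).astype(np.float32)
    outputs = session.run([output_name], {input_name: x})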
ONNX and ORT format models consist of a graph of computations, modeled as operators,
and implemented as optimized operator kernels for different hardware targets.
ONNX Runtime orchestrates the execution of operator kernels via `execution providers`.
An execution provider contains the set of kernels for a specific execution target (CPU, GPU, IoT, etc.).
Execution providers are configured using the `providers` parameter. Kernels from different execution
providers are chosen in the priority order given in the list of providers. In the example below,
if there is a kernel in the CUDA execution provider, ONNX Runtime executes it on GPU; if not,
the kernel is executed on CPU.
.. code-block:: python

    session = onnxruntime.InferenceSession(model,
                                           providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
The list of available execution providers can be found here: `Execution Providers <https://onnxruntime.ai/docs/execution-providers>`_.
Since ONNX Runtime 1.10, you must explicitly specify the execution provider for your target.
Running on CPU is the only time the API allows no explicit setting of the `providers` parameter.
In the examples that follow, the `CUDAExecutionProvider` and `CPUExecutionProvider` are used, assuming the application is running on NVIDIA GPUs.
Replace these with the execution provider specific to your environment.
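To make the provider list robust across environments, the providers available in the installed build can be queried and filtered. A minimal sketch:

.. code-block:: python

    import onnxruntime

    # Providers compiled into this build/installation
    available = onnxruntime.get_available_providers()

    # Prefer CUDA when present; always keep CPU as a fallback
    preferred = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    providers = [p for p in preferred if p in available]
    session = onnxruntime.InferenceSession('model.onnx', providers=providers)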
You can supply other session configurations via the `sess_options` parameter. For example, to enable
profiling on the session:
.. code-block:: python

    options = onnxruntime.SessionOptions()
    options.enable_profiling = True
    session = onnxruntime.InferenceSession('model.onnx', sess_options=options,
                                           providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
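Profiling can be stopped once inference is done by calling `end_profiling()`, which returns the name of the JSON trace file the session has written. A minimal sketch continuing from the session above:

.. code-block:: python

    # ... run inference as usual, e.g. outputs = session.run(None, inputs) ...

    # Stop profiling; returns the name of the JSON trace file written by the
    # session, which can be opened in a trace viewer such as chrome://tracing
    profile_file = session.end_profiling()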
Data inputs and outputs
-----------------------
The ONNX Runtime Inference Session consumes and produces data using its OrtValue class.
Data on CPU
^^^^^^^^^^^
On CPU (the default), OrtValues can be mapped to and from native Python data structures: numpy arrays, dictionaries and lists of
numpy arrays.
.. code-block:: python

    # X is numpy array on cpu
    ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X)
    ortvalue.device_name()  # 'cpu'
    ortvalue.shape()        # shape of the numpy array X
    ortvalue.data_type()    # 'tensor(float)'
    ortvalue.is_tensor()    # 'True'
    np.array_equal(ortvalue.numpy(), X)  # 'True'

    # ortvalue can be provided as part of the input feed to a model
    session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    results = session.run(["Y"], {"X": ortvalue})
By default, *ONNX Runtime* always places input(s) and output(s) on CPU. Having the data on CPU
may not be optimal if the input or output is consumed and produced on a device
other than CPU, because it introduces data copies between CPU and the device.
Data on device
^^^^^^^^^^^^^^
*ONNX Runtime* provides a custom data structure that supports all ONNX data formats and allows users
to place the data backing it on a device, for example, on a CUDA-capable device. In ONNX Runtime,
this is called `IOBinding`.
To use the `IOBinding` feature, replace `InferenceSession.run()` with `InferenceSession.run_with_iobinding()`.
When a graph is executed on a device other than CPU, for instance CUDA, users can
use IOBinding to copy the data onto the GPU.
.. code-block:: python

    # X is numpy array on cpu
    session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    io_binding = session.io_binding()
    # OnnxRuntime will copy the data over to the CUDA device if 'input' is consumed by nodes on the CUDA device
    io_binding.bind_cpu_input('input', X)
    io_binding.bind_output('output')
    session.run_with_iobinding(io_binding)
    Y = io_binding.copy_outputs_to_cpu()[0]
When the input data is already on a device, users can use the input directly, while the output data remains on CPU.
.. code-block:: python

    # X is numpy array on cpu
    X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
    session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    io_binding = session.io_binding()
    io_binding.bind_input(name='input', device_type=X_ortvalue.device_name(), device_id=0, element_type=np.float32,
                          shape=X_ortvalue.shape(), buffer_ptr=X_ortvalue.data_ptr())
    io_binding.bind_output('output')
    session.run_with_iobinding(io_binding)
    Y = io_binding.copy_outputs_to_cpu()[0]
When the input and output data are both on a device, users can use the input directly and also place the output on the device.
.. code-block:: python

    # X is numpy array on cpu
    X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
    # Change the shape to the actual shape of the output being bound
    Y_ortvalue = onnxruntime.OrtValue.ortvalue_from_shape_and_type([3, 2], np.float32, 'cuda', 0)
    session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    io_binding = session.io_binding()
    io_binding.bind_input(name='input', device_type=X_ortvalue.device_name(), device_id=0, element_type=np.float32,
                          shape=X_ortvalue.shape(), buffer_ptr=X_ortvalue.data_ptr())
    io_binding.bind_output(name='output', device_type=Y_ortvalue.device_name(), device_id=0, element_type=np.float32,
                           shape=Y_ortvalue.shape(), buffer_ptr=Y_ortvalue.data_ptr())
    session.run_with_iobinding(io_binding)
Users can request *ONNX Runtime* to allocate an output on a device. This is particularly useful for dynamically shaped outputs.
Users can use the *get_outputs()* API to get access to the *OrtValue* (s) corresponding to the allocated output(s).
Users can thus consume the *ONNX Runtime* allocated memory for the output as an *OrtValue* .
.. code-block:: python

    # X is numpy array on cpu
    X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
    session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    io_binding = session.io_binding()
    io_binding.bind_input(name='input', device_type=X_ortvalue.device_name(), device_id=0, element_type=np.float32,
                          shape=X_ortvalue.shape(), buffer_ptr=X_ortvalue.data_ptr())
    # Request ONNX Runtime to bind and allocate memory on CUDA for 'output'
    io_binding.bind_output('output', 'cuda')
    session.run_with_iobinding(io_binding)
    # The following call returns an OrtValue which has data allocated by ONNX Runtime on CUDA
    ort_output = io_binding.get_outputs()[0]
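The device-resident *OrtValue* can be consumed by further computation on the device, or copied back to host memory when a numpy array is needed. A small sketch continuing from above:

.. code-block:: python

    # Inspect the ONNX Runtime allocated output
    ort_output.device_name()  # 'cuda'
    ort_output.shape()        # actual shape determined at run time

    # Copy the output back to host memory when a numpy array is needed
    Y = io_binding.copy_outputs_to_cpu()[0]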
In addition, *ONNX Runtime* supports working directly with *OrtValue* (s) while inferencing a model, if they are provided as part of the input feed.
Users can bind *OrtValue* (s) directly.
.. code-block:: python

    # X is numpy array on cpu
    X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
    # Change the shape to the actual shape of the output being bound
    Y_ortvalue = onnxruntime.OrtValue.ortvalue_from_shape_and_type([3, 2], np.float32, 'cuda', 0)
    session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    io_binding = session.io_binding()
    io_binding.bind_ortvalue_input('input', X_ortvalue)
    io_binding.bind_ortvalue_output('output', Y_ortvalue)
    session.run_with_iobinding(io_binding)
You can also bind inputs and outputs directly to a PyTorch tensor.
.. code-block:: python

    # X is a PyTorch tensor on device
    session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
    binding = session.io_binding()

    X_tensor = X.contiguous()
    binding.bind_input(
        name='X',
        device_type='cuda',
        device_id=0,
        element_type=np.float32,
        shape=tuple(X_tensor.shape),
        buffer_ptr=X_tensor.data_ptr(),
    )

    # Allocate the PyTorch tensor for the model output
    Y_shape = ...  # You need to specify the output PyTorch tensor shape
    Y_tensor = torch.empty(Y_shape, dtype=torch.float32, device='cuda:0').contiguous()
    binding.bind_output(
        name='Y',
        device_type='cuda',
        device_id=0,
        element_type=np.float32,
        shape=tuple(Y_tensor.shape),
        buffer_ptr=Y_tensor.data_ptr(),
    )

    session.run_with_iobinding(binding)
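After the call returns, `Y_tensor` holds the model output on the GPU and can be consumed directly by PyTorch; call `Y_tensor.cpu()` if a copy in host memory is needed.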
API Details
===========
InferenceSession
----------------
.. autoclass:: onnxruntime.InferenceSession
    :members:
    :inherited-members:
Options
-------
RunOptions
^^^^^^^^^^
.. autoclass:: onnxruntime.RunOptions
    :members:
SessionOptions
^^^^^^^^^^^^^^
.. autoclass:: onnxruntime.SessionOptions
    :members:
Data
----
OrtValue
^^^^^^^^
.. autoclass:: onnxruntime.OrtValue
    :members:
SparseTensor
^^^^^^^^^^^^
.. autoclass:: onnxruntime.SparseTensor
    :members:
Devices
-------
IOBinding
^^^^^^^^^
.. autoclass:: onnxruntime.IOBinding
    :members:
OrtDevice
^^^^^^^^^
.. autoclass:: onnxruntime.OrtDevice
    :members:
Internal classes
----------------
These classes cannot be instantiated by users, but they are returned
by methods or functions of this library.
ModelMetadata
^^^^^^^^^^^^^
.. autoclass:: onnxruntime.ModelMetadata
    :members:
NodeArg
^^^^^^^
.. autoclass:: onnxruntime.NodeArg
    :members:
Backend
=======
In addition to the regular API which is optimized for performance and usability,
*ONNX Runtime* also implements the
`ONNX backend API <https://github.com/onnx/onnx/blob/master/docs/ImplementingAnOnnxBackend.md>`_
for verification of *ONNX* specification conformance.
The following functions are supported:
.. autofunction:: onnxruntime.backend.is_compatible

.. autofunction:: onnxruntime.backend.prepare

.. autofunction:: onnxruntime.backend.run

.. autofunction:: onnxruntime.backend.supports_device
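As an illustration, here is a minimal sketch of driving the backend API end to end, assuming a `model.onnx` file; the input shape is an assumption for illustration only.

.. code-block:: python

    import numpy as np
    import onnx
    import onnxruntime.backend as backend

    # Load the model with the onnx package, then prepare and run it
    model = onnx.load('model.onnx')

    if backend.supports_device('CPU'):
        rep = backend.prepare(model, device='CPU')
        # Inputs are passed as a list of numpy arrays; shape assumed for illustration
        outputs = rep.run([np.random.rand(1, 3).astype(np.float32)])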