mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-07-03 03:58:54 +00:00
Minor wording changes to design doc (#51)
* Update HighLevelDesign.md
* Update HighLevelDesign.md
* Update HighLevelDesign.md
parent 6371025860
commit 7523e76649
1 changed files with 17 additions and 18 deletions
@@ -1,7 +1,7 @@
 # ONNX Runtime High Level Design

 This document outlines the high level design of
-ONNXRuntime - a high performance, cross platform engine.
+ONNX Runtime - a high performance, cross platform engine.

 ## Key objectives
 * Maximally and automatically leverage the custom accelerators and runtimes
@@ -10,8 +10,8 @@ available on disparate platforms.
 runtimes. We call this abstraction an [execution
 provider](../include/onnxruntime/core/framework/execution_provider.h). It defines and exposes a set of
 its capabilities to ONNXRuntime: a set of single or fused nodes it can
-execute, its memory allocator and more. Custom accelerators and runtimes are
-instances of execution provider.
+execute, its memory allocator, and more. Custom accelerators and runtimes are
+instances of execution providers.
 * We don't expect that an execution provider can always run an ONNX model fully
 on its device. This means that ONNXRuntime must be able to execute a single
 model in a heterogeneous environment involving multiple execution providers.
@@ -35,46 +35,45 @@ provider using the GetCapability() API.

 

-*Note: TensorRT and nGraph support in the works.*
+*Note: TensorRT and nGraph support are in progress*

 ### More about partitioning
-ONNXRuntime partitions a model graph based on the available execution providers
-into subgraphs, each for a distinct provider respectively. ONNXRuntime provides
-a default execution provider that is used for fallback execution for the
+ONNXRuntime partitions a model graph into subgraphs based on the available execution providers, one for each distinct provider. ONNXRuntime provides
+a default execution provider that is used as the fallback execution for the
 operators that cannot be pushed onto the more specialized but more efficient
-execution providers. Intuitively we probably want to push computation to the
-specialized execution providers as much as possible.
+execution providers. Intuitively we want to push computation to more
+specialized execution providers whenever possible.

 We use a simple graph partitioning technique. The available execution providers
 will be considered in a specific order, and each will be assigned the maximal
 subgraphs (possibly more than one) that it is able to handle. The
-ONNXRuntime-provided default execution provider will be the last one to be
+ONNXRuntime-provided default execution provider will be the last one
 considered, and it ensures completeness. More sophisticated optimizations can be
 considered in the future (or can even be implemented as a composite execution
 provider).

 Conceptually, each partition is reduced to a single fused operator. It is
-created by invoking the execution provider's Compile() method and wrap it as a
+created by invoking the execution provider's Compile() method and wraps it as a
 custom operator. Currently we support only synchronous mode of execution. An execution
 provider exposes its memory allocator, which is used to allocate the input
 tensors for the execution provider. The rewriting and partitioning transform the
-initial model graph into a new graph composed with operators assigned to either
+initial model graph into a new graph composed of operators assigned to either
 the default execution provider or other registered execution
-providers. ONNXRuntime execution engine is responsible for running this graph.
+providers. The ONNXRuntime execution engine is responsible for running this graph.

 ## Key design decisions
-* Multiple threads should be able to inovke the Run() method on the same
+* Multiple threads can invoke the Run() method on the same
 inference session object. See [API doc](C_API.md) for more details.
-* To facilitate the above the Compute() function of all kernels is const
+* To facilitate this, the Compute() function of all kernels is const
 implying the kernels are stateless.
-* We call implementations of the operators by execution providers as
+* Implementations of the operators by execution providers are called
 kernels. Each execution provider supports a subset of the (ONNX)
 operators/kernels.
-* ONNXRuntime runtime guarantees that all operators are supported by the default
+* The ONNXRuntime runtime guarantees that all operators are supported by the default
 execution provider.
 * Tensor representation: ONNXRuntime will utilize a standard representation for
 the tensor runtime values. The execution providers can internally use a
-different representation, if they choose to, but it is their responsibility to
+different representation if they choose to, but it is their responsibility to
 convert the values from/to the standard representation at the boundaries of
 their subgraph.
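The partitioning scheme described under "More about partitioning" (providers considered in a fixed order, each claiming the nodes it can handle, with the default provider last to ensure completeness) can be illustrated with a small Python sketch. This is not ONNXRuntime code: the `partition` function, the provider tuples, and the op names are hypothetical, and the grouping step simply merges directly connected nodes claimed by the same provider. It ignores constraints a real partitioner would need, such as avoiding cycles between fused subgraphs.

```python
from collections import defaultdict


def partition(nodes, edges, providers):
    """Greedy partitioning sketch (hypothetical, not the ONNXRuntime API).

    nodes:     dict mapping node name -> operator type
    edges:     list of (src, dst) node-name pairs
    providers: ordered list of (provider_name, supported_ops); use None as
               supported_ops for a provider that supports every operator
               (the role of the default provider, listed last).
    Returns a list of (provider_name, sorted node names) subgraphs.
    """
    # Step 1: earlier providers get first pick of the nodes they support.
    assignment = {}
    for name, supported in providers:
        for node, op in nodes.items():
            if node not in assignment and (supported is None or op in supported):
                assignment[node] = name

    unassigned = set(nodes) - set(assignment)
    if unassigned:
        raise ValueError(f"no provider supports: {sorted(unassigned)}")

    # Step 2: keep only edges whose endpoints share a provider, then collect
    # connected components so each provider gets maximal subgraphs.
    neighbours = defaultdict(set)
    for src, dst in edges:
        if assignment[src] == assignment[dst]:
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    seen = set()
    subgraphs = []
    for node in nodes:
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], []
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        subgraphs.append((assignment[node], sorted(component)))
    return subgraphs


# Illustrative model: a chain conv1 -> relu1 -> custom -> conv2.
example_nodes = {"conv1": "Conv", "relu1": "Relu", "custom": "MyOp", "conv2": "Conv"}
example_edges = [("conv1", "relu1"), ("relu1", "custom"), ("custom", "conv2")]
example_providers = [("accelerator", {"Conv", "Relu"}), ("default", None)]

example_result = partition(example_nodes, example_edges, example_providers)
print(example_result)
```

In this example the hypothetical accelerator ends up with two separate subgraphs (`conv1` with `relu1`, and `conv2` alone) because the unsupported `custom` node sits between them and falls back to the default provider, which matches the "possibly more than one" remark in the design text.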