The flow is quite simple. Starting from an ONNX model, ONNXRuntime first
converts the model graph into its in-memory graph representation. It then
applies a number of graph transformations that a) perform a set of provider-independent
optimizations, such as cast transformations between float16 and float32, and b) partition the
graph into a set of subgraphs based on the available execution providers. Each
subgraph is assigned to an execution provider. We ensure that a subgraph can be
executed by an execution provider by querying its capability through the
GetCapability() API.
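
To make this concrete, here is a minimal sketch of a capability query. The interface is deliberately simplified and the GpuProvider is hypothetical; the real onnxruntime::IExecutionProvider::GetCapability() works on a graph viewer and returns richer subgraph descriptions.

```cpp
// Illustrative sketch only -- not the actual ONNXRuntime interface.
#include <cstddef>
#include <string>
#include <vector>

// A minimal stand-in for the in-memory graph representation.
struct GraphView {
  std::vector<std::string> node_op_types;  // e.g. {"Conv", "Relu", "Gather"}
};

class ExecutionProvider {
 public:
  virtual ~ExecutionProvider() = default;
  // Report which nodes of the graph this provider can execute.
  virtual std::vector<size_t> GetCapability(const GraphView& graph) const = 0;
};

// A hypothetical GPU provider that supports only Conv and Relu.
class GpuProvider : public ExecutionProvider {
 public:
  std::vector<size_t> GetCapability(const GraphView& graph) const override {
    std::vector<size_t> supported;
    for (size_t i = 0; i < graph.node_op_types.size(); ++i) {
      const std::string& op = graph.node_op_types[i];
      if (op == "Conv" || op == "Relu") supported.push_back(i);
    }
    return supported;
  }
};
```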

*Note: TensorRT and nGraph support is in the works.*

### More about partitioning
ONNXRuntime partitions a model graph into subgraphs based on the available
execution providers, each subgraph assigned to a distinct provider. ONNXRuntime provides
a default execution provider that is used for fallback execution for the
operators that cannot be pushed onto the more specialized but more efficient
execution providers. Intuitively, we want to push as much computation as
possible to the specialized execution providers.
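
At the API level, the provider order can be influenced by how providers are registered on the session options. The snippet below uses the public C/C++ API from memory (header and function names vary by release and build configuration); the default CPU provider is always registered implicitly and serves as the final fallback.

```cpp
// Sketch of provider registration; exact headers/functions are version-dependent.
#include <onnxruntime_cxx_api.h>
#include <cuda_provider_factory.h>  // CUDA provider factory (build-dependent)

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "partition-demo");
  Ort::SessionOptions options;
  // Providers appended here are considered before the default CPU provider,
  // so CUDA gets the first chance to claim subgraphs it can handle.
  OrtSessionOptionsAppendExecutionProvider_CUDA(options, /*device_id=*/0);
  Ort::Session session(env, "model.onnx", options);
  return 0;
}
```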
We use a simple graph partitioning technique. The available execution providers
will be considered in a specific order, and each will be assigned the maximal
subgraphs (possibly more than one) that it is able to handle. The
ONNXRuntime-provided default execution provider will be the last one to be
considered, and it ensures completeness. More sophisticated optimizations can be
considered in the future (or can even be implemented as a composite execution
provider).
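
The loop below is a minimal sketch of this greedy assignment, reusing the simplified GraphView and ExecutionProvider types from the earlier sketch. The real partitioner builds maximal connected subgraphs per provider rather than per-node labels, but the priority logic is the same.

```cpp
// Greedy partitioning sketch: providers are consulted in priority order,
// and each claims every not-yet-assigned node it reports it can handle.
#include <memory>
#include <vector>

std::vector<int> PartitionGraph(
    const GraphView& graph,
    const std::vector<std::unique_ptr<ExecutionProvider>>& providers_in_order) {
  // assignment[i] = index of the provider that owns node i (-1 = unassigned).
  std::vector<int> assignment(graph.node_op_types.size(), -1);
  for (size_t p = 0; p < providers_in_order.size(); ++p) {
    for (size_t node : providers_in_order[p]->GetCapability(graph)) {
      if (assignment[node] == -1) assignment[node] = static_cast<int>(p);
    }
  }
  // The default provider is assumed to be last in the list and to support
  // every ONNX operator, so no node remains unassigned afterwards.
  return assignment;
}
```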
Conceptually, each partition is reduced to a single fused operator. It is
created by invoking the execution provider's Compile() method and wrapping the
result as a custom operator.
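
In code, the idea might look like the following sketch; Compile() here is an illustrative stand-in for the provider's real compilation hook, again building on the simplified types above.

```cpp
// Sketch: a partition is handed back to its provider, which compiles it
// into one fused kernel that the runtime invokes like any other operator.
#include <functional>

// Inputs and outputs are elided to keep the sketch minimal.
using FusedKernel = std::function<void()>;

class CompilingProvider : public ExecutionProvider {
 public:
  // Hypothetical Compile(): lower the claimed subgraph to one callable.
  virtual FusedKernel Compile(const GraphView& subgraph) = 0;
};
```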