onnxruntime/docs/execution_providers/MKL-DNN-Subgraphs.md

# Subgraph Optimization

MKL-DNN uses a blocked layout (for example, nchw with channels blocked by 16, written nChw16c) to take advantage of AVX512 vector operations. To get the best performance, we avoid reorders (for example, nChw16c to nchw) and propagate the blocked layout to the next primitive.
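
To make the layout difference concrete, here is a minimal sketch of how the linear offset of one element differs between plain nchw and nChw16c. The `Dims` struct and both function names are illustrative assumptions, not part of MKL-DNN or ONNX Runtime, and the sketch assumes the channel count is a multiple of 16 (no padding).

```cpp
#include <cassert>
#include <cstddef>

// Illustrative only: logical dimensions of a 4-D activation tensor.
struct Dims {
  std::size_t n, c, h, w;
};

// Offset of element (n, c, h, w) in plain nchw layout.
std::size_t OffsetNchw(const Dims& d, std::size_t n, std::size_t c,
                       std::size_t h, std::size_t w) {
  return ((n * d.c + c) * d.h + h) * d.w + w;
}

// Offset of the same element in nChw16c: channels are split into blocks
// of 16, and the 16 channels of a block are innermost and contiguous.
std::size_t OffsetNChw16c(const Dims& d, std::size_t n, std::size_t c,
                          std::size_t h, std::size_t w) {
  constexpr std::size_t kBlock = 16;
  assert(d.c % kBlock == 0);  // sketch assumes no channel padding
  const std::size_t channel_blocks = d.c / kBlock;
  const std::size_t block = c / kBlock;   // which block of 16 channels
  const std::size_t inner = c % kBlock;   // position inside the block
  return (((n * channel_blocks + block) * d.h + h) * d.w + w) * kBlock + inner;
}
```

In the blocked layout, 16 consecutive channels of one pixel sit next to each other in memory, which is what lets a single AVX512 register load them at once.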

Subgraph optimization achieves this in the following steps:
1.	Parse the ONNX Runtime graph and create an internal representation (IR) of the subgraph.
2.	The subgraph operator (MklDnnFuncKernel) iterates through the MKL-DNN nodes and creates a vector of MKL-DNN kernels.
3.	The Compute function of MklDnnFuncKernel iterates over the vector, binds data to the MKL-DNN primitives, and submits the vector for execution.


## Subgraph Internal Representation (IR)
MklDnnExecutionProvider::GetCapability() parses the ONNX model graph and creates an IR (internal representation) of subgraphs of MKL-DNN operators.
Each subgraph contains a vector of MklDnnNodes, along with the inputs, outputs, and attributes of all its MklDnnNodes. Because different nodes can have attributes with the same name, we prefix each attribute name with the node name and its index.
A unique id for the subgraph is set as an attribute.
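
As a rough illustration of the attribute prefixing, the sketch below builds a key from the node name, the node index, and the attribute name. The `"<name>-<index>-<attribute>"` format and the `PrefixedAttributeName` helper are assumptions made for this example; the provider's actual key format may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical key format: "<node name>-<node index>-<attribute name>".
// Two Conv nodes in the same subgraph then get distinct "strides" keys.
std::string PrefixedAttributeName(const std::string& node_name, int node_index,
                                  const std::string& attr_name) {
  assert(node_index >= 0);
  return node_name + "-" + std::to_string(node_index) + "-" + attr_name;
}
```

With this scheme, the `strides` attribute of the first and second Conv node would be stored under `Conv-0-strides` and `Conv-1-strides`, so neither overwrites the other in the subgraph's shared attribute collection.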

Each MklDnnNode holds an index to its inputs and outputs and pointers to its parent nodes. An MklDnnNode reads blocked memory directly from its parent to avoid data reordering.
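
The parent link can be sketched as follows. `NodeSketch`, its field names, and `InputFormat` are hypothetical stand-ins for illustration, not the actual MklDnnNode definition; the sketch only shows the idea that a node with a parent takes over the parent's output format instead of asking for a reorder back to nchw.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-in for MklDnnNode; field names are assumptions.
struct NodeSketch {
  std::string name;
  std::vector<std::size_t> parent_nodes;  // indices of producers inside the subgraph
  std::string output_format;              // e.g. "nchw" or "nChw16c"
};

// Format a node sees on its first input: the parent's output format if it
// has a parent, otherwise the plain format of the subgraph input.
std::string InputFormat(const std::vector<NodeSketch>& nodes, std::size_t index) {
  assert(index < nodes.size());
  const NodeSketch& node = nodes[index];
  if (node.parent_nodes.empty()) {
    return "nchw";
  }
  assert(node.parent_nodes[0] < index);  // sketch assumes topological order
  return nodes[node.parent_nodes[0]].output_format;
}
```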

<p align="left"><img src="images/mkl-dnn_node.png" /></p>


## Subgraph Classes
Primitives such as MklDnnConv and MklDnnPool are derived from the MklDnnKernel base class.

The following UML diagram captures the subgraph classes.

<p align="left"><img src="images/mkl-dnn_subgraph.png" /></p>


## Subgraph Execution

The MklDnnExecutionProvider::Compute() function creates an MklDnnFuncKernel and calls its Compute function.

The MklDnnFuncKernel::Compute function creates a SubgraphPrimitive pool and adds the object to a map.

The SubgraphPrimitive constructor calls the following member functions:
```
SubgraphPrimitive::CreatePrimitives()
    // Create one kernel per node of the subgraph IR.
    for (auto& mklnode : mklnodes) {
      if (mklnode.name == "Conv") {
        kernel.reset(new MklDnnConv());
        context_.kernels.push_back(kernel);
      } else if (mklnode.name == "BatchNormalization-Relu") {
        kernel.reset(new MklDnnBatchNorm());
        context_.kernels.push_back(kernel);
      } else if (mklnode.name == "MaxPool") {
        kernel.reset(new MklDnnPool());
        context_.kernels.push_back(kernel);
      }
      .
      .
      .
```
In the CreatePrimitives method, we iterate over the MklDnnNodes, create MklDnnKernel objects, and add the MKL-DNN primitives to a vector. The method also reads the node attributes. This is done only once, at the first iteration.
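
The create-once behaviour can be sketched as below. All type names (`KernelSketch`, `ConvSketch`, `PoolSketch`, `SubgraphSketch`) and the `MakeKernel` helper are illustrative assumptions rather than the provider's real classes, and the lazy creation inside `Kernels()` is just one way to model "done only once".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Illustrative base class and two derived kernels (names are assumptions).
struct KernelSketch {
  virtual ~KernelSketch() = default;
  virtual std::string Type() const = 0;
};
struct ConvSketch : KernelSketch {
  std::string Type() const override { return "Conv"; }
};
struct PoolSketch : KernelSketch {
  std::string Type() const override { return "MaxPool"; }
};

// Maps an operator name to a kernel object; returns nullptr if unsupported.
std::shared_ptr<KernelSketch> MakeKernel(const std::string& op) {
  if (op == "Conv") return std::make_shared<ConvSketch>();
  if (op == "MaxPool") return std::make_shared<PoolSketch>();
  return nullptr;
}

class SubgraphSketch {
 public:
  explicit SubgraphSketch(std::vector<std::string> ops) : ops_(std::move(ops)) {}

  // Builds the kernel vector on the first call only; later calls reuse it.
  const std::vector<std::shared_ptr<KernelSketch>>& Kernels() {
    if (!created_) {
      for (const auto& op : ops_) {
        auto kernel = MakeKernel(op);
        assert(kernel != nullptr);  // sketch assumes every op is supported
        kernels_.push_back(std::move(kernel));
      }
      created_ = true;
      ++create_count_;
    }
    return kernels_;
  }

  int create_count() const { return create_count_; }

 private:
  std::vector<std::string> ops_;
  std::vector<std::shared_ptr<KernelSketch>> kernels_;
  bool created_ = false;
  int create_count_ = 0;
};
```

Calling `Kernels()` repeatedly on the same `SubgraphSketch` returns the same vector, so the cost of building kernels is paid once and later runs only need to bind new data.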

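```
SubgraphPrimitive::Compute()
    // Bind the input and output tensors of this run to every kernel.
    for (auto& kernel : kernels) {
      kernel->Bind(input_tensors, output_tensors);
    }
    // Submit the whole vector of primitives for execution.
    stream->submit(net);
```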

In the SubgraphPrimitive::Compute() method, we iterate through the MklDnn kernels and bind the input data. Then we submit the vector of primitives to the MKL-DNN stream.
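
The bind-then-submit pattern can be illustrated without MKL-DNN itself. In the sketch below, `PrimitiveSketch` and `Submit` are hypothetical stand-ins: `Bind` only records where data lives, and `Submit` plays the role of `stream->submit(net)` by running the primitives in order. The real stream API is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for one primitive: applies `op` element-wise from src to dst.
struct PrimitiveSketch {
  const std::vector<float>* src = nullptr;
  std::vector<float>* dst = nullptr;
  std::function<float(float)> op;

  // Binding stores pointers only; no computation happens here.
  void Bind(const std::vector<float>* in, std::vector<float>* out) {
    src = in;
    dst = out;
  }

  void Execute() const {
    assert(src != nullptr && dst != nullptr && op);
    dst->resize(src->size());
    for (std::size_t i = 0; i < src->size(); ++i) {
      (*dst)[i] = op((*src)[i]);
    }
  }
};

// Stand-in for submitting the vector of primitives: run them in order.
void Submit(const std::vector<PrimitiveSketch>& net) {
  for (const auto& primitive : net) {
    primitive.Execute();
  }
}
```

Because the second primitive is bound to the first primitive's output buffer, intermediate data flows from one primitive to the next without being copied back to the caller between steps.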