diff --git a/README.md b/README.md
index af60d60367..0226ee6783 100644
--- a/README.md
+++ b/README.md
@@ -1,32 +1,33 @@

-**ONNX Runtime** is a cross-platform **inference and training machine-learning accelerator** compatible with deep learning frameworks, PyTorch and TensorFlow/Keras, as well as classical machine learning libraries such as scikit-learn, and more.
+**ONNX Runtime is a cross-platform inference and training machine-learning accelerator**.
-ONNX Runtime uses the portable [ONNX](https://onnx.ai) computation graph format, backed by execution providers optimized for operating systems, drivers and hardware.
+**ONNX Runtime inference** can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable, alongside graph optimizations and transforms. [Learn more →](https://www.onnxruntime.ai/docs/#onnx-runtime-for-inferencing)
-Common use cases for ONNX Runtime:
-
-* Improve inference performance for a wide variety of ML models
-* Reduce time and cost of training large models
-* Train in Python but deploy into a C#/C++/Java app
-* Run with optimized performance on different hardware and operating systems
-* Support models created in several different frameworks
-
-[ONNX Runtime inference](https://www.onnxruntime.ai/docs/get-started/inference.html) APIs are stable and production-ready since the [1.0 release](https://github.com/microsoft/onnxruntime/releases/tag/v1.0.0) in October 2019 and can enable faster customer experiences and lower costs.
-
-[ONNX Runtime training](https://www.onnxruntime.ai/docs/get-started/training.html) feature was introduced in May 2020 in preview. This feature supports acceleration of PyTorch training on multi-node NVIDIA GPUs for transformer models. Additional updates for this feature are coming soon.
+**ONNX Runtime training** can accelerate model training on multi-node NVIDIA GPUs for transformer models, with a one-line addition to existing PyTorch training scripts. [Learn more →](https://www.onnxruntime.ai/docs/#onnx-runtime-for-training)
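The "one-line addition" refers to wrapping an existing `torch.nn.Module` so that its forward and backward passes run through ONNX Runtime. A minimal sketch, assuming a training-enabled package that exposes `ORTModule` (e.g. `onnxruntime-training`); the model and data here are placeholders:

```python
import torch
from onnxruntime.training import ORTModule  # assumes a training-enabled ONNX Runtime package

model = torch.nn.Sequential(
    torch.nn.Linear(784, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10)
)
model = ORTModule(model)  # the one-line change; the rest of the script is unchanged

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

inputs = torch.randn(32, 784)          # placeholder batch
labels = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()                        # gradients are computed through ONNX Runtime
optimizer.step()
```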
 
 ## Get Started
 
 **http://onnxruntime.ai/**
 
-* [Install](https://www.onnxruntime.ai/docs/get-started/install.html)
-* [Inference](https://www.onnxruntime.ai/docs/get-started/inference.html)
-* [Training](https://www.onnxruntime.ai/docs/get-started/training.html)
-* [Documentation](https://www.onnxruntime.ai/docs/)
-* [Samples and Tutorials](https://www.onnxruntime.ai/docs/tutorials/)
-* [Build Instructions](https://www.onnxruntime.ai/docs/how-to/build.html)
-* [Frequently Asked Questions](./docs/FAQ.md)
+* [Overview](https://www.onnxruntime.ai/docs/)
+* [Tutorials](https://www.onnxruntime.ai/docs/tutorials/)
+  * [Inferencing](https://www.onnxruntime.ai/docs/tutorials/inferencing/)
+  * [Training](https://www.onnxruntime.ai/docs/tutorials/training/)
+* [How To](https://www.onnxruntime.ai/docs/how-to)
+  * [Install](https://www.onnxruntime.ai/docs/how-to/install.html)
+  * [Build](https://www.onnxruntime.ai/docs/how-to/build/)
+  * [Tune performance](https://www.onnxruntime.ai/docs/how-to/tune-performance.html)
+  * [Quantize models](https://www.onnxruntime.ai/docs/how-to/quantization.html)
+  * [Deploy on mobile](https://www.onnxruntime.ai/docs/how-to/deploy-on-mobile.html)
+  * [Use custom ops](https://www.onnxruntime.ai/docs/how-to/add-custom-op.html)
+  * [Add a new EP](https://www.onnxruntime.ai/docs/how-to/add-execution-provider.html)
+* [Reference](https://www.onnxruntime.ai/docs/reference)
+  * [API documentation](https://www.onnxruntime.ai/docs/reference/api/)
+  * [Execution Providers](https://www.onnxruntime.ai/docs/reference/execution-providers/)
+  * [Releases and servicing](https://www.onnxruntime.ai/docs/reference/releases-servicing.html)
+  * [Citing](https://www.onnxruntime.ai/docs/reference/citing.html)
+* [Additional resources](https://www.onnxruntime.ai/docs/resources/)
 
 ## Build Pipeline Status
 
 |System|CPU|GPU|EPs|
@@ -41,7 +42,7 @@ Common use cases for ONNX Runtime:
 
 ## Data/Telemetry
 
-This project may collect usage data and send it to Microsoft to help improve our products and services. See the [privacy statement](docs/Privacy.md) for more details.
+Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the [privacy statement](docs/Privacy.md) for more details.
 
 ## Contributions and Feedback
diff --git a/docs/ONNX_Runtime_Mobile_NNAPI_perf_considerations.md b/docs/ONNX_Runtime_Mobile_NNAPI_perf_considerations.md
deleted file mode 100644
index 9a6fe176dc..0000000000
--- a/docs/ONNX_Runtime_Mobile_NNAPI_perf_considerations.md
+++ /dev/null
@@ -1,111 +0,0 @@
-# ONNX Runtime Mobile: Performance Considerations When Using NNAPI
-
-ONNX Runtime Mobile with the NNAPI Execution Provider (EP) can be used to execute ORT format models on Android platforms using NNAPI. This document explains how the different optimizations affect performance, and provides some suggestions for performance testing with ORT format models.
-
-Please review the introductory details for [using NNAPI with ONNX Runtime Mobile](ONNX_Runtime_for_Mobile_Platforms.md#Using-NNAPI-with-ONNX-Runtime-Mobile) first.
-
-
-## 1. ONNX Model Optimization Example
-
-ONNX Runtime applies optimizations to the ONNX model to improve inferencing performance. These optimizations occur prior to exporting an ORT format model.
-See the [graph optimization](https://www.onnxruntime.ai/docs/resources/graph-optimizations.html) documentation for further details of the available optimizations.
-
-It is important to understand how the different optimization levels affect the nodes in the model, as this will determine how much of the model can be executed using NNAPI.
-
-*Basic*
-
-The _basic_ optimizations remove redundant nodes and perform constant folding. Only ONNX operators are used by these optimizations when modifying the model.
-
-*Extended*
-
-The _extended_ optimizations replace one or more standard ONNX operators with custom internal ONNX Runtime operators to boost performance. Each optimization has a list of EPs that it is valid for. It will only replace nodes that are assigned to that EP, and the replacement node will be executed using the same EP.
-
-*Layout*
-
-_Layout_ optimizations are hardware specific, and should not be used when creating ORT format models.
-
-### Outcome of optimizations when creating an optimized ORT format model
-
-Below is an example of the changes that occur in the _basic_ and _extended_ optimizations when applied to the MNIST model with only the CPU EP enabled. The optimization level is specified when creating the ORT format model using `convert_onnx_models_to_ort.py`.
-
-  - At the _basic_ level we combine the Conv and Add nodes (the addition is done via the 'B' input to Conv), combine the MatMul and Add into a single Gemm node (the addition is done via the 'C' input to Gemm), and constant fold to remove one of the Reshape nodes.
-    - `python <ONNX Runtime repository root>/tools/python/convert_onnx_models_to_ort.py --optimization_level basic /dir_with_mnist_onnx_model`
-  - At the _extended_ level we additionally fuse the Conv and Relu nodes using the internal ONNX Runtime FusedConv operator.
-    - `python <ONNX Runtime repository root>/tools/python/convert_onnx_models_to_ort.py --optimization_level extended /dir_with_mnist_onnx_model`
-
-_(Figure: changes to nodes from the basic and extended optimizations.)_
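To inspect these node-level changes yourself, one approach (not part of the original document's script-based flow) is to have ONNX Runtime write out the graph optimized at each level and compare the results in a viewer such as Netron; `mnist.onnx` is a placeholder path:

```python
import onnxruntime as ort

# Save copies of the model optimized at the basic and extended levels so the
# fusions described above (Conv+Add, Gemm creation, FusedConv) can be inspected.
for name, level in [
    ("basic", ort.GraphOptimizationLevel.ORT_ENABLE_BASIC),
    ("extended", ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED),
]:
    so = ort.SessionOptions()
    so.graph_optimization_level = level
    so.optimized_model_filepath = f"mnist_{name}.onnx"
    # Creating the session writes the optimized model to the path above.
    _ = ort.InferenceSession("mnist.onnx", so)
```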
-### Outcome of executing an optimized ORT format model using the NNAPI EP
-
-If the NNAPI EP is registered at runtime, it is given an opportunity to select the nodes in the loaded model that it can execute. When doing so it will group as many nodes together as possible to minimize the overhead of copying data between the CPU and NNAPI to execute the nodes. Each group of nodes can be considered as a sub-graph. The more nodes in each sub-graph, and the fewer sub-graphs, the better the performance will be.
-
-For each sub-graph, the NNAPI EP will create an [NNAPI model](https://developer.android.com/ndk/guides/neuralnetworks#model) that replicates the processing of the original nodes. It will create a function that executes this NNAPI model and performs any required data copies between CPU and NNAPI. ONNX Runtime will replace the original nodes in the loaded model with a single node that calls this function.
-
-If the NNAPI EP is not registered, or cannot process a node, the node will be executed using the CPU EP.
-
-Below is an example for the MNIST model comparing what happens to the ORT format models at runtime if the NNAPI EP is registered.
-
-As the _basic_ level optimizations result in a model that only uses ONNX operators, the NNAPI EP is able to handle the majority of the model, since NNAPI can execute the Conv, Relu and MaxPool nodes. This is done with a single NNAPI model, as all the nodes NNAPI can handle are connected. We would expect performance gains from using NNAPI with this model, as the overhead of the device copies between CPU and NNAPI for a single NNAPI node is likely to be exceeded by the time saved executing multiple operations at once using NNAPI.
-
-The _extended_ level optimizations introduce the custom FusedConv nodes, which the NNAPI EP ignores, as it will only take nodes that use ONNX operators NNAPI can handle. This results in two nodes using NNAPI, each handling a single MaxPool operation. The performance of this model is likely to be adversely affected, as the overhead of the device copies between CPU and NNAPI (which are required before and after each of the two NNAPI nodes) is unlikely to be exceeded by the time saved executing a single MaxPool operation each time using NNAPI. Better performance may be obtainable by not registering the NNAPI EP, so that all nodes in the model are executed using the CPU EP.
-
-_(Figure: changes to nodes by the NNAPI EP depending on the optimization level the model was created with.)_
-
-## 2. Initial Performance Testing
-
-The best optimization settings will differ by model. Some models may perform better with NNAPI, and some may not. As performance is model specific, you must run performance tests to determine the best combination for your model.
-
-It is suggested to run performance tests:
-  - with NNAPI enabled and an ORT format model created with _basic_ level optimization
-  - with NNAPI disabled and an ORT format model created with _extended_ level optimization
-
-For most scenarios it is expected that one of these two approaches will yield the best performance.
-
-If using an ORT format model with _basic_ level optimizations and NNAPI yields equivalent or better performance, it _may_ be possible to further improve performance by creating an NNAPI-aware ORT format model. The difference with this model is that _extended_ optimizations are applied to the nodes that cannot be executed using NNAPI. Whether any nodes fall into this category is model dependent.
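As a rough sketch of the kind of measurement suggested in section 2 (not part of the original document), the Python API can be used to time one configuration; `model.ort`, the float32 input type, and the run counts are placeholder assumptions, and for NNAPI it is the on-device numbers that ultimately matter:

```python
import time
import numpy as np
import onnxruntime as ort

# Minimal latency check for one configuration; repeat for each combination of
# optimization level / EP being compared. "model.ort" is a placeholder path.
session = ort.InferenceSession("model.ort")
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # pin dynamic dims to 1
x = np.random.rand(*shape).astype(np.float32)                # assumes a float32 input

for _ in range(10):  # warm-up runs
    session.run(None, {inp.name: x})

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, {inp.name: x})
print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```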
-## 3. Creating an NNAPI-aware ORT format model
-
-An NNAPI-aware ORT format model will keep all nodes from the ONNX model that can be executed using NNAPI, and allow _extended_ optimizations to be applied to any remaining nodes.
-
-For our MNIST model that would mean that after the _basic_ optimizations are applied, the nodes in the red shading are kept as-is, and the nodes in the green shading could have _extended_ optimizations applied to them.
-
-_(Figure: nodes that are preserved because NNAPI can execute them, and nodes that are considered by the extended optimizations.)_
-
-To create an NNAPI-aware ORT format model, follow these steps.
-
-1. Create a 'full' build of ONNX Runtime with the NNAPI EP by [building ONNX Runtime from source](https://www.onnxruntime.ai/docs/how-to/build.html#cpu).
-
-   - This build can be done on any platform, as the NNAPI EP can be used to create the ORT format model without the Android NNAPI library, since there is no model execution in this process. When building, add `--use_nnapi --build_shared_lib --build_wheel` to the build flags if any of those are missing.
-   - Do NOT add the `--minimal_build` flag.
-
-   - Windows:
-     ```
-     <ONNX Runtime repository root>\build.bat --config RelWithDebInfo --use_nnapi --build_shared_lib --build_wheel --parallel
-     ```
-
-   - Linux:
-     ```
-     <ONNX Runtime repository root>/build.sh --config RelWithDebInfo --use_nnapi --build_shared_lib --build_wheel --parallel
-     ```
-
-   - **NOTE**: if you have previously done a minimal build, you will need to run `git reset --hard` to make sure any operator kernel exclusions are reversed prior to performing the 'full' build. If you do not, you may not be able to load the ONNX format model due to missing kernels.
-
-2. Install the python wheel from the build output directory.
-
-   - Windows: this is located in `build/Windows/<config>/<config>/dist/<package name>.whl`.
-
-   - Linux: this is located in `build/Linux/<config>/dist/<package name>.whl`.
-
-   The package name will differ based on your platform, python version, and build parameters. `<config>` is the value from the `--config` parameter of the build command.
-   ```
-   pip install -U build\Windows\RelWithDebInfo\RelWithDebInfo\dist\onnxruntime_noopenmp-1.5.2-cp37-cp37m-win_amd64.whl
-   ```
-
-3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](ONNX_Runtime_for_Mobile_Platforms.md#Create-ORT-format-model-and-configuration-file-with-required-operators), with NNAPI enabled (`--use_nnapi`) and the optimization level set to _extended_ (`--optimization_level extended`). This will allow extended level optimizations to run on any nodes that NNAPI cannot handle.
-
-   ```
-   python <ONNX Runtime repository root>/tools/python/convert_onnx_models_to_ort.py --use_nnapi --optimization_level extended /models
-   ```
-
-   The python package from your 'full' build with NNAPI enabled must be installed for `--use_nnapi` to be a valid option.
-
-This model can be used with [a minimal build that includes the NNAPI EP](ONNX_Runtime_for_Mobile_Platforms.md#Create-a-minimal-build-for-Android-with-NNAPI-support).
\ No newline at end of file
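Before running the step 3 conversion, it may be worth confirming that the python package in use is the wheel from the 'full' NNAPI-enabled build rather than the stock PyPI package. A minimal check along these lines (a sketch, not from the original document):

```python
import onnxruntime as ort

# A wheel built with --use_nnapi should report the NNAPI EP; the stock PyPI
# package will not.
print(ort.__version__)
providers = ort.get_available_providers()
print(providers)
assert "NnapiExecutionProvider" in providers, (
    "NNAPI EP not found - install the wheel from your --use_nnapi build"
)
```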
diff --git a/docs/ONNX_Runtime_for_Mobile_Platforms.md b/docs/ONNX_Runtime_for_Mobile_Platforms.md
deleted file mode 100644
index a86798d546..0000000000
--- a/docs/ONNX_Runtime_for_Mobile_Platforms.md
+++ /dev/null
@@ -1,185 +0,0 @@
-# ONNX Runtime for Mobile Platforms
-
-## Overview
-
-ONNX Runtime now supports an internal model format to minimize the build size for usage in mobile, edge and embedded scenarios. An ONNX model can be converted to an internal ONNX Runtime format ('ORT format model') using the instructions below.
-
-A minimal build can be used with any ORT format model, provided that the kernels for the operators used in the model were included in the build; i.e. the custom build provides a set of kernels, and if that set satisfies a given ORT format model's needs, the model can be loaded and executed.
-
-_(Figure: steps to build for mobile platforms.)_
-
-## Steps to create model and minimal build
-
-As you will need to perform a custom build of ONNX Runtime, you will need to clone the repository locally. See [here](https://www.onnxruntime.ai/docs/how-to/build.html#prerequisites) for the initial steps.
-
-The directory the ONNX Runtime repository was cloned into is referred to as `<ONNX Runtime repository root>` in this documentation.
-
-Once you have cloned the repository, perform the following steps to create a minimal build of ONNX Runtime that is model specific:
-
-### 1. Create ORT format model and configuration file with required operators
-
-We will use a helper python script to convert ONNX format models into ORT format models, and to create the configuration file for use with the minimal build.
-
-The configuration file specifies which operator kernels to include in the build.
-This allows unused operator kernels to be pruned in order to decrease the binary size.
-
-It is also possible (and optional) to further prune the operator kernel implementations based on their input and output type usage detected in the ORT format models.
-This pruning is referred to as "operator type reduction" in this documentation.
-
-- The helper python script requires the standard ONNX Runtime python package to be installed. Install the ONNX Runtime python package from https://pypi.org/project/onnxruntime/. Version 1.5.2 or later is required.
-  - `pip install onnxruntime`
-  - Ensure that any existing ONNX Runtime python package is uninstalled first, or use `-U` with the above command to upgrade an existing package.
-
-- Additionally, if you want to enable operator type reduction, ONNX Runtime version 1.7 or later is required, and the flatbuffers python package must be installed.
-  - `pip install flatbuffers`
-
-- Copy all the ONNX models you wish to convert and use with the minimal build into a directory.
-
-- Convert the ONNX models to ORT format
-  - Run the helper script to convert the models
-    - For ONNX Runtime version 1.8 or later you can use the ONNX Runtime python package:
-      - `python -m onnxruntime.tools.convert_onnx_models_to_ort <path to directory containing ONNX models>`
-    - Alternatively, use the conversion script in the ONNX Runtime repository:
-      - `python <ONNX Runtime repository root>/tools/python/convert_onnx_models_to_ort.py <path to directory containing ONNX models>`
-  - To enable operator type reduction, specify the `--enable_type_reduction` option
-  - For each ONNX model, an ORT format model will be created with '.ort' as the file extension
-  - A configuration file will also be created.
-    If operator type reduction is enabled, the file will be called `required_operators_and_types.config`.
-    Otherwise, the file will be called `required_operators.config`.
-
-Example:
-
-Running `python -m onnxruntime.tools.convert_onnx_models_to_ort /models`, where the '/models' directory contains ModelA.onnx and ModelB.onnx:
-  - Will create /models/ModelA.ort and /models/ModelB.ort
-  - Will create /models/required_operators.config
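As a quick sanity check after conversion (a sketch assuming the example's ModelA.ort output above), a converted model can be loaded and inspected with the standard python API:

```python
import onnxruntime as ort

# Loading one of the converted models confirms the conversion succeeded;
# ORT format is inferred from the .ort file extension.
session = ort.InferenceSession("/models/ModelA.ort")
print([(i.name, i.shape, i.type) for i in session.get_inputs()])
```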
-### 2. Create the minimal build
-
-You will need to build ONNX Runtime from source to reduce the included operator kernels and other aspects of the binary.
-
-See [here](https://www.onnxruntime.ai/docs/how-to/build.html#cpu) for the general ONNX Runtime build instructions.
-
-#### Binary size reduction options:
-
-The following options can be used to reduce the build size. Enable all options that your scenario allows.
-
-  - Reduce build to required operator kernels
-    - Add `--include_ops_by_config <config file from model conversion> --skip_tests` to the build parameters.
-    - To enable operator type reduction, also add `--enable_reduced_operator_type_support`.
-    - See the documentation on the [Reduced Operator Kernel build](Reduced_Operator_Kernel_build.md) for more information. This step can also be done pre-build if needed.
-    - NOTE: This step will edit some of the ONNX Runtime source files to exclude unused kernels. If you wish to go back to creating a full build, or wish to change the operator kernels included, you should run `git reset --hard` or `git checkout HEAD -- ./onnxruntime/core/providers` to undo these changes.
-
-  - Enable minimal build (`--minimal_build`)
-    - A minimal build will ONLY support loading and executing ORT format models.
-    - RTTI is disabled by default in this build, unless the Python bindings (`--build_wheel`) are enabled.
-    - If you wish to enable execution providers that compile kernels, such as NNAPI, specify `--minimal_build extended`.
-    - See [here](#Using-NNAPI-with-ONNX-Runtime-Mobile) for more information about using NNAPI with ONNX Runtime Mobile on Android platforms.
-
-  - Disable exceptions (`--disable_exceptions`)
-    - Disables support for exceptions in the build.
-      - Any locations that would have thrown an exception will instead log the error message and call abort().
-      - Requires `--minimal_build`.
-      - NOTE: This is not a valid option if you need the Python bindings (`--build_wheel`), as the Python wheel requires exceptions to be enabled.
-    - Exceptions are only used in ONNX Runtime for exceptional things. If you have validated the input to be used, and validated that the model can be loaded, it is unlikely that ORT would throw an exception unless there's a system level issue (e.g. out of memory).
-
-  - ML op support (`--disable_ml_ops`)
-    - Whilst the operator kernel reduction script will disable all unused ML operator kernels, additional savings can be achieved by removing support for ML specific types. If you know that your model has no ML ops, or no ML ops that use the Map type, this flag can be provided.
-    - See the specs for the [ONNX ML Operators](https://github.com/onnx/onnx/blob/master/docs/Operators-ml.md) if unsure.
-
-  - Use shared libc++ on Android (`--android_cpp_shared`)
-    - Building using the shared libc++ library instead of the default static libc++ library will result in a smaller libonnxruntime.so library.
-    - See the [Android NDK documentation](https://developer.android.com/ndk/guides/cpp-support) for more information.
-
-#### Build Configuration
-
-The `MinSizeRel` configuration will produce the smallest binary size.
-The `Release` configuration could also be used if you wish to prioritize performance over binary size.
-
-#### Example build commands
-
-##### Windows
-
-`<ONNX Runtime repository root>\build.bat --config=MinSizeRel --cmake_generator="Visual Studio 16 2019" --build_shared_lib --minimal_build --disable_ml_ops --disable_exceptions --include_ops_by_config <config file from model conversion> --skip_tests`
-
-##### Linux
-
-`<ONNX Runtime repository root>/build.sh --config=MinSizeRel --build_shared_lib --minimal_build --disable_ml_ops --disable_exceptions --include_ops_by_config <config file from model conversion> --skip_tests`
-
-##### Building ONNX Runtime Python Wheel as part of Minimal build
-
-Remove `--disable_exceptions` (Python requires exceptions to be enabled) and add `--build_wheel` to build a Python wheel with the ONNX Runtime bindings.
-A .whl file will be produced in the build output directory under the `/dist` folder.
-
-  - The Python wheel for a Windows MinSizeRel build using build.bat would be in `<ONNX Runtime repository root>\build\Windows\MinSizeRel\MinSizeRel\dist\`
-  - The Python wheel for a Linux MinSizeRel build using build.sh would be in `<ONNX Runtime repository root>/build/Linux/MinSizeRel/dist/`
-The wheel can be installed using `pip`. Adjust the following command for your platform and the whl filename.
-  - `pip install -U .\build\Windows\MinSizeRel\MinSizeRel\dist\onnxruntime-1.4.0-cp37-cp37m-win_amd64.whl`
-
-## Executing ORT format models
-
-The API for executing ORT format models is the same as for ONNX models. See the [ONNX Runtime API documentation](https://github.com/Microsoft/onnxruntime/#api-documentation).
-
-If you provide a filename for the model, a file extension of '.ort' will be inferred to indicate an ORT format model.
-If you provide in-memory bytes for the ORT format model, a marker in those bytes will be checked to infer if it's an ORT format model.
-
-If you wish to explicitly say that the InferenceSession input is an ORT format model, you can do so via SessionOptions.
-
-C++ API
-```C++
-Ort::SessionOptions session_options;
-session_options.AddConfigEntry("session.load_model_format", "ORT");
-```
-
-Python
-```python
-so = onnxruntime.SessionOptions()
-so.add_session_config_entry('session.load_model_format', 'ORT')
-session = onnxruntime.InferenceSession('<path to model>', so)
-```
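A sketch combining the two behaviours described above, loading from in-memory bytes with the explicit session config entry; `model.ort` is a placeholder path:

```python
import onnxruntime

so = onnxruntime.SessionOptions()
# Explicitly declare the bytes to be an ORT format model rather than relying
# on the marker check described above.
so.add_session_config_entry('session.load_model_format', 'ORT')

with open('model.ort', 'rb') as f:  # placeholder path
    model_bytes = f.read()

session = onnxruntime.InferenceSession(model_bytes, so)
```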
-## Using NNAPI with ONNX Runtime Mobile
-
-Using the NNAPI Execution Provider on Android platforms is now supported by ONNX Runtime Mobile. A minimal build targeting Android with NNAPI support must be created. An ORT format model that only uses ONNX operators is also recommended as a starting point.
-
-For a more in-depth analysis of the performance considerations when using NNAPI with an ORT format model, please see [ONNX Runtime Mobile: Performance Considerations When Using NNAPI](ONNX_Runtime_Mobile_NNAPI_perf_considerations.md).
-
-### Limit ORT format model to ONNX operators
-
-The NNAPI Execution Provider is only able to execute ONNX operators using NNAPI. When creating the ORT format model, it is recommended to limit the optimization level to 'basic' so that custom internal ONNX Runtime operators are not added by the 'extended' or 'all' optimization levels. This will ensure that the maximum number of nodes can be executed using NNAPI. See the [graph optimization](https://www.onnxruntime.ai/docs/resources/graph-optimizations.html) documentation for details on the optimization levels.
-
-To limit the optimization level when creating the ORT format models using `tools\python\convert_onnx_models_to_ort.py` as per the above [instructions](#1-Create-ORT-format-model-and-configuration-file-with-required-operators), add `--optimization_level basic` to the arguments.
-  - e.g. `python -m onnxruntime.tools.convert_onnx_models_to_ort --optimization_level basic /models` or `python <ONNX Runtime repository root>/tools/python/convert_onnx_models_to_ort.py --optimization_level basic /models`
-
-### Create a minimal build for Android with NNAPI support
-
-For NNAPI to be used on Android with ONNX Runtime Mobile, the NNAPI Execution Provider must be included in the minimal build.
-
-First, read the general instructions for [creating an Android build with NNAPI included](https://www.onnxruntime.ai/docs/how-to/build.html#android-nnapi-execution-provider). These provide details on setting up the components required to create an Android build of ONNX Runtime, such as the Android NDK.
-
-Once you have all the necessary components set up, follow the instructions to [create the minimal build](#2-Create-the-minimal-build), with the following changes:
-  - Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is needed by the NNAPI Execution Provider.
-  - Add `--use_nnapi` to include the NNAPI Execution Provider in the build
-  - Windows example:
-    `.\build.bat --config RelWithDebInfo --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config <config file from model conversion>`
-  - Linux example:
-    `./build.sh --config RelWithDebInfo --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config <config file from model conversion>`
-
-## Limitations
-
-A minimal build currently has the following limitations:
-  - No support for ONNX format models
-    - The model must be converted to ORT format
-  - No support for runtime optimizations
-    - Optimizations should be performed prior to conversion to ORT format
-  - Execution providers that statically register kernels (e.g. the ONNX Runtime CPU Execution Provider) are supported by default
-  - Limited support for runtime partitioning (assigning nodes in a model to specific execution providers)
-    - Execution providers that statically register kernels and will be used at runtime MUST be enabled when creating the ORT format model
-    - Execution providers that compile nodes are optionally supported, and nodes they create will be correctly partitioned at runtime
-      - Currently this is limited to the NNAPI Execution Provider
-
-We do not currently offer backwards compatibility guarantees for ORT format models, as we will be expanding the capabilities in the short term and may need to update the internal format in an incompatible manner to accommodate these changes. You may need to regenerate the ORT format models to use with a future version of ONNX Runtime. Once the feature set stabilizes, we will provide backwards compatibility guarantees.
-