onnxruntime/docs/C_API.md

# C API

## Features

* Creating an InferenceSession from an on-disk model file and a set of SessionOptions.
* Registering customized loggers.
* Registering customized allocators.
* Registering predefined execution providers and setting their priority order. ONNX Runtime has a set of predefined execution providers, such as CUDA and DNNL. Users can register providers with their InferenceSession. The order of registration also indicates the order of preference.
* Running a model with inputs. These inputs must be in CPU memory, not GPU. If the model has multiple outputs, users can specify which outputs they want.
* Converting an in-memory ONNX Tensor encoded in protobuf format to a pointer that can be used as model input.
* Setting the thread pool size for each session.
* Setting graph optimization level for each session.
* Dynamically loading custom ops. [Instructions](/docs/AddingCustomOp.md)
* Loading a model from a byte array. See ```OrtCreateSessionFromArray``` in [onnxruntime_c_api.h](/include/onnxruntime/core/session/onnxruntime_c_api.h).
* **Global/shared threadpools:** By default, each session creates its own set of threadpools. When multiple
sessions are created in the same process (to infer different models), the process ends up with several threadpools,
one set per session. The global/shared threadpools feature addresses this inefficiency by sharing a set of global
threadpools across multiple sessions. Typical usage of this feature
is as follows:
   * Populate ```ThreadingOptions```. Use the value of 0 for ORT to pick the defaults.
   * Create env using ```CreateEnvWithGlobalThreadPools()```
   * Create session and call ```DisablePerSessionThreads()``` on the session options object
   * Call ```Run()``` as usual
* **Share allocator(s) between sessions:**
   * *Description*: This feature allows multiple sessions in the same process to use the same allocator(s).
   * *Scenario*: You have several sessions in the same process and see high memory usage. One reason is that each session creates its own CPU allocator, which is arena based by default. [ORT implements](onnxruntime/core/framework/bfc_arena.h) a simplified version of an arena allocator that is based on [Doug Lea's best-fit with coalescing algorithm](http://gee.cs.oswego.edu/dl/html/malloc.html). Each allocator lives in its own session. It allocates a large region of memory at initialization and thereafter chunks, coalesces and extends this initial region according to allocation and deallocation demands. Over time, the arena ends up with unused chunks of memory per session. Moreover, the memory allocated by the arena is never returned to the system; once allocated, it remains allocated. These factors add up when using multiple sessions (each with its own arena), increasing the overall memory consumption of the process. Sharing the arena allocator between sessions avoids this.
   * *Usage*:
      * Create and register a shared allocator with the env using the ```CreateAndRegisterAllocator``` API. This allocator is then reused by all sessions that use the same env instance unless a session chooses to override this by setting ```session.use_env_allocators``` to "0".
      * Set ```session.use_env_allocators``` to "1" for each session that wants to use the env registered allocators.
      * See test ```TestSharedAllocatorUsingCreateAndRegisterAllocator``` in
     onnxruntime/test/shared_lib/test_inference.cc for an example.
      * Configuring *OrtArenaCfg*:
         * Default values for these configs can be found in the [BFCArena class](onnxruntime/core/framework/bfc_arena.h).
         * ```initial_chunk_size_bytes```: This is the size of the region that the arena allocates first. Chunks are handed over to allocation requests from this region. If the logs show that the arena is getting extended a lot more than expected, you're better off choosing a big enough initial size for this.
         * ```max_mem```: This is the maximum amount of memory the arena allocates. If a chunk cannot be serviced by any existing region, the arena extends itself by allocating one more region depending on available memory (max_mem - allocated_so_far). An error is returned if available memory is less than the requested extension.
         * ```arena_extend_strategy```: This can take only 2 values currently: kSameAsRequested or kNextPowerOfTwo. As the name suggests kNextPowerOfTwo (the default) extends the arena by a power of 2, while kSameAsRequested extends by a size that is the same as the allocation request each time. kSameAsRequested is suited for more advanced configurations where you know the expected memory usage in advance.
         * ```max_dead_bytes_per_chunk```: This controls whether a chunk is split to service an allocation request. Currently if the difference between the chunk size and requested size is less than this value, the chunk is not split. This has the potential to waste memory by keeping a part of the chunk unused (hence called dead bytes) throughout the process thereby increasing the memory usage (until this chunk is returned to the arena).

* **Share initializer(s) between sessions:**
   * *Description*: This feature allows a user to share the same instance of an initializer across
multiple sessions.
   * *Scenario*: You have several models that use the same set of initializers, except for the last few layers of the model, and you load these models in the same process. When every model (session) creates a separate instance of the same initializer, memory is wasted on identical copies. You want to optimize memory usage while keeping the flexibility to allocate the initializers yourself (possibly even storing them in shared memory).
   * *Example Usage*: Use the ```AddInitializer``` API to add a pre-allocated initializer to session options before calling ```CreateSession```. Use the same instance of session options to create several sessions allowing the initializer(s) to be shared between the sessions. See [C API sample usage (TestSharingOfInitializer)](../onnxruntime/test/shared_lib/test_inference.cc) and [C# API sample usage (TestWeightSharingBetweenSessions)](../csharp/test/Microsoft.ML.OnnxRuntime.Tests/InferenceTest.cs).
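
The *OrtArenaCfg* settings described above can be illustrated with a small, self-contained model. This is a sketch only, not ORT's BFCArena code: all `sketch_*` names are hypothetical, and rounding a request up to the next power of two is an assumption about what "extends the arena by a power of 2" means. It shows how ```arena_extend_strategy``` and ```max_mem``` interact when the arena needs a new region, and how ```max_dead_bytes_per_chunk``` decides whether a chunk is split.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two arena_extend_strategy values. */
typedef enum {
  SKETCH_NEXT_POWER_OF_TWO,
  SKETCH_SAME_AS_REQUESTED
} sketch_extend_strategy;

/* Smallest power of two >= n; saturates instead of overflowing. */
static size_t sketch_next_pow2(size_t n) {
  size_t p = 1;
  while (p < n && p <= SIZE_MAX / 2) {
    p <<= 1;
  }
  return p;
}

/* Size of the region the modelled arena would add for `requested` bytes.
 * Returns 0 (the error case) when the extension does not fit into the
 * remaining budget, max_mem - allocated_so_far. */
static size_t sketch_extension_size(sketch_extend_strategy strategy,
                                    size_t requested,
                                    size_t allocated_so_far,
                                    size_t max_mem) {
  size_t available =
      max_mem > allocated_so_far ? max_mem - allocated_so_far : 0;
  size_t wanted = (strategy == SKETCH_NEXT_POWER_OF_TWO)
                      ? sketch_next_pow2(requested)
                      : requested;
  return wanted <= available ? wanted : 0;
}

/* A chunk is split only if the leftover is at least
 * max_dead_bytes_per_chunk; otherwise the leftover stays attached
 * to the chunk as unused "dead bytes". */
static bool sketch_should_split(size_t chunk_size,
                                size_t requested,
                                size_t max_dead_bytes_per_chunk) {
  return chunk_size >= requested &&
         chunk_size - requested >= max_dead_bytes_per_chunk;
}
```

In this model, a 5000-byte request with 6000 bytes of budget left fails under the power-of-two strategy (it wants 8192 bytes) but succeeds under the same-as-requested strategy, which is one reason the latter can suit configurations where memory usage is known in advance.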

## Usage Overview

1. Include [onnxruntime_c_api.h](/include/onnxruntime/core/session/onnxruntime_c_api.h).
2. Call OrtCreateEnv
3. Create Session: OrtCreateSession(env, model_uri, nullptr,...)
   - Optionally add more execution providers (e.g. for CUDA use OrtSessionOptionsAppendExecutionProvider_CUDA)
4. Create Tensor
   1) OrtCreateMemoryInfo
   2) OrtCreateTensorWithDataAsOrtValue
5. OrtRun

## Sample code

The example below shows a sample run using the SqueezeNet model from ONNX model zoo, including dynamically reading model inputs, outputs, shape and type information, as well as running a sample vector and fetching the resulting class probabilities for inspection.

* [../csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp](../csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp)

## Deployment

### Windows 10

Your installer should put onnxruntime.dll in the same folder as your application. Your application can use either [load-time dynamic linking](https://docs.microsoft.com/en-us/windows/win32/dlls/using-load-time-dynamic-linking) or [run-time dynamic linking](https://docs.microsoft.com/en-us/windows/win32/dlls/using-run-time-dynamic-linking) to bind to the DLL.

#### Dynamic Link Library Search Order

This article explains how Windows finds supporting DLLs: [Dynamic Link Library Search Order](https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order).

In some cases the app does not consume onnxruntime directly but instead calls into a DLL that consumes onnxruntime. People building such DLLs need to take care with folder structures. Do not modify the system %path% variable to add your folders. This can conflict with other software on the machine that also uses onnxruntime. Instead, place your DLL and the onnxruntime DLL in the same folder and use [run-time dynamic linking](https://docs.microsoft.com/en-us/windows/win32/dlls/using-run-time-dynamic-linking) to bind explicitly to that copy. You can use code like [GetModulePath()](https://github.com/microsoft/Windows-Machine-Learning/blob/master/Samples/SampleSharedLib/SampleSharedLib/FileHelper.cpp) in this sample to find out which folder your DLL is loaded from.

## Telemetry

To turn on/off telemetry collection on official Windows builds, please use Enable/DisableTelemetryEvents() in the C API. See the [Privacy](./Privacy.md) page for more information on telemetry collection and Microsoft's privacy policy.