onnxruntime/cmake
Tang, Cheng a81faee41e
Multi-stream execution support (#13495)
**Description**: This PR includes the following work:
1. Provide stream and related synchronization abstractions in
onnxruntime.
2. Enhance onnxruntime's execution planner / executor / memory arena to
support executing multiple streams in parallel.
3. Deprecate the parallel executor for CPU.
4. Deprecate the Fence mechanism.
5. Update the CUDA / TensorRT EPs to support the stream mechanism, so
that different requests can run in different CUDA streams.

**Motivation and Context**
- Why is this change required?
Currently, the execution plan is just a linear list of primitives that
ORT executes step by step. For any given graph, ORT serializes it into a
fixed execution order. This sequential execution design simplifies most
scenarios, but it has the following limitations:
1. It is difficult to enable inter-node parallelization. We have a
half-baked parallel executor, but it is very difficult to make it work
with GPUs.
2. The Fence mechanism works for the single GPU stream + CPU thread
case, but when extended to multiple streams, it becomes difficult to
manage cross-stream GPU synchronization.
3. Our CUDA EP relies on the BFCArena to make memory management work
with asynchronous GPU kernels, but the current BFCArena is not aware of
streams, so it does not behave correctly when run with multiple
streams.
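
To make limitation 1 concrete, here is a minimal sketch (not ORT code) that contrasts a serialized order with the groups of nodes that could in principle run concurrently. The toy graph and the helper names `linear_plan` and `parallel_levels` are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical toy graph: node -> set of nodes it depends on.
GRAPH = {
    "load": set(),
    "conv_a": {"load"},
    "conv_b": {"load"},
    "add": {"conv_a", "conv_b"},
}


def linear_plan(graph):
    """Return one fixed, serialized order, as a sequential plan would run it."""
    return list(TopologicalSorter(graph).static_order())


def parallel_levels(graph):
    """Group nodes into waves; nodes in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        levels.append(ready)
        sorter.done(*ready)
    return levels
```

In this toy graph `conv_a` and `conv_b` land in the same wave, so a purely linear plan leaves that potential overlap unused.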

This PR enhances our existing execution plan and executor to support
multi-stream execution. We use a unified algorithm to manage both
single-stream and multi-stream scenarios.
This PR mainly focuses on the infrastructure for multi-stream
execution; that is, given a valid stream assignment, onnxruntime
can execute it correctly. How to generate a good stream assignment for a
given model is left to a future PR.
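
As a rough illustration of "given a valid stream assignment, execute it correctly", the sketch below runs each logical stream on its own Python thread and uses `threading.Event` as a stand-in for a cross-stream notification. It is an assumption-laden toy, not ORT's implementation: `run_stream_plan`, `stream_of`, and `kernels` are hypothetical names, threads stand in for device streams, and the assignment is assumed valid in the sense that each stream's step order respects the graph's dependencies.

```python
import threading


def run_stream_plan(graph, stream_of, kernels):
    """Run every logical stream on its own thread, in per-stream order.

    graph:     node -> set of nodes it depends on
    stream_of: node -> stream id; insertion order defines each stream's step order
    kernels:   node -> zero-argument callable that does the node's work
    """
    # One notification per node, set once that node has finished.
    done = {node: threading.Event() for node in graph}

    # Split the nodes into per-stream step lists.
    streams = {}
    for node, stream_id in stream_of.items():
        streams.setdefault(stream_id, []).append(node)

    def worker(steps):
        for node in steps:
            for dep in graph[node]:
                done[dep].wait()  # blocks only if the producer has not finished
            kernels[node]()
            done[node].set()      # notify consumers, possibly on other streams

    threads = [threading.Thread(target=worker, args=(steps,))
               for steps in streams.values()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

With a single stream id this degenerates to plain sequential execution, which mirrors the idea of handling single-stream and multi-stream cases with one algorithm.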

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Lei Cao <leca@microsoft.com>
2022-12-15 07:39:29 -08:00
..
external Add protobuf version constraint (#13870) 2022-12-08 16:14:16 -08:00
patches Patch Protobuf and ONNX's cmake files and enforce BinSkim check (#13694) 2022-11-18 10:09:47 -08:00
tensorboard Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
adjust_global_compile_flags.cmake Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
CMakeLists.txt Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888) 2022-12-14 08:32:46 -08:00
CMakeSettings.json
codeconv.runsettings
deps.txt [TensorRT EP] support TensorRT 8.5 (#13867) 2022-12-14 13:06:03 -08:00
EnableVisualStudioCodeAnalysis.props Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
gdk_toolchain.cmake Enable building with a GDK (#11126) 2022-04-07 15:06:31 -07:00
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake Remove miscellaneous nuphar configs (#13070) 2022-09-26 13:41:28 -07:00
onnxruntime_codegen_tvm.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_common.cmake Enabling thread pool to be numa-aware (#13778) 2022-12-12 10:33:55 -08:00
onnxruntime_config.h.in [wasm] update emscripten v2.0.34 (#10391) 2022-01-26 14:46:02 -08:00
onnxruntime_csharp.cmake Enable nuget packages for on device training (#13637) 2022-12-05 14:54:09 -08:00
onnxruntime_eager.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_flatbuffers.cmake Switch GSL to MS GSL 4.0.0 (#13416) 2022-10-29 04:15:20 -07:00
onnxruntime_framework.cmake Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888) 2022-12-14 08:32:46 -08:00
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888) 2022-12-14 08:32:46 -08:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake Add linux and macos arm64 java aritifacts (#10981) 2022-03-25 16:23:17 -07:00
onnxruntime_java_unittests.cmake
onnxruntime_kernel_explorer.cmake Share TunableOp between CUDA and ROCM EP (#13560) 2022-11-11 13:56:44 +08:00
onnxruntime_language_interop_ops.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_mlas.cmake Switch GSL to MS GSL 4.0.0 (#13416) 2022-10-29 04:15:20 -07:00
onnxruntime_nodejs.cmake Add Node.js binding support to packaging pipeline (#9577) 2021-11-05 15:29:40 -07:00
onnxruntime_objectivec.cmake Remove SafeInt dependency from Objective-C API. (#13698) 2022-11-18 17:06:12 -08:00
onnxruntime_opschema_lib.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_optimizer.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_providers.cmake Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
onnxruntime_pyop.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_python.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_rocm_hipify.cmake Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
onnxruntime_session.cmake Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888) 2022-12-14 08:32:46 -08:00
onnxruntime_snpe_provider.cmake cmake changes for SNPE EP (#11821) 2022-06-13 08:15:37 -07:00
onnxruntime_training.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_unittests.cmake [TensorRT EP] support TensorRT 8.5 (#13867) 2022-12-14 13:06:03 -08:00
onnxruntime_util.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_webassembly.cmake Fix usage of enable_training_ops and reduce ifdef complexity for training builds (#13888) 2022-12-14 08:32:46 -08:00
precompiled_header.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
Sdl.ruleset Update Sdl.ruleset to remove C26812 from the rules (#12695) 2022-09-01 20:05:20 -07:00
set_winapi_family_desktop.h
target_delayload.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
uwp_stubs.h Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
wcos_rules_override.cmake
winml.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
winml_cppwinrt.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
winml_sdk_helpers.cmake
winml_unittests.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00