onnxruntime/cmake
Yufeng Li 8c5db7f973
use legacy stream mode (#2076)
In ORT, there are only three CUDA streams: default, HtoD, and DtoH. Both HtoD and DtoH are non-blocking streams, so per-thread stream mode offers no benefit.
I also tested in a multi-threaded environment, and legacy mode is still faster than per-thread mode.
Below is the latency of a 3-layer BERT on a V100, in ms:
batch size 1:
concurrency | c=1 | c=2 | c=4
legacy | 0.54 | 1.17 | 2.68
per-thread | 0.66 | 1.37 | 2.86

batch size 4:
concurrency | c=1 | c=2 | c=4
legacy | 1.1 | 2.22 | 4.6
per-thread | 1.21 | 2.44 | 4.98

batch size 64:
concurrency | c=1 | c=2 | c=4
legacy | 8.09 | 16.13 | 32.37
per-thread | 8.18 | 16.26 | 32.45
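
To make the gap easier to read, the numbers above can be turned into a relative slowdown of per-thread mode over legacy mode. The following is only a rough sketch: the latencies are copied from the tables, while the names `LATENCY_MS`, `slowdown_pct`, and `slowdown_table` are hypothetical helpers introduced for illustration and are not part of ORT.

```python
# Latencies in ms, copied from the tables above.
# Layout: {batch_size: {mode: [latency at c=1, c=2, c=4]}}
LATENCY_MS = {
    1: {"legacy": [0.54, 1.17, 2.68], "per-thread": [0.66, 1.37, 2.86]},
    4: {"legacy": [1.1, 2.22, 4.6], "per-thread": [1.21, 2.44, 4.98]},
    64: {"legacy": [8.09, 16.13, 32.37], "per-thread": [8.18, 16.26, 32.45]},
}
CONCURRENCY = [1, 2, 4]


def slowdown_pct(legacy_ms, per_thread_ms):
    """Return how much slower per-thread mode is than legacy mode, in percent."""
    return (per_thread_ms / legacy_ms - 1.0) * 100.0


def slowdown_table(latency_ms=LATENCY_MS):
    """Map (batch_size, concurrency) to the per-thread slowdown in percent."""
    table = {}
    for batch_size, modes in latency_ms.items():
        pairs = zip(CONCURRENCY, modes["legacy"], modes["per-thread"])
        for concurrency, legacy, per_thread in pairs:
            table[(batch_size, concurrency)] = slowdown_pct(legacy, per_thread)
    return table


if __name__ == "__main__":
    for (batch_size, concurrency), pct in sorted(slowdown_table().items()):
        print(f"batch={batch_size:>2} c={concurrency}: per-thread is {pct:.1f}% slower")
```

By this measure, per-thread mode is slower in every measured configuration, and the relative gap shrinks as the batch size grows.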
2019-10-14 16:03:04 -07:00
..
external Update nGraph to version 0.26 (#1965) 2019-10-14 10:37:48 -07:00
onnx make builds more robust (#906) (#932) 2019-04-29 12:58:20 -07:00
patches Update nGraph to version 0.26 (#1965) 2019-10-14 10:37:48 -07:00
CMakeLists.txt use legacy stream mode (#2076) 2019-10-14 16:03:04 -07:00
ConfigureVisualStudioCodeAnalysis.props Initial bootstrap commit. 2018-11-19 16:48:22 -08:00
EnableVisualStudioCodeAnalysis.props Initial bootstrap commit. 2018-11-19 16:48:22 -08:00
get_boost.cmake restore ninja compatibility 2019-05-15 10:18:52 -07:00
onnxruntime.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_automl_featurizers.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_codegen.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_common.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_config.h.in Ignore some gcc warnings (#1996) 2019-10-07 16:32:34 -07:00
onnxruntime_csharp.cmake Conditionally export execution provider apis in chsarp (#1724) 2019-09-09 11:17:44 -07:00
onnxruntime_dependencies.dot Initial bootstrap commit. 2018-11-19 16:48:22 -08:00
onnxruntime_framework.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_graph.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_language_interop_ops.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_mlas.cmake Introduce a separate check and conditional for AVX512BW build (#2083) 2019-10-10 16:14:00 -07:00
onnxruntime_nuphar_extern.cmake Weba/merge ngemm (#2021) 2019-10-05 12:09:22 -07:00
onnxruntime_optimizer.cmake Cleanup some aspects of the Initializer class used by optimizers (#2005) 2019-10-09 10:37:44 +10:00
onnxruntime_providers.cmake Update TensorRT to version 6.0.1.5 (#1966) 2019-10-06 10:40:53 -07:00
onnxruntime_pyop.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_python.cmake pack pyop in nightly build (#2018) 2019-10-08 12:02:45 -07:00
onnxruntime_server.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_session.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
onnxruntime_unittests.cmake Replace std::regex with re2 bc CentOS std::regex is broken (#2017) 2019-10-04 18:47:03 -07:00
onnxruntime_util.cmake Replace GSL with GSL-LITE submodule and fix up refs (#1920) 2019-10-01 12:43:29 -07:00
protobuf_function.cmake Use protobuf-lite to reduce onnxruntime.dll size. (#639) 2019-03-21 14:06:38 -07:00