onnxruntime/cmake
Weixing Zhang 299ace0759
Support to allow user to specify compute stream per session (#3723)
* Support to allow user to specify compute stream per session

Create computation cuda stream explicitly rather than use default legacy stream or per-thread default stream.

remove some redundant cudaStreamSynchronize

fix gpt2 model test failures

don't use default stream in nccl either.

add stream synchronization in OnRunEnd()

using cub::DeviceScan::InclusiveSum which can be called with stream specified.

fix topK failure due to latest rebase

fix tensorrt

support user specified stream

add user_stream support in tensorrt EP

use same stream for both TensorRT and CUDA EP.

fix ScatterND

specify stream for adasum and p2p kernels.

fix loop

fix CApiTest.custom_op_handler

fix CApiTest.varied_input_custom_op_handler

change for cudaMemcpyFromSymbol

improve provider options for user specified compute stream

* add changes for ROCM EP

* fix GatherGrad UT for ROCM EP

* clean code and fix NonMaxSuppression

* use default stream for ROCM now

* fix CApiTest.custom_op_handler:OrtFormatCustomOpTests.ConvertOnnxModelToOrt

* fix tensorrt ut: CApiTest.io_binding_cuda

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-02-05 15:48:18 -08:00
external add set_model_dir and update ONNX (#6119) 2021-02-05 09:30:49 -08:00
patches [OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493) 2021-01-28 23:00:41 -08:00
tensorboard Introduce training changes. 2020-03-11 14:39:03 -07:00
CMakeLists.txt Support to allow user to specify compute stream per session (#3723) 2021-02-05 15:48:18 -08:00
CMakeSettings.json Fork the WinML APIs into the Microsoft namespace (#3503) 2020-04-17 06:18:54 -07:00
codeconv.runsettings CMake changes (#2961) 2020-02-03 19:33:14 -08:00
flake8.cmake Add ability to track per operator types in reduced build config. (#6428) 2021-01-29 07:59:51 +10:00
nuget_helpers.cmake Fix nuget build error (#6009) 2020-12-03 09:28:39 -08:00
onnxruntime.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_codegen.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_common.cmake Op kernel type reduction infrastructure. (#6466) 2021-01-28 07:27:19 -08:00
onnxruntime_config.h.in Thread pool changes (#3153) 2020-03-30 12:18:40 -07:00
onnxruntime_csharp.cmake Remove nGraph Execution Provider (#5858) 2020-11-19 16:47:55 -08:00
onnxruntime_flatbuffers.cmake Add ability to track per operator types in reduced build config. (#6428) 2021-01-29 07:59:51 +10:00
onnxruntime_framework.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_fuzz_test.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_graph.cmake Deprecating Horovod and refactored Adasum computations (#5468) 2020-12-17 16:21:33 -08:00
onnxruntime_ios.toolchain.cmake Add iOS test pipeline and a sample app. (#5298) 2020-09-29 13:53:11 -07:00
onnxruntime_java.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_java_unittests.cmake [java] Adds a CUDA test (#3956) 2020-05-18 12:05:51 -07:00
onnxruntime_language_interop_ops.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_mlas.cmake MLAS: improve quantized depthwise convolution (#6513) 2021-02-01 21:22:27 -08:00
onnxruntime_nodejs.cmake build: split nodejs binding build and test to avoid timeout issue (#4188) 2020-06-10 19:16:32 -07:00
onnxruntime_nuphar_extern.cmake Weba/merge ngemm (#2021) 2019-10-05 12:09:22 -07:00
onnxruntime_optimizer.cmake Fix Windows x86 compiler warnings in the optimizers project (#6377) 2021-01-20 17:50:16 -08:00
onnxruntime_providers.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_pyop.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_python.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_session.cmake [ORT Mobile] file format schema and file I/O code (#4973) 2020-09-01 11:51:31 +10:00
onnxruntime_training.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_unittests.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_util.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
precompiled_header.cmake Merge windowsai (winml layering) into master (#2956) 2020-02-04 17:12:19 -08:00
protobuf_function.cmake Initial version of CoreML EP (#6392) 2021-01-27 10:43:17 -08:00
set_winapi_family_desktop.h Fix WCOS/Win32 linking bugs (#3126) 2020-03-19 08:52:40 -07:00
store_toolchain.cmake Use onecore umbrella lib in onecore builds (#5182) 2020-09-16 10:46:27 -07:00
target_delayload.cmake Use onecore umbrella lib in onecore builds (#5182) 2020-09-16 10:46:27 -07:00
wcos_rules_override.cmake Use onecore umbrella lib in onecore builds (#5182) 2020-09-16 10:46:27 -07:00
wil.cmake Merge windowsai (winml layering) into master (#2956) 2020-02-04 17:12:19 -08:00
winml.cmake Op kernel type reduction infrastructure. (#6466) 2021-01-28 07:27:19 -08:00
winml_cppwinrt.cmake Reintroduce experimental api changes and fix remote build break (#6385) 2021-01-22 15:15:53 -08:00
winml_sdk_helpers.cmake Merge windowsai (winml layering) into master (#2956) 2020-02-04 17:12:19 -08:00
winml_unittests.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00