onnxruntime/include/onnxruntime/core
Weixing Zhang 299ace0759
Support to allow user to specify compute stream per session (#3723)
* Support to allow user to specify compute stream per session

Create computation cuda stream explicitly rather than use default legacy stream or per-thread default stream.

remove some redundant cudaStreamSynchronize

fix gpt2 model test failures

don't use default stream in nccl either.

add stream synchronization in OnRunEnd()

use cub::DeviceScan::InclusiveSum, which can be called with a stream specified.

fix topK failure due to latest rebase

fix tensorrt

support user specified stream

add user_stream support in tensorrt EP

use same stream for both tensorrt and CUDA EP.

fix ScatterND

specify stream for adasum and p2p kernels.

fix loop

fix CApiTest.custom_op_handler

fix CApiTest.varied_input_custom_op_handler

change for cudaMemcpyFromSymbol

improve provider options for user specified compute stream

* add changes for ROCM EP

* fix GatherGrad UT for ROCM EP

* clean code and fix NonMaxSuppression

* use default stream for ROCM now

* fix CApiTest.custom_op_handler:OrtFormatCustomOpTests.ConvertOnnxModelToOrt

* fix tensorrt ut: CApiTest.io_binding_cuda

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-02-05 15:48:18 -08:00
..
common Rename MakeString and ParseString functions. (#6272) 2021-01-07 15:43:42 -08:00
framework Support to allow user to specify compute stream per session (#3723) 2021-02-05 15:48:18 -08:00
graph Let execution fall back to CPU EP if Compile of a partition on current EP fails (#6580) 2021-02-05 12:14:55 -08:00
optimizer Expose recompute configs to the frontend (#5318) 2020-10-02 09:49:47 -07:00
platform Cast Op performance fix. (#6509) 2021-02-04 14:52:37 -08:00
providers [CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481) 2021-01-28 12:25:46 -08:00
session Support to allow user to specify compute stream per session (#3723) 2021-02-05 15:48:18 -08:00