mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-18 01:54:05 +00:00
* Support to allow user to specify compute stream per session Create computation cuda stream explicitly rather than use default legacy stream or per-thread default stream. remove some redundant cudaStreamSynchronize fix gpt2 model test failures don't use default stream in nccl either. add stream synchronization in OnRunEnd() using cub::DeviceScan::InclusiveSum which can be called with stream specified. fix topK failure due to latest rebase fix tensorrt support user specified stream add user_stream support in tensorrt EP use same stream for both tensorrt and CUDA EP. fix ScatterND specify stream for adasum and p2p kernels. fix loop fix CApiTest.custom_op_handler fix CApiTest.varied_input_custom_op_handler change for cudaMemcpyFromSymbol improve provider options for user specified compute stream * add changes for ROCM EP * fix GatherGrad UT for ROCM EP * clean code and fix NonMaxSuppression * use default stream for ROCM now * fix CApiTest.custom_op_handler:OrtFormatCustomOpTests.ConvertOnnxModelToOrt * fix tensorrt ut: CApiTest.io_binding_cuda Co-authored-by: Weixing Zhang <wezhan@microsoft.com> |
||
|---|---|---|
| cuda | ||
| allocation_planner_test.cc | ||
| allocator_test.cc | ||
| bfc_arena_test.cc | ||
| data_types_test.cc | ||
| distance_test.cc | ||
| dummy_allocator.cc | ||
| dummy_allocator.h | ||
| dummy_provider.cc | ||
| dummy_provider.h | ||
| endian_test.cc | ||
| execution_frame_test.cc | ||
| execution_provider_test.cc | ||
| float_16_test.cc | ||
| inference_session_test.cc | ||
| insert_cast_transformer_test.cc | ||
| kernel_registry_test.cc | ||
| local_kernel_registry_test.cc | ||
| math_test.cc | ||
| mem_pattern_planner_test.cc | ||
| memcpy_transformer_test.cc | ||
| model_builder_utils.h | ||
| opaque_kernels_test.cc | ||
| ort_model_only_test.cc | ||
| parallel_executor_test.cc | ||
| provider_options_utils_test.cc | ||
| random_test.cc | ||
| session_state_test.cc | ||
| shape_inference_test.cc | ||
| sparse_kernels_test.cc | ||
| tensor_test.cc | ||
| tensorutils_test.cc | ||
| test_tensor_loader.cc | ||
| test_utils.cc | ||
| test_utils.h | ||
| TestAllocatorManager.cc | ||
| TestAllocatorManager.h | ||