onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-26 03:00:54 +00:00


Tim Harris 2e09d9921a "Sticky" allocation of worker threads (#7551)	2021-05-03 18:28:13 +01:00

[ PR previously merged as https://github.com//pull/7372, then reverted pending investigation of lost-wake-up issue seen with ParallelExecutor. Issue was a missing test for new work pushed to thread concurrent with a worker blocking. Change from 7372 is the addition of: https://github.com/microsoft/onnxruntime/blob/tiharr/dev-sticky-4/include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h#L1473-L1492 ]

Description: This change updates the heuristics used when a thread selects which worker threads to push work to on entering a parallel loop. Previously, worker threads would maintain a best-effort bitmap of "good worker hints" indicating the threads that were likely to be spinning waiting for work. This change uses a simpler heuristic where a thread records which workers ran its previous loop, and then re-submits its next loop to those same workers. The aim is to retain affinity between a thread and a set of workers, and to avoid maintaining the "good worker hints" bitmaps.

Motivation and Context: Profiling suggested that maintaining the "good worker hints" was taking unexpected time, particularly on NUMA systems. In addition, when running many concurrent workloads, the hints did not provide a way to help retain locality of workers and hence data in caches.

Testing to confirm no regressions on microbenchmark (./build/Linux/Release/onnxruntime_benchmark --benchmark_filter=BM_ThreadPoolParallelFor) and on Linux mobilenet_v1_1.0_224.onnx, comparing p50 and p99 with vs without this change:

1 concurrent: p50 0.0172s vs 0.0181s, p99 0.0204s vs 0.0216s
2 concurrent: p50 0.0172s vs 0.0181s, p99 0.0213s vs 0.0221s
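
The "sticky" heuristic in the commit description above can be illustrated with a small sketch. This is not the code in EigenNonBlockingThreadPool.h: the class name `StickyWorkerHints`, its methods, and the round-robin fallback for filling spare slots are all assumptions made for illustration. It only models the bookkeeping a submitting thread might keep (which workers ran its previous loop) and omits queues, work stealing, and synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not the onnxruntime thread pool API.
// One instance models the per-thread state of a thread that submits loops.
class StickyWorkerHints {
 public:
  explicit StickyWorkerHints(std::size_t num_workers)
      : num_workers_(num_workers) {
    assert(num_workers > 0);
  }

  // Choose up to `wanted` distinct workers for the next parallel loop.
  // Workers that ran the previous loop are chosen first; any remaining
  // slots are filled round-robin (an assumed fallback policy).
  std::vector<std::size_t> PickWorkers(std::size_t wanted) {
    if (wanted > num_workers_) wanted = num_workers_;
    std::vector<std::size_t> picked;
    std::vector<bool> used(num_workers_, false);
    for (std::size_t w : preferred_) {
      if (picked.size() == wanted) break;
      if (!used[w]) {
        picked.push_back(w);
        used[w] = true;
      }
    }
    while (picked.size() < wanted) {
      std::size_t w = next_fallback_;
      next_fallback_ = (next_fallback_ + 1) % num_workers_;
      if (!used[w]) {
        picked.push_back(w);
        used[w] = true;
      }
    }
    return picked;
  }

  // Record which workers actually ran the loop that just finished, so the
  // next loop is re-submitted to the same workers.
  void RecordLoop(const std::vector<std::size_t>& ran) {
    for (std::size_t w : ran) {
      assert(w < num_workers_);
      (void)w;  // avoids an unused-variable warning when asserts are disabled
    }
    preferred_ = ran;
  }

 private:
  std::size_t num_workers_;
  std::size_t next_fallback_ = 0;
  std::vector<std::size_t> preferred_;  // workers that ran the previous loop
};
```

In this sketch a thread would call `PickWorkers` on entering a parallel loop and `RecordLoop` when the loop completes. Because the state is private to the submitting thread, there is no shared bitmap to maintain, which is the cost the commit description says it aims to avoid.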
common	Set CMAKE_CUDA_STANDARD to 14 because we are using std::make_unique (#7534)	2021-04-30 20:20:00 -07:00
eager	kerne invoker api for eager mode (#7473)	2021-04-30 13:33:58 -07:00
framework	Change onnxruntime::make_unique to std::make_unique (#7502)	2021-04-29 17:04:53 -07:00
graph	Change onnxruntime::make_unique to std::make_unique (#7502)	2021-04-29 17:04:53 -07:00
optimizer	Enable NHWC transformer when generating ORT format model (#7126)	2021-03-29 18:39:48 +10:00
platform	"Sticky" allocation of worker threads (#7551)	2021-05-03 18:28:13 +01:00
providers	rename cuda_mem_limit and hip_mem_limit to gpu_mem_limit for both CUDA EP and ROCm EP (#7226)	2021-04-05 09:04:04 -07:00
session	Lifhuan/force trt sequential (#7440)	2021-04-28 13:59:37 -07:00