mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-14 20:48:00 +00:00
This PR updates the ThreadPool API to support multi-loop parallel sections. As with the OpenMP "parallel" construct, this allows per-loop work to be amortized over a series of loops. For ORT, it also promotes locality between successive loops, in the sense that iteration X of one loop will tend to run on the same worker thread as iteration X of preceding loops.

The change was developed while optimizing the implementation of a model that performed better with OpenMP. Profiling indicated that OpenMP was providing lower loop entry/exit costs and that, via OpenMP's static scheduling, it was leading to a lower L2 miss rate in the series of parallel loops used in GRU.

The main changes are:

- Addition of ThreadPool::ParallelSection and underlying support in the modified Eigen thread pool.
- In EigenNonBlockingThreadPool.h, refactoring the RunInParallel method to support two variants: one that takes an existing parallel section object created by the caller, and another (used by default) that creates its own parallel section.
- Simplify ThreadPool::LoopCounter (used by worker threads to claim loop iterations), basing it on an ID supplied by the underlying Eigen thread pool for affinity in a series of loops.
- Fix a possible perf issue where a loop with iterations scheduled in batches would have more threads than batches available.
- Use of parallel sections in the GRU operator.
- Additional test cases in threadpool_test.h.
- Additional comments at the top of threadpool.h and EigenNonBlockingThreadPool.h.

| | | |
|---|---|---|
| execution_providers | ||
| images | ||
| python | ||
| ABI_Dev_Notes.md | ||
| AddingCustomOp.md | ||
| AddingExecutionProvider.md | ||
| Android_testing.md | ||
| C_API.md | ||
| C_API_Guidelines.md | ||
| cmake_guideline.md | ||
| Coding_Conventions_and_Standards.md | ||
| ContribOperators.md | ||
| CSharp_API.md | ||
| ExportPyTorchCustomOps.md | ||
| FAQ.md | ||
| How_To_Update_ONNX_Dev_Notes.md | ||
| InferenceHighLevelDesign.md | ||
| Java_API.md | ||
| Model_Test.md | ||
| NotesOnThreading.md | ||
| ONNX_Runtime_for_Mobile_Platforms.md | ||
| ONNX_Runtime_Graph_Optimizations.md | ||
| ONNX_Runtime_Perf_Tuning.md | ||
| ONNX_Runtime_Server_Usage.md | ||
| onnxruntime_dependencies.dot | ||
| onnxruntime_dependencies.png | ||
| OperatorKernels.md | ||
| PR_Guidelines.md | ||
| Privacy.md | ||
| PyOp.md | ||
| Python_Dev_Notes.md | ||
| Reduced_Operator_Kernel_build.md | ||
| ReleaseManagement.md | ||
| Roadmap.md | ||
| Server.md | ||
| Versioning.md | ||
| WinML_principles.md | ||
| WinRT_API.md | ||
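
The parallel-section idea described in the commit message at the top of this listing can be illustrated with a small standalone sketch. It does not use the ONNX Runtime ThreadPool API; `SimpleBarrier`, `LoopBody`, and `RunLoopsInOneSection` are hypothetical names introduced only for this example. The sketch assumes static scheduling: one set of threads is started once, runs a series of loops, and each thread keeps the same iteration range in every loop, so iteration X of a later loop runs on the same worker as iteration X of the earlier ones.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not an ONNX Runtime type): a reusable barrier
// for a fixed number of threads, built from a mutex and a condition variable.
class SimpleBarrier {
 public:
  explicit SimpleBarrier(std::size_t count) : threshold_(count) {}

  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    const std::size_t gen = generation_;
    if (++waiting_ == threshold_) {
      // Last thread to arrive releases everyone and resets the barrier.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return generation_ != gen; });
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  const std::size_t threshold_;
  std::size_t waiting_ = 0;
  std::size_t generation_ = 0;
};

// A loop body receives the iteration index and the ID of the worker running it.
using LoopBody = std::function<void(std::size_t iteration, std::size_t worker)>;

// Runs every loop in `loops` over [0, n) using a single set of threads.
// Thread start-up and tear-down are paid once for the whole series rather
// than once per loop, and the static split keeps each iteration on one worker.
void RunLoopsInOneSection(std::size_t num_threads, std::size_t n,
                          const std::vector<LoopBody>& loops) {
  if (num_threads == 0) num_threads = 1;
  SimpleBarrier barrier(num_threads);

  auto worker = [&](std::size_t id) {
    // Static schedule: the same contiguous range for every loop.
    const std::size_t begin = n * id / num_threads;
    const std::size_t end = n * (id + 1) / num_threads;
    for (const LoopBody& body : loops) {
      for (std::size_t i = begin; i < end; ++i) body(i, id);
      barrier.Wait();  // All workers finish this loop before the next starts.
    }
  };

  std::vector<std::thread> threads;
  for (std::size_t id = 1; id < num_threads; ++id) threads.emplace_back(worker, id);
  worker(0);  // The calling thread takes part as worker 0.
  for (auto& t : threads) t.join();
}
```

Calling `RunLoopsInOneSection(3, 8, loops)` with two loop bodies runs both loops on the same three threads. A real thread pool would reuse long-lived workers instead of creating threads per section, and might schedule iterations dynamically; the barrier and fixed ranges here are simplifications chosen to keep the example short.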