onnxruntime/docs/NotesOnThreading.md
Tim Harris 5e8952ef89
ThreadPool clean up : mm_pause in loops, correctly spin-then-wait, and adopt static methods consistently in the API (#5590)
Description: This change makes three modifications to the ThreadPool class to clean up issues identified during performance analysis and optimization: (1) it uses mm_pause intrinsics in spin loops, which avoids consuming pipeline resources while waiting; (2) it reorganizes the spin-then-steal loop for work distribution so that it starts by spinning, as intended, rather than by trying to steal; (3) it makes the ThreadPool API consistently use static methods for public functions. The PR also includes minor documentation updates and corresponding changes to test cases.

Motivation and Context
The change helps ensure consistency in behavior between the OpenMP and Eigen-based implementations. Unlike the instance methods, the static methods abstract over the different ways in which threading can be implemented; they will map onto the OpenMP or Eigen-based implementations when threading is used. When threading is not used they will run work sequentially.
2020-10-28 09:49:18 +00:00


Notes on Threading in ORT

This document is intended for ORT developers.

ORT can execute work on either OpenMP threads or its own non-OpenMP (ORT) threads. Thread pool management is abstracted behind (1) the ThreadPool class in threadpool.h and (2) the functions in thread_utils.h.

When developing an op, please use these abstractions to parallelize your code. They centralize the choice of how the work runs: when OpenMP is enabled, they use OpenMP; when OpenMP is disabled, they run the work sequentially if the thread pool pointer is NULL, and schedule the tasks on the thread pool otherwise.

Examples of these abstractions (threadpool.h documents them in more detail):

  • TryParallelFor
  • TrySimpleParallelFor
  • TryBatchParallelFor
  • ShouldParallelize
  • DegreeOfParallelism

These static methods abstract over the different implementation choices: they can run the work on the ORT thread pool, on OpenMP, or sequentially.

Please do not write #ifdef blocks or #pragma omp directives in operator code.

For intra-op parallelism, ORT users can use either OpenMP or the ORT thread pool; OpenMP is chosen by building ORT with the --use_openmp switch. For inter-op parallelism, however, ORT always uses its own thread pool.