onnxruntime/include/onnxruntime/core
Tim Harris 5e44d25c5a
Support multi-loop parallel sections, use multi-loop sections in GRU (#5602)
This PR updates the ThreadPool API to support multi-loop parallel sections. As with the OpenMP "parallel" construct, this allows per-loop work to be amortized over a series of loops. For ORT, it also promotes locality between successive loops in the sense that iteration X of one loop will tend to run on the same worker thread as iteration X of preceding loops.

The change was developed while optimizing the implementation of a model that performed better with OpenMP. Profiling indicated that OpenMP was providing lower loop entry/exit costs and that, via OpenMP's static scheduling, it was leading to a lower L2 miss rate in the series of parallel loops used in GRU.

The main changes are:

- Addition of ThreadPool::ParallelSection and underlying support in the modified Eigen thread pool.

- In EigenNonBlockingThreadPool.h, refactoring the RunInParallel method to support two variants: one that takes an existing parallel section object created by the caller, and another (used by default) that creates its own parallel section.

- Simplify ThreadPool::LoopCounter (used by worker threads to claim loop iterations), basing it an ID supplied by the underlying Eigen thread pool for affinity in a series of loops.

- Fix a possible perf issue where a loop with iterations scheduled in batches would have more threads than batches available.

- Use of parallel sections in the GRU operator.

- Additional test cases in threadpool_test.h.

- Additional comments at the top of threadpool.h and EigenNonBlockingThreadPool.h.
2020-11-10 12:24:57 +00:00
..
common Upgrade optional implementation to https://github.com/martinmoene/optional-lite. (#5563) 2020-11-03 15:27:47 -08:00
framework Move GetUseDeterministicCompute() to OpKernelContext to avoid need to downcast to OpKernelContextInternal. (#5729) 2020-11-09 11:37:06 -08:00
graph The IndexedSubGraph is used to create the Function body, but after that is invalid as the nodes it referred to have been removed from the main Graph. As such there's no need to store it in the FunctionImpl instance. (#5669) 2020-11-05 17:21:56 +10:00
optimizer Expose recompute configs to the frontend (#5318) 2020-10-02 09:49:47 -07:00
platform Support multi-loop parallel sections, use multi-loop sections in GRU (#5602) 2020-11-10 12:24:57 +00:00
providers Add runtime options for NNAPI EP (#5576) 2020-11-04 10:08:43 -08:00
session Add tag types for Ort::Float16_t and Ort:Bfloat16_t structs (#5716) 2020-11-06 16:41:26 -08:00