onnxruntime/onnxruntime/test
Chen Fu f4f2cc1a00
Add batch interface to floating point GEMM (#7323)
Currently, for high-dimensional MatMul, we call multiple GEMMs sequentially. This change executes these GEMMs in parallel, removing the barriers between adjacent GEMM operations.
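
The difference between the two scheduling strategies can be sketched as follows. This is an illustrative sketch only, not the MLAS implementation: `GemmParams`, `GemmRows`, `GemmSequential` and `GemmBatch` are hypothetical names, the kernel is a naive triple loop, and plain `std::thread` stands in for the runtime's thread pool. All matrices are assumed row-major and of equal shape.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical descriptor for one GEMM of the batch: C = A * B (row-major).
struct GemmParams {
    const float* A;  // M x K
    const float* B;  // K x N
    float* C;        // M x N
};

// Naive kernel: computes rows [row_begin, row_end) of C for one GEMM.
void GemmRows(const GemmParams& p, size_t K, size_t N,
              size_t row_begin, size_t row_end) {
    for (size_t i = row_begin; i < row_end; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k) acc += p.A[i * K + k] * p.B[k * N + j];
            p.C[i * N + j] = acc;
        }
    }
}

// Old pattern: every GEMM is split across threads, and all threads are
// joined (a barrier) before the next GEMM may start.
void GemmSequential(const std::vector<GemmParams>& batch,
                    size_t M, size_t K, size_t N, size_t num_threads) {
    assert(num_threads > 0);
    const size_t rows_per = (M + num_threads - 1) / num_threads;
    for (const GemmParams& p : batch) {
        std::vector<std::thread> workers;
        for (size_t t = 0; t < num_threads; ++t) {
            const size_t b = std::min(M, t * rows_per);
            const size_t e = std::min(M, b + rows_per);
            if (b == e) continue;
            workers.emplace_back([&p, K, N, b, e] { GemmRows(p, K, N, b, e); });
        }
        for (std::thread& w : workers) w.join();  // barrier after each GEMM
    }
}

// Batched pattern: the rows of all GEMMs form one flat index space that is
// partitioned across threads once, with a single join at the very end.
void GemmBatch(const std::vector<GemmParams>& batch,
               size_t M, size_t K, size_t N, size_t num_threads) {
    assert(num_threads > 0);
    const size_t total_rows = batch.size() * M;
    const size_t rows_per = (total_rows + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (size_t t = 0; t < num_threads; ++t) {
        const size_t begin = std::min(total_rows, t * rows_per);
        const size_t end = std::min(total_rows, begin + rows_per);
        if (begin == end) continue;
        workers.emplace_back([&batch, M, K, N, begin, end] {
            size_t row = begin;
            while (row < end) {
                const size_t g = row / M;      // which GEMM this row belongs to
                const size_t local = row % M;  // row index inside that GEMM
                const size_t stop = std::min(M, local + (end - row));
                GemmRows(batch[g], K, N, local, stop);
                row += stop - local;
            }
        });
    }
    for (std::thread& w : workers) w.join();  // single barrier for the batch
}
```

In the batched variant a thread that finishes its share of one GEMM moves straight on to rows of the next, so no thread idles at a per-GEMM barrier. How the real kernel partitions work (for example by tiles rather than rows) is not stated in this commit message.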

Performance was tested with the Bert and T5 models. The Bert model shows no noticeable performance difference, as the heavy lifting is done by the attention operator, which is not changed in this PR. In the T5 model, we see no regression at a low thread count (4 threads), and the improvement is more pronounced at higher thread counts (8-16). T5 shows a 10% speedup with 16 threads. Profiling shows that the most expensive MatMul operators in T5 achieve around a 20% speedup with 16 threads.

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-04-23 17:34:22 -07:00
api_tests_without_env
common QDQ implementation (#7033) 2021-03-25 09:17:23 -07:00
contrib_ops Adding interface for batched integer gemm (#7249) 2021-04-15 10:25:31 -07:00
debug_node_inputs_outputs Sync ORTModule branch with master and fix tests (#6526) 2021-02-02 08:59:56 -08:00
featurizers_ops
framework Partial graph execution made simple. (#7324) 2021-04-23 15:09:18 -07:00
fuzzing
global_thread_pools
ir Sync ORTModule branch with master and fix tests (#6526) 2021-02-02 08:59:56 -08:00
mlas Add batch interface to floating point GEMM (#7323) 2021-04-23 17:34:22 -07:00
onnx Adding interface for batched integer gemm (#7249) 2021-04-15 10:25:31 -07:00
opaque_api
optimizer Add level two optimizations for constant propagation transformation. (#7410) 2021-04-23 13:25:54 -07:00
perftest [OpenVINO-EP] Enabling save/Load blob feature (#7054) 2021-04-07 20:59:16 -07:00
platform
proto
providers add gather elements (#7435) 2021-04-23 14:05:17 -07:00
python support loading external execution provider from python frontend (#7332) 2021-04-23 09:54:09 -07:00
shared_lib Add ability to allocate initialized tensor memory from non-arena memory (#7267) 2021-04-20 20:27:48 -07:00
testdata Add level two optimizations for constant propagation transformation. (#7410) 2021-04-23 13:25:54 -07:00
tvm
unittest_main Fix DEBUG_NODE_INPUTS_OUTPUTS test by putting it in a separate process, clean up unused test_main.cc files. (#5949) 2020-12-11 11:36:58 -08:00
util [OpenVINO-EP] Enabling save/Load blob feature (#7054) 2021-04-07 20:59:16 -07:00
win_getopt
xctest