ONNXRuntime Performance Test

This tool reports performance results for running inference on a given model with ONNX Runtime, using a specific execution provider and the sample input test data. It is intended to give a reliable measurement of inference latency with ONNX Runtime on the device. The options to use with the tool are listed below:

onnxruntime_perf_test [options...] model_path result_file

Options:

-A: Disable memory arena.

-M: Disable memory pattern.

-P: Use parallel executor instead of sequential executor.

-c: [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1.

-e: [cpu|cuda|mkldnn|tensorrt|ngraph|openvino|nuphar|acl]: Specifies the execution provider: 'cpu', 'cuda', 'mkldnn', 'tensorrt', 'ngraph', 'openvino', 'nuphar' or 'acl'. Default is 'cpu'.
    
-m: [test_mode]: Specifies the test mode. Value can be 'duration' or 'times'. Provide 'duration' to run the test for a fixed duration, or 'times' to repeat it a fixed number of times. Default:'duration'.
    
-o: [optimization level]: Default is 1. Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all). Please see __onnxruntime_c_api.h__ (enum GraphOptimizationLevel) for the full list of all optimization levels.

-u: [path to save optimized model]: Default is empty, so no optimized model is saved.

-p: [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.

-r: [repeated_times]: Specifies the number of repetitions when running in 'times' test mode. Default:1000.
    
-s: Show statistics results, such as P75 and P90.

-t: [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
    
-v: Show verbose information.
    
-x: [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes. A value of 0 means the test will auto-select a default. Must be >=0.

-y: [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes). A value of 0 means the test will auto-select a default. Must be >=0.

-h: help.
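
As an illustration of how the options above combine, here is a minimal Python sketch that assembles an argument list for the tool. The helper name `build_perf_test_command` and the example paths are hypothetical and not part of the tool; only flags documented above are used, and the binary name is taken from the usage line.

```python
from typing import List, Optional


def build_perf_test_command(
    model_path: str,
    result_file: str,
    provider: str = "cpu",
    mode: str = "duration",
    seconds: Optional[int] = None,
    times: Optional[int] = None,
    show_stats: bool = False,
) -> List[str]:
    """Assemble an onnxruntime_perf_test argument list (hypothetical helper)."""
    if mode not in ("duration", "times"):
        raise ValueError("mode must be 'duration' or 'times'")

    # -e selects the execution provider, -m selects the test mode.
    cmd = ["onnxruntime_perf_test", "-e", provider, "-m", mode]

    # -t applies to 'duration' mode, -r applies to 'times' mode.
    if mode == "duration" and seconds is not None:
        cmd += ["-t", str(seconds)]
    if mode == "times" and times is not None:
        cmd += ["-r", str(times)]

    # -s asks the tool to print latency statistics.
    if show_stats:
        cmd.append("-s")

    # Positional arguments come last: model_path, then result_file.
    cmd += [model_path, result_file]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with a real model and output file.
    args = build_perf_test_command(
        "ModelName/model.onnx", "result.txt", mode="times", times=1000, show_stats=True
    )
    print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` as-is. When `seconds` or `times` is left unset, the corresponding flag is omitted and the tool's own default applies.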

Model path and input data dependency: The performance test uses the same input structure as the onnx_test_runner tool. It requires a directory tree like the one below:

--ModelName
    --test_data_set_0
        --input0.pb
    --test_data_set_2
        --input0.pb
    --model.onnx

The path of model.onnx needs to be provided as the <model_path> argument.
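
Before starting a long run, it can help to check that the layout matches the tree above. The following Python sketch is one way to do that; the function name `find_test_data_sets` is hypothetical, and it assumes that the data set directories sit next to model.onnx, are named `test_data_set_*`, and hold `.pb` files, as shown in the tree.

```python
from pathlib import Path
from typing import List


def find_test_data_sets(model_path: str) -> List[Path]:
    """Return test_data_set_* directories next to the model that contain .pb files."""
    model = Path(model_path)
    if not model.is_file():
        raise FileNotFoundError(f"model file not found: {model}")

    data_sets = []
    # Data set directories are assumed to be siblings of the model file.
    for entry in sorted(model.parent.glob("test_data_set_*")):
        if entry.is_dir() and any(entry.glob("*.pb")):
            data_sets.append(entry)
    return data_sets


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py ModelName/model.onnx
    for data_set in find_test_data_sets(sys.argv[1]):
        print(data_set)
```

An empty result means no usable input data was found beside the model, which is worth fixing before invoking the tool.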

Sample output from the tool will look something like this:

Total time cost:58.8053
Total iterations:1000
Average time cost:58.8053 ms
Total run time:58.8102 s
Min Latency is 0.0559777sec
Max Latency is 0.0623472sec
P50 Latency is 0.0587108sec
P90 Latency is 0.0599845sec
P95 Latency is 0.0605676sec
P99 Latency is 0.0619517sec
P999 Latency is 0.0623472sec
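
For post-processing, output in this shape can be read into a dictionary. The Python sketch below is an illustration only: the function name `parse_perf_output` is hypothetical, the two line patterns are inferred from the sample above, and the exact output format may differ between versions of the tool. The consistency check at the end assumes that "Total time cost" is in seconds, which the sample suggests but does not state.

```python
import re
from typing import Dict

# A few lines copied from the sample output above.
SAMPLE = """\
Total time cost:58.8053
Total iterations:1000
Average time cost:58.8053 ms
Min Latency is 0.0559777sec
P50 Latency is 0.0587108sec
P99 Latency is 0.0619517sec
"""


def parse_perf_output(text: str) -> Dict[str, float]:
    """Map each label in the tool's output to its numeric value (units dropped)."""
    results: Dict[str, float] = {}
    for line in text.splitlines():
        # Lines such as "Total iterations:1000" or "Average time cost:58.8053 ms".
        match = re.match(r"^(.+?):\s*([0-9.]+)", line)
        if match is None:
            # Lines such as "P50 Latency is 0.0587108sec".
            match = re.match(r"^(.+?) is\s*([0-9.]+)", line)
        if match:
            results[match.group(1).strip()] = float(match.group(2))
    return results


if __name__ == "__main__":
    parsed = parse_perf_output(SAMPLE)
    for label, value in parsed.items():
        print(f"{label}: {value}")

    # Assumption: total time is in seconds, average is in milliseconds.
    average_ms = parsed["Total time cost"] / parsed["Total iterations"] * 1000
    print(f"recomputed average (ms): {average_ms:.4f}")
```

Units are deliberately dropped by the parser, so callers need to track that the latency lines are in seconds while the average is reported in milliseconds.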