onnxruntime/onnxruntime/test/perftest
Ryan Hill ba739a8000
Convert OpenVINO into a shared provider (#5778)
Same as Dnnl and TensorRT before it, now with more methods and more cleanup.
2020-11-20 17:39:57 -08:00
..
posix
windows
command_args_parser.cc Remove nGraph Execution Provider (#5858) 2020-11-19 16:47:55 -08:00
command_args_parser.h
main.cc Maajid/multi threading 2 (#5568) 2020-10-27 14:48:12 -07:00
ort_test_session.cc Convert OpenVINO into a shared provider (#5778) 2020-11-20 17:39:57 -08:00
ort_test_session.h Refactor onnx_test_runner (#5169) 2020-09-18 13:19:35 -07:00
performance_runner.cc Move ort flatbuffers helper functions and value info r/w functions into separated lib (#5276) 2020-09-25 05:36:29 -07:00
performance_runner.h Add option ORT_NO_EXCEPTIONS to disable most exception/throw in /onnxruntime/ (#4894) 2020-08-28 23:03:51 -07:00
README.md Remove nGraph Execution Provider (#5858) 2020-11-19 16:47:55 -08:00
ReadMe.txt
test_configuration.h Sunghcho/denormals (#5391) 2020-10-15 12:47:42 -07:00
test_session.h Refactor onnx_test_runner (#5169) 2020-09-18 13:19:35 -07:00
tf_test_session.h Fix the tensorflow performance test (#3847) 2020-05-13 11:52:59 -07:00
TFModelInfo.cc Try to avoid std::move in return whilst keeping CentOS build happy. (#4163) 2020-06-09 21:41:49 +10:00
TFModelInfo.h Support opset-13 specs of controlflow ops (Loop, If) (#5665) 2020-11-11 23:44:14 -08:00
utils.h

ONNXRuntime Performance Test

This tool provides the performance results using the ONNX Runtime with the specific execution provider to run the inference for a given model using the sample input test data. This tool can provide a reliable measurement for the inference latency usign ONNX Runtime on the device. The options to use with the tool are listed below:

onnxruntime_perf_test [options...] model_path result_file

Options:

-A: Disable memory arena.

-M: Disable memory pattern.

-P: Use parallel executor instead of sequential executor.

-c: [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1.

-e: [cpu|cuda|mkldnn|tensorrt|openvino|nuphar|acl]: Specifies the execution provider 'cpu','cuda','dnnn','tensorrt', 'openvino', 'nuphar' or 'acl'. Default is 'cpu'.
    
-m: [test_mode]: Specifies the test mode. Value coulde be 'duration' or 'times'. Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times. Default:'duration'.
    
-o: [optimization level]: Default is 1. Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all). Please see __onnxruntime_c_api.h__ (enum GraphOptimizationLevel) for the full list of all optimization levels.

-u: [path to save optimized model]: Default is empty so no optimized model would be saved.

-p: [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.

-r: [repeated_times]: Specifies the repeated times if running in 'times' test mode.Default:1000.
    
-s: Show statistics result, like P75, P90.

-t: [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
    
-v: Show verbose information.
    
-x: [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes. A value of 0 means the test will auto-select a default. Must >=0.

-y: [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes), A value of 0 means the test will auto-select a default. Must >=0.

-h: help.

Model path and input data dependency: Performance test uses the same input structure as onnx_test_runner tool. It requrires the directory trees as below:

--ModelName
    --test_data_set_0
        --input0.pb
    --test_data_set_2
        --input0.pb
    --model.onnx

The path of model.onnx needs to be provided as <model_path> argument.

Sample output from the tool will look something like this:

Total time cost:58.8053
Total iterations:1000
Average time cost:58.8053 ms
Total run time:58.8102 s
Min Latency is 0.0559777sec
Max Latency is 0.0623472sec
P50 Latency is 0.0587108sec
P90 Latency is 0.0599845sec
P95 Latency is 0.0605676sec
P99 Latency is 0.0619517sec
P999 Latency is 0.0623472se