mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-06 00:03:22 +00:00
Implemented "lock-free" spinlock to save CPU usage on context switching.

The change has been tested on the queene service of the Ads team. The lock-free version of ort (40 threads) cuts CPU usage on gen8 (128 logical processors on 8 NUMA nodes) Windows by nearly half, from 65% to 35%.

For 32 cores, the curve is flat:

Anubis, 32 vCPU, Windows, Hugging Face models, 95th percentile E2E latency in ms:

model | mutex (ms) | mutex-free (ms)
--- | --- | ---
alvert_base_v2 | 34.21 | 34.09
bert_large_uncased | 116.27 | 117.84
bart_base | 72.06 | 71.99
distilgpt2 | 25.43 | 25.02
vit_base_patch16_224 | 37.33 | 37.76

Anubis, 32 vCPU win, Linux, 1st party models, 95th percentile E2E latency in ms:

model | mutex (ms) | mutex-free (ms)
--- | --- | ---
deepthink_v2 | 24.35 | 22.95
bing_feeds | 36.96 | 36.48
deep_writes | 14.46 | 14.32
keypoints | 9.34 | 7.69
model11 | 1.71 | 1.66
model12 | 1.82 | 1.44
model2 | 4.21 | 3.95
model6 | 1.08 | 1.05
agiencoder | 0.99 | 0.93
geminet_transformer | 5.32 | 5.24

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>

| | | |
|---|---|---|
| external | ||
| patches | ||
| tensorboard | ||
| adjust_global_compile_flags.cmake | ||
| CMakeLists.txt | ||
| CMakeSettings.json | ||
| codeconv.runsettings | ||
| deps.txt | ||
| EnableVisualStudioCodeAnalysis.props | ||
| gdk_toolchain.cmake | ||
| Info.plist.in | ||
| libonnxruntime.pc.cmake.in | ||
| nuget_helpers.cmake | ||
| onnxruntime.cmake | ||
| onnxruntime_codegen_tvm.cmake | ||
| onnxruntime_common.cmake | ||
| onnxruntime_compile_triton_kernel.cmake | ||
| onnxruntime_config.h.in | ||
| onnxruntime_csharp.cmake | ||
| onnxruntime_flatbuffers.cmake | ||
| onnxruntime_framework.cmake | ||
| onnxruntime_fuzz_test.cmake | ||
| onnxruntime_graph.cmake | ||
| onnxruntime_ios.toolchain.cmake | ||
| onnxruntime_java.cmake | ||
| onnxruntime_java_unittests.cmake | ||
| onnxruntime_kernel_explorer.cmake | ||
| onnxruntime_language_interop_ops.cmake | ||
| onnxruntime_mlas.cmake | ||
| onnxruntime_nodejs.cmake | ||
| onnxruntime_objectivec.cmake | ||
| onnxruntime_opschema_lib.cmake | ||
| onnxruntime_optimizer.cmake | ||
| onnxruntime_providers.cmake | ||
| onnxruntime_pyop.cmake | ||
| onnxruntime_python.cmake | ||
| onnxruntime_rocm_hipify.cmake | ||
| onnxruntime_session.cmake | ||
| onnxruntime_snpe_provider.cmake | ||
| onnxruntime_training.cmake | ||
| onnxruntime_unittests.cmake | ||
| onnxruntime_util.cmake | ||
| onnxruntime_webassembly.cmake | ||
| precompiled_header.cmake | ||
| Sdl.ruleset | ||
| set_winapi_family_desktop.h | ||
| target_delayload.cmake | ||
| uwp_stubs.h | ||
| wcos_rules_override.cmake | ||
| winml.cmake | ||
| winml_cppwinrt.cmake | ||
| winml_sdk_helpers.cmake | ||
| winml_unittests.cmake | ||
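
The commit message at the top of this listing describes replacing a mutex with a spinlock so that waiting threads do not pay for context switches. As a rough illustration of that idea only, here is a minimal sketch of a spinlock built on `std::atomic_flag`. The names `SpinLock`, `kSpinsBeforeYield`, and `count_with_spinlock` are hypothetical and are not taken from onnxruntime; the actual implementation in the repository may differ substantially (for example, in how it backs off or pauses while spinning).

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch, not the onnxruntime implementation.
// A waiting thread spins in user space instead of blocking in the kernel,
// which avoids a context switch when the lock is held only briefly.
class SpinLock {
 public:
  void lock() noexcept {
    int spins = 0;
    // test_and_set returns the previous value: true means another thread holds the lock.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Assumed back-off policy: after many failed attempts, let other threads run.
      if (++spins >= kSpinsBeforeYield) {
        std::this_thread::yield();
        spins = 0;
      }
    }
  }

  void unlock() noexcept { flag_.clear(std::memory_order_release); }

 private:
  static constexpr int kSpinsBeforeYield = 1024;  // arbitrary illustrative value
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a shared counter from several threads, guarded by the spinlock.
long count_with_spinlock(int num_threads, int iters_per_thread) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iters_per_thread; ++i) {
        // SpinLock provides lock()/unlock(), so std::lock_guard can manage it.
        std::lock_guard<SpinLock> guard(lock);
        ++counter;
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

Calling `count_with_spinlock(4, 10000)` should return 40000 if the lock provides mutual exclusion. Spinning trades kernel transitions for busy CPU cycles, so it tends to pay off only when critical sections are short, which would be consistent with the flat latency curves reported for the 32-core machines.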