onnxruntime/cmake
Chen Fu dc72159105
Symmetric Quant indirect Conv kernel for ARMv8 A55 chip (#10862)
The Arm A55 microarchitecture (with dot-product instructions), like the A53, is widely used for the little cores in big.LITTLE configurations. The A55 has narrower memory load/store hardware: a 128-bit load instruction blocks the pipeline for two whole cycles, during which no other instructions can execute. A 64-bit load instruction, on the other hand, can be dual-issued with many other instructions.

This change adds a Symmetric Quant indirect Conv kernel for the A55 microarchitecture, in which we replace

ldr q4,[x1],

with

ldr d4,[x1],
ldr x11,[x1],
ins v4.d[1],x11

so that we can try to hide the memory load cycles behind the compute cycles in the kernel.

With this new kernel, the cartoongan model shows a significant performance improvement on the Pixel 5a's little cores (2 threads running on two little cores):

new kernel: 2188.59 ms
old kernel: 2360.61 ms
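Computed directly from the two timings above, the latency reduction and speedup are:

```latex
\frac{2360.61 - 2188.59}{2360.61} = \frac{172.02}{2360.61} \approx 7.3\%,
\qquad
\frac{2360.61}{2188.59} \approx 1.079
```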
2022-03-25 17:10:47 -07:00
external [TVM EP] code refactor (#10655) 2022-03-16 13:55:04 +01:00
patches Patch absl so that it doesn't disable important VC++ warnings (#10836) 2022-03-10 15:35:39 -08:00
tensorboard Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471) 2021-07-30 17:16:37 -07:00
CMakeLists.txt [cmake] Add keyword STATIC to add_library in function onnxruntime_add_static_library (#10998) 2022-03-25 16:19:36 -07:00
CMakeSettings.json
codeconv.runsettings
EnableVisualStudioCodeAnalysis.props Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
onnxruntime_codegen_tvm.cmake Update our absl cmake files (#10762) 2022-03-04 09:28:04 -08:00
onnxruntime_common.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
onnxruntime_config.h.in [wasm] update emscripten v2.0.34 (#10391) 2022-01-26 14:46:02 -08:00
onnxruntime_csharp.cmake [TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (#10260) 2022-02-15 10:21:02 +01:00
onnxruntime_eager.cmake Update our absl cmake files (#10762) 2022-03-04 09:28:04 -08:00
onnxruntime_flatbuffers.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
onnxruntime_framework.cmake Remove onnxruntime_util dependency on onnxruntime_framework (#10512) 2022-02-10 19:17:08 -08:00
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake Reorganize contrib op schemas (#10494) 2022-02-09 09:31:58 -08:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake Add linux and macos arm64 java artifacts (#10981) 2022-03-25 16:23:17 -07:00
onnxruntime_java_unittests.cmake
onnxruntime_language_interop_ops.cmake
onnxruntime_mlas.cmake Symmetric Quant indirect Conv kernel for ARMv8 A55 chip (#10862) 2022-03-25 17:10:47 -07:00
onnxruntime_nodejs.cmake Add Node.js binding support to packaging pipeline (#9577) 2021-11-05 15:29:40 -07:00
onnxruntime_nuphar_extern.cmake
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471) 2021-07-30 17:16:37 -07:00
onnxruntime_optimizer.cmake Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD. (#10778) 2022-03-08 16:18:49 -08:00
onnxruntime_providers.cmake Improve NonZero on CUDA/ROCM (#10307) 2022-03-25 07:35:45 +08:00
onnxruntime_pyop.cmake Packaging pipeline now builds with PythonOp (aka running autograd.Function) (#8652) 2021-08-17 10:55:13 -07:00
onnxruntime_python.cmake Fix a couple of issues with the python package tools (#10858) 2022-03-15 15:52:12 +10:00
onnxruntime_session.cmake [ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877) 2021-10-14 15:15:51 -07:00
onnxruntime_training.cmake [ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877) 2021-10-14 15:15:51 -07:00
onnxruntime_unittests.cmake amdmigraphx_ep-add ops to be supported by migraphx and fixed a bug in check ops to be supported (#10496) 2022-03-23 19:17:19 -07:00
onnxruntime_util.cmake Convert ConvActivationFusion transformer to a selector action transformer. (#10687) 2022-03-02 13:47:55 +10:00
onnxruntime_webassembly.cmake Support ORT WASM compilation with the training flag (#10973) 2022-03-22 16:13:35 -07:00
precompiled_header.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
protobuf_function.cmake
Sdl.ruleset Enable more static analysis warnings and enable the analyzer for training cpu (#10176) 2022-01-27 11:17:20 -08:00
set_winapi_family_desktop.h
target_delayload.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
uwp_stubs.h Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
wcos_rules_override.cmake
wil.cmake
winml.cmake Enable JoinModels API in WinML+RT Experimental API (#9746) 2021-11-12 16:56:31 -08:00
winml_cppwinrt.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
winml_sdk_helpers.cmake
winml_unittests.cmake Remove Windows Store specific code 2022-03-17 23:38:14 -07:00