onnxruntime/cmake
Chen Fu 50a6f095cd
Symmetric QGEMM kernel for ARMv8 A55 chip (#10754)
The ARM A55 micro-architecture (with dot product instructions), like the A53, is widely used for the little cores in big.LITTLE configurations. The A55 has narrower memory load/store hardware: a 128-bit load instruction blocks the pipeline for 2 whole cycles, during which no other instruction can be executed. A 64-bit load instruction, on the other hand, can be dual-issued with many other instructions.

This change adds a Symmetric QGEMM kernel for the A55 micro-architecture, in which we replace

ldr q4,[x1],#16

with

ldr d4,[x1],#8
ldr x11,[x1],#8
ins v4.d[1],x11

so that we can try to hide the memory load cycles behind computing cycles in the kernel.
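
The replacement above leaves the same value in the register and the same final pointer as the single 128-bit load. A minimal portable C sketch of that equivalence follows. It only models the data movement and says nothing about timing. The names vec128, load_q, load_split and loads_match are hypothetical and do not come from the kernel source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit NEON register (v4): two 64-bit lanes. */
typedef struct { uint64_t d[2]; } vec128;

/* Models `ldr q4,[x1],#16`: one 128-bit load, then the pointer advances by 16. */
static vec128 load_q(const uint8_t **p) {
    vec128 v;
    memcpy(v.d, *p, 16);
    *p += 16;
    return v;
}

/* Models the three-instruction replacement sequence. */
static vec128 load_split(const uint8_t **p) {
    vec128 v = {{0, 0}};
    uint64_t x11;
    memcpy(&v.d[0], *p, 8);  /* ldr d4,[x1],#8: low lane loaded, high lane zeroed */
    *p += 8;
    memcpy(&x11, *p, 8);     /* ldr x11,[x1],#8: next 8 bytes into a general register */
    *p += 8;
    v.d[1] = x11;            /* ins v4.d[1],x11: insert into the high lane */
    return v;
}

/* Returns 1 when both sequences give the same register value and final pointer. */
static int loads_match(const uint8_t *buf) {
    const uint8_t *a = buf;
    const uint8_t *b = buf;
    vec128 q = load_q(&a);
    vec128 s = load_split(&b);
    return a == b && memcmp(&q, &s, sizeof q) == 0;
}
```

Calling loads_match on any buffer of at least 16 readable bytes should return 1. In the real kernel the benefit comes from scheduling: each of the three narrower instructions can be paired with arithmetic instructions, which this sketch does not model.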

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-03-07 08:41:13 -08:00
..
external
patches
tensorboard
CMakeLists.txt
CMakeSettings.json
codeconv.runsettings
EnableVisualStudioCodeAnalysis.props
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake
onnxruntime_config.h.in
onnxruntime_csharp.cmake
onnxruntime_eager.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake
onnxruntime_java_unittests.cmake
onnxruntime_language_interop_ops.cmake
onnxruntime_mlas.cmake Symmetric QGEMM kernel for ARMv8 A55 chip (#10754) 2022-03-07 08:41:13 -08:00
onnxruntime_nodejs.cmake
onnxruntime_nuphar_extern.cmake
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake
onnxruntime_providers.cmake
onnxruntime_pyop.cmake
onnxruntime_python.cmake
onnxruntime_session.cmake
onnxruntime_training.cmake
onnxruntime_unittests.cmake
onnxruntime_util.cmake
onnxruntime_webassembly.cmake
precompiled_header.cmake
protobuf_function.cmake
Sdl.ruleset
set_winapi_family_desktop.h
store_toolchain.cmake
target_delayload.cmake
uwp_stubs.h
wcos_rules_override.cmake
wil.cmake
winml.cmake
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake