onnxruntime/cmake
Chen Fu 00b345eb7b
ARM Neon S8S8 kernel for QGemm (#8695)
By using signed int8 inputs, the QGemm kernel avoids extending uint8 to int16 while computing the matrix multiplication, which gives higher performance. We also found that using only the lower 64 bits of the vector registers to load the A and B matrices improves performance further. We also compared using ldp to load two 64-bit values in one instruction against using two ldr instructions to load one 64-bit value each; on both big and little cores there is no noticeable difference.

This change submits the LDP version. At this point we don't need to choose a kernel based on micro-architecture.
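
As a rough illustration of why the signed path is cheaper, here is a minimal sketch. DotS8S8 is a hypothetical helper, not an MLAS function, and the real kernel is hand-written assembly. It assumes the usual NEON pattern: 64-bit loads of A and B, a widening signed multiply (int8 x int8 to int16) with no separate extend step, then a pairwise add-accumulate into 32-bit lanes. A scalar fallback keeps it buildable on non-ARM hosts.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not part of MLAS): dot product of two signed 8-bit
// vectors with 32-bit accumulation. Assumes n is a multiple of 8.
int32_t DotS8S8(const int8_t* a, const int8_t* b, size_t n) {
#if defined(__ARM_NEON)
    int32x4_t acc = vdupq_n_s32(0);
    for (size_t i = 0; i < n; i += 8) {
        int8x8_t va = vld1_s8(a + i);        // 64-bit load from A
        int8x8_t vb = vld1_s8(b + i);        // 64-bit load from B
        int16x8_t prod = vmull_s8(va, vb);   // widening multiply: s8 x s8 -> s16
        acc = vpadalq_s16(acc, prod);        // pairwise add, accumulate into s32
    }
    // Horizontal sum of the four 32-bit lanes.
    int32x2_t sum = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
    sum = vpadd_s32(sum, sum);
    return vget_lane_s32(sum, 0);
#else
    // Scalar model of the same arithmetic: each s8 x s8 product fits in s16
    // (largest magnitude is 128 * 128 = 16384), sums are kept in s32.
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        int16_t prod = static_cast<int16_t>(static_cast<int16_t>(a[i]) *
                                            static_cast<int16_t>(b[i]));
        acc += prod;
    }
    return acc;
#endif
}
```

With one unsigned and one signed operand there is no equivalent single widening multiply in baseline NEON, so the inputs would first be extended to 16 bits; that extra step is what the S8S8 path skips.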

Inference time of resnet50, thread count 2

Big Core on Pixel 3a
Current master: 292.947 ms
First iteration S8S8: 188.239 ms
LDP load two 64b reg: 178.715 ms
LDR load one 64b reg: 179.536 ms

Little Core
Master: 546.317 ms
S8S8: 513.332 ms
LDP: 489.19 ms
LDR: 497.865 ms

Raspberry Pi 3B+
Master: 660.08 ms
S8S8: 608.577 ms
LDP: 603.675 ms
LDR: 602.075 ms
2021-08-18 09:58:47 -07:00
external [Nuphar] Fix Windows build in VS 2019 (#8728) 2021-08-13 16:13:34 -07:00
patches Sync ORTModule branch with master and fix tests (#6526) 2021-02-02 08:59:56 -08:00
tensorboard Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471) 2021-07-30 17:16:37 -07:00
CMakeLists.txt [Nuphar] Fix Windows build in VS 2019 (#8728) 2021-08-13 16:13:34 -07:00
CMakeSettings.json
codeconv.runsettings
Info.plist.in Enable build dynamic framework for macOS/iOS (#7343) 2021-04-15 16:47:53 -07:00
libonnxruntime.pc.cmake.in cmake: support install target with generated pkg-config file (#7076) 2021-03-22 19:36:31 -07:00
nuget_helpers.cmake
onnxruntime.cmake Add iOS/macOS static framework (#8357) 2021-07-14 16:39:17 -07:00
onnxruntime_codegen.cmake Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632) 2021-06-02 23:36:49 -07:00
onnxruntime_common.cmake Revert "Fix Windows Store build (#8481)" (#8679) 2021-08-11 00:37:36 -07:00
onnxruntime_config.h.in Fix unknown warning "-Wformat-truncation" build failure for arm (#8721) 2021-08-12 23:47:03 -07:00
onnxruntime_csharp.cmake
onnxruntime_eager.cmake Integrate eager mode source code into onnxruntime repo (#8584) 2021-08-06 08:30:27 -07:00
onnxruntime_flatbuffers.cmake Revert "Fix Windows Store build (#8481)" (#8679) 2021-08-11 00:37:36 -07:00
onnxruntime_framework.cmake Decouple Forward and Backward of ATenOp (#8301) 2021-07-23 16:53:26 +08:00
onnxruntime_fuzz_test.cmake Merge CPU packaging pipelines (#6480) 2021-02-04 08:38:56 -08:00
onnxruntime_graph.cmake Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471) 2021-07-30 17:16:37 -07:00
onnxruntime_ios.toolchain.cmake Enable build dynamic framework for macOS/iOS (#7343) 2021-04-15 16:47:53 -07:00
onnxruntime_java.cmake [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader (#8013) 2021-07-20 22:33:15 -07:00
onnxruntime_java_unittests.cmake [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader (#8013) 2021-07-20 22:33:15 -07:00
onnxruntime_language_interop_ops.cmake Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632) 2021-06-02 23:36:49 -07:00
onnxruntime_mlas.cmake ARM Neon S8S8 kernel for QGemm (#8695) 2021-08-18 09:58:47 -07:00
onnxruntime_nodejs.cmake Specify correct dependency for CI pipeline of nodejs binding (#7717) 2021-05-15 08:56:58 -07:00
onnxruntime_nuphar_extern.cmake Add static code analyzer to Windows CPU/GPU CI builds and fix the warnings (#7489) 2021-04-29 11:54:57 -07:00
onnxruntime_objectivec.cmake [Objective-C API] Add script to assemble pod package files. (#7958) 2021-06-07 19:16:39 -07:00
onnxruntime_opschema_lib.cmake Update compliance tasks in python packaging pipeline and fix some compile warnings (#8471) 2021-07-30 17:16:37 -07:00
onnxruntime_optimizer.cmake Refactor QDQ optimizers to enable future usage in minimal build (#8191) 2021-07-09 16:11:43 +10:00
onnxruntime_providers.cmake [CoreML EP]Make coreml ep build on non-macOS platform (#8677) 2021-08-18 09:35:32 -07:00
onnxruntime_pyop.cmake Packaging pipeline now builds with PythonOp (aka running autograd.Function) (#8652) 2021-08-17 10:55:13 -07:00
onnxruntime_python.cmake [Nuphar] Fix Windows build in VS 2019 (#8728) 2021-08-13 16:13:34 -07:00
onnxruntime_session.cmake Packaging pipeline now builds with PythonOp (aka running autograd.Function) (#8652) 2021-08-17 10:55:13 -07:00
onnxruntime_training.cmake Packaging pipeline now builds with PythonOp (aka running autograd.Function) (#8652) 2021-08-17 10:55:13 -07:00
onnxruntime_unittests.cmake [CoreML EP]Make coreml ep build on non-macOS platform (#8677) 2021-08-18 09:35:32 -07:00
onnxruntime_util.cmake Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632) 2021-06-02 23:36:49 -07:00
onnxruntime_webassembly.cmake [wasm] allows to specify MALLOC setting for wasm build (#7934) 2021-06-03 23:08:56 -07:00
precompiled_header.cmake Revert "Fix Windows Store build (#8481)" (#8679) 2021-08-11 00:37:36 -07:00
protobuf_function.cmake Sync ORTModule branch with master and fix tests (#6526) 2021-02-02 08:59:56 -08:00
set_winapi_family_desktop.h
store_toolchain.cmake
target_delayload.cmake
wcos_rules_override.cmake
wil.cmake
winml.cmake Revert "Fix Windows Store build (#8481)" (#8679) 2021-08-11 00:37:36 -07:00
winml_cppwinrt.cmake Revert "Fix Windows Store build (#8481)" (#8679) 2021-08-11 00:37:36 -07:00
winml_sdk_helpers.cmake
winml_unittests.cmake Revert "Fix Windows Store build (#8481)" (#8679) 2021-08-11 00:37:36 -07:00