onnxruntime/cmake
Jing Fang 1942e40e05
[ARM64] MatMulNBits: use neon instrinsics to convert between fp16 and fp32 (#22195)
### Description
When the A type is fp16, the fallback operation converts the data to fp32
and computes in fp32.
Added a NEON intrinsics version to speed up the conversion.

Store-address alignment and loop unrolling have an insignificant impact on
latency, so they are omitted.
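
For illustration, the fp16 to fp32 direction can be sketched roughly as below. This is a minimal sketch and not the code from this PR: the signature of `ConvertF16ToF32` and the helper `HalfBitsToFloat` are assumptions made for the example, fp16 values are assumed to be stored as raw `uint16_t` bit patterns, and the NEON path is assumed to be selected by the `__aarch64__` macro (GCC/Clang). On other targets only the scalar loop runs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical scalar helper: decode one IEEE 754 binary16 bit pattern.
inline float HalfBitsToFloat(std::uint16_t h) {
  const int exp = (h >> 10) & 0x1F;  // 5-bit biased exponent
  const int mant = h & 0x3FF;        // 10-bit mantissa
  float v;
  if (exp == 0) {
    // Zero or subnormal: mant * 2^-24
    v = std::ldexp(static_cast<float>(mant), -24);
  } else if (exp == 31) {
    // Infinity or NaN
    v = mant ? std::numeric_limits<float>::quiet_NaN()
             : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mant) * 2^(exp - 25)
    v = std::ldexp(static_cast<float>(mant + 1024), exp - 25);
  }
  return (h & 0x8000) ? -v : v;
}

// Hypothetical signature: convert n fp16 values (raw bits) to fp32.
void ConvertF16ToF32(const std::uint16_t* src, float* dst, std::size_t n) {
  assert(n == 0 || (src != nullptr && dst != nullptr));
  std::size_t i = 0;
#if defined(__aarch64__)
  // Vector body: 4 elements per iteration, no unrolling and no
  // alignment handling, in line with the note above.
  for (; i + 4 <= n; i += 4) {
    const float16x4_t h = vreinterpret_f16_u16(vld1_u16(src + i));
    vst1q_f32(dst + i, vcvt_f32_f16(h));
  }
#endif
  // Scalar tail (and the whole range on non-ARM64 builds).
  for (; i < n; ++i) {
    dst[i] = HalfBitsToFloat(src[i]);
  }
}
```

The scalar tail keeps the sketch correct for lengths that are not a multiple of 4 and lets it build on non-ARM64 hosts.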

| Benchmark | Time | CPU |
|-----------|------|-----|
|M_ConvertF16ToF32/baseline/real_time | 1076961 ns | 1083398 ns |
|M_ConvertF16ToF32/aligned:0/real_time | 46785 ns | 46516 ns |
|M_ConvertF16ToF32/aligned:1/real_time | 46631 ns | 46391 ns |
|M_ConvertF16ToF32_unroll2/aligned:0/real_time | 44074 ns | 44392 ns |
|M_ConvertF16ToF32_unroll2/aligned:1/real_time | 44726 ns | 45226 ns |
|M_ConvertF32ToF16/baseline/real_time | 520109 ns | 527329 ns |
|M_ConvertF32ToF16/aligned:0/real_time | 73610 ns | 74015 ns |
|M_ConvertF32ToF16/aligned:1/real_time | 71557 ns | 71525 ns |
|M_ConvertF32ToF16_unroll2/aligned:0/real_time | 64227 ns | 63374 ns |
|M_ConvertF32ToF16_unroll2/aligned:1/real_time | 67428 ns | 67989 ns |



### Motivation and Context
Speed up the fallback implementation of fp16 MatMulNBits.
2024-09-26 13:55:40 -07:00
..
external Get build working on Xcode 16 (#22168) 2024-09-24 08:33:03 -07:00
patches Get build working on Xcode 16 (#22168) 2024-09-24 08:33:03 -07:00
tensorboard
adjust_global_compile_flags.cmake Enable Android 16 KB page size support (#22076) 2024-09-19 18:53:57 +10:00
arm64x.cmake Dev/mookerem/arm64x update (#20536) 2024-05-07 12:50:38 -07:00
CMakeLists.txt Get build working on Xcode 16 (#22168) 2024-09-24 08:33:03 -07:00
CMakePresets.json Create CMake option onnxruntime_USE_VCPKG (#21348) 2024-09-10 16:39:27 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt [Running CI] Update TensorRT to 10.4 (#22049) 2024-09-26 11:10:52 -07:00
deps_update_and_upload.py
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake
linux_arm64_crosscompile_toolchain.cmake
maccatalyst_prepare_objects_for_prelink.py
nuget_helpers.cmake Update nuget.exe used in WindowsAI nuget packaging so readme property is supported. (#22141) 2024-09-19 19:06:47 +10:00
onnxruntime.cmake Specify the paths of system tools when building Apple framework (#22056) 2024-09-23 17:19:30 +08:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake Enable QNN HTP support for Node (#20576) 2024-05-09 13:11:07 -07:00
onnxruntime_compile_triton_kernel.cmake [CUDA] Add SparseAttention operator for Phi-3-small (#20216) 2024-04-30 09:06:29 -07:00
onnxruntime_config.h.in Get build working on Xcode 16 (#22168) 2024-09-24 08:33:03 -07:00
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake Adding CUDNN Frontend and use for CUDA NN Convolution (#19470) 2024-08-02 15:16:42 -07:00
onnxruntime_framework.natvis
onnxruntime_fuzz_test.cmake [Fuzzer] Add two new ORT libfuzzer (Linux clang support for now) (#22055) 2024-09-12 11:50:34 -07:00
onnxruntime_graph.cmake
onnxruntime_ios.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_java.cmake Remove deprecated "mobile" packages (#20941) 2024-06-07 16:20:32 -05:00
onnxruntime_java_unittests.cmake
onnxruntime_kernel_explorer.cmake Add Linux ROCm CI Pipeline (#21798) 2024-08-30 14:50:32 +08:00
onnxruntime_mlas.cmake [ARM64] MatMulNBits: use neon instrinsics to convert between fp16 and fp32 (#22195) 2024-09-26 13:55:40 -07:00
onnxruntime_nodejs.cmake Enable QNN HTP support for Node (#20576) 2024-05-09 13:11:07 -07:00
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake Flash attention recompute (#20603) 2024-05-21 13:38:19 +08:00
onnxruntime_providers.cmake [VSINPU]Code improvement && Slice/Dropout OP support (#21217) 2024-07-09 20:14:46 -07:00
onnxruntime_providers_acl.cmake
onnxruntime_providers_armnn.cmake
onnxruntime_providers_azure.cmake
onnxruntime_providers_cann.cmake
onnxruntime_providers_coreml.cmake Fix Objective-C static analysis warnings. (#20417) 2024-04-24 11:48:29 -07:00
onnxruntime_providers_cpu.cmake Add CUDA custom op header files to Linux tarball (#21551) 2024-08-01 04:23:02 -07:00
onnxruntime_providers_cuda.cmake Adding CUDNN Frontend and use for CUDA NN Convolution (#19470) 2024-08-02 15:16:42 -07:00
onnxruntime_providers_dml.cmake
onnxruntime_providers_dnnl.cmake
onnxruntime_providers_js.cmake
onnxruntime_providers_migraphx.cmake Migraphx ep windows build (#21284) 2024-07-11 21:21:38 -07:00
onnxruntime_providers_nnapi.cmake
onnxruntime_providers_openvino.cmake Ovep release lnl 1.2.1 (#22027) 2024-09-11 14:55:40 -07:00
onnxruntime_providers_qnn.cmake
onnxruntime_providers_rknpu.cmake
onnxruntime_providers_rocm.cmake Add CUDA custom op header files to Linux tarball (#21551) 2024-08-01 04:23:02 -07:00
onnxruntime_providers_tensorrt.cmake Adding CUDNN Frontend and use for CUDA NN Convolution (#19470) 2024-08-02 15:16:42 -07:00
onnxruntime_providers_tvm.cmake
onnxruntime_providers_vitisai.cmake [VitisAI] remove wrong error msg, required by Microsoft (#21715) 2024-08-21 21:10:28 -07:00
onnxruntime_providers_vsinpu.cmake [VSINPU]Code improvement && Slice/Dropout OP support (#21217) 2024-07-09 20:14:46 -07:00
onnxruntime_providers_webnn.cmake
onnxruntime_providers_xnnpack.cmake
onnxruntime_python.cmake [QNN EP] set up py packaging pipeline for Linux x64 (#22132) 2024-09-18 23:24:32 -07:00
onnxruntime_rocm_hipify.cmake [CUDA] cuDNN Flash Attention (#21629) 2024-08-20 08:50:22 -07:00
onnxruntime_session.cmake Adding CUDNN Frontend and use for CUDA NN Convolution (#19470) 2024-08-02 15:16:42 -07:00
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake Adding CUDNN Frontend and use for CUDA NN Convolution (#19470) 2024-08-02 15:16:42 -07:00
onnxruntime_unittests.cmake Get build working on Xcode 16 (#22168) 2024-09-24 08:33:03 -07:00
onnxruntime_util.cmake
onnxruntime_visionos.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_webassembly.cmake [js/web] allow load WebAssembly binary from buffer (#21534) 2024-07-29 13:39:38 -07:00
precompiled_header.cmake
riscv64.toolchain.cmake
Sdl.ruleset
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h
vcpkg-configuration.json Create CMake option onnxruntime_USE_VCPKG (#21348) 2024-09-10 16:39:27 -07:00
vcpkg.json Create CMake option onnxruntime_USE_VCPKG (#21348) 2024-09-10 16:39:27 -07:00
wcos_rules_override.cmake
winml.cmake Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339) 2024-07-15 14:21:34 -07:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake