onnxruntime/cmake
liqun Fu cc26b2dac2
Mlas Gemm 4bit avx2, avx512, and avx512vnni kernels (#20163)
### Description

```
Avx2:
Int8

NS(Prompt)		MLAS(Prompt)  	MLAS(Prompt)Gain/Loss		NS(TokenGen)		MLAS(TokenGen)  	MLAS(TokenGen)Gain/Loss
Blklen16: 	90.96			25.15			-72%					7.65				11.71			53%
Blklen32:	90.73			48.55			-46%					7.86				14.28			81%
Blklen64:	89.49			68.84			-23%					8.30				15.78			90%
Blklen128:	87.38			78.37			-10%					7.90				16.05			103%
Blklen256:	89.45			82.36			-7%					8.30				16.56			99%

Fp32		
NS(Prompt)		MLAS(Prompt)  	MLAS(Prompt)Gain/Loss		NS(TokenGen)		MLAS(TokenGen)  	MLAS(TokenGen)Gain/Loss
Blklen16:	91.36			105.18		15%				7.57			9.52		25%
Blklen32:	89.30			105.99			18%					7.65				9.68			26%
Blklen64:	89.53			101.41			13%					7.97				9.84			23%
Blklen128:	85.23			99.71			16%					7.86				10.39			32%
Blklen256:	88.46			97.94			10%					8.32				10.23			22%

Avx512vnni:
Int8		
NS(Prompt)		MLAS(Prompt)  	MLAS(Prompt)Gain/Loss		NS(TokenGen)		MLAS(TokenGen)  	MLAS(TokenGen)Gain/Loss
Blklen16:	132.18			21.56			-83%					10.34				11.48			11%
Blklen32:	168.28			43.69			-74%					11.85				14.73			24%
Blklen64:	201.81			60.29			-70%					12.36				15.47			25%
Blklen128:	194.92			57.04			-71%					13.03				14.67			12%
Blklen256:	218.76			70.20			-68%					13.33				16.31			22%

Fp32		
NS(Prompt)		MLAS(Prompt)  	MLAS(Prompt)Gain/Loss		NS(TokenGen)		MLAS(TokenGen)  	MLAS(TokenGen)Gain/Loss
Blklen16:	102.81			92.74			-9%					8.41				9.18			9%
Blklen32:	109.49			97.08			-11%					8.83				11.51			30%
Blklen64:	104.13			101.57			-2%					9.32				12.00			28%
Blklen128:	108.45			103.69			-4%					9.58				12.45			29%
Blklen256:	109.43			106.43			-2%					9.19				12.2			32%

```

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com>
2024-04-25 21:30:50 -07:00
..
external upgrade emsdk to 3.1.57 (#20295) 2024-04-19 23:05:18 -07:00
patches Add patch for ONNX 1.16.0 shape inference bug (#20316) 2024-04-17 10:23:22 -07:00
tensorboard
adjust_global_compile_flags.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
arm64x.cmake Build onnxruntime.dll as arm64x (#18633) 2023-12-06 16:49:00 -08:00
CMakeLists.txt Mlas Gemm 4bit avx2, avx512, and avx512vnni kernels (#20163) 2024-04-25 21:30:50 -07:00
CMakeSettings.json
codeconv.runsettings
deps.txt Integration with ONNX 1.16.0 (#19745) 2024-04-12 09:46:49 -07:00
deps_update_and_upload.py Update google benchmark to 1.8.3. (#19734) 2024-03-01 11:01:58 -08:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
linux_arm64_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
maccatalyst_prepare_objects_for_prelink.py Support xcframework for mac catalyst builds. (#19534) 2024-03-20 10:55:19 -07:00
nuget_helpers.cmake
onnxruntime.cmake Support xcframework for mac catalyst builds. (#19534) 2024-03-20 10:55:19 -07:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake Fix build errors from date/date.h C++20 compatibility (#20139) 2024-04-02 22:10:25 -07:00
onnxruntime_compile_triton_kernel.cmake
onnxruntime_config.h.in Enabling c++ 20 in MacOS build (#16187) 2023-09-26 11:27:02 -07:00
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake
onnxruntime_framework.natvis
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake [Apple framework] Fix minimal build with training enabled. (#19858) 2024-03-12 11:33:30 -07:00
onnxruntime_ios.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_java.cmake
onnxruntime_java_unittests.cmake
onnxruntime_kernel_explorer.cmake
onnxruntime_language_interop_ops.cmake
onnxruntime_mlas.cmake Mlas Gemm 4bit avx2, avx512, and avx512vnni kernels (#20163) 2024-04-25 21:30:50 -07:00
onnxruntime_nodejs.cmake Support building Windows CUDA with Ninja (#20176) 2024-04-03 11:19:31 +08:00
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake [ROCm] Fix hipify error: fast_divmod.h: No such file or directory (#19060) 2024-01-10 14:49:19 +08:00
onnxruntime_providers.cmake Add initial support for CoreML ML Program to the CoreML EP. (#19347) 2024-02-15 08:46:03 +10:00
onnxruntime_providers_acl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_armnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_azure.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cann.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_coreml.cmake Fix Objective-C static analysis warnings. (#20417) 2024-04-24 11:48:29 -07:00
onnxruntime_providers_cpu.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_providers_cuda.cmake Enable CUDA EP unit testing on Windows (#20039) 2024-03-27 13:32:36 -07:00
onnxruntime_providers_dml.cmake Delay load dxcore.dll in addition to ext-ms-win-dxcore-l1-1-0.dll (#18913) 2023-12-26 12:33:42 -08:00
onnxruntime_providers_dnnl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_js.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_migraphx.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_nnapi.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_openvino.cmake Ort openvino npu 1.17 master (#19966) 2024-03-21 18:44:00 -07:00
onnxruntime_providers_qnn.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_rknpu.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_rocm.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_tensorrt.cmake [TensorRT] adapt for TRT lib name change after TRT 10 GA (#20445) 2024-04-24 21:46:54 -07:00
onnxruntime_providers_tvm.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_vitisai.cmake [VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058) 2024-01-31 21:08:26 -08:00
onnxruntime_providers_webnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_xnnpack.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_pyop.cmake
onnxruntime_python.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_rocm_hipify.cmake add QMoE (#20108) 2024-03-29 10:24:19 -07:00
onnxruntime_session.cmake
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake
onnxruntime_unittests.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_util.cmake
onnxruntime_visionos.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_webassembly.cmake upgrade emsdk to 3.1.57 (#20295) 2024-04-19 23:05:18 -07:00
precompiled_header.cmake
riscv64.toolchain.cmake Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238) 2024-01-24 16:27:05 -08:00
Sdl.ruleset
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h
wcos_rules_override.cmake Stop using apiset in OneCore build: use onecoreuap.lib instead of onecoreuap_apiset.lib (#19632) 2024-02-23 22:31:57 -08:00
winml.cmake [CP] Fix for xfgcheck and Fix WAI ARM64 build (#19634) (#19644) 2024-03-13 17:54:06 -07:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00