onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-21 19:18:55 +00:00

Chen Fu 3c10f027de 4b quantization for weights of LLMs (#16833)	2023-08-07 12:23:55 -07:00

### Description

Blockwise 4b quantization for LLMs.

1. Introduces 4b block-wise quantization for linear layer weights.
2. Implements a matrix multiplication kernel for fp32 x int4.
3. Implements the special operator MatMulFpQ4.
4. Implements a quantization tool that converts a MatMul operator to MatMulFpQ4 when the right-hand side is a 2D const tensor.

### Motivation and Context

Compress and accelerate LLMs.

| Benchmark | Time (ns) |
|-------------|----------|
| Q4GEMM/Q4Sym/M:1/N:4096/K:4096/Threads:8 | 218054 |
| Q4GEMM/Q4Sym/M:1024/N:4096/K:4096/Threads:8 | 35830155 |
| Q4GEMM/Q4Sym/M:2048/N:4096/K:4096/Threads:8 | 73479790 |
| Q4GEMM/Q4Zp8/M:1/N:4096/K:4096/Threads:8 | 270152 |
| Q4GEMM/Q4Zp8/M:1024/N:4096/K:4096/Threads:8 | 35826721 |
| Q4GEMM/Q4Zp8/M:2048/N:4096/K:4096/Threads:8 | 73021200 |
| Q4GEMM/Q4Sym128/M:1/N:4096/K:4096/Threads:8 | 213832 |
| Q4GEMM/Q4Sym128/M:1024/N:4096/K:4096/Threads:8 | 36749874 |
| Q4GEMM/Q4Sym128/M:2048/N:4096/K:4096/Threads:8 | 72618120 |

| Benchmark | Time (ns) |
|-------------|----------|
| SGEMM/LLM/M:1/N:4096/K:4096/Threads:8 | 522610 |
| SGEMM/LLM/M:1024/N:4096/K:4096/Threads:8 | 39237689 |
| SGEMM/LLM/M:2048/N:4096/K:4096/Threads:8 | 75983467 |

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
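
To illustrate what block-wise 4-bit weight quantization means in general, here is a minimal pure-Python sketch. It is not the MLAS kernel or the MatMulFpQ4 storage layout from this commit: the function names, the block size of 32, and the symmetric one-scale-per-block scheme are all assumptions chosen for illustration.

```python
# Illustrative sketch only: hypothetical helpers, not the onnxruntime/MLAS code.
# Assumption: symmetric quantization, one float scale per block, codes in [-8, 7].

def quantize_blockwise_4b(weights, block_size=32):
    """Quantize a flat list of floats to signed 4-bit codes, one scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to 7; avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in block)
    return codes, scales


def dequantize_blockwise_4b(codes, scales, block_size=32):
    """Recover approximate floats by multiplying each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [i / 10.0 - 3.0 for i in range(64)]
    codes, scales = quantize_blockwise_4b(weights)
    restored = dequantize_blockwise_4b(codes, scales)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"blocks={len(scales)} worst-case error={worst:.4f}")
```

Keeping a separate scale per block limits the rounding error of each weight to half of that block's scale, which is why smaller blocks generally trade extra storage for accuracy.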
external	Fix protobuf TaggedStringPtr display (#17008)	2023-08-04 17:51:01 -07:00
patches	Fix some build issues on MacOS with Xcode 14.3. (#15878)	2023-06-07 12:07:11 -07:00
tensorboard
adjust_global_compile_flags.cmake	Cleanup WASM cmake code (#15996)	2023-05-20 18:07:39 -07:00
CMakeLists.txt	[ROCm] add gfx1100 and gfx1101 to CMAKE_HIP_ARCHITECTURES (#16972)	2023-08-04 08:38:42 +08:00
CMakeSettings.json
codeconv.runsettings
deps.txt	Allow --build_wasm on a mac system (#16761)	2023-07-21 14:21:37 -07:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake	[ios] Enable `--use_extensions` with custom built iOS pod (#16711)	2023-07-14 15:37:16 -07:00
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506)	2023-07-14 10:46:52 -07:00
onnxruntime_compile_triton_kernel.cmake	[ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196)	2023-07-11 13:55:30 +08:00
onnxruntime_config.h.in	Enable `-Wshorten-64-to-32` warning if available. (#16524)	2023-07-07 08:11:44 -07:00
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake	Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON` (#15323)	2023-04-03 17:45:12 -07:00
onnxruntime_framework.cmake	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506)	2023-07-14 10:46:52 -07:00
onnxruntime_framework.natvis	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506)	2023-07-14 10:46:52 -07:00
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake	added support for cmake "find_package" (#8919)	2023-06-19 22:20:31 -07:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake	Update build option for training in java to enable_training_api (#15638)	2023-04-24 11:53:08 -07:00
onnxruntime_java_unittests.cmake	Update build option for training in java to enable_training_api (#15638)	2023-04-24 11:53:08 -07:00
onnxruntime_kernel_explorer.cmake	[ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657)	2023-07-13 11:20:26 +08:00
onnxruntime_language_interop_ops.cmake
onnxruntime_mlas.cmake	4b quantization for weights of LLMs (#16833)	2023-08-07 12:23:55 -07:00
onnxruntime_nodejs.cmake	[js] upgrade dependencies and enable strict mode (#14930)	2023-03-22 15:05:04 -07:00
onnxruntime_objectivec.cmake	Objective C Training API: TrainingSession (#16374)	2023-06-28 09:13:56 -07:00
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake	Triton Codegen for ORTModule (#15831)	2023-07-13 18:17:58 +08:00
onnxruntime_providers.cmake	[JS/Web] Added Gelu contrib operator support to JSEP (#16909)	2023-07-31 09:18:58 -07:00
onnxruntime_pyop.cmake
onnxruntime_python.cmake	Refactor schema extraction and output unflattening (#16894)	2023-08-04 13:58:21 +08:00
onnxruntime_rocm_hipify.cmake	[CUDA] Add PackedMultiHeadAttention operator (#16779)	2023-07-28 16:35:38 -07:00
onnxruntime_session.cmake	added support for cmake "find_package" (#8919)	2023-06-19 22:20:31 -07:00
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake	Triton Codegen for ORTModule (#15831)	2023-07-13 18:17:58 +08:00
onnxruntime_unittests.cmake	Ignore deprecated declarations warning for TRT EP build (#16948)	2023-08-02 09:51:58 -07:00
onnxruntime_util.cmake
onnxruntime_webassembly.cmake	[WebNN EP] Merge support for segment anything into main branch (#16208)	2023-06-07 09:56:37 -07:00
precompiled_header.cmake
Sdl.ruleset	Add a Github workflow for Prefast (#15763)	2023-05-03 11:42:51 -07:00
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
wcos_rules_override.cmake
winml.cmake
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake