onnxruntime/cmake
Yufeng Li 11af34440a
Add MatMul 4bits support on GPU (#17890)
### Description
<!-- Describe your changes. -->
Add a contrib op MatMulNBits and related toolchain to support
quantization on weight. This PR only adds support for 4bits. It:

- add schema for contrib op MatMulNBits which can support 1-7 bits
quantization on weight.
- a naive implementation for 4bits MatMulNBits on CPU and GPU, i.e.,
implemented like MatMul(A, Dequantize(B)).
- a special implementation for GemV for 4bits MatMulNBits and related
benchmark tool
- tool to quantization model with 4bits. 

Next:
- add general and more efficient kernels for 4bits MatMulNBits on CPU
and GPU
2023-10-13 16:55:30 -07:00
..
external Enable backtrace in unit tests (#17655) 2023-09-29 12:32:56 -07:00
patches ONNX 1.15 integration (#17125) 2023-09-26 14:44:48 -07:00
tensorboard Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
adjust_global_compile_flags.cmake Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856) 2023-09-05 18:12:10 -07:00
CMakeLists.txt Remove MPI dependency (#17624) 2023-10-06 15:33:18 +08:00
CMakeSettings.json
codeconv.runsettings
deps.txt ONNX 1.15 integration (#17125) 2023-09-26 14:44:48 -07:00
deps_update_and_upload.py [Linter] Bump ruff and remove pylint (#17797) 2023-10-05 21:07:33 -07:00
EnableVisualStudioCodeAnalysis.props
gdk_toolchain.cmake
Info.plist.in
libonnxruntime.pc.cmake.in
nuget_helpers.cmake
onnxruntime.cmake Add noexcep_operators to onnxruntime internal libraries (#17850) 2023-10-09 16:29:41 -07:00
onnxruntime_codegen_tvm.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_common.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00
onnxruntime_compile_triton_kernel.cmake [ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196) 2023-07-11 13:55:30 +08:00
onnxruntime_config.h.in Enabling c++ 20 in MacOS build (#16187) 2023-09-26 11:27:02 -07:00
onnxruntime_csharp.cmake Refactor training build options (#13964) 2023-01-03 13:28:16 -08:00
onnxruntime_flatbuffers.cmake Rework some external targets to ease building with -DFETCHCONTENT_FULLY_DISCONNECTED=ON (#15323) 2023-04-03 17:45:12 -07:00
onnxruntime_framework.cmake [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) 2023-07-14 10:46:52 -07:00
onnxruntime_framework.natvis [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) 2023-07-14 10:46:52 -07:00
onnxruntime_fuzz_test.cmake Fix fuzz test (#14385) 2023-01-22 22:17:43 -08:00
onnxruntime_graph.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake Update build option for training in java to enable_training_api (#15638) 2023-04-24 11:53:08 -07:00
onnxruntime_java_unittests.cmake Update build option for training in java to enable_training_api (#15638) 2023-04-24 11:53:08 -07:00
onnxruntime_kernel_explorer.cmake [ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657) 2023-07-13 11:20:26 +08:00
onnxruntime_language_interop_ops.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_mlas.cmake Fix typo of cmake (#17715) 2023-09-27 11:48:46 -07:00
onnxruntime_nodejs.cmake Added DML and CUDA provider support in onnxruntime-node (#16050) 2023-08-25 16:57:06 -07:00
onnxruntime_objectivec.cmake Objective C Training API: TrainingSession (#16374) 2023-06-28 09:13:56 -07:00
onnxruntime_opschema_lib.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_optimizer.cmake Support inplace update for PythonOp/Grad (#17687) 2023-10-10 21:36:45 -07:00
onnxruntime_providers.cmake Add API for NPU Device Selection in the DML EP (#17612) 2023-10-11 14:53:00 -07:00
onnxruntime_providers_acl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_armnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_azure.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cann.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_coreml.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cpu.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cuda.cmake distributed slice (#17761) 2023-10-12 14:28:00 -07:00
onnxruntime_providers_dml.cmake Add API for NPU Device Selection in the DML EP (#17612) 2023-10-11 14:53:00 -07:00
onnxruntime_providers_dnnl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_js.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_migraphx.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_nnapi.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_openvino.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_qnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_rknpu.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_rocm.cmake CUDA EP vs ROCM EP hipify audit (#17776) 2023-10-13 10:13:53 +08:00
onnxruntime_providers_tensorrt.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_tvm.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_vitisai.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_webnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_xnnpack.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_pyop.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_python.cmake Add LLaMA scripts (#17020) 2023-08-22 18:05:11 -07:00
onnxruntime_rocm_hipify.cmake Add MatMul 4bits support on GPU (#17890) 2023-10-13 16:55:30 -07:00
onnxruntime_session.cmake added support for cmake "find_package" (#8919) 2023-06-19 22:20:31 -07:00
onnxruntime_snpe_provider.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_training.cmake Triton Codegen for ORTModule (#15831) 2023-07-13 18:17:58 +08:00
onnxruntime_unittests.cmake [QNN EP] fixed input for InstanceNormU8 unit test and update copy lib paths (#17806) 2023-10-05 22:17:15 -07:00
onnxruntime_util.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_webassembly.cmake Add training WASM generation to Web CI pipeline (#17319) 2023-09-08 15:49:47 -07:00
precompiled_header.cmake
Sdl.ruleset Add a Github workflow for Prefast (#15763) 2023-05-03 11:42:51 -07:00
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
wcos_rules_override.cmake
winml.cmake Rework WIL dependency retrieval/usage (#17130) 2023-08-15 09:11:46 -07:00
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470) 2023-09-08 13:35:04 -07:00