onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-16 21:00:14 +00:00


Jambay Kinley d30d4d372a Add MatMul FP4 and NF4 Support (#18066)	2023-10-25 15:34:58 -07:00

### Description

Add a contrib op MatMulBnb4 (FP4 and NF4) and related toolchain to support quantization on weight.

This PR adds:
- schema for contrib op MatMulBnb4 which can support FP4 (4-bit floating point) and NF4 (4-bit NormalFloat) quantization on weight.
- a naive implementation for MatMulBnb4 on CPU and GPU, i.e., implemented like MatMul(A, Dequantize(B)).
- a special implementation for GemV for MatMulBnb4 and related benchmark tool.
- tool to quantize model to FP4 or NF4.
c_cxx	Remove extraneous javascript includes (#17558)	2023-09-14 20:43:24 -07:00
execution_providers/images
images
python	Bump Up Version to 1.17.0 (#17587)	2023-09-20 11:02:58 +08:00
ABI_Dev_Notes.md	Fix a typo in ABI_Dev_Notes.md (#17832)	2023-10-09 07:51:34 -07:00
Android_testing.md
C_API_Guidelines.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
cmake_guideline.md	fix some typo in docs (#13212)	2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md	[docs] Specify Objective-C max line length. (#16503)	2023-06-28 16:58:23 -07:00
ContribOperators.md	Add MatMul FP4 and NF4 Support (#18066)	2023-10-25 15:34:58 -07:00
FAQ.md	[Technical docs] Fixed a couple of old links in `FAQ.md` (#17415)	2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md	Remove exclusions for ONNX model tests that now pass. (#14337)	2023-01-24 08:04:27 +10:00
Memory_Optimizer.md	Add guidelines for ORTModule (#13553)	2022-11-04 19:42:10 +08:00
Model_Test.md
NotesOnThreading.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md	Remove the extensions submodule (#17097)	2023-08-14 10:16:33 -07:00
OperatorKernels.md	Add MatMul FP4 and NF4 Support (#18066)	2023-10-25 15:34:58 -07:00
ORT_Format_Update_in_1.13.md	Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413)	2023-01-25 08:23:12 -08:00
ORT_Use_Trtion_Kernel.md	[ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196)	2023-07-11 13:55:30 +08:00
ORTMobilePackageOperatorTypeSupport.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ORTModule_Convergence_Notes.md	Introduce ZeROOffloadSubscriber for ORTModule (#17006)	2023-08-25 00:15:22 +08:00
ORTModule_ModuleWithLoss_Wrapper.md	add steps to write modulewithloss wrapper (#16486)	2023-07-11 09:07:35 +08:00
ORTModule_PythonOp_Notes.md	Add document for PythonOp (#17888)	2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md	Use full qualified name for PythonOp export (#17021)	2023-08-09 10:58:33 +08:00
PR_Guidelines.md
Privacy.md
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
ReleaseManagement.md
Roadmap.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
Server.md
TVM_EP.md	Fix: update hyperlinks to the Jupyter notebooks (#16145)	2023-08-21 09:53:05 -07:00
Versioning.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
WinML_principles.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00