onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-01 03:45:06 +00:00


Adrian Lizarraga b02d5e6d76 [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362)	2024-05-30 18:56:24 -07:00

### Description
- 4-bit QuantizeLinear(21). Blocked quantization is still missing (i.e., the new `block_size` attribute is not supported).
- 4-bit DequantizeLinear(21). Blocked dequantization is still missing (i.e., the new `block_size` attribute is not supported).
- 4-bit Transpose(21).
- Update the quantization tool with int4 types.
- Disable QDQ fusions for 4-bit types. See: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc
- MLAS 4-bit quantization kernels for Intel, NEON, and PowerPC.

##### Notes
To calculate a tensor's storage size, we normally get the number of elements from the shape (i.e., `tensor_shape.Size()`) and multiply it by the size of a single element. This does not work directly for sub-byte element types like int4, because each element of a `Tensor<Int4x2>` is a single byte holding two packed int4 values. Call `Tensor::CalculateTensorStorageSize` to get the correct storage size for any tensor element type.

### Motivation and Context
ONNX 1.16 added the int4 and uint4 types. This initial PR adds the int4 type to ORT and adds int4 implementations of the QuantizeLinear, DequantizeLinear, and Transpose ops on the CPU EP. Int4 support is still needed for many other ops and execution providers. See the ONNX 1.16 release notes: https://github.com/onnx/onnx/releases.
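The storage-size note and the 4-bit QuantizeLinear change can be illustrated with a small Python sketch of the arithmetic. This is not ORT code: the helper names (`int4_storage_size`, `quantize_int4`, `pack_int4`, `unpack_int4`) are hypothetical, the real C++ entry point is `Tensor::CalculateTensorStorageSize`, and the sketch assumes per-tensor quantization only (no `block_size`, matching the limitation above), round-half-to-even as in the ONNX QuantizeLinear definition, and the ONNX packing convention of storing the first value in the low nibble of each byte.

```python
# Hypothetical helpers illustrating int4 storage and quantization arithmetic;
# they are not part of the ONNX Runtime API.

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def int4_storage_size(num_elements: int) -> int:
    """Bytes needed for num_elements int4 values packed two per byte."""
    # An odd element count still needs a whole (half-used) final byte.
    return (num_elements + 1) // 2


def quantize_int4(values, scale: float, zero_point: int = 0) -> list[int]:
    """Per-tensor QuantizeLinear-style quantization to int4 (assumed semantics).

    Python's built-in round() rounds half to even; the result is saturated
    to [-8, 7].
    """
    out = []
    for x in values:
        q = round(x / scale) + zero_point
        out.append(max(INT4_MIN, min(INT4_MAX, q)))
    return out


def pack_int4(values: list[int]) -> bytes:
    """Pack signed int4 values two per byte, first value in the low nibble.

    With an odd count, the high nibble of the last byte stays zero (padding).
    """
    packed = bytearray(int4_storage_size(len(values)))
    for i, v in enumerate(values):
        nibble = v & 0x0F  # 4-bit two's-complement encoding
        if i % 2 == 0:
            packed[i // 2] |= nibble
        else:
            packed[i // 2] |= nibble << 4
    return bytes(packed)


def unpack_int4(packed: bytes, num_elements: int) -> list[int]:
    """Inverse of pack_int4: extract each nibble and sign-extend it."""
    out = []
    for i in range(num_elements):
        byte = packed[i // 2]
        nibble = (byte >> 4) & 0x0F if i % 2 else byte & 0x0F
        out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


if __name__ == "__main__":
    q = quantize_int4([0.125, 0.375, -1.0, 3.0, -0.25], scale=0.25)
    print(q)                 # 0.5 rounds to 0, 1.5 to 2, and 12 saturates to 7
    packed = pack_int4(q)
    print(len(packed))       # 5 elements fit in 3 bytes
    print(unpack_int4(packed, len(q)) == q)
```

This shows why `shape.Size() * sizeof(element)` overcounts for int4: five values occupy three bytes rather than five, which is the case `Tensor::CalculateTensorStorageSize` is meant to handle.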
cffconvert.yml	Bump actions/checkout from 3 to 4 (#17487)	2023-09-13 09:22:21 -07:00
codeql.yml	Use Java 11 to build project in the codeql pipeline (#19999)	2024-03-20 17:53:48 -07:00
generate-skip-doc-change.py	Adopt lintrunner as the linting tool - take 2 (#15085)	2023-03-24 15:29:03 -07:00
gradle-wrapper-validation.yml	Bump gradle/wrapper-validation-action from 2 to 3 (#20305)	2024-04-16 14:20:51 -07:00
labeler.yml	Update labeler.yml to change permissions (#19709)	2024-02-28 21:10:25 -08:00
lint.yml	[CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362)	2024-05-30 18:56:24 -07:00
mac.yml	Add Mac CI GitHub Actions workflow (#20717)	2024-05-20 10:27:03 -07:00
publish-c-apidocs.yml	Bump actions/upload-artifact from 3 to 4 (#18920)	2023-12-31 21:10:47 -08:00
publish-csharp-apidocs.yml	Bump nuget/setup-nuget from 1 to 2 (#19411)	2024-02-13 15:59:15 -08:00
publish-gh-pages.yml	Add website publish placeholder (#17318)	2023-08-30 11:01:54 -07:00
publish-java-apidocs.yml	Bump gradle/gradle-build-action from 2 to 3 (#19297)	2024-02-05 09:41:57 -08:00
publish-js-apidocs.yml	Bump actions/upload-artifact from 3 to 4 (#18920)	2023-12-31 21:10:47 -08:00
publish-objectivec-apidocs.yml	Fix training and macos ci pipelines (#20034)	2024-03-26 12:20:11 -07:00
publish-python-apidocs.yml	Bump actions/upload-artifact from 3 to 4 (#18920)	2023-12-31 21:10:47 -08:00
skip-doc-change.yml.j2	Update Win_GPU_CI trigger (#13290)	2022-10-12 15:22:42 +08:00
stale.yml	Update stale.yml to use old version as a bug fix (#19532)	2024-02-15 17:03:11 -08:00
windows.yml	Remove TVM EP's pipeline (#20813)	2024-05-25 20:42:41 -07:00