onnxruntime/include/onnxruntime/core/framework
Adrian Lizarraga b02d5e6d76
[CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362)
### Description
- 4-bit QuantizeLinear(21). **Blocked quantization is still missing (i.e.,
the new `block_size` attribute is not yet supported)**
- 4-bit DequantizeLinear(21). **Blocked dequantization is still missing
(i.e., the new `block_size` attribute is not yet supported)**
- 4-bit Transpose(21).
- Update quantization tool with int4 types.
- Disable QDQ fusions for 4-bit types. See:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc
- MLAS 4-bit quantization kernels for Intel, NEON, and PowerPC.
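
For readers unfamiliar with 4-bit quantization, the per-tensor (non-blocked) case listed above can be sketched roughly as follows. This is an illustrative sketch, not the ORT or MLAS kernel code: the helper names `QuantizeToInt4`, `PackInt4`, and `UnpackInt4` are hypothetical, rounding relies on the default round-half-to-even floating-point mode, and the nibble order (first element in the low 4 bits) is assumed to follow the ONNX packing convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: quantize one float to a signed 4-bit value in [-8, 7].
// Saturation happens in float so out-of-range inputs never overflow an int.
inline int8_t QuantizeToInt4(float x, float scale, int zero_point) {
  float q = std::nearbyint(x / scale) + static_cast<float>(zero_point);
  q = std::clamp(q, -8.0f, 7.0f);
  return static_cast<int8_t>(q);
}

// Hypothetical helper: pack int4 values two per byte.
// Assumption: even indices go to the low nibble, odd indices to the high nibble.
inline std::vector<uint8_t> PackInt4(const std::vector<int8_t>& values) {
  std::vector<uint8_t> packed((values.size() + 1) / 2, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    const uint8_t nibble = static_cast<uint8_t>(values[i]) & 0x0F;
    packed[i / 2] |= (i % 2 == 0) ? nibble : static_cast<uint8_t>(nibble << 4);
  }
  return packed;
}

// Hypothetical helper: read back the i-th int4 value with sign extension.
inline int8_t UnpackInt4(const std::vector<uint8_t>& packed, std::size_t i) {
  assert(i / 2 < packed.size());
  const int nibble = (packed[i / 2] >> ((i % 2) * 4)) & 0x0F;
  return static_cast<int8_t>((nibble ^ 0x08) - 0x08);  // map 8..15 to -8..-1
}
```

Dequantization is the inverse mapping, `(q - zero_point) * scale`, applied to each unpacked value.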

##### Notes
To calculate a tensor's storage size, we normally get the number of
elements from the shape (i.e., `tensor_shape.Size()`) and multiply by
the size of a single element. This does not directly work for sub-byte
elements like int4, because each element in a `Tensor<Int4x2>` stores
**two** packed int4 elements in a byte. Call
`Tensor::CalculateTensorStorageSize` instead; it performs the correct
calculation for any tensor element type.
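
The difference between the two calculations can be illustrated with a small standalone sketch. The functions below are hypothetical stand-ins, not the signature of `Tensor::CalculateTensorStorageSize`; they assume two int4 values per byte and that an odd element count rounds up to a whole byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: the usual byte-sized calculation, element count times element size.
constexpr std::size_t NaiveStorageBytes(std::size_t num_elements,
                                        std::size_t element_size) {
  return num_elements * element_size;
}

// Hypothetical: storage for packed int4 data. Two elements share one byte,
// so an odd element count is assumed to round up to the next whole byte.
constexpr std::size_t PackedInt4StorageBytes(std::size_t num_int4_elements) {
  return (num_int4_elements + 1) / 2;
}

// A shape holding 5 int4 elements needs 3 bytes, not the 5 bytes that the
// naive formula gives when each packed pair is treated as a 1-byte element.
static_assert(NaiveStorageBytes(5, 1) == 5, "naive formula over-counts");
static_assert(PackedInt4StorageBytes(5) == 3, "packed storage rounds up");
```

Under these assumptions, the naive formula over-allocates by roughly a factor of two for int4 tensors, which is why shape-based size calculations need a type-aware helper.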

### Motivation and Context
ONNX 1.16 added the int4 and uint4 types. This initial PR adds the int4
type to ORT and adds int4 implementations of the QuantizeLinear,
DequantizeLinear, and Transpose ops on the CPU EP. Int4 support still
needs to be added for many other ops and execution providers. See the
ONNX 1.16 release notes: https://github.com/onnx/onnx/releases.
2024-05-30 18:56:24 -07:00
..
alloc_kind.h
allocator.h Expose Reserve() in OrtAllocator to allow custom allocators to work when session.use_device_allocator_for_initializers is specified. (#19904) 2024-03-28 12:28:37 -07:00
buffer_deleter.h
customregistry.h
data_types.h [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362) 2024-05-30 18:56:24 -07:00
data_types_internal.h [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362) 2024-05-30 18:56:24 -07:00
endian.h
execution_provider.h Enable CUDA EP unit testing on Windows (#20039) 2024-03-27 13:32:36 -07:00
float8.h implement isinf20 and isnan20 (#17874) 2023-10-24 10:58:54 -07:00
float16.h [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) 2023-07-14 10:46:52 -07:00
framework_common.h
framework_provider_common.h Add TRT plugins support using custom ops (#13847) 2023-04-18 20:24:32 -07:00
func_api.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
int4.h [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362) 2024-05-30 18:56:24 -07:00
kernel_def_builder.h Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776) 2023-05-03 13:08:35 -07:00
kernel_registry.h Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776) 2023-05-03 13:08:35 -07:00
op_kernel.h [wasm] upgrade emsdk to 3.1.44 (#17069) 2023-08-10 16:08:36 -07:00
op_kernel_context.h ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice (#15618) 2023-05-01 10:06:00 -07:00
op_kernel_info.h Make session configuration options available to kernels via OpKernelInfo (#18897) 2024-01-13 10:02:43 +10:00
op_node_proto_helper.h MLAS AArch64 quantized int4 Gemm kernel (#18031) 2023-11-15 09:31:54 -08:00
ort_value.h Two fixes involving minimal builds (#17000) 2023-08-23 16:01:22 +10:00
ortdevice.h ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice (#15618) 2023-05-01 10:06:00 -07:00
ortmemoryinfo.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
provider_options.h
provider_options_utils.h
provider_shutdown.h
run_options.h Enable CUDA EP unit testing on Windows (#20039) 2024-03-27 13:32:36 -07:00
sparse_tensor.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
stream_handles.h Add new API KernelContext_GetScratchBuffer (#19809) 2024-03-13 19:41:15 -07:00
tensor.h [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362) 2024-05-30 18:56:24 -07:00
tensor_shape.h Make TensorShapeVector to use InlinedVector<Int64_t> to reduce on template instantiations (#18519) 2023-11-21 14:13:50 -08:00
to_tensor_proto_element_type.h [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362) 2024-05-30 18:56:24 -07:00