onnxruntime/onnxruntime/test/python/quantization
Jambay Kinley d30d4d372a
Add MatMul FP4 and NF4 Support (#18066)
### Description
Add the contrib op MatMulBnb4 (FP4 and NF4) and the related toolchain to
support weight quantization.

This PR adds:
- a schema for the contrib op MatMulBnb4, which supports FP4 (4-bit floating
point) and NF4 (4-bit NormalFloat) weight quantization.
- a naive implementation of MatMulBnb4 on CPU and GPU, i.e., one that
computes MatMul(A, Dequantize(B)).
- a specialized GemV implementation for MatMulBnb4 and a related benchmark
tool.
- a tool to quantize a model to FP4 or NF4.
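
The naive path described above, MatMul(A, Dequantize(B)), can be sketched in NumPy as follows. This is an illustration only, not the operator's actual kernel or the quantization tool from this PR: the function names, the default `block_size` of 64, and the unpacked one-code-per-byte layout are all assumptions, and the NF4 levels are rounded to four decimals from the values commonly published for the NormalFloat format.

```python
import numpy as np

# Approximate NF4 codebook (assumption: rounded values, for illustration only).
NF4_LEVELS = np.array(
    [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
     0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0],
    dtype=np.float32,
)


def quantize_blockwise(w, block_size=64):
    """Hypothetical helper: map weights to 4-bit codebook indices.

    Returns (codes, absmax), where codes has shape (n_blocks, block_size)
    and absmax holds one scale per block. Codes are NOT packed two per byte.
    """
    flat = np.asarray(w, dtype=np.float32).reshape(-1)
    pad = (-flat.size) % block_size  # zero-pad the last partial block
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1)
    safe = np.where(absmax == 0, 1.0, absmax)  # avoid division by zero
    normalized = blocks / safe[:, None]  # each block now lies in [-1, 1]
    # Pick the nearest codebook level for every element.
    codes = np.abs(normalized[..., None] - NF4_LEVELS).argmin(axis=-1)
    return codes.astype(np.uint8), absmax


def dequantize_blockwise(codes, absmax, shape):
    """Look up each code and rescale by its block's absmax."""
    blocks = NF4_LEVELS[codes] * absmax[:, None]
    n = int(np.prod(shape))
    return blocks.reshape(-1)[:n].reshape(shape)  # drop padding


def matmul_bnb4_reference(a, codes, absmax, b_shape):
    """Naive reference: dequantize B in full, then do a regular matmul."""
    return a @ dequantize_blockwise(codes, absmax, b_shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 32)).astype(np.float32)
    b = rng.standard_normal((32, 16)).astype(np.float32)
    codes, absmax = quantize_blockwise(b)
    y = matmul_bnb4_reference(a, codes, absmax, b.shape)
    print("max abs difference vs. float matmul:", np.abs(y - a @ b).max())
```

An FP4 variant would, under the same assumptions, differ only in the 16-entry codebook. A specialized GemV kernel avoids materializing the dequantized B and instead dequantizes block by block while accumulating.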
2023-10-25 15:34:58 -07:00
op_test_utils.py ONNX 1.15 integration (#17125) 2023-09-26 14:44:48 -07:00
read_me.txt enable pipeline to run quantization tests (#6416) 2021-01-25 09:33:08 -08:00
resnet_code.py Fix static quantization for QDQ and Percentile distribution (#17649) 2023-09-25 10:11:58 -07:00
test_calibration.py Updating QDQ to support Float8E4M3FN (#16550) 2023-08-08 12:18:48 +02:00
test_conv_dynamic.py fix: supported typo (#17216) 2023-09-27 10:45:27 -07:00
test_onnx_model.py fix topo sort in quantization tool (#16003) 2023-05-18 13:43:52 -07:00
test_op_argmax.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_attention.py fix protobuf copyfrom 2G limit (#16422) 2023-06-21 20:45:11 -07:00
test_op_concat.py [Better Engineering] Fix N802 lint errors in tests (#16788) 2023-07-21 09:17:34 -07:00
test_op_conv_transpose.py Reuse QDQConv for ConvTranspose to generate the QDQ model (#15385) 2023-04-06 15:07:44 -07:00
test_op_embed_layernorm.py fix protobuf copyfrom 2G limit (#16422) 2023-06-21 20:45:11 -07:00
test_op_gavgpool.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_gemm.py Updating QDQ to support Float8E4M3FN (#16550) 2023-08-08 12:18:48 +02:00
test_op_instance_normalization.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_matmul_4bits.py Add MatMul 4bits support on GPU (#17890) 2023-10-13 16:55:30 -07:00
test_op_matmul_bnb4.py Add MatMul FP4 and NF4 Support (#18066) 2023-10-25 15:34:58 -07:00
test_op_matmulfpq4.py add int4 quantization code in python (#17077) 2023-08-11 15:17:58 -07:00
test_op_maxpool.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_pad.py Fix Pad's quantization (#17807) 2023-10-08 22:09:23 -07:00
test_op_pooling.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_relu.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_reshape.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_resize.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_softmax.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_split.py [Better Engineering] Fix N802 lint errors in tests (#16788) 2023-07-21 09:17:34 -07:00
test_op_squeeze_unsqueeze.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_transpose.py Adopt linrtunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
test_op_where.py [Better Engineering] Fix N802 lint errors in tests (#16788) 2023-07-21 09:17:34 -07:00
test_qdq.py [QNN/CPU EP] Add 16-bit Quantize/Dequantize contrib ops (#17015) 2023-09-18 09:43:34 -07:00
test_qdq_loss_debug.py Disable PERF* rules in ruff to allow better readability (#16834) 2023-07-25 15:38:22 -07:00
test_quant_util.py fix protobuf copyfrom 2G limit (#16422) 2023-06-21 20:45:11 -07:00
test_quantize_static.py skip test_smooth_quant to unblock Python Package Pipeline (#16914) 2023-07-29 11:24:28 -07:00
test_quantize_static_resnet.py Fix static quantization for QDQ and Percentile distribution (#17649) 2023-09-25 10:11:58 -07:00
test_quantizeblockwise_4bits.py Add MatMul 4bits support on GPU (#17890) 2023-10-13 16:55:30 -07:00
test_quantizeblockwise_bnb4.py Add MatMul FP4 and NF4 Support (#18066) 2023-10-25 15:34:58 -07:00
test_symmetric_flag.py [Linter] Bump ruff and remove pylint (#17797) 2023-10-05 21:07:33 -07:00