pytorch/test/expect
Nikhil Gupta 94737e8a2a [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.
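
The quantize-and-pack step above can be sketched in plain Python. This is an illustrative sketch only, not the code used by the PR: the helper names (`quantize_group_symmetric`, `quantize_row`, `pack_nibbles`, `unpack_nibbles`), the +8 offset used to store signed codes as unsigned nibbles, and the nibble order (even index in the low nibble) are all assumptions made for the example.

```python
def quantize_group_symmetric(values, n_bit=4):
    """Symmetric quantization of one group: returns (integer codes, scale)."""
    qmax = 2 ** (n_bit - 1) - 1      # 7 for 4 bits
    qmin = -(2 ** (n_bit - 1))       # -8 for 4 bits
    max_abs = max(abs(v) for v in values)
    # Symmetric scheme: zero maps to code 0, so only a scale is needed.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return codes, scale


def quantize_row(row, groupsize):
    """Group-wise quantization of one weight row: one scale per group."""
    if len(row) % groupsize != 0:
        raise ValueError("row length must be a multiple of groupsize")
    codes, scales = [], []
    for start in range(0, len(row), groupsize):
        group_codes, scale = quantize_group_symmetric(row[start:start + groupsize])
        codes.extend(group_codes)
        scales.append(scale)
    return codes, scales


def pack_nibbles(codes):
    """Pack pairs of signed 4-bit codes into single bytes (assumed layout)."""
    if len(codes) % 2 != 0:
        raise ValueError("need an even number of codes")
    packed = bytearray()
    for low, high in zip(codes[0::2], codes[1::2]):
        # Shift codes from [-8, 7] to [0, 15] so each fits in an unsigned nibble.
        packed.append(((high + 8) << 4) | (low + 8))
    return bytes(packed)


def unpack_nibbles(packed):
    """Inverse of pack_nibbles, used here only to check the round trip."""
    codes = []
    for byte in packed:
        codes.append((byte & 0x0F) - 8)
        codes.append((byte >> 4) - 8)
    return codes
```

With a group size of 32, a row of 64 weights yields 64 codes, 2 scales, and 32 packed bytes.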

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters:
The quantized weight, its scales, the optional bias, groupsize, and in_features and out_features (the same as the Linear layer’s corresponding parameters).
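
Before calling the packing op, it can help to sanity-check the sizes implied by the description above. The helper below is a hypothetical sketch, not part of PyTorch: `packing_sizes` is an invented name, and treating `groupsize == in_features` as the channel-wise case is an assumption. It describes the uint8 weight and scales that go into the packing call, not the layout of the packed result.

```python
def packing_sizes(in_features, out_features, groupsize):
    """Return the sizes of the quantized inputs implied by the parameters."""
    if in_features % 2 != 0:
        raise ValueError("in_features must be even to pack two 4-bit weights per byte")
    # Assumption: one group spanning the whole row means channel-wise scales.
    channel_wise = groupsize == in_features
    if not channel_wise:
        if groupsize % 32 != 0:
            raise ValueError("group-wise groupsize must be a multiple of 32")
        if in_features % groupsize != 0:
            raise ValueError("in_features must be divisible by groupsize")
    groups_per_row = in_features // groupsize
    return {
        # Two 4-bit weights share one uint8, so each row needs half as many bytes.
        "packed_weight_shape": (out_features, in_features // 2),
        # One scale per group per output row.
        "num_scales": out_features * groups_per_row,
        "scheme": "channel-wise" if channel_wise else "group-wise",
    }
```

For example, `packing_sizes(4096, 11008, 32)` reports a `(11008, 2048)` uint8 weight and 128 scales per output row.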

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Required inputs:
The input tensor, packed_weights, groupsize, in_features, and out_features.
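
Conceptually, a dynamically quantized matmul quantizes the activations at run time, multiplies integer codes, and rescales the result. The pure-Python reference below is a sketch of that idea under stated assumptions, not the KleidiAI kernel: the function names are hypothetical, activations are assumed to be quantized symmetrically to int8 per row, and weights use one scale per output channel (the channel-wise scheme).

```python
def quantize_symmetric(values, n_bit):
    """Symmetric quantization of a vector: returns (integer codes, scale)."""
    qmax = 2 ** (n_bit - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dyn_quant_matmul_reference(x, weight):
    """x: M rows of K floats; weight: N rows of K floats; returns M rows of N floats."""
    # Weights are quantized once, ahead of time, to 4 bits per value.
    quantized_weight = [quantize_symmetric(row, 4) for row in weight]
    output = []
    for x_row in x:
        # "Dynamic": the activation scale is computed from the data at run time.
        x_codes, x_scale = quantize_symmetric(x_row, 8)
        out_row = []
        for w_codes, w_scale in quantized_weight:
            # Integer dot product, then a single rescale back to floating point.
            acc = sum(a * b for a, b in zip(x_codes, w_codes))
            out_row.append(acc * x_scale * w_scale)
        output.append(out_row)
    return output
```

The result approximates the floating-point product `x @ weight.T`; the error comes from rounding both operands to their integer grids.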

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-20 19:32:03 +00:00
__init__.py
HasDecompTest.test_aten_core_operators.expect Revert "Fix unbind_copy and add its decomposition (#134319)" 2024-10-29 04:54:37 +00:00
HasDecompTest.test_has_decomposition.expect [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124) 2024-12-20 19:32:03 +00:00
TestAutograd.test_function-x_grad_desc.expect
TestAutograd.test_function-y_grad_desc.expect
TestFXAPIBackwardCompatibility.test_class_member_back_compat-fx_backcompat_class_members.expect Add output_node util function to fx.Graph (#139770) 2024-11-07 18:54:59 +00:00
TestFXAPIBackwardCompatibility.test_function_back_compat-fx_backcompat_function_signatures.expect [BE] Add type annotation to eliminate_dead_code (#142251) 2024-12-10 17:09:21 +00:00
TestJit.test_cu_escaped_number.expect
TestJit.test_import_method.expect
TestJit.test_non_ascii_string.expect
TestJit.test_pretty_printer-empty_float_list_test.expect
TestJit.test_pretty_printer-empty_int_list_test.expect
TestJit.test_pretty_printer-if_one.expect
TestJit.test_pretty_printer-if_test.expect
TestJit.test_pretty_printer-loop_use_test.expect
TestJit.test_pretty_printer-print_weird_test.expect
TestJit.test_pretty_printer-python_op_name_test.expect
TestJit.test_pretty_printer-while_if_test.expect
TestJit.test_pretty_printer-while_test.expect
TestPytorchExportModes.test_aten_fallback.expect
TestPytorchExportModes.test_onnx_aten.expect
TestScript.test_annot_ast_mypy_fn.expect
TestScript.test_annot_ast_mypy_method.expect
TestScript.test_annot_ast_py3_fn.expect
TestScript.test_annot_ast_py3_method.expect
TestScript.test_annot_string_mypy_fn.expect
TestScript.test_annot_string_mypy_method.expect
TestScript.test_annot_string_py3_fn.expect
TestScript.test_annot_string_py3_method.expect
TestScript.test_annotated_script_fn.expect
TestScript.test_annotated_script_method.expect
TestScript.test_format-stdout.expect
TestScript.test_listconstruct_erasure.expect
TestScript.test_parser_type_annotations.expect
TestScript.test_parser_type_annotations_comment.expect
TestScript.test_print-stdout.expect
TestScript.test_python_frontend.expect
TestScript.test_python_frontend_py2.expect
TestScript.test_python_frontend_py3.expect
TestScript.test_string_print-stdout.expect
TestScript.test_torch_dot_tensor_annotation.expect
TestSparseCompressedCPU.test_print_SparseBSC_cpu.expect
TestSparseCompressedCPU.test_print_SparseBSR_cpu.expect
TestSparseCompressedCPU.test_print_SparseCSC_cpu.expect
TestSparseCompressedCPU.test_print_SparseCSR_cpu.expect
TestSparseCompressedCUDA.test_print_SparseBSC_cuda.expect
TestSparseCompressedCUDA.test_print_SparseBSR_cuda.expect
TestSparseCompressedCUDA.test_print_SparseCSC_cuda.expect
TestSparseCompressedCUDA.test_print_SparseCSR_cuda.expect
TestSparseCPU.test_print_coalesced_cpu_float64.expect
TestSparseCPU.test_print_uncoalesced_cpu_float64.expect
TestSparseCUDA.test_print_coalesced_cuda_float64.expect
TestSparseCUDA.test_print_uncoalesced_cuda_float64.expect
TestSparseMeta.test_print_meta_SparseBSC_float64.expect
TestSparseMeta.test_print_meta_SparseBSR_float64.expect
TestSparseMeta.test_print_meta_SparseCOO_float64.expect
TestSparseMeta.test_print_meta_SparseCSC_float64.expect
TestSparseMeta.test_print_meta_SparseCSR_float64.expect
TestTensorBoard.test_audio.expect
TestTensorBoard.test_caffe2_simple_cnnmodel.expect
TestTensorBoard.test_caffe2_simple_model.expect
TestTensorBoard.test_histogram_auto.expect
TestTensorBoard.test_histogram_doane.expect
TestTensorBoard.test_histogram_fd.expect
TestTensorBoard.test_hparams_bool.expect
TestTensorBoard.test_hparams_number.expect
TestTensorBoard.test_hparams_string.expect
TestTensorBoard.test_image_with_3_channel_batched.expect
TestTensorBoard.test_image_with_boxes.expect
TestTensorBoard.test_image_with_one_channel.expect
TestTensorBoard.test_image_with_one_channel_batched.expect
TestTensorBoard.test_image_without_channel.expect
TestTensorBoard.test_mesh.expect
TestTensorBoard.test_nested_nn_squential.expect
TestTensorBoard.test_pr_curve.expect
TestTensorBoard.test_pr_curve_raw.expect
TestTensorBoard.test_pytorch_graph.expect
TestTensorBoard.test_scalar_new_style.expect
TestTensorBoard.test_text.expect
TestTensorBoard.test_video.expect
TestTorch.test_is_nonzero-empty.expect
TestTorch.test_is_nonzero-multiple.expect
TestTorch.test_print-non_contiguous.expect