pytorch/test/quantization/core
Huamin Li 3d8db41337 Add new op wrapped_quantized_linear (#134024)
Summary:
This diff adds a new operator wrapped_quantized_linear (torch.ops._quantized.wrapped_quantized_linear) that takes the following input arguments: input (fp32), input_scale, input_zero_point, weight (fp32), weight_scale, weight_zero_point, bias (fp32), output_scale, output_zero_point, and out_channel. It does the following:

1. Use quantize_per_tensor(input, input_scale, input_zero_point) to quantize the input tensor to int8
2. Use quantized::linear_prepack(weight, weight_scale, weight_zero_point, bias) to pack the weight and bias
3. Use quantized::linear to perform the int8 quantized linear operation
4. Dequantize the int8 output back to fp32

This new op is essentially a wrapper around multiple existing ops. We add it because torch.export cannot handle models that use the old quantize APIs directly.
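The quantize/linear/dequantize pipeline above can be sketched in pure Python. This is only a reference model of the arithmetic, not the real kernel: the actual op dispatches to quantize_per_tensor, quantized::linear_prepack, and quantized::linear, and the helper names, toy shapes, and scale/zero-point values below are illustrative assumptions.

```python
def quantize_per_tensor(x, scale, zero_point, qmin=-128, qmax=127):
    # Step 1: affine-quantize fp32 values to int8 (round to nearest, then clamp).
    # Python's round() uses banker's rounding; the real kernel's rounding mode
    # may differ at exact ties.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]

def dequantize(q, scale, zero_point):
    # Step 4: map int8 values back to fp32.
    return [(v - zero_point) * scale for v in q]

def wrapped_quantized_linear_reference(x, input_scale, input_zp,
                                       weight_rows, weight_scale, weight_zp,
                                       bias, output_scale, output_zp):
    # Reference model of the wrapper op's pipeline on plain Python lists.
    qx = quantize_per_tensor(x, input_scale, input_zp)
    # Step 2 in the real op packs the fp32 weight; here we just quantize it
    # row by row with the given weight scale/zero-point.
    qw = [quantize_per_tensor(row, weight_scale, weight_zp) for row in weight_rows]
    out = []
    for row, b in zip(qw, bias):
        # Step 3: quantized linear accumulates in int32, then requantizes the
        # fp32 result to the output scale/zero-point.
        acc = sum((xi - input_zp) * (wi - weight_zp) for xi, wi in zip(qx, row))
        fp = acc * input_scale * weight_scale + b
        out.append(max(-128, min(127, round(fp / output_scale) + output_zp)))
    return dequantize(out, output_scale, output_zp)

# Toy example: one output channel, two input features.
result = wrapped_quantized_linear_reference(
    x=[1.0, 2.0], input_scale=0.1, input_zp=0,
    weight_rows=[[1.0, 1.0]], weight_scale=0.1, weight_zp=0,
    bias=[0.0], output_scale=0.1, output_zp=0,
)
```

With these values the quantized path reproduces the fp32 result (1.0 + 2.0 = 3.0) up to quantization error.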

Reviewed By: jerryzh168

Differential Revision: D61377266

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134024
Approved by: https://github.com/houseroad
2024-08-21 09:26:58 +00:00
experimental Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
__init__.py
test_backend_config.py
test_docs.py
test_quantized_functional.py
test_quantized_module.py Fix failures when default is flipped for weights_only (#127627) 2024-08-16 00:22:43 +00:00
test_quantized_op.py Add new op wrapped_quantized_linear (#134024) 2024-08-21 09:26:58 +00:00
test_quantized_tensor.py Fix failures when default is flipped for weights_only (#127627) 2024-08-16 00:22:43 +00:00
test_top_level_apis.py
test_utils.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_workflow_module.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00
test_workflow_ops.py Add None return type to init -- tests (#132352) 2024-08-01 15:44:51 +00:00