Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-14 20:57:59 +00:00
Summary: This diff adds a new operator, wrapped_quantized_linear (torch.ops._quantized.wrapped_quantized_linear). It takes the following input arguments: input (fp32), input_scale, input_zero_point, weight (fp32), weight_scale, weight_zero_point, bias (fp32), output_scale, output_zero_point, and out_channel. It does the following:

1. Use quantize_per_tensor(input, input_scale, input_zero_point) to quantize the input tensor to int8.
2. Use quantized::linear_prepack(weight, weight_scale, weight_zero_point, bias) to pack the weight and bias.
3. Use quantized::linear to perform the int8 quantized linear operation.
4. Dequantize the result back to fp32.

This new op is essentially a wrapper around multiple ops. We do this because torch.export cannot handle models that use the old quantization APIs.

Reviewed By: jerryzh168

Differential Revision: D61377266

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134024

Approved by: https://github.com/houseroad
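The four steps above can be sketched in plain numpy to illustrate the semantics the wrapper composes (quantize input, int8 matmul, requantize, dequantize). This is a hedged reference model, not the actual PyTorch implementation: the function name `wrapped_quantized_linear_ref` and the helpers are hypothetical, prepacking is treated as a no-op, and the real op dispatches to fbgemm/qnnpack kernels.

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Affine quantization to int8: q = clamp(round(x / scale) + zp, -128, 127).
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    # Inverse affine map back to fp32.
    return (q.astype(np.int32) - zero_point) * scale

def wrapped_quantized_linear_ref(x, x_scale, x_zp,
                                 w, w_scale, w_zp, bias,
                                 out_scale, out_zp):
    """Reference model (assumption, not the real kernel) of the wrapper:
    quantize -> (prepack elided) -> int8 linear -> dequantize."""
    qx = quantize(x, x_scale, x_zp)          # step 1: quantize input
    qw = quantize(w, w_scale, w_zp)          # step 2 stand-in: quantize weight
    # step 3: integer matmul with int32 accumulation, then rescale + bias
    acc = (qx.astype(np.int32) - x_zp) @ (qw.astype(np.int32) - w_zp).T
    y = acc * (x_scale * w_scale) + bias
    qy = quantize(y, out_scale, out_zp)      # requantize to the output scale
    return dequantize(qy, out_scale, out_zp)  # step 4: dequantize to fp32

# Usage: a 1x2 input against a single output channel.
out = wrapped_quantized_linear_ref(
    np.array([[1.0, 2.0]]), 0.1, 0,
    np.array([[0.5, -0.5]]), 0.01, 0,
    np.array([0.1]), 0.01, 0)
```

With these scales the quantization is exact, so the result matches the float linear `1.0*0.5 + 2.0*(-0.5) + 0.1 = -0.4` up to the output quantization step.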
Files:

- experimental/
- __init__.py
- test_backend_config.py
- test_docs.py
- test_quantized_functional.py
- test_quantized_module.py
- test_quantized_op.py
- test_quantized_tensor.py
- test_top_level_apis.py
- test_utils.py
- test_workflow_module.py
- test_workflow_ops.py