Summary:
As title
This is a BC-breaking change because graph produced by "capture_pre_autograd_graph" cannot be input to quantization anymore. But this is ok, since this API is deprecated for a while and is going to be deleted. We have removed all call sites of it.
We remove the deprecated API references in code, docs, and tests.
We also removed two tests that specific to capture_pre_autograd_graph API.
Test Plan: CI
Differential Revision: D65351887
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139505
Approved by: https://github.com/tugsbayasgalan, https://github.com/andrewor14, https://github.com/jerryzh168
Summary: Before we would recompile the model unnecessarily even
if the model is already in the desired mode. For training
frameworks that assume `model.train()` is idempotent and calls
this before every single training step, this led to a bunch of
tiny graphs and poor performance. This commit makes these calls
no-ops if we're already in the target train/eval mode.
Test Plan:
python test/test_quantization -k TestQuantizePT2E.test_allow_exported_model_train_eval_idempotent
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142239
Approved by: https://github.com/jerryzh168
Add option to split Linear gates for Quantizable LSTM into separate ops (#141366)
Summary:
Reattempt to land D65283170, adding pyre-fixmes / mypy ignores following D52890934
For LSTM, the input and hidden state are projected with Linear layers to construct the 4 gates. This is typically performed together as a single Linear (for each state) with output channel count `4 * hidden_dim` for efficiency.
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=52-58
The output is then ultimately split into 4:
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=83-87
For on-device latency (and possibly memory) considerations, we want to avoid constructing the intermediate `gates` tensor (which can be relatively large), by splitting `igates` and `hgates` first (as 4x `Linear(hidden_dim, hidden_dim)` each), applying add separately, then proceeding as usual.
This functionality can be enabled by specifying `split_gates=True` (default False is original behavior) at any entry point (directly with `torch.ao.nn.quantizable.LSTM` or via `_get_lstm_with_individually_observed_parts`).
Test Plan:
piggy back on existing test to check for correct swap handling, numerics, and jit.script during prepare/convert
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_custom_module_lstm (caffe2.test.quantization.core.test_quantized_op.TestQuantizedOps)'
```
https://www.internalfb.com/intern/testinfra/testrun/4503599884152725
This test is quite long running now (more than double original).
---
shorter test to confirm original `LSTMCell` passes
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_fx -- --exact 'caffe2/test:quantization_fx - test_static_lstm_with_custom_fixed_qparams (quantization.fx.test_quantize_fx.TestQuantizeFx)'
```
https://www.internalfb.com/intern/testinfra/testrun/11258999127933996
Reviewed By: Ninja91
Differential Revision: D66380336
Annotate linear node for `linear_dynamic_fp16` with `X86InductorQuantizer`
After `convert_pt2e`, the pattern will be
```
x
|
linear <- to_fp32 <- to_fp16 <- w
```
**Test plan**
```
pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_dynamic_fp16
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141480
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Summary:
Populate nn.module.stack in _fuse_conv_bn_qat for replacement nodes that correspond to a `get_attr` node in the original graph.
In new training ir , `get_attr` nodes don't have `nn_module_stack` in node meta anymore (because the get_attr nodes are de-duplicated, so one get_attr node can potential have users in different module stacks).
We populate it by checking if "conv_input" or "conv_weight" replacement node has nn_module_stack. If not, we copy it from the conv node.
Test Plan:
CI
```
buck run fbcode//caffe2/test:quantization_pt2e -- -r test_preserve_nn_module_stack
```
Differential Revision: D66393517
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141400
Approved by: https://github.com/angelayi
**About this PR**
This PR adds the following ops for `linear_dynamic_fp16` in onednn namespace. These ops are intended for PT2E quantization eager mode.
- `onednn::linear_prepack_fp16`: packs fp32 weight to an fp16 MkldnnCPU tensor.
- `onednn::linear_dynamic_fp16`: takes an fp32 CPU tensor and an fp16 MkldnnCPU tensor and compute linear in fp32
- `onednn::linear_relu_dynamic_fp16`: similar as the former and apply relu on output.
**Test plan**
`python test/test_quantization.py -k test_linear_dynamic_fp16_onednn`
**Implementation**
These ops call oneDNN lib under the hood. It's worth noting that oneDNN does not support f32 * f16 -> f32 computation, so we have to convert fp16 weight to fp32 before computation. And weight is still in plain format after packing.
**Correctness and performance**
Correctness is guaranteed by UT.
Performance of the new ops may be better than the FBGEMM implementation when weight shape is small but worse when weight shape is large. It's because weight dtype conversion and computation are not fused.
For example, I ran benchmarks on an Intel(R) Xeon(R) Platinum 8490H machine with different cores and shapes. When using 1 core per instance, the new implementation generally is faster for weight shape < 1024 * 1024. When using more cores, the threshold will increase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140376
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
This PR adds C shim for `QConvPointWisePT2E` and `QConvPointWiseBinaryPT2E` similar to https://github.com/pytorch/pytorch/pull/138439. Besides that, we aligned the implementation of `qconv_pointwise` with `qlinear_pointwise` in the following aspects:
1. The parameter order of `qconv_pointwise` and `qlinear_pointwise` are quite different, we aligned the schema of `qconv_pointwise` to have similar parameter order as `qlinear_pointwise` to make it more consistent.
2. We always converted `x_scale` and `x_zero_point` to Tensors, just like in the lowering of `qlinear_pointwise`. This avoids the need to create two separate C APIs (one for `double x_scale` and `int64_t x_zero_point`, and another for `Tensor` versions). Instead, we only need one API for `Tensor`-based `x_scale` and `x_zero_point`. If we later add dynamic quantization for qconv (which will use `Tensor` for `x_scale` and `x_zero_point`), we can reuse the code from this PR and don't need to change the C shim layer API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138540
Approved by: https://github.com/jgong5, https://github.com/desertfire
ghstack dependencies: #138691, #138806
**MOTIVATION**
We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at https://github.com/pytorch/pytorch/pull/126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.
**CHANGES**
- Add support for HPU devices within the test_move_exported_model_bn using TEST_HPU flag
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances.
- Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137863
Approved by: https://github.com/jerryzh168
Summary:
as title
also change it in `prepare_pt2e()` docstring
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:quantization_pt2e_qat
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization
```
Differential Revision: D63345059
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137233
Approved by: https://github.com/tugsbayasgalan
Summary: Previously is_fbcode just checked whether the checkout was git or not. This is extremely error prone. Lets make it fool-proof.
Test Plan: unit tests
Reviewed By: masnesral
Differential Revision: D63545169
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136871
Approved by: https://github.com/masnesral
Summary:
Add a customizable loss function callback to NodeAccuracySummary to
allow users to pass in their own loss function.
Also, fix some type errors and propagate better exception messages when
unexpected tensor comparisons occur. Finally, enhance the robustness of
`generate_numeric_debug_handle` in the case where it is called multiple
times on the same model, by avoiding reuse of the same IDs.
Test Plan: Added a test for this case in `test_numeric_debugger`.
Reviewed By: jerryzh168
Differential Revision: D62898297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136282
Approved by: https://github.com/jerryzh168
Summary:
Add a customizable loss function callback to NodeAccuracySummary to
allow users to pass in their own loss function.
Also, fix some type errors and propagate better exception messages when
unexpected tensor comparisons occur. Finally, enhance the robustness of
`generate_numeric_debug_handle` in the case where it is called multiple
times on the same model, by avoiding reuse of the same IDs.
Test Plan: Added a test for this case in `test_numeric_debugger`.
Reviewed By: jerryzh168
Differential Revision: D62898297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136282
Approved by: https://github.com/jerryzh168
Summary:
Previously we only checked dtype and is_dynamic to decide if two quantization spec are equivalent
this may not work in some cases, e.g. when people use different qscheme or quant_min/quant_max
This PR added checks for other fields as well
Test Plan:
regression tests
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: [D62530974](https://our.internmc.facebook.com/intern/diff/D62530974)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135736
Approved by: https://github.com/sxu
Summary:
Fixes https://github.com/pytorch/pytorch/issues/134778
The previous D62304294 broke some executorch tests. It has already been reverted.
In this diff, `_collect_param_buffer_metadata()` is modified in a way that when a `call_function` node is encountered and its input nodes include `get_attr`. We skip the fields that have been collected previously and only collect rest of the fields. This prevents over-writing.
Test Plan:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_re_export_preserve_handle
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_run_decompositions_preserve_handle
```
Differential Revision: D62514208
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135720
Approved by: https://github.com/zhxchen17, https://github.com/jerryzh168
Summary:
Skip test_prepare_qat_conv_bn_fusion_getitem_placeholder when we use training ir, since it's only for bn-getitem pattern, but the pattern doesn't exist in training ir.
Remove BLOCK_LIST since it's empty.
Now all internal unittests will use training ir.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' caffe2/test/quantization:test_quantization -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder
buck2 run 'fbcode//mode/dev-nosan' caffe2/test:quantization_pt2e_qat -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder
```
Differential Revision: D62387987
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135729
Approved by: https://github.com/tugsbayasgalan
Summary: as title
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_conv_dynamic
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r matcher
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r x86
```
CI
Differential Revision: D62448302
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135623
Approved by: https://github.com/tugsbayasgalan