Johnson Wong 3fcf66f61f Add option to split Linear gates for Quantizable LSTM into separate ops (#140868)
Summary:
For LSTM, the input and hidden state are projected with Linear layers to construct the 4 gates. This is typically performed together as a single Linear (for each state) with output channel count `4 * hidden_dim` for efficiency.
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=52-58
The output is then ultimately split into 4:
https://www.internalfb.com/code/fbsource/[ebef7c4238aa55948b2b444044f2c8ed2040de55]/fbcode/caffe2/torch/ao/nn/quantizable/modules/rnn.py?lines=83-87
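The fused projection described above can be sketched as follows. This is an illustrative standalone snippet, not the actual `rnn.py` code; the names `igates_proj`/`hgates_proj` are assumptions mirroring the pattern at the links above.

```python
import torch
import torch.nn as nn

hidden_dim = 8
batch = 2

# One Linear per state, with 4 * hidden_dim output channels
# (illustrative names; the real module lives in torch/ao/nn/quantizable/modules/rnn.py).
igates_proj = nn.Linear(hidden_dim, 4 * hidden_dim)  # projects the input
hgates_proj = nn.Linear(hidden_dim, 4 * hidden_dim)  # projects the hidden state

x = torch.randn(batch, hidden_dim)
h = torch.randn(batch, hidden_dim)

# A single large intermediate tensor of shape (batch, 4 * hidden_dim) ...
gates = igates_proj(x) + hgates_proj(h)
# ... which is then split into the input/forget/cell/output gates.
in_gate, forget_gate, cell_gate, out_gate = gates.chunk(4, dim=1)
```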

For on-device latency (and possibly memory) considerations, we want to avoid constructing the intermediate `gates` tensor (which can be relatively large) by splitting `igates` and `hgates` first (as 4x `Linear(hidden_dim, hidden_dim)` each), applying the add separately per gate, then proceeding as usual.
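A minimal sketch of the split variant, assuming the split weights are row slices of the fused `Linear` weights (so the numerics match the fused path exactly); the helper `split_linears` is hypothetical, for illustration only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 8
x = torch.randn(2, hidden_dim)
h = torch.randn(2, hidden_dim)

# Fused reference: one Linear per state with 4 * hidden_dim outputs.
igates_proj = nn.Linear(hidden_dim, 4 * hidden_dim)
hgates_proj = nn.Linear(hidden_dim, 4 * hidden_dim)
gates_fused = (igates_proj(x) + hgates_proj(h)).chunk(4, dim=1)

# Split variant: 4x Linear(hidden_dim, hidden_dim) per state, built from
# row slices of the fused weights/biases (hypothetical helper).
def split_linears(fused):
    outs = []
    for i in range(4):
        lin = nn.Linear(hidden_dim, hidden_dim)
        lin.weight.data = fused.weight.data[i * hidden_dim:(i + 1) * hidden_dim]
        lin.bias.data = fused.bias.data[i * hidden_dim:(i + 1) * hidden_dim]
        outs.append(lin)
    return outs

i_splits = split_linears(igates_proj)
h_splits = split_linears(hgates_proj)

# No (batch, 4 * hidden_dim) intermediate: add each gate pair separately.
gates_split = [li(x) + lh(h) for li, lh in zip(i_splits, h_splits)]
```

Each entry of `gates_split` matches the corresponding chunk of the fused result, since `chunk(4, dim=1)` on the output corresponds to the same row slices of the weight matrix.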

This functionality can be enabled by specifying `split_gates=True` (the default `False` preserves the original behavior) at any entry point (directly with `torch.ao.nn.quantizable.LSTM` or via `_get_lstm_with_individually_observed_parts`).

Test Plan:
Piggyback on the existing test to check for correct swap handling, numerics, and `jit.script` during prepare/convert:
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --exact 'caffe2/test/quantization:test_quantization - test_custom_module_lstm (caffe2.test.quantization.core.test_quantized_op.TestQuantizedOps)'
```
https://www.internalfb.com/intern/testinfra/testrun/11540474102848372

Note: this test now takes considerably longer to run (more than double the original time).

Reviewed By: Ninja91

Differential Revision: D65283170

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140868
Approved by: https://github.com/jerryzh168
2024-11-22 04:10:26 +00:00