51 commits

Author SHA1 Message Date
Zhang Lei
f77ff1bc3d
Quantization support for split operator with its NHWC support (#6107)
* Make split work for quantization.

* NHWC transformer support for split operator

* Refactor according to feedback. Will add test cases soon.

* Fix build error on windows.

* Add test case for split op on uint8_t support

* Add nhwc_transformer_test for split uint8_t support

* Some changes according to PR feedback.
2021-01-13 10:05:34 -08:00
Chi Lo
945fae8f56
Lochi/quantization tool for trt (#6103)
* Initial implementation of generating calibration dynamic range table

* Initialize validation support for Quantization

* Initialize validation support for Quantization (cont.)

* Improve validation support for Quantization

* Improve validation support for Quantization

* Rewrite/Refine for calibration and validation

* Rewrite/Refine for calibration and validation (cont.)

* Refine code

* Refine code

* Add data reader for BERT

* Add flatbuffers to serialize calibration table

* Refine code and add BERT evaluation

* Refine the code

* minor modification

* Add preprocess/postprocess of vision team yolov3 and refine the code

* Update annotation

* Make bbox coordinates more accurate

* Fix bug

* Add support of batch processing

* Batch processing for model zoo yolov3

* Add batch inference for evaluation

* Refine the code

* Add README

* Add comments

* Refine the code for PR

* Remove batch support checking in data_reader and refine the code

* Refine the code for PR

* Refine the code for PR review

Co-authored-by: Olivia Jain <oljain@microsoft.com>
2020-12-21 20:59:08 -08:00
Zhang Lei
648c9c7789
Fix two bugs in the quantization tools (#6017)
1. Calibrator should check model inputs.
2. quantize_inputs forgot to use parameter initializer_use_weight_qtyp.
2020-12-03 00:00:16 -08:00
Zhang Lei
9992f0f812
Implement QLinear GlobalAveragePool with sse2/neon. (#5838)
Add QLinear Global Average Pool for quantization for ARM and SSE2.

Co-authored-by: Tracy Sharpe <tracysh@microsoft.com>
2020-11-23 19:23:58 -08:00
Yufeng Li
6f86c4dbe3
Quantize LSTM (#5595)
Quantize LSTM:
1. Dynamically quantizes the MatMul inside the LSTM. It doesn't quantize the activation functions.
2. Supports per-channel quantization of the input weight and recurrent weight.
2020-11-18 11:21:49 -08:00
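
The per-channel idea mentioned in the commit above can be illustrated with a minimal sketch. It is not the code from this commit: the function names `quantize_per_channel` and `dequantize_per_channel` are hypothetical, the scheme shown is plain symmetric int8 quantization, and each row of the weight matrix is assumed to be one channel.

```python
def quantize_per_channel(weight, qmax=127):
    """Symmetric quantization with one scale per row (channel).

    Hypothetical helper for illustration only; the real tool may use a
    different layout, zero points, or value range.
    """
    quantized, scales = [], []
    for row in weight:
        max_abs = max(abs(value) for value in row)
        # An all-zero channel gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.append(
            [max(-qmax, min(qmax, round(value / scale))) for value in row]
        )
    return quantized, scales


def dequantize_per_channel(quantized, scales):
    """Map the integers back to approximate float values."""
    return [[q * scale for q in row] for row, scale in zip(quantized, scales)]


weight = [[1.0, -1.0, 0.0], [5.0, 20.0, -20.0]]
q_weight, q_scales = quantize_per_channel(weight)
restored = dequantize_per_channel(q_weight, q_scales)
```

Because each row has its own scale, the small values in the first row are not crushed by the larger range of the second row, which is the usual motivation for per-channel over per-tensor quantization.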
Peichen Xie
e8c0f5d0ff
Update the quantization script to support GEMM (transB==1) (#5432)
* Modify onnx_quantizer.py

* Fix topology order issues

* Handle more cases
2020-11-17 21:24:48 -08:00
Zhang Lei
77b1eea9cf
Add option to allow quantize_input() use input_qtype for initializers. (#5721) 2020-11-06 09:33:24 -08:00
Yufeng Li
5c4543e194
Calibrate float tensor only (#5704) 2020-11-04 23:55:48 -08:00
Yufeng Li
6c2162e97a
Fix quantization of Conv1D with bias (#5491)
* Fix reshape for Conv with bias
2020-10-20 15:27:26 -07:00
Yufeng Li
b04cf2d229
Update ORT to 1.5.1 in Bert Quantization Notebook (#5396)
* Update ORT to 1.5.1 in Bert Quantization Notebook
2020-10-08 09:55:01 -07:00
Yufeng Li
e8b9aa1f29 fix quantization of EmbeddingLayerNorm (#5321) 2020-10-01 20:08:43 -07:00
RRRachelllll555
507f5bf5f6
Update test calibrate script (#5185)
* update test_calibrate according to latest calibrate.py

* fix datasize bug in e2e example

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-27 21:59:56 -07:00
Yufeng Li
61ba5b501a
Fix bug in the back to back quantization of matmul and conv (#5264)
* fix bug in the back to back quantization of matmul and conv

* fix bug in back to back gather
2020-09-23 08:47:20 -07:00
RRRachelllll555
f7c1e51810
Remove shape inference and fix save large model(>2g) issue (#5210)
* remove shape inference and fix save large model problem

* remove unnecessary import

* refine code and add external format for quantize_qat

* remove initializers in tensors_to_calibrate

* small refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-18 08:46:31 -07:00
Pranav Prakash
f5df96256c
Fix order of returned values in quantize_weight_per_channel (#5205)
Must match returned order of `quantize_inputs`
2020-09-17 17:57:46 -07:00
Zhang Lei
cd0386b649
MaxPool versioning in quantization tools. (#5194)
MaxPool versioning in quantization tools.
2020-09-16 22:52:24 -07:00
Yufeng Li
3068a835f1
Fix quantization of 1-D conv with bias (#5157) 2020-09-14 18:07:14 -07:00
Andrei Shadrikov
82b25e1731
Fix datasize call in calibrate (#5110)
* Moving datasize to the interface.

* Reverting changes and addressing the comment
2020-09-14 18:06:23 -07:00
Zhang Lei
d45e49dd2b
Add LeakyRelu and Sigmoid QLinear Quantization support (#5116)
* Add LeakyRelu and Sigmoid QLinear Quantization support

* Change to reflect master changes.
2020-09-14 14:46:24 -07:00
Yufeng Li
20b2f45b24
Support per-channel quantization of weight tensor (#5057)
* Support per-channel quantization of weight tensor

* rename util functions

* fix bugs in calibrate

* add support of reduce_range

* refine opset check
2020-09-14 11:53:50 -07:00
Yufeng Li
ffc2b25a3a
Quantization tool improvement (#4933)
Improve quantization tools:
1. Support QAT
2. Make the quantization tool register operators.
3. Make the API clearer to use

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-01 09:07:46 -07:00
RRRachelllll555
9a6db9b9f4
Fix next node access bug in calibration tool (#4863)
* fix bug in calibration tool

* fix next node access bugs

* rm file in wrong folder

* refine

* optimize

* refine

* refine format

* refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-08-21 20:48:54 -07:00
Yufeng Li
0575881949
Update quantization notebook to pytorch 1.6 (#4834) 2020-08-18 14:20:46 -07:00
Vagif
6499a38b7d
Add the missing onnx_proto import (#4705)
* add missing onnx_proto import
* Fix TensorProto usage in calibrate.py
* remove unused imports
2020-08-10 12:46:21 -07:00
RRRachelllll555
f3fc8ca954
Add input tensor calibration (#4619)
* add input tensor calibration

* set default fusions to be true

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-07-28 14:04:41 -07:00
Yufeng Li
9c75c29403
refine opset version getter (#4602) 2020-07-24 10:34:56 -07:00
RRRachelllll555
c5df918744
improve calibration tool (#4561)
* improve calibration tool

* modify calibration interface name

* modify calibration interface name

* refine calibrate and calibrate_user

* refine and add type info

* refine and add type info

* add e2e user example file

* remove unnecessary files

* remove test images no longer needed

* update readme document

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-07-22 21:31:49 -07:00
Yufeng Li
822b23ff2f
Add support of EmbeddingLayerNorm (#4562) 2020-07-21 21:43:02 -07:00
Yufeng Li
e92e0860c8
BERT quantization notebook (#4543)
* BERT quantization notebook

* update notebooks

* more benchmark

* add version info
2020-07-20 18:23:37 -07:00
Yufeng Li
d4db83858b
Only quantize gather with initializer (#4469) 2020-07-09 13:33:43 -07:00
Yufeng Li
5dc7339be6
Add quantization tool to python package (#4458)
* Add quantization tool to python package
2020-07-08 21:42:53 -07:00
Yufeng Li
fc5e65a22d
Add quantization support for GPT2 past state and use model to generate outputs in OpTester (#4340)
* Make quantization support GPT2 past state
* Make OpTester able to generate reference outputs with a model. With it, there is no need to compute outputs manually, which is impossible in some cases.
2020-06-26 09:29:29 -07:00
Yufeng Li
197da135eb
Implement quantized Attention on cpu (#4111)
* Implement QAttention on CPU
* support QAttention in quantization tool
* refine attention code
* add more unit tests
2020-06-03 13:42:00 -07:00
Yufeng Li
7c774e967a
support quantization of optimized model with ir<4 (#3853) 2020-05-13 11:16:37 -07:00
Zhang Lei
eab61e87ce
Fix quantization tool bugs when model nodes have no name. (#3854)
Fix bugs when model nodes have no name.
2020-05-12 20:38:26 -07:00
Zhang Lei
c365822808
Refactor calibrate.py. Add QLinearAdd and QLinearMul support. Fix bugs when loading JPGs that are not strictly RGB, and typos in the load_batch call. (#3542) 2020-04-18 17:10:55 -07:00
Yufeng Li
af618278f6
fix bugs in quantization and calibration tools (#3329)
Fix 3 bugs:
1. Node names are duplicated in calibration augment_graph if the name of the node to quantize is empty.
2. If output nodes are quantized, output values are quantized and not dequantized back.
3. Gather with data type int64 should not be quantized.
2020-03-30 17:50:25 -07:00
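
The first bug in the commit above (duplicate names when a node's name is empty) is the kind of problem a unique-name pass avoids. The following is a rough sketch under stated assumptions, not the fix that was actually committed: `assign_unique_names` and the `node_<n>` naming pattern are hypothetical, and nodes are represented only by their name strings.

```python
def assign_unique_names(node_names, prefix="node"):
    """Give every unnamed node a generated name that collides with nothing.

    Hypothetical helper: existing non-empty names are kept unchanged, and
    empty names are replaced by the first free '<prefix>_<counter>'.
    """
    used = {name for name in node_names if name}
    result = []
    counter = 0
    for name in node_names:
        if not name:
            candidate = f"{prefix}_{counter}"
            # Skip generated names that a user-named node already occupies.
            while candidate in used:
                counter += 1
                candidate = f"{prefix}_{counter}"
            used.add(candidate)
            counter += 1
            name = candidate
        result.append(name)
    return result


names = assign_unique_names(["conv1", "", "", "node_0"])
```

With unique names in place, any node or tensor added to the graph and named after the node it is attached to can no longer clash with another one.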
Tianlei Wu
403f99cd77
Use yapf to format python (#3276)
Update ReformatSourcePython.bat to use YAPF to format python code, and add onnxruntime\test directory to be formatted.

Add onnxruntime\.style.yapf for configuration. The style is based on google, except max column width 120.

Format python scripts using ReformatSourcePython.bat.
2020-03-20 14:34:10 -07:00
Yufeng Li
a69d859912
fix quantize_bias (#3270) 2020-03-20 11:36:47 -07:00
Tracy Sharpe
fe0b2b2abd
QLinearConv speed up (#3196)
For x86/x64 builds, change the QLinearConv op to use MLAS for the u8u8=s32 GEMM, then requantize the intermediate buffer to u8.
2020-03-13 16:54:55 -07:00
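
The two-step scheme described in the commit above (an integer GEMM that accumulates in 32 bits, followed by requantization to u8) can be sketched in plain Python. This only illustrates the arithmetic and is not how MLAS implements it: `gemm_u8u8_s32` and `requantize_to_u8` are hypothetical names, the matrices are lists of lists, and the per-tensor scales and zero points are assumptions of the sketch.

```python
def gemm_u8u8_s32(a, b, a_zero, b_zero):
    """Multiply two uint8 matrices, accumulating zero-point-corrected
    products in a wide integer (int32 in a real kernel)."""
    inner, cols = len(b), len(b[0])
    return [
        [
            sum((row[k] - a_zero) * (b[k][j] - b_zero) for k in range(inner))
            for j in range(cols)
        ]
        for row in a
    ]


def requantize_to_u8(acc, a_scale, b_scale, out_scale, out_zero):
    """Rescale the wide accumulators to the output scale and saturate to 0..255."""
    multiplier = a_scale * b_scale / out_scale
    return [
        [max(0, min(255, round(value * multiplier) + out_zero)) for value in row]
        for row in acc
    ]


accumulators = gemm_u8u8_s32([[130, 128]], [[3], [5]], a_zero=128, b_zero=0)
output = requantize_to_u8(accumulators, a_scale=0.5, b_scale=0.5,
                          out_scale=0.25, out_zero=10)
```

Keeping the intermediate buffer in 32-bit integers means the inner loop stays in integer arithmetic, and the floating-point rescale happens once per output element.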
Yufeng Li
c69194ec4c
fix the missing return in _get_quantize_input_nodes and format code with yapf (#3199)
* fix the missing return for function _get_quantize_input_nodes

* format quantization code with yapf
2020-03-13 09:28:41 -07:00
Yufeng Li
876d0c5430
Make quantization parameters constant weights instead of overridable (#3160) 2020-03-10 08:35:02 -08:00
Yufeng Li
1d2b8115e2
Support u8u8 in quantization tool (#3140) 2020-03-05 14:42:46 -08:00
Ashwini Khade
807a59c55d
Add calibration tool (#2845)
* add calibration tool

* add model for e2e example

* format readme

* some more formatting updates

* plus a few more updates

* plus review comments

* plus updates

* more updates
2020-01-20 14:49:35 -08:00
Ashwini Khade
8643f3ebbb
add domain check for nodes + update documentation (#2831) 2020-01-14 11:15:50 -08:00
Ashwini Khade
cc75e5a162
update quantization doc (#2783)
* update documentation for quantization script

* plus some spell corrections
2020-01-13 10:52:46 -08:00
Ashwini Khade
d197079473
quantization script updates (#2208) 2019-10-21 12:25:52 -07:00
Jonny Shipton
df472cbfbd quantise: Don't error when initializer graph input is missing (#1872)
ONNX IR version 4 and above do not require graph initializers to have
corresponding graph inputs.
2019-09-30 21:57:00 +00:00
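
The rule cited in the commit above means that code which rewrites an initializer cannot assume a matching graph input exists. A minimal sketch of the tolerant lookup follows. It is not the committed change: `drop_initializer_from_inputs` is a hypothetical helper, and graph inputs are modeled as plain dictionaries with a `name` key instead of ONNX protobuf objects.

```python
def drop_initializer_from_inputs(graph_inputs, initializer_name):
    """Remove the graph input that mirrors an initializer, if there is one.

    Returns (remaining_inputs, was_present). A missing input is reported
    through the flag and does not raise, since IR version 4 and above
    allow initializers without corresponding graph inputs.
    """
    remaining = [item for item in graph_inputs if item["name"] != initializer_name]
    was_present = len(remaining) != len(graph_inputs)
    return remaining, was_present


# Older-style model: the weight "W" is also listed as a graph input.
old_style, found_old = drop_initializer_from_inputs(
    [{"name": "X"}, {"name": "W"}], "W")

# IR version >= 4 style: "W" exists only as an initializer.
new_style, found_new = drop_initializer_from_inputs([{"name": "X"}], "W")
```

Both calls leave the same input list behind; the only difference is the flag, which a caller can log or ignore.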
Ashwini Khade
0f6cf9a335
enable quantizing specific nodes (#1742) 2019-09-03 11:04:17 -07:00
Ashwini Khade
16087f3133
update default values for weight quantization (#1564) 2019-08-05 21:39:37 -07:00