251 commits

Author SHA1 Message Date
Ye Wang
4f82ad1b58
Topo sort the model before saving (#7913)
* checkin toposort

* review comments

* revert and add TODO
2021-06-02 16:57:08 -07:00
Tianlei Wu
71b05f74a2
fix duplicated node name (#7865) 2021-05-27 17:16:17 -07:00
Yufeng Li
94bb09bf47
fix topo sort in quant tool (#7833)
* fix topo sort in quant tool

* add unit test and make the topo sort stable
2021-05-26 17:53:35 -07:00
Xiaoyu Liu
224a664811
GPT-2 one step search tutorial (#7718)
* GPT2 with one step search tutorial
* remove quantization section

Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
2021-05-18 12:31:39 -07:00
Young Jin Kim
e9057d2e49
ZCode FastFormers changes (#5827)
* Add FBGEMM submodule

* Add fbgemm based per-channel quantization

* Add missing logic for pre-layernorm transformer model fusion

* add support for structured pruning architecture -fastformers

* Fix windows build

* Add a default behavior when head_size is not present for the backward compatibility

* Remove FBGEMM and default to tensor-wise quantization, column-wise quantization will be enabled later

* Fixed some unit test errors

* Fix windows compile error and unit test errors

* delete the option removed from the upstream

* Addresses review comments and fixes a merge error

* Remove commented out code

* add non-zero zp support

* support A and B scale with any dimensions

* fix build breaks

* fix warning in MSVC

* Fix bug for not checking original float value names when treat it as not existing.

* Clean up head size

* Clean up python tools

* Enable per column quantization

* fix quant weight cleanup bug

* A few code clean up

* Some code clean-up

* Some code clean-up

* Change option name

* update default value

* Rename option and parameter names

* Missing argument name change

* Add tests for quantization options for attention and matmul

Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
2021-05-17 21:12:21 -07:00
Ye Wang
5e8086ad8e
Support fusions inside subgraphs in optimizer tool (#7701)
* skip subgraph when updating model

* interim checkin

* interim checkin 2

* support transformers optimizations in subgraph

* change more files

* fix comments typo
2021-05-17 12:43:55 -07:00
Yufeng Li
6b0a7905ed
fix quant weight cleanup bug (#7707) 2021-05-14 22:04:35 -07:00
Zhang Lei
0f7721a019
Fix bug for not checking original float value names when treat it as not existing. (#7695) 2021-05-14 12:50:30 -07:00
Zhang Lei
033f0b3b7c
fix typo. (#7690) 2021-05-14 10:25:34 -07:00
Vincent Wang
dac24f7d63
Add ATenOp and call aten::embedding and its Backward Op from ORT (#7590)
* build with libtorch and impl torchembedding

* fix op shape infer

* local commit

* atenfunctionop

* call aten operator from online extension

* rollback build.py

* resolve comments

* bugfix

* fix build

* fix ortmodule test

* remove external outputs, resolve comments

* resolve comments

* export embedding to microsoft::atenop

* bugfix
2021-05-13 09:24:27 +08:00
Zhang Lei
1c7e683a95
Add Squeeze and Unsqueeze support for quantization tools. (#7673) 2021-05-12 14:56:46 -07:00
Zhang Lei
31d4413919
fix quantization tool bug when existing pass through only input (#7674) 2021-05-12 14:54:42 -07:00
Olivia Jain
29172d8f54
Setup EP Dashboard (#7321)
* setting up dashboard
* posting to ort dashboard
* creating separate docker file
* including common deps
* tracking latency over time
2021-05-11 10:33:39 -07:00
Tianlei Wu
55c086b664
symbolic shape inference improvements for contrib ops (#7606)
* add EmbedLayerNormalization
* use onnx shape inference for Unsqueeze
* Fix type warning in Attention
2021-05-07 17:03:24 -07:00
Tianlei Wu
d88da44066
Allow flexible order of Add inputs in Attention fusion (#7565) 2021-05-06 09:43:28 -07:00
Zhang Lei
9465948715
Quantization tools using one more extra_options on interface. (#7293)
handle nnapi special sigmoid options.
2021-05-05 13:51:50 -07:00
Zhang Lei
f6cefc92e2
Add quantized value map after quantize input node added. (#7558) 2021-05-04 15:27:56 -07:00
Tianlei Wu
3c9ece4a11
[transformers optimizer] catch symbolic shape inference exception and clean up (#7560)
catch symbolic shape inference exception.
no prune graph when there is inner graph (Loop/If/Scan)
add a wrapper for numpy_helper.to_array so that we can debug onnx graph without external data
remove fuse_mask that is not used any more in onnx_model_bert_tf.py
2021-05-03 20:42:13 -07:00
Tianlei Wu
731f9e5033
Fix symbolic shape inference for Unsqueeze (#7555)
* fix Unsqueeze shape inference
* add tests
2021-05-03 18:06:59 -07:00
Ryota Tomioka
d1cb8c9dc9
Support negative indices and fix bound checking in symbolic shape inference for Slice (#7401)
* Use positivity everywhere; handle negative index in Slice

* limit positivity to inputs

* make handle_negative_index private

* strengthen sympy comparison

* further strengthen comparison and a minor refactoring

* Add flip test

* Fall through if -int_max in handle_negative_index()

* minor fix for infer_Concat to include initializers

* Add more tests

* use simplify

* more tests
2021-05-03 09:07:55 -07:00
Xiaoyu Liu
994c2ed420
GPT2 one step beam search update with configuration support (#7425)
* check in early stop search as separate type
* rename to beam search configurations
* update do sample configuration flag help
* rename to configurable search step
* add option groups
* add more unit tests

Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
2021-04-29 13:19:56 -07:00
thilow
22d7cde725
Fix a 'Squeeze' related issue in symbolic_shape_infer.py (#7380)
* Update symbolic_shape_infer.py

don't rely on static code infer in _infer_Squeeze_

* checking if dropped axes might be != 1

* Checking opset. Logging assumption that symbolic dimensions are unequal to 1.

* more checks
2021-04-28 13:13:04 -07:00
Zhang Lei
ada0fbbd2d
Implement qlinear concat and unit test. (#7341)
* Implement qlinear concat and unit test.
Add quantization tools for QLinearConcat and its quantization tests.

* Add kernel def hash for QLinearConcat.

* Change according to PR. Add qdq transformer support for QLinearConcat.

* Add QDQ Transformer unittest. Fix typo on domain.

* remove dup logic of no use.

* fix x86 build error.

* Update operator docs.
2021-04-26 13:38:40 -07:00
Xiaoyu Liu
913ea8264b
GPT2 with one step beam search (#7163)
* beam search refactoring checkin
* add factory class and deduplicate code
* one step beam search works on gpu

Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
2021-04-20 06:23:52 -07:00
Tianlei Wu
aa9ab565f5
FastGelu fusion for Megatron model (#7344)
* add a fastgelu pattern from Megatron model

* update comment

* add test
2021-04-15 00:39:33 -07:00
Oliver Rausch
87bd836886
Fixes in symbolic shape inference (#7258)
* Add symbolic shape inference for Transpose

* Support steps in symbolic shape inference for Slice

* Add inference for BatchNormalization

* Address review changes

* Address review changes
2021-04-13 22:17:30 -07:00
Zhang Lei
f62db1a09c
quantization tools support qlinear average pool (#7309) 2021-04-13 18:22:42 -07:00
Zhang Lei
a4fdb4dbd9
Support transpose by merge Reshape etc into direct xint8 operators. (#7265)
* Support transpose by merge Reshape etc into direct xint8 operators.

* Add resize operator quantization support

* Add QDQ tests for resize, reshape, maxpool, transpose.
2021-04-08 18:00:35 -07:00
KeDengMS
0d49e53985
[Symbolic shape infer] fix scalar shape in Expand (#7285) 2021-04-08 10:26:28 -07:00
Olivia Jain
fb40602ea2
Mem trt (#6868)
* adding trt comparison and memory consumption

* creating separate docker file
2021-04-05 22:16:12 -07:00
Marek Šuppa
008065aab1
Update README.md (#7043)
* Fix the precision type (switch from nonexistent `int32` to `fp32`).
2021-04-05 10:03:14 -07:00
Yufeng Li
8d737f9770
handle optional input in quant topo sort (#7223) 2021-04-02 20:42:48 -07:00
Yufeng Li
c4ebc60870
sort quantized nodes in topo logical order (#7172) 2021-03-30 09:01:15 -07:00
Yufeng Li
77c19436c0
add a notebook for mobilenetv2 quantization (#7164)
* add a notebook for quant mobilenetv2
2021-03-29 13:24:14 -07:00
Ashwini Khade
b22e60bd44
pull onnx latest commit (#7102)
* update onnx commit

* fix test scripts to remove deprecated call

* update filters

* add registration for relu and cumsum ver 14

* add promote trilu to onnx domain

* update onnx-tensorrt submodule

* update flag

* update flag

* update dependencies

* fix android ci failure
2021-03-29 11:00:38 -07:00
Yufeng Li
3771e0bf10
update bert quantization notebook (#7137) 2021-03-25 18:12:53 -07:00
Yufeng Li
8e54b76e2d
QDQ implementation (#7033)
* Add QDQ basic implementation
2021-03-25 09:17:23 -07:00
Yufeng Li
fffe16cb43
Fix a bug in quant GEMM and add a unit test (#7111) 2021-03-23 16:39:35 -07:00
Yufeng Li
c965878a69
fix a bug in global average pool and add unit test (#6913)
* fix bug in QGlobalAveragePool

* add unit test for quant GlobalAveragePool

* not run quantization tests if disable_contrib_ops enabled
2021-03-22 20:01:27 -07:00
Scott McKay
b2c6617b0f
Use 'as_scalar' when checking the 'cond' value of 'If' (#7063)
#6884
2021-03-22 18:04:38 +10:00
Chi Lo
8c3b59a026
Quantization calibration refactor (#6893)
* Code refactor

* Modify code to tackle OOM when calibrating on large dataset

* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax

* Add COCO val 2017 annotation

* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax

* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"

* Check and install flatbuffers module

* Add script to download coco dataset image and refactor example

* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"

* Add CalTableFlatBuffers as module

* Remove annotation, user can download by themselves.

* Uncomment code

* Add back instances_val2017.json

* Make sure flatbuffers installed when ORT is installed

* Refactor code to call coco api

* Enable FP16 for example
2021-03-19 01:09:11 -07:00
Cecilia Liu
4fd9fef9ee
Support HuggingFace Models Converted From tf2onnx in Python Script (#6985)
Support tf2onnx huggingface models in python script
2021-03-17 15:33:57 -07:00
Tianlei Wu
73d085ccdd
add slow test (#7035) 2021-03-16 20:49:51 -07:00
Ye Wang
4e670f7ab1
Support larger hidden size in Attention Cuda kernel (#7002)
* Support larger hidden size in Attention Cuda kernel

* Update attention_transpose.cu

* review comments

* fix typo and add check in quantization

* update readme
2021-03-15 15:46:10 -07:00
Ye Wang
b57a85d863
Support symbolic shape infer in transformers tool (#6899)
* fusion support runtime edge shape checking

* trim ctor

* add test

* fix

* Update test_shape_infer_helper.py

* use torch input size as dynamic axis hints

* check dir

* update

* support longformerattention

* update and add support for bert ops

* trim

* review comments

* review comments
2021-03-10 21:37:12 -08:00
Tianlei Wu
4884eee642
Attention fusion detect num_heads and hidden_size automatically (#6920) 2021-03-10 10:17:00 -08:00
Funtowicz Morgan
9126faa35b
Ability to fuse non-square (pruned) attention weights for BERT-like models (#6850) 2021-03-04 17:08:08 -08:00
Reuben Zotz-Wilson
107c9672fd
No such file or directory with --use_external_data_form and int8 (#6867)
Implemented following change to avoid the error when using both --use_external_data_form and --precision int8 with GPT2LMHeadModel, which results in
line 161, in save_external_data; open(external_data_file_path, 'ab').close()
FileNotFoundError: [Errno 2] No such file or directory:
This may also be related to the identified bug #6047.
2021-03-04 15:14:23 -08:00
Tianlei Wu
8f1786d5d2
Save output tensors in bert_test_data tool (#6872) 2021-03-04 13:09:05 -08:00
Faith Xu
6285ee2398
Reroute quantization tool readme to /docs page (#6854) 2021-03-02 13:49:42 -08:00