Commit graph

258 commits

Author SHA1 Message Date
Tianlei Wu
e33de20861
Update gpt2 notebook for int8 quantization (#5346)
* Update gpt2 notebook for ORT 1.5
* add sections for int8 quantization including QAT note
2020-10-02 09:41:52 -07:00
Yufeng Li
e8b9aa1f29 fix quantization of EmbeddingLayerNorm (#5321) 2020-10-01 20:08:43 -07:00
KeDengMS
7495dc167a
Symbolic shape inference: fix a bug in auto_merge when broadcasting (#5349)
The bug happens when merging following shapes:

input0: [1, 1, 'Min(1024, input1_dynamic_axes_3)', 'Min(1024, input1_dynamic_axes_3)']
input1: ['input1_dynamic_axes_1*input1_dynamic_axes_2', 12, 'input1_dynamic_axes_3', 'input1_dynamic_axes_3']
input2: []

The fix is to avoid broadcasting merge on input2
2020-10-01 15:24:00 -07:00
Ye Wang
caed6c264c
Add tf2pytorch wrapper in transformers tool (#5316)
* init checkin

* format

* refactor

* review comments
2020-10-01 13:58:58 -07:00
Ye Wang
1a12f510fc
Support T5 benchmarking in transformers tool (#5133)
* init checkin

* review comments

* modify according to transformers release
2020-09-29 22:58:28 -07:00
RRRachelllll555
507f5bf5f6
Update test calibrate script (#5185)
* update test_calibrate according to latest calibrate.py

* fix datasize bug in e2e example

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-27 21:59:56 -07:00
Sherlock
b03fb82ab7
Transformer layer-wise Recompute (#4526)
* Build Recomputation Graph

* Make topological sort to run FW nodes first

* Pattern match start and end of transformer layer

* Topological sort with Priority

* Add logger to Gradient Graph Builder

* Use Logger

* Introduce Execution Order
2020-09-24 19:56:32 -07:00
KeDengMS
5a71819be6
Symbolic shape inference: fix a case for concat (#5277)
* Symbolic shape inference: fix a case when concat requires merge multiple dims

* Fix a bug triggered in newer version of sympy
Fix a bug in output data type guessing
2020-09-24 08:16:47 -07:00
Yufeng Li
61ba5b501a
Fix bug in the back to back quantization of matmul and conv (#5264)
* fix bug in the back to back quantization of matmul and conv

* fix bug in back to back gather
2020-09-23 08:47:20 -07:00
Tianlei Wu
3bbce69185
bump version to 1.5.1 (#5258) 2020-09-22 20:57:34 -07:00
KeDengMS
8dceebda0e
[Training/Python] Add option to enable symbolic shape inference (#5107)
This change adds symbolic shape inference to ORT training which helps static memory planning for model like BART.
2020-09-22 10:49:07 -07:00
Ye Wang
65740deb10
Fix a bug in EmbedLayerNorm fusion (#5150)
* fix embedlayernorm bug

* review comments

* interim checkin

* review comments

* Fix core dump in MacOS

* remove unnecessary lines

* update document

* Update graph_utils.cc

* Update onnx_exporter.py

* resolve comments
2020-09-21 12:26:14 -07:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
RRRachelllll555
f7c1e51810
Remove shape inference and fix save large model(>2g) issue (#5210)
* remove shape inference and fix save large model problem

* remove unnecessary import

* refine code and add external format for quantize_qat

* remove initializers in tensors_to_calibrate

* small refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-18 08:46:31 -07:00
Pranav Prakash
f5df96256c
Fix order of returned values in quantize_weight_per_channel (#5205)
Must match returned order of `quantize_inputs`
2020-09-17 17:57:46 -07:00
Zhang Lei
cd0386b649
MaxPool versioning in quantization tools. (#5194)
MaxPool versioning in quantization tools.
2020-09-16 22:52:24 -07:00
Chi Lo
9f526f45ac
TensorRT Perf Tool (#4900)
* Initialize tensorrt perf script

* Add bert-squad dependencies

* Modified code to make ort inference with CUDA/Tensorrt

* Add get CUDA/TRT version

* uncomment bert-squad

* Add BERT-SQUAD inputs.json

* Add FastRCNN

* Make preprocess/validation in to common functions

* Add MaskRCNN and SSD and consolidate the code

* Add dependencies for MaskRCNN

* following modifications are made:
    - create common fetch function to get inputs/outputs of model from ONNX model zoo.
    - create common validation function to compare inference outputs with reference outputs from ONNX model zoo.
    - move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile).
    - generate table in csv file to show the latency comparison (TRT vs CUDA) side by side.

* Add approache to analyze profling file and also update model related
settings

* Add models

* Add most of models from ONNX model zoo

* Add model input name and print all the model names at the end of run

* Add system info

* Add TRT fp16 support

* Refine the code

* Handle TRT fall back and modify the way to get input data

* Refine code

* Modify code

* Add more precise approach to measure inference

* Add io-binding

* Add YoLoV4

* Refine the code

* Refine the code

* Add models

* Add yolov4 notebook for jetson device

* Update notebook

* Update notebook

* Add CVS models

* Add missing model

* Add support of float16

* Add new way to get trt version

* Add "validate" and "benchmark" mode

* Add randomly generated input

* Refine perf script

* Refine the code.

* Add README

* Refine the code

* Update README.md

* Refine code

* Update README.md

* Remove all the model related python and instead using model_list.json as
models configuration.

Refine the benchmark.py

* Refine the code

Co-authored-by: Chi Lo <lochi@microsoft.com>
2020-09-15 10:06:01 -07:00
Yufeng Li
3068a835f1
Fix quantization of 1-D conv with bias (#5157) 2020-09-14 18:07:14 -07:00
Andrei Shadrikov
82b25e1731
Fix datasize call in calibrate (#5110)
* Moving datasize to the interface.

* Reverting changes and adressing the comment
2020-09-14 18:06:23 -07:00
S. Manohar Karlapalem
f7edf0aa57
[OpenVINO-EP] Enable EP config options for VPU hardware (#5119)
* Added config flags for VPU Fast Recompile

* clean-up ifdefs

* Add VPU Fast compile config option

Adds an option that enables Fast compilation of models to VPU
hardware specific format.

* Add config option to choose specific device id for inference

Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.

* Add Python API to list available device IDs

* code cleanup

* Add second C/C++ API with settings string parameter

Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.

* Append 'Ex' to the extended C/C++ API

* Use set_providers Py API to set config options.

Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.

* avoid globals for py config options where possible

Co-authored-by: intel <you@example.com>
2020-09-14 15:46:14 -07:00
Zhang Lei
d45e49dd2b
Add LeakyRelu and Sigmoid QLinear Quantization support (#5116)
* Add LeakyRelu and Sigmoid QLinear Quantization support

* Change due to reflect master changes.
2020-09-14 14:46:24 -07:00
Yufeng Li
20b2f45b24
Support per-channel quantization of weight tensor (#5057)
* Support per-channel quantization of weight tensor

* rename util functions

* fix bugs in calibrate

* add support of reduce_range

* refine opset check
2020-09-14 11:53:50 -07:00
Ye Wang
5302fe4079
A fix in load_pretrained_model() (#5137)
* Fix in load_pretrained_model

* Update onnx_exporter.py
2020-09-11 17:23:02 -07:00
Tianlei Wu
7511021e0e
Save Gpt2 test data (#5132)
(1) Save gpt2 test data during test generation.
(2) Use torch fp32 model as baseline when onnx model is fp16.
(3) Refine logic to compose onnx model path
2020-09-11 14:31:49 -07:00
Ye Wang
89509f256a
Not fuse SkipLayerNorm when add has initializer input (#5123) 2020-09-11 11:46:31 -07:00
Ye Wang
879751f3b7
Support Tensorflow benchmarking and onnx export in transformers tool (#5068)
* init checkin for tf export and tf benchmark

* small fix on argparse

* refactor

* review comments

* review comments
2020-09-11 00:47:37 -07:00
Tianlei Wu
c5d4ae0401
Add transformers tools to python package (#5090)
* Add transformers to onnxruntime python package
2020-09-10 15:42:15 -07:00
Ye Wang
b23e08b85c
Add AutoModel selector in transformers tool (#5051)
* Add AutoModel selector in transformers tool

* change distilbert-*-squad's pipeline to AutoModelForQuestionAnswering

* rule base selector and add model_class as parameter

* Update huggingface_models.py

* review comments
2020-09-08 15:06:04 -07:00
Cameron Maske
4553b2eecd
Expose DirectML provider to python (conflicts resolved from #3359) (#4630) 2020-09-08 14:34:09 -07:00
Hariharan Seshadri
e1ed0fde2b
Prevent registering both DML and CUDA EPs in an ML op test (#5078) 2020-09-08 11:13:50 -07:00
Xiang Zhang
0dad79b495
Add SetLanguageProjection C Api and use it in four projections (#5023)
* Add SetLanguageProjection C Api and use it in four projections

* static cast enum languageprojection to uint32_t

* resolve comments

* fix typo and line added unintentionally

* revert unecessary change

* reorder c# api

* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
2020-09-04 14:26:39 -07:00
Ashwini Khade
9ba2cfb71b
fix py packaging pipeline (#5038)
* add test skip logic when opset > allowed opset

* fix attribute error

* plus fix
2020-09-03 09:32:10 -07:00
Scott McKay
28445c88f9
Changes to enable saving and loading an ORT format model (#4995)
* Changes to enable saving and loading an ORT format model via the public APIs.
Cleanup session.py to try and make slightly more understandable. More refactoring is needed here.
Couple of bug fixes

* Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info.

* Address PR comments
  - tweak SessionOptions config to avoid double lookup
  - merge duplicated functionality in python binding around registering an EP with optional options

Fix a couple of build issues.

* Update C API to be consistent with python API
  - only load model in InferenceSession ctor if required
  - support loading ORT model in minimal build

* Fix nodejs test.
We get an invalid path error from LoadInterOp first now

* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.

The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.

* Fix couple of build issues.

* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.

* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion.

* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
2020-09-03 09:10:48 -07:00
Hariharan Seshadri
a9db287bd7
Return windows error code for library loading and unloading failure (#5036) 2020-09-02 18:07:36 -07:00
Ye Wang
b4e9e98cee
Add more huggingface models in benchmark tools (#4986)
* checkin more huggingface models

* review comments

* review comments
2020-09-02 16:41:58 -07:00
Hariharan Seshadri
d30dd41c0e
Remove public default ctor in PyInferenceSession and replace it with a protected ctor (#4990) 2020-09-01 17:10:36 -07:00
Tianlei Wu
a47cae031f
Use raw attention mask in BERT related fusions (#4889)
* Use raw attention mask in fusion
* update python scripts to use raw attention mask by default
2020-09-01 13:22:20 -07:00
Yufeng Li
ffc2b25a3a
Quantization tool improvement (#4933)
Improve quantization tools:
1. Support QAT
2. Make quantization tool to register Operators.
3. Make the API clear to use

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-01 09:07:46 -07:00
Hariharan Seshadri
7045910d10
Support RegisterCustomOpsLibrary via the Python API (#4764) 2020-08-28 13:24:29 -07:00
Scott McKay
08eb15068c
Exclude the Map types from the build if ML ops are disabled. (#4908)
* Exclude the Map types from the build if ML ops are disabled. They're the only ops that use Map.
2020-08-27 17:48:12 +10:00
Tianlei Wu
268d2283c0
Export GPT-2 ONNX model without postion_ids and attention_mask inputs (#4852)
* Export GPT-2 ONNX model without postion_ids and attention_mask inputs
* allow benchmark_gpt2 on user's model
* refactor:  get_dummy_inputs returns a data class.
2020-08-24 13:05:25 -07:00
Scott McKay
db7669b225
Reduce ONNX dependency in minimal build (#4890)
* Next round of changes.

Remove inclusion of ONNX schema header
Exclude custom registry related things
Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.
2020-08-23 07:02:13 +10:00
RRRachelllll555
9a6db9b9f4
Fix next node access bug in calibration tool (#4863)
* fix bug in calibration tool

* fix next node access bugs

* rm file in wrong folder

* refine

* optimize

* refine

* refine format

* refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-08-21 20:48:54 -07:00
Yufeng Li
0575881949
Update quantization notebook to pytorch 1.6 (#4834) 2020-08-18 14:20:46 -07:00
gwang-msft
dee7596724
Add a generic collection of session configurations to the SessionOptions (#4718)
* adding generic configurations for session options

* fix a build break on linux

* fix training ci build break

* fix training ci build break

* addressed CR comments

* fix traning ci build break

* move config_key from enum to string

* add c# api

* add python api

* fix build break

* move prepacking from 2 new api entries to session options configs

* fix traning ci build break

* add python test, update some comments, move const key definition to avoid build break

* addressed comments

* move definitions of keys to common.h

* move api to version 5

* remove accidental change in build.py

* remove pragma to avoid build break

* addressed CR comments

* fix the python build break, and move location of config keys definition

* small typo changes
2020-08-18 13:40:40 -07:00
Tianlei Wu
1ce2982f65
Update GPT-2 notebook using IO Binding example (#4799) 2020-08-17 10:43:36 -07:00
Tianlei Wu
a69ca63895
add --no_attention_mask option (#4750)
output producer name and version in optimized model.
avoid removing initializer that existed in graph output
2020-08-12 15:56:25 -07:00
Tianlei Wu
316d1a9e69
Update benchmark for large model or model name with non-alphanumeric. (#4743)
* Export model > 2GB using external data format
2020-08-10 12:58:01 -07:00
Vagif
6499a38b7d
Add the missing onnx_proto import (#4705)
* add missing onnx_proto import
* Fix TensorProto usage in calibrate.py
* remove unused imports
2020-08-10 12:46:21 -07:00
Tianlei Wu
9c729d1719
Update notebook for mac since onnxruntime 1.3 or 1.4 in mac does not have openmp (#4732) 2020-08-07 14:01:48 -07:00