245 commits

Author SHA1 Message Date
RRRachelllll555
f7c1e51810
Remove shape inference and fix save large model(>2g) issue (#5210)
* remove shape inference and fix save large model problem

* remove unnecessary import

* refine code and add external format for quantize_qat

* remove initializers in tensors_to_calibrate

* small refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-18 08:46:31 -07:00
Pranav Prakash
f5df96256c
Fix order of returned values in quantize_weight_per_channel (#5205)
Must match returned order of `quantize_inputs`
2020-09-17 17:57:46 -07:00
Zhang Lei
cd0386b649
MaxPool versioning in quantization tools. (#5194)
MaxPool versioning in quantization tools.
2020-09-16 22:52:24 -07:00
Chi Lo
9f526f45ac
TensorRT Perf Tool (#4900)
* Initialize tensorrt perf script

* Add bert-squad dependencies

* Modify code to run ORT inference with CUDA/TensorRT

* Add get CUDA/TRT version

* uncomment bert-squad

* Add BERT-SQUAD inputs.json

* Add FastRCNN

* Make preprocess/validation into common functions

* Add MaskRCNN and SSD and consolidate the code

* Add dependencies for MaskRCNN

* following modifications are made:
    - create common fetch function to get inputs/outputs of model from ONNX model zoo.
    - create common validation function to compare inference outputs with reference outputs from ONNX model zoo.
    - move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile).
    - generate table in csv file to show the latency comparison (TRT vs CUDA) side by side.

* Add approach to analyze profiling file and also update model-related
settings

* Add models

* Add most of models from ONNX model zoo

* Add model input name and print all the model names at the end of run

* Add system info

* Add TRT fp16 support

* Refine the code

* Handle TRT fall back and modify the way to get input data

* Refine code

* Modify code

* Add more precise approach to measure inference

* Add io-binding

* Add YoLoV4

* Refine the code

* Refine the code

* Add models

* Add yolov4 notebook for jetson device

* Update notebook

* Update notebook

* Add CVS models

* Add missing model

* Add support of float16

* Add new way to get trt version

* Add "validate" and "benchmark" mode

* Add randomly generated input

* Refine perf script

* Refine the code.

* Add README

* Refine the code

* Update README.md

* Refine code

* Update README.md

* Remove all the model-related Python code and use model_list.json as the
model configuration instead.

Refine the benchmark.py

* Refine the code

Co-authored-by: Chi Lo <lochi@microsoft.com>
2020-09-15 10:06:01 -07:00
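
The side-by-side latency table mentioned in commit 9f526f45ac (TRT vs CUDA, written to a CSV file) could look roughly like the sketch below. It is an illustration only, not the perf script's code: the function names, column names, provider keys ("CUDA", "TensorRT"), and the nearest-rank percentile are all assumptions, and the timings are placeholders rather than measurements.

```python
import csv
import io
import math
from statistics import mean
from typing import Dict, List, TextIO


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def write_latency_table(results: Dict[str, Dict[str, List[float]]], out: TextIO) -> None:
    """Write one CSV row per model with CUDA and TensorRT latency side by side."""
    writer = csv.writer(out)
    # Hypothetical column names; the real script's header is not shown in the log.
    writer.writerow(["model", "cuda_mean_ms", "trt_mean_ms", "cuda_p90_ms", "trt_p90_ms"])
    for model, runs in results.items():
        cuda, trt = runs["CUDA"], runs["TensorRT"]
        writer.writerow([
            model,
            f"{mean(cuda):.2f}",
            f"{mean(trt):.2f}",
            f"{percentile(cuda, 90):.2f}",
            f"{percentile(trt, 90):.2f}",
        ])


# Placeholder timings in milliseconds, not measurements.
placeholder = {"model_a": {"CUDA": [4.0, 6.0], "TensorRT": [2.0, 4.0]}}
buffer = io.StringIO()
write_latency_table(placeholder, buffer)
print(buffer.getvalue())
```

Writing to any text stream keeps the sketch testable in memory; passing an opened file (with `newline=""`) would produce the CSV file on disk.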
Yufeng Li
3068a835f1
Fix quantization of 1-D conv with bias (#5157)
2020-09-14 18:07:14 -07:00
Andrei Shadrikov
82b25e1731
Fix datasize call in calibrate (#5110)
* Moving datasize to the interface.

* Reverting changes and addressing the comment
2020-09-14 18:06:23 -07:00
S. Manohar Karlapalem
f7edf0aa57
[OpenVINO-EP] Enable EP config options for VPU hardware (#5119)
* Added config flags for VPU Fast Recompile

* clean-up ifdefs

* Add VPU Fast compile config option

Adds an option that enables Fast compilation of models to VPU
hardware specific format.

* Add config option to choose specific device id for inference

Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.

* Add Python API to list available device IDs

* code cleanup

* Add second C/C++ API with settings string parameter

Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.

* Append 'Ex' to the extended C/C++ API

* Use set_providers Py API to set config options.

Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.

* avoid globals for py config options where possible

Co-authored-by: intel <you@example.com>
2020-09-14 15:46:14 -07:00
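
The settings-string format described in commit f7edf0aa57 (settings delimited by '\n', key and value within a setting delimited by '|') can be sketched as a small parser. This is a minimal Python illustration, not the EP's C/C++ implementation: the function names and the example keys (`device_id`, `fast_compile`) are hypothetical, and the whitespace trimming and blank-line tolerance are choices of this sketch.

```python
from typing import Dict


def parse_settings(settings: str) -> Dict[str, str]:
    """Parse newline-delimited 'key|value' settings into a dict."""
    options: Dict[str, str] = {}
    for line in settings.split("\n"):
        if not line.strip():
            continue  # tolerate blank lines and a trailing newline (sketch's choice)
        key, sep, value = line.partition("|")
        if not sep or not key.strip():
            raise ValueError(f"malformed setting: {line!r}")
        options[key.strip()] = value.strip()
    return options


def format_settings(options: Dict[str, str]) -> str:
    """Inverse of parse_settings: join pairs with '|' and settings with a newline."""
    return "\n".join(f"{key}|{value}" for key, value in options.items())


# Hypothetical keys and values, for illustration only.
example = "device_id|0\nfast_compile|true"
print(parse_settings(example))
```

Packing every option into one string keeps the C signature stable as new options are added, at the cost of parsing and validating the string at runtime.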
Zhang Lei
d45e49dd2b
Add LeakyRelu and Sigmoid QLinear Quantization support (#5116)
* Add LeakyRelu and Sigmoid QLinear Quantization support

* Change to reflect master changes.
2020-09-14 14:46:24 -07:00
Yufeng Li
20b2f45b24
Support per-channel quantization of weight tensor (#5057)
* Support per-channel quantization of weight tensor

* rename util functions

* fix bugs in calibrate

* add support of reduce_range

* refine opset check
2020-09-14 11:53:50 -07:00
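
Per-channel quantization, as named in commit 20b2f45b24, gives each channel of a weight tensor its own scale instead of one scale for the whole tensor. The sketch below assumes a symmetric int8 scheme and assumes that reduce_range narrows the range to 7 bits; `quantize_per_channel_sketch` is a hypothetical helper, not the quantization tool's function, and the tool's actual scheme may differ.

```python
from typing import List, Tuple


def quantize_per_channel_sketch(
    weights: List[List[float]], reduce_range: bool = False
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric quantization with one scale per channel (one channel per row)."""
    # Assumption: reduce_range uses a 7-bit range instead of the full 8-bit range.
    qmax = 63 if reduce_range else 127
    quantized: List[List[int]] = []
    scales: List[float] = []
    for channel in weights:
        max_abs = max((abs(w) for w in channel), default=0.0)
        # An all-zero channel gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        quantized.append([max(-qmax, min(qmax, round(w / scale))) for w in channel])
        scales.append(scale)
    return quantized, scales


# Two channels with very different magnitudes: each keeps its own scale.
q, scales = quantize_per_channel_sketch([[0.25, -1.0], [10.0, 2.5]])
print(q, scales)
```

With a single per-tensor scale, the small-magnitude first channel would be squeezed into a few integer levels; a scale per channel lets each row use the full integer range.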
Ye Wang
5302fe4079
A fix in load_pretrained_model() (#5137)
* Fix in load_pretrained_model

* Update onnx_exporter.py
2020-09-11 17:23:02 -07:00
Tianlei Wu
7511021e0e
Save Gpt2 test data (#5132)
(1) Save gpt2 test data during test generation.
(2) Use torch fp32 model as baseline when onnx model is fp16.
(3) Refine logic to compose onnx model path
2020-09-11 14:31:49 -07:00
Ye Wang
89509f256a
Not fuse SkipLayerNorm when add has initializer input (#5123)
2020-09-11 11:46:31 -07:00
Ye Wang
879751f3b7
Support Tensorflow benchmarking and onnx export in transformers tool (#5068)
* init checkin for tf export and tf benchmark

* small fix on argparse

* refactor

* review comments

* review comments
2020-09-11 00:47:37 -07:00
Tianlei Wu
c5d4ae0401
Add transformers tools to python package (#5090)
* Add transformers to onnxruntime python package
2020-09-10 15:42:15 -07:00
Ye Wang
b23e08b85c
Add AutoModel selector in transformers tool (#5051)
* Add AutoModel selector in transformers tool

* change distilbert-*-squad's pipeline to AutoModelForQuestionAnswering

* rule-based selector and add model_class as parameter

* Update huggingface_models.py

* review comments
2020-09-08 15:06:04 -07:00
Cameron Maske
4553b2eecd
Expose DirectML provider to python (conflicts resolved from #3359) (#4630)
2020-09-08 14:34:09 -07:00
Hariharan Seshadri
e1ed0fde2b
Prevent registering both DML and CUDA EPs in an ML op test (#5078)
2020-09-08 11:13:50 -07:00
Xiang Zhang
0dad79b495
Add SetLanguageProjection C Api and use it in four projections (#5023)
* Add SetLanguageProjection C Api and use it in four projections

* static cast enum languageprojection to uint32_t

* resolve comments

* fix typo and line added unintentionally

* revert unnecessary change

* reorder c# api

* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
2020-09-04 14:26:39 -07:00
Ashwini Khade
9ba2cfb71b
fix py packaging pipeline (#5038)
* add test skip logic when opset > allowed opset

* fix attribute error

* plus fix
2020-09-03 09:32:10 -07:00
Scott McKay
28445c88f9
Changes to enable saving and loading an ORT format model (#4995)
* Changes to enable saving and loading an ORT format model via the public APIs.
Cleanup session.py to try and make slightly more understandable. More refactoring is needed here.
Couple of bug fixes

* Fix bug in handling NodeArg serialization for an optional input which has a name and no type info.

* Address PR comments
  - tweak SessionOptions config to avoid double lookup
  - merge duplicated functionality in python binding around registering an EP with optional options

Fix a couple of build issues.

* Update C API to be consistent with python API
  - only load model in InferenceSession ctor if required
  - support loading ORT model in minimal build

* Fix nodejs test.
We get an invalid path error from LoadInterOp first now

* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.

The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.

* Fix couple of build issues.

* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.

* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. It wasn't being used in Session anyway, so passing it through just adds confusion.

* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
2020-09-03 09:10:48 -07:00
Hariharan Seshadri
a9db287bd7
Return windows error code for library loading and unloading failure (#5036)
2020-09-02 18:07:36 -07:00
Ye Wang
b4e9e98cee
Add more huggingface models in benchmark tools (#4986)
* checkin more huggingface models

* review comments

* review comments
2020-09-02 16:41:58 -07:00
Hariharan Seshadri
d30dd41c0e
Remove public default ctor in PyInferenceSession and replace it with a protected ctor (#4990)
2020-09-01 17:10:36 -07:00
Tianlei Wu
a47cae031f
Use raw attention mask in BERT related fusions (#4889)
* Use raw attention mask in fusion
* update python scripts to use raw attention mask by default
2020-09-01 13:22:20 -07:00
Yufeng Li
ffc2b25a3a
Quantization tool improvement (#4933)
Improve quantization tools:
1. Support QAT
2. Make the quantization tool register operators.
3. Make the API clearer to use

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-01 09:07:46 -07:00
Hariharan Seshadri
7045910d10
Support RegisterCustomOpsLibrary via the Python API (#4764)
2020-08-28 13:24:29 -07:00
Scott McKay
08eb15068c
Exclude the Map types from the build if ML ops are disabled. (#4908)
* Exclude the Map types from the build if ML ops are disabled. They're the only ops that use Map.
2020-08-27 17:48:12 +10:00
Tianlei Wu
268d2283c0
Export GPT-2 ONNX model without position_ids and attention_mask inputs (#4852)
* Export GPT-2 ONNX model without position_ids and attention_mask inputs
* allow benchmark_gpt2 on user's model
* refactor:  get_dummy_inputs returns a data class.
2020-08-24 13:05:25 -07:00
Scott McKay
db7669b225
Reduce ONNX dependency in minimal build (#4890)
* Next round of changes.

Remove inclusion of ONNX schema header
Exclude custom registry related things
Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.
2020-08-23 07:02:13 +10:00
RRRachelllll555
9a6db9b9f4
Fix next node access bug in calibration tool (#4863)
* fix bug in calibration tool

* fix next node access bugs

* rm file in wrong folder

* refine

* optimize

* refine

* refine format

* refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-08-21 20:48:54 -07:00
Yufeng Li
0575881949
Update quantization notebook to pytorch 1.6 (#4834)
2020-08-18 14:20:46 -07:00
gwang-msft
dee7596724
Add a generic collection of session configurations to the SessionOptions (#4718)
* adding generic configurations for session options

* fix a build break on linux

* fix training ci build break

* fix training ci build break

* addressed CR comments

* fix training ci build break

* move config_key from enum to string

* add c# api

* add python api

* fix build break

* move prepacking from 2 new api entries to session options configs

* fix training ci build break

* add python test, update some comments, move const key definition to avoid build break

* addressed comments

* move definitions of keys to common.h

* move api to version 5

* remove accidental change in build.py

* remove pragma to avoid build break

* addressed CR comments

* fix the python build break, and move location of config keys definition

* small typo changes
2020-08-18 13:40:40 -07:00
Tianlei Wu
1ce2982f65
Update GPT-2 notebook using IO Binding example (#4799)
2020-08-17 10:43:36 -07:00
Tianlei Wu
a69ca63895
add --no_attention_mask option (#4750)
output producer name and version in optimized model.
avoid removing initializer that existed in graph output
2020-08-12 15:56:25 -07:00
Tianlei Wu
316d1a9e69
Update benchmark for large model or model name with non-alphanumeric characters. (#4743)
* Export model > 2GB using external data format
2020-08-10 12:58:01 -07:00
Vagif
6499a38b7d
Add the missing onnx_proto import (#4705)
* add missing onnx_proto import
* Fix TensorProto usage in calibrate.py
* remove unused imports
2020-08-10 12:46:21 -07:00
Tianlei Wu
9c729d1719
Update notebook for mac since onnxruntime 1.3 or 1.4 in mac does not have openmp (#4732)
2020-08-07 14:01:48 -07:00
Ye Wang
61726e58f0
fix (#4697)
2020-08-07 13:08:41 -07:00
Yufeng Li
b22091dc91
Add the framework to support prepack (#4413)
* add support of prepack
* add support for QAttention and DynamicQuantizeMatMul
* add a use_prepacking option
* add use_prepacking in c_sharp api
2020-08-07 09:39:19 -07:00
Tianlei Wu
e70e9e2f67
refine machine_info and output onnxruntime_tools version (#4679)
* output onnxruntime_tools version
* change get_machine_info return data type to string
2020-08-02 18:20:59 -07:00
Ye Wang
b1bfff34e0
Support distill-bert fusion in transformers tool (#4631)
* checkin attention

* checkin embedlayer but cause invalid onnx model

* resolve comments

* fix comments

* check return values

* add version limit

* fix comments

* add warning
2020-07-31 17:57:54 -07:00
Tianlei Wu
3588c5b545
Add GPT-2 test generation to convert_to_onnx.py (#4670)
* add gpt2 tester
* add an option to include output latency.
2020-07-30 21:03:53 -07:00
Tianlei Wu
326cc686df
Update notebook: disable GPU for tensorflow (#4649)
2020-07-29 10:09:06 -07:00
RRRachelllll555
f3fc8ca954
Add input tensor calibration (#4619)
* add input tensor calibration

* set default fusions to be true

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-07-28 14:04:41 -07:00
Yufeng Li
a06cf6a3b3
Show quantization model size in benchmark of transformer (#4626)
* Show quantization model size in benchmark of transformer

* refine model size calculation
2020-07-27 23:56:33 -07:00
Hariharan Seshadri
9510f26744
[Python] Support more APIs for the SessionOptions class (#4596)
2020-07-24 12:56:54 -07:00
Yufeng Li
9c75c29403
refine opset version getter (#4602)
2020-07-24 10:34:56 -07:00
Tianlei Wu
ace41b8064
Force return_tuple=True to handle transformers breaking change of output format. (#4599)
2020-07-23 11:35:41 -07:00
Tianlei Wu
ea87c0d028
Update Transformer Optimizer documents (#4591)
(1) Add bert-base-cased and gpt2 benchmark results on V100
(2) Update list of supported models.
(3) Add comments to gpt2_helper.
(4) Use IO Binding in test parity by default.
2020-07-23 08:38:39 -07:00
RRRachelllll555
c5df918744
improve calibration tool (#4561)
* improve calibration tool

* modify calibration interface name

* modify calibration interface name

* refine calibrate and calibrate_user

* refine and add type info

* refine and add type info

* add e2e user example file

* remove unnecessary files

* remove test images no longer needed

* update readme document

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-07-22 21:31:49 -07:00