Commit graph

241 commits

Author SHA1 Message Date
Yufeng Li
3068a835f1
Fix quantization of 1-D conv with bias (#5157) 2020-09-14 18:07:14 -07:00
Andrei Shadrikov
82b25e1731
Fix datasize call in calibrate (#5110)
* Moving datasize to the interface.

* Reverting changes and adressing the comment
2020-09-14 18:06:23 -07:00
S. Manohar Karlapalem
f7edf0aa57
[OpenVINO-EP] Enable EP config options for VPU hardware (#5119)
* Added config flags for VPU Fast Recompile

* clean-up ifdefs

* Add VPU Fast compile config option

Adds an option that enables Fast compilation of models to VPU
hardware specific format.

* Add config option to choose specific device id for inference

Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.

* Add Python API to list available device IDs

* code cleanup

* Add second C/C++ API with settings string parameter

Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.

* Append 'Ex' to the extended C/C++ API

* Use set_providers Py API to set config options.

Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.

* avoid globals for py config options where possible

Co-authored-by: intel <you@example.com>
2020-09-14 15:46:14 -07:00
Zhang Lei
d45e49dd2b
Add LeakyRelu and Sigmoid QLinear Quantization support (#5116)
* Add LeakyRelu and Sigmoid QLinear Quantization support

* Change due to reflect master changes.
2020-09-14 14:46:24 -07:00
Yufeng Li
20b2f45b24
Support per-channel quantization of weight tensor (#5057)
* Support per-channel quantization of weight tensor

* rename util functions

* fix bugs in calibrate

* add support of reduce_range

* refine opset check
2020-09-14 11:53:50 -07:00
Ye Wang
5302fe4079
A fix in load_pretrained_model() (#5137)
* Fix in load_pretrained_model

* Update onnx_exporter.py
2020-09-11 17:23:02 -07:00
Tianlei Wu
7511021e0e
Save Gpt2 test data (#5132)
(1) Save gpt2 test data during test generation.
(2) Use torch fp32 model as baseline when onnx model is fp16.
(3) Refine logic to compose onnx model path
2020-09-11 14:31:49 -07:00
Ye Wang
89509f256a
Not fuse SkipLayerNorm when add has initializer input (#5123) 2020-09-11 11:46:31 -07:00
Ye Wang
879751f3b7
Support Tensorflow benchmarking and onnx export in transformers tool (#5068)
* init checkin for tf export and tf benchmark

* small fix on argparse

* refactor

* review comments

* review comments
2020-09-11 00:47:37 -07:00
Tianlei Wu
c5d4ae0401
Add transformers tools to python package (#5090)
* Add transformers to onnxruntime python package
2020-09-10 15:42:15 -07:00
Ye Wang
b23e08b85c
Add AutoModel selector in transformers tool (#5051)
* Add AutoModel selector in transformers tool

* change distilbert-*-squad's pipeline to AutoModelForQuestionAnswering

* rule base selector and add model_class as parameter

* Update huggingface_models.py

* review comments
2020-09-08 15:06:04 -07:00
Cameron Maske
4553b2eecd
Expose DirectML provider to python (conflicts resolved from #3359) (#4630) 2020-09-08 14:34:09 -07:00
Hariharan Seshadri
e1ed0fde2b
Prevent registering both DML and CUDA EPs in an ML op test (#5078) 2020-09-08 11:13:50 -07:00
Xiang Zhang
0dad79b495
Add SetLanguageProjection C Api and use it in four projections (#5023)
* Add SetLanguageProjection C Api and use it in four projections

* static cast enum languageprojection to uint32_t

* resolve comments

* fix typo and line added unintentionally

* revert unecessary change

* reorder c# api

* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
2020-09-04 14:26:39 -07:00
Ashwini Khade
9ba2cfb71b
fix py packaging pipeline (#5038)
* add test skip logic when opset > allowed opset

* fix attribute error

* plus fix
2020-09-03 09:32:10 -07:00
Scott McKay
28445c88f9
Changes to enable saving and loading an ORT format model (#4995)
* Changes to enable saving and loading an ORT format model via the public APIs.
Cleanup session.py to try and make slightly more understandable. More refactoring is needed here.
Couple of bug fixes

* Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info.

* Address PR comments
  - tweak SessionOptions config to avoid double lookup
  - merge duplicated functionality in python binding around registering an EP with optional options

Fix a couple of build issues.

* Update C API to be consistent with python API
  - only load model in InferenceSession ctor if required
  - support loading ORT model in minimal build

* Fix nodejs test.
We get an invalid path error from LoadInterOp first now

* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.

The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.

* Fix couple of build issues.

* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.

* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion.

* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
2020-09-03 09:10:48 -07:00
Hariharan Seshadri
a9db287bd7
Return windows error code for library loading and unloading failure (#5036) 2020-09-02 18:07:36 -07:00
Ye Wang
b4e9e98cee
Add more huggingface models in benchmark tools (#4986)
* checkin more huggingface models

* review comments

* review comments
2020-09-02 16:41:58 -07:00
Hariharan Seshadri
d30dd41c0e
Remove public default ctor in PyInferenceSession and replace it with a protected ctor (#4990) 2020-09-01 17:10:36 -07:00
Tianlei Wu
a47cae031f
Use raw attention mask in BERT related fusions (#4889)
* Use raw attention mask in fusion
* update python scripts to use raw attention mask by default
2020-09-01 13:22:20 -07:00
Yufeng Li
ffc2b25a3a
Quantization tool improvement (#4933)
Improve quantization tools:
1. Support QAT
2. Make quantization tool to register Operators.
3. Make the API clear to use

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-01 09:07:46 -07:00
Hariharan Seshadri
7045910d10
Support RegisterCustomOpsLibrary via the Python API (#4764) 2020-08-28 13:24:29 -07:00
Scott McKay
08eb15068c
Exclude the Map types from the build if ML ops are disabled. (#4908)
* Exclude the Map types from the build if ML ops are disabled. They're the only ops that use Map.
2020-08-27 17:48:12 +10:00
Tianlei Wu
268d2283c0
Export GPT-2 ONNX model without postion_ids and attention_mask inputs (#4852)
* Export GPT-2 ONNX model without postion_ids and attention_mask inputs
* allow benchmark_gpt2 on user's model
* refactor:  get_dummy_inputs returns a data class.
2020-08-24 13:05:25 -07:00
Scott McKay
db7669b225
Reduce ONNX dependency in minimal build (#4890)
* Next round of changes.

Remove inclusion of ONNX schema header
Exclude custom registry related things
Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.
2020-08-23 07:02:13 +10:00
RRRachelllll555
9a6db9b9f4
Fix next node access bug in calibration tool (#4863)
* fix bug in calibration tool

* fix next node access bugs

* rm file in wrong folder

* refine

* optimize

* refine

* refine format

* refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-08-21 20:48:54 -07:00
Yufeng Li
0575881949
Update quantization notebook to pytorch 1.6 (#4834) 2020-08-18 14:20:46 -07:00
gwang-msft
dee7596724
Add a generic collection of session configurations to the SessionOptions (#4718)
* adding generic configurations for session options

* fix a build break on linux

* fix training ci build break

* fix training ci build break

* addressed CR comments

* fix traning ci build break

* move config_key from enum to string

* add c# api

* add python api

* fix build break

* move prepacking from 2 new api entries to session options configs

* fix traning ci build break

* add python test, update some comments, move const key definition to avoid build break

* addressed comments

* move definitions of keys to common.h

* move api to version 5

* remove accidental change in build.py

* remove pragma to avoid build break

* addressed CR comments

* fix the python build break, and move location of config keys definition

* small typo changes
2020-08-18 13:40:40 -07:00
Tianlei Wu
1ce2982f65
Update GPT-2 notebook using IO Binding example (#4799) 2020-08-17 10:43:36 -07:00
Tianlei Wu
a69ca63895
add --no_attention_mask option (#4750)
output producer name and version in optimized model.
avoid removing initializer that existed in graph output
2020-08-12 15:56:25 -07:00
Tianlei Wu
316d1a9e69
Update benchmark for large model or model name with non-alphanumeric. (#4743)
* Export model > 2GB using external data format
2020-08-10 12:58:01 -07:00
Vagif
6499a38b7d
Add the missing onnx_proto import (#4705)
* add missing onnx_proto import
* Fix TensorProto usage in calibrate.py
* remove unused imports
2020-08-10 12:46:21 -07:00
Tianlei Wu
9c729d1719
Update notebook for mac since onnxruntime 1.3 or 1.4 in mac does not have openmp (#4732) 2020-08-07 14:01:48 -07:00
Ye Wang
61726e58f0
fix (#4697) 2020-08-07 13:08:41 -07:00
Yufeng Li
b22091dc91
Add the framework to support prepack (#4413)
* add support of prepack
* add support for QAttention and DynamicQuantizeMatMul
* add an use_prepacking option
* add use_prepacking in c_sharp api
2020-08-07 09:39:19 -07:00
Tianlei Wu
e70e9e2f67
refine machine_info and output onnxruntime_tools version (#4679)
* output onnxruntime_tools version
* change get_machine_info return data type to string
2020-08-02 18:20:59 -07:00
Ye Wang
b1bfff34e0
Support distill-bert fusion in transformers tool (#4631)
* checkin attention

* checkin embedlayer but cause invalid onnx model

* resolve comments

* fix comments

* check return values

* add version limit

* fix comments

* add warning
2020-07-31 17:57:54 -07:00
Tianlei Wu
3588c5b545
Add GPT-2 test generation to convert_to_onnx.py (#4670)
* add gpt2 tester
* add an option to include output latency.
2020-07-30 21:03:53 -07:00
Tianlei Wu
326cc686df
Update notebook: disable GPU for tensorflow (#4649) 2020-07-29 10:09:06 -07:00
RRRachelllll555
f3fc8ca954
Add input tensor calibration (#4619)
* add input tensor calibration

* set default fusions to be true

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-07-28 14:04:41 -07:00
Yufeng Li
a06cf6a3b3
Show quantization model size in benchmark of transformer (#4626)
* Show quantization model size in benchmark of transformer

* refine model size calculation
2020-07-27 23:56:33 -07:00
Hariharan Seshadri
9510f26744
[Python] Support more APIs for the SessionOptions class (#4596) 2020-07-24 12:56:54 -07:00
Yufeng Li
9c75c29403
refine opset version getter (#4602) 2020-07-24 10:34:56 -07:00
Tianlei Wu
ace41b8064
Force return_tuple=True to handle transformers breaking change of output format. (#4599) 2020-07-23 11:35:41 -07:00
Tianlei Wu
ea87c0d028
Update Transformer Optimizer documents (#4591)
(1) Add bert-base-cased and gpt2 benchmark results on V100
(2) Update list of supported models.
(3) Add comments to gpt2_helper.
(4) Use IO Binding in test parity by default.
2020-07-23 08:38:39 -07:00
RRRachelllll555
c5df918744
improve calibration tool (#4561)
* improve calibration tool

* modify calibration interface name

* modify calibration interface name

* refine calibrate and calibrate_user

* refine and add type info

* refine and add type info

* add e2e user example file

* remove unnecessary files

* remote test images no longer needed

* update readme document

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-07-22 21:31:49 -07:00
George Wu
6b53a74867
replace invalid sample (#4567) 2020-07-21 23:51:17 -07:00
Yufeng Li
822b23ff2f
Add support of EmbeddingLayerNorm (#4562) 2020-07-21 21:43:02 -07:00
Chi Lo
affdeb53c2
Add Python API for specifying device options. (#4205)
* Add python API for specifying CUDA device id

* Modification for providing session based python api for specifying
device id

* When include header file pybind11/stl.h, conversion between c++
containers and Python list, vector and dict data structure are
automatically enabled.

https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#

Therefore, refactor the code for better leverage this advantage.

* Make struct CudaDeviceOptions as default cuda device options

* Implement sess.set_providers(list_of_providers, list_of_provider_option_dicts)

But still stay consistent with existing sess.set_providers(list_of_provider)

* Add cuda provider option default setting

* Add support for setting cuda cuda_mem_limit and arena_extend_strategy.
Also resolved the merge conflict on session.py

* Use python ctypes to call cuda library to help python unittest

* Refine the code with reviewer's suggestions

* Add the capability of getting execution provider's configuration

- Once we introduced the capability to set execution provider's
configuration, it makes sense to add capability of getting ep's configuration.

* Modify the code with reviewer's suggestions.

* Using stoull() and stoul() depends on 32/64-bits architecture.

* Rewrite the testcases for testing setting CUDA device id

Note: We need to make sure every ORT process be run on one CUDA device
at a time.

* Make sure old session object is destroyed by python gc before new
session object is being created

* Move testcases to original onnxruntime_test_python.py

* Fix bugs to pass CI build

* Make it pass CI build (cont.)

* Make it pass CI build (cont.)
2020-07-21 07:28:13 -07:00
Yufeng Li
e92e0860c8
BERT quantization notebook (#4543)
* BERT quantization notebook

* update notebooks

* more benchmark

* add version info
2020-07-20 18:23:37 -07:00