Commit graph

323 commits

Author SHA1 Message Date
Hariharan Seshadri
d42399e1b0
Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248) 2021-01-05 22:18:03 -08:00
liqunfu
addb4b8c2b
Liqun/speech model loop to scan (#6070)
Provide a tool to convert Loop to Scan for Nuphar performance
Fix Nuphar CI pipeline failures.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-05 15:15:23 -08:00
Olivia Jain
c8de3f355a
Refactor EP Perf Tool (#6202)
* merge master, keep postprocess status commit

* download float16.py every time

* using variables to reference eps

* adding ACL EP to ep perf tool

* accuracy with absolute tolerance configurable

* add acl to dict + remove commented line
2021-01-04 08:50:41 -08:00
Changming Sun
1b23b28706
Remove MKLML/openblas/jemalloc build config (#6212) 2020-12-30 17:18:19 -08:00
Chi Lo
945fae8f56
Lochi/quantization tool for trt (#6103)
* Initial implementation of generating calibration dynamic range table

* Initialize validation support for Quantization

* Initialize validation support for Quantization (cont.)

* Improve validation support for Quantization

* Improve validation support for Quantization

* Rewrite/Refine for calibration and validation

* Rewrite/Refine for calibration and validation (cont.)

* Refine code

* Refine code

* Add data reader for BERT

* Add flatbuffers to serialize calibration table

* Refine code and add BERT evaluation

* Refine the code

* minor modification

* Add preprocess/postprocess of vision team yolov3 and refine the code

* Update annotation

* Make bbox coordinates more accurate

* Fix bug

* Add support of batch processing

* Batch processing for model zoo yolov3

* Add batch inference for evaluation

* Refine the code

* Add README

* Add comments

* Refine the code for PR

* Remove batch support checking in data_reader and refine the code

* Refine the code for PR

* Refine the code for PR review

Co-authored-by: Olivia Jain <oljain@microsoft.com>
2020-12-21 20:59:08 -08:00
Olivia Jain
234e94b4e1
Add Status.csv to EP Perf Tool (#6167)
* merge master, keep postprocess status commit

* download float16.py every time

* removing hardcoded values
2020-12-21 20:23:19 -08:00
Cecilia Liu
980a93c164
Model Fusion For Bart (#6105)
Fusion fix for Bart models
2020-12-15 14:30:15 -08:00
Edward Chen
64709b1335
Deprecate Python global configuration functions [Part 1] (#5923)
Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.
2020-12-15 11:32:43 -08:00
ashbhandare
b1a75d0e98
Enable passing initial optimizer state while creating training session (#5869)
* Support to pass initial optimizer states to optimizer graph builder

* Changes for passing init optim state to training session config

* Pass optimizer state through cpp and python frontend

* Cleanup

* Review comments

* Fix windows and mac CI

* Review comments

* review comments

* Review comments

* Frontend review changes

* Fix CI
2020-12-08 21:20:51 -05:00
Ye Wang
fa06be2133
Support export >2G model when using optimizer.py only (#6014)
* checkin

* add warning if user specifies the same input and output path
2020-12-07 17:18:49 -08:00
Tianlei Wu
51fbe87b9b
Update profiler tool to support gpt2 and longformer models (#6011)
* support gpt2 and longformer in profiler tool
* rename bert_profiler to profiler
* Add --basic_optimization to allow user to use basic level of graph optimization
* Add --kernel_time_only to filter kernel time and exclude fence time
* Add --threshold to filter out nodes with a low run time percentage.
2020-12-07 10:33:41 -08:00
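As a rough illustration of what a run-time threshold filter like the `--threshold` option above might do, the sketch below keeps only entries whose share of total time reaches a cutoff. The function name, the input shape (a dict of node name to duration), and the use of a 0..1 fraction are all assumptions; the profiler tool's actual implementation and units are not shown in this log.

```python
def filter_by_share(durations, threshold):
    """Return {name: share} for entries whose share of total time is >= threshold.

    `durations` maps a node name to its measured duration (any consistent unit).
    `threshold` is a fraction between 0 and 1 (an assumption for this sketch).
    """
    total = sum(durations.values())
    if total == 0:
        return {}
    shares = {name: d / total for name, d in durations.items()}
    return {name: share for name, share in shares.items() if share >= threshold}
```

For example, with durations `{"MatMul": 80, "Add": 15, "Cast": 5}` and a threshold of `0.1`, only `MatMul` and `Add` would be kept.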
Changming Sun
925879a8b0
Remove python 3.8 Windows GPU build from python packaging pipeline (#6054)
Revert the last few changes to get the pipeline back to a normal state.
2020-12-07 10:23:07 -08:00
George Wu
020efc9002
fix windows cuda support for python 3.8 + (#6046)
* fix

* noqa

* fix.

* remove unused import
2020-12-07 10:09:22 -08:00
Tianlei Wu
cdb91208a3
longformer onnx conversion and benchmark tools (#6007)
* initial implementation of longformer tools for onnx conversion and benchmark

* Support ONNX conversion for transformers 4.0
Add an option to optimize onnx model, and export fp16 model
2020-12-03 11:37:30 -08:00
Cecilia Liu
3b198c9614
Support Fusion for 1 and 2 Inputs Bert Models Converted From tf (#5993)
Support fusion for 1 and 2 inputs Bert models converted from tf
2020-12-03 10:52:33 -08:00
Zhang Lei
648c9c7789
Fix bugs for 1: Calibrator should check model inputs; 2: (#6017)
quantize_inupts forgot to use parameter initializer_use_weight_qtyp.
2020-12-03 00:00:16 -08:00
Ye Wang
5f516899bf
optimize a bert model converted using tf2onnx (#5492)
* optimize a bert model converted using tf2onnx

* add test data

* update

* remove comments

* format

* Revert "format"

This reverts commit f8ae88cb564bce5caf4780e56561403f3ba3d524.

* Revert "remove comments"

This reverts commit 59d8a693581a731fd0291b70fe2c9cec6c4950fe.

* add a squeeze node to convert a 3-d mask to 2-d

* update

* update

* verify and add comments
2020-12-01 11:19:16 -08:00
Changming Sun
2d9dcc4576
Add python 3.9 support (#5874)
1. Add python 3.9 support (except Linux ARM)
2. Add Windows GPU python 3.8 to our packaging pipeline.
2020-11-30 12:02:48 -08:00
Ivan Stojiljkovic
015fbb3dbb
Add support for Python 3.8+ on Windows when CUDA is enabled (#5956) 2020-11-26 15:52:30 -08:00
KeDengMS
ee908eb0aa
Symbolic shape inference: fix rank for ConstantOfShape (#5912) 2020-11-24 14:50:41 -08:00
Zhang Lei
9992f0f812
Implement QLinear GlobalAveragePool with sse2/neon. (#5838)
Add QLinear Global Average Pool for quantization for ARM and SSE2.

Co-authored-by: Tracy Sharpe <tracysh@microsoft.com>
2020-11-23 19:23:58 -08:00
sfatimar
916410151c
Fix for hetero multi python binding with new shared library (#5895)
Co-authored-by: sfatimar <sahar.fatima@intel/com>
2020-11-23 15:41:10 -08:00
Ye Wang
3d5b48a894
remove use_cdn when loading pretrained model (#5900) 2020-11-23 14:26:55 -08:00
Hariharan Seshadri
d46dbeafd3
Expose knobs to create and share (CPU) allocators across sessions in C# and Python (#5634) 2020-11-21 14:12:33 -08:00
Ryan Hill
ba739a8000
Convert OpenVINO into a shared provider (#5778)
Same as Dnnl and TensorRT before it, now with more methods and more cleanup.
2020-11-20 17:39:57 -08:00
Olivia Jain
3738ca7e10
Improve perf testing (#5760)
* build off a specific commit and archive wheel file

* rename to fp32, prefix results w/ commit, add CPU col

* rename 99th to 90 percentile

* get symbolic_shape from master each time

* add install archive wheel, parallel build

* shortening hash
2020-11-20 16:03:09 -08:00
Scott McKay
f0142da59c
Add NNAPI to providers that can be used via the python bindings. (#5867)
Update ORT model conversion script
  - add args for specifying optimization level and whether to use NNAPI
  - add logic to create a list of required ops and ORT format model that can be used with NNAPI
2020-11-21 09:18:35 +10:00
Takeshi Watanabe
a622533ecc
Support profile_file_prefix in python binding (#5864) 2020-11-20 14:28:50 -08:00
S. Manohar Karlapalem
ff58f621fa
Remove nGraph Execution Provider (#5858)
* Remove nGraph Execution Provider

Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice

**Deprecation Notice**

| | |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.

Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.

* Remove nGraph Licence info from ThirdPartyNotices.txt

* Use simple Test.Run() for tests without EP exclusions

To be consistent with rest of test code.

* Remove nGraph EP functions from Java code
2020-11-19 16:47:55 -08:00
Hariharan Seshadri
62508ef0e4
Revert "Remove MKLML build config (#5559)" (#5855) 2020-11-19 10:53:08 -08:00
Yufeng Li
6f86c4dbe3
Quantize LSTM (#5595)
Quantize LSTM:
1. Dynamically quantize MatMul inside the LSTM; the activation functions are not quantized.
2. Support per-channel quantization of the input weight and recurrent weight.
2020-11-18 11:21:49 -08:00
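To illustrate what "per-channel" means in the Quantize LSTM entry above, here is a minimal sketch that gives each row of a weight matrix its own scale. It assumes symmetric int8 quantization with one channel per row; the commit does not state the exact scheme, and `quantize_per_channel` is a hypothetical helper, not part of the quantization tool.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per row (channel).

    Returns (quantized_rows, scales). Assumes `weights` is a list of rows of floats.
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(v) for v in row)
        # An all-zero row gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / 127 if max_abs else 1.0
        scales.append(scale)
        # Clamp to the symmetric int8 range after rounding.
        quantized.append([max(-127, min(127, round(v / scale))) for v in row])
    return quantized, scales
```

With a single per-tensor scale, a row of small values would share the scale of the largest row and lose precision; a per-row scale avoids that at the cost of storing one scale per channel.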
Peichen Xie
e8c0f5d0ff
Update the quantization script to support GEMM (transB==1) (#5432)
* Modify onnx_quantizer.py

* Fix topology order issues

* Handle more cases
2020-11-17 21:24:48 -08:00
Scott McKay
7b76b57fc8
Support EPs that compile nodes in a minimal build. (#5776)
* Support EPs that compile nodes in a minimal build. This enables NNAPI being used.
2020-11-17 13:52:22 +10:00
Maajid khan
a84a058f9e
[OpenVINO-EP] Enabling Multi Device support (#5740)
* Enabling Multi Device support for UEP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor fix added
*Added a simple fix to determine OpenVINO
version for Arm build as well

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
2020-11-11 15:16:30 -08:00
Chi Lo
92292de135
Tensorrt perf tool (#5436)
* Add YAML file for pipeline

* Modify typo

* Add working directory

* Modify and test

* Modify and test

* Modify and test

* Modify and test

* Modify

* Modify

* Modify

* Modify

* Make sure to copy all the result files

* Add clean up

* Modify

* Modify agent pool name

* Upload only specific artifacts

* Modify

* Integrated CI Pipeline for running TRT perf as well as added the “large amount of models” into perf model target

* Fix bug

* Fix bug

* Add reading the information regarding previously known failing models
and then skip testing them during benchmark/validation

* Modify the script file for CI

* Replace print with logger.info

* Fix bug

* Fix bug

* Refine the code

* Modify the script so that it can capture script segmentation fault while
running ORT

* Fix bug

* fix bug

* fix bug

* Add debug info

* fix bug

* Refine perf code

* Refine the code

* fix bug

* Code refactoring

* change many-models path

* remove metadata after validation/benchmark are done

* Update README.md

* Fix bug so that metadata doesn't hold stale value

* Remove hardcode and update README

* Add arguments to the script to make it run correctly

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Fix bug so that metadata doesn't hold stale value

* Fix small bug of finding test dataset directory for FP16 test data, as
well as modification of some output information

* use -i random for perf test of TRT changes

Co-authored-by: Olivia Jain <oljain@microsoft.com>
2020-11-06 12:27:42 -08:00
Ye Wang
95e6da7957
Revert saving optimized model as external data (#5690)
* revert and add support for saving external data

* review comments

* update
2020-11-06 11:54:19 -08:00
Zhang Lei
77b1eea9cf
Add option to allow quantize_input() use input_qtype for initializers. (#5721) 2020-11-06 09:33:24 -08:00
Yufeng Li
5c4543e194
Calibrate float tensor only (#5704) 2020-11-04 23:55:48 -08:00
Ye Wang
a028ca41ec
Optimize flaubert (#5651)
* optimize flaubert

* fix an issue and format

* revert non-relevant change

* review comments
2020-11-03 09:51:42 -08:00
Wei-Sheng Chin
8856c2595b
Sync the two IDs in OrtMemoryInfo when calling ctor (#5663)
* Sync the two IDs in OrtMemoryInfo when calling ctor

* Also fix the same problem for output
2020-11-02 23:22:47 -08:00
Tianlei Wu
2c02530603
Bert Model Profiling Tool (#5654)
* Add profiler tool for BERT models
2020-11-02 13:47:37 -08:00
Derek Murray
ff538b8d3a
Minor fixes in BERT Inference notebook (#5637)
Add missing commas to the code example.
2020-11-02 09:49:23 -08:00
Maajid khan
d98062da0c
[OpenVINO-EP] Hetero support (#5627)
* Implement Hetero in UEP
* Added security checks to take valid Hetero combinations
  as device type
* Integrating Hetero features
* Get the statistics Report in Debug Mode

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Passing right device type for vadm_backend

Added simple fix to pick the right device type
when using vadm_backend with Hetero as well.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed batching logic for 2020.4 and above

* Fixed flake8 PEP8 errors

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor Fixes Added
*Added security checks for device_type passed
in for Hetero build during run time
*code cleanup

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes Added
*Fixed batch_size bug in vadm_backend
*code cleanup
*Documentation updated for Hetero

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2020-10-30 22:35:08 -07:00
KeDengMS
32bf6390ad
Some fixes to symbolic shape inference (#5642)
* Some fixes to symbolic shape inference

1. Topological sort before iteration in graph
2. Fix a case in slice: start=100000, end=-100000, step=-1, dim=2
3. Fix Nuphar Gemm test's random seed
4. Slice opset 1 axes is optional
2020-10-30 19:28:47 -07:00
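The slice case named in the entry above (start=100000, end=-100000, step=-1 on a dimension of size 2) is an out-of-range reverse slice. The sketch below uses Python's built-in slice clamping to compute how many elements such a slice selects. It is not the code from the commit, and it assumes the operator follows NumPy-style slicing rules; `sliced_length` is a hypothetical helper.

```python
def sliced_length(dim, start, end, step):
    """Number of elements selected by start:end:step on an axis of size `dim`.

    Uses Python's slice clamping, which handles out-of-range and negative bounds.
    """
    lo, hi, st = slice(start, end, step).indices(dim)
    return len(range(lo, hi, st))


# The case from the commit message: a full reverse of a size-2 axis.
reversed_indices = list(range(*slice(100000, -100000, -1).indices(2)))
```

Here `reversed_indices` is `[1, 0]` and `sliced_length(2, 100000, -100000, -1)` is `2`: the oversized start clamps to the last index and the oversized negative end clamps to just before index 0.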
Weixing Zhang
aec4cb489e
ROCm EP for AMD GPU (#5480)
The ROCm EP is designed and implemented on top of ROCm, the AMD GPU software stack. Details about ROCm: https://rocmdocs.amd.com/en/latest/

The ROCm EP builds on the following components:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: MIOpen
5. Collective Communication library: RCCL
6. cub: hipCUB
7. …

Current status:
BERT-L and GPT2 training can be run on AMD GPUs with data parallelism.

Next:
1. Make more GPU code shareable between the ROCm EP and the CUDA EP, since the HIP language and HIP runtime API are very close to CUDA.
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
……

The ROCm kernels have been removed from this commit and will be in a separate PR. Because the original PR was too big (~180 files), it was suggested to split it into two parts: one with the ROCm kernels, the other with everything else.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-10-29 17:13:04 -07:00
Maajid khan
ddf83d1ace
Maajid/multi threading 2 (#5568)
* Enabled multi-threading for OpenVino EP

->Enabled support for concurrent_session_runs

*Run UEP using concurrent_session_runs > 1
*Enabled support for ORT_PARALLEL ExecutionMode

->Documentation Added for Enabling MultiThreading

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor Fixes added
*Configure the value of nireq during Runtime
*Documentation typos rectified and details
added for Multi_Threaded Inference

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Some checks added for this fix
*Added checks to invalidate wrong nireq value
and assigned it to default value of 8
*Added new config options for enable_vpu_fast_compile
which were changed w.r.t OpenVINO_2021.1 Release

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
2020-10-27 14:48:12 -07:00
Tianlei Wu
1f304fbee7
Attention with past and no unidirectional mask (#5557)
* Update fusions to support shared node, and mask of all ones
2020-10-21 20:12:02 -07:00
Changming Sun
5802fe1699
Remove MKLML build config (#5559)
Remove MKLML build config
2020-10-21 13:11:25 -07:00
Hariharan Seshadri
4291c57322
[C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481) 2020-10-21 10:32:13 -07:00
Yufeng Li
6c2162e97a
Fix quantization of Conv1D with bias (#5491)
* Fix reshape for Conv with bias
2020-10-20 15:27:26 -07:00