Commit graph

3554 commits

Author SHA1 Message Date
sfatimar
6d2a30eae3
[OPENVINO-EP] 2021.1 Release (#5431)
* CMake changes for 2021.1

* added new ov version 2020.1 for faster rcnn

* Added missing defs

* equal op modified

* changes to incorporate faster rcnn

* backend util.cc

* hddl_plugin_config.hpp is deprecated; use hddl_config.hpp instead

* changing myriad precision bool to i32

* gather is not enabled for gpu

* conv2D and pooltest auto_pad attribute should not be null

* negative indices are not valid for scatter op in myriad

* non max suppression op only supported in faster rcnn mode

* maxpool indices output is not supported

* Cleaned redundant code in backends

* Added ifdefs for HDDL config

* cast output dimensions check
TopK operator k input: this seems only resolved for MYRIAD, as it is
throwing issues for mask rcnn. Need to verify.

* we are limiting the subgraph size to 3 here

* taking care of review comments

* Fixed minor bugs

* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph

* incorporated upsample conditions too

* Dockerfile changes for 2021.1 release

* dockerfile aptkey update

* Minor fixes

* ceil condition added again

* Fixed few gpu models

* Disabled LSTM and yolov3 in ModelTests

* python softmax cross entropy tests and negative log likelihood

* Update Build.md

Updated for openvino 2021.1

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider for 2021.1

* Update README.md

updated new openvino version

* Update Dockerfile.openvino 

added environment variable for DEBIAN_FRONTEND

* Fixed myriad models

* Fixed gather condition
* Fixed mask rcnn model on myriad

* Modified Gather condition

* set default target of MCR dockerfile to MYRIAD_FP16

* Fixed tinyolov3 on CPU

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider documentation

* Update Dockerfile.openvino

Removed environment variable

* Update OpenVINO-ExecutionProvider.md

update image manipulation networks supported

* Update onnx_backend_test_series_filters.jsonc

removed test_upsample_nearest from cpu test cases

* New InternalCI changes for 2021.1

* Full protobuf removed for OpenVINO

* Protobuf added

* Updated with apt installation for openvino

* Revert the testing changes

* Reverted testing changes

* File permissions are changed back to the original

* Deleted openvino installation and cmake change

* Optimized Dockerfile

Removed unnecessary cmake installation, numpy

* Added missing ifdefs

* delete array fix

* backend_utils.cc output_shape

* Revert "set default target of MCR dockerfile to MYRIAD_FP16"

This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
2020-10-14 15:56:00 -07:00
Chun-Wei Chen
2b6b3a2ee6
Add GetProfilingStartTimeNs() to Python/C# APIs (#5280)
* add Python API for getProfilingStartTime

* debug for using Python API

* add in C# api

* use uint instead of uint64_t to prevent warning

* typo for GetProfilingStartTimeNs

* remove const

* Update onnxruntime/python/session.py

Co-authored-by: Pranav Sharma <emailpranav@gmail.com>

* remove unnecessary return

* Add Python unit test

* Add C# unit test and refactor Python test

* use ulong in C# for uint64_t in C++

* remove time.monotonic_ns

* syntax: remove public for inner function

* correct the API's order

* getprofilingstarttime after run

* Correct the right order in NativeMethod.cs

* update order

* nit: remove spaces

* Update csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs

Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>

* use the updated function

* add comment about the precision

* add more comments

* add session.py back

* fix flake8

* remove session.py

* Add comments in C, C#, Python APIs about precision

Co-authored-by: Pranav Sharma <emailpranav@gmail.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
2020-10-14 05:32:43 -07:00
Changming Sun
1514509fd7
Update protobuf submodule url (#5477) 2020-10-14 02:35:38 -07:00
Ashwini Khade
44248d9646
opset13 kernel registration (Transpose, Tile, ScatterND, ScatterElements, Gather, GatherElements, Slice, DepthToSpace, SpaceToDepth) (#5454)
* register kernels for opset 13

* fix formatting
2020-10-13 22:10:01 -07:00
Tiago Koji Castro Shibata
fabe02ddc2
Don't change global FPU state during round-half-to-even (#5376)
* Don't change global FPU state

* Handle infinity properly
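The commit message does not show the implementation. As a hedged illustration only, the Python sketch below (the name `round_half_to_even` is hypothetical, not ORT code) shows one way to get ties-to-even rounding from plain floor arithmetic, so no process-wide FPU rounding mode (for example via C's `fesetround`) has to be switched and restored, with infinities and NaN passed through.

```python
import math

def round_half_to_even(x: float) -> float:
    """Round x to the nearest integer, breaking exact .5 ties toward the even
    neighbour, using floor() arithmetic only (no global rounding-mode change)."""
    if math.isinf(x) or math.isnan(x):
        # math.floor raises on inf/nan, so pass these through unchanged.
        return x
    lower = math.floor(x)      # an int in Python 3
    diff = x - lower           # fractional part of x
    if diff > 0.5:
        return float(lower + 1)
    if diff < 0.5:
        return float(lower)
    # Exact tie: pick whichever neighbour is even.
    return float(lower if lower % 2 == 0 else lower + 1)

print([round_half_to_even(v) for v in (0.5, 1.5, 2.5, -2.5, 2.6)])
# [0.0, 2.0, 2.0, -2.0, 3.0]
```

This sketch does not preserve the sign of zero (for example, -0.4 rounds to 0.0 rather than -0.0).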
2020-10-13 20:10:33 -07:00
Ye Wang
67315d8ae0
Optimize openai-gpt/albert model and add fusion test (#5466)
* optimize openai-gpt

* add huggingface model fusion test

* move albert's attention fusion here

* add test for albert fusion
2020-10-13 19:24:14 -07:00
Scott McKay
5544391e79
Fix linking of MLAS unit test lib on platforms where libatomic is required. (#5469) 2020-10-14 07:25:43 +10:00
Bowen Bao
8e9afe1944
Add long type support for SplitToSequence operator (#5367) 2020-10-13 12:57:11 -07:00
Hariharan Seshadri
e01d152464
Add OpSet kernel registrations as part of opset 13 support (#5465) 2020-10-13 10:02:00 -07:00
S. Manohar Karlapalem
6e6147fb75
Use correct protoc tool file name for C# builds (#5429)
In Linux builds, the protoc tool is simply named 'protoc' (without
the .exe extension).
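The fix itself lives in the C# build files, which are not shown here. The Python sketch below only illustrates the naming rule stated above, using a hypothetical helper `protoc_file_name` keyed on `sys.platform` values.

```python
import sys

def protoc_file_name(platform: str = sys.platform) -> str:
    """Hypothetical helper: the protoc binary has the '.exe' suffix only on Windows."""
    return "protoc.exe" if platform.startswith("win") else "protoc"

print(protoc_file_name("win32"))  # protoc.exe
print(protoc_file_name("linux"))  # protoc
```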
2020-10-13 09:43:03 -07:00
Xiang Zhang
b12824fa7a
add telemetry event for nodejs binding (#5463) 2020-10-12 22:53:01 -07:00
Guoyu Wang
ce5465d5f3
[NNAPI EP] Add Resize and Clip support (#5427)
* Add resize and clip support in NNAPI EP

* Try to work around TensorRT test failure

* Addressed PR comments
2020-10-12 22:29:19 -07:00
KeDengMS
c444b9d76a
Add CUDA option to run copy in default stream (#5445)
* Add CUDA option to run copy in default stream

This change fixes #4829. Thanks @maherzog for providing the repro!

The bug is caused by memory reuse in the BFC arena, where the copy and
compute streams in CUDA have a race condition.

BFC arena is an arena allocator on top of cudaMalloc/Free that
reduces the cost of syncing CPU and GPU on alloc/free. When the CPU
allocates or frees memory, the GPU might not have finished previous
work on that memory, so the CPU and GPU can run asynchronously.

This is OK if there's only one stream, where the execution order
on CPU and GPU is consistent. For example, if we have two kernels
A and B and the CPU runs allocA->computeA->freeA->allocB->computeB->freeB,
A and B can share the same memory, since computeA and computeB
will not race as long as they run in the same GPU compute
stream.

However, if the CPU runs allocA->copyA->freeA->allocB->computeB->freeB,
the GPU could execute copyA after computeB
if copy and compute happen in different GPU streams.

This change makes copies run in the default compute stream, while adding
an option to fall back to the previous behavior if there's a perf hit. This
is a short-term fix until the BFC arena can support multiple streams.

Users may use the following options to revert to the previous behavior:
C API:
  struct OrtCUDAProviderOptions cudaProviderOpt;
  cudaProviderOpt.do_copy_in_default_stream = false;
C++ API:
  CUDAExecutionProviderInfo cudaEPInfo;
  cudaEPInfo.do_copy_in_default_stream = false;
C# API:
  pending...
Python:
  import onnxruntime
  onnxruntime.capi._pybind_state.set_do_copy_in_default_stream(False)

* Confirmed the test fails in CI when doing copy in separate stream

Revert the test to get CI pass now

* Fix Windows test

* Address CR
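The race described in the first bullet of this commit can be reproduced in a toy model. The Python sketch below is a hypothetical illustration, not ORT or CUDA code: `copy_a`, `compute_b`, `enqueue` and `gpu_run` are made-up names, streams are modeled as FIFO queues, and the GPU's freedom to interleave independent streams is represented by an explicit `schedule`. Because the arena hands A's block to B as soon as the CPU frees it, the final contents depend on which stream the GPU services first.

```python
from collections import deque

BLOCK = 0  # the arena hands A's freed block straight to B (CPU-side reuse)

def copy_a(mem):
    mem[BLOCK] = "input copied for A"

def compute_b(mem):
    mem[BLOCK] = "output of B"

def enqueue(copy_in_default_stream):
    """Queue the GPU work for allocA->copyA->freeA->allocB->computeB->freeB.
    alloc/free are CPU-side arena bookkeeping, so only copyA and computeB are queued."""
    streams = {"compute": deque(), "copy": deque()}
    streams["compute" if copy_in_default_stream else "copy"].append(copy_a)
    streams["compute"].append(compute_b)
    return streams

def gpu_run(streams, schedule):
    """Pop one op per step from the named stream. Each stream is FIFO, but ops
    from different streams may be interleaved in any order; `schedule` picks one."""
    mem = {}
    for name in schedule:
        streams[name].popleft()(mem)
    return mem

# One stream: GPU order matches CPU order, so B's output survives.
print(gpu_run(enqueue(True), ["compute", "compute"]))  # {0: 'output of B'}
# Two streams: running computeB first is legal, and copyA then clobbers the block.
print(gpu_run(enqueue(False), ["compute", "copy"]))    # {0: 'input copied for A'}
```

With the copy in the default stream, the only possible schedule is the CPU order. With a separate copy stream, both `["copy", "compute"]` (correct) and `["compute", "copy"]` (corrupted) are legal, which is the nondeterminism this change removes.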
2020-10-12 22:12:05 -07:00
Wenbing Li
80d36eab86
enable the onnxruntime shared library test on iOS (#5443)
* enable the onnxruntime shared library test on iOS

* fixing as commented.

* add return status check.
2020-10-12 21:40:57 -07:00
RandySheriffH
913116e64e
bump ops version to opset13 (#5456) 2020-10-12 20:47:09 -07:00
Sergii Dymchenko
05b1c02d32
Fix commands in README.md. (#5459) 2020-10-12 17:53:09 -07:00
Sherlock
60dbd8a1e5
Update maximum batch size for UT; Include recompute modes (#5444)
* Update MaxBatchSize and include recompute mode
* Minor fix for frontend test

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-12 14:50:43 -07:00
Derek Murray
dbc626dcbe
Add ExpGrad registration and test. (#5438)
**Description**: Add missing gradient registration for the `Exp` op.

**Motivation and Context**
* Adding support for training a model that uses the `Exp` op.
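For reference, the math behind the new registration: since d/dx exp(x) = exp(x), the gradient is dX = dY · exp(X) = dY · Y. The Python sketch below is a hypothetical element-wise illustration (the names `exp_forward` and `exp_grad` are made up); the commit does not say whether ORT's `ExpGrad` takes X or the forward output Y as input, and this sketch assumes Y.

```python
import math

def exp_forward(x):
    return [math.exp(v) for v in x]

def exp_grad(dy, y):
    """d/dx exp(x) = exp(x), so dX = dY * Y; reusing the forward output Y
    avoids recomputing exp() in the backward pass."""
    return [g * v for g, v in zip(dy, y)]

x = [0.0, 1.0, -2.0]
y = exp_forward(x)
print(exp_grad([1.0, 1.0, 1.0], y))  # with dY = 1 this is exp(x) itself
```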

Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-10-12 13:56:08 -07:00
Ashwini Khade
2a018cc235
revert contrib op version bump and deprecation of TransposeMatMul (#5424)
* revert contrib op version bump and deprecation of TransposeMatMul

* update documentation
2020-10-12 13:02:15 -07:00
jingyanwangms
20c47ce91c
Simplified layer norm changes (#5028)
* t5 layer norm changes

* add t5 layer norm kernel

* use template for t5 layer norm

* template definition changes

* no build error

* add CPU cuda kernel

* first unit test

* other forward unit tests

* add T5LayerNormGrad

* Add c++ transform and test for T5 LN

* fix and some debug prints

* fix cuda error

* rename from t5 to simplified

* PR comments

* revert change on invertible LM code path

* remove duplicate forward computation

* add GradientCheckerTest.SimplifiedLayerNormGrad

* change back macro

* Fix SimplifiedLayerNorm Gradient

* merge with Sherlock's changes

* changed cuda kernel

* reapply cpu kernel changes
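T5's layer norm (renamed "simplified" in this change) normalizes by the root mean square of the row instead of subtracting the mean and dividing by the standard deviation, and it has no bias. The Python sketch below is a hedged single-row reference assuming the formula y = x / sqrt(mean(x²) + ε) · scale; the function name and the epsilon default are illustrative, not taken from the ORT kernels.

```python
import math

def simplified_layer_norm(x, scale, epsilon=1e-5):
    """Normalize one row by its root mean square; unlike standard LayerNorm
    there is no mean subtraction and no bias term."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + epsilon)
    return [v * inv_rms * s for v, s in zip(x, scale)]

print(simplified_layer_norm([3.0, 4.0], [1.0, 1.0], epsilon=0.0))
```

Dropping the mean subtraction saves one reduction per row, which is also why the gradient differs from the standard LayerNorm gradient.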

Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: aishwarya bhandare <aibhanda@microsoft.com>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-12 11:22:12 -07:00
edgchen1
ed60e0fe39
Fix BUILD.md environment variable name typo. (#5402) 2020-10-12 11:17:09 -07:00
Pranav Sharma
5e48c0fd6c
Register opset13 ops: Dropout, Flatten, LRN, MeanVarianceNormalization, ArgMax, ArgMin, Reshape, Shape, Concat. (#5451) 2020-10-12 10:09:38 -07:00
stevenlix
186f0668b0
update onnx-tensorrt submodule (#5442) 2020-10-09 21:49:40 -07:00
Hariharan Seshadri
b9f90e297e
Support sharing of initializers between session via the Python API (#5407) 2020-10-09 20:26:28 -07:00
Ryan Hill
6132e1f6ae
Shared providers - fix logging plus cleanup (#5406)
* Fix logging, cleanup, and implement the remainder of the not implemented functions from the shared provider interface.
2020-10-09 17:31:03 -07:00
Wei-Sheng Chin
6cba42e942
Avoid inserting other CUDA calls in-between NCCL Send's and Recv's (#5430)
* Avoid inserting other CUDA calls in-between NCCL Send's and Recv's

* Add a comment

* Place CUDA EP on the right device

* Fix a warning

* Address a comment
2020-10-09 15:34:46 -07:00
liqunfu
dbe7e6623b
only use/import pytest if needed (by enable_training) (#5437)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-09 12:42:19 -07:00
Dmitri Smirnov
9642f1448e
Add OpSet 13 Registrations (#5426)
Register Sigmoid for OpSet 13.
Register OpSet 13 for Sum, Min, Max, Mean.
Add Erf OpSet 13 registration.
Register Clip for OpSet 13.
Add Gemm/MatMul OpSet 13 registrations.

Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
2020-10-09 12:39:22 -07:00
Sergii Dymchenko
3a9a1a4ef1
Fix registration for GatherGrad (#5382)
* Fix registration for GatherGrad to fix GatherGradOpTest.GatherGrad_axis0_indices2d_half.

* Fix GatherGrad registration for CUDA also.
2020-10-09 11:57:50 -07:00
liqunfu
1cceefc7d4
use run_orttraining_test_orttrainer_frontend_separately to work aroun… (#5408)
* use run_orttraining_test_orttrainer_frontend_separately to work around a sporadic segfault.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-10-09 09:16:10 -07:00
Scott McKay
a92ccbe1bc
Various armv7 related fixes (#5394)
* - Link with libatomic if needed
 - Install pip differently so it doesn't clash with the system pip which may involve a wrapper script
 - Remove ability to specify offset when Tensor allocates the data. The data prior to offset isn't accessible by anything.
 - Fix use of offset in TensorOpTest to work on armv7 where it must be aligned to the type it points to.
 - Fix ActivationOpNoInfTest.Softsign to allow for armv7 behavior
 - Fix ReductionOpTest.ReduceMean_*keepdims to allow for armv7 floating point inaccuracy

* Address PR comments
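The armv7 bullets above note that an offset into tensor data must be aligned to the type it points to. The actual TensorOpTest change is not shown; the Python sketch below is a hypothetical helper (`align_up`) that rounds a byte offset up to the alignment `ctypes` reports for a type, assuming the base address is itself suitably aligned.

```python
import ctypes

def align_up(offset: int, ctype) -> int:
    """Round a byte offset up to the alignment required by `ctype`, so that
    base + offset is correctly aligned for that type (if base itself is)."""
    align = ctypes.alignment(ctype)
    return (offset + align - 1) // align * align

print(align_up(3, ctypes.c_int32))  # 4 on common ABIs (int32 is 4-byte aligned)
```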
2020-10-09 22:34:32 +10:00
Yufeng Li
b99eaa99cd
Prepacking MatMulInteger (#5403)
* prepack matmulinteger
Prepacking constant matrix B for MatMulInteger to get better performance.
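Prepacking pays off because B is a constant initializer: its layout transform can be done once instead of on every run. The Python sketch below is a hedged toy, not MLAS: the real packed format is a blocked layout tuned for its kernels and is not described in the commit. Here "packing" is just a column-major copy with a scalar B zero point pre-subtracted, following the ONNX `MatMulInteger` definition Y = (A - a_zero_point) · (B - b_zero_point); the function names are hypothetical.

```python
def prepack_b(b, b_zero_point=0):
    """One-time step for a constant B (K x N): subtract B's zero point and store
    each column contiguously so the multiply reads it sequentially."""
    k, n = len(b), len(b[0])
    return [[b[r][c] - b_zero_point for r in range(k)] for c in range(n)]

def matmul_integer(a, packed_b, a_zero_point=0):
    """Y = (A - a_zero_point) @ (B - b_zero_point), accumulated in Python ints
    (standing in for int32)."""
    return [[sum((x - a_zero_point) * w for x, w in zip(row, col)) for col in packed_b]
            for row in a]

packed = prepack_b([[5, 6], [7, 8]])             # done once, e.g. at session initialization
print(matmul_integer([[1, 2], [3, 4]], packed))  # [[19, 22], [43, 50]]
```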
2020-10-09 02:37:19 -07:00
Xavier Dupré
621fdb44e5
Fixes #4688, remove CPUAllocator in TreeEnsemble (#5375) 2020-10-09 11:26:07 +02:00
Keizo Fujiwara
d4507e9331
Use relative path for HEADER_SEARCH_PATHS (#5412)
Currently HEADER_SEARCH_PATHS refers to a personal directory.
2020-10-08 23:06:11 -07:00
Ye Wang
90f976d060
Some improvements on transformers tool (#5383)
* modify TensorFlow benchmark GPU setting

* add export from tf choice in script

* fix typo

* match more embedlayernorm pattern

* format
2020-10-08 19:35:17 -07:00
Tracy Sharpe
fab7f799a7
MLAS: fix ARM64 + VS2017 build break (#5423) 2020-10-08 18:03:45 -07:00
Sergii Dymchenko
8a632a903f
Remove unused imports from Python tests. (#5405) 2020-10-08 17:24:10 -07:00
Tianlei Wu
15696b8fce
bump version to 1.5.2 (#5420) 2020-10-08 16:30:13 -07:00
Suffian Khan
498f94668d
Keep all_finite tensor on CPU when using PyTorch Frontend (#5371) 2020-10-08 15:47:18 -07:00
Pranav Sharma
c2c78399ee
Include config keys header file in the release packages for Linux and Mac. (#5388) 2020-10-08 15:00:29 -07:00
Changming Sun
09aef240d6
Skip running onnx tests in python mac os pipeline (#5416) 2020-10-08 11:49:28 -07:00
Tiago Koji Castro Shibata
83ead3e2eb
Fix com ptr refcount (#5404) 2020-10-08 10:18:38 -07:00
Yufeng Li
b04cf2d229
Update ORT to 1.5.1 in Bert Quantization Notebook (#5396)
* Update ORT to 1.5.1 in Bert Quantization Notebook
2020-10-08 09:55:01 -07:00
manashgoswami
132ab2230d
Updated with image for creating the onnxruntime pkg (#5400)
* Create Mobile.png

* Update ONNX_Runtime_for_Mobile_Platforms.md

* Update ONNX_Runtime_for_Mobile_Platforms.md
2020-10-08 08:54:27 -07:00
Scott McKay
9684e1b5a8
Add doco for pre-requisites to be able to cross compile for Android on Windows with Java bindings enabled. (#5395) 2020-10-08 12:31:46 +10:00
Tianlei Wu
8133223871
clear cudaDelayLoadedLibs since delayload is disabled (#5386) 2020-10-07 11:33:12 -07:00
Tianlei Wu
8ee2b08325
Allow benchmark different threads (#5390) 2020-10-07 11:13:01 -07:00
Tianlei Wu
094384781e
Add --use_external_data_format in convert_to_onnx.py (#5393) 2020-10-07 09:42:02 -07:00
Guoyu Wang
5947445457
Add flatbuffers verifier for ORT format buffer (#5378)
* Add flatbuffers verifier before accessing data in ort format models

* Address review comments
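The commit uses the FlatBuffers library's own verifier. The Python sketch below is only a hypothetical, much-reduced illustration of why that matters: every offset read from an untrusted buffer is bounds- and alignment-checked before it is followed. It assumes the standard FlatBuffers layout (a little-endian uint32 root offset at the start, a signed int32 at the table start that points back to its vtable, and a vtable beginning with two uint16 sizes); `verify_root_table` is a made-up name and checks far less than the real verifier.

```python
import struct

def verify_root_table(buf: bytes) -> bool:
    """Bounds/alignment checks on a FlatBuffers root table before any field is read.
    A real verifier also checks every field, string, and vector."""
    if len(buf) < 4:
        return False
    (root,) = struct.unpack_from("<I", buf, 0)        # uoffset to the root table
    if root % 4 or root + 4 > len(buf):
        return False
    (soffset,) = struct.unpack_from("<i", buf, root)  # table -> vtable (signed)
    vtable = root - soffset
    if vtable < 0 or vtable % 2 or vtable + 4 > len(buf):
        return False
    vt_size, table_size = struct.unpack_from("<HH", buf, vtable)
    return (vt_size >= 4 and vt_size % 2 == 0
            and vtable + vt_size <= len(buf)
            and table_size >= 4 and root + table_size <= len(buf))

# Minimal valid buffer: root offset 8, 4-byte vtable at 4, empty table at 8.
good = struct.pack("<IHHi", 8, 4, 4, 4)
print(verify_root_table(good), verify_root_table(good[:10]))  # True False
```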
2020-10-07 09:23:17 -07:00
Guoyu Wang
deb708d3b1
Move flatbuffers to 1.12 release (#5392) 2020-10-07 09:23:03 -07:00