991 commits

Author SHA1 Message Date
Ke Zhang
3bf0e364e2
Move CopyTensor out of IExecutionProvider interface. (#1268)
* add ortdevice class

* add data transfer manager for copying tensors.

* update

* add data transfer for gpu

* fix constexpr build break.

* update

* remove unnecessary header files.

* remove unnecessary header files.

* add dependency

* add dependency

* add dependency

* add dependency

* fix linux build break.

* update

* fix build break

* fix build break

* fix build break

* update

* update

* update c api.

* update to not use OrtCreateAllocatorInfo

* change to all eps .

* fix linux build break

* remove useless codes.

* update

* move datatransfermanager in session state

* update

* fix cuda build break.

* fix comments

* fix windows GPU build.

* fix comments

* fix build break

* fix comments

* fix test failure

* update

* fix comments

* fix onnx runtime server.

* update

* fix test failure.

* fix comments

* fix comment
2019-07-11 14:49:20 -07:00
jignparm
e580b76305
Fix ARM64 build + Add NuGet pipeline including ARM binaries (#1335)
* Add arm64 nocontribops pipeline

* minor fix

* Added new template for arm build -- disable all tests

* fix build command

* add arm64 flag for msbuild

* add arm leg as upstream dependency

* update platform to arm64 for msbuild

* remove test task from arm build

* remove ESRP signing of C# dlls in arm build

* Updated to work for both --arm and --arm64

* Make the cross compiling cmake flags symmetric

* Add dynamic check for /Wno-error flag, instead of extra build option

* remove extra full-stop
2019-07-11 11:49:17 -07:00
Maik Riechert
bfda9ca1c1 Make sure submodule urls are up-to-date (#1357)
This extends build.py to run git submodule sync --recursive before running git submodule update --init --recursive. This makes sure submodule URLs are up-to-date.
2019-07-10 13:11:59 -07:00
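The ordering described in the commit above can be sketched as follows. This is a minimal illustration, not the actual build.py code: the function names `submodule_commands` and `update_submodules` and the injectable `run` parameter are hypothetical. Only the two git commands and their order come from the commit message.

```python
import subprocess


def submodule_commands():
    """Return the two git commands in the order the commit message describes."""
    return [
        # First re-read .gitmodules so each submodule's remote URL is current.
        ["git", "submodule", "sync", "--recursive"],
        # Then fetch and check out the recorded submodule commits.
        ["git", "submodule", "update", "--init", "--recursive"],
    ]


def update_submodules(source_dir, run=subprocess.check_call):
    """Run the commands inside source_dir; `run` is injectable for testing (hypothetical)."""
    for cmd in submodule_commands():
        run(cmd, cwd=source_dir)
```

Running the sync step first matters when a submodule's URL has changed upstream: without it, the update step would still fetch from the stale URL stored in the local git configuration.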
Changming Sun
20f6c84fd2 Switch to use nvidia-docker2 command format 2019-07-10 13:11:07 -07:00
S. Manohar Karlapalem
a7fcd60572 Add missing 'openvino' option in perftest Usage message (#1367) 2019-07-10 10:58:18 -07:00
Faith Xu
aba7271ad7 Fix links (#1371) 2019-07-10 08:34:31 -07:00
Tracy Sharpe
823fa3f39c
Integrate MLAS NCHWc support into ONNX Runtime (#1327)
This change integrates the NCHWc support recently added to MLAS into ONNX Runtime. When "-o 3" optimizations are used, the runtime runs an NCHWc layout optimization pass that converts standard ONNX operators such as Conv/MaxPool to the com.microsoft.nchwc domain, with weights and biases reordered for speed.
2019-07-09 20:41:19 -07:00
Hector Li
42c18762f3
Update the log message for fallback case. (#1370)
Log a warning if the fallback is caused by a functional limitation.
Log an informational message if the fallback is by design, e.g. nodes between Shape (CPU output) -> CUDA nodes .. -> ReShape (CPU input).
2019-07-09 16:54:40 -07:00
Tracy Sharpe
c483a1e3c6
Use simpler GEMM function for MatMul operator (#1365)
More cleanup of the math files. Instead of using templates to instantiate a full GEMM for the types added for MatMul (integers and double), use a simpler MatMul function that doesn't do any transposing and assumes alpha=1 and beta=0.
2019-07-09 15:07:50 -07:00
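To illustrate the simplification in the commit above: a general GEMM computes C = alpha * op(A) * op(B) + beta * C, where op may transpose its argument. With no transposing, alpha = 1 and beta = 0, this reduces to a plain product. The sketch below is an assumption-laden Python illustration over flat row-major lists, not the function added to ONNX Runtime; the name `matmul` and its signature are hypothetical.

```python
def matmul(a, b, m, k, n):
    """Return C = A * B for flat row-major lists.

    A is m x k, B is k x n. There are no transpose flags and no alpha/beta
    scaling: the result overwrites C (beta = 0) and is unscaled (alpha = 1).
    """
    c = [0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c
```

For example, `matmul([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2)` multiplies two 2x2 matrices and yields `[19, 22, 43, 50]`.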
jignparm
57225cd4ee
Add C++ API test for NuGet package (#1364) 2019-07-09 13:51:51 -07:00
Hector Li
298f30546b
Fix the random UT failure for RNN/GRU cases which have padded sequenc… (#1361)
Fix the random UT failure for RNN/GRU cases which have padded sequences. For example, with max_seq = 2, batch_size = 2 and sequence_lengths = {2, 1}, the output beyond the shorter sequence (length 1) should be initialized to 0.

Root cause:
The cuDNN library doesn't guarantee the values beyond the shorter sequence.
Fix:
Initialize the output Y data to all 0 before calling the cuDNN library.
2019-07-09 13:28:11 -07:00
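The fix described above can be sketched in a few lines. This is only an illustration under stated assumptions: `step_fn` is a hypothetical stand-in for the library call that writes valid time steps, and the nested-list layout [max_seq][batch][hidden] is chosen for readability rather than taken from the actual kernel.

```python
def run_padded(sequence_lengths, max_seq, hidden, step_fn):
    """Fill output Y for a padded batch, leaving padded positions at zero."""
    batch = len(sequence_lengths)
    # Zero-initialize the whole output first, so positions past the end of
    # a shorter sequence hold a defined value instead of leftover memory.
    y = [[[0.0] * hidden for _ in range(batch)] for _ in range(max_seq)]
    for b, length in enumerate(sequence_lengths):
        for t in range(length):
            # Only valid time steps are written by the (stand-in) library call.
            y[t][b] = step_fn(t, b)
    return y
```

With `sequence_lengths = [2, 1]` and `max_seq = 2`, the entry `y[1][1]` lies beyond the shorter sequence and stays all zeros, which is the behavior the unit tests expect.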
Changming Sun
27da857b51 Fix an SAL annotation in onnxruntime_c_api.h 2019-07-09 10:14:58 -07:00
Vinitra Swamy
6b32c77804 Dockerfiles for TensorRT, CUDA, build from source (#922)
* dockerfile updates for BYOC scenario

* updates for 3 different build versions

* updating to remove libopenblas, python3, python3-pip

* Including LICENSE-IMAGE.txt for CUDA/TensorRT dockerfiles

* remove unnecessary cmake files

* fixing comment typo

* optimizing dockerfile.source as per review suggestions (not working currently)

* Optimizing dockerfiles with install_dependencies script

* update dockerfile with --cmake_extra_defines version number

* add &&\ for license copy lines

* updates, adding miniconda to path, reincluded clearing the pycache

* adding maintainer note

* update readme instructions

* update tensorrt versioning in dockerfile
2019-07-09 02:03:55 -07:00
Maik Riechert
3cae067a9b fix non-standard u_int32_t type (#1358) 2019-07-09 00:19:58 -07:00
Scott McKay
ac6a4afb0f
Add validation of shape when re-using a buffer in ExecutionFrame (#1356)
* Check for empty string as dim_param in allocation planner.
* Validate shape is compatible at runtime when re-using Tensor.
2019-07-09 14:59:07 +10:00
Changming Sun
58d6ff3f13 Remove AgentPool setting in CI yaml 2019-07-08 15:40:54 -07:00
Tracy Sharpe
3a588860cc
remove unused math routines (#1354)
This change removes a number of unused math helpers from core/util/math.h. Most operators are already using MLAS or Eigen directly.
2019-07-08 14:05:27 -07:00
Pranav Sharma
e9ce51ead4
Make GetTensorShapeFromTensorShapeProto return TensorShape and not its internal representation. (#1353) 2019-07-08 11:45:55 -07:00
Faith Xu
5b93b02c69 Issue template update (#1339)
* Update to include urgency

* Wording update

* Wording update
2019-07-07 23:38:52 -07:00
Faith Xu
b7ae0d5694 Fix link (#1351)
* Fix link

* Update PyOp.md
2019-07-07 21:56:18 -07:00
R. G. Esteves
93528d9b3c Reduce memory footprint of nGraph (#1296)
* Fix unnecessary memory allocation in MKLDNN 1x1 convolution.

* remove the patch header.
2019-07-07 20:23:19 -07:00
NonStatic
9f9ff19bdc Copy shared library after build ORT Server (#1347) 2019-07-07 20:21:16 -07:00
Hariharan Seshadri
2714576d0a Update ONNX Runtime server doc to reference Jupyter notebook (#1340) 2019-07-05 14:30:17 -07:00
Colin Versteeg
a8ff209ab6 Refactor Onnx runtime Server to only use public APIs (#1271)
* replace log sinks

* limit headers to include dir

* first changes to do dynamic linking

* wip for using cxx api

* remove weird dangling dependency

* building with tests failing

* finish updating converters

* fix const

* initial introduction of typedef

* change logging to use spdlog

* get tests passing

* clang format

* map logging levels better

* clean up unused imports

* trent cr comments

* clang-format

* code review comments

* changing buffer use to reserve

* Dynamically link

* revert tvm

* update binary uploading

* catch exceptions by const-ref

* Revert "revert tvm"

This reverts commit 387676dd1018134d15eb71fa126f7caf94380800.

* fix typo

* update versioning of lib
2019-07-04 01:08:14 -07:00
Scott McKay
e3919d3fce
Cleanup naming of test input to use .onnx for models. (#1337)
* Cleanup naming of test input to use .onnx for models.

* Remove file deleted on master
2019-07-04 13:10:29 +10:00
KeDengMS
0d204f3f06
Implementation of TVM codegen library (#888)
Description:

This change adds the common part of the TVM-based codegen library. It includes the following parts:
* Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI
* Compiler pass for traversing the ONNX graph and generating TVM ops
* Compiler pass for traversing the generated graph and specifying the TVM schedule
* Compiler pass for handling weight layout
* Utils for debugging

Motivation and Context:

TVM is an open deep learning compiler stack for cpu, gpu and specialized accelerators. To leverage it in ONNX, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models.

This codegen library was part of Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM in more devices.
2019-07-03 10:32:59 -07:00
Scott McKay
9d3b6b3a49
Disallow overriding initializers if IR version < 4 (#1324)
Description:

Disallow overriding an initializer via a graph input if the IR version is < 4. This enforces an implicit assumption that initializers should be treated as constant, and allows constant folding to be done on a model with an older IR version.
Separate constant and overridable initializers so that it's clear which ones constant folding can utilize.
Update Graph to not add all initializers to the graph inputs when the graph is manually created (i.e. not loaded from a GraphProto) and the IR version is >= 4.
Motivation and Context:
In order to do constant folding we need to know which initializers can be treated as constant and which are overridable. All initializers were required to have a matching graph input prior to IR version 4, technically making all of them overridable. The intention however was for them to be treated as constants, and this change enforces that intent.

The benefit of doing so is that constant folding will work for models with IR version < 4. The cost is that if someone is actually overriding an initializer they will need to update the IR version of their model to version 4 in order to keep doing so. The belief is that this is a very small subset of usage (e.g. models involving feeding in a truncated sequence) and the cost to update that small subset is warranted by the benefit of constant folding being able to be enabled on all older models without them needing an IR version update.
2019-07-03 18:43:38 +10:00
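The rule in the commit above can be summarized with a small sketch. It assumes, as the description states, that an initializer is overridable only when it has a matching graph input and the IR version is at least 4. The function name `split_initializers` and its list-based interface are hypothetical and do not reflect the Graph class API.

```python
def split_initializers(ir_version, initializer_names, graph_input_names):
    """Split initializers into (constant, overridable) name lists."""
    inputs = set(graph_input_names)
    constant, overridable = [], []
    for name in initializer_names:
        if ir_version >= 4 and name in inputs:
            # A matching graph input lets the caller feed a replacement value.
            overridable.append(name)
        else:
            # Before IR version 4 every initializer is treated as constant,
            # even though it was required to appear among the graph inputs.
            constant.append(name)
    return constant, overridable
```

Constant folding would then only consume names from the first list, which is why models with IR version < 4 become fully foldable under this rule.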
Hector Li
2a6c69de2b
Implement the Concat CUDA kernel (#1333)
* Improve CUDA kernel performance for Concat. Implement the kernel code instead of using cudaMemcpy in a loop.

* Update the index lookup part for Concat & Split
2019-07-02 23:08:59 -07:00
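One way to picture a single-kernel Concat with an index lookup, as opposed to one copy per input, is a per-output-element gather. The sketch below is a sequential Python illustration in which each loop iteration plays the role of one GPU thread. It is an assumption about the general technique, not the kernel from the commit; the name `concat` and its parameters are hypothetical. Each input is viewed as [outer, axis_dim, inner] in flat row-major order.

```python
from bisect import bisect_right


def concat(inputs, axis_dims, outer, inner):
    """Concatenate flat row-major inputs along the middle (axis) dimension."""
    # Prefix sums of the axis sizes: offsets[i] is where input i starts.
    offsets = [0]
    for d in axis_dims:
        offsets.append(offsets[-1] + d)
    total = offsets[-1]
    out = [None] * (outer * total * inner)
    for idx in range(len(out)):  # one "thread" per output element
        o, rem = divmod(idx, total * inner)
        axis_pos, i_in = divmod(rem, inner)
        # Look up which input owns this position along the concat axis.
        src = bisect_right(offsets, axis_pos) - 1
        local = axis_pos - offsets[src]
        out[idx] = inputs[src][(o * axis_dims[src] + local) * inner + i_in]
    return out
```

Concatenating a 2x2 matrix `[1, 2, 3, 4]` with a 2x1 matrix `[5, 6]` along axis 1 (`outer=2`, `inner=1`) gives `[1, 2, 5, 3, 4, 6]`.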
Faith Xu
5e54bbffec PyOp documentation Revisions (#1318)
* Revisions

* Minor fix
2019-07-02 18:00:51 -07:00
Ryan Hill
1bf80e30fa
Ryanunderhill/MNIST sample (#1330) 2019-07-02 14:41:27 -07:00
RandySheriffH
bf6a9f9c27
Rashuai/py op example (#1325)
* add scikit example

* format text

* format doc
2019-07-02 09:53:49 -07:00
daquexian
c65489a47f Initial PR for NNAPI execution provider (#1220)
* init

* Update DNNLibrary

* Update DNNLibrary, set compiler flags, it compiles now

* Add more missing flags, add test

* Update DNNLibrary

* Update Compile method, fix allocator and some other bugs

* Update DNNLibrary

* Implement CopyTensor

* Not delete state explicitly since it is managed by unique_ptr

* Add the missing files when SingleUnitTestProjct is ON

* misc changes

* Fix wrong name in provider factory

* Add my own test

* Update the code of add node into graph, and add the missing initializer into graph

* Fix the bug that re-build the graph produces extra output

* Update DNNLibrary

* Transpose nchw (ONNX) -> nhwc (NNAPI)

* Add license

* Add GetSupportedNodes method (implement it later)

* Rename onnxruntime_nnapi_test->onnxruntime_nnapi_squeezenet_test

* Update squeezenet_test.cpp after rebase master

* Remove squeezenet_test.cpp since it is almost same with the c++ sample

* Update DNNLibrary for GetSupportedNodes

* Update GetSupportedNodes

* Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample"

This reverts commit a97575fd9ff49e50ba1dc8d8154790d8cd86c48d.

* Update DNNLibrary

* Fix multiple outputs bug

* Remove GetKernelRegistry

* Revert "Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample""

This reverts commit 2a0670e9cbf10ea654111ce39e198a4be0ddd838.

* Set default memory type of NNAPI EP

* Add CPUOutput allocator

* Update DNNLibrary for multiple outputs

* Fix bug of nhwc->nchw

* Remove GetExecutionHandle()
2019-07-02 06:03:29 -07:00
xkszltl
98ea675e40 Fix typo: op[s]iops -> op[t]ions. (#1329)
Resolve https://github.com/microsoft/onnxruntime/issues/1322
2019-07-01 21:25:38 -07:00
Faith Xu
f67a1629fc Documentation reorganization (#1143)
* Update Versioning.md

* Update Versioning.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update BUILD.md

* Update HighLevelDesign.md

* Update Versioning.md

* Update README.md

* Update tool compat table

* typo

* Updates based on feedback

* Update template to include model

* Updates based on feedback

* Typos
2019-07-01 17:11:50 -07:00
S. Manohar Karlapalem
5af2aec3a2 [OpenVINO-EP] use logging infrastructure to display EP log messages (#1289)
* Add logging messages

Implements logging messages at INFO, WARNING and ERROR levels.
Utilizes ONNX-RT's logging infrastructure.
Reversed IsGraphSupported check logic to facilitate logging.

* Add MO exception text to WARNING log message
2019-07-02 08:28:20 +10:00
Changming Sun
28759e2f6f Uninstall the preinstalled cmake in tensorrt image because it's too old (#1316) 2019-07-01 15:08:01 -07:00
Hariharan Seshadri
a077ac8df5
Support non-default negative axis value and intuitive data type combination for OneHot op (#1317)
* Handle nondefault negative axis value

* Support more intuitive data types for this op
2019-07-01 14:29:33 -07:00
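For context on the negative-axis handling mentioned above: in the ONNX OneHot operator the output has one more dimension than the indices tensor, and a negative axis counts from the back of the output shape. A minimal sketch of that normalization follows; the helper name `one_hot_output_shape` is hypothetical and this is not the kernel code from the commit.

```python
def one_hot_output_shape(indices_shape, depth, axis=-1):
    """Return the OneHot output shape after normalizing a negative axis."""
    shape = list(indices_shape)
    out_rank = len(shape) + 1  # the one-hot dimension is inserted
    if not -out_rank <= axis < out_rank:
        raise ValueError("axis out of range for output rank %d" % out_rank)
    if axis < 0:
        axis += out_rank  # e.g. -1 means the last output dimension
    return shape[:axis] + [depth] + shape[axis:]
```

For indices of shape [2, 3] and depth 4, the default axis -1 gives [2, 3, 4], while the non-default axis -2 gives [2, 4, 3].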
Ashwini Khade
2698edbc98
enable tests (#1310) 2019-06-28 12:04:14 -07:00
Dmitri Smirnov
2f698bd54b
Fix NMS const_cast that modified kernel state creating (#1303)
* Fix NMS const_cast that modified kernel state, creating a thread safety issue. Refactor for future CUDA implementation.
2019-06-28 09:41:17 -07:00
Matthieu Darbois
04d581995d Use manylinux2010 image to build linux python wheels (#1282)
* Update cuda for python wheels

* Update cuda for python wheels

* Update cuda for python wheels

* Update azure-pipelines-py-packaging.yml

* Update to cuda 10

* Only test win gpu

* Update cuda for python wheels

* Use manylinux2010 image to build linux python wheels

Allow wheels built to truly be compliant with a manylinux policy
2019-06-27 15:45:06 -07:00
Scott McKay
0951f53c80 Update ONNX to d94f99d21a9a0820d58966410ceaf525132f85f1 to pick up change to checker that makes ssd_mobilenet model load 20x faster by avoiding unnecessary copies. (#1307) 2019-06-27 08:39:41 -07:00
jignparm
59de37af1f
Add CUDA Expand operator (#1292)
* Add CUDA expand operator

* Reset counter variables when striding

* Reset counter variables when striding

* use fast_divmod and other PR comments

* Fix merge variable rename

* Fix indentation per PR comment

* Remove maxpool_argmax

* Reduce number of type templates for Expand operator

* removed all types

* Commit updated cuda_execution_provider.cc
2019-06-27 02:31:56 -07:00
ybrnathan
a79ab5ec5b
Add document for ONNX Runtime latency profiling and JSON file viewing. (#1301) 2019-06-26 21:58:10 -07:00
Pranav Sharma
b8d370029f
Check that specific inputs are constants in Conv(Add|Mul|BN)Fusion rules (#1270)
* Check for non-existent initializers while fusing conv and add.

* Fix other places where initializer can be null

* Add check if initializer is an input

* update the models to comply with the new ONNX spec.

In new ONNX spec, the initializers should not be in inputs.

* Fix previous temporary code

* Add negative test

* Revert changes to conv_bn_fusion and conv_mul_fusion

* making helper IsNodeArgConstant a little more general; updating remaining Conv*Fusion rules

* minor comment

* AllNodeInputsAreConstant to use new function
2019-06-26 15:42:05 -07:00
jignparm
089b1ef3bb Enable max/average pooling onnx_test_runner tests (#1129)
* Enable max/average pooling tests

* minor edit

* comment out maxpool_with_argmax
2019-06-26 13:45:35 -07:00
Ashwini Khade
05a222a961 enable quantization tests (#1293) 2019-06-26 10:08:19 -07:00
Tracy Sharpe
3ebad81abc
MLAS: NCHWc low-level changes (#1283)
Implementation of the MLAS changes for NCHWc convolution/pooling support. These changes adopt the blocking format used by MKL-DNN and other convolution libraries for better performance.
2019-06-25 16:57:30 -07:00
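As background for the blocking format mentioned above: an NCHWc layout splits the channel dimension into fixed-size blocks and moves the within-block channel index to the innermost position, giving a shape of [N, ceil(C / block), H, W, block]. The sketch below reorders a flat NCHW buffer into that layout. It is an illustration under assumptions: the name `nchw_to_nchwc` is hypothetical, the block size is a free parameter here, and zero-padding of a partial last block is an assumption rather than something the commit message states.

```python
def nchw_to_nchwc(data, n, c, h, w, block):
    """Reorder a flat NCHW buffer into [N, C/block, H, W, block] order."""
    c_blocks = (c + block - 1) // block  # round up so partial blocks fit
    out = [0] * (n * c_blocks * h * w * block)  # padding channels stay 0
    for ni in range(n):
        for ci in range(c):
            cb, c_in = divmod(ci, block)  # block index, index within block
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = (((ni * c_blocks + cb) * h + hi) * w + wi) * block + c_in
                    out[dst] = data[src]
    return out
```

Keeping a block of channels contiguous for each spatial position lets a convolution inner loop load one vector of channels at a time, which is the usual motivation for this layout.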
Scott McKay
a462328d9d
Handle case where the Loop 'M' and 'cond' inputs can be considered scalars but the rank doesn't match the subgraph. Use the subgraph rank when creating the MLValue instance for the subgraph input. (#1285) 2019-06-26 09:33:11 +10:00
RandySheriffH
c0cf2213bc
Parallelize TreeEnsembleClassifier batch prediction (#1276)
* use openmp for loop

* Fix windows compile err

* fix windows com err
2019-06-25 12:13:05 -07:00
jignparm
a56b294428
Activate compliance tasks for private builds, and also set a daily scheduler (#1280)
* add compliance and build schedules

* cpu-esrp-pipeline.yml

* update schedule time for testing

* add schedules to all pipelines
2019-06-25 11:13:55 -07:00