Commit graph

692 commits

Author SHA1 Message Date
Pranav Sharma
3ff97de8da
Modify roialign to conform with the new onnx spec and take it out from contrib ops. (#901)
* support non-tensor types

* support non-tensor types.

* support non-tensor types.

* fix compilation issues

* fix compilation issues

* Build without mkldnn for release packages. We'll default to MLAS.

* Modify roialign to conform with the new onnx spec and take it out from contrib ops.
2019-04-24 20:30:44 -07:00
Yufeng Li
73bc09421c Fix deadlock in parallel executor (#891)
Fix deadlock in parallel executor
  Execute immediately if ParallelFor has only 1 task
2019-04-24 15:55:04 -07:00
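The single-task shortcut described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not the onnxruntime C++ code: `parallel_for` and its parameters are hypothetical names, and the standard-library `ThreadPoolExecutor` stands in for the real thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(pool, num_tasks, fn):
    """Run fn(i) for i in range(num_tasks) and return the results in order.

    Hypothetical helper: when there is only one task, fn runs inline on the
    calling thread, so the caller never blocks waiting for a pool worker.
    """
    if num_tasks == 1:
        return [fn(0)]
    futures = [pool.submit(fn, i) for i in range(num_tasks)]
    return [f.result() for f in futures]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(parallel_for(pool, 1, lambda i: i + 10))
        print(parallel_for(pool, 3, lambda i: i * 2))
```

Running the single task inline matters most when the caller is itself a pool worker: queueing the task and then waiting for it could leave every worker blocked on work that nobody is free to pick up.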
nivas-x86
ba3b82648e ng ep update1 (#895) 2019-04-24 10:35:26 -07:00
Dmitri Smirnov
95ac7a2f35
Implement separators as regex (#857)
* Implement separators as regex
2019-04-24 10:23:45 -07:00
Changming Sun
1f066d4dc4 Update onnx (#893) 2019-04-24 21:31:49 +10:00
Hariharan Seshadri
9d89b23d81
BatchNorm CPU does not support non-spatial cases - explicitly handle such cases (#890)
* BatchNorm CPU does not support non-spatial cases

* skip test in c#

* Update comments
2019-04-23 21:37:21 -07:00
Yufeng Li
d0f846aad5
Tuning GRU performance for batch size >= 2 (#644)
GRU with batch size > 1 was implemented on the assumption that Lotus uses single-threaded Eigen Gemm. That assumption no longer holds: MLAS and MKLML support multi-threading, and we no longer rely on Eigen Gemm.
This PR implements batch size > 1 the same way as batch size == 1. With this change, we see about a 2x performance gain for GRU. Please refer to the performance test below:
(ms)
Batch_size | Seq_length | input_size | hidden_size | Old | New
8 | 30 | 512 | 512 | 19.16 | 10.47
16 | 30 | 512 | 512 | 28.13 | 15.15
32 | 30 | 512 | 512 | 36.97 | 26.89
8 | 30 | 1024 | 1024 | 142.853 | 55.67
16 | 30 | 1024 | 1024 | 184.397 | 72.32
32 | 30 | 1024 | 1024 | 236.364 | 112.78
2019-04-23 14:50:11 -07:00
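To see how the "about 2x" figure relates to the individual rows, the speedup per configuration can be computed directly from the table in the commit message above. This is a small worked-arithmetic sketch; the timings are copied from that table (the last row assumes the value 236.364 is the Old column), and the helper name `speedups` is only illustrative.

```python
# (batch_size, seq_length, input_size, hidden_size, old_ms, new_ms)
# Values copied from the GRU performance table in the commit message.
ROWS = [
    (8, 30, 512, 512, 19.16, 10.47),
    (16, 30, 512, 512, 28.13, 15.15),
    (32, 30, 512, 512, 36.97, 26.89),
    (8, 30, 1024, 1024, 142.853, 55.67),
    (16, 30, 1024, 1024, 184.397, 72.32),
    (32, 30, 1024, 1024, 236.364, 112.78),  # assumes 236.364 is "Old"
]


def speedups(rows):
    """Return old_ms / new_ms for every row, in table order."""
    return [old / new for *_, old, new in rows]


if __name__ == "__main__":
    for row, ratio in zip(ROWS, speedups(ROWS)):
        batch, _, _, hidden = row[:4]
        print(f"batch={batch:2d} hidden={hidden:4d} speedup={ratio:.2f}x")
```

The ratios fall between roughly 1.4x and 2.6x, with the larger hidden size benefiting the most.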
Changming Sun
80d69515ed
C API: catch exceptions in OrtCreateStatus (#821) 2019-04-23 14:41:44 -07:00
Changming Sun
11806529d0
Update test data (#864)
Add:

1. mxnet_arcface
2. tf_mobilenet_v1_1.0_224
3. tf_mobilenet_v2_1.0_224
4. tf_mobilenet_v2_1.4_224
5. tf_inception_v2
2019-04-23 13:24:24 -07:00
Hariharan Seshadri
4b4b585f58
Fix minor bugs in RemoveDuplicateGraphTransformer (#883) 2019-04-23 11:30:20 -07:00
Ashwini Khade
fb3b63438d
Add python api for graph optimization level (#882) 2019-04-23 11:11:35 -07:00
ybrnathan
b0a37477db Fix memory corruption issue when CPU->CUDA memcpy is involved (#879) 2019-04-22 20:21:14 -07:00
Dmitri Smirnov
7d7627b1ac
Implement IsInf (#871)
* Implement IsInf.
2019-04-22 17:45:54 -07:00
Yufeng Li
0bf12e9dbf
Add option to enable/disable memory pattern back (#872)
Memory pattern doesn't work for the parallel executor by design. Enabling memory pattern for the parallel executor logs a warning and hurts performance.
Add option to enable/disable memory pattern back.
2019-04-22 13:49:41 -07:00
Hector Li
e8d722003a
Move NMS to Onnx domain (#865)
* move files

* move files

* Remove NonMaxSuppression from Contrib op, move it to Onnx domain, opset 10

* move NMS out of namespace contrib

* update data type in UT

* update to latest onnx

* white list the node test for Mod which is not implemented yet
2019-04-22 13:24:27 -07:00
Lei Zhang
2947e1f9d4 Fix onehot code arm build break. 2019-04-22 11:06:20 -07:00
Changming Sun
2879ee8bd2 Fix a few warnings (#762)
* Fix warning in tensor_type_and_shape.cc

tensor_type_and_shape.cc:139:18: error: ‘out’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

* fix warnings
2019-04-22 09:45:02 +10:00
Tracy Sharpe
cb69c65756
Update MLAS to be able to build standalone again (#874)
Change MLAS to be able to build standalone without onnxruntime header dependencies. This is enabled when building with MLAS_NO_ONNXRUNTIME_THREADPOOL defined.
mlas.h had been changed to include the ThreadPool header, but this header now just has a forward reference for the class. The header was also doing a "using onnxruntime::concurrency"; that has been removed and the external mlas.h users fixed up as needed.
As before, if ThreadPool==nullptr, then MLAS uses OpenMP or falls back to a single threaded implementation. The build option to use the Win32 system thread pool has been removed as onnxruntime can't hit that path and I don't use that option for standalone tests anymore.
2019-04-21 14:04:15 -07:00
nivas-x86
a4d7052aeb Add nGraph Execution Provider (#832)
* Add nGraph Execution Provider

* feedback changes 1

* feedback2

* Feedback and upgrade nGraph

* Feedback 4

* Fix CI

* Disable new ops
2019-04-20 17:02:35 -07:00
Changming Sun
7e1edbb9a2 Fix a build error in onnxruntime/core/common/threadpool.cc 2019-04-19 15:59:15 -07:00
Changming Sun
d78c340eac update onnx (#861)
* update onnx

* ignore some tests
2019-04-19 10:52:47 -07:00
jignparm
b2268a6378
removing specific target framework for c-api test (#860) 2019-04-18 23:58:18 -07:00
Pranav Sharma
07a4ecbddb
Disable tests for certain models (Cherry pick from 0.3.1) (#842)
* Disable tests for certain models (Cherry pick from 0.3.1)

* Disable more tests

* More tests

* even more tests

* Fix gpu builds

* Disable L2 transformers

* Env variable to disable contrib ops for csharp tests
2019-04-18 23:57:52 -07:00
Pranav Sharma
780aad8fd0 Eliminate unused code and data from Linux binaries. (#849) 2019-04-18 23:00:27 -07:00
Konstantinos Karanasos
f09a76d669 Don't trigger constant folding in subgraphs (#858)
* don't trigger constant folding in subgraphs
2019-04-18 22:59:19 -07:00
Changming Sun
687bac455d Convert eigen to a submodule and update it to the latest version 2019-04-18 21:24:56 -07:00
Konstantinos Karanasos
ada90086f7
More efficient rule-based transformer (#815)
Introduce a quick pre-filtering of rules based on the node op types they are targeting.
The goal is to avoid evaluating all rules for all nodes. Instead, for each node, we will only be evaluating the rules associated with its op type.
2019-04-18 17:10:13 -07:00
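The pre-filtering idea in the commit above can be sketched as an index from op type to the rules that target it. This is a minimal Python illustration under assumed names (`RuleBasedTransformer`, `register`, `apply`, and the dict-based node representation are all hypothetical), not the onnxruntime implementation.

```python
from collections import defaultdict


class RuleBasedTransformer:
    """Hypothetical transformer that indexes rewrite rules by target op type."""

    def __init__(self):
        self._rules_by_op = defaultdict(list)

    def register(self, rule, target_op_types):
        # A rule is stored once per op type it targets.
        for op_type in target_op_types:
            self._rules_by_op[op_type].append(rule)

    def apply(self, nodes):
        """Apply matching rules to each node; return how many rules ran."""
        evaluated = 0
        for node in nodes:
            # Only the rules registered for this node's op type are evaluated.
            for rule in self._rules_by_op.get(node["op_type"], []):
                rule(node)
                evaluated += 1
        return evaluated


if __name__ == "__main__":
    transformer = RuleBasedTransformer()
    transformer.register(lambda n: n.setdefault("seen", []).append("r1"), ["Identity"])
    transformer.register(lambda n: n.setdefault("seen", []).append("r2"), ["Identity", "Slice"])
    graph = [{"op_type": "Identity"}, {"op_type": "Conv"}, {"op_type": "Slice"}]
    print(transformer.apply(graph))
```

Without the index, every rule would be evaluated against every node (six evaluations in the usage example); with it, the `Conv` node costs a single dictionary lookup and no rule evaluations.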
Bowen Bao
ed0c86cd90 update onnx to fix matmul shape inference (#847)
* update onnx to fix matmul shape inference

* update onnx submodule hash in cgmanifest.json and ci scripts
2019-04-18 14:52:48 -07:00
stevenlix
f2694ab526
Enable provider unit tests for TensorRT (#802)
* Update provider_test_utils.cc

* Update tensorrt_execution_provider.h

* Update tensorrt_execution_provider.cc

* Update gemm_test.cc

* Update softmax_test.cc

* Update logsoftmax_test.cc

* Update matmul_test.cc

* Update batch_norm_op_test.cc

* Update conv_op_test.cc

* Update batch_norm_op_test.cc

* Update softmax_test.cc

* Update conv_transpose_op_test.cc

* Update instance_norm_op_test.cc

* Update flatten_op_test.cc

* Update loop_test.cc

* Disable failed tests for TensorRT

* Disable unsupported tests for TensorRT

* Disable unsupported tests for TensorRT

* Disable unsupported tests for TensorRT

* Disable unsupported tests for TensorRT

* Update matmul_test.cc

* Update logsoftmax_test.cc

* Update topk_op_test.cc

* disable unsupported tests for TensorRT

* resolve conflicts

* Update identity_op_test.cc

* Update activation_op_test.cc

* make max batch size configurable and simplify the code for disabling unsupported tests

* make max batch size configurable at runtime

* update tensorrt ci pipeline

* move max batch size to private

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.h

* Update tensorrt_execution_provider.cc

* add comments on the test changes

* Update tensorrt_execution_provider.h

* Update tensorrt_execution_provider.cc

* Update build.py
2019-04-18 13:20:37 -07:00
shschaefer
ff253631b5
Enable use of session based threadpool. (#854)
* Enable use of session based threadpool.

* Fix build dir issue
2019-04-18 10:20:46 -07:00
daquexian
ac82c1f483 enable android build (#715)
* enable android build

* Add 'log' to onnxruntime_EXTERNAL_LIBRARIES

* Remove cmake about header_files_test.cc

* Add Android CI pipeline

* Remove some ms-specific(?) ci

* Fix bash error

* Add execute flag for install_deps_android.sh

* Add install_ubuntu_for_android.sh

* Remove python in deps for android

* Add comment for BUILD_ARCH

* Set BUILD_SERVICE to cpu

* Set BUILD_OS in run_build.sh

* Fix -o bug in run_build.sh

* Android -> android

* Correct the android ndk location

* Checkout submodules in my own azure pipelines

* Revert "Remove some ms-specific(?) ci"

This reverts commit 302463213480487d8944c3127a3b311c591d55c0.

* Revert "Checkout submodules in my own azure pipelines"

This reverts commit 1acfb6755f933e532b8312ca35bb4900a833903f.
2019-04-18 09:59:04 +08:00
Ke Zhang
951c428ee1
Simplify the validation in Run call (#850)
* Simplify Run()

* remove the lock

* remove a file added wrongly.

* fix tests

* fix c# test
2019-04-18 08:38:17 +08:00
Changming Sun
38a0c0b0a7 Fix a bug in perf test runner 2019-04-17 16:43:39 -07:00
Raymond Yang
2a2de42bb2
Add docker image clean script (#844)
* Add docker image clean script

* Change the command not to generate a warning if no such image is present

* Update linux-gpu-ci-pipeline.yml

* Update linux-ci-pipeline.yml

* Update azure-pipelines-py-packaging.yml
2019-04-17 11:20:41 -07:00
Hector Li
f1af493b75
Fix some issue in CUDA GRU and ReduceSum (#845)
* Fix issues in GRU GPU implementation. cudnnGetRNNWorkspaceSize could fail because some descriptors were defined as local variables and had already been destroyed.

* Fix the issue for ReduceSum. cudnnReduceTensor has an issue for ReduceSum when the input and output have the same size; in that case we just need to copy the data.
2019-04-17 09:48:08 -07:00
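The ReduceSum shortcut mentioned in the commit above can be illustrated in plain Python. This sketch only shows the control flow (copy when the reduction would not change the element count); the actual fix lives in the CUDA provider, and the function name `reduce_sum_keepdims` and the list-of-lists tensor are assumptions made for the example.

```python
def reduce_sum_keepdims(matrix, axis):
    """Sum a 2-D list-of-lists along `axis`, keeping the reduced dimension.

    Hypothetical sketch: when the reduced dimension has extent 1, the output
    has the same size as the input, so the data is simply copied.
    """
    rows, cols = len(matrix), len(matrix[0])
    reduced_extent = rows if axis == 0 else cols
    if reduced_extent == 1:
        # Same-size case: nothing to accumulate, return a copy of the input.
        return [row[:] for row in matrix]
    if axis == 0:
        return [[sum(matrix[r][c] for r in range(rows)) for c in range(cols)]]
    return [[sum(row)] for row in matrix]


if __name__ == "__main__":
    print(reduce_sum_keepdims([[1, 2, 3]], axis=0))       # copy path
    print(reduce_sum_keepdims([[1, 2], [3, 4]], axis=0))  # real reduction
```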
Rui Xia
9fb7e98c0b fix the allocator type in lru of cuda conv algorithm cache. (#848) 2019-04-16 23:58:58 -07:00
Ke Zhang
41dc3130f5
no need putting initializers (for constant node) into graph inputs. (#665)
* constant node should not be put into graph inputs any more.

* simplify graph input/output set logic.

* refactor comments.

* remove adding initializers as graph inputs when creating graph from scratch.
2019-04-17 07:38:08 +08:00
RandySheriffH
60d71d63b5
Rashuai/onnx test reduce mem (#790)
* define new test load function

* remove bak file

* add stat operator

* add arguments

* fix comments

* try enable fp16_tiny_yolov2 on linux

* fix compile err

* try enable fp16_tiny_yolov2
2019-04-16 15:47:52 -07:00
Tracy Sharpe
3a8b9a4918
fix trivial size_t warnings (#843) 2019-04-16 14:37:50 -07:00
Ashwini Khade
14d63b5f45
generate transformers bug fix (#838)
* fix graph transformer generation

* add more tests

* cosmetic changes

* more changes per review
2019-04-16 14:10:33 -07:00
Du Li
1818835795
Adding kernels for Resize op (#809)
* Adding the kernel for Resize op.

* Fixing a bug in nearest neighbour.

* remove gpu resize kernel.
will add it in another pr.

* fix a bug.

* Accommodating PR comments.
2019-04-16 13:05:40 -07:00
Pranav Sharma
29ad798c56
Update license - came up during IP scan (#841)
* support non-tensor types

* support non-tensor types.

* support non-tensor types.

* fix compilation issues

* fix compilation issues

* Build without mkldnn for release packages. We'll default to MLAS.

* Update license - came up during IP scan
2019-04-16 11:01:02 -07:00
Ashwini Khade
07e6dfa7ab
update onnx and enable tests for qlinearconv (#840) 2019-04-16 09:43:17 -07:00
jignparm
7775551a6f
Refactor C# and native packaging tests (#825)
* Refactor C# and native packaging tests

* Pass package name into docker

* add libiomp5ml.dll required by mklml.dll
2019-04-16 00:00:07 -07:00
Pranav Sharma
54e04cb8bb
cherry pick PR from 0.3.1 release - enable MSVC static runtime (#837) 2019-04-15 22:37:47 -07:00
Mika Fischer
b2658b3594 Cache CUDNN convolution benchmark results in cuda::Conv kernels (#712)
* Cache CUDNN convolution benchmark results in cuda::Conv kernels

Previously, the best convolution algorithm was determined by running
cudnnFindConvolutionForwardAlgorithmEx and cudnnFindConvolutionBackwardDataAlgorithmEx
on every shape change.

This is very detrimental for variable input shapes, such as variable batch
sizes.

This change adds a map to cache previously determined benchmark results.

The caching results in significant speedups for variable input shapes.

* Use LRU to limit cached benchmark results

* Only cache benchmark results for a fixed weight shape

In case the weight shape changes, all cached results are discarded.

* Use padded shape as key for cached benchmarks

* Add constant for max number of cached benchmark results

* Use unordered_map to store cached benchmark results

* Only store the parameters that are actually needed
2019-04-15 22:15:14 -07:00
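The caching scheme described in the commit above (an LRU-limited map keyed by padded input shape, discarded when the weight shape changes) might look roughly like this. It is a Python sketch under stated assumptions: `ConvAlgoCache`, `MAX_CACHED_RESULTS`, and the `benchmark` callback are hypothetical stand-ins for the C++ kernel members and the cuDNN find-algorithm calls, and the limit of 4 is chosen only for the example.

```python
from collections import OrderedDict

MAX_CACHED_RESULTS = 4  # hypothetical limit, not the value used in the kernel


class ConvAlgoCache:
    """Hypothetical LRU cache of benchmark results for one conv kernel."""

    def __init__(self, max_entries=MAX_CACHED_RESULTS):
        self._max_entries = max_entries
        self._weight_shape = None
        self._results = OrderedDict()  # padded input shape -> benchmark result
        self.benchmark_runs = 0

    def get(self, weight_shape, padded_input_shape, benchmark):
        weight_shape = tuple(weight_shape)
        if weight_shape != self._weight_shape:
            # Results are only valid for a fixed weight shape.
            self._results.clear()
            self._weight_shape = weight_shape
        key = tuple(padded_input_shape)
        if key in self._results:
            self._results.move_to_end(key)  # mark as most recently used
            return self._results[key]
        self.benchmark_runs += 1
        result = benchmark(key)  # the expensive search happens only on a miss
        self._results[key] = result
        if len(self._results) > self._max_entries:
            self._results.popitem(last=False)  # evict least recently used
        return result


if __name__ == "__main__":
    cache = ConvAlgoCache()
    pick = lambda shape: f"algo_for_{shape[0]}"
    for batch in (1, 8, 1, 8, 1):
        cache.get((64, 3, 3, 3), (batch, 3, 224, 224), pick)
    print(cache.benchmark_runs)
```

In the usage example the batch size alternates between two values, so the benchmark callback runs only for the first occurrence of each shape.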
Tracy Sharpe
f19d9a4907
Reduce code size of kernel registration (#833)
Some changes that reduce the size of the release onnxruntime.dll by 170KB:

Change the ONNX_OPERATOR_KERNEL macros to not create a unique virtual class per kernel create lambda, but instead use a generic class with the raw function address supplied at BuildCreateKernelInfo time.

Changed the execution providers to use a table driven approach to calling the BuildCreateKernelInfo functions instead of a massive function with construct/call/delete sequences.

The CreateFunc in data_types.h didn't need to be a std::function, eliminating more lambda virtual classes.

N.B. To accommodate MSVC 14.11 toolchain (used for CUDA builds), the operator+() syntax cannot be used to retrieve the raw function address. The older toolchain can't resolve between cdecl/vectorcall and gives up. An explicit cast is needed to help the compiler along.
2019-04-15 16:39:59 -07:00
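The table-driven registration mentioned in the commit above can be sketched as a loop over a table of builder functions, instead of one long function that handles each kernel in turn. This Python analogy uses hypothetical names throughout (`build_add_kernel_info`, `KERNEL_BUILDERS`, `register_kernels`); the real code is C++ and operates on BuildCreateKernelInfo function pointers.

```python
def build_add_kernel_info():
    # Hypothetical builder: returns the op name and a plain creation function.
    return {"op": "Add", "create": lambda: "AddKernel"}


def build_relu_kernel_info():
    return {"op": "Relu", "create": lambda: "ReluKernel"}


# The table: adding a kernel means adding one entry, not more registration code.
KERNEL_BUILDERS = (
    build_add_kernel_info,
    build_relu_kernel_info,
)


def register_kernels(registry):
    """Walk the builder table and record each kernel's creation function."""
    for build in KERNEL_BUILDERS:
        info = build()
        registry[info["op"]] = info["create"]
    return registry


if __name__ == "__main__":
    registry = register_kernels({})
    print(sorted(registry))
    print(registry["Relu"]())
```

The design point is that the per-kernel work collapses into data, so the registration code itself stays a fixed size however many kernels are added to the table.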
Pranav Sharma
049ba2d747
Exclude tests that fail when contrib ops are disabled. (#835) 2019-04-15 15:57:48 -07:00
Pranav Sharma
4b4a359943
Exclude unreferenced global data and op doc strings in the opschema object. The first causes a decrease in the binary size by at least 85k. The latter reduces resident memory size. (#823)
* Exclude unreferenced global data and op doc strings in the opschema object. The first causes a decrease in the binary size by at least 85k. The latter reduces resident memory size.

* Update onnx to incorporate my PR that fixes SetDoc compiler warnings
2019-04-15 15:57:19 -07:00
Ashwini Khade
e999af61b2
bug fix for shape inference (#834) 2019-04-15 15:51:12 -07:00