Commit graph

1409 commits

Author SHA1 Message Date
Scott McKay
3fcb4ee7d4
Refine optimizers (#1407)
* Refine optimizers

* Address PR comments

* Changes from PR comments and discussion.

* Fixed signed/unsigned mismatch

* Address PR comments

* Address PR comments

* Fix linux build

* Fix issue with mkldnn logic.

* Turn off optimizers by default for operator unit tests.

* Handle edge case of graph with no nodes in partitioner so all execution providers don't need to.

* Comment out change to turn off optimizers for unit tests. Add details on what needs to be done to re-enable.
2019-10-15 14:49:59 -07:00
Sreekanth Yalachigere
485c24b62d MKL-DNN 1.0 (#2134)
* MKL-DNN 1.0

* changed libmkldnn version to 1
2019-10-15 12:06:34 -07:00
Hariharan Seshadri
6857bb8aba
Fix bug in GatherElements (#2130)
* Fix bug in GatherElements

* Uncomment some tests

* Updates

* Nits

* Nits

* Nits
2019-10-15 07:54:42 -07:00
shahasad
7ef02f14d2
Add missing test model file for symbolic dimensions (#2123) 2019-10-15 06:55:51 -07:00
Adrian Tsai
4090d0d0de
Add DirectML Execution Provider (#2057)
This change adds a new execution provider powered by [DirectML](https://aka.ms/DirectML).

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers.

The DirectML execution provider is capable of greatly improving evaluation time of models using commodity GPU hardware, without sacrificing broad hardware support or requiring vendor-specific extensions to be installed.

**Note** that the DML EP code was moved verbatim from the existing WindowsAI project, which is why it doesn't yet conform to the onnxruntime coding style. This is something that can be fixed later; we would like to keep formatting/whitespace changes to a minimum for the time being to make it easier to port fixes from WindowsAI to ORT during this transition.

Summary of changes:
* Initial commit of DML EP files under onnxruntime/core/providers/dml
* Add cmake entries for building the DML EP and for pulling down the DirectML redist using nuget
* Add a submodule dependency on the Windows Implementation Library (WIL)
* Add docs under docs/execution_providers/DirectML-ExecutionProvider.md
* Add support for DML EP to provider tests and perf tests
* Add support for DML EP to fns_candy_style_transfer sample
* Add entries to the C ABI for instantiating the DML EP
2019-10-15 06:13:07 -07:00
KeDengMS
b101f1bcee
Nuphar: Fix a bug in weight layout where read may go out of bound (#2129) 2019-10-15 00:11:41 -07:00
Yang Chen
5c2803f2d5
various fixes for shape inference script (#2124)
* use dilations for computing effective kernel shape for conv/pool ops

* when auto_pad is 'VALID', total_pads should be empty

* added support for ArrayFeatureExtractor and ZipMap

* check out_shape only if the output has shape, i.e. output is of TensorType
  or SparseTensorType
2019-10-14 19:44:29 -07:00
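The first two bullets of the shape inference fix above can be illustrated with a small sketch. It is not the code from the shape inference script: the function names, the `pads` layout (begins followed by ends, as in ONNX Conv), and the choice to represent "no padding" for `VALID` as zeros are assumptions made for illustration only.

```python
def effective_kernel_shape(kernel_shape, dilations):
    """Spatial extent covered by a dilated kernel, per axis: (k - 1) * d + 1."""
    return [(k - 1) * d + 1 for k, d in zip(kernel_shape, dilations)]


def spatial_output_shape(in_shape, kernel_shape, strides, dilations,
                         auto_pad="NOTSET", pads=None):
    """Hypothetical helper: output spatial dims for a conv/pool style op.

    Only 'VALID' and explicit pads are handled; SAME_* modes are out of scope.
    """
    rank = len(in_shape)
    if auto_pad == "VALID" or pads is None:
        # VALID means no padding is applied (modelled here as zeros per axis).
        total_pads = [0] * rank
    else:
        # Assumed layout: [x1_begin, x2_begin, ..., x1_end, x2_end].
        total_pads = [pads[i] + pads[i + rank] for i in range(rank)]

    eff_kernel = effective_kernel_shape(kernel_shape, dilations)
    return [
        (size + pad - k) // stride + 1
        for size, pad, k, stride in zip(in_shape, total_pads, eff_kernel, strides)
    ]


if __name__ == "__main__":
    print(effective_kernel_shape([3, 3], [2, 2]))
    print(spatial_output_shape([32, 32], [3, 3], [1, 1], [2, 2], auto_pad="VALID"))
```

Using the effective (dilated) kernel size rather than the raw kernel size is what keeps the inferred output shape correct when dilations are greater than 1.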
Hariharan Seshadri
95ab5ad39f
Support non-spatial mode in BatchNormalization (#2092)
* Initial commit

* Update

* Update

* Fix build break

* Update

* More changes

* Update type

* Exclude Nuphar for non-spatial tests

* Update

* Resolve PR comments
2019-10-14 18:14:14 -07:00
Yufeng Li
2536553136
use cublasHgemm for Volta GPU (#2074)
* use cublasHgemm for Volta GPU
2019-10-14 17:29:13 -07:00
Yufeng Li
8c5db7f973
use legacy stream mode (#2076)
In ORT, there are only three CUDA streams: default, HtoD, and DtoH. Both HtoD and DtoH are non-blocking streams, so per-thread stream mode doesn't provide any benefit.
I also tested in a multi-threaded environment, and legacy mode is better than per-thread mode there as well.
Below is the perf of a 3-layer BERT on a V100. Unit is ms:

batch size 1:
| concurrency | c=1  | c=2  | c=4  |
|-------------|------|------|------|
| legacy      | 0.54 | 1.17 | 2.68 |
| per-thread  | 0.66 | 1.37 | 2.86 |

batch size 4:
| concurrency | c=1  | c=2  | c=4  |
|-------------|------|------|------|
| legacy      | 1.1  | 2.22 | 4.6  |
| per-thread  | 1.21 | 2.44 | 4.98 |

batch size 64:
| concurrency | c=1  | c=2   | c=4   |
|-------------|------|-------|-------|
| legacy      | 8.09 | 16.13 | 32.37 |
| per-thread  | 8.18 | 16.26 | 32.45 |
2019-10-14 16:03:04 -07:00
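The timings quoted in the stream mode comparison above can be turned into relative overheads with a few lines of arithmetic. This is only a sketch: the numbers are copied from the commit message, while the dictionary layout and the function name are illustrative choices.

```python
# Timings in ms from the commit message: batch size -> mode -> [c=1, c=2, c=4]
TIMINGS_MS = {
    1: {"legacy": [0.54, 1.17, 2.68], "per-thread": [0.66, 1.37, 2.86]},
    4: {"legacy": [1.1, 2.22, 4.6], "per-thread": [1.21, 2.44, 4.98]},
    64: {"legacy": [8.09, 16.13, 32.37], "per-thread": [8.18, 16.26, 32.45]},
}
CONCURRENCY = [1, 2, 4]


def per_thread_overhead_percent(timings):
    """Return {(batch, concurrency): percent by which per-thread is slower than legacy}."""
    overhead = {}
    for batch, modes in timings.items():
        rows = zip(CONCURRENCY, modes["legacy"], modes["per-thread"])
        for c, legacy, per_thread in rows:
            overhead[(batch, c)] = (per_thread - legacy) / legacy * 100.0
    return overhead


if __name__ == "__main__":
    for (batch, c), pct in sorted(per_thread_overhead_percent(TIMINGS_MS).items()):
        print(f"batch={batch:>2} c={c}: per-thread is {pct:.1f}% slower")
```

On these figures the gap is widest at batch size 1 with a single thread (about 22%) and shrinks to around 1% or less at batch size 64, where kernel time dominates.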
Hariharan Seshadri
80d09f0c59 Allow creation of empty tensors in c# (#1976)
* Allow creation of empty tensors in c#

* Keep test with updated behavior

* Add more empty tensor tests

* Nits
2019-10-14 14:47:02 -07:00
Hector Li
640f71c91b
Enable Gpu multi-device test for CUDA EP and Trt EP
Enable multi-device test for GPU
* Add build pipeline for TensorRT multi-GPU test
* Add code to disable fp16 test if hardware architecture not supported
* Add option to set the device id in onnx_test_runner for model tests
2019-10-14 11:16:34 -07:00
Tomasz Socha
f93be8af90 Update nGraph to version 0.26 (#1965)
* Adjust ngraph cmake files to onnx 1.5.0

* Enable LSTM reverse direction mode in nGraph EP

* Enable full support for the Split op in nGraph EP

* Revert "Disable the unsigned input Shrink op tests for nGraph until the next update"

This reverts commit 257b42a55bdd98f804d4846868542b8e3aeb4b4e.

* Enable Gather and remove unused subgraph attribute

* Remove the unused param from AppendClusterToSubGraph

* Fix for the incorrect onnx opset version

* Use the r0.26 release branch before the tag is created

* Enable the quantizelinear and dequantizelinear for NGEP

* Use the v0.26.0-rc.2 tag in ngraph.cmake

* Add skip for modes other than default in Pad operator

* Reenable negative axis tests for ngraph

* Use temporary ngraph version

* Use branch name instead of SHA for temporary ngraph branch

* Use ngraph v0.26.0-rc.4

* Remove patch for missing symbol in MKLDNN

* Use MKLDNN 1.0 in ngraph

* Exclude the Pad op for opsets greater than 10

* Disable quantizelinear and dequantizelinear tests for ONNX 1.5.0

* Fix the onnx-headers related compilation errors

* ONNX libs linking fix

* Use a tag for ngraph and support more Pad modes

* Use the v0.26.0 release tag for nGraph

* Update ngraph to RC8 - bigobj flag for Windows builds

* Fix the MKLDNN constexpr error on Windows
2019-10-14 10:37:48 -07:00
Pranav Sharma
91db840b6b
Introduce execution mode enum for clarity and extensibility; Change Python, C and C# APIs accordingly; Removed EnableSequentialExecution, DisableSequentialExecution in favor of the more general SetExecutionMode API. (#2098)
* Introduce execution mode for clarity and extensibility; Change Python APIs accordingly; Replace DisableSequentialExecution API with EnableParallelExecution for clarity.

* Fix cuda build

* Modify the test slightly

* Make C and C# APIs consistent with Python.
2019-10-14 09:48:19 -07:00
Changming Sun
5558b80774
clean up ubuntu docker scripts (#2103) 2019-10-14 07:20:20 -07:00
KeDengMS
9363c14d23
Fix a bug in nuphar settings where existing options cannot be overridden (#2113)
2019-10-13 23:02:37 -07:00
Scott McKay
b829d55320
Fix invalid logic that ran past end of nodes and double increment. (#2117) 2019-10-13 11:37:57 +10:00
Scott McKay
eb24617d2e Add ability to get symbolic dimension info for graph inputs and outputs. (#2051)
* Add ability to get symbolic dimension info for graph inputs and outputs.
WIP to get initial feedback.

* Fix linux build error.
Update C# API and add unit test

* Clarify the two different ways Tensor shape and type info is created. One is from concrete values and one is from a type proto where symbolic dimensions may exist. Doing so allows a change to default to empty strings for the symbolic dimensions if not provided.
2019-10-12 15:46:28 -07:00
jignparm
20515363e5
Add int32 and int64 types for Equal(11) (#2112) 2019-10-12 11:16:19 -07:00
Scott McKay
50faab308b
Remove 'Ort' prefix from OrtAddFreeDimensionOverride for consistency. (#2099) 2019-10-12 06:11:29 +10:00
Ryan Hill
e8e33977da
Ryanunderhill/customop dll (#2002)
* Add OrtApiBase
* Add RegisterCustomOpsLibrary API
2019-10-11 11:12:51 -07:00
Changming Sun
c24d7a8a0a
Update eigen to the latest version (#1910) 2019-10-11 10:44:19 -07:00
Scott McKay
bdfff800ea Move access to intra-op threadpool into OpKernelContext. (#2091) 2019-10-11 10:36:20 -07:00
Changming Sun
368bdfd936
Update README.md (#2070)
Update the vcredist package link

Note: Visual C++ 2015, 2017 and 2019 all share the same redistributable files.
2019-10-11 10:06:50 -07:00
Hector Li
3b335c933f
Fix issue where TRT does not work for devices other than device id 0
The allocation planner needs to get the default allocator to allocate memory for graph input data. (#2094)
2019-10-11 09:22:25 -07:00
Scott McKay
ffb94fd170
Fix bug with delayed allocation of If and Scan outputs. (#2024)
* Fix bug with delayed allocation of If and Scan outputs.
If the subgraph is producing output on a non-CPU device the delayed allocation was incorrectly providing a CPU allocated tensor.
Check for the required location, and update 'fetches' instead if a device copy is needed.
The utils::ExecuteGraph logic will handle the device copy in this case.
2019-10-11 19:49:21 +10:00
Yang Chen
ca1b88c069
Added support to infer Pad11 (#2085)
* Added support to infer Pad11

* address CR
2019-10-10 23:18:49 -07:00
shahasad
8803f6fff4
C# end to end test fix, and make end to end tests mandatory (#2079) 2019-10-10 19:23:43 -07:00
Changming Sun
a314402097
Downgrade python gpu package to CUDA 10.0 (#2086) 2019-10-10 18:31:24 -07:00
Dmitri Smirnov
af9dbb70f2
Introduce a separate check and conditional for AVX512BW build (#2083)
Separate checks for AVX512F and AVX512BW.
Make AVX512BW cmake instructions nested within AVX512F support.
2019-10-10 16:14:00 -07:00
Hariharan Seshadri
2ba705ed99
Handle nodes with subgraphs in ORT function handling implementation (#2053)
* Initial commit

* Update

* Update

* Nits

* More updates

* to be reverted

* Update

* Update

* More changes

* Updates

* Update Function

* Nits

* Fix build break

* Comment
2019-10-10 16:07:42 -07:00
Pranav Sharma
2d4d0abd36
Add support for output seq(tensor) in python binding and test framework. Implement SequenceConstruct, SequenceEmpty, SequenceInsert and SequenceErase ops. (#2040)
2019-10-10 15:58:49 -07:00
Scott McKay
ddbc2086e4 Add support for opset 11 Clip in optimizers. (#2059) 2019-10-10 10:47:29 -07:00
Yulong Wang
a41c71cbf2
check and fix CUDA kernel launch errors in several OPs (#2047) 2019-10-10 23:47:00 +08:00
baowenlei
b4a98aab78
change MatMulInteger/MatMulInteger16 fallback option (#2064)
* change MatMulInteger/MatMulInteger16 fallback option when no initializer exists

* add AVX option

* fix condition for old machines
2019-10-09 22:03:21 -07:00
Hariharan Seshadri
d186c19c45
Add opset-11 TopK CPU kernel (#1912)
* initial commit

* Update

* Update top_k.cc

* PR comments

* Add more tests

* Update

* Add another test case

* Update

* Resolve conflicts

* Update

* Nits

* Nits

* Nits

* Pick sorted content using 2 different approaches

* Update to logic

* PR comments

* PR feedback

* Update

* Fix build

* Fix build

* Update
2019-10-09 19:09:30 -07:00
Colin Versteeg
8fda6593fe Update failing tests (#2038)
* Fix failing tests from when they were not enabled

* split into two

* fix failing test
2019-10-09 15:17:21 -07:00
Tracy Sharpe
57e0099425
MLAS: Implement U8S8 GEMV kernels (#2069)
This implements an optimization for U8S8 MlasGemm when M=1, aka GEMV.
2019-10-09 11:54:16 -07:00
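For readers unfamiliar with the term, the M=1 case mentioned in the MLAS commit above reduces a matrix product to a vector-matrix product. The sketch below shows only the reference arithmetic for unsigned 8-bit inputs times signed 8-bit weights accumulated into wider integers. It is not the MLAS kernel: the function name is hypothetical, and any offset or zero-point handling is left out because the commit message does not describe it.

```python
def gemv_u8s8(a, b):
    """Reference GEMV: a is K unsigned 8-bit values, b is K rows of N signed 8-bit values.

    Returns N integer sums, c[j] = sum_k a[k] * b[k][j] (the M=1 row of a GEMM).
    """
    if len(a) != len(b):
        raise ValueError("a and b must agree on K")
    n = len(b[0]) if b else 0
    out = [0] * n
    for a_val, row in zip(a, b):
        if not 0 <= a_val <= 255:
            raise ValueError("a must hold unsigned 8-bit values")
        if len(row) != n:
            raise ValueError("all rows of b must have length N")
        for j, b_val in enumerate(row):
            if not -128 <= b_val <= 127:
                raise ValueError("b must hold signed 8-bit values")
            # Accumulate in a wider integer; the products do not fit in 8 bits.
            out[j] += a_val * b_val
    return out


if __name__ == "__main__":
    print(gemv_u8s8([1, 2, 255], [[1, -1], [2, -2], [-128, 127]]))
```

With a single output row there is no reuse of the weight matrix across rows, which is presumably why a dedicated GEMV path can be worth having alongside the general GEMM kernels.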
Changming Sun
eee9c55030
C++11 fix for memcpy_transformer_test.cc (#2061) 2019-10-09 10:52:10 -07:00
Changming Sun
cefae93305
Add a test case for linearregressor (#1962) 2019-10-09 10:17:08 -07:00
Changming Sun
ccaf692ff2
Run auditwheel for manylinux1 (#2063) 2019-10-09 09:23:00 -07:00
Dmitri Smirnov
cae571c713 Add a test for AVX512 compilation before compiling 512 asm (#2055) 2019-10-08 21:18:04 -07:00
Changming Sun
af8fe0f980
Replace make_unique in cuda_utils.cu (#2052) 2019-10-08 18:32:08 -07:00
Scott McKay
db0dd09ded
Cleanup some aspects of the Initializer class used by optimizers (#2005)
* Move check on data type outside of the Initializer class as it's specific to Conv processing.
Use references for arguments that can't be null.
2019-10-09 10:37:44 +10:00
Changming Sun
a00ca56ae1
Remove gcc from manylinux1 docker image (#2048) 2019-10-08 13:49:15 -07:00
baowenlei
b82de794d5
Weba/update nuphar doc (#2026)
* update nuphar xp doc

* address comments

* address CR

* update doc
2019-10-08 12:41:25 -07:00
RandySheriffH
f501b6e234
pack pyop in nightly build (#2018)
* pack pyop in nightly build

* correct logic

* add comment

* exclude debug build

* add dependency

* reset postbuild rule

* remove dep
2019-10-08 12:02:45 -07:00
Changming Sun
e9bed8b23b
Change python packaging pipeline to use manylinux1 (#2035)
1. Change the python packaging pipeline to use manylinux1
2. Temporarily disable model test in the python pipeline.
2019-10-08 10:03:54 -07:00
Changming Sun
3053af812c
Fix a crash in deep_cpu_gru_op_test.cc (#2028) 2019-10-08 10:03:07 -07:00
Zhang Lei
71b389322e Implement cuda scatter op. (#1991)
* Implement cuda scatter op.
Disable Invalid Index of Scatter op only for cuda provider.

* Fix type-narrowing warnings that are treated as errors in some pipelines.
2019-10-08 09:53:33 -07:00