Commit graph

244 commits

Author SHA1 Message Date
Paul McDaniel
d1159b7008 Adding platform telemetry (#2109) 2019-10-19 18:25:57 -07:00
Changming Sun
021073b5e5
Update python packaging pipelines (#2167) 2019-10-19 07:42:54 -07:00
Tomasz Dołbniak
72110d3508 Patch for the MKLDNN v1 segfaults (#2145) 2019-10-17 12:10:00 -07:00
Scott McKay
3fcb4ee7d4
Refine optimizers (#1407)
* Refine optimizers

* Address PR comments

* Changes from PR comments and discussion.

* Fixed signed/unsigned mismatch

* Address PR comments

* Address PR comments

* Fix linux build

* Fix issue with mkldnn logic.

* Turn off optimizers by default for operator unit tests.

* Handle edge case of graph with no nodes in partitioner so all execution providers don't need to.

* Comment out change to turn off optimizers for unit tests. Add details on what needs to be done to re-enable.
2019-10-15 14:49:59 -07:00
Sreekanth Yalachigere
485c24b62d MKL-DNN 1.0 (#2134)
* MKL-DNN 1.0

* changed libmkldnn version to 1
2019-10-15 12:06:34 -07:00
Adrian Tsai
4090d0d0de
Add DirectML Execution Provider (#2057)
This change adds a new execution provider powered by [DirectML](https://aka.ms/DirectML).

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers.

The DirectML execution provider is capable of greatly improving evaluation time of models using commodity GPU hardware, without sacrificing broad hardware support or requiring vendor-specific extensions to be installed.

**Note** that the DML EP code was moved verbatim from the existing WindowsAI project, which is why it doesn't yet conform to the onnxruntime coding style. This is something that can be fixed later; we would like to keep formatting/whitespace changes to a minimum for the time being to make it easier to port fixes from WindowsAI to ORT during this transition.

Summary of changes:
* Initial commit of DML EP files under onnxruntime/core/providers/dml
* Add cmake entries for building the DML EP and for pulling down the DirectML redist using nuget
* Add a submodule dependency on the Windows Implementation Library (WIL)
* Add docs under docs/execution_providers/DirectML-ExecutionProvider.md
* Add support for DML EP to provider tests and perf tests
* Add support for DML EP to fns_candy_style_transfer sample
* Add entries to the C ABI for instantiating the DML EP
2019-10-15 06:13:07 -07:00
Yufeng Li
8c5db7f973
use legacy stream mode (#2076)
In ORT, there is only 3 cuda stream: default, HtoD, DtoH. And both HtoD and DtoH are non-blocking stream. Thus, per-thread stream mode doesn't have any benefit.
I also tried in multiple thread env and the legacy mode is also better than per-thread model.
Below is the perf of a 3 layer bert on v100. Unit is ms:
batch size 1:
 concurrency | c=1 | c=2 | c=4
legacy | 0.54 | 1.17 | 2.68
per-thread | 0.66 | 1.37 | 2.86
 
batch size 4:  
 concurrency | c=1 | c=2 | c=4
legacy | 1.1 | 2.22 | 4.6
per-thread | 1.21 | 2.44 | 4.98

batch size 64:
concurrency  | c=1 | c=2 | c=4
legacy | 8.09 | 16.13 | 32.37
per-thread | 8.18 | 16.26 | 32.45
2019-10-14 16:03:04 -07:00
Tomasz Socha
f93be8af90 Update nGraph to version 0.26 (#1965)
* Adjust ngraph cmake files to onnx 1.5.0

* Enable LSTM reverse direction mode in nGraph EP

* Enable full support for the Split op in nGraph EP

* Revert "Disable the unsigned input Shrink op tests for nGraph until the next update"

This reverts commit 257b42a55bdd98f804d4846868542b8e3aeb4b4e.

* Enable Gather and remove unused subgraph attribute

* Remove the unused param from AppendClusterToSubGraph

* Fix for the incorrect onnx opset version

* Use the r0.26 release branch before the tag is created

* Enable the quantizelinear and dequantizelinear for NGEP

* Use the v0.26.0-rc.2 tag in ngraph.cmake

* Add skip for modes others than default in Pad operator

* Reenable negative axis tests for ngraph

* Use temporary ngraph version

* Use branch name instead of SHA for temporary ngraph branch

* Use ngraph v0.26.0-rc.4

* Remove patch for missing symbol in MKLDNN

* Use MKLDNN 1.0 in ngraph

* Exclude the Pad op for opsets greater than 10

* Disable quantizelinear and dequantizelinear tests for ONNX 1.5.0

* Fix the onnx-headers related compilation errors

* ONNX libs linking fix

* Use a tag for ngraph and support more Pad modes

* Use the v0.26.0 release tag for nGraph

* Update ngraph to RC8 - bigobj flag for Windows builds

* Fix the MKLDNN constexpr error on Windows
2019-10-14 10:37:48 -07:00
Changming Sun
c24d7a8a0a
Update eigen to the latest version (#1910) 2019-10-11 10:44:19 -07:00
Changming Sun
a314402097
Downgrade python gpu package to CUDA 10.0 (#2086) 2019-10-10 18:31:24 -07:00
Dmitri Smirnov
af9dbb70f2
Introduce a separate check and conditional for AVX512BW build (#2083)
Separate checks for AVX512f and AVX512BW
  Make AVX512BW cmake instructions nested within AVX512F support.
2019-10-10 16:14:00 -07:00
Tracy Sharpe
57e0099425
MLAS: Implement U8S8 GEMV kernels (#2069)
This implements an optimization for U8S8 MlasGemm when M=1, aka GEMV.
2019-10-09 11:54:16 -07:00
Dmitri Smirnov
cae571c713 Add a test for AVX512 compilation before compiling 512 asm (#2055) 2019-10-08 21:18:04 -07:00
Scott McKay
db0dd09ded
Cleanup some aspects of the Initializer class used by optimizers (#2005)
* Move check on data type outside of the Initializer class as it's specific to Conv processing.
Use references for arguments that can't be null.
2019-10-09 10:37:44 +10:00
RandySheriffH
f501b6e234
pack pyop in nightly build (#2018)
* pack pyop in nightly build

* correct logic

* add comment

* exclude debug build

* add dependency

* reset postbuild rule

* remove dep
2019-10-08 12:02:45 -07:00
Changming Sun
8f7657fa32
Ignore some gcc warnings (#1996) 2019-10-07 16:32:34 -07:00
stevenlix
544e53e24e Update TensorRT to version 6.0.1.5 (#1966)
* remove onnx-tensorrt submodule

* add new onnx-tensorrt submodule (experiment) for trt6

* update engine build for trt6

* update compile and compute for tensorrt6.0

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* switch to onnx-tensorrt master for TensorRT6'

* Update tensorrt_execution_provider.cc

* Handle dynamic batch size and add memcpy in TensorRT EP

* update test cases

* Update tensorrt_execution_provider.cc

* update onnx-tensorrt submodule

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.ubuntu_tensorrt

* Update run_dockerbuild.sh

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update concat_op_test.cc

* Update tensorrt_execution_provider.cc

* Upgrade TensorRT to version 6.0.1.5

* Update onnxruntime_providers.cmake

* Update CMakeLists.txt

* Update reduction_ops_test.cc

* Update install_ubuntu.sh

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.tensorrt

* Update BUILD.md

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update onnxruntime_providers.cmake

* Update install_ubuntu.sh

* Update install_ubuntu.sh

* Update gemm_test.cc

* Update gather_op_test.cc

* Update CMakeLists.txt

* Removed submodule

* update onnx-tensorrt submodule

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Remove redundency

* Fix issue that it does not add memcopy node correctly if some nodes fall back to CUDA EP.
e.g. after partition, there's TRT_Node -> Cuda_node (with CPU memory expected), we still need to add memcpy node between them.

* update for Trt Windows build

* Update onnxruntime_providers.cmake

* Disable opset11 tests on TensorRT

* Update pad_test.cc

* Update build.py

* update scripts for ubuntu18.04

* Disable warning for Windows build
2019-10-06 10:40:53 -07:00
baowenlei
4bb6385dca
Weba/merge ngemm (#2021)
* save status: add tiling layout; add avx512 skylake cpuid info

* unit tests and matmul integer model passed on skylake, need to verify model

* save commit before update master

* fix check

* address comments
2019-10-05 12:09:22 -07:00
Hariharan Seshadri
f528da35f2
Update ONNX to a newer commit (#2015)
* Update ONNX to a newer version

* PR comments
2019-10-04 19:41:00 -07:00
Dmitri Smirnov
f5a8a23951 Replace std::regex with re2 bc CentOS std::regex is broken (#2017) 2019-10-04 18:47:03 -07:00
daquexian
e071a1249b Android CI (#1600) 2019-10-04 17:39:51 -07:00
Dmitri Smirnov
627f853a44
Downgrade compiler to CentOS 4.8.5 (#1985)
Make onnxruntime CPU build and run on CentOS GCC 4.8.5
2019-10-03 15:40:46 -07:00
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl subodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl due to Linux not being happy.
  Comment out std::stream support due to MacOS std lib broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
KeDengMS
e361174f78
Add nuphar python scripts to wheel, and notebook tutorial (#1952)
* Fixed a bug of missing tvm in python wheel
* Put Nuphar Python scripts into wheel
* Add note book tutorial
* Some improvements in symbolic shape inference for quantized models
2019-09-30 10:39:02 -07:00
Tracy Sharpe
4c995d3251 MLAS: add DGEMM support (#1953)
* rename existing kernels

* add dgemm support

* rename existing kernels

* add dgemm support

* synchronize with amd64

* dgemm

* remove test code

* remove more test code

* fix file extension
2019-09-30 10:04:59 -07:00
baowenlei
611dd3ea0c update ort-tvm version (#1945)
* update ort-tvm version

* update tvm patch
2019-09-27 22:11:14 -07:00
suryasidd
ceaaff0f81 [OpenVINO-EP] Enabling VAD-F in OpenVINO Execution Provider (#1885)
* Added support for Hetero plugin

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed spelling error in cmake for hetero plugin

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added listener to print messages from the plugin

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Updated Documentation for VAD-F enablement

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added VAD-F option for FPGA

*Disabled unit tests and backed tests because FPGA only accepts NCHW models

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added comment for why tests need to be disabled on VAD-F

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2019-09-26 18:32:16 -07:00
Yulong Wang
e6ce384402
add dependency 'cub' as submodule (#1924) 2019-09-26 16:10:39 +08:00
Hariharan Seshadri
dbff8272e7
Update ONNX to newer commit (#1907) 2019-09-24 19:25:34 -07:00
Tracy Sharpe
28a62f7728
MLAS: add U8S8 MatMul operation (#1895)
Implement the second round of changes for quantization inside MLAS. This adds a MatMul operation for U8xS8=S32 for x86/x64 processors.
2019-09-24 18:15:11 -07:00
avidiyal
ca89387817 Build OnnxRT with Openvino EP on Windows (#1865)
* Avoid variable length stack array variables for VC++ compatibility

Use dynamically allocated arrays or vectors instead.

* windows enabling

* openvino windows build

* Update build instructions

* resolve conflicts for PR

* remove debug messages from cmake

* PR fix for window support
2019-09-23 13:36:49 -07:00
Hariharan Seshadri
aacfa2af65
Bump up ONNX to the latest commit (#1868)
* Initial commit

* Delete unnecessary files

* Update generated proto files

* Update server proto file

* Update submodule onnx

* Update OnnxMl.cs

* update OnnxMl.cs

* Update OnnxMl.cs

* Comment one test

* Update disabled test list

* Update backend tests

* Formatting fix

* Formatting

* Disable a test

* More tests updated

* commit id update

* Update to a newer commit

* More updates

* More test updates

* Update

* Update

* Updates

* Update
2019-09-20 18:15:16 -07:00
Changming Sun
561f2c4a9a
Update pybind (#1876) 2019-09-19 17:44:33 -07:00
KeDengMS
6792e00e1e
Revert accidental TVM change (#1874) 2019-09-18 17:50:16 -07:00
KeDengMS
80bda77203
Fix symbolic shape inference for faster_rcnn, mask_rcnn, yolov3 (#1867)
* Fix symbolic shape inference for faster_rcnn, mask_rcnn, yolov3

Force merge when --auto_merge, on symbolic dims which sympy cannot simplify
Add symbolic inference for Resize opset 10
Add support for step != 1 in Slice
Add support for computed dim in TopK
Bug fixes in passing symbolic dims from subgraph
Fix an outdate comment in Nuphar provider header
2019-09-18 14:18:32 -07:00
Yang Chen
71d414cacb
update tvm submodule to the latest (#1849)
* update tvm submodule to the latest

* also update manifest for tvm submodule
2019-09-16 17:16:33 -07:00
kile0
125900c961 Enable integration with mimalloc memory allocator (#1673)
* add mimalloc submodule

* basic hooks into execution provider header and build script option

* pull mimalloc into build

* windows has to use the override vcxproj already set up, and disable bfcarena when using mimalloc

* fix import_location

* generalize build msbuild command

* add mimalloc dependency to python package as well as various commenting cleanups

* update mimalloc commit as stop gap

* include mimalloc changes from master

* create capi directory if doesn't exist for mimalloc copying over

* disable runtime hooks and remove old comment

* temporary change to test CI

* fetch the mimalloc output name property

* uniformly call target_link_libraries

* query cmake to get the correct windows sdk to target

* revert change to trailing directory slash

* pickup windows sdk off msbuild path if possible

* copy the produced dll/so at install time, not configure time

* deal with mimalloc unimplemented atomic

* move to dev branch of mimalloc to avoid atomic issues on gcc

* for windows specify solution settings (x86) rather than individual project settings

* pin mimalloc submodule to updated commit

* typo

* Revert "temporary change to test CI"

This reverts commit 764867376936a5d307dded3cc37f00a34e3b0c96.
2019-09-13 17:12:48 -07:00
Changming Sun
bd5451b4ed
Don't define USE_OPENMP if the compiler doesn't support OpenMP (#1836) 2019-09-13 16:42:50 -07:00
KeDengMS
cf22ea6893
Upgrade TVM for a fix in tvm::Integer with int64_t input (#1824)
* Upgrade TVM for a fix in tvm::Integer with int64_t input
2019-09-12 23:15:13 -07:00
Bowen Bao
8712a523a4
Bump onnx to latest (#1756)
* Bump onnx to latest

Update onnx.in.proto with changes for SparseTensor.

* add temp skip tests

* remove passed tests from skip list

* skip more tests for new ops in opset 11

* skip crashing tests

* update handling of new attribute types sparse tensor and sparse tensors

* advance onnx commit and remove skip cpu_flaky_tests

* temporarily skip yolo3 model test due to resize opset10 shape inference regression

* update proto for onnxruntime server

* advance onnx commit further
2019-09-12 11:46:49 -07:00
Dmitri Smirnov
fe8915863c
Implement C API entry points for creating and fetching non-standard types to OrtValue (#1714)
C/C++ Opage APIs
 Add new virtual interfaces for NonTensorType
 Implement entry points.
 Add shared header for the data container.
 Add export symbols.
 Add serialization/deserialization.
 Implement model with Opaque types.
 Rework opqaue_api_test as a standalone executable.
2019-09-11 14:52:47 -07:00
shahasad
6a5b11756b
Conditionally export execution provider apis in chsarp (#1724) 2019-09-09 11:17:44 -07:00
Tracy Sharpe
071a0c2522
MLAS: MlasSgemm refactoring (#1749)
Refactor the SGEMM kernels to resynchronize the code between Windows/Linux and remove unneeded binary bloat from a different zero/add mode kernel. Another goal is to get to a cleaner state for then doing a DGEMM kernel.
2019-09-06 22:26:28 -07:00
KeDengMS
58fe5a6bf1
Enable Nuphar docker build, and reinstate Nuphar tests (#1757)
Enable Nuphar EP docker build
Revert back to LLVM 6.0.1
Reinstate disabled Softmax tests caused by LLVM 8.0.1
Reinstate Nuphar Python test due to stale sympy version
Increase build timeout of Linux CI
2019-09-05 08:50:48 -07:00
Changming Sun
94d9161166
Add nuphar to Linux CI build (#1750) 2019-09-03 11:39:27 -07:00
KeDengMS
c9240f4e93
Implementation of Nuphar execution provider (#881)
* Implement Nuphar execution provider

Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.

* Fix submodules

* Fix TVM submodule

* Update Nuphar to latest and resolve confliction

* Remove stale files caused by merge -X theirs

* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test

* Fix bad merge

* Merge from Nuphar

* Fix warning treated as error, revert some unnecessary changes

* Revert some more test changes

* Some more test revert or comments to make review easier
New tests could be added later

* One more revert of unnecessary changes

* More change revert. Test could be added back later.
2019-09-01 23:01:47 -07:00
Changming Sun
81ad48080b
Remove TaskThreadPool (#1713) 2019-08-28 18:00:10 -07:00
Tracy Sharpe
73312b8195
MLAS: Android sgemm kernel build fix (#1710)
Fix the aarch64 kernel to build properly with the Android NDK (specifically clang).
2019-08-28 16:14:12 -07:00
Ashwini Khade
961b14ac4a
use MLAS for QGEMM in matmulInteger and convInteger (#1692)
* use mlas qgemm for u8u8_s32 gemms

* update test
2019-08-26 18:13:22 -07:00
Changming Sun
4de0aa8049
Optimize kernel index (#1672) 2019-08-22 10:26:35 -07:00