1388 commits

Author SHA1 Message Date
Changming Sun
c24d7a8a0a
Update eigen to the latest version (#1910) 2019-10-11 10:44:19 -07:00
Scott McKay
bdfff800ea
Move access to intra-op threadpool into OpKernelContext. (#2091) 2019-10-11 10:36:20 -07:00
Changming Sun
368bdfd936
Update README.md (#2070)
Update the vcredist package link

Note: Visual C++ 2015, 2017 and 2019 all share the same redistributable files.
2019-10-11 10:06:50 -07:00
Hector Li
3b335c933f
Fix issue where TRT does not work for devices other than device id 0 (#2094)
The allocation planner needs to get the default allocator to allocate memory for graph input data.
2019-10-11 09:22:25 -07:00
Scott McKay
ffb94fd170
Fix bug with delayed allocation of If and Scan outputs. (#2024)
* Fix bug with delayed allocation of If and Scan outputs.
If the subgraph produces output on a non-CPU device, the delayed allocation incorrectly provided a CPU-allocated tensor.
Check for the required location, and update 'fetches' instead if a device copy is needed.
The utils::ExecuteGraph logic will handle the device copy in this case.
2019-10-11 19:49:21 +10:00
Yang Chen
ca1b88c069
Added support to infer Pad11 (#2085)
* Added support to infer Pad11

* address CR
2019-10-10 23:18:49 -07:00
shahasad
8803f6fff4
C# end to end test fix, and make end to end tests mandatory (#2079) 2019-10-10 19:23:43 -07:00
Changming Sun
a314402097
Downgrade python gpu package to CUDA 10.0 (#2086) 2019-10-10 18:31:24 -07:00
Dmitri Smirnov
af9dbb70f2
Introduce a separate check and conditional for AVX512BW build (#2083)
Separate checks for AVX512f and AVX512BW
  Make AVX512BW cmake instructions nested within AVX512F support.
2019-10-10 16:14:00 -07:00
Hariharan Seshadri
2ba705ed99
Handle nodes with subgraphs in ORT function handling implementation (#2053)
* Initial commit

* Update

* Update

* Nits

* More updates

* to be reverted

* Update

* Update

* More changes

* Updates

* Update Function

* Nits

* Fix build break

* Comment
2019-10-10 16:07:42 -07:00
Pranav Sharma
2d4d0abd36
Add support for output seq(tensor) in python binding and test framework. Implement SequenceConstruct, SequenceEmpty, SequenceInsert and SequenceErase ops. (#2040)
2019-10-10 15:58:49 -07:00
Scott McKay
ddbc2086e4
Add support for opset 11 Clip in optimizers. (#2059) 2019-10-10 10:47:29 -07:00
Yulong Wang
a41c71cbf2
check and fix CUDA kernel launch errors in several OPs (#2047) 2019-10-10 23:47:00 +08:00
baowenlei
b4a98aab78
change MatMulInteger/MatMulInteger16 fallback option (#2064)
* change MatMulInteger/MatMulInteger16 fallback option when no initializer exists

* add AVX option

* fix condition for old machines
2019-10-09 22:03:21 -07:00
Hariharan Seshadri
d186c19c45
Add opset-11 TopK CPU kernel (#1912)
* initial commit

* Update

* Update top_k.cc

* PR comments

* Add more tests

* Update

* Add another test case

* Update

* Resolve conflicts

* Update

* Nits

* Nits

* Nits

* Pick sorted content using 2 different approaches

* Update to logic

* PR comments

* PR feedback

* Update

* Fix build

* Fix build

* Update
2019-10-09 19:09:30 -07:00
Colin Versteeg
8fda6593fe
Update failing tests (#2038)
* Fix failing tests from when they were not enabled

* split into two

* fix failing test
2019-10-09 15:17:21 -07:00
Tracy Sharpe
57e0099425
MLAS: Implement U8S8 GEMV kernels (#2069)
This implements an optimization for U8S8 MlasGemm when M=1, aka GEMV.
2019-10-09 11:54:16 -07:00
Changming Sun
eee9c55030
C++11 fix for memcpy_transformer_test.cc (#2061) 2019-10-09 10:52:10 -07:00
Changming Sun
cefae93305
Add a test case for linearregressor (#1962) 2019-10-09 10:17:08 -07:00
Changming Sun
ccaf692ff2
Run auditwheel for manylinux1 (#2063) 2019-10-09 09:23:00 -07:00
Dmitri Smirnov
cae571c713
Add a test for AVX512 compilation before compiling 512 asm (#2055) 2019-10-08 21:18:04 -07:00
Changming Sun
af8fe0f980
Replace make_unique in cuda_utils.cu (#2052) 2019-10-08 18:32:08 -07:00
Scott McKay
db0dd09ded
Cleanup some aspects of the Initializer class used by optimizers (#2005)
* Move check on data type outside of the Initializer class as it's specific to Conv processing.
Use references for arguments that can't be null.
2019-10-09 10:37:44 +10:00
Changming Sun
a00ca56ae1
Remove gcc from manylinux1 docker image (#2048) 2019-10-08 13:49:15 -07:00
baowenlei
b82de794d5
Weba/update nuphar doc (#2026)
* update nuphar xp doc

* address comments

* address CR

* update doc
2019-10-08 12:41:25 -07:00
RandySheriffH
f501b6e234
pack pyop in nightly build (#2018)
* pack pyop in nightly build

* correct logic

* add comment

* exclude debug build

* add dependency

* reset postbuild rule

* remove dep
2019-10-08 12:02:45 -07:00
Changming Sun
e9bed8b23b
Change python packaging pipeline to use manylinux1 (#2035)
1. Change the python packaging pipeline to use manylinux1
2. Temporarily disable model test in the python pipeline.
2019-10-08 10:03:54 -07:00
Changming Sun
3053af812c
Fix a crash in deep_cpu_gru_op_test.cc (#2028) 2019-10-08 10:03:07 -07:00
Zhang Lei
71b389322e
Implement cuda scatter op. (#1991)
* Implement cuda scatter op.
Disable Invalid Index of Scatter op only for cuda provider.

* Fix type-narrowing warnings that are treated as errors in some pipelines.
2019-10-08 09:53:33 -07:00
Yang Chen
a94c9bd88d
throw exception using dmlc::LogMessageFatal (#2033)
* throw exception using dmlc::LogMessageFatal

On Windows, ORT_THROW couldn't be caught if the exception was thrown from
a jitted function. Let's call dmlc::LogMessageFatal instead.

* address CR

use LOG(FATAL)
2019-10-08 09:31:35 -07:00
Yang Chen
19b0d0af87
Enabled bool input type for Equal for op_ver 11 (#2034)
This change enabled bool type for Equal-11's inputs
2019-10-08 01:50:37 -07:00
Yang Chen
203c2f5b59
updated reduce_ops for op_ver 11 (#2039)
After enabling op_ver 11 for reduce ops, we need to check axes to
make sure it's not empty.
2019-10-08 01:05:05 -07:00
Pranav Sharma
f13b66768a
Fix build for gcc 4.8.5. (#2036) 2019-10-08 00:50:53 -07:00
shahasad
b70fc34fae
Fix C# end to end tests in NuGet pipeline, failing for missing test data file 2019-10-07 20:14:20 -07:00
shahasad
b0feaef9de
Update the C# pretrained model test to include opset9 and 10 models (#2003) 2019-10-07 19:14:34 -07:00
George Wu
0bd807f3b3
trt provider status return cleanup (#2032)
* status and code cleanup.

* revert change. seems like a bug in TRT causes intermittent failure return?
2019-10-07 18:34:48 -07:00
Tianlei Wu
b2c1937523
Add EmbedLayerNormalization and SkipLayerNormalization ops for bert optimization (#2012)
* Add Embed Layer Normalization and Skip Layer Normalization ops for bert optimization.

* add float16 test for skiplayernorm

* Add test for EmbedLayerNormalization op

* fix cpu build error

* fix build warning

* update HasCudaEnvironment function

* handle cuda error
2019-10-07 17:29:43 -07:00
Changming Sun
8f7657fa32
Ignore some gcc warnings (#1996) 2019-10-07 16:32:34 -07:00
Pranav Sharma
ea60469af5
Support seq(tensor), implement 2 sequence ops that use the new type. (#1983)
* Mention OrtCreateSessionFromArray in C API doc

* fix seq of tensors

* changes on 9/30

* All tests passing

* Add SequenceAt op

* Fix shared_lib non_tensor_types test

* Address some PR comments

* Address PR comments

* Add support in python bindings to accept seq(tensor)

* Change data type from vector<Tensor> to TensorSeq

* Change data type from vector<Tensor> to TensorSeq

* Added some documentation

* Added missing test model

* Fix Linux build

* Fix Mac build

* Fix Mac build
2019-10-07 15:35:09 -07:00
Hector Li
00e24ae4fe
refactor Cuda Ops Sum, Max, Min, remove dup code (#1946)
2019-10-07 13:17:49 -07:00
Tianlei Wu
7b39f5090c
Add Attention op for multi-head self attention in BERT (#1984)
* Add Attention op for multi head self attention in BERT

* Add test cases

* Move op from kOnnxDomain to kMSDomain.
Limit test to run by CUDA provider only.

* fix test

* Add float16 test

* fix cpu build error

* handle cuda error

* get last cuda error when failed
2019-10-07 12:22:54 -07:00
Yang Chen
7d2f0c79bd
Bumped up to op_ver 11 for a bunch of Nuphar Ops (#2025)
This change enabled op_ver 11 for a dozen Nuphar Ops
2019-10-07 10:34:05 -07:00
Changming Sun
3c26ae5b6d
ThreadPool fix for roialign and CropAndResize (#2020) 2019-10-06 22:43:59 -07:00
Pranav Sharma
4cdb95e436
Resort to sequential execution if the inter op thread pool ptr is nullptr; (#2023) 2019-10-06 16:08:41 -07:00
stevenlix
544e53e24e
Update TensorRT to version 6.0.1.5 (#1966)
* remove onnx-tensorrt submodule

* add new onnx-tensorrt submodule (experiment) for trt6

* update engine build for trt6

* update compile and compute for tensorrt6.0

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* switch to onnx-tensorrt master for TensorRT6

* Update tensorrt_execution_provider.cc

* Handle dynamic batch size and add memcpy in TensorRT EP

* update test cases

* Update tensorrt_execution_provider.cc

* update onnx-tensorrt submodule

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.ubuntu_tensorrt

* Update run_dockerbuild.sh

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update concat_op_test.cc

* Update tensorrt_execution_provider.cc

* Upgrade TensorRT to version 6.0.1.5

* Update onnxruntime_providers.cmake

* Update CMakeLists.txt

* Update reduction_ops_test.cc

* Update install_ubuntu.sh

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.tensorrt

* Update BUILD.md

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update onnxruntime_providers.cmake

* Update install_ubuntu.sh

* Update install_ubuntu.sh

* Update gemm_test.cc

* Update gather_op_test.cc

* Update CMakeLists.txt

* Removed submodule

* update onnx-tensorrt submodule

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Add Ubuntu18.04 build option

* Remove redundancy

* Fix issue where the memcpy node is not added correctly if some nodes fall back to CUDA EP.
e.g. after partition, if there's TRT_Node -> Cuda_node (with CPU memory expected), we still need to add a memcpy node between them.

* update for Trt Windows build

* Update onnxruntime_providers.cmake

* Disable opset11 tests on TensorRT

* Update pad_test.cc

* Update build.py

* update scripts for ubuntu18.04

* Disable warning for Windows build
2019-10-06 10:40:53 -07:00
baowenlei
4bb6385dca
Weba/merge ngemm (#2021)
* save status: add tiling layout; add avx512 skylake cpuid info

* unit tests and matmul integer model passed on skylake, need to verify model

* save commit before update master

* fix check

* address comments
2019-10-05 12:09:22 -07:00
Xavier Dupré
0b5aac0a2e
fix python setup (#2022) 2019-10-05 09:46:41 -07:00
Yang Chen
e8285a7996
Added GatherElements to Nuphar (#2016)
* Added GatherElements to Nuphar

This change added GatherElements (op_ver 11) to the Nuphar provider.

* address CR feedback

* create a utility function for accessing index safely

* address more CR

* SafeIndex -> ClampIndex
2019-10-04 23:53:02 -07:00
Colin Versteeg
1ba76c5f74
add support for empty version and score route (#1995) 2019-10-04 22:53:11 -07:00
Changming Sun
a9e04a29b3
Ignore a test: ParallelExecutor.StatusPropagation (#2019) 2019-10-04 22:51:47 -07:00