Changming Sun
c24d7a8a0a
Update eigen to the latest version ( #1910 )
2019-10-11 10:44:19 -07:00
Scott McKay
bdfff800ea
Move access to intra-op threadpool into OpKernelContext. ( #2091 )
2019-10-11 10:36:20 -07:00
Changming Sun
368bdfd936
Update README.md ( #2070 )
...
Update the vcredist package link
Note: Visual C++ 2015, 2017 and 2019 all share the same redistributable files.
2019-10-11 10:06:50 -07:00
Hector Li
3b335c933f
Fix issue where TRT does not work for devices other than device id 0
...
Fix issue where TRT does not work for devices other than device id 0. The allocation planner needs to get the default allocator to allocate memory for graph input data. (#2094 )
2019-10-11 09:22:25 -07:00
Scott McKay
ffb94fd170
Fix bug with delayed allocation of If and Scan outputs. ( #2024 )
...
* Fix bug with delayed allocation of If and Scan outputs.
If the subgraph is producing output on a non-CPU device, the delayed allocation was incorrectly providing a CPU-allocated tensor.
Check for the required location, and update 'fetches' instead if a device copy is needed.
The utils::ExecuteGraph logic will handle the device copy in this case.
2019-10-11 19:49:21 +10:00
Yang Chen
ca1b88c069
Added support to infer Pad11 ( #2085 )
...
* Added support to infer Pad11
* address CR
2019-10-10 23:18:49 -07:00
shahasad
8803f6fff4
C# end to end test fix, and make end to end tests mandatory ( #2079 )
2019-10-10 19:23:43 -07:00
Changming Sun
a314402097
Downgrade python gpu package to CUDA 10.0 ( #2086 )
2019-10-10 18:31:24 -07:00
Dmitri Smirnov
af9dbb70f2
Introduce a separate check and conditional for AVX512BW build ( #2083 )
...
Separate checks for AVX512f and AVX512BW
Make AVX512BW cmake instructions nested within AVX512F support.
2019-10-10 16:14:00 -07:00
Hariharan Seshadri
2ba705ed99
Handle nodes with subgraphs in ORT function handling implementation ( #2053 )
...
* Initial commit
* Update
* Update
* Nits
* More updates
* to be reverted
* Update
* Update
* More changes
* Updates
* Update Function
* Nits
* Fix build break
* Comment
2019-10-10 16:07:42 -07:00
Pranav Sharma
2d4d0abd36
Add support for output seq(tensor) in python binding and test framework. Implement SequenceConstruct, SequenceEmpty, SequenceInsert and SequenceErase ops. ( #2040 )
2019-10-10 15:58:49 -07:00
Scott McKay
ddbc2086e4
Add support for opset 11 Clip in optimizers. ( #2059 )
2019-10-10 10:47:29 -07:00
Yulong Wang
a41c71cbf2
check and fix CUDA kernel launch errors in several OPs ( #2047 )
2019-10-10 23:47:00 +08:00
baowenlei
b4a98aab78
change MatMulInteger/MatMulInteger16 fallback option ( #2064 )
...
* change MatMulInteger/MatMulInteger16 fallback option when no initializer exist
* add AVX option
* fix condition for old machines
2019-10-09 22:03:21 -07:00
Hariharan Seshadri
d186c19c45
Add opset-11 TopK CPU kernel ( #1912 )
...
* initial commit
* Update
* Update top_k.cc
* PR comments
* Add more tests
* Update
* Add another test case
* Update
* Resolve conflicts
* Update
* Nits
* Nits
* Nits
* Pick sorted content using 2 different approaches
* Update to logic
* PR comments
* PR feedback
* Update
* Fix build
* Fix build
* Update
2019-10-09 19:09:30 -07:00
Colin Versteeg
8fda6593fe
Update failing tests ( #2038 )
...
* Fix failing tests from when they were not enabled
* split into two
* fix failing test
2019-10-09 15:17:21 -07:00
Tracy Sharpe
57e0099425
MLAS: Implement U8S8 GEMV kernels ( #2069 )
...
This implements an optimization for U8S8 MlasGemm when M=1, aka GEMV.
2019-10-09 11:54:16 -07:00
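To illustrate the M=1 (GEMV) case mentioned in the MLAS commit above, here is a minimal reference sketch in plain Python. It assumes that "U8S8" means an unsigned 8-bit left operand multiplied by a signed 8-bit right operand with wider integer accumulation, and it ignores any zero-point offsets. `gemv_u8s8` is a hypothetical name; the real MLAS kernels are optimized native code and are not reproduced here.

```python
def gemv_u8s8(a, b, n):
    """Multiply a 1 x K row of uint8 values by a row-major K x N int8 matrix.

    a: list of K ints in [0, 255]
    b: flat list of K * N ints in [-128, 127], stored row by row
    n: number of columns in b
    Returns a list of N integer sums (one per output column).
    """
    k = len(a)
    if len(b) != k * n:
        raise ValueError("b must contain exactly K * N elements")
    out = [0] * n
    for p in range(k):
        av = a[p]
        if not 0 <= av <= 255:
            raise ValueError("a must hold unsigned 8-bit values")
        row = b[p * n:(p + 1) * n]
        for j in range(n):
            if not -128 <= row[j] <= 127:
                raise ValueError("b must hold signed 8-bit values")
            # Accumulate in a wide integer so 8-bit products cannot overflow.
            out[j] += av * row[j]
    return out


if __name__ == "__main__":
    # K = 3, N = 2
    print(gemv_u8s8([1, 2, 255], [1, -1, 2, -2, 3, -128], 2))
```

Because M is 1, the work reduces to a single pass over the rows of the right-hand matrix, which is why a dedicated GEMV path can skip the tiling that a general GEMM needs.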
Changming Sun
eee9c55030
C++11 fix for memcpy_transformer_test.cc ( #2061 )
2019-10-09 10:52:10 -07:00
Changming Sun
cefae93305
Add a test case for linearregressor ( #1962 )
2019-10-09 10:17:08 -07:00
Changming Sun
ccaf692ff2
Run auditwheel for manylinux1 ( #2063 )
2019-10-09 09:23:00 -07:00
Dmitri Smirnov
cae571c713
Add a test for AVX512 compilation before compiling 512 asm ( #2055 )
2019-10-08 21:18:04 -07:00
Changming Sun
af8fe0f980
Replace make_unique in cuda_utils.cu ( #2052 )
2019-10-08 18:32:08 -07:00
Scott McKay
db0dd09ded
Cleanup some aspects of the Initializer class used by optimizers ( #2005 )
...
* Move check on data type outside of the Initializer class as it's specific to Conv processing.
Use references for arguments that can't be null.
2019-10-09 10:37:44 +10:00
Changming Sun
a00ca56ae1
Remove gcc from manylinux1 docker image ( #2048 )
2019-10-08 13:49:15 -07:00
baowenlei
b82de794d5
Weba/update nuphar doc ( #2026 )
...
* update nuphar xp doc
* address comments
* address CR
* update doc
2019-10-08 12:41:25 -07:00
RandySheriffH
f501b6e234
pack pyop in nightly build ( #2018 )
...
* pack pyop in nightly build
* correct logic
* add comment
* exclude debug build
* add dependency
* reset postbuild rule
* remove dep
2019-10-08 12:02:45 -07:00
Changming Sun
e9bed8b23b
Change python packaging pipeline to use manylinux1 ( #2035 )
...
1. Change the python packaging pipeline to use manylinux1
2. Temporarily disable model test in the python pipeline.
2019-10-08 10:03:54 -07:00
Changming Sun
3053af812c
Fix a crash in deep_cpu_gru_op_test.cc ( #2028 )
2019-10-08 10:03:07 -07:00
Zhang Lei
71b389322e
Implement cuda scatter op. ( #1991 )
...
* Implement cuda scatter op.
Disable Invalid Index of Scatter op only for cuda provider.
* Fix type-narrowing warnings that some pipelines treat as errors.
2019-10-08 09:53:33 -07:00
Yang Chen
a94c9bd88d
throw exception using dmlc::LogMessageFatal ( #2033 )
...
* throw exception using dmlc::LogMessageFatal
On Windows, ORT_THROW couldn't be caught if the exception was thrown from
a jitted function. Let's call dmlc::LogMessageFatal instead.
* address CR
use LOG(FATAL)
2019-10-08 09:31:35 -07:00
Yang Chen
19b0d0af87
Enabled bool input type for Equal for op_ver 11 ( #2034 )
...
This change enabled bool type for Equal-11's inputs
2019-10-08 01:50:37 -07:00
Yang Chen
203c2f5b59
updated reduce_ops for op_ver 11 ( #2039 )
...
After enabling op_ver 11 for reduce ops, we need to check axes to
make sure it's not empty.
2019-10-08 01:05:05 -07:00
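As an illustration of the axes check described in the commit above, the sketch below resolves a reduce op's `axes` in plain Python. It assumes the ONNX convention that an empty or missing `axes` means reducing over every dimension, and that opset 11 permits negative axes. `resolve_axes` is a hypothetical helper and is not taken from the Nuphar code, which may handle the empty case differently.

```python
def resolve_axes(axes, rank):
    """Return sorted, non-negative, de-duplicated reduction axes for a tensor of the given rank."""
    if not axes:
        # Assumed ONNX default: no axes given means reduce over all dimensions.
        return list(range(rank))
    resolved = set()
    for axis in axes:
        if not -rank <= axis < rank:
            raise ValueError(f"axis {axis} is out of range for rank {rank}")
        # Python's modulo maps a negative axis to rank + axis.
        resolved.add(axis % rank)
    return sorted(resolved)


if __name__ == "__main__":
    print(resolve_axes([], 3))       # all dimensions
    print(resolve_axes([-1, 0], 3))  # negative axis normalized
```

Handling the empty case up front means the rest of the reduction logic can assume at least one concrete axis whenever the input has rank greater than zero.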
Pranav Sharma
f13b66768a
Fix build for gcc 4.8.5. ( #2036 )
2019-10-08 00:50:53 -07:00
shahasad
b70fc34fae
Fix C# end to end tests in NuGet pipeline, failing for missing test data file
2019-10-07 20:14:20 -07:00
shahasad
b0feaef9de
Update the C# pretrained model test to include opset9 and 10 models ( #2003 )
2019-10-07 19:14:34 -07:00
George Wu
0bd807f3b3
trt provider status return cleanup ( #2032 )
...
* status and code cleanup.
* revert change. seems like a bug in TRT causes intermittent failure return?
2019-10-07 18:34:48 -07:00
Tianlei Wu
b2c1937523
Add EmbedLayerNormalization and SkipLayerNormalization ops for bert optimization ( #2012 )
...
* Add Embed Layer Normalization and Skip Layer Normalization ops for bert optimization.
* add float16 test for skiplayernorm
* Add test for EmbedLayerNormalization op
* fix cpu build error
* fix build warning
* update HasCudaEnvironment function
* handle cuda error
2019-10-07 17:29:43 -07:00
Changming Sun
8f7657fa32
Ignore some gcc warnings ( #1996 )
2019-10-07 16:32:34 -07:00
Pranav Sharma
ea60469af5
Support seq(tensor), implement 2 sequence ops that use the new type. ( #1983 )
...
* Mention OrtCreateSessionFromArray in C API doc
* fix seq of tensors
* changes on 9/30
* All tests passing
* Add SequenceAt op
* Fix shared_lib non_tensor_types test
* Address some PR comments
* Address PR comments
* Add support in python bindings to accept seq(tensor)
* Change data type from vector<Tensor> to TensorSeq
* Change data type from vector<Tensor> to TensorSeq
* Added some documentation
* Added missing test model
* Fix Linux build
* Fix Mac build
* Fix Mac build
2019-10-07 15:35:09 -07:00
Hector Li
00e24ae4fe
refactor Cuda Ops Sum, Max, Min, remove dup code ( #1946 )
2019-10-07 13:17:49 -07:00
Tianlei Wu
7b39f5090c
Add Attention op for multi-head self attention in BERT ( #1984 )
...
* Add Attention op for multi head self attention in BERT
* Add test cases
* Move op from kOnnxDomain to kMSDomain.
Limit test to run by CUDA provider only.
* fix test
* Add float16 test
* fix cpu build error
* handle cuda error
* get last cuda error when failed
2019-10-07 12:22:54 -07:00
Yang Chen
7d2f0c79bd
Bumped up to op_ver 11 for a bunch of Nuphar Ops ( #2025 )
...
This change enabled op_ver 11 for a dozen Nuphar ops
2019-10-07 10:34:05 -07:00
Changming Sun
3c26ae5b6d
ThreadPool fix for roialign and CropAndResize ( #2020 )
2019-10-06 22:43:59 -07:00
Pranav Sharma
4cdb95e436
Resort to sequential execution if the inter op thread pool ptr is nullptr; ( #2023 )
2019-10-06 16:08:41 -07:00
stevenlix
544e53e24e
Update TensorRT to version 6.0.1.5 ( #1966 )
...
* remove onnx-tensorrt submodule
* add new onnx-tensorrt submodule (experiment) for trt6
* update engine build for trt6
* update compile and compute for tensorrt6.0
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* switch to onnx-tensorrt master for TensorRT6
* Update tensorrt_execution_provider.cc
* Handle dynamic batch size and add memcpy in TensorRT EP
* update test cases
* Update tensorrt_execution_provider.cc
* update onnx-tensorrt submodule
* Update Dockerfile.ubuntu_tensorrt
* Update Dockerfile.ubuntu_tensorrt
* Update run_dockerbuild.sh
* Update run_dockerbuild.sh
* Update install_ubuntu.sh
* Update concat_op_test.cc
* Update tensorrt_execution_provider.cc
* Upgrade TensorRT to version 6.0.1.5
* Update onnxruntime_providers.cmake
* Update CMakeLists.txt
* Update reduction_ops_test.cc
* Update install_ubuntu.sh
* Update Dockerfile.ubuntu_tensorrt
* Update Dockerfile.tensorrt
* Update BUILD.md
* Update run_dockerbuild.sh
* Update install_ubuntu.sh
* Update onnxruntime_providers.cmake
* Update install_ubuntu.sh
* Update install_ubuntu.sh
* Update gemm_test.cc
* Update gather_op_test.cc
* Update CMakeLists.txt
* Removed submodule
* update onnx-tensorrt submodule
* Add Ubuntu18.04 build option
* Add Ubuntu18.04 build option
* Add Ubuntu18.04 build option
* Add Ubuntu18.04 build option
* Remove redundancy
* Fix issue where a memcpy node is not added correctly if some nodes fall back to the CUDA EP.
e.g. if, after partitioning, there is TRT_Node -> Cuda_node (with CPU memory expected), we still need to add a memcpy node between them.
* update for Trt Windows build
* Update onnxruntime_providers.cmake
* Disable opset11 tests on TensorRT
* Update pad_test.cc
* Update build.py
* update scripts for ubuntu18.04
* Disable warning for Windows build
2019-10-06 10:40:53 -07:00
baowenlei
4bb6385dca
Weba/merge ngemm ( #2021 )
...
* save status: add tiling layout; add avx512 skylake cpuid info
* unit tests and matmul integer model passed on skylake, need to verify model
* save commit before update master
* fix check
* address comments
2019-10-05 12:09:22 -07:00
Xavier Dupré
0b5aac0a2e
fix python setup ( #2022 )
2019-10-05 09:46:41 -07:00
Yang Chen
e8285a7996
Added GatherElements to Nuphar ( #2016 )
...
* Added GatherElements to Nuphar
This change added GatherElements (op_ver 11) to the Nuphar provider.
* address CR feedback
* create a utility function for accessing index safely
* address more CR
* SafeIndex -> ClampIndex
2019-10-04 23:53:02 -07:00
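The commit above mentions a utility for accessing an index safely, renamed from SafeIndex to ClampIndex. A minimal sketch of what such a helper might do is given below, assuming negative indices wrap around once (as GatherElements in opset 11 allows) and anything still out of range is clamped to the nearest valid position. Both `clamp_index` and `gather_1d` are hypothetical names for illustration; the actual Nuphar implementation is not shown in this log and may behave differently.

```python
def clamp_index(index, dim):
    """Map an index into [0, dim - 1]: wrap a negative index once, then clamp."""
    if dim <= 0:
        raise ValueError("dim must be positive")
    if index < 0:
        index += dim  # e.g. -1 refers to the last element
    return max(0, min(index, dim - 1))


def gather_1d(data, indices):
    """Gather elements of a 1-D list using clamped indices, so no access is out of bounds."""
    return [data[clamp_index(i, len(data))] for i in indices]


if __name__ == "__main__":
    print(gather_1d([10, 20, 30, 40], [0, -1, 7, -9]))
```

Clamping trades an error for a well-defined read, which can matter in generated code where raising an exception is awkward; whether that trade-off matches the original design is an assumption here.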
Colin Versteeg
1ba76c5f74
add support for empty version and score route ( #1995 )
2019-10-04 22:53:11 -07:00
Changming Sun
a9e04a29b3
Ignore a test: ParallelExecutor.StatusPropagation ( #2019 )
2019-10-04 22:51:47 -07:00