Commit graph

436 commits

Author SHA1 Message Date
Dmitri Smirnov
da1cf8fff0
Remove exclusions for Sign operator model tests. (#490) 2019-02-19 11:52:26 -08:00
Faith Xu
54acfc0432 TPN update and link fix (#483)
* TPN update

* Update TPN

* Update link
2019-02-17 22:29:10 -08:00
shahasad
ee702bd288
patched the logic of removing the ._*.onnx file, in case it comes in position other than the first in listdir (#484) 2019-02-15 16:08:20 -08:00
jignparm
1f1dcc352f
Add Native C API test from NuGet (#481)
* Initial check-in of Native Capi tests

* Minor update

* Updated with OrtCreateCpuAllocatorInfo working after including cpu_provider_factory.h

* Minor edit

* Minor update
2019-02-15 13:42:24 -08:00
Randy
2a9a924c23
Add float16 support for fusion (#476)
* Add float16 support for fusion

* update test case

* update test case
2019-02-14 10:01:25 -08:00
liqunfu
9add0e9a9f random generator to continue generating random numbers (#477)
* random generator to continue generating random numbers

* update with reviewer's comments

* update with reviewer's comments, remove an unnecessary change

* random generator to continue generating random numbers, update with reviewer's comments
2019-02-14 13:59:34 +10:00
jignparm
f6ffa1280a
Updated endtoendtests to not copy model files (#479) 2019-02-13 17:43:43 -08:00
Hariharan Seshadri
62532ec1b0
Minor cleanup in TopK operator (#478) 2019-02-13 12:51:05 -08:00
ybrnathan
f2510127a2
Optimize pad performance (#472)
* Optimize pad performance by flattening the innermost axis that has no padding. This significantly reduces the total number of memcpy calls, since memcpy usually happens only along the innermost axis.

For example, a shape of [1,224,224,3] with padding [0,3,3,0,0,3,3,0] can be flattened to [1,224,672] with padding [0,3,9,0,3,9].

With this fix, Pad performance improves by more than 7x for the above example.

* Fix typo in comments of pad performance optimization

* Pass dims as const reference instead of value.

* Fix Linux GPU warning

* Move dim check to Init.
2019-02-12 21:48:54 -08:00
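
The flattening described in the Pad commit above (f2510127a2) can be sketched as follows. This is only an illustration of the idea in the commit message, not the actual onnxruntime C++ code: the function name `flatten_inner_no_pad_axes` is hypothetical, the pads are assumed to use the ONNX layout (all begin values, then all end values), and merging more than one trailing unpadded axis is an assumption that goes beyond the single-axis example in the message.

```python
def flatten_inner_no_pad_axes(shape, pads):
    """Merge trailing axes that have no padding into the axis before them.

    shape: list of ints, e.g. [1, 224, 224, 3]
    pads:  list of 2 * rank ints in ONNX order: begins for every axis,
           followed by ends for every axis.
    Returns (new_shape, new_pads) describing an equivalent, lower-rank pad.
    """
    rank = len(shape)
    begins, ends = list(pads[:rank]), list(pads[rank:])

    # Walk inward from the last axis while the axis is unpadded,
    # accumulating the number of elements that will be merged.
    inner = 1
    axis = rank
    while axis > 1 and begins[axis - 1] == 0 and ends[axis - 1] == 0:
        inner *= shape[axis - 1]
        axis -= 1

    if axis == rank:
        # The innermost axis is padded, so nothing can be flattened.
        return list(shape), list(pads)

    # Fold the unpadded trailing axes into axis `last`; its size and its
    # padding are both scaled by the number of merged elements.
    last = axis - 1
    new_shape = list(shape[:last]) + [shape[last] * inner]
    new_begins = begins[:last] + [begins[last] * inner]
    new_ends = ends[:last] + [ends[last] * inner]
    return new_shape, new_begins + new_ends


if __name__ == "__main__":
    # The example from the commit message.
    print(flatten_inner_no_pad_axes([1, 224, 224, 3], [0, 3, 3, 0, 0, 3, 3, 0]))
```

In row-major layout each padded row of the flattened tensor is one contiguous run, so a copy that previously moved 3 elements at a time can move 672 at a time, which is the reduction in memcpy calls the commit message refers to.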
edgchen1
0a23d23266
Initial implementation of Where op. (#412)
* Implemented Where op.

* Enabled Where op for string T, changed broadcaster to pass scalars by const reference.

* Addressed PR comments.

* Removed where_example from broken tests.

* Removed some Python ONNX backend tests from exclusion pattern.

* Addressed PR comment.

* Fixed Linux build error.

* Added non-Eigen path for non-arithmetic types.

* Use std::is_arithmetic instead of std::is_arithmetic_v.

* Added type_traits include, renamed function.

* Fixed gcc build error.
2019-02-12 16:06:17 -08:00
Changming Sun
d05b74b1b7 Delete Tensor::ShallowCopy 2019-02-12 15:51:36 -08:00
Ke Zhang
fc90a9b2fc
allocator refactor (#467)
* update CPUAllocator.

* onnxruntime

* fix build break

* remove useless subclasses of CPUAllocator.

* refactor to get allocator from executionproviders instead of execution provider.
2019-02-12 14:14:21 -08:00
jignparm
0c4fef9ac2
Jignparm/removemodelcopies (#471)
* Adding initial props file updates to support native projects

* remove unnecessary header files

* removed double backslashes

* only include c api header, drop cxx api

* Remove copying of test models
2019-02-12 13:04:51 -08:00
Raymond Yang
ec8ac04f30
Update cast op to support string <-> numeric (#379)
* Update cast kernel to support to/from string

* Update namespace

* Add support for literal numeric case

* Update to support -INF test

* Update kernel registration for cast

* Update ONNX to 1.4.1

* Update registry api

* Resolve some comments

* Update cast kernel implementation

* Resolve comments

* Fixed test data in onnx

* Update cast kernel implementation

* Resolve PR comments

* Update cast_op.cc

* Update onnx commits info

* Update comments
2019-02-12 10:10:56 -08:00
shahasad
f72474c24b
Updated System requirements in README.md (#466)
* Updated System requirements in README.md

* spell correct
2019-02-12 09:58:20 -08:00
Hector Li
e7c1b774e8
Move build dependencies like setuptools wheel numpy into docker image (#468)
* Move build dependencies like setuptools, wheel, and numpy into the docker image, so they are not installed again on every docker build

* revert the changes in install_deps.sh
2019-02-11 21:29:36 -08:00
Hariharan Seshadri
892c0653cc
TopK Op: Include support for valid axis attribute in implementation (#461) 2019-02-11 16:05:47 -08:00
Hariharan Seshadri
fdd71574d6
misc: Fix comment in op_node_proto_helper (#460)
* Fix comment in op_node_proto_helper

* PR feedback
2019-02-11 14:38:43 -08:00
shahasad
88949485ff
removed MklDnn dependency from C# (#455) 2019-02-11 14:23:09 -08:00
Jesse Benson
e57b5116d6 BrainSlice parameter represents the IP. Update parameter name to match 2019-02-11 13:01:26 -08:00
Dmitri Smirnov
aac711ab2f
Implement Sign operator. (#456)
Implement Sign operator.
2019-02-11 10:25:54 -08:00
Yufeng Li
360fc32db4
compute forward and backward parallel for MLAS and not use_openmp (#457) 2019-02-08 17:20:45 -08:00
jignparm
5d00b8b375
Fix docker gpu test for csharp package cuda 9.1 and 10 (#432)
* Fix docker gpu test for csharp package cuda 9.1 and 10

* correct docker file name
2019-02-08 14:02:38 -08:00
Yufeng Li
7b37dc6105
Enable USE_MKLML_FOR_BLAS (#387)
* Enable USE_MKLML_FOR_BLAS

* add mklml include directory for onnxruntime_provider and onnxruntime_provider_cuda

* add mklml_include_dir to include_directories
2019-02-08 07:14:37 -08:00
Pranav Sharma
db0fde9add
Make USE_MLAS macro conditional on cmake flag for consistency with other options and make it ON by default. It was already enabled by default today. (#454) 2019-02-07 18:33:00 -08:00
Changming Sun
4cdb0cbf6e A tiny fix in KernelCreateInfo 2019-02-06 17:59:20 -08:00
edgchen1
fb04940ad3
Initial implementation of NonZero op. (#437)
Initial implementation of NonZero op.
2019-02-06 17:46:31 -08:00
Changming Sun
7c70d9349a Fix a bug in execution_provider.cc 2019-02-06 17:08:38 -08:00
Dmitri Smirnov
657d46fb3c
Output empty shape scalar for empty input. (#451) 2019-02-06 17:04:19 -08:00
Changming Sun
f20258e9ed Delete dead code 2019-02-06 15:34:41 -08:00
shahasad
8a8d1b0cea
Fix MacOS shared library build (#447)
* try removing the --version-script

* remove --no-undefined flag

* remove the -rpath linker flag

* remove the -rpath linker flag, including the -Wl

* remove the --whole-archive flags

* added -all_load -noall_load flags in place of --whole-archive and --no-whole-archive

* spell correct all-load

* set the MacOS specific cmake configs with if(APPLE) condition

* added --build_shared_lib to mac CI
2019-02-06 15:27:37 -08:00
Hector Li
f14b258a5c
Fix float 16 type support for some CUDA kernels (#436)
* Correct the Consts::Zero & Consts::One for half type

* 1. Fix CreateConstantOnes for the float16 type
2. Add CUDA kernel code in BatchNorm for the float16 type, since there is an issue running cudnnBatchNormalizationForwardInference with float16
3. Add float16 test cases for the Gemm & BatchNorm CUDA kernels only

* Fix build

* fix Linux build

* fix build

* Update the fix for BatchNorm: still use the cuDNN API cudnnBatchNormalizationForwardInference. The root cause is that, for the half type, alpha, beta, scale, B, mean, and var should be of float type.

* fix build

* enable 2 fp16 models for GPU test

* enable fp16 test for MaxPool

* Need to adjust per_sample_tolerance configuration in the model test
2019-02-06 14:17:36 -08:00
Changming Sun
5866e853c4 Add dev notes 2019-02-06 14:10:48 -08:00
Raymond Yang
7cd393d697
Fix 3.7 build; Add cuda version in README (#427) 2019-02-06 13:38:04 -08:00
Weixing Zhang
b29c6e48b4
The files graph_transformer.h and rewrite_rule.h have been moved. (#446) 2019-02-06 13:30:39 -08:00
Changming Sun
405c4bacbc Fix a bug in SessionState 2019-02-06 13:28:03 -08:00
Dmitri Smirnov
c932ab8e99
Implement ConstantOfShape (#443)
Implement ConstantOfShape
2019-02-06 11:38:22 -08:00
stevenlix
4038db14e2
update trt due to removing reference counting API changes (#444) 2019-02-06 10:49:00 -08:00
Weixing Zhang
851e291f22
Make OpKernelInfo not depend on SessionState. (#442) 2019-02-05 22:38:50 -08:00
Changming Sun
9faac70dae Delete Tensor's copy constructor 2019-02-05 16:38:27 -08:00
Hariharan Seshadri
d35409f58e Support uint8 datatype for Upsample op in CPU and CUDA providers (#440) 2019-02-05 15:08:52 -08:00
Randy
2062c49033 Rashuai/fix dilation (#415)
* test with conv

* add dilation to shape inferencing

* add test cases

* add test cases
2019-02-04 23:28:27 -08:00
Weixing Zhang
696ab8a194
Create a separate component for graph optimization. (#421)
* Create a project for graph optimizer.

Move optimizer related code to the folder optimizer.

* Fix build failures.

* rebase and fix build failures.

* fix build failure.

* fix build failure with cuda path.

* fix python build failure.

* Move two transformers(memcpy and insert_cast) from framework to optimizer.

* rebase.

* SessionState should not depend on optimizer.
2019-02-04 15:45:12 -08:00
shahasad
737700f94f
fixed the win10 runtime paths to win (#435) 2019-02-04 12:16:53 -08:00
souptc
214c1b88e3 fix brainslice break 2019-02-04 10:28:02 -08:00
jignparm
3b061d60a9
Updating new protobuf generated C# file (#430) 2019-02-01 17:33:46 -08:00
Artem Rudoy
5cac965471
Copy input tensors (#395)
* Copy input tensors

* Check that default CPU execution provider is registered successfully

* Insert Memcpy only when an input is connected to both provider and non-provider nodes.
2019-02-01 14:53:45 -08:00
Changming Sun
ebfed60741 Resync protobuf def 2019-02-01 14:51:58 -08:00
Scott McKay
f85cd520c0
Recurse into subgraphs in transformers and session initialization (#368)
* Add Recurse method to GraphTransformer.
Move GraphTransformer::Apply to ApplyImpl and make private.
Add non-virtual GraphTransformer::Apply method to handle calling Graph::Resolve in a more consistent manner.
Create MemcpyTransformer GraphTransformer to handle memcpy operations on subgraphs in a more standard way.

* Checkpoint

* Make the subgraph insert less verbose

* Add graph nesting level to transformer ApplyImpl
Tweak cast transformer to recurse nicely and avoid unnecessary Resolve calls by splitting out the duplicate removal into a separate transformer.
Decouple memcpy transformer from ExecutionProviders and minimise what's in the header.

* Recurse into subgraphs inside GraphPartitioner

* Update a couple of new transformers

* Check Recurse return value.

* Cleanup some memory management in inference session by moving some things into SessionState

* Add deleted flag to rewrite rules so we stop processing nodes that are removed.
Remove some (most likely) unnecessary Resolve calls. As we always call Resolve for a graph modified by a transformer there's generally no need for the transformer to do it.

* Minor cleanups.

* Add some extra usage information to the comments in GraphTransformer.

* Address PR comments
2019-02-02 06:03:00 +10:00
Ashwini Khade
93bcd9beb6 Type and Shape inference for QuantizeLinear and DeQuantizeLinear Ops (#408)
* Type and Shape inference for QuantizeLinear and DeQuantizeLinear Ops

* removing redundant type checking for some inputs and outputs

* remove unnecessary type check from type inference
2019-02-01 07:59:45 -08:00