Commit graph

406 commits

Author SHA1 Message Date
shahasad
8a8d1b0cea
Fix MacOS shared library build (#447)
* try removing the --version-script

* remove --no-undefined flag

* remove the -rpath linker flag

* remove the -rpath linker flag, including the -Wl

* remove the --whole-archive flags

* added -all_load -noall_load flags in place of --whole-archive and --no-whole-archive

* spell correct all-load

* set the MacOS specific cmake configs with if(APPLE) condition

* added --build_shared_lib to mac CI
2019-02-06 15:27:37 -08:00
Hector Li
f14b258a5c
Fix float 16 type support for some CUDA kernels (#436)
* Correct the Consts::Zero & Consts::One for half type

* 1. Fix the CreateConstantOnes for float16 type
2. Add CUDA kernel code in BatchNorm for the float 16 type; there is an issue running cudnnBatchNormalizationForwardInference with float 16
3. Add float 16 test case for Gemm & BatchNorm CUDA kernel only

* Fix build

* fix Linux build

* fix build

* Update the fix for BatchNorm; still use the cuDNN API cudnnBatchNormalizationForwardInference. The root cause is that, for the half type, alpha, beta, scale, B, mean and var should be passed as float type.

* fix build

* enable 2 fp16 models for GPU test

* enable fp16 test for MaxPool

* Need to adjust per_sample_tolerance configuration in the model test
2019-02-06 14:17:36 -08:00
Changming Sun
5866e853c4 Add dev notes 2019-02-06 14:10:48 -08:00
Raymond Yang
7cd393d697
Fix 3.7 build; Add cuda version in README (#427) 2019-02-06 13:38:04 -08:00
Weixing Zhang
b29c6e48b4
The files graph_transformer.h and rewrite_rule.h have been moved. (#446) 2019-02-06 13:30:39 -08:00
Changming Sun
405c4bacbc Fix a bug in SessionState 2019-02-06 13:28:03 -08:00
Dmitri Smirnov
c932ab8e99
Implement ConstantOfShape (#443)
Implement ConstantOfShape
2019-02-06 11:38:22 -08:00
stevenlix
4038db14e2
update trt due to removing reference counting API changes (#444) 2019-02-06 10:49:00 -08:00
Weixing Zhang
851e291f22
Make OpKernelInfo not depend on SessionState. (#442) 2019-02-05 22:38:50 -08:00
Changming Sun
9faac70dae Delete Tensor's copy constructor 2019-02-05 16:38:27 -08:00
Hariharan Seshadri
d35409f58e Support uint8 datatype for Upsample op in CPU and CUDA providers (#440) 2019-02-05 15:08:52 -08:00
Randy
2062c49033 Rashuai/fix dilation (#415)
* test with conv

* add dilation to shape inferencing

* add test cases

* add test cases
2019-02-04 23:28:27 -08:00
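For readers unfamiliar with how dilation enters convolution shape inference (the subject of #415 above), the sketch below uses the standard output-size formula for one spatial axis. It is an illustration only, not the code from the commit; the function name `conv_output_dim` and its parameters are hypothetical.

```python
def conv_output_dim(in_dim, kernel, stride=1, dilation=1, pad_begin=0, pad_end=0):
    """Output size of one spatial axis of a convolution (illustrative helper).

    A dilated kernel covers dilation * (kernel - 1) + 1 input positions,
    so dilation shrinks the output unless padding compensates.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_dim + pad_begin + pad_end - effective_kernel) // stride + 1


# Same 3-wide kernel on a 7-wide input, without and with dilation.
no_dilation = conv_output_dim(7, 3)
with_dilation = conv_output_dim(7, 3, dilation=2)
```

Ignoring the dilation term would report the first value for both cases, which is the kind of mismatch shape inference has to avoid.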
Weixing Zhang
696ab8a194
Create a separate component for graph optimization. (#421)
* Create a project for graph optimizer.

Move optimizer related code to the folder optimizer.

* Fix build failures.

* rebase and fix build failures.

* fix build failure.

* fix build failure with cuda path.

* fix python build failure.

* Move two transformers(memcpy and insert_cast) from framework to optimizer.

* rebase.

* SessionState should not depend on optimizer.
2019-02-04 15:45:12 -08:00
shahasad
737700f94f
fixed the win10 runtime paths to win (#435) 2019-02-04 12:16:53 -08:00
souptc
214c1b88e3 fix brainslice break 2019-02-04 10:28:02 -08:00
jignparm
3b061d60a9
Updating new protobuf generated C# file (#430) 2019-02-01 17:33:46 -08:00
Artem Rudoy
5cac965471
Copy input tensors (#395)
* Copy input tensors

* Check that default CPU execution provider is registered successfully

* Insert Memcpy only when an input is connected to both provider and non-provider nodes.
2019-02-01 14:53:45 -08:00
Changming Sun
ebfed60741 Resync protobuf def 2019-02-01 14:51:58 -08:00
Scott McKay
f85cd520c0
Recurse into subgraphs in transformers and session initialization (#368)
* Add Recurse method to GraphTransformer.
Move GraphTransformer::Apply to ApplyImpl and make private.
Add non-virtual GraphTransformer::Apply method to handle calling Graph::Resolve in a more consistent manner.
Create MemcpyTransformer GraphTransformer to handle memcpy operations on subgraphs in a more standard way.

* Checkpoint

* Make the subgraph insert less verbose

* Add graph nesting level to transformer ApplyImpl
Tweak cast transformer to recurse nicely and avoid unnecessary Resolve calls by splitting out the duplicate removal into a separate transformer.
Decouple memcpy transformer from ExecutionProviders and minimise what's in the header.

* Recurse into subgraphs inside GraphPartitioner

* Update a couple of new transformers

* Check Recurse return value.

* Cleanup some memory management in inference session by moving some things into SessionState

* Add deleted flag to rewrite rules so we stop processing nodes that are removed.
Remove some (most likely) unnecessary Resolve calls. As we always call Resolve for a graph modified by a transformer there's generally no need for the transformer to do it.

* Minor cleanups.

* Add some extra usage information to the comments in GraphTransformer.

* Address PR comments
2019-02-02 06:03:00 +10:00
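The Apply/ApplyImpl split described in #368 above resembles the template-method pattern: a non-virtual entry point calls the overridable implementation and then resolves the graph once if anything changed. The following is a minimal sketch of that idea in Python, not the actual C++ interface; `Graph`, `resolve`, `_apply_impl` and `DropIdentity` are hypothetical stand-ins.

```python
class Graph:
    """Toy graph: a list of op names plus a counter of resolve() calls."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.resolve_calls = 0

    def resolve(self):
        # Stand-in for the expensive re-validation of a modified graph.
        self.resolve_calls += 1


class GraphTransformer:
    def apply(self, graph):
        """Public entry point: run the transformer, resolve only if modified."""
        modified = self._apply_impl(graph, graph_level=0)
        if modified:
            graph.resolve()
        return modified

    def _apply_impl(self, graph, graph_level):
        """Subclasses override this and must not call resolve() themselves."""
        raise NotImplementedError


class DropIdentity(GraphTransformer):
    """Hypothetical transformer that removes Identity nodes."""

    def _apply_impl(self, graph, graph_level):
        kept = [n for n in graph.nodes if n != "Identity"]
        modified = len(kept) != len(graph.nodes)
        graph.nodes = kept
        return modified


g = Graph(["Conv", "Identity", "Relu"])
first = DropIdentity().apply(g)   # modifies the graph, triggers one resolve
second = DropIdentity().apply(g)  # nothing left to remove, no resolve
```

Centralizing the resolve call in the base class is what lets individual transformers drop their own Resolve calls, as the commit message notes.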
Ashwini Khade
93bcd9beb6 Type and Shape inference for QuantizeLinear and DeQuantizeLinear Ops (#408)
* Type and Shape inference for QuantizeLinear and DeQuantizeLinear Ops

* removing redundant type checking for some inputs and outputs

* remove unnecessary type check from type inference
2019-02-01 07:59:45 -08:00
Ashwini Khade
60cdb79204 Enable tests for EyeLike and enable datatypes present in tests (#424)
* Enable tests for EyeLike and enable datatypes present in tests

* fix failure
2019-02-01 00:21:08 -08:00
Changming Sun
9f0298261d Fix a build warning in onnxruntime python extension (#416) 2019-02-01 00:19:41 -08:00
Changming Sun
d369ff945d Re-enable python tests (#419) 2019-02-01 00:18:52 -08:00
Raymond Yang
011a784eaa
Merge back from rel-0.2.1 (#422)
* Addl TPN updates (#403)

* Updated TPN

* Update batch_norm_op_test.cc

* Update ThirdPartyNotices.txt

* Update ThirdPartyNotices.txt

* Update readme with package links

* Update README.md

* Update README.md

* Update README.md

* Merged Ryan and TPN changes into single PR

* minor fix

* added mkldnn to GPU pipeline. Required by C# library as it is the default execution provider

* Bump up version number for 0.2.1 release (#420)
2019-01-31 19:04:33 -08:00
Changming Sun
6ae6853519 Update test data md5 2019-01-31 17:07:25 -08:00
Scott McKay
efb72540be
Separate out constant node index information from ExecutionFrame (#410)
* Separate out the NodeArg index information from ExecutionFrame so it is only calculated once.

* Skip copy to/from device if only CPU execution provider is registered.
Cleanups.

* Address PR comments.
Clean up a few areas.

* Fix Linux build error
2019-02-01 10:55:49 +10:00
Changming Sun
fb7be27096 Update test dataset 2019-01-31 13:11:45 -08:00
Changming Sun
266201de0a limit thread pool size when running mkldnn model tests 2019-01-31 10:18:20 -08:00
DronexAI
70985fa803 Update BUILD.md 2019-01-31 08:36:48 -08:00
Faith Xu
91ffb980a2 Addl TPN updates (#403)
* Updated TPN

* Update batch_norm_op_test.cc

* Update ThirdPartyNotices.txt

* Update ThirdPartyNotices.txt

* Update readme with package links

* Update README.md

* Update README.md

* Update README.md

* Merged Ryan and TPN changes into single PR

* minor fix

* added mkldnn to GPU pipeline. Required by C# library as it is the default execution provider
2019-01-30 17:28:17 -08:00
Konstantinos Karanasos
c76725da2d
Slice elimination rewrite rule; re-implementation of identity elimination using new Graph API (#87)
Rewrite rule that eliminates slice operators when they are redundant (i.e., when they preserve the whole input).
Re-implementation of the identity elimination rule using the latest Graph API.
2019-01-30 16:19:40 -08:00
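The redundancy condition mentioned in #87 above (a slice that preserves the whole input) can be sketched as a simple check. This is an assumption-laden illustration, not the rule's actual implementation: it assumes the input shape is fully known, uses attribute-style `starts`/`ends`/`axes` lists, and the function name `slice_is_redundant` is hypothetical.

```python
def slice_is_redundant(shape, starts, ends, axes=None):
    """Return True if slicing `shape` with the given ranges keeps every element.

    Negative indices count from the end of the axis and out-of-range
    values are clamped to the axis bounds.
    """
    if axes is None:
        axes = range(len(starts))
    for axis, start, end in zip(axes, starts, ends):
        dim = shape[axis]
        if start < 0:
            start += dim
        if end < 0:
            end += dim
        start = max(0, min(start, dim))
        end = max(0, min(end, dim))
        if start != 0 or end != dim:
            return False  # this axis is genuinely cropped
    return True


full = slice_is_redundant((3, 4), starts=[0, 0], ends=[3, 4])
cropped = slice_is_redundant((3, 4), starts=[1], ends=[4], axes=[1])
```

A rewrite rule could use such a predicate to decide whether the Slice node can be removed and its input wired directly to its consumers.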
Changming Sun
59955da188 Add test case for pre-allocated output 2019-01-30 16:04:50 -08:00
Changming Sun
925b3f059c Update cgmanifest.json 2019-01-30 13:10:53 -08:00
Vinitra Swamy
ea94cbf14b
Docker containers for CPU and GPU quickstart (#332)
* Docker Container for CPU Version (Ubuntu 16.04, Python3 Bindings, compatible with Windows-Docker)
* Nvidia-Docker Container for GPU Version (Ubuntu 16.04, CUDA, CUDNN, Python3 Bindings)
* README with Docker quickstart instructions (i.e. docker pull .../onnxruntime:cpu, docker run -it ...)
* Include plans to publish public images (with ONNX Runtime 0.2) on README
2019-01-30 10:58:30 -08:00
Changming Sun
583631adf5 Add cgmanifest.json 2019-01-30 00:46:09 -08:00
Changming Sun
6fc48c60de Add win-ci-pipeline-cg.yml 2019-01-29 20:49:55 -08:00
Tracy Sharpe
20ef8b43a6
Change GEMM kernels to natively handle broader range of row counts (#406)
The outer GEMM loop repeatedly calls the inner GEMM kernel with a row count (the M parameter to GEMM) and the inner kernel decides how many rows it will actually handle. The FMA3 kernel only handled row counts of 1,3,6 to keep code size down. To be competitive however, the FMA3 kernel needs to handle any row count from 1-6.

One example model was issuing a GEMM with M=11 and this had been broken up into 6,3,1,1, but can now be handled as 6,5.

The kernels have been templatized MASM style to avoid the cut/paste code from the original implementation. The Linux variants will be updated after doing some additional work on the MASM variants first.
2019-01-29 17:20:32 -08:00
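The row-count splitting described in #406 above can be illustrated with a small greedy model: the outer loop repeatedly hands the remaining row count to the inner kernel, which takes the largest count it supports. This is a sketch of the arithmetic only, assuming a greedy choice; `split_rows` is a hypothetical name and nothing here reflects the MASM kernels themselves.

```python
def split_rows(m, supported):
    """Split m rows into chunks, always taking the largest supported count.

    `supported` must contain 1, otherwise some remainders cannot be handled.
    """
    sizes = sorted(supported, reverse=True)
    chunks = []
    while m > 0:
        step = next(s for s in sizes if s <= m)
        chunks.append(step)
        m -= step
    return chunks


old_plan = split_rows(11, [1, 3, 6])    # kernel limited to 1, 3 or 6 rows
new_plan = split_rows(11, range(1, 7))  # kernel handles any count from 1 to 6
```

With the restricted set, M=11 needs four inner-kernel calls (6, 3, 1, 1); with the full 1-6 range it needs two (6, 5), matching the example in the commit message.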
shahasad
d75bdc5194
updated C# API doc (#405)
* updated C# API doc

* Add how to run on GPU
2019-01-29 16:37:04 -08:00
Ryan Hill
09806625cf
Rename OrtInitialize to OrtCreateEnv in preparation for future. (#399)
* Rename OrtInitialize to OrtCreateEnv in preparation for future.
Add version number to structures

* Forgot about exports

* Update documentation
2019-01-29 15:03:18 -08:00
Randy
2f73d7abf8
compile with GL/LTCG (#391)
* compile with GL/LTCG

* apply the change to release build

* remove GL/LTCG from release build

* exclude cuda from using GL/LTCG
2019-01-29 14:18:23 -08:00
Yufeng Li
f13b9ac429
Refine word_conv_embedding (#388)
Compute conv with one gemm
2019-01-29 10:35:45 -08:00
Xavier Dupré
439dbbada9
Adds OnnxTransformer to plug onnxruntime in scikit-learn's pipeline (#389)
Useful for transfer learning
2019-01-29 18:51:24 +01:00
jignparm
7c21c15732
Jignparm/addcsharptestgpu (#393)
* add file for csharp test using CUDA docker image

* add gpu csharp end to end test scripts

* uncomment data download

* minor change to kick off ci build

* small change to kick off build

* Add windows GPU test runner script
2019-01-29 09:20:07 -08:00
jignparm
68881fadcd Delay load cudart64 for cpu execution 2019-01-29 08:50:37 -08:00
Scott McKay
d4d3270891
Update transpose logic for Scan 9 outputs so it's the inverse of the input transpose logic. (#401) 2019-01-29 22:17:04 +10:00
Scott McKay
b194b7df0d
Add the ability to use a custom allocator for fetches to avoid unnecessary copies in control flow operators. (#377)
* Add the ability to use a custom allocator for fetches.

Allows control flow nodes to forward the allocation to the control flow op and avoid an unnecessary copy when the subgraph output has a symbolic dimension.

Update Scan and If to use custom allocators when applicable.

* Remove unnecessary forward declaration

* Fix Mac build warnings
2019-01-29 19:48:10 +10:00
Faith Xu
b194d79dfb Third party attribution updates (#398)
* Update ThirdPartyNotices.txt

* Update murmur_hash3.cc

* Update normalizer_test.cc

* Update simple_thread_pool.h

* Update simple_thread_pool.h

* Update ThirdPartyNotices.txt
2019-01-28 23:37:02 -08:00
Scott McKay
8f215b44e0
Refactor InferenceSession::Impl::Load code to remove duplication. (#248)
* Add ability to initialize InferenceSession with a model that is already loaded.

* Cleanup some unnecessary namespace qualifications and some long lines.

* Remove InferenceSession::Initialize(std::shared_ptr<Model>&)

* Remove unit test for init from existing Model instance.
2019-01-29 17:18:38 +10:00
Ashwini Khade
b92bc99861
QLinearConv (#370)
* First draft QLinearConv

* Add shape inference for quantized conv operators

* adding test cases for QLinearConv

* plus minor corrections
2019-01-28 23:13:47 -08:00
shahasad
5ef4c90f1d Make the return namedonnxvalue objects disposable in C# API (#392)
* added the disposablenamedonnxvalue as result container

* C-API related fixes and tensorproto fix

* addressed some of the review comments
2019-01-28 21:40:19 -08:00