112 commits

Author SHA1 Message Date
Dmitri Smirnov
d34fb62012
Introduce container type runtime checks and other improvements (#2522)
Rework TensorSeq in a manner consistent with Tensor and SparseTensor
  in terms of type system setup.
  Reduce templating. Introduce helpers to ensure the same
  data type.
  Make OrtValue __dtor not virtual.
  Introduce ContainerChecker
2019-12-04 16:04:17 -08:00
Hariharan Seshadri
5c2e474751
Add provision in ORT for session options to be parsed when available via model file (#2449)
* Initial commit

* Fix gitmodules

* Nits

* Nits

* Updates

* Update

* More changes

* Updates

* Update

* Some updates

* More changes

* Update

* Update

* Merge

* Update

* Updates

* More changes

* Update

* Fix nits

* Updates

* Fix warning

* Fix build

* Add comment

* PR feedback

* PR feedback

* Updates

* Updates

* Update

* More changes

* Fix build break

* Comment test for now

* Updates

* Updates

* PR feedback

* Updates

* Nits

* Add tests

* Fix build

* Fix build

* Fix build

* Fix build break

* Fix build

* Nits

* PR feedback

* More change

* Expose GetSessionOptions in pybind logic and add unit test for python

* Fix build

* PR feedback

* PR feedback
2019-12-03 16:56:07 -08:00
Jeff Bloomfield
b9faa0b6fd Fix kernel registry validation to reenable DML kernels 2019-12-02 15:43:44 -08:00
Tianlei Wu
e57b735bb9 Add a transformer to use Gelu approximation for cuda provider (#2480)
* Add Gelu Approximation Transformer to convert Gelu or AddGeluFusion to FastGelu to get better inference performance.
2019-11-27 10:15:50 -08:00
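The Gelu approximation named in the commit above can be illustrated with a small sketch. The commit does not say which formula FastGelu uses; the tanh-based approximation below is an assumption (it is the commonly cited one), and the function names are hypothetical.

```python
import math


def gelu(x: float) -> float:
    """Exact Gelu: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def fast_gelu(x: float) -> float:
    """Tanh-based Gelu approximation (assumed form, not confirmed by the commit)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_error(xs) -> float:
    """Largest absolute gap between the exact and approximate values over xs."""
    return max(abs(gelu(x) - fast_gelu(x)) for x in xs)
```

Calling `max_abs_error` over a grid such as `[i / 10 for i in range(-50, 51)]` gives a rough sense of how closely the approximation tracks the exact function.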
Changming Sun
0341ee9060
Clean up build.py (#2446) 2019-11-22 12:14:03 -08:00
Changming Sun
109b3cb450
Avoid using the default logger in the graph lib and optimizers (#2361)
1. Use the session logger if it is available.
2. Don't disable warning 4100 globally. We should fix the warnings instead of disabling them.
2019-11-14 13:23:28 -08:00
Pranav Sharma
7e164eaa6a
Fix reuse logic in allocation planner. (#2393)
* Fix reuse logic in allocation planner.

* PR comments

* Add helpful comments

* Don't allow reuse across string tensors.
2019-11-13 22:51:12 -08:00
KeDengMS
ff64d1f55b
Relax check for optimized model saving (#2291)
So user may save model with layout optimization.
2019-10-30 21:48:49 -07:00
Scott McKay
983a616bda
Revert to using opset 7 as the default for OpTester. Add explanation as to why that is: (#2256) 2019-10-30 09:42:21 +10:00
Scott McKay
20e6a2b6da
Disable optimizers for OpTester operator unit tests (#2237)
* Disable optimizers for operator unit tests as they're intended to test the operator directly rather than something that could have been modified by an optimizer.

Disable TensorRT for Scan9 unit tests that fail when optimizers are enabled. Bug 525222 tracks that.

* Disable TRT for the lenient shape inferencing test as it uses Unsqueeze and TRT doesn't cope with that op.
2019-10-24 11:37:09 -07:00
Ryan Hill
6fca8b0a94 Move CXX API global into the header (#2228) 2019-10-23 14:15:53 -07:00
edgchen1
856c6cae0a Edgchen1/endian utils (#2181) 2019-10-21 22:28:35 -07:00
Scott McKay
3cda9f717b Relax shape inferencing error handling if model uses an old opset (#2199) 2019-10-21 10:51:22 -07:00
Konstantinos Karanasos
33c639a022 Slice elimination support for opsets 10 and 11 (#2171)
* work on slice elimination for opset 10

* more work on slice elimination

* first working version

* adding python notebook for building models; fixing test

* fixing build error in macOS
2019-10-20 01:14:55 -07:00
Changming Sun
13f8b49d58
Fix kernel registry bug (#2137) 2019-10-17 23:10:54 -07:00
Pranav Sharma
91db840b6b
Introduce execution mode enum for clarity and extensibility; Change Python, C and C# APIs accordingly; Removed EnableSequentialExecution, DisableSequentialExecution in favor of the more general SetExecutionMode API. (#2098)
* Introduce execution mode for clarity and extensibility; Change Python APIs accordingly; Replace DisableSequentialExecution API with EnableParallelExecution for clarity.

* Fix cuda build

* Modify the test slightly

* Make C and C# APIs consistent with Python.
2019-10-14 09:48:19 -07:00
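The idea in the commit above, replacing paired enable/disable toggles with a single enum-valued setting, might look roughly like the sketch below. `ExecutionMode`, `SessionOptions`, and `set_execution_mode` are hypothetical stand-ins written for illustration, not the actual onnxruntime bindings, and the sequential default is an assumption.

```python
from enum import Enum


class ExecutionMode(Enum):
    """Hypothetical execution modes; new modes can be added without new toggles."""
    SEQUENTIAL = 0
    PARALLEL = 1


class SessionOptions:
    """Illustrative options object with one enum field instead of two bool setters."""

    def __init__(self) -> None:
        # Assumed default for this sketch.
        self.execution_mode = ExecutionMode.SEQUENTIAL

    def set_execution_mode(self, mode: ExecutionMode) -> None:
        # Reject anything that is not a known mode, e.g. a stray bool or int.
        if not isinstance(mode, ExecutionMode):
            raise TypeError("mode must be an ExecutionMode")
        self.execution_mode = mode
```

A single setter keeps the API surface stable if further modes are introduced later, which is the extensibility argument the commit title makes.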
Ryan Hill
e8e33977da
Ryanunderhill/customop dll (#2002)
* Add OrtApiBase
* Add RegisterCustomOpsLibrary API
2019-10-11 11:12:51 -07:00
Changming Sun
eee9c55030
C++11 fix for memcpy_transformer_test.cc (#2061) 2019-10-09 10:52:10 -07:00
Pranav Sharma
ea60469af5
Support seq(tensor), implement 2 sequence ops that use the new type. (#1983)
* Mention OrtCreateSessionFromArray in C API doc

* fix seq of tensors

* changes on 9/30

* All tests passing

* Add SequenceAt op

* Fix shared_lib non_tensor_types test

* Address some PR comments

* Address PR comments

* Add support in python bindings to accept seq(tensor)

* Change data type from vector<Tensor> to TensorSeq

* Change data type from vector<Tensor> to TensorSeq

* Added some documentation

* Added missing test model

* Fix Linux build

* Fix Mac build

* Fix Mac build
2019-10-07 15:35:09 -07:00
Pranav Sharma
4cdb95e436
Resort to sequential execution if the inter op thread pool ptr is nullptr; (#2023) 2019-10-06 16:08:41 -07:00
Changming Sun
a9e04a29b3
Ignore a test: ParallelExecutor.StatusPropagation (#2019) 2019-10-04 22:51:47 -07:00
daquexian
e071a1249b Android CI (#1600) 2019-10-04 17:39:51 -07:00
Dmitri Smirnov
627f853a44
Downgrade compiler to CentOS 4.8.5 (#1985)
Make onnxruntime CPU build and run on CentOS GCC 4.8.5
2019-10-03 15:40:46 -07:00
shahasad
103b92889e
Opset-11 support (negative axis) for reduce ops (#1929) 2019-10-02 13:45:17 -07:00
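Negative-axis support for reduce ops usually comes down to normalizing each axis against the input rank before reducing. The following is a minimal sketch of that normalization, assuming the accepted range is `[-rank, rank - 1]`; `normalize_axes` is a hypothetical helper, not a function from the codebase.

```python
def normalize_axes(axes, rank):
    """Map possibly negative axes into the range [0, rank - 1]."""
    normalized = []
    for axis in axes:
        if not -rank <= axis < rank:
            raise ValueError(f"axis {axis} is out of range for rank {rank}")
        # A negative axis counts from the last dimension.
        normalized.append(axis + rank if axis < 0 else axis)
    return normalized
```

For a rank-3 input, `normalize_axes([-1, 0], 3)` yields `[2, 0]`.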
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl submodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl due to Linux not being happy.
  Comment out std::stream support because the MacOS std lib is broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
Hariharan Seshadri
aacfa2af65
Bump up ONNX to the latest commit (#1868)
* Initial commit

* Delete unnecessary files

* Update generated proto files

* Update server proto file

* Update submodule onnx

* Update OnnxMl.cs

* update OnnxMl.cs

* Update OnnxMl.cs

* Comment one test

* Update disabled test list

* Update backend tests

* Formatting fix

* Formatting

* Disable a test

* More tests updated

* commit id update

* Update to a newer commit

* More updates

* More test updates

* Update

* Update

* Updates

* Update
2019-09-20 18:15:16 -07:00
Ryan Hill
5781222456
Ryanunderhill/api interface (#1855)
* Convert ABI to a versioned interface.
* Convert ORT_THROW_ON_ERROR to inline function to fix link errors.
2019-09-20 13:39:11 -07:00
Pranav Sharma
a9ce941579
Refine threading control options and move inter op thread pool to session state. (#1841)
Description: Refine threading control options and move inter op thread pool to session state.
Added thread_utils.h/cc to centralize the decision around the thread pool size under various conditions.

Motivation and Context
Currently the thread pool size of the parallel executor is hardcoded to 32 for some reason. This PR makes the options to configure the thread pool sizes clearer.
2019-09-18 22:36:23 -07:00
Changming Sun
dc03ce0278
New OP: CDist (#1808)
Add a new op for scikit-learn converter. It's for scikit's cdist function:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

Will add docs and shape-inference function later.
Will convert it to an ONNX function before pushing into ONNX.
2019-09-17 10:55:31 -07:00
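For readers unfamiliar with `cdist`, the function referenced in the commit above computes a matrix of pairwise distances between the rows of two inputs. The sketch below is a pure-Python illustration of that behavior, not the operator's implementation; the commit does not state which metrics the op supports, so the two handled here are assumptions.

```python
import math


def cdist(a, b, metric="euclidean"):
    """Return a len(a) x len(b) matrix of distances between rows of a and rows of b."""
    if metric not in ("euclidean", "sqeuclidean"):
        raise ValueError(f"unsupported metric: {metric}")
    result = []
    for row_a in a:
        out_row = []
        for row_b in b:
            if len(row_a) != len(row_b):
                raise ValueError("rows must have the same number of columns")
            squared = sum((x - y) ** 2 for x, y in zip(row_a, row_b))
            out_row.append(squared if metric == "sqeuclidean" else math.sqrt(squared))
        result.append(out_row)
    return result
```

With `a = [[0, 0], [1, 1]]` and `b = [[3, 4]]`, the result has two rows and one column, and the first entry is `5.0`.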
Scott McKay
fc1c9971a3
Support replacing OrtValue of feed in IOBinding instance (#1819)
* Reinstate support for replacing a feed in the IOBinding class.
Add unit test to validate.
2019-09-17 07:43:47 +10:00
Pranav Sharma
f8c3442880
Part 2 of renaming AllocatorInfo to MemoryInfo. (#1804)
* Mention OrtCreateSessionFromArray in C API doc

* Part 2 of renaming AllocatorInfo to MemoryInfo.

* pr comments

* fix comment
2019-09-12 08:19:29 -07:00
Pranav Sharma
52fe574fed
Rename OrtAllocatorInfo to OrtMemoryInfo to make it more obvious. (#1758)
* Mention OrtCreateSessionFromArray in C API doc

* Rename OrtAllocatorInfo to OrtMemoryInfo to avoid confusion
2019-09-05 14:20:37 -07:00
KeDengMS
c9240f4e93
Implementation of Nuphar execution provider (#881)
* Implement Nuphar execution provider

Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.

* Fix submodules

* Fix TVM submodule

* Update Nuphar to latest and resolve conflicts

* Remove stale files caused by merge -X theirs

* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test

* Fix bad merge

* Merge from Nuphar

* Fix warning treated as error, revert some unnecessary changes

* Revert some more test changes

* Some more test revert or comments to make review easier
New tests could be added later

* One more revert of unnecessary changes

* More change revert. Test could be added back later.
2019-09-01 23:01:47 -07:00
Pranav Sharma
25d02a33c8
Fix reading of onnx domain causing one of the automl models to break in 0.5 release. (#1694)
* Mention OrtCreateSessionFromArray in C API doc

* Fix registration of Equal op causing one of the automl models to break in 0.5 release.

* updates...
2019-08-29 12:18:39 -07:00
Changming Sun
4de0aa8049
Optimize kernel index (#1672) 2019-08-22 10:26:35 -07:00
Changming Sun
6b89c7ad04
Let mlas use session thread pool (#1609)
1. Let mlas use session thread pool
2. Remove onnxruntime_USE_MLAS cmake option
3. Remove the win32 thread pool code inside mlas

mlas will:

1. use the ort thread pool if one is passed in
2. use openmp if the threadpool parameter is nullptr
3. run single threaded if the threadpool parameter is nullptr and openmp is disabled.
2019-08-16 13:21:15 -07:00
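The three-way fallback described in the commit above can be written as a small decision function. This is a sketch of the selection order only; `pick_threading_backend` and the returned labels are hypothetical names, with `None` standing in for a nullptr thread pool.

```python
def pick_threading_backend(thread_pool, openmp_enabled: bool) -> str:
    """Choose how work is parallelized, following the order listed in the commit."""
    if thread_pool is not None:
        # A session thread pool was passed in, so it takes priority.
        return "ort_thread_pool"
    if openmp_enabled:
        # No pool was passed, but the build has OpenMP available.
        return "openmp"
    # No pool and no OpenMP: run on the calling thread.
    return "single_threaded"
```

Note that the OpenMP setting only matters when no thread pool is supplied.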
Ashwini Khade
0044be6259
update onnx to latest commit (#1622)
* update onnx to latest commit

* Disable and/or fix failing tests

* disable not yet implemented tests for opset 11

* disable tests

* fix bug in mkldnn fp16 graph check
2019-08-15 17:10:32 -07:00
pulkittomar
a50a63aa9e Serialize optimized onnx model (#1470)
* Model serialization

* Removed duplicate symbol

* Minor update

* Review comments

* add tests

* Model serialization

* Removed duplicate symbol

* Minor update

* Merged PR 1106437: Model Serialization in onnxruntime

* Review comments

* Merged PR 1107226: Review comments

Review comments

* add tests

* Fixed merge conflict

* Correct python tests

* InferenceSession Refeed Test

* Replace use of widechar const literal-L

* Fixed failing tests

* Updated comment

* Removed unnecessary session options

* Spell check on comments

* Do not serialize when level 3 optimization specified

* Updated error logs

* Changed log severity to WARN
2019-08-12 18:43:40 -07:00
stevenlix
1c5b15c2b8
Remove memory copy between TensorRT and CUDA (#1561)
* remove memory copy between CUDA and TRT

* add info to RegisterExecutionProvider input

* use new IDeviceAllocator for trt allocator

* remove SetDefaultInputsMemoryType from TRT EP

* remove onnx-tensorrt 5.0

* add submodule onnx-tensorrt branch 5.1

* remove redundancy

* Update transformer_memcpy.cc

* Update tensorrt_execution_provider.cc

* switch to TensorRT 5.1.5.0

* update python binding

* disable failed test case on TensorRT

* Update activation_op_test.cc

* upgrade to TensorRT container 19.06

* update according to feedback

* add comments

* remove tensorrt allocator and use cuda(gpu) allocator

* update onnx-tensorrt submodule

* change ci build cuda directory name
2019-08-08 19:31:39 -07:00
Pranav Sharma
a443b013dd
Remove unneeded C APIs + some refactoring. (#1555)
* Mention OrtCreateSessionFromArray in C API doc

* c api changes after review (1)

* updates...

* fixes

* Reorder include
2019-08-07 11:05:29 -07:00
Scott McKay
9fb8867a24
Don't create implicit input for outer scope value if there is a subgraph input with the same name. (#1186)
* If there is an outer scope value that matches a subgraph input, don't create an implicit input from the outer scope value.

Minor unrelated change for issue noticed while debugging: Use unordered_set for implicit inputs so we don't add them multiple times.

* Add unit test based on onnx issue.
2019-08-02 07:23:41 +10:00
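The rule in the commit above (a subgraph input shadows an outer scope value of the same name, and implicit inputs are collected in a set to avoid duplicates) can be sketched as follows. All names here are hypothetical, and the model of a subgraph as plain name collections is a simplification.

```python
def collect_implicit_inputs(consumed_names, subgraph_inputs, subgraph_produced,
                            outer_scope_values):
    """Return the names a subgraph must pull in implicitly from an outer scope."""
    # Names defined inside the subgraph never need an implicit input.
    local = set(subgraph_inputs) | set(subgraph_produced)
    implicit = set()  # a set, so a name consumed many times is added once
    for name in consumed_names:
        if name in local:
            # A subgraph input (or local output) shadows any outer scope value.
            continue
        if name in outer_scope_values:
            implicit.add(name)
    return implicit
```

In the example `collect_implicit_inputs(["x", "y", "y", "z"], ["x"], ["z"], {"x", "y"})`, only `"y"` is returned: `"x"` is shadowed by the subgraph input and `"z"` is produced locally.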
Yufeng Li
d6a30485be
Rename Tensor.Size() to Tensor.SizeInBytes() (#1502)
Rename Tensor.Size() to Tensor.SizeInBytes()
2019-07-26 14:15:53 -07:00
Hariharan Seshadri
751ee7bb23
Fix bug in TransformerMemcpy (#1413)
* Initial commit

* Add test case

* Revert unintentional change

* Update comments

* Resolve PR feedback

* Craft test case and add more logs

* Fix build failures

* Fix minor bug in the way modified is updated

* Remove full model inference session test

* Resolve PR comments

* Resolve more PR feedback

* Resolve more PR feedback

* Resolve more PR comments

* Remove logging

* Move GetInitializer() method to memcpy_transformer scope

* Remove some unnecessary blank lines

* Make GetInitializer static
2019-07-19 13:54:08 -07:00
Yuan Yu
93fb62bb3e More code cleanup (#1405)
* More code cleanup

* More cleanup
2019-07-17 14:45:50 -07:00
Scott McKay
61b733ce6d
Update optimizers to be able to utilize a constant initializer from an ancestor graph (#1346)
* Now that we check for a constant initializer in an ancestor graph we also need to be able to retrieve and replace that initializer.
Add helpers to do so.
Update optimizers to use the new helpers.
Fix bug in UnsqueezeElimination where it wasn't checking if the initializer it was replacing was constant.
2019-07-15 12:41:01 +10:00
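Retrieving and replacing a constant initializer that lives in an ancestor graph, as the commit above describes, amounts to walking the parent chain. The sketch below assumes a hypothetical `Graph` class with a `parent` link; it ignores shadowing by subgraph inputs, which a real implementation would have to consider.

```python
class Graph:
    """Hypothetical graph with its own constant initializers and an optional parent."""

    def __init__(self, constant_initializers, parent=None):
        self.constant_initializers = dict(constant_initializers)
        self.parent = parent


def find_constant_initializer(graph, name):
    """Return (owning_graph, value) for the nearest definition, or None."""
    current = graph
    while current is not None:
        if name in current.constant_initializers:
            return current, current.constant_initializers[name]
        current = current.parent
    return None


def replace_constant_initializer(graph, name, new_value) -> bool:
    """Replace the initializer in whichever graph owns it; report whether it was found."""
    found = find_constant_initializer(graph, name)
    if found is None:
        return False
    owner, _ = found
    owner.constant_initializers[name] = new_value
    return True
```

Replacing through the owning graph, rather than adding a copy to the subgraph, keeps a single definition of the initializer.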
Ke Zhang
3bf0e364e2
Move CopyTensor out of IExecutionProvider interface. (#1268)
* add ortdevice class

* add data transfer manager for copying tensors.

* update

* add data transfer for gpu

* fix constexpr build break.

* update

* remove unnecessary header files.

* remove unnecessary header files.

* add dependency

* add dependency

* add dependency

* add dependency

* fix linux build break.

* update

* fix build break

* fix build break

* fix build break

* update

* update

* update c api.

* update to not use OrtCreateAllocatorInfo

* change to all EPs.

* fix linux build break

* remove useless codes.

* update

* move datatransfermanager in session state

* update

* fix cuda build break.

* fix comments

* fix windows GPU build.

* fix comments

* fix build break

* fix comments

* fix test failure

* update

* fix comments

* fix onnx runtime server.

* update

* fix test failure.

* fix comments

* fix comment
2019-07-11 14:49:20 -07:00
Scott McKay
ac6a4afb0f
Add validation of shape when re-using a buffer in ExecutionFrame (#1356)
* Check for empty string as dim_param in allocation planner.
* Validate shape is compatible at runtime when re-using Tensor.
2019-07-09 14:59:07 +10:00
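The two checks named in the commit above can be sketched under stated assumptions. The commit does not spell out either rule, so this sketch assumes that an empty `dim_param` means "unknown" and never matches another dimension at planning time, and that runtime compatibility means equal element counts. Both function names are hypothetical.

```python
import math


def same_symbolic_shape(a, b) -> bool:
    """Planning-time check. Dims are ints or dim_param strings; '' never matches."""
    if len(a) != len(b):
        return False
    for dim_a, dim_b in zip(a, b):
        if isinstance(dim_a, str) or isinstance(dim_b, str):
            # An empty dim_param carries no identity, so it cannot prove equality.
            if dim_a == "" or dim_b == "" or dim_a != dim_b:
                return False
        elif dim_a != dim_b:
            return False
    return True


def can_reuse_buffer(buffer_shape, requested_shape) -> bool:
    """Runtime check on concrete shapes (assumed rule: element counts must match)."""
    return math.prod(buffer_shape) == math.prod(requested_shape)
```

Under these assumptions, two shapes that both contain an empty `dim_param` are not considered equal by the planner, and the runtime check acts as a second line of defense once concrete sizes are known.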
Pranav Sharma
e9ce51ead4
Make GetTensorShapeFromTensorShapeProto return TensorShape and not its internal representation. (#1353) 2019-07-08 11:45:55 -07:00
Scott McKay
e3919d3fce
Cleanup naming of test input to use .onnx for models. (#1337)
* Cleanup naming of test input to use .onnx for models.

* Remove file deleted on master
2019-07-04 13:10:29 +10:00
Scott McKay
9d3b6b3a49
Disallow overriding initializers if IR version < 4 (#1324)
Description:

* Disallow overriding an initializer via a graph input if the IR version is < 4. This enforces an implicit assumption that initializers should be treated as constant, and allows constant folding to be done on a model with an older IR version.
* Separate constant and overridable initializers so that it's clear which ones constant folding can utilize.
* Update Graph to not add all initializers to the graph inputs when the graph is manually created (i.e. not loaded from a GraphProto) and the IR version is >= 4.

Motivation and Context
In order to do constant folding we need to know which initializers can be treated as constant and which are overridable. All initializers were required to have a matching graph input prior to IR version 4, technically making all of them overridable. The intention however was for them to be treated as constants, and this change enforces that intent.

The benefit of doing so is that constant folding will work for models with IR version < 4. The cost is that if someone is actually overriding an initializer they will need to update the IR version of their model to version 4 in order to keep doing so. The belief is that this is a very small subset of usage (e.g. models involving feeding in a truncated sequence) and the cost to update that small subset is warranted by the benefit of constant folding being able to be enabled on all older models without them needing an IR version update.
2019-07-03 18:43:38 +10:00
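The split between constant and overridable initializers described in the commit above can be sketched as a small classifier. This assumes that, for IR version 4 and later, an initializer is overridable exactly when a graph input has the same name; `split_initializers` is a hypothetical helper, not the Graph API.

```python
def split_initializers(ir_version, initializer_names, graph_input_names):
    """Return (constant, overridable) initializer names for the given IR version."""
    inputs = set(graph_input_names)
    constant, overridable = [], []
    for name in initializer_names:
        if ir_version >= 4 and name in inputs:
            # A matching graph input lets the caller feed a replacement value.
            overridable.append(name)
        else:
            # IR < 4: always constant, even though a matching input exists.
            constant.append(name)
    return constant, overridable
```

Only the names in the first list would be candidates for constant folding; under this rule a model with IR version 3 has every initializer in that list.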