Commit graph

91 commits

Author SHA1 Message Date
Scott McKay
0ad940027c
Use ConstPointerContainer for Node::ImplicitInputDefs() for better consistency with InputDefs() and OutputDefs(). (#894) 2019-05-01 14:22:28 +10:00
Konstantinos Karanasos
1b7d1f2645
Convert constant folding to a transformer (#866) 2019-04-29 18:12:49 -07:00
Ke Zhang
f39a8d1f59
allow users to set graph inputs and outputs fully. (#905)
* allow users to set graph inputs and outputs fully.

* update

* update the comments of the APIs

* update

* remove commented-out codes.

* fix test failures.

* fix comments.

* add more checks to throw a not-supported exception for now.
2019-04-29 15:58:39 +08:00
ybrnathan
b0a37477db Fix memory corruption issue when CPU->CUDA memcpy is involved (#879) 2019-04-22 20:21:14 -07:00
Yufeng Li
0bf12e9dbf
Add option to enable/disable memory pattern back (#872)
Memory pattern doesn't work for the parallel executor by design. Enabling memory pattern for the parallel executor logs a warning and degrades performance.
Add option to enable/disable memory pattern back.
2019-04-22 13:49:41 -07:00
nivas-x86
a4d7052aeb Add nGraph Execution Provider (#832)
* Add nGraph Execution Provider

* feedback changes 1

* feedback2

* Feedback and upgrade nGraph

* Feedback 4

* Fix CI

* Disable new ops
2019-04-20 17:02:35 -07:00
Konstantinos Karanasos
ada90086f7
More efficient rule-based transformer (#815)
Introduce a quick pre-filtering of rules based on the node op types they are targeting.
The goal is to avoid evaluating all rules for all nodes. Instead, for each node, we will only be evaluating the rules associated with its op type.
2019-04-18 17:10:13 -07:00
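Illustration for #815: the pre-filtering idea can be sketched as an index from op type to the rules that target it, so that a node only triggers the conditions filed under its own op type. This is a minimal sketch, not the onnxruntime implementation; `Node`, `Rule`, `RuleIndex`, and the rule names used with it are hypothetical stand-ins.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; not the onnxruntime types.
struct Node {
  std::string op_type;
  std::string name;
};

struct Rule {
  std::string name;
  std::vector<std::string> target_op_types;    // op types this rule applies to
  std::function<bool(const Node&)> condition;  // full (possibly costly) check
};

class RuleIndex {
 public:
  // File the rule under every op type it targets.
  void Register(const Rule& rule) {
    for (const std::string& op_type : rule.target_op_types) {
      rules_by_op_type_[op_type].push_back(rule);
    }
  }

  // Evaluate only the rules registered for this node's op type.
  // `evaluated` (optional) receives the number of conditions that were run.
  std::vector<std::string> Matching(const Node& node, int* evaluated = nullptr) const {
    std::vector<std::string> matched;
    int count = 0;
    auto it = rules_by_op_type_.find(node.op_type);
    if (it != rules_by_op_type_.end()) {
      for (const Rule& rule : it->second) {
        ++count;
        if (rule.condition(node)) matched.push_back(rule.name);
      }
    }
    if (evaluated != nullptr) *evaluated = count;
    return matched;
  }

 private:
  std::unordered_map<std::string, std::vector<Rule>> rules_by_op_type_;
};
```

With one rule registered for "Identity" and one for "Unsqueeze", calling `Matching` on an "Identity" node runs a single condition instead of both.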
shschaefer
ff253631b5
Enable use of session based threadpool. (#854)
* Enable use of session based threadpool.

* Fix build dir issue
2019-04-18 10:20:46 -07:00
Ke Zhang
951c428ee1
Simplify the validation in Run call (#850)
* Simplify Run()

* remove the lock

* remove a file added wrongly.

* fix tests

* fix c# test
2019-04-18 08:38:17 +08:00
Ke Zhang
41dc3130f5
no need putting initializers (for constant node) into graph inputs. (#665)
* constant node should not be put into graph inputs any more.

* simplify graph input/output set logic.

* refactor comments.

* remove adding initializers as graph inputs when creating graph from scratch.
2019-04-17 07:38:08 +08:00
Ashwini Khade
14d63b5f45
generate transformers bug fix (#838)
* fix graph transformer generation

* add more tests

* cosmetic changes

* more changes per review
2019-04-16 14:10:33 -07:00
Tracy Sharpe
f19d9a4907
Reduce code size of kernel registration (#833)
Some changes that reduce the size of the release onnxruntime.dll by 170KB:

Change the ONNX_OPERATOR_KERNEL macros to not create a unique virtual class per kernel create lambda, but instead use a generic class with the raw function address supplied at BuildCreateKernelInfo time.

Changed the execution providers to use a table driven approach to calling the BuildCreateKernelInfo functions instead of a massive function with construct/call/delete sequences.

The CreateFunc in data_types.h didn't need to be a std::function, eliminating more lambda virtual classes.

N.B. To accommodate MSVC 14.11 toolchain (used for CUDA builds), the operator+() syntax cannot be used to retrieve the raw function address. The older toolchain can't resolve between cdecl/vectorcall and gives up. An explicit cast is needed to help the compiler along.
2019-04-15 16:39:59 -07:00
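Illustration for #833: the two ideas in this commit (a raw function address instead of a per-kernel lambda class, and a table instead of one long registration function) can be sketched roughly as below. This is a simplified sketch under assumptions: `OpKernel`, `AddKernel`, `MulKernel`, `KernelCreateFn`, `CreateKernel`, and `BuildRegistry` are hypothetical names, and the table here holds the create-info entries directly rather than pointers to builder functions.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical kernel interface and kernels, for illustration only.
struct OpKernel {
  virtual ~OpKernel() = default;
  virtual std::string Name() const = 0;
};
struct AddKernel : OpKernel {
  std::string Name() const override { return "Add"; }
};
struct MulKernel : OpKernel {
  std::string Name() const override { return "Mul"; }
};

// A plain function pointer: no per-lambda closure type and no std::function wrapper.
using KernelCreateFn = std::unique_ptr<OpKernel> (*)();

// One generic creator template shared by all kernels.
template <typename T>
std::unique_ptr<OpKernel> CreateKernel() {
  return std::make_unique<T>();
}

struct KernelCreateInfo {
  const char* op_type;
  KernelCreateFn create;
};

// Table-driven registration: one loop over data replaces a long function
// that repeats the same construct/call/delete sequence per kernel.
std::vector<KernelCreateInfo> BuildRegistry() {
  static const KernelCreateInfo kKernels[] = {
      {"Add", &CreateKernel<AddKernel>},
      {"Mul", &CreateKernel<MulKernel>},
  };
  std::vector<KernelCreateInfo> registry;
  for (const KernelCreateInfo& info : kKernels) registry.push_back(info);
  return registry;
}
```

Adding a kernel in this layout means adding one table row, which is where the code-size saving would come from in a design like this.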
Tracy Sharpe
c55e2de593
Status class optimizations (#824)
optimize onnxruntime::common::Status to reduce code size
2019-04-11 21:57:01 -07:00
Ryan Hill
fda1d0dce9
Ryanunderhill/ocr custom op (#744)
* Adding a custom op interface to the C API to remove shared library dependency.
* Remove old custom op test
* Rework how custom ops handle inputs/outputs to enable custom op output shape calculation in the compute method
* Add a nicer C++ API for custom ops and switch the tests to use it.
2019-04-05 18:53:20 -07:00
utsabsingharoy
36ed91ee9f CustomRegistry should use composition instead of inheritance
2019-04-05 14:14:10 -07:00
Konstantinos Karanasos
512cfdd9fe
Generalize node removal (#743)
Generalize node removal method in graph_utils. This is a higher-level method that keeps the graph consistent so that no Resolve is needed after the removal of a node. 
The new method supports the removal of nodes with a single input (be it an incoming node or an initializer) and a single output (but allowing multiple output edges of that output). It also takes into account the case that one of the output edges is fed to a subgraph.
Also updated the rewrite rules to use this new, less restrictive method, and improved the rules' conditions. Introduced a GraphEdge struct to simplify various methods in graph_utils.
2019-04-03 22:34:20 -07:00
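Illustration for #743: removing a node with a single input while keeping the graph consistent amounts to reconnecting the node's producer to every consumer of its output. The sketch below shows only that rewiring on a toy edge list; it ignores initializers and subgraphs, which the commit also handles, and `Edge` and `RemoveNodeAndRewire` are hypothetical names rather than graph_utils functions.

```cpp
#include <vector>

// Hypothetical minimal graph: nodes are integer ids, data flows src -> dst.
struct Edge {
  int src;
  int dst;
};

// Removes `node` and reconnects its single producer to each of its consumers.
// Returns false and leaves `edges` untouched unless the node has exactly one
// incoming edge.
bool RemoveNodeAndRewire(std::vector<Edge>& edges, int node) {
  std::vector<int> producers;
  std::vector<int> consumers;
  for (const Edge& e : edges) {
    if (e.dst == node) producers.push_back(e.src);
    if (e.src == node) consumers.push_back(e.dst);
  }
  if (producers.size() != 1) return false;

  // Keep every edge that does not touch the removed node.
  std::vector<Edge> rewired;
  for (const Edge& e : edges) {
    if (e.src != node && e.dst != node) rewired.push_back(e);
  }
  // One output may fan out to several consumers; each gets a direct edge.
  for (int consumer : consumers) rewired.push_back(Edge{producers[0], consumer});
  edges = rewired;
  return true;
}
```

For edges 0->1, 1->2, 1->3, removing node 1 leaves 0->2 and 0->3, so no separate fix-up pass is needed in this toy model.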
Ahmad El Husseini
e643ce0e08 Fix inconsistent dimension data type in C-API (#726)
* update dimension type

* update dimension type for items added after 0.2.1

* fix gpu build
2019-03-29 00:23:25 -07:00
Ashwini Khade
77b981824a
fix graph transformers and refactor tests (#696)
* fix graph transformers and refactor tests

* fix merge master

* Set default optimization level to Level1

* fix build warnings for Linux

* try root cause tensorrt test failures

* try root cause tensorrt test failure

* Test level2 transformers with all CI builds

* remove ConvActivation fusion transformer

* change default level back to level1

* remove providers from apply api

* more changes
2019-03-26 20:38:12 -07:00
Konstantinos Karanasos
a872ba7894
Convert Unsqueeze elimination to rewrite rule + improvements in graph utils and graph transformer utils (#670)
* Convert unsqueeze elimination to rewrite rule

* Simplify the way we register predefined transformers and rules in the inference session (all details are now moved to the graph transformer utils)

* Some reorganization and renaming of methods in graph_utils

* Updates in graph transformers test

* Update in edge removal to not perform unnecessary check of node args that led to race conditions when updating the graph

* Improve documentation for rewrite rules

* Remove top-down rule-based transformer (given we currently have only one type of rule-based transformer)
2019-03-26 13:58:15 -07:00
Changming Sun
a26696fb0e Enable LTO on Linux 2019-03-22 15:30:37 -07:00
Ryan Hill
cd52431b8f
Custom op interface to the C API to remove shared library dependency (#668)
* Adding a custom op interface to the C API to remove shared library dependency.

* Fixup const issues

* Renaming to make things a little simpler

* Add a comment
2019-03-21 15:46:50 -07:00
Pranav Sharma
5d452b3029
Use protobuf-lite to reduce onnxruntime.dll size. (#639)
* Test protobuf-lite

* Test protobuf-lite

* Test protobuf-lite

* Optimize protobuf usage for LITE_RUNTIME to reduce the binary size of
onnxruntime.dll. More details can be found here https://developers.google.com/protocol-buffers/docs/proto.
The reduction is significant. For commit id: 4873b452151bafe49da332aaeab639ef0318fc1ca28d728, the size
reduced by ~700K; from 4873728 to 4172800.

* Add LITE_RUNTIME flag in in.proto files

* Fix merge conflict.

* Address PR comments

* Forgot to add 2 files + fix linux and gpu build errors.

* Fix build errors + test failures

* Fix cuda tests

* Fix tensor rt build

* Use full protobuf for trt

* Address PR comments

* Print tensor shape proto as text string for easier debugging
2019-03-21 14:06:38 -07:00
Scott McKay
a3499083da
Add iterator traits aliases to ConstPointerContainer::ConstIterator (#634)
* Add iterator traits aliases.

* Add a few more pieces to make more compliant with the input iterator requirements.
2019-03-21 15:45:58 +10:00
Ashwini Khade
2f1c3028b7
add capi to set graph optimization level (#657)
* add capi to set graph optimization level

* remove 1 unnecessary check + review comment

* plus updates
2019-03-20 17:14:46 -07:00
Scott McKay
17af8e9ba7
Add subgraph check/update to node removal logic. Fix a few minor issues with Graph that came up during testing of the changes. (#651)
* Check usage of node output as implicit input in any subgraphs.

* Add logic to check/update subgraphs when removing a node.
Fix some issues with Graph
  - Include local outer scope variables when validating. Required if calling Resolve on a subgraph
  - Include outer scope variables in the value info so the type information is captured. Also required to Resolve a subgraph but will detect a type mismatch (previously we threw the type information away).
  - Fix GraphNodes iterator so it can be used with std::find_if. Needed to be assignable so the end_ value can't be const.
2019-03-20 14:57:45 +10:00
Casey Carter
3f52de07c7 Add missing include to status.h
status.h must include <ostream> to use std::ostream.
2019-03-19 11:59:41 -07:00
Ryan Hill
da9af592d9 Remove OrtAppendCustomOpLibPath (#642)
* Remove OrtAppendCustomOpLibPath

* Fix parameter mismatch

* More parameter fixes
2019-03-18 19:44:32 -07:00
Ashwini Khade
481eb971ec
graph transformers update (#608)
* graph transformers update

* some updates

* plus changes

* more updates

* fixes per review comments

* enable tests

* adding more tests

* more changes

* update api in inference session

* changes per review

* Linux CI fix

* fix linux CI failure

* fix MAC CI failure

* more updates

* add more documentation and add level param to register transformer
2019-03-18 14:52:16 -07:00
Scott McKay
971058fc38
Avoid copy of pre-existing value to subgraph output (#637)
* Add AllocKind::kShare to allow copying the MLValue for a pre-existing value to a graph output when an Identity node is involved. Ideally we can make this handling for an Identity node more general purpose, however the current logic to free an MLValue during execution doesn't take into account a re-use point also needing a free. Due to that, limit the scope and start with a somewhat ugly hardcoded approach.

Migrate some changes from PR497

The existing Loop unit tests exercise the new code. Also manually stepped through the problematic model to verify the unnecessary copy was avoided.

* Fix build error

* Fix missing switch case in debug output of allocation plan

* Limit optimization to Loop
2019-03-19 06:55:59 +10:00
stevenlix
e8b0ae8923
Trt execution provider (#382)
* updated cmake files for trt

* added trt execution provider

* added trt basic test

* removed trt_path action attribute

* Add files via upload

* Update build.py

* Update trt_allocator.h

* fixed issues found by reviewers

* changed cast operator

* added comment for custom kernel implementation

* changed auto to auto&

* changed to function compile APIs for TRT execution provider

* changed to function compile APIs for TRT execution provider

* added new DType DInt64

* adapted to the changes of onnxruntime_c_api

* removed trt kernel (use function compile instead)

* updated onnx-tensorrt submodule

* set default memory type to TRT fused kernel

* resolve merge conflict

* fixed the issue that USE_CUDA conflicts with USE_TRT

* construct graph by adding nodes in topological order

* made changes for Windows

* change buffers type

* bypass HasImplementationOf check for TRT XP because TRT kernel is not registered

* added domain to version info in rebuilt model proto

* added trt to test option list

* added DomainToVersionMap() to GraphViewer

* removed Copy()

* fixed broken code

* format the code to clang format

* used local reference to the frequently used values

* fixed a couple of issues according to reviewers feedback

* fixed a couple of issues according to reviewers feedback

* added python binding for TRT and enable use_cuda when use_trt is on

* fixed a redefinition issue

* changed shared_ptr to unique_ptr on trt engines, and made a few changes required by reviewers

* enabled trtexecution provider for unit tests

* renamed trt to tensorrt

* added tensorrt to python binding

* update submodule onnx and onnx-tensorrt

* made a couple of minor changes based on reviewer's feedback

* added CUDA_CHECK

* removed test code

* fixed broken code after merge

* updated onnx-tensorrt submodule

* added post processing to align trt inputs/outputs with graph inputs/outputs

* updated onnx submodule

* added CUDA fallback for TensorRT and fixed TensorRT cmake issue

* added ci pipeline for tensorrt and removed some redundant code from trt xp

* fixed syntax issue

* updated onnx-tensorrt submodule

* fix trt build problem by: (#602)

1. Add additional /wd for debug build
2. Add io.h for additional targets
3. Bring back mb version of getopt

* Update install_ubuntu.sh

* Update linux-gpu-tensorrt-ci-pipeline.yml

* Update linux-gpu-tensorrt-ci-pipeline.yml

* Update run_build.sh

* Update run_build.sh

* Update run_build.sh

* Update run_build.sh

* fixed the issue that GetKernelRegistry returns nullptr

* merged master to this branch

* moved some data types to private

* fixed tensorrt CI pipeline issue

* customized test data for TensorRT pipeline

* added onnx-tensorrt in json file and fixed an issue in ci script

* added comments
2019-03-14 12:00:39 -07:00
Konstantinos Karanasos
2ae83c580c
Constant folding (#168)
Constant folding rewrite rule computes nodes that have only constant inputs at compile time and avoids these computations at run time.
2019-03-13 15:44:26 -07:00
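Illustration for #168: constant folding can be sketched on a toy graph of scalar Add/Mul nodes. Any node whose inputs are all known constants is computed once up front and its result recorded as a new constant; only the other nodes remain for run time. This is a minimal sketch, not the onnxruntime rewrite rule; `Node` and `FoldConstants` are hypothetical names, and real tensors are replaced by `double` values.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy graph node: two scalar inputs, one output.
struct Node {
  std::string op;                   // "Add" or "Mul" in this sketch
  std::vector<std::string> inputs;  // names of input values
  std::string output;               // name of the produced value
};

// Folds nodes whose inputs are all known constants.
// `nodes` must be in topological order. `constants` maps value name -> value
// and gains one entry per folded node. Returns the nodes left for run time.
std::vector<Node> FoldConstants(const std::vector<Node>& nodes,
                                std::map<std::string, double>& constants) {
  std::vector<Node> remaining;
  for (const Node& node : nodes) {
    const bool supported =
        (node.op == "Add" || node.op == "Mul") && node.inputs.size() == 2;
    bool all_constant = supported;
    for (const std::string& input : node.inputs) {
      if (constants.count(input) == 0) all_constant = false;
    }
    if (all_constant) {
      const double a = constants.at(node.inputs[0]);
      const double b = constants.at(node.inputs[1]);
      constants[node.output] = (node.op == "Add") ? a + b : a * b;
    } else {
      remaining.push_back(node);
    }
  }
  return remaining;
}
```

Because the nodes are visited in topological order, a folded output can in turn make a later node foldable in the same pass.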
jignparm
de9f1ff1ff Add new C function OrtOnnxTypeFromTypeInfo (#585) 2019-03-12 10:11:14 -07:00
Changming Sun
3ef273b84b Support memory mapping on Linux 2019-03-11 19:39:02 -07:00
Ryan Hill
af9c554dd3
Ryanunderhill/custom op (#550)
* Prototype version that demonstrates it can work
* Switched to OrtValue and removed the OrtCustomOpTensor code.
* Support multiple outputs and reading of attributes
* Add custom domain handling to custom ops
* Update documentation
* more wording changes
2019-03-06 19:09:55 -08:00
Scott McKay
0e65bfe7ae
Remove caching from InferenceSession::Run (#547)
* Remove caching from InferenceSession::Run

* Fix automatic merge of one file

* trigger rerunning checks
2019-03-06 14:29:42 -08:00
Changming Sun
8e0fff7b8d
Support large model(>2GB) (#520)
1. Support the new external data extension in ONNX 1.4 onnx/onnx#678
2. Enable onnxruntime_perf_test in Mac Build
3. move path_lib.h from onnx_test_runner source dir to onnxruntime_framework
4. Enable memory planner for string tensors
5. Make memory planner always enabled, to simplify model loading logic
6. Delete some duplicated code between onnxruntime_perf_test and onnx_test_runner
7. Delete win_getopt_mb lib.
8. Remove the dependency on Pathcch lib, which is only available on Windows 8 and newer.
2019-03-05 21:27:12 -08:00
Hariharan Seshadri
a697e0b710
Implement Shrink operator (#485)
* Initial commit

* Adding shrink tests

* Fix formatting in shrink_test.cc

* Fix broken build

* More changes

* PR feedback and formatting

* Place files in the right location corresponding to def file location in onnx

* Exclude shrink model test in test_series.py

* Remove shrink from exclusion list in main.cc

* Adding test to exclusion list

* More tests

* Formatting

* PR feedback

* PR feedback

* More changes

* PR feedback

* More changes

* Fix broken build

* Fix nit

* Fix nit
2019-03-01 12:51:22 -08:00
Scott McKay
6c7099a18e
Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup (#498)
* Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup.

- Create IExecutionFrame
  - split out core logic and interface from extended logic used in full Graph execution (that uses allocation plan and memory pattern planner)
- Update NodeIndexInfo to allow construction from a subset of nodes
  - split out logic from GraphNodes into a re-usable template so it can be used with a vector of const Node* as well as a vector of unique_ptr<Node>
- Remove SessionState from OpKernelContext
- Misc cleanups
  - move AllocPlanPerValue out of SequentialExecutionPlan as it's used in a more generic manner that isn't specific to a sequential execution plan

NOTE: I manually tested the new paths, especially NodeIndexInfo. There will shortly be optimizers added that use the new infrastructure so they'll get test coverage as part of those changes.

* Fix linux build issue.
Handle graph with no nodes in NodeIndexInfo.
2019-02-27 15:46:50 -08:00
Scott McKay
dfa21af302
Update C API to allow user to enable caching of feeds and fetches info across calls to Run (#522)
* Add ability to enable caching to the C API, and update the internals to pass the feed names and MLValue instances in vectors so the order is deterministic (so cache entry matching works as expected).

* Address PR comment and don't use 'bool'

* Remove meaningless C# test around duplicate input.

We _could_ check input names for duplicates (previously we did this via the usage of unordered_map), but the system will gracefully handle the duplicate anyway (will just use the last value provided for the input name).

Based on that, I don't think the cost of checking for duplicates is worth it.

* Fix c-style cast in test_run_options.
2019-02-27 13:41:17 -08:00
shahasad
f9bae489bd
cleanup extra header from c api and sanitize C api test (#517)
* cleaned up the additional header in C-api

* ensure test failure surfaces in the build pipeline

* sanitized runtest.bat

* cleanup unneeded headers

* formatting and typos
2019-02-24 21:06:54 -08:00
Scott McKay
5171e8b129
Make IExecutionProvider::Type return const std::string& instead of a new string. (#506)
Store the type string in IExecutionProvider so that Type() doesn't need to be a virtual.
2019-02-22 18:27:01 +10:00
Changming Sun
b69c834c06 Optimize graph partition 2019-02-20 16:32:04 -08:00
Changming Sun
b02c1d80d4 Fix an SAL annotation in the C API 2019-02-20 12:51:00 -08:00
Scott McKay
fc7185f060
Various optimizations to reduce the setup and device copying cost outside of the call to ExecuteGraph. (#470)
* Various optimizations to reduce the setup and execution cost.

Cache information about the feeds and fetches, and any device copies required to execute the graph so we minimize checking for later calls to ExecuteGraph using the same input/output.
  - enable use of caching in Loop and Scan
  - make use of caching optional for InferenceSession::Run
    - handle calls to Run with different feeds and fetches to support scenarios where there may be a truncated sequence in some calls

Take the feed names and MLValue instances as vectors so the order is deterministic.

Add unit tests

Update onnxruntime_perf_test to enable caching.

* Couple of tweaks.
Fix shared library unit test failure.
Attempt to workaround MacOS build failure due to VC++ bug around including reaching scope values in a lambda automatically.

* Rework order of init in Run so we get nice error messages about invalid feed/output names.

* Refine logic around copying MLValue using execution provider so common code can be used. Simplify the logic due to this change.
Split the paths for executing with/without cached info so we can be more const correct with how FeedsFetchesManager is passed in. This makes it clearer when a shared instance can be used due to it being const.
Cache the FeedsFetchesManager instances in the control flow nodes. They can be re-used across calls to Compute.

* Removed unused local variable to fix some builds.

* Fix build issue by cleaning up some more unused params.

* Check names when using cache entry from SessionState. Add unit test.
2019-02-20 12:12:17 +10:00
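Illustration for #470: the caching described here can be sketched as remembering the name-to-location lookups from the previous call and reusing them when the next call passes the same feed and output names in the same order. This is a rough sketch under assumptions, not the FeedsFetchesManager design; `FeedsFetchesInfo`, `FeedsFetchesCache`, and `rebuilds()` are hypothetical, and device-copy planning is left out.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result of the per-call setup work: where each feed/fetch lives.
struct FeedsFetchesInfo {
  std::vector<std::string> feed_names;
  std::vector<std::string> output_names;
  std::vector<int> feed_indices;
  std::vector<int> output_indices;
};

class FeedsFetchesCache {
 public:
  explicit FeedsFetchesCache(std::map<std::string, int> name_to_index)
      : name_to_index_(std::move(name_to_index)) {}

  // Reuses the stored info when the names match the previous call exactly
  // (vectors compare element by element, so order matters); otherwise
  // redoes the lookups and replaces the stored entry.
  const FeedsFetchesInfo& Get(const std::vector<std::string>& feed_names,
                              const std::vector<std::string>& output_names) {
    if (valid_ && info_.feed_names == feed_names &&
        info_.output_names == output_names) {
      return info_;
    }
    FeedsFetchesInfo fresh;
    fresh.feed_names = feed_names;
    fresh.output_names = output_names;
    for (const std::string& name : feed_names) fresh.feed_indices.push_back(Lookup(name));
    for (const std::string& name : output_names) fresh.output_indices.push_back(Lookup(name));
    info_ = std::move(fresh);
    valid_ = true;
    ++rebuilds_;
    return info_;
  }

  int rebuilds() const { return rebuilds_; }

 private:
  // Throws on an unknown name so invalid feeds/outputs are reported clearly.
  int Lookup(const std::string& name) const {
    auto it = name_to_index_.find(name);
    if (it == name_to_index_.end()) throw std::invalid_argument("unknown name: " + name);
    return it->second;
  }

  std::map<std::string, int> name_to_index_;
  FeedsFetchesInfo info_;
  bool valid_ = false;
  int rebuilds_ = 0;
};
```

Taking the names as vectors rather than an unordered map keeps the comparison deterministic, which is the property the commit message relies on for cache matching.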
Pranav Sharma
9bc6503463
Support non-tensor types in the C API. (#489)
* support non-tensor types

* support non-tensor types.

* support non-tensor types.

* fix compilation issues

* fix compilation issues

* fix compilation issues

* add test cases

* test cases

* add test cases

* try to fix string test case

* working now

* use allocator (broken)

* string test broken after using allocator

* full working example

* Fix PR comments
2019-02-19 14:11:46 -08:00
Changming Sun
d05b74b1b7 Delete Tensor::ShallowCopy 2019-02-12 15:51:36 -08:00
Ke Zhang
fc90a9b2fc
allocator refactor (#467)
* update CPUAllocator.

* onnxruntime

* fix build break

* remove useless subclasses of CPUAllocator.

* refactor to get allocator from executionproviders instead of execution provider.
2019-02-12 14:14:21 -08:00
Hariharan Seshadri
fdd71574d6
misc: Fix comment in op_node_proto_helper (#460)
* Fix comment in op_node_proto_helper

* PR feedback
2019-02-11 14:38:43 -08:00
Changming Sun
4cdb0cbf6e A tiny fix in KernelCreateInfo 2019-02-06 17:59:20 -08:00
Changming Sun
7c70d9349a Fix a bug in execution_provider.cc 2019-02-06 17:08:38 -08:00