* Prototype version that demonstrates it can work
* Switched to OrtValue and removed the OrtCustomOpTensor code.
* Support multiple outputs and reading of attributes
* Add custom domain handling to custom ops
* Update documentation
* more wording changes
1. Support the new external data extension in ONNX 1.4 onnx/onnx#678
2. Enable onnxruntime_perf_test in Mac Build
3. move path_lib.h from onnx_test_runner source dir to onnxruntime_framework
4. Enable memory planner for string tensors
5. Make memory planner always enabled, to simplify model loading logic
6. Delete some duplicated code between onnxruntime_perf_test and onnx_test_runner
7. Delete win_getopt_mb lib.
8. Remove the dependency on Pathcch lib, which is only available on Windows 8 and newer.
* Initial commit
* Adding shrink tests
* Fix formatting in shrink_test.cc
* Fix broken build
* More changes
* PR feedback and formatting
* Place files in the right location corresponding to def file location in onnx
* Exclude shrink model test in test_series.py
* Remove shrink from exclusion list in main.cc
* Adding test to exclusion list
* More tests
* Formatting
* PR feedback
* PR feedback
* More changes
* PR feedback
* More changes
* Fix broken build
* Fix nit
* Fix nit
* Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup.
- Create IExecutionFrame
- split out core logic and interface from extended logic used in full Graph execution (that uses allocation plan and memory pattern planner)
- Update NodeIndexInfo to allow contruction from a subset of nodes
- split out logic from GraphNodes into a re-usable template so it can be used with a vector of const Node* as well as a vector of unique_ptr<Node>
- Remove SessionState from OpKernelContext
- Misc cleanups
- move AllocPlanPerValue out of SequentialExecutionPlan as it's used in a more generic manner that isn't specific to a sequential execution plan
NOTE: I manually tested the new paths, especially NodeIndexInfo. There will shortly be optimizers added that use the new infrastucture so they'll get test coverage as part of those changes.
* Fix linux build issue.
Handle graph with no nodes in NodeIndexInfo.
* Add ability to enable caching to the C API, and update the internals to pass the feed names and MLValue instances in vectors so the order is deterministic (so cache entry matching works as expected).
* Address PR comment and don't use 'bool'
* Remove meaningless C# test around duplicate input.
We _could_ check input names for duplicates (previously we did this via the usage of unordered_map), but the system will gracefully handle with the duplicate anyway (will just use the last value provided for the input name).
Based on that, I don't think the cost of checking for duplicates is worth it.
* Fix c-style cast in test_run_options.
* cleaned up the additional header in C-api
* ensure test failure surfaces in the build pipeline
* sanitized runtest.bat
* cleanup unneeded headers
* formatting and typos
* Various optimizations to reduce the setup and execution cost.
Cache information about the feeds and fetches, and any device copies required to execute the graph so we minimize checking for later calls to ExecuteGraph using the same input/output.
- enable use of caching in Loop and Scan
- make use of caching optional for InferenceSession::Run
- handle calls to Run with different feeds and fetches to support scenarios where there may be a truncated sequence in some calls
Take the feed names and MLValue instances as vectors so the order is deterministic.
Add unit tests
Update onnxruntime_perf_test to enable caching.
* Couple of tweaks.
Fix shared library unit test failure.
Attempt to workaround MacOS build failure due to VC++ bug around including reaching scope values in a lambda automatically.
* Rework order of init in Run so we get nice error messages about invalid feed/output names.
* Refine logic around copying MLValue using execution provider so common code can be used. Simplify the logic due to this change.
Split the paths for executing with/without cached info so we can be more const correct with how FeedsFetchesManager is passed in. This makes it clearer when a shared instance can be used due to it being const.
Cache the FeedsFetchesManager instances in the control flow nodes. They can be re-used across calls to Compute.
* Removed unused local variable to fix some builds.
* Fix build issue by cleaning up some more unused params.
* Check names when using cache entry from SessionState. Add unit test.
* support non-tensor types
* support non-tensor types.
* support non-tensor types.
* fix compilation issues
* fix compilation issues
* fix compilation issues
* add test cases
* test cases
* add test cases
* try to fix string test case
* working now
* use allocator (broken)
* string test broken after using allocator
* full working example
* Fix PR comments
* Create a project for graph optimizer.
Move optimizer related code to the folder optimizer.
* Fix build failures.
* rebase and fix build failures.
* fix build failure.
* fix build failure with cuda path.
* fix python build failure.
* Move two transformers(memcpy and insert_cast) from framework to optimizer.
* rebase.
* SessionState should not depend on optimizer.
* Add Recurse method to GraphTransformer.
Move GraphTransformer::Apply to ApplyImpl and make private.
Add non-virtual GraphTransformer::Apply method to handle calling Graph::Resolve in a more consistent manner.
Create MemcpyTransformer GraphTransformer to handle memcpy operations on subgraphs in a more standard way.
* Checkpoint
* Make the subgraph insert less verbose
* Add graph nesting level to transformer ApplyImpl
Tweak cast transformer to recurse nicely and avoid unnecessary Resolve calls by splitting out the duplicate removal into a separate transformer.
Decouple memcpy transformer from ExecutionProviders and minimise what's in the header.
* Recurse into subgraphs inside GraphPartitioner
* Update a couple of new transformers
* Check Recurse return value.
* Cleanup some memory management in inference session by moving some things into SessionState
* Add deleted flag to rewrite rules so we stop processing nodes that are removed.
Remove some (most likely) unnecessary Resolve calls. As we always call Resolve for a graph modified by a transformer there's generally no need for the transformer to do it.
* Minor cleanups.
* Add some extra usage information to the comments in GraphTransformer.
* Address PR comments
* Separate out the NodeArg index information from ExecutionFrame so it is only calculated once.
* Skip copy to/from device if only CPU execution provider is registered.
Cleanups.
* Address PR comments.
Clean up a few areas.
* Fix Linux build error
Rewrite rule that eliminates slice operators when they are redundant (i.e., when they preserve the whole input).
Re-implementation of the identity elimination rule using the latest Graph API.
* refactor kernel registry to make it a little bit more readable.
* update
* update cudaexecutionprovider
* fix build break
* fix comments
* fix build break
* Make OrtAllocator not be reference counted
* Make the allocator interface more type safe
* Fix build break
* Build break fix
* Build break fix
* Mistake in previous build fix.
* Fix review comments + build break
* Missed the export symbols
* C specific error, need 'struct' keyword in one case.
* Function calling OrtReleaseObject instead of OrtReleaseEnv
* merge function compile interface
* fix build error
* fix linux build break
* fix static cast issue; fix clang style
* fix argument change
* use alignment allocation;fix comments in pr
* fix linux break
* apply clang format
* rename according to comments in pr
* rename according to pr comments;remove useless file
* remove the need_compile flag
* avoid passing whole session state
* More API changes, remove 'Inference' from function names. Remove enum values. Make Status match other types.
* Switch to bool instead of int, and remove stdbool
Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.
* Rename ONNXRUNTIME_API_STATUSCALL to ONNXRUNTIME_API_CALL since it's just for calling convention, there's no status return value implied.
* Missed a function
* remove input edges while removing a node.
* put comments on how to remove a node without using resolve.
* fix test failure
* fix test failure.
* fix format
* try to guess and fix mac ci failure.
* change add remove edge api.
* update IR
* update comments
* fix comments.