* Add string attribute interface for C API.
* Add string attribute interface for C++ API accordingly.
* Update comment to say that string is also valid
* Use INFO instead of WARNING for an unused graph input.
* Drop severity of unused initializer as well
* Update to output a warning level message if removing an initializer that is never used, and an info level message if removing an initializer that optimization has made redundant.
* Now that we check for a constant initializer in an ancestor graph we also need to be able to retrieve and replace that initializer.
Add helpers to do so.
Update optimizers to use the new helpers.
Fix bug in UnsqueezeElimination where it wasn't checking if the initializer it was replacing was constant.
This change integrates the NCHWc support recently added to MLAS into ONNX Runtime. When using "-o 3" optimizations, then the runtime will do a NCHWc layout optimization pass to convert standard ONNX operators such as Conv/MaxPool to the com.microsoft.nchwc domain with weights and biases reordered for speed.
Description:
Disallow overriding an initializer via a graph input if the IR version is < 4. This enforces an implicit assumption that initializers should be treated as constant, and allows constant folding to be done on a model with an older IR version.
Separate constant and overridable initializers so that it's clear which ones constant folding can utilize.
Update Graph to not add all initializers to the graph inputs when the graph is manually created (i.e. not loaded from a GraphProto) and the IR version is >= 4.
Motivation and Context
In order to do constant folding we need to know which initializers can be treated as constant and which are overridable. All initializers were required to have a matching graph input prior to IR version 4, technically making all of them overridable. The intention however was for them to be treated as constants, and this change enforces that intent.
The benefit of doing so is that constant folding will work for models with IR version < 4. The cost is that if someone is actually overriding an initializer they will need to update the IR version of their model to version 4 in order to keep doing so. The belief is that this is a very small subset of usage (e.g. models involving feeding in a truncated sequence) and the cost to update that small subset is warranted by the benefit of constant folding being able to be enabled on all older models without them needing an IR version update.
* init
* Update DNNLibrary
* Update DNNLibrary, set compiler flags, it compiles now
* Add more missing flags, add test
* Update DNNLibrary
* Update Compile method, fix allocator and some other bugs
* Update DNNLibrary
* Implement CopyTensor
* Not delete state explicitly since it is managed by unique_ptr
* Add the missing files when SingleUnitTestProjct is ON
* misc changes
* Fix wrong name in provider factory
* Add my own test
* Update the code of add node into graph, and add the missing initializer into graph
* Fix the bug that re-build the graph produces extra output
* Update DNNLibrary
* Transpose nchw (ONNX) -> nhwc (NNAPI)
* Add license
* Add GetSupportedNodes method (implement it later)
* Rename onnxruntime_nnapi_test->onnxruntime_nnapi_squeezenet_test
* Update squeezenet_test.cpp after rebase master
* Remove squeezenet_test.cpp since it is almost same with the c++ sample
* Update DNNLibrary for GetSupportedNodes
* Update GetSupportedNodes
* Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample"
This reverts commit a97575fd9ff49e50ba1dc8d8154790d8cd86c48d.
* Update DNNLibrary
* Fix multiple outputs bug
* Remove GetKernelRegistry
* Revert "Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample""
This reverts commit 2a0670e9cbf10ea654111ce39e198a4be0ddd838.
* Set default memory type of NNAPI EP
* Add CPUOutput allocator
* Update DNNLibrary for multiple outputs
* Fix bug of nhwc->nchw
* Remove GetExecutionHandle()
* Initial commit for OpenVINO Execution Provider
OpenVINO Execution Provider provides the interface for ONNX Runtime
applications to access Intel's hardware accelerators using Intel's
OpenVINO Toolkit.
* Fixed bug in GetCapability to disable custom ops
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added OPENVINO ci pipeline
Added new pipeline for openvino provider,
made changes to support the docker build and
onnxruntime build with openvino.
Signed-off-by: Luis Daniel Castellanos <luis.daniel.castellanos@intel.com>
* Enabled all unit tests for OpenVINO EP
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed syntax issue in run_docker_build.sh file
* Added missing default OPENVINO_VERSION
Default value for OPENVINO_VERSION env was
missing causing the build to fail
* Added install Model Optimizer deps step
* Fixed python unit tests and some tests from onnx_backend_test_series
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed indentation bug
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled some of the python backend tests for OpenVINO
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled some model tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Remove Duplicate checks for openvino in build.py
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Modified GetCapability for FP16
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled GPU FP32 tests that are not supported
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Convert modelProto to string and use it in compile
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Pass byte-array input args to MO
* Serialized ModelProto passed in-memory to MO
ModelOptimizer python module receives the serialized ModelProto
in-memory.
Uses appropriate ONNX function to load the serialized bytes.
* Make Py_Finalize compatible with older python versions
Also, remove pFunc unassigned variable possibility.
* Fallback if input dims of Matmul is greater than 2
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* fixup: Device #define syntax
* Updated the documentation
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Enable dynamic dim value
* removed commented out code
* Added Dockerfile for openvino EP
Updated instructions on dockerfiles/README.md file
Signed-off-by: Luis Daniel Castellanos <luis.daniel.castellanos@intel.com>
* Disabled fp16_inception_v1 test
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Code formatting with clang-format
Uses style from the .clang-format file in root directory.
* fixup: docker tag and build error fixes
* Heuristics to automatically detect batching
Distributes slices from batch into parallel infer-request objects.
* Handle disabled tests in GetCapability
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled average pool and max pool if ceil_mode is 1
Also dilations are not supported if they are greater than 1
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Unsqueeze int32 test
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* changes to fix output results bug
* Disabled a few C++ unit tests for MYRIAD FP16
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Manually revert '9fe162bb Enable dynamic dim value'
Reverts compile time setting of dynamic shape
Reverting manually due to significantly huge auto-revert conflicts.
* Fixed unused variable warning
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Mul test for GPU_FP16 due to accuracy issue
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* VPU documentation update
* Disabled inception_v1 for MYRIAD and HDDL
*Also disabled few C++ accuracy tests for HDDL
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* updates from upstream
* use the new CustomOpApis for I/O interfacing
* Pass initializers as subgraph meta-def inputs in GetCapability()
Requirement due to API changes introduced with PR# 1019.
* Remove obsolete functions
* Save indexes of graph inputs from fused_node info
Both inputs and initializers are passed as data inputs to the
infer function. To identify only inputs among them, save thier
index info from fused_node in Compile function.
* Documentation changes to enable VPU
* Fix VPU related changes in documentation
* Fix minor changes in documentation
* Fix VPU related changes in documentation
* Use Node.In/OutputDefs() to track graph inputs and outputs.
Don't use graph_viewer's GetInputs() or
GetInputsIncludingInitializers().
* Permit "SAME_UPPER" auto_pad attribute from MaxPool
* Disabled fp16_tiny_yolov2 in onnx model tests
* Updated documentation to include configuration guides for myriad and hddl
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Use 8 Infer requests only for VAD-R
* disable debug prints
* Clang-format source files
* Updated BUILD.md with OpenVINO R5 links
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled same upper python tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Update test exclusion syntax
* Change path of install_onnx.sh
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disable tiny_yolov2 in broken tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Revert "Change path of install_onnx.sh"
This reverts commit ba9db165f3be430f2aff1ef413299ed04637196a.
This change is only required for Intel internal CI pipeline until
the settings are matched with the upstream's CI pipeline.
* Added debug statements for debugging CI error
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Add --build_wheel to linux openvino pipeline
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added -v option to onnx_test_runner for debugging
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed path change patch
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added -c 1 to onnx_test_runner
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Refactor MO python invocation in separate function
Cleans up Model Optimizer python invocation check and conversion
logic. Invokes MO only once in GetCapability() and passes the
IR strings (xml and bin) to the Compiler as meta-def attributes.
* Add comments
* code cleanup and comments
* Code cleanup for GetCapability
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed unnecessary files
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Revert "Added -v option to onnx_test_runner for debugging"
This reverts commit d1dd70938a94d648df1a1dbbc2e48d0b97e49ec8.
* Revert "Added debug statements for debugging CI error"
This reverts commit b86d41afed2aa29c3508155d6f9c8d3a7263cc60.
* incorporate Status Code changes
* ComputeFunc returns Status::OK() on success
* Use test names to disable tests for MYRIAD and VAD-R
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Rename local identifiers from CNNNetwork to OpenVINO network
CNNNetwork is an OpenVINO's API class that represents more than
just convolutional neural networks (CNNs). Renaming helps to avoid
confusion that the API's only support CNN type models.
* Added error message if building on windows
* Removed duplicate option in Cmake
* Removed unnecessary parameters in activation_opt_test
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Refactor Map search and access logic for efficiently and cleanliness.
* use C++ style casts
* Use os.path.join for python directory path operations
* use C++ style casts
* EP classes should use onnxruntime namespace
* Clean up fixes from PR comments
* Don't explicitly shutdown Py interpreter
* Remove debug print statements
Prints will be re-enabled later with a logging mechanism with
debug/verbose printing options.
* Decrement ref counts for used pyObjects
* Restore build instructions for other compilers
Content under the "Using other compilers" section has been
accidentally deleted by a previous commit. Restoring back that
content from the latest upstream repo.
* CMake code cleanup
Code clean up, commenting and formatting of CMake code.
* Don't pass the unused device_info parameter to OpenVINOGraph ctor.
* Add support for multiple I/O data types
Adds support for the following tensor data types for graph inputs
and outputs:
1) float
2) float16
3) int32
4) int16
5) int8
6) uint16
7) uint8
* cleanup setup.py module list definition
* Deduce index of input using tracked input index map
Ignores initializers in case they are ordered before inputs.
* Removed debug statement in MO code
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* PR feedback
* Removed per_sample_tolerance for openvino
* Removed unnecessary disabled tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed debug function
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled tiny_yolo_v2 due to accuracy issues
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Changed the disabled reason for broken tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Reshape with no input
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Python formatting with Autopep8
* Minor fix for MYRIAD devices
* Added zero dimension check
*Removed setting batch size for the network
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Set the threshold to larger value for MNIST
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Removed setting higher threshold in provider_test_utils
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Check for --use_openvino in python wheel setup.py
Add openvino modules to the setup script for building the wheel
package only for --use_openvino a build option.
* Removed nullptr checks for GetNode()
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Add ability to set the session and run logger severity via SessionOptions and RunOptions
Inherit severity from the next logger up if logger severity isn't specified in SessionOptions or RunOptions
Expose ability to set default logger severity in python bindings.
* refactoring the ep codes.
* remove unnecessary lock.
* fix the comment to claim KernelRegistryManager is not thread safe.
* clarify that APIs to add custom op in inferencesession is not thread safe.
More C++ API improvements and cleanup
Add templates to tensor creation
Add run method that allows preallocated outputs
Simplify CreateTensor<T> to multiply by sizeof(T)
Convert io_types code
Optimize away vector copies in Session::Run
* Change function signature
* Convert compute to use custom op style APIs
* Remove dead CustomOp function
* Use CustomOp API in TensorRT EP
* Switch to new API in ngraph
* More C++ API improvements and conversions
* Mark more constructors as explicit
* Fix CSharp function name changes
* Change more test cases to use C++ API
* allow users to set graph inputs and outputs fully.
* update
* update the comments of the APIs
* update
* remove commented-out codes.
* fix test failures.
* fix comments.
* adding more check to throw not support exception right now.
Memory pattern doesn't work for parallel executor by design. Enabling Memory Pattern for parallel executor logs warning and make the perf bad.
Add option to enable/disable memory pattern back.
Introduce a quick pre-filtering of rules based on the node op types they are targeting.
The goal is to avoid evaluating all rules for all nodes. Instead, for each node, we will only be evaluating the rules associated with its op type.
* constant node should not be put into graph inputs any more.
* simplify graph input/output set logic.
* refactor comments.
* remove adding initializers as graph inputs when creating graph from scratch.
Some changes that reduce the size of the release onnxruntime.dll by 170KB:
Change the ONNX_OPERATOR_KERNEL macros to not create a unique virtual class per kernel create lambda, but instead use a generic class with the raw function address supplied at BuildCreateKernelInfo time.
Changed the exceution providers to use a table driven approach to calling the BuildCreateKernelInfo functions instead of a massive function with construct/call/delete sequences.
The CreateFunc in data_types.h didn't need to be a std::function, eliminating more lambda virtual classes.
N.B. To accommodate MSVC 14.11 toolchain (used for CUDA builds), the operator+() syntax cannot be used to retrieve the raw function address. The older toolchain can't resolve between cdecl/vectorcall and gives up. An explicit cast is needed to help the compiler along.
* Adding a custom op interface to the C API to remove shared library dependency.
* Remove old custom op test
* Rework how custom ops handle inputs/outputs to enable custom op output shape calculation in the compute method
* Add a nicer C++ API for custom ops and switch the tests to use it.
Generalize node removal method in graph_utils. This is a higher-level method that keeps the graph consistent so that no Resolve is needed after the removal of a node.
The new method supports the removal of nodes with a single input (be it an incoming node or an initializer) and a single output (but allowing multiple output edges of that output). It also takes into account the case that one of the output edges is fed to a subgraph.
Also updated the rewrite rules to use this new, less restrictive method, and improved the rules' conditions. Introduced a GraphEdge struct to simplify various methods in graph_utils.
* fix graph transformers and refactor tests
* fix merge master
* Set default optimization level to Level1
* fix build warnings for Linux
* try root cause tensorrt test failures
* try root cause tensorrt test failure
* Test level2 transformers with all CI builds
* remove ConvActivation fusion transformer
* change default level back to level1
* remove providers from apply api
* more changes
* Convert unsqueeze elimination to rewrite rule
* Simplify the way we register predefined transformers and rules in the inference session (all details are now moved to the graph transformer utils)
* Some reorganization and renaming of methods in graph_utils
* Updates in graph transformers test
* Update in edge removal to not perform unnecessary check of node args that led to race conditions when updating the graph
* Improve documentation for rewrite rules
* Remove top-down rule-based transformer (given we currently have only one type of rule-based transformer)
* Adding a custom op interface to the C API to remove shared library dependency.
* Fixup const issues
* Renaming to make things a little simpler
* Add a comment
* Test protobuf-lite
* Test protobuf-lite
* Test protobuf-lite
* Optimize protobuf usage for LITE_RUNTIME to reduce the binary size of
onnxruntime.dll. More details can be found here https://developers.google.com/protocol-buffers/docs/proto.
The reduction is significant. For commit id: 4873b452151bafe49da332aaeab639ef0318fc1ca28d728, the size
reduced by ~700K; from 4873728 to 4172800.
* Add LITE_RUNTIME flag in in.proto files
* Fix merge conflict.
* Address PR comments
* Forgot to add 2 files + fix linux and gpu build errors.
* Fix build errors + test failures
* Fix cuda tests
* Fix tensor rt build
* Use full protobuf for trt
* Address PR comments
* Print tensor shape proto as text string for easier debugging
* Check usage of node output as implicit input in any subgraphs.
* Add logic to check/update subgraphs when removing a node.
Fix some issues with Graph
- Include local outer scope variables when validating. Required if calling Resolve on a subgraph
- Include outer scope variables in the value info so the type information is captured. Also required to Resolve a subgraph but will detect a type mismatch (previously we threw the type information away).
- Fix GraphNodes iterator so it can be used with std::find_if. Needed to be assignable so the end_ value can't be const.
* graph transformers update
* some updates
* plus changes
* more updates
* fixes per review comments
* enable tests
* adding more tests
* more changes
* update api in inference sesion
* changes per review
* Linux CI fix
* fix linux CI failure
* fix MAC CI failure
* more updates
* add more documentation and add level param to register transformer
* Add AllocKind::kShare to allow copying the MLValue for a pre-existing value to a graph output when an Identity node is involved. Ideally we can make this handling for an Identity node more general purpose, however the current logic to free an MLValue during execution doesn't take into account a re-use point also needing a free. Due to that, limit the scope and start with a somewhat ugly hardcoded approach.
Migrate some changes from PR497
The existing Loop unit tests exercise the new code. Also manually stepped through the problematic model to verify the unnecessary copy was avoided.
* Fix build error
* Fix missing switch case in debug output of allocation plan
* Limit optimization to Loop
* updated cmake files for trt
* added trt execution provider
* added trt basic test
* removed trt_path action attribute
* Add files via upload
* Update build.py
* Update trt_allocator.h
* fixed issues found by reviewers
* changed cast operator
* added comment for custom kernel implementation
* changed auto to auto&
* changed to function compile APIs for TRT execution provider
* changed to function compile APIs for TRT execution provider
* added new DType DInt64
* adapted to the changes of onnxruntime_c_api
* removed trt kernel (use function compile instead)
* updated onnx-tensorrt submodule
* set default memory type to TRT fused kernel
* resolve merge conflict
* fixed the issue that USE_CUDA conflicts with USE_TRT
* construct graph by adding nodes in topological order
* made changes for Windows
* change buffers type
* bypass HasImplementationOf check for TRT XP because TRT kernel is not registered
* added domain to version info in rebuilt model proto
* added trt to test option list
* added DomainToVersionMap() to GraphViewer
* removed Copy()
* fixed broken code
* format the code to clang format
* used local reference to the frequently used values
* fixed a couple of issues according to reviewers feedback
* fixed a couple of issues according to reviewers feedback
* added python binding for TRT and enable use_cuda when use_trt is on
* fixed a redefinition issue
* changed shared_ptr to unique_ptr on trt engines, and made a few changes required by reviewers
* enabled trtexecution provider for unit tests
* renamed trt to tensorrt
* added tesorrt to python binding
* update submodule onnx and onnx-tensorrt
* made a couple of minor changes based on reviewer's feedback
* added CUDA_CHECK
* removed test code
* fixed broken code after merge
* updated onnx-tensorrt submodule
* added post processing to align trt inputs/outputs with graph inputs/outputs
* updated onnx submodule
* added CUDA fallback for TensorRT and fixed TensorRT cmake issue
* added ci pipeline for tensorrt and removed some redundent code from trt xp
* fixed syntax issue
* updated onnx-tensorrt submodule
* fix trt build problem by: (#602)
1. Add additional /wd for debug build
2. Add io.h for additional targets
3. Bring back mb version of getopt
* Update install_ubuntu.sh
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update run_build.sh
* Update run_build.sh
* Update run_build.sh
* Update run_build.sh
* fixed the issue that GetKernelRegistry returns nullptr
* merged master to this branch
* moved some data types to private
* fixed tensorrt CI pipeline issue
* customized test data for TensorRT pipeline
* added onnx-tensorrt in json file and fixed an issue in ci script
* added comments
* Prototype version that demonstrates it can work
* Switched to OrtValue and removed the OrtCustomOpTensor code.
* Support multiple outputs and reading of attributes
* Add custom domain handling to custom ops
* Update documentation
* more wording changes
1. Support the new external data extension in ONNX 1.4 onnx/onnx#678
2. Enable onnxruntime_perf_test in Mac Build
3. move path_lib.h from onnx_test_runner source dir to onnxruntime_framework
4. Enable memory planner for string tensors
5. Make memory planner always enabled, to simplify model loading logic
6. Delete some duplicated code between onnxruntime_perf_test and onnx_test_runner
7. Delete win_getopt_mb lib.
8. Remove the dependency on Pathcch lib, which is only available on Windows 8 and newer.
* Initial commit
* Adding shrink tests
* Fix formatting in shrink_test.cc
* Fix broken build
* More changes
* PR feedback and formatting
* Place files in the right location corresponding to def file location in onnx
* Exclude shrink model test in test_series.py
* Remove shrink from exclusion list in main.cc
* Adding test to exclusion list
* More tests
* Formatting
* PR feedback
* PR feedback
* More changes
* PR feedback
* More changes
* Fix broken build
* Fix nit
* Fix nit
* Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup.
- Create IExecutionFrame
- split out core logic and interface from extended logic used in full Graph execution (that uses allocation plan and memory pattern planner)
- Update NodeIndexInfo to allow contruction from a subset of nodes
- split out logic from GraphNodes into a re-usable template so it can be used with a vector of const Node* as well as a vector of unique_ptr<Node>
- Remove SessionState from OpKernelContext
- Misc cleanups
- move AllocPlanPerValue out of SequentialExecutionPlan as it's used in a more generic manner that isn't specific to a sequential execution plan
NOTE: I manually tested the new paths, especially NodeIndexInfo. There will shortly be optimizers added that use the new infrastucture so they'll get test coverage as part of those changes.
* Fix linux build issue.
Handle graph with no nodes in NodeIndexInfo.
* Add ability to enable caching to the C API, and update the internals to pass the feed names and MLValue instances in vectors so the order is deterministic (so cache entry matching works as expected).
* Address PR comment and don't use 'bool'
* Remove meaningless C# test around duplicate input.
We _could_ check input names for duplicates (previously we did this via the usage of unordered_map), but the system will gracefully handle with the duplicate anyway (will just use the last value provided for the input name).
Based on that, I don't think the cost of checking for duplicates is worth it.
* Fix c-style cast in test_run_options.
* cleaned up the additional header in C-api
* ensure test failure surfaces in the build pipeline
* sanitized runtest.bat
* cleanup unneeded headers
* formatting and typos
* Various optimizations to reduce the setup and execution cost.
Cache information about the feeds and fetches, and any device copies required to execute the graph so we minimize checking for later calls to ExecuteGraph using the same input/output.
- enable use of caching in Loop and Scan
- make use of caching optional for InferenceSession::Run
- handle calls to Run with different feeds and fetches to support scenarios where there may be a truncated sequence in some calls
Take the feed names and MLValue instances as vectors so the order is deterministic.
Add unit tests
Update onnxruntime_perf_test to enable caching.
* Couple of tweaks.
Fix shared library unit test failure.
Attempt to workaround MacOS build failure due to VC++ bug around including reaching scope values in a lambda automatically.
* Rework order of init in Run so we get nice error messages about invalid feed/output names.
* Refine logic around copying MLValue using execution provider so common code can be used. Simplify the logic due to this change.
Split the paths for executing with/without cached info so we can be more const correct with how FeedsFetchesManager is passed in. This makes it clearer when a shared instance can be used due to it being const.
Cache the FeedsFetchesManager instances in the control flow nodes. They can be re-used across calls to Compute.
* Removed unused local variable to fix some builds.
* Fix build issue by cleaning up some more unused params.
* Check names when using cache entry from SessionState. Add unit test.
* support non-tensor types
* support non-tensor types.
* support non-tensor types.
* fix compilation issues
* fix compilation issues
* fix compilation issues
* add test cases
* test cases
* add test cases
* try to fix string test case
* working now
* use allocator (broken)
* string test broken after using allocator
* full working example
* Fix PR comments
* Create a project for graph optimizer.
Move optimizer related code to the folder optimizer.
* Fix build failures.
* rebase and fix build failures.
* fix build failure.
* fix build failure with cuda path.
* fix python build failure.
* Move two transformers(memcpy and insert_cast) from framework to optimizer.
* rebase.
* SessionState should not depend on optimizer.
* Add Recurse method to GraphTransformer.
Move GraphTransformer::Apply to ApplyImpl and make private.
Add non-virtual GraphTransformer::Apply method to handle calling Graph::Resolve in a more consistent manner.
Create MemcpyTransformer GraphTransformer to handle memcpy operations on subgraphs in a more standard way.
* Checkpoint
* Make the subgraph insert less verbose
* Add graph nesting level to transformer ApplyImpl
Tweak cast transformer to recurse nicely and avoid unnecessary Resolve calls by splitting out the duplicate removal into a separate transformer.
Decouple memcpy transformer from ExecutionProviders and minimise what's in the header.
* Recurse into subgraphs inside GraphPartitioner
* Update a couple of new transformers
* Check Recurse return value.
* Cleanup some memory management in inference session by moving some things into SessionState
* Add deleted flag to rewrite rules so we stop processing nodes that are removed.
Remove some (most likely) unnecessary Resolve calls. As we always call Resolve for a graph modified by a transformer there's generally no need for the transformer to do it.
* Minor cleanups.
* Add some extra usage information to the comments in GraphTransformer.
* Address PR comments
* Separate out the NodeArg index information from ExecutionFrame so it is only calculated once.
* Skip copy to/from device if only CPU execution provider is registered.
Cleanups.
* Address PR comments.
Clean up a few areas.
* Fix Linux build error
Rewrite rule that eliminates slice operators when they are redundant (i.e., when they preserve the whole input).
Re-implementation of the identity elimination rule using the latest Graph API.
* refactor kernel registry to make it a little bit more readable.
* update
* update cudaexecutionprovider
* fix build break
* fix comments
* fix build break
* Make OrtAllocator not be reference counted
* Make the allocator interface more type safe
* Fix build break
* Build break fix
* Build break fix
* Mistake in previous build fix.
* Fix review comments + build break
* Missed the export symbols
* C specific error, need 'struct' keyword in one case.
* Function calling OrtReleaseObject instead of OrtReleaseEnv
* merge function compile interface
* fix build error
* fix linux build break
* fix static cast issue; fix clang style
* fix argument change
* use alignment allocation;fix comments in pr
* fix linux break
* apply clang format
* rename according to comments in pr
* rename according to pr comments;remove useless file
* remove the need_compile flag
* avoid passing whole session state
* More API changes, remove 'Inference' from function names. Remove enum values. Make Status match other types.
* Switch to bool instead of int, and remove stdbool
Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.
* Rename ONNXRUNTIME_API_STATUSCALL to ONNXRUNTIME_API_CALL since it's just for calling convention, there's no status return value implied.
* Missed a function
* remove input edges while removing a node.
* put comments on how to remove a node without using resolve.
* fix test failure
* fix test failure.
* fix format
* try to guess and fix mac ci failure.
* change add remove edge api.
* update IR
* update comments
* fix comments.
* Checkpoint.
* Add doco to graph.h and graph_base.h.
Change NodeConstIterator to return a reference to clearly advertise no nullptr's are going to be returned as it's only iterating valid Nodes.
Fix some code analysis warnings.
* Make a couple of APIs return a reference instead of a pointer as they never return nullptr.
* More doco and some minor naming cleanups.
* Cleanups
Couple more consistency changes.
* Fix CUDA test file
* Fix invalid line.