onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-22 02:30:26 +00:00

Author	SHA1	Message	Date
Scott McKay	d22f6fddf7	Add ability to specify just the device when using IOBinding for an output (#4386 ) * Add ability to specify just the device when using IOBinding for an output. This enables keeping an output on a different device GPU when it has a dynamic size that is not known ahead of graph execution.	2020-07-03 09:26:47 +10:00
Scott McKay	274e6b4153	Cleanup SessionState. Move allocator lookup to SessionState. (#4194 ) * Move allocators to SessionState so they're decoupled from ExecutionProviders - when looking up an allocator it's based on OrtMemoryInfo not the EP so SessionState is a more natural place for that information to be stored - add device based lookup - simplifies logic for copying feeds/fetches across devices Cleanup SessionState and SessionStateInitializer - provide more things to SessionState at construction time so we don't construct an instance and immediately after call a bunch of setters - simplify SessionStateInitializer - reduced down to FinalizeSessionState method	2020-06-28 14:55:42 +10:00
Scott McKay	2fed37c8eb	Fix bug in handling of an initializer that provides a graph output. (#3912 ) * Outputs from model execution should always be returned in a newly allocated buffer or a pre-allocated buffer provided in fetches. When an initializer is providing a graph output (e.g. constant folding may result in this) we were returning an OrtValue that pointed to the initializer and not a separately allocated buffer with a copy. This was wrong as: - value wasn't returned in a pre-allocated fetch so whilst the value returned was correct, it was returned in the wrong place - user could alter the data in the initializer via the returned value * Add unit test with and without pre-allocated fetch. * Add some extra info around why we're handling this special case.	2020-05-12 20:42:58 +10:00
Changming Sun	7c89f38a34	Fix static analysis warnings found by VC++ (#3530 ) 1. Fix static analysis warnings found by VC++ 2. Add a new pipeline for static analysis 3. Merge all the windows CI build into one single yaml file (easier to queue them all). 4. Make DNNL build faster by disabling building the tests and examples. 5. Enable custom op unit test.	2020-04-16 01:46:47 -07:00
Changming Sun	06fc9506fd	Thread pool changes (#3153 ) 1. Copy tensorflow's thread pool class to ORT, so that we can get a better implementation of thread pool based parallelfor 2. Copy Eigen's thread pool class to ORT 3. Support thread affinity 4. Remove RNN kernel’s private thread pool 5. Modify pool kernels to use the thread pool when openmp is disabled.	2020-03-30 12:18:40 -07:00
Pranav Sharma	435f014d71	Add support for sessions to share a global threadpool. (#3177 ) * Add support for sessions to share a global threadpool. * Fix build issues * Add tests, fix build issues. * Added some documentation * Fix centos issue when threadpools become nullptr due to 1 core. * Fix mac and x86 build issues * Address some PR comments * Disabled test for android, added few more tests and addressed more PR comments. * const_cast	2020-03-18 15:42:46 -07:00
edgchen1	37f5fd8fb8	Add support for loading TensorProtos with external data from optimizer Initializer (#3045 ) - Added support for loading TensorProtos with external data from the optimizer Initializer class. - Added some file path utilities.	2020-02-28 13:19:16 -08:00
Changming Sun	201b089a36	Fix some warnings on Windows (#2560 ) 1. Enable warning "4503" # Decorated name length exceeded. 2. Enable warning "4146" # unary minus operator applied to unsigned type. 3. Enable float64 support for the Softmax operator 4. Enable compliance checks for Windows x86 32bits build 5. Use TryBatchParallelFor to replace some fallback code in mlas pooling.cc 6. Fix Android CI pipeline.	2020-01-22 15:59:11 -08:00
Changming Sun	5c391854f4	Upgrade gtest to the latest version (#2827 ) WinML would like to update the googletest submodule. They want some newer features (namely GTEST_SKIP to skip tests programmatically and be able to skip entire fixtures easily) and would need to update the submodule version. However, the new version of the code hits a bug in gcc. The bug is already fixed in the latest gcc, but we're using gcc 4.8.x, which won't be patched, so as a compromise we change our code a little bit to make it work. The gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51213	2020-01-13 20:16:48 -08:00
Changming Sun	109b3cb450	Avoid using the default logger in the graph lib and optimizers (#2361 ) 1. Use the session logger if it is available. 2. Don't disable warning 4100 globally. We should fix the warnings instead of disabling it.	2019-11-14 13:23:28 -08:00
Pranav Sharma	91db840b6b	Introduce execution mode enum for clarity and extensibility; Change Python, C and C# APIs accordingly; Removed EnableSequentialExecution, DisableSequentialExecution in favor of the more general SetExecutionModeAPI. (#2098 ) * Introduce execution mode for clarity and extensibility; Change Python APIs accordingly; Replace DisableSequentialExecution API with EnableParallelExecution for clarity. * Fix cuda build * Modify the test slightly * Make C and C# APIs consistent with Python.	2019-10-14 09:48:19 -07:00
Dmitri Smirnov	627f853a44	Downgrade compiler to CentOS 4.8.5 (#1985 ) Make onnxruntime CPU build and run on CentOS GCC 4.8.5	2019-10-03 15:40:46 -07:00
shahasad	103b92889e	Opset-11 support (negative axis) for reduce ops (#1929 )	2019-10-02 13:45:17 -07:00
Dmitri Smirnov	d1b1cdc5c4	Replace GSL with GSL-LITE submodule and fix up refs (#1920 ) Remove gsl submodule and replace with a local copy of gsl-lite Refactor for onnxruntime::make_unique gsl::span size and index are now size_t Remove lambda auto argument type detection. Remove constexpr from fail_fast in gsl due to Linux not being happy. Comment out std::stream support due to MacOS std lib broken. Move make_unique into include/core/common so it is accessible for server builds. Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc due to x86 build. Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized	2019-10-01 12:43:29 -07:00
Hariharan Seshadri	aacfa2af65	Bump up ONNX to the latest commit (#1868 ) * Initial commit * Delete unnecessary files * Update generated proto files * Update server proto file * Update submodule onnx * Update OnnxMl.cs * update OnnxMl.cs * Update OnnxMl.cs * Comment one test * Update disabled test list * Update backend tests * Formatting fix * Formatting * Disable a test * More tests updated * commit id update * Update to a newer commit * More updates * More test updates * Update * Update * Updates * Update	2019-09-20 18:15:16 -07:00
Pranav Sharma	a9ce941579	Refine threading control options and move inter op thread pool to session state. (#1841 ) Description: Refine threading control options and move inter op thread pool to session state. Added thread_utils.h/cc to centralize the decision around the thread pool size under various conditions. Motivation and Context Currently the thread pool size of the parallel executor is hardcoded to 32 for some reason. This PR makes the options to configure the thread pool sizes clearer.	2019-09-18 22:36:23 -07:00
Pranav Sharma	52fe574fed	Rename OrtAllocatorInfo to OrtMemoryInfo to make it more obvious. (#1758 ) * Mention OrtCreateSessionFromArray in C API doc * Rename OrtAllocatorInfo to OrtMemoryInfo to avoid confusion	2019-09-05 14:20:37 -07:00
Changming Sun	4de0aa8049	Optimize kernel index (#1672 )	2019-08-22 10:26:35 -07:00
Changming Sun	6b89c7ad04	Let mlas use session thread pool (#1609 ) 1. Let mlas use session thread pool 2. Remove onnxruntime_USE_MLAS cmake option 3. Remove the win32 thread pool code inside mlas. mlas will: 1. use ort thread pool if it gets passed in 2. use openmp if the threadpool parameter is nullptr 3. run single threaded if the threadpool parameter is nullptr and openmp is disabled.	2019-08-16 13:21:15 -07:00
Ashwini Khade	0044be6259	update onnx to latest commit (#1622 ) * update onnx to latest commit * Disable and/or fix failing tests * disable not yet implemented tests for opset 11 * disable tests * fix bug in mkldnn fp16 graph check	2019-08-15 17:10:32 -07:00
Scott McKay	ac6a4afb0f	Add validation of shape when re-using a buffer in ExecutionFrame (#1356 ) * Check for empty string as dim_param in allocation planner. * Validate shape is compatible at runtime when re-using Tensor.	2019-07-09 14:59:07 +10:00
Changming Sun	c18de6817b	Rename MLValue to OrtValue (#1154 )	2019-06-03 17:29:55 -07:00
Changming Sun	2663b9c443	Remove unnecessary casts from OrtValue to MLValue (#1051 )	2019-05-17 07:52:59 -07:00
Changming Sun	99556b111d	Make MemPatternPlanner on/off switchable in model weight loading (#989 )	2019-05-16 14:39:09 -07:00
Scott McKay	971058fc38	Avoid copy of pre-existing value to subgraph output (#637 ) * Add AllocKind::kShare to allow copying the MLValue for a pre-existing value to a graph output when an Identity node is involved. Ideally we can make this handling for an Identity node more general purpose, however the current logic to free an MLValue during execution doesn't take into account a re-use point also needing a free. Due to that, limit the scope and start with a somewhat ugly hardcoded approach. Migrate some changes from PR497 The existing Loop unit tests exercise the new code. Also manually stepped through the problematic model to verify the unnecessary copy was avoided. * Fix build error * Fix missing switch case in debug output of allocation plan * Limit optimization to Loop	2019-03-19 06:55:59 +10:00
Changming Sun	8e0fff7b8d	Support large model(>2GB) (#520 ) 1. Support the new external data extension in ONNX 1.4 onnx/onnx#678 2. Enable onnxruntime_perf_test in Mac Build 3. move path_lib.h from onnx_test_runner source dir to onnxruntime_framework 4. Enable memory planner for string tensors 5. Make memory planner always enabled, to simplify model loading logic 6. Delete some duplicated code between onnxruntime_perf_test and onnx_test_runner 7. Delete win_getopt_mb lib. 8. Remove the dependency on Pathcch lib, which is only available on Windows 8 and newer.	2019-03-05 21:27:12 -08:00
Scott McKay	6c7099a18e	Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup (#498 ) * Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup. - Create IExecutionFrame - split out core logic and interface from extended logic used in full Graph execution (that uses allocation plan and memory pattern planner) - Update NodeIndexInfo to allow construction from a subset of nodes - split out logic from GraphNodes into a re-usable template so it can be used with a vector of const Node* as well as a vector of unique_ptr<Node> - Remove SessionState from OpKernelContext - Misc cleanups - move AllocPlanPerValue out of SequentialExecutionPlan as it's used in a more generic manner that isn't specific to a sequential execution plan NOTE: I manually tested the new paths, especially NodeIndexInfo. There will shortly be optimizers added that use the new infrastructure so they'll get test coverage as part of those changes. * Fix linux build issue. Handle graph with no nodes in NodeIndexInfo.	2019-02-27 15:46:50 -08:00
Changming Sun	b69c834c06	Optimize graph partition	2019-02-20 16:32:04 -08:00
Scott McKay	fc7185f060	Various optimizations to reduce the setup and device copying cost outside of the call to ExecuteGraph. (#470 ) * Various optimizations to reduce the setup and execution cost. Cache information about the feeds and fetches, and any device copies required to execute the graph so we minimize checking for later calls to ExecuteGraph using the same input/output. - enable use of caching in Loop and Scan - make use of caching optional for InferenceSession::Run - handle calls to Run with different feeds and fetches to support scenarios where there may be a truncated sequence in some calls Take the feed names and MLValue instances as vectors so the order is deterministic. Add unit tests Update onnxruntime_perf_test to enable caching. * Couple of tweaks. Fix shared library unit test failure. Attempt to workaround MacOS build failure due to VC++ bug around including reaching scope values in a lambda automatically. * Rework order of init in Run so we get nice error messages about invalid feed/output names. * Refine logic around copying MLValue using execution provider so common code can be used. Simplify the logic due to this change. Split the paths for executing with/without cached info so we can be more const correct with how FeedsFetchesManager is passed in. This makes it clearer when a shared instance can be used due to it being const. Cache the FeedsFetchesManager instances in the control flow nodes. They can be re-used across calls to Compute. * Removed unused local variable to fix some builds. * Fix build issue by cleaning up some more unused params. * Check names when using cache entry from SessionState. Add unit test.	2019-02-20 12:12:17 +10:00
Scott McKay	efb72540be	Separate out constant node index information from ExecutionFrame (#410 ) * Separate out the NodeArg index information from ExecutionFrame so it is only calculated once. * Skip copy to/from device if only CPU execution provider is registered. Cleanups. * Address PR comments. Clean up a few areas. * Fix Linux build error	2019-02-01 10:55:49 +10:00
Scott McKay	b194b7df0d	Add the ability to use a custom allocator for fetches to avoid unnecessary copies in control flow operators. (#377 ) * Add the ability to use a custom allocator for fetches. Allows control flow nodes to forward the allocation to the control flow op and avoid an unnecessary copy when the subgraph output has a symbolic dimension. Update Scan and If to use custom allocators when applicable. * Remove unnecessary forward declaration * Fix Mac build warnings	2019-01-29 19:48:10 +10:00
Changming Sun	c2704b5afb	cleanup code (#343 )	2019-01-16 17:12:22 -08:00
KeDengMS	b9cc134576	Make sure tensor sizes are 64-byte aligned (#222 ) This helps reduce misaligned access violation	2018-12-19 13:45:04 -08:00
Ryan Hill	11b369a864	Abbreviate ONNXRuntime as Ort in all of our public APIs (#175 ) Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.	2018-12-14 14:54:23 -08:00
Scott McKay	bd50598d17	Document the Graph header files and cleanup some issues. (#42 ) * Checkpoint. * Add doco to graph.h and graph_base.h. Change NodeConstIterator to return a reference to clearly advertise no nullptr's are going to be returned as it's only iterating valid Nodes. Fix some code analysis warnings. * Make a couple of APIs return a reference instead of a pointer as they never return nullptr. * More doco and some minor naming cleanups. * Cleanups Couple more consistency changes. * Fix CUDA test file * Fix invalid line.	2018-11-28 08:42:11 -08:00
Pranav Sharma	7aef8a1cca	Sync with internal master.	2018-11-22 20:56:43 -08:00
Pranav Sharma	89618e8f1e	Initial bootstrap commit.	2018-11-19 16:48:22 -08:00

37 commits