onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-04 04:07:22 +00:00

Author	SHA1	Message	Date
Casey Carter	3f52de07c7	Add missing include to status.h status.h must include <ostream> to use std::ostream.	2019-03-19 11:59:41 -07:00
Ryan Hill	da9af592d9	Remove OrtAppendCustomOpLibPath (#642 ) * Remove OrtAppendCustomOpLibPath * Fix parameter mismatch * More parameter fixes	2019-03-18 19:44:32 -07:00
Ashwini Khade	481eb971ec	graph transformers update (#608 ) * graph transformers update * some updates * plus changes * more updates * fixes per review comments * enable tests * adding more tests * more changes * update api in inference sesion * changes per review * Linux CI fix * fix linux CI failure * fix MAC CI failure * more updates * add more documentation and add level param to register transformer	2019-03-18 14:52:16 -07:00
Scott McKay	971058fc38	Avoid copy of pre-existing value to subgraph output (#637 ) * Add AllocKind::kShare to allow copying the MLValue for a pre-existing value to a graph output when an Identity node is involved. Ideally we can make this handling for an Identity node more general purpose, however the current logic to free an MLValue during execution doesn't take into account a re-use point also needing a free. Due to that, limit the scope and start with a somewhat ugly hardcoded approach. Migrate some changes from PR497 The existing Loop unit tests exercise the new code. Also manually stepped through the problematic model to verify the unnecessary copy was avoided. * Fix build error * Fix missing switch case in debug output of allocation plan * Limit optimization to Loop	2019-03-19 06:55:59 +10:00
stevenlix	e8b0ae8923	Trt execution provider (#382 ) * updated cmake files for trt * added trt execution provider * added trt basic test * removed trt_path action attribute * Add files via upload * Update build.py * Update trt_allocator.h * fixed issues found by reviewers * changed cast operator * added comment for custom kernel implementation * changed auto to auto& * changed to function compile APIs for TRT execution provider * changed to function compile APIs for TRT execution provider * added new DType DInt64 * adapted to the changes of onnxruntime_c_api * removed trt kernel (use function compile instead) * updated onnx-tensorrt submodule * set default memory type to TRT fused kernel * resolve merge conflict * fixed the issue that USE_CUDA conflicts with USE_TRT * construct graph by adding nodes in topological order * made changes for Windows * change buffers type * bypass HasImplementationOf check for TRT XP because TRT kernel is not registered * added domain to version info in rebuilt model proto * added trt to test option list * added DomainToVersionMap() to GraphViewer * removed Copy() * fixed broken code * format the code to clang format * used local reference to the frequently used values * fixed a couple of issues according to reviewers feedback * fixed a couple of issues according to reviewers feedback * added python binding for TRT and enable use_cuda when use_trt is on * fixed a redefinition issue * changed shared_ptr to unique_ptr on trt engines, and made a few changes required by reviewers * enabled trtexecution provider for unit tests * renamed trt to tensorrt * added tesorrt to python binding * update submodule onnx and onnx-tensorrt * made a couple of minor changes based on reviewer's feedback * added CUDA_CHECK * removed test code * fixed broken code after merge * updated onnx-tensorrt submodule * added post processing to align trt inputs/outputs with graph inputs/outputs * updated onnx submodule * added CUDA fallback for TensorRT and fixed TensorRT cmake issue * added ci pipeline for tensorrt and removed some redundent code from trt xp * fixed syntax issue * updated onnx-tensorrt submodule * fix trt build problem by: (#602) 1. Add additional /wd for debug build 2. Add io.h for additional targets 3. Bring back mb version of getopt * Update install_ubuntu.sh * Update linux-gpu-tensorrt-ci-pipeline.yml * Update linux-gpu-tensorrt-ci-pipeline.yml * Update run_build.sh * Update run_build.sh * Update run_build.sh * Update run_build.sh * fixed the issue that GetKernelRegistry returns nullptr * merged master to this branch * moved some data types to private * fixed tensorrt CI pipeline issue * customized test data for TensorRT pipeline * added onnx-tensorrt in json file and fixed an issue in ci script * added comments	2019-03-14 12:00:39 -07:00
Konstantinos Karanasos	2ae83c580c	Constant folding (#168 ) Constant folding rewrite rule computes nodes that have only constant inputs at compile time and avoids these computations at run time.	2019-03-13 15:44:26 -07:00
jignparm	de9f1ff1ff	Add new C function OrtOnnxTypeFromTypeInfo (#585 )	2019-03-12 10:11:14 -07:00
Changming Sun	3ef273b84b	Support memory mapping on Linux	2019-03-11 19:39:02 -07:00
Ryan Hill	af9c554dd3	Ryanunderhill/custom op (#550 ) * Prototype version that demonstrates it can work * Switched to OrtValue and removed the OrtCustomOpTensor code. * Support multiple outputs and reading of attributes * Add custom domain handling to custom ops * Update documentation * more wording changes	2019-03-06 19:09:55 -08:00
Scott McKay	0e65bfe7ae	Remove caching from InferenceSession::Run (#547 ) * Remove caching from InferenceSession::Run * Fix automatic merge of one file * trigger rerunning checks	2019-03-06 14:29:42 -08:00
Changming Sun	8e0fff7b8d	Support large model(>2GB) (#520 ) 1. Support the new external data extension in ONNX 1.4 onnx/onnx#678 2. Enable onnxruntime_perf_test in Mac Build 3. move path_lib.h from onnx_test_runner source dir to onnxruntime_framework 4. Enable memory planner for string tensors 5. Make memory planner always enabled, to simplify model loading logic 6. Delete some duplicated code between onnxruntime_perf_test and onnx_test_runner 7. Delete win_getopt_mb lib. 8. Remove the dependency on Pathcch lib, which is only available on Windows 8 and newer.	2019-03-05 21:27:12 -08:00
Hariharan Seshadri	a697e0b710	Implement Shrink operator (#485 ) * Initial commit * Adding shrink tests * Fix formatting in shrink_test.cc * Fix broken build * More changes * PR feedback and formatting * Place files in the right location corresponding to def file location in onnx * Exclude shrink model test in test_series.py * Remove shrink from exclusion list in main.cc * Adding test to exclusion list * More tests * Formatting * PR feedback * PR feedback * More changes * PR feedback * More changes * Fix broken build * Fix nit * Fix nit	2019-03-01 12:51:22 -08:00
Scott McKay	6c7099a18e	Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup (#498 ) * Break dependency on SessionState for ExecutionFrame and OpKernelContext so optimizers can execute a node with a minimal setup. - Create IExecutionFrame - split out core logic and interface from extended logic used in full Graph execution (that uses allocation plan and memory pattern planner) - Update NodeIndexInfo to allow contruction from a subset of nodes - split out logic from GraphNodes into a re-usable template so it can be used with a vector of const Node* as well as a vector of unique_ptr<Node> - Remove SessionState from OpKernelContext - Misc cleanups - move AllocPlanPerValue out of SequentialExecutionPlan as it's used in a more generic manner that isn't specific to a sequential execution plan NOTE: I manually tested the new paths, especially NodeIndexInfo. There will shortly be optimizers added that use the new infrastucture so they'll get test coverage as part of those changes. * Fix linux build issue. Handle graph with no nodes in NodeIndexInfo.	2019-02-27 15:46:50 -08:00
Scott McKay	dfa21af302	Update C API to allow user to enable caching of feeds and fetches info across calls to Run (#522 ) * Add ability to enable caching to the C API, and update the internals to pass the feed names and MLValue instances in vectors so the order is deterministic (so cache entry matching works as expected). * Address PR comment and don't use 'bool' * Remove meaningless C# test around duplicate input. We _could_ check input names for duplicates (previously we did this via the usage of unordered_map), but the system will gracefully handle with the duplicate anyway (will just use the last value provided for the input name). Based on that, I don't think the cost of checking for duplicates is worth it. * Fix c-style cast in test_run_options.	2019-02-27 13:41:17 -08:00
shahasad	f9bae489bd	cleanup extra header from c api and sanitize C api test (#517 ) * cleaned up the additional header in C-api * ensure test failure surfaces in the build pipeline * sanitized runtest.bat * cleanup unneeded headers * formatting and typos	2019-02-24 21:06:54 -08:00
Scott McKay	5171e8b129	Make IExecutionProvider::Type return const std::string& instead of a new string. (#506 ) Store the type string in IExecutionProvider so that Type() doesn't need to be a virtual.	2019-02-22 18:27:01 +10:00
Changming Sun	b69c834c06	Optimize graph partition	2019-02-20 16:32:04 -08:00
Changming Sun	b02c1d80d4	Fix an SAL annotation in the C API	2019-02-20 12:51:00 -08:00
Scott McKay	fc7185f060	Various optimizations to reduce the setup and device copying cost outside of the call to ExecuteGraph. (#470 ) * Various optimizations to reduce the setup and execution cost. Cache information about the feeds and fetches, and any device copies required to execute the graph so we minimize checking for later calls to ExecuteGraph using the same input/output. - enable use of caching in Loop and Scan - make use of caching optional for InferenceSession::Run - handle calls to Run with different feeds and fetches to support scenarios where there may be a truncated sequence in some calls Take the feed names and MLValue instances as vectors so the order is deterministic. Add unit tests Update onnxruntime_perf_test to enable caching. * Couple of tweaks. Fix shared library unit test failure. Attempt to workaround MacOS build failure due to VC++ bug around including reaching scope values in a lambda automatically. * Rework order of init in Run so we get nice error messages about invalid feed/output names. * Refine logic around copying MLValue using execution provider so common code can be used. Simplify the logic due to this change. Split the paths for executing with/without cached info so we can be more const correct with how FeedsFetchesManager is passed in. This makes it clearer when a shared instance can be used due to it being const. Cache the FeedsFetchesManager instances in the control flow nodes. They can be re-used across calls to Compute. * Removed unused local variable to fix some builds. * Fix build issue by cleaning up some more unused params. * Check names when using cache entry from SessionState. Add unit test.	2019-02-20 12:12:17 +10:00
Pranav Sharma	9bc6503463	Support non-tensor types in the C API. (#489 ) * support non-tensor types * support non-tensor types. * support non-tensor types. * fix compilation issues * fix compilation issues * fix compilation issues * add test cases * test cases * add test cases * try to fix string test case * working now * use allocator (broken) * string test broken after using allocator * full working example * Fix PR comments	2019-02-19 14:11:46 -08:00
Changming Sun	d05b74b1b7	Delete Tensor::ShallowCopy	2019-02-12 15:51:36 -08:00
Ke Zhang	fc90a9b2fc	allocator refactor (#467 ) * update CPUAllocator. * onnxruntime * fix build break * remove useless subclasses of CPUAllocator. * refactor to get allocator from executionproviders instead of execution provider.	2019-02-12 14:14:21 -08:00
Hariharan Seshadri	fdd71574d6	misc: Fix comment in op_node_proto_helper (#460 ) * Fix comment in op_node_proto_helper * PR feedback	2019-02-11 14:38:43 -08:00
Changming Sun	4cdb0cbf6e	A tiny fix in KernelCreateInfo	2019-02-06 17:59:20 -08:00
Changming Sun	7c70d9349a	Fix a bug in execution_provider.cc	2019-02-06 17:08:38 -08:00
Weixing Zhang	851e291f22	Make OpKernelInfo not depend on SessionState. (#442 )	2019-02-05 22:38:50 -08:00
Changming Sun	9faac70dae	Delete Tensor's copy constructor	2019-02-05 16:38:27 -08:00
Weixing Zhang	696ab8a194	Create a separate component for graph optimization. (#421 ) * Create a project for graph optimizer. Move optimizer related code to the folder optimizer. * Fix build failures. * rebase and fix build failures. * fix build failure. * fix build failure with cuda path. * fix python build failure. * Move two transformers(memcpy and insert_cast) from framework to optimizer. * rebase. * SessionState should not depend on optimizer.	2019-02-04 15:45:12 -08:00
Scott McKay	f85cd520c0	Recurse into subgraphs in transformers and session initialization (#368 ) * Add Recurse method to GraphTransformer. Move GraphTransformer::Apply to ApplyImpl and make private. Add non-virtual GraphTransformer::Apply method to handle calling Graph::Resolve in a more consistent manner. Create MemcpyTransformer GraphTransformer to handle memcpy operations on subgraphs in a more standard way. * Checkpoint * Make the subgraph insert less verbose * Add graph nesting level to transformer ApplyImpl Tweak cast transformer to recurse nicely and avoid unnecessary Resolve calls by splitting out the duplicate removal into a separate transformer. Decouple memcpy transformer from ExecutionProviders and minimise what's in the header. * Recurse into subgraphs inside GraphPartitioner * Update a couple of new transformers * Check Recurse return value. * Cleanup some memory management in inference session by moving some things into SessionState * Add deleted flag to rewrite rules so we stop processing nodes that are removed. Remove some (most likely) unnecessary Resolve calls. As we always call Resolve for a graph modified by a transformer there's generally no need for the transformer to do it. * Minor cleanups. * Add some extra usage information to the comments in GraphTransformer. * Address PR comments	2019-02-02 06:03:00 +10:00
Scott McKay	efb72540be	Separate out constant node index information from ExecutionFrame (#410 ) * Separate out the NodeArg index information from ExecutionFrame so it is only calculated once. * Skip copy to/from device if only CPU execution provider is registered. Cleanups. * Address PR comments. Clean up a few areas. * Fix Linux build error	2019-02-01 10:55:49 +10:00
Konstantinos Karanasos	c76725da2d	Slice elimination rewrite rule; re-implementation of identity elimination using new Graph API (#87 ) Rewrite rule that eliminates slice operators when they are redundant (i.e., when they preserve the whole input). Re-implementation of the identity elimination rule using the latest Graph API.	2019-01-30 16:19:40 -08:00
Ryan Hill	09806625cf	Rename OrtInitialize to OrtCreateEnv in preparation for future. (#399 ) * Rename OrtInitialize to OrtCreateEnv in preparation for future. Add version number to structures * Forgot about exports * Update documentation	2019-01-29 15:03:18 -08:00
Ryan Hill	d875ab2acd	C API - Remove reference counting (#344 )	2019-01-25 19:41:10 -08:00
stevenlix	8ea7197b82	trt (#361 ) * updated cmake files for tensorrt	2019-01-23 13:28:13 -08:00
Scott McKay	8b55596dfe	The CUDA compiler doesn't support gsl::suppress so disable when __NVCC__ is defined. (#358 )	2019-01-22 17:42:33 +10:00
Changming Sun	c87929e949	Use nsync for implementing condition variable	2019-01-21 22:59:42 -08:00
Ke Zhang	6831fc16ed	Kezhan/kernel registry refine (#346 ) * refactor kernel registry to make it a little bit more readable. * update * update cudaexecutionprovider * fix build break * fix comments * fix build break	2019-01-18 09:55:30 -08:00
Scott McKay	9f3ae4279f	Handle copy to/from non-CPU devices across control flow nodes (#339 )	2019-01-17 10:51:23 -08:00
Changming Sun	c2704b5afb	cleanup code (#343 )	2019-01-16 17:12:22 -08:00
Ryan Hill	98a92547bf	Ryanunderhill/c api 8 (#297 ) * Make OrtAllocator not be reference counted * Make the allocator interface more type safe * Fix build break * Build break fix * Build break fix * Mistake in previous build fix. * Fix review comments + build break * Missed the export symbols * C specific error, need 'struct' keyword in one case. * Function calling OrtReleaseObject instead of OrtReleaseEnv	2019-01-10 02:06:29 -08:00
Changming Sun	5e113661a9	Build system upgrades (#281 ) * update * runas normal user	2019-01-07 13:15:24 -08:00
Pranav Sharma	de383d93be	Fix inconsistency in enum names in the C API (#277 ) * Fix inconsistency in enum names in the C API * fix build	2019-01-04 16:41:15 -08:00
Tang, Cheng	d0fa974976	interface change to code-generated kernels (#192 ) * merge function compile interface * fix build error * fix linux build break * fix static cast issue; fix clang style * fix argument change * use alignment allocation;fix comments in pr * fix linux break * apply clang format * rename according to comments in pr * rename according to pr comments;remove useless file * remove the need_compile flag * avoid passing whole session state	2019-01-02 17:18:08 -08:00
Ryan Hill	6a090985fb	More C API changes (#259 ) * More API changes, remove 'Inference' from function names. Remove enum values. Make Status match other types. * Switch to bool instead of int, and remove stdbool	2018-12-28 14:53:19 -08:00
Dmitri Smirnov	7af1887b33	Introduce basic BFloat16 runtime support (#235 ) * Add basic support for BFloat16 type. * Advance onnx submodule for bfloat16 support. * Update install_deps for linux. * Address review comments.	2018-12-21 12:40:59 -08:00
Ryan Hill	a37887cfa1	More intuitive ordering to the API functions (#233 ) * More intuitive ordering to the API functions * Rename TCHAR_T	2018-12-20 13:47:48 -08:00
Tang, Cheng	c453b48b71	update kernel memory type interface (#225 ) * refactor the kernel memory type interface * remove useless change * fix comments in PR	2018-12-20 11:11:50 -08:00
Ryan Hill	773114a4f1	More C header naming changes (#202 ) * More Ort prefix changes for consistency * Fix C# methods * More C# fixes	2018-12-18 11:39:46 -08:00
edgchen1	71c56b6d7c	Fix array feature extractor out of bounds access issue (#194 ) * Fixed out of bounds access in ArrayFeatureExtractor. * some cleanup * Updated tensor_shape.h comments. * Updated macro name. * Added copy assignment, move assignment/ctor to TensorShape. * Removed i64 literal suffix. * Fixed test. * Fixed type of x_num_dims.	2018-12-18 00:30:07 -08:00
Ryan Hill	11b369a864	Abbreviate ONNXRuntime as Ort in all of our public APIs (#175 ) Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.	2018-12-14 14:54:23 -08:00

66 commits