onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-07 00:13:17 +00:00

Author	SHA1	Message	Date
Scott McKay	274e6b4153	Cleanup SessionState. Move allocator lookup to SessionState. (#4194) * Move allocators to SessionState so they're decoupled from ExecutionProviders - when looking up an allocator it's based on OrtMemoryInfo not the EP so SessionState is a more natural place for that information to be stored - add device based lookup - simplifies logic for copying feeds/fetches across devices Cleanup SessionState and SessionStateInitializer - provide more things to SessionState at construction time so we don't construct an instance and immediately after call a bunch of setters - simplify SessionStateInitializer - reduced down to FinalizeSessionState method	2020-06-28 14:55:42 +10:00
edelaye	64b5f7edf6	Initial release of Vitis-AI Execution Provider (#3771) * Initial release of Vitis-AI Execution Provider * Add documentation, fix for onnxruntime::Model changes and use stringstream instead of file dump for model passing * - Add Vitis-AI docker file - Add online quantization flow Vitis-AI execution provider - Fix remarks * - Add fatal error build message for Vitis-AI cmake build on Windows - Fix pep8 issue in build.py - Add Vitis-AI execution provider example in docs Co-authored-by: Elliott Delaye <elliott@xilinx.com> Co-authored-by: Jorn Tuyls <jornt@xilinx.com> Co-authored-by: Jorn Tuyls <jtuyls@users.noreply.github.com>	2020-05-19 05:32:32 -07:00
airockchip	edaf8a542c	Initial PR for RKNPU execution provider (#3609) * Initial RKNPU execution provider * Init * Support Ops: Conv, Relu, Clip, LeakyRelu, MaxPool, AveragePool, GlobalAveragePool, Concat, Softmax, BatchNormalization, Gemm, Add, Mul, Sub, Reshape, Squeeze, Unsqueeze, Flatten, Transpose, QLinearConv, DequantizeLinear * Add rknpu unittest * Update BUILD.md and Add RKNPU-ExecutionProvider.md * misc code update * fix CLIP accuracy issue. * fix "Error: Duplicate definition of name". * move rknpu_ddk out of onnxruntime submodule. * remove temporary code. * add rknpu namespace. * update misc of node_attr_helper * add const & comment for onnx_converter * add const & comment for shaper * unify variable name Co-authored-by: dkm <dkm@rock-chips.com> Co-authored-by: George Wu <jywu@microsoft.com>	2020-05-05 20:36:47 -07:00
edgchen1	4aa033b99e	Addressing review comments (#3690) - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414359326 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414359463 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414360023 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414361667 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414368707 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414371480 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414379362 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414374516 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414801087	2020-04-24 14:57:18 -07:00
Edward Chen	daa14b64e3	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-21 03:31:32 +00:00
Scott McKay	7d5348f87e	Add ability to batch device copy for graph inputs and outputs. (#3580) * Add ability to batch device copy for graph inputs and outputs.	2020-04-19 17:51:07 +10:00
Edward Chen	e542cfd0e0	Introduce training changes.	2020-03-11 14:39:03 -07:00
Changming Sun	abb626ff60	Provide alternative std::mutex implementation on Windows (#3000) Provide alternative std::mutex implementation on Windows. OrtMutex is no longer an alias of std::mutex. We do it because: 1. This new thing is faster and much simpler. 2. Static constructors are considered harmful. We should avoid them as much as possible.	2020-02-11 11:46:08 -08:00
Scott McKay	a92e924ab2	Revert "Use IArenaAllocator::Reserve for initializers and mem pattern planner blocks (#2835)" (#2904) This reverts commit `724ff0753b`.	2020-01-24 14:02:30 +10:00
Scott McKay	724ff0753b	Use IArenaAllocator::Reserve for initializers and mem pattern planner blocks (#2835) * Use IArenaAllocator::Reserve for initializers and mem pattern planner blocks.	2020-01-17 07:41:48 +10:00
Dmitri Smirnov	d34fb62012	Introduce container type runtime checks and other improvements (#2522) Rework TensorSeq in a manner consistent with Tensor and SparseTensor in terms of type system setup. Reduce templating. Introduce helpers to ensure the same data type. Make OrtValue __dtor not virtual. Introduce ContainerChecker	2019-12-04 16:04:17 -08:00
Sreekanth Yalachigere	31ea11a696	Renaming MKL-DNN as DNNL (#2515) * DNNL: Moving Files to rename file names * DNNL name change * azure pipeline updated * disable ceil/dilation and enable Opset10 * disable ceil/dilation tests in Python * mlperf_ssd_resnet34_1200 disabled	2019-12-03 07:34:23 -08:00
Scott McKay	be12cdc73f	Add CUDA If operator. (#2377) * Add CUDA If operator. Uses CPU operator for implementation. By adding a CUDA version the inputs/outputs (with the exception of the 'cond' input) stay on GPU, and no other logic is required to avoid a copy to CPU across the control flow node.	2019-11-19 12:01:46 +10:00
George Wu	0c6e9f94d0	fix builds enabling onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS (#2369) * fix builds enabling onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS * update	2019-11-11 15:26:18 -08:00
Dmitri Smirnov	25b3c51661	Introduce PrimitiveType into a Type System along with an integer constant (#2307) Improve perf by avoiding GetType<T>() calls. Introduce MLTypeCallDispatcher to switch on Input Type. Add Tensor IsType<T>() fast method.	2019-11-08 17:47:06 -08:00
Tianlei Wu	bc85d43809	Dump cuda tensor data (#2243) * dump cuda tensor * move data_type definition * Dump cuda tensors for cuda build only. Output tensor location (if it is not in CPU or pinned) * update for cuda build * Update for code review feedback * update for CR feedback * use data transfer manager for tensor copy	2019-10-31 21:09:10 -07:00
Scott McKay	5c86889beb	Fix linux build issue with debug dump of shapes and data. (#2202) Add option to dump just shapes or shapes and data.	2019-10-20 20:35:48 -07:00
Pranav Sharma	91db840b6b	Introduce execution mode enum for clarity and extensibility; Change Python, C and C# APIs accordingly; Removed EnableSequentialExecution, DisableSequentialExecution in favor of the more general SetExecutionMode API. (#2098) * Introduce execution mode for clarity and extensibility; Change Python APIs accordingly; Replace DisableSequentialExecution API with EnableParallelExecution for clarity. * Fix cuda build * Modify the test slightly * Make C and C# APIs consistent with Python.	2019-10-14 09:48:19 -07:00
Pranav Sharma	4cdb95e436	Resort to sequential execution if the inter op thread pool ptr is nullptr; (#2023)	2019-10-06 16:08:41 -07:00
Dmitri Smirnov	d1b1cdc5c4	Replace GSL with GSL-LITE submodule and fix up refs (#1920) Remove gsl submodule and replace with a local copy of gsl-lite Refactor for onnxruntime::make_unique gsl::span size and index are now size_t Remove lambda auto argument type detection. Remove constexpr from fail_fast in gsl due to Linux not being happy. Comment out std::stream support due to MacOS std lib broken. Move make_unique into include/core/common so it is accessible for server builds. Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc due to x86 build. Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized	2019-10-01 12:43:29 -07:00
Scott McKay	bd2d6af9ca	Filter out info from non-const initializers during shape inferencing (#1806) * Don't return shape for non-const initializer in InferenceContextImpl::getInputType Don't return initializer for non-const initializer in InferenceContextImpl::getInputData Update graph_utils to support these scenarios - fix GetConstantInitializer to make sure a name is for an outer scope value before checking a parent graph, as local name could shadow an outer scope initializer.	2019-09-26 13:44:33 +10:00
Pranav Sharma	f8c3442880	Part 2 of renaming AllocatorInfo to MemoryInfo. (#1804) * Mention OrtCreateSessionFromArray in C API doc * Part 2 of renaming AllocatorInfo to MemoryInfo. * pr comments * fix comment	2019-09-12 08:19:29 -07:00
Scott McKay	98dbdb1e0b	Rework the feed/fetch copy setup so that it can be calculated prior to subgraph execution (#1761) * Rework the feed/fetch copy setup so that it can be calculated upfront by the control flow nodes. Also simplifies how it all works. Update the control flow nodes to do the calculation prior to graph execution.	2019-09-10 15:46:00 +10:00
Pranav Sharma	52fe574fed	Rename OrtAllocatorInfo to OrtMemoryInfo to make it more obvious. (#1758) * Mention OrtCreateSessionFromArray in C API doc * Rename OrtAllocatorInfo to OrtMemoryInfo to avoid confusion	2019-09-05 14:20:37 -07:00
KeDengMS	5873bdbb3f	Share default CPU allocator with Mlas preferred alignment (#1682) Description: make the default CPU allocator use MLAS preferred alignment. Motivation and Context: this is needed for the C API to have an aligned default CPU allocator, the same as the one in the CPU provider.	2019-08-23 12:06:35 -07:00
Ke Zhang	b53f40a886	update set fetches for execution with allocation plan. (#1668)	2019-08-21 19:58:05 -07:00
Ke Zhang	bd64ca3019	Kezhan/execute graph refactoring (#1553) * checking execution provider logic updated. * fix the logic of copy input and output. * update * update * update * update * update * update * fix ngraph failure. * fix comments	2019-08-14 01:07:05 -07:00
stevenlix	1c5b15c2b8	Remove memory copy between TensorRT and CUDA (#1561) * remove memory copy between CUDA and TRT * add info to RegisterExecutionProvider input * use new IDeviceAllocator for trt allocator * remove SetDefaultInputsMemoryType from TRT EP * remove onnx-tensorrt 5.0 * add submodule onnx-tensorrt branch 5.1 * remove redundancy * Update transformer_memcpy.cc * Update tensorrt_execution_provider.cc * switch to TensorRT 5.1.5.0 * update python binding * disable failed test case on TensorRT * Update activation_op_test.cc * upgrade to TensorRT container 19.06 * update according to feedback * add comments * remove tensorrt allocator and use cuda(gpu) allocator * update onnx-tensorrt submodule * change ci build cuda directory name	2019-08-08 19:31:39 -07:00
Ke Zhang	cb71c69d5e	checking execution provider logic updated. (#1547)	2019-08-02 13:29:39 -07:00
Ke Zhang	3bf0e364e2	Move CopyTensor out of IExecutionProvider interface. (#1268) * add ortdevice class * add data transfer manager for copying tensors. * update * add data transfer for gpu * fix constexpr build break. * update * remove unnecessary header files. * remove unnecessary header files. * add dependency * add dependency * add dependency * add dependency * fix linux build break. * update * fix build break * fix build break * fix build break * update * update * update c api. * update to not use OrtCreateAllocatorInfo * change to all eps. * fix linux build break * remove useless codes. * update * move datatransfermanager in session state * update * fix cuda build break. * fix comments * fix windows GPU build. * fix comments * fix build break * fix comments * fix test failure * update * fix comments * fix onnx runtime server. * update * fix test failure. * fix comments * fix comment	2019-07-11 14:49:20 -07:00
Scott McKay	c1a34a8ba6	Add ability to dump node input/output (#1202) Address #1155 Add debug helper methods to be able to dump input name and shape information for node inputs, and the data from node outputs. As the input data comes from graph inputs, initializers or node outputs we don't dump it. Must be manually enabled by building with '--cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=ON'	2019-06-13 06:47:50 +10:00
Changming Sun	2663b9c443	Remove unnecessary casts from OrtValue to MLValue (#1051)	2019-05-17 07:52:59 -07:00
Changming Sun	99556b111d	Make MemPatternPlanner on/off switchable in model weight loading (#989)	2019-05-16 14:39:09 -07:00
nivas-x86	3b0dda0aca	nGraph: Avoid input and output data copies (#940)	2019-04-30 12:10:28 -07:00
stevenlix	e8b0ae8923	Trt execution provider (#382) * updated cmake files for trt * added trt execution provider * added trt basic test * removed trt_path action attribute * Add files via upload * Update build.py * Update trt_allocator.h * fixed issues found by reviewers * changed cast operator * added comment for custom kernel implementation * changed auto to auto& * changed to function compile APIs for TRT execution provider * changed to function compile APIs for TRT execution provider * added new DType DInt64 * adapted to the changes of onnxruntime_c_api * removed trt kernel (use function compile instead) * updated onnx-tensorrt submodule * set default memory type to TRT fused kernel * resolve merge conflict * fixed the issue that USE_CUDA conflicts with USE_TRT * construct graph by adding nodes in topological order * made changes for Windows * change buffers type * bypass HasImplementationOf check for TRT XP because TRT kernel is not registered * added domain to version info in rebuilt model proto * added trt to test option list * added DomainToVersionMap() to GraphViewer * removed Copy() * fixed broken code * format the code to clang format * used local reference to the frequently used values * fixed a couple of issues according to reviewers feedback * fixed a couple of issues according to reviewers feedback * added python binding for TRT and enable use_cuda when use_trt is on * fixed a redefinition issue * changed shared_ptr to unique_ptr on trt engines, and made a few changes required by reviewers * enabled trt execution provider for unit tests * renamed trt to tensorrt * added tensorrt to python binding * update submodule onnx and onnx-tensorrt * made a couple of minor changes based on reviewer's feedback * added CUDA_CHECK * removed test code * fixed broken code after merge * updated onnx-tensorrt submodule * added post processing to align trt inputs/outputs with graph inputs/outputs * updated onnx submodule * added CUDA fallback for TensorRT and fixed TensorRT cmake issue * added ci pipeline for tensorrt and removed some redundant code from trt xp * fixed syntax issue * updated onnx-tensorrt submodule * fix trt build problem by: (#602) 1. Add additional /wd for debug build 2. Add io.h for additional targets 3. Bring back mb version of getopt * Update install_ubuntu.sh * Update linux-gpu-tensorrt-ci-pipeline.yml * Update linux-gpu-tensorrt-ci-pipeline.yml * Update run_build.sh * Update run_build.sh * Update run_build.sh * Update run_build.sh * fixed the issue that GetKernelRegistry returns nullptr * merged master to this branch * moved some data types to private * fixed tensorrt CI pipeline issue * customized test data for TensorRT pipeline * added onnx-tensorrt in json file and fixed an issue in ci script * added comments	2019-03-14 12:00:39 -07:00
Scott McKay	0e65bfe7ae	Remove caching from InferenceSession::Run (#547) * Remove caching from InferenceSession::Run * Fix automatic merge of one file * trigger rerunning checks	2019-03-06 14:29:42 -08:00
Changming Sun	cf41f76d79	Fix some warnings (#551)	2019-03-06 11:46:59 -08:00
Changming Sun	8e0fff7b8d	Support large models (>2GB) (#520) 1. Support the new external data extension in ONNX 1.4 onnx/onnx#678 2. Enable onnxruntime_perf_test in Mac Build 3. move path_lib.h from onnx_test_runner source dir to onnxruntime_framework 4. Enable memory planner for string tensors 5. Make memory planner always enabled, to simplify model loading logic 6. Delete some duplicated code between onnxruntime_perf_test and onnx_test_runner 7. Delete win_getopt_mb lib. 8. Remove the dependency on Pathcch lib, which is only available on Windows 8 and newer.	2019-03-05 21:27:12 -08:00
Changming Sun	b69c834c06	Optimize graph partition	2019-02-20 16:32:04 -08:00
Scott McKay	fc7185f060	Various optimizations to reduce the setup and device copying cost outside of the call to ExecuteGraph. (#470) * Various optimizations to reduce the setup and execution cost. Cache information about the feeds and fetches, and any device copies required to execute the graph so we minimize checking for later calls to ExecuteGraph using the same input/output. - enable use of caching in Loop and Scan - make use of caching optional for InferenceSession::Run - handle calls to Run with different feeds and fetches to support scenarios where there may be a truncated sequence in some calls Take the feed names and MLValue instances as vectors so the order is deterministic. Add unit tests Update onnxruntime_perf_test to enable caching. * Couple of tweaks. Fix shared library unit test failure. Attempt to work around MacOS build failure due to VC++ bug around including reaching scope values in a lambda automatically. * Rework order of init in Run so we get nice error messages about invalid feed/output names. * Refine logic around copying MLValue using execution provider so common code can be used. Simplify the logic due to this change. Split the paths for executing with/without cached info so we can be more const correct with how FeedsFetchesManager is passed in. This makes it clearer when a shared instance can be used due to it being const. Cache the FeedsFetchesManager instances in the control flow nodes. They can be re-used across calls to Compute. * Removed unused local variable to fix some builds. * Fix build issue by cleaning up some more unused params. * Check names when using cache entry from SessionState. Add unit test.	2019-02-20 12:12:17 +10:00
Changming Sun	d05b74b1b7	Delete Tensor::ShallowCopy	2019-02-12 15:51:36 -08:00
Ke Zhang	fc90a9b2fc	allocator refactor (#467) * update CPUAllocator. * onnxruntime * fix build break * remove useless subclasses of CPUAllocator. * refactor to get allocator from executionproviders instead of execution provider.	2019-02-12 14:14:21 -08:00
Scott McKay	efb72540be	Separate out constant node index information from ExecutionFrame (#410) * Separate out the NodeArg index information from ExecutionFrame so it is only calculated once. * Skip copy to/from device if only CPU execution provider is registered. Cleanups. * Address PR comments. Clean up a few areas. * Fix Linux build error	2019-02-01 10:55:49 +10:00
Scott McKay	b194b7df0d	Add the ability to use a custom allocator for fetches to avoid unnecessary copies in control flow operators. (#377) * Add the ability to use a custom allocator for fetches. Allows control flow nodes to forward the allocation to the control flow op and avoid an unnecessary copy when the subgraph output has a symbolic dimension. Update Scan and If to use custom allocators when applicable. * Remove unnecessary forward declaration * Fix Mac build warnings	2019-01-29 19:48:10 +10:00
stevenlix	8ea7197b82	trt (#361) * updated cmake files for tensorrt	2019-01-23 13:28:13 -08:00
Scott McKay	9f3ae4279f	Handle copy to/from non-CPU devices across control flow nodes (#339)	2019-01-17 10:51:23 -08:00
Ryan Hill	11b369a864	Abbreviate ONNXRuntime as Ort in all of our public APIs (#175) Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.	2018-12-14 14:54:23 -08:00
Ke Zhang	a78acb2d2c	rename graph.h to graph_viewer.h (#84)	2018-12-04 08:41:03 -08:00
Pranav Sharma	89618e8f1e	Initial bootstrap commit.	2018-11-19 16:48:22 -08:00

49 commits