onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Author	SHA1	Message	Date
Hector Li	730240d2a5	remove the link in the comments (#12510 )	2022-08-08 15:20:40 -07:00
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Ryan Hill	52d4699788	Minor doc fixes (#12388 )	2022-08-03 19:47:36 -07:00
Hariharan Seshadri	d5a1c01b38	Add C++ Session ctor taking model bytes and OrtPrepackedWeightsContainer (#12333 )	2022-07-29 12:32:43 -07:00
Yateng Hong	c579497134	Fix TRT custom op issue (#12283 ) * Pass schema registry on CreateModel. * Fix ORT_MINIMAL_BUILD. * Fix build issue.	2022-07-29 03:39:56 -07:00
Ryan Hill	3e014a5e5d	Fix C header to stop people accidentally copying the OrtApi by value (#12297 ) * Fix C header to stop people accidentally copying the OrtApi by value * Remove api_ from KernelTwo	2022-07-25 19:19:40 -07:00
Dmitri Smirnov	3bf614fd47	Eliminate memory allocations per recent profiling (#12225 ) * Alloc begin FeedsFetches refactoring Refactor Tensor class Fix buffer deletor Remove new/delete deleted Adjust alloc move Fix up xnnpack provider Clarifying the comment on Create()	2022-07-25 14:14:38 -07:00
Ashwini Khade	ceb76429db	Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc Merge On-Device-Training Offline Tooling and C/C++ APIs	2022-07-21 15:09:48 -07:00
Baiju Meswani	cbf08c7a7b	Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments	2022-07-21 18:11:48 +00:00
Dmitri Smirnov	4f106d2b3b	Eliminate unnecessary status lock acquisition in TP (#12196 ) Eliminate unnecessary status lock acquisition in the Thread Pool	2022-07-19 14:16:12 -07:00
Chen Fu	040c2f4517	x86/64 U8S8 Gemm Precision Fix (#12088 ) Add a graph optimization that converts u8s8 matrix multiplication to u8u8 if needed. On x86/64 platforms, SSE4.1, AVX2 and AVX512 CPUs in particular provide better performance when computing u8s8 matrix multiplications. Unfortunately, the higher performance comes with value overflow problems, as described in: https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/advanced-topics/nuances-of-int8-computations.html In this change we added a session option "session.x64quantprecision" (default off). For operators that call u8s8 matrix multiplications, e.g. QAttention, we convert them to u8u8 when the following conditions are all satisfied: 1. The current CPU is SSE4.1, AVX2 or AVX512 with no VNNI support. 2. Session option "session.x64quantprecision" is on. 3. The constant weight tensor contains values outside of the [-64, 63] range. Note that when the weight tensor is not constant, QDQS8ToU8Transformer should already convert it to u8.	2022-07-13 10:12:25 -07:00
Baiju Meswani	a457ddc41d	Merge branch 'master' of https://github.com/microsoft/onnxruntime into bmeswani/merge_pr	2022-06-30 21:53:07 +00:00
Baiju Meswani	6e8edfff0c	Separate training apis from shared core apis (#12027 )	2022-06-29 14:12:29 -07:00
RandySheriffH	d5fcb432fa	Generalize native op creation (#11539 ) * create op from ep * read input count from context * create holder to host nodes * fix typo * cast type before comparison * throw error on API fail * silence warning from minimal build * switch to unique_ptr with deleter to host nodes * fix typo * fix build err for minimal * fix build err for minimal * add UT for conv * enable test on CUDA * add comment * fix typo * use gsl::span and string view for Node constructor * Added two APIs - CopyKernelInfo and ReleaseKernelInfo * pass gsl::span by value * switch to span<NodeArg* const> to allow for reference to const containers * fix typo * fix reduced build err * fix reduced build err * refactoring node construction logic * rename exceptions * add input and output count as arguments for op creation * refactor static member * use ORT_CATCH instead of catch * cancel try catch * add static value name map * format input definition and set err code * fix comments * fix typo	2022-06-27 21:12:15 -07:00
Baiju Meswani	d25cf4df26	Merge branch 'master' into training_dev/on_device_poc	2022-06-24 20:18:19 +00:00
Dmitri Smirnov	088bc7494b	Deprecate APIs returning raw ptrs and provide replacements (#11922 ) Provide better documentation	2022-06-24 09:50:04 -07:00
G. Ramalingam	b1411c8357	Restructure function inliner (#11731 ) * Add nested function call tests * Add overload for Specialize * Pass symboltable to onnx shape inference * Avoid renaming empty names * Enable sequence_map tests which failed before this change	2022-06-24 09:21:31 -07:00
Dmitri Smirnov	607b7df060	Allow saving on CPU usage for infrequent inference requests by reducing thread spinning (#11841 ) Introduce Start/Stop threadpool spinning switch Add a session config option to force spinning stop at the end of the Run()	2022-06-23 10:04:37 -07:00
sfatimar	61a74f2f4d	Mohsin/enable dynamic shapes (#11867 ) * Add pypi build changes to latest Master * Add ORT training part of OV build * Disabling SqueezeOpTest.BadAxes * Add ONNXruntime branch ARG to Docker build * Changes to include file details versions * Commit File Version Updates * Change naming for linux build * Add fix for pylint format errors * Fix pylint warnings. * Enable Dynamic Shapes for OV_API_20 * Update requirements.txt whl version- internal_ci fix * Update backend_manager.cc MYRIAD Fix * Update wheel version in requirements.txt * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update setup.py * Fix pylint warnings * Fix pylint warnings 2 * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>	2022-06-21 08:03:58 -07:00
Dmitri Smirnov	267a424e52	Retry Rework execution frame to reduce memory allocations (#11897 ) * Revert "Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)" This reverts commit `d2cbae3a04`. * Revert prepacked_weights to avoid indirect inclusion in CUDA and TRT code that breaks the build.	2022-06-20 10:29:43 -07:00
Edward Chen	a93fe7824a	Update EP compile API deprecation warning message. (#11808 ) Minor wording update to warning message to clarify that the function style Compile API is deprecated now and will be removed soon. Also updated some code comments.	2022-06-17 12:49:24 -07:00
Yi Zhang	d2cbae3a04	Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888 ) Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)" This reverts commit `2ecba6fd25`.	2022-06-17 17:07:21 +08:00
stevenlix	bd65acd08d	Share execution context memory between TensorRT subgraphs (#11859 ) * share trt context memory * update parser to 8.4-EA * update parser to 8.4-GA * add context memory sharing enable option * update parser to 8.2-GA * fix format issue * reverse orders * fix format * fix format * fix issues	2022-06-16 22:42:40 -07:00
Dmitri Smirnov	2ecba6fd25	Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804 ) Refactor ExecutionFrame and SessionState for better data locality and less memory allocations.	2022-06-16 16:50:48 -07:00
Scott McKay	d64f23fec0	EP factory creation cleanup and enhancements. (#11798 ) * Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places. Convert append EP for SNPE to be generic, and also use for XNNPACK. Add XNNPACK to C# API * Don't need stub for MIGraphX as it's using provider bridge. * Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries. * Only use EPs that require the layout transform if the opset is supported by the layout transformer. * Update wasm registration of xnnpack.	2022-06-16 07:01:41 +10:00
Ashwini Khade	f63e28c92f	C API version 0.001 (#11758 ) * C API version 0.001 * fix linker issues * fixes for save checkpoint api * plus fixes based on tests * plus test_runner and other changes * Plus cosmetic updates * remove unnecessary headers * plus some updates * plus more changes Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-06-15 11:13:35 -07:00
Chen Fu	d936751aad	QlinearConv threading adjustments (#11228 ) * Reserve the first core for the main thread Currently in "auto affinity" mode the worker threads are affinized to cores 0..(N-1), leaving the very last core for the main thread. This patch preserves core #0 for the main thread, and affinizes the worker threads to cores 1..N. * Avoid unneeded spin_pause in thread pool's worker threads Remove unneeded PAUSE instruction (0.1-0.2 usec latency) after a worker thread finds a task to execute. * MLAS/x86: optimize QLinearConv on hybrid CPUs Existing 4x task granularity for task partitioning on hybrid CPUs is not sufficient to compensate for the difference in VNNI instruction throughput between performance and efficiency cores. This patch... * Increases granularity for QLinearConv by 2x, to have 2x more tasks with 2x smaller output count * Limits QLinearConv task count from above, to avoid output count per task getting smaller than kernel's capability * Remove hardcoded task count for QLinearConv as it limited scaling on 16+ cores CPUs * Addressing comments * combining x86 ARM branches in qlinearconv threaded job partition * revert first core assignment Co-authored-by: Saurabh <saurabh.tangri@intel.com> Co-authored-by: Chen Fu <fuchen@microsoft.com>	2022-06-14 14:42:12 -07:00
Vincent Wang	5ecfaef042	ATen Fallback for Inference (#11597 ) * aten op for inference * fix build error * more some code to training only * remove domain from operator name * move aten_op_executor ext out from ortmodule * add pipeline * add exec mode * fix script * fix ut script * fix test pipeline * failure test * rollback * bugfix * resolve comments * enable aten for python build only * fix win build * use target_compile_definitions * support io binding * turn off aten by default * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: zhijxu <zhijxu@microsoft.com>	2022-06-09 16:07:30 +08:00
Scott McKay	927bac0f86	Rework allocator sharing to work for multiple devices. (#11700 ) * Rework allocator sharing to work for multiple devices. * Update SessionState to not use allocator name in matching for consistency with IExecutionProvider. The name doesn't have any clear meaning (e.g. we use the same name for the per-thread allocator in the CUDA EP as the shared allocate there and in the TRT EP). * NOTE: this means we will have one allocator per OrtMemType+OrtDevice. * Reverse order when doing allocator setup in SessionState. This will result in the CPU and CUDA EPs allocators being preferred (they are the most configurable), and also means the per-thread CUDA allocator for default GPU memory will be used even when TRT is enabled. * NOTE: Combined with the change to remove the allocator name from the key this will mean that if CUDA and TRT or ROCM and MIGraphX are both enabled the CUDA/ROCM per-thread allocator will be used to allocate GPU memory. * Use InsertAllocator instead of TryInsertAllocator. Each EP should be registered once, and we should only enter RegisterAllocator once, so the 'try' should not be required and would indicate an unexpected setup was involved. i.e. better to fail and figure out if we need to support that setup. * Add some clarifying comments around how replace allocator works. * Add unit testing for setup where EP has local allocator that may get out of sync with values in the IExecutionProvider base class. * Fix invalid check of whether data is on CPU to use device info instead of allocator name.	2022-06-09 17:38:38 +10:00
Changming Sun	3c1dd9514d	Revert "fixed point based requantization on arm64 (#11540 )" (#11732 ) This reverts commit `1f2c926` because it makes our packaging pipeline crash. Error message: [ RUN ] QLinearConvTest.Conv3D_S8S8_Depthwise Test #1: onnxruntime_test_all ...................Subprocess killed***Exception: 838.24 sec We haven't successfully reproduced the bug on real ARM64 hardware; so far we have only seen it with qemu. Investigation is ongoing.	2022-06-03 19:12:25 -07:00
Hector Li	95a16c1ffe	Snpe ep (#11665 ) * Initiate Ort SNPE EP * fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master * correct the source path for libonnxruntime.so while building for android package * add AdditionalDependencies for arm64 * On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given. * fix build failure if snpe is not enabled * update doc for contrib op * separate out snpe ep settings to onnxruntime_snpe_provider.cmake * renaming according to review comments * update according to review comments	2022-06-03 14:10:02 -07:00
Scott McKay	4445dd6bc1	XNNPACK EP (#11445 ) * Implement XNNPACK support via an EP. * Layout transform uses the GraphPartitioner infrastructure. * Node fusion is supported. * Conv and MaxPool implementations were ported from Changming's PR. * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled	2022-06-03 20:22:34 +10:00
Yufeng Li	1f2c92673b	fixed point based requantization on arm64 (#11540 ) * fixed point based requantization on arm64 * reverse MlasConvSymDepthwiseKernel u8s8 and s8s8 order	2022-06-02 12:34:17 -07:00
Edward Chen	738d9b153c	Consolidate several types into onnxruntime::ArgType. (#11430 )	2022-05-09 14:44:28 -07:00
Tang, Cheng	3f3c5fcd68	Unify the Compile API for mobile build and normal build (#10632 ) * use the lightweight compile api as default; use dnnl ep for testing * apply to tensorrt ep * fix the missing files * fix build * fix the copy issue on linux * migrate migraphx and openvino ep * fix openvino build break * fix linux build * fix unused parameter * fix coreml build * use graph view's filtered initializers * fix openvino break * fix tvm compile api * fix tvm / rknpu / vitisai ep build * add IsInitializedTensor in graph_viewer; fix nuphar build * use serializer directly as tvm ep is still static lib * fix the type mismatch * fix the type mismatch * fix merge conflict * add a comment * fix minimal build * fix the DML EP's legacy approach * save type/shape in dnnl IR * fix linux break * fix tvm failure * dnnl ep: move initializer referenced out of dnnl subgraph * Revert "add IsInitializedTensor in graph_viewer; fix nuphar build" This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587. 
* add IsInitializedTensor to graph viewer * add the legacy code for nuphar build to temporarily make nuphar build work * ignore internal test for nuphar * remove the out of date tests * keep the legacy API in EP for a while * turn serializer into a static function * update comments * fix tvm build * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update onnxruntime/core/framework/execution_provider.cc Co-authored-by: Pranav Sharma <prs@microsoft.com> * update comments; add warning message for legacy compile call * add a flag to control out of scope arg in serialization * fix trt build; improve the test * resolve merge errors * fix a typo Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Pranav Sharma <prs@microsoft.com>	2022-05-05 08:30:07 -07:00
Changming Sun	963e1ace4e	Fix SAL annotations for custom op (#11432 ) Fix SAL annotations for custom op. For example, "_In_" only applies to pointers, not integers.	2022-05-04 10:47:28 -07:00
RandySheriffH	8d69b9398b	APIs for custom op to invoke ort operator directly (#10713 ) * draft kernel creation * setup eager context * call into kernel in eager mode * redefine test case * refact eager context * add comment * remove header * rename argument * redefine API definition with types * list outputs as argument * switch to int to represent length * fix compile err * create attribute API * add test case for topk * remove bool from c api * add gru test case * remove var * fix compile warnings * rename status * fix compile err * exclude sparse tensor * fix comments * fix comments * fix build err * rename file and move location * format code * move file to session folder * fix comments Co-authored-by: Randy <Randy@randysmac.attlocal.net>	2022-05-03 14:16:30 -07:00
Changming Sun	5023f6750b	Revert "Call pluggable EP's shutdown function in Environment::~Environment() (#11120 )" (#11393 ) This reverts commit `4983d6e5d6`. We can't destroy OrtEnv through python's atexit function, because at that time there might be many other ORT python objects alive.	2022-05-02 14:38:31 -07:00
Tang, Cheng	4b875e3543	Re-implement the function support in onnxruntime (#11167 ) * initial fix * refactor the function handle * update the implementation * fix linux build break * fix training build * fix minimal build * fix gradient checker * deprecate the local function members in graph. host it in model * fix changming's comments * fix comments about inlined containers * fix a missed inlined container * fix training build * avoid const for std string_view Co-authored-by: Cheng Tang <chenta@microsoft.com>	2022-04-29 10:15:58 -07:00
Vincent Wang	1c64351e09	Create Tensor with Strides (#11294 ) * create tensor with strides * resolve comments * refactor Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2022-04-28 16:49:37 +08:00
Dmitri Smirnov	a7d0158c24	Introduce a way to disable Abseil library (#11353 ) Introduce a way to disable Abseil library. Use cmake extra args, no new build switch.	2022-04-27 08:57:52 -07:00
Edward Chen	4d0214f851	Move Contains() helper function to a higher common.h. (#11289 )	2022-04-21 09:31:48 -07:00
Gary Miguel	7aa4af238a	Add strict_shape_type_inference config option (#11081 ) Prior to this, certain shape and type errors were surfaced only when the model was using the latest known op set version. Providing users an explicit option allows for better testing of code that produces models, which includes unit tests within this repo and other repos such as the TF-ONNX and PT-ONNX converters. Remove the previous behavior which seems quite counter-intuitive: an otherwise identical model with a later op set version should be treated identically in this regard. The option defaults to false to avoid causing errors for users that rely on the previous permissive behavior. Turned on the strict enforcement by default in OpTester, which revealed a few disagreements between ORT and ONNX on what the correct output shape should be. Fix shape inference bug in ReduceSumTraining with noop_with_empty_axes=1 which was revealed. Fix TensorOpTest.Unsqueeze_scalar, which was testing negative axes on an op set version where the op did not actually support negative axes. Fixes #9506.	2022-04-21 08:32:40 -07:00
Edward Chen	4854a09340	Consolidate utils::ToTensorProtoElementType, TypeToDataType, and data_types_internal::ToTensorDataType. (#9824 )	2022-04-20 12:45:53 -07:00
Ahmad Zakaria	63ff391b16	add AppendExecutionProvider_CUDA_V2 to the C++ api (#11153 )	2022-04-14 17:33:27 -07:00
Vincent Wang	9707181257	fix build error (#11199 )	2022-04-13 13:09:19 +08:00
Dmitri Smirnov	12c687f594	Rework initializer.cc to eliminate code duplication (#11131 ) Rework initializer.cc to eliminate code duplication and add type enforcement. Address review comments. Add literal operators for MLFloat16 and BFloat16 and tests.	2022-04-08 09:42:31 -07:00
Justin Stoecker	7609694464	Enable building with a GDK (#11126 )	2022-04-07 15:06:31 -07:00
Changming Sun	4983d6e5d6	Call pluggable EP's shutdown function in Environment::~Environment() (#11120 ) I disabled some tests temporarily. I will move them to a separate executable file in another PR. In the future, I want to combine the onnxruntime::Environment and OrtEnv classes. We now have 3 env classes, which is too confusing: 1. onnxruntime::Env 2. onnxruntime::Environment 3. OrtEnv Our python binding uses onnxruntime::Environment, while all other language bindings use OrtEnv. So python doesn't unload EPs but the others do. It's better to make them consistent. Please note that even though I added the call, the unload function is currently still a no-op on Linux. So, currently on Windows we must unload the EPs while on Linux we must not do it.	2022-04-07 14:11:29 -07:00
Dmitri Smirnov	2700261f7c	Provide an API to supply external initializers data from user buffers (#11109 ) Implement AddExternalInitializers	2022-04-07 12:21:53 -07:00
Maajid khan	81fa28bc56	OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025 ) * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Modification to include new api 2.0 changes in the code * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Log comments updated * Changes to enable 2.0 api * Fix build issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes issues Fixes compiler warnings c4458 on windows. Fixes the bug in device_type check logic Adds print info for enable_opencl_throttling option in onnxruntime_perf_test Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> commit to make openvino_2021.4 compatible * Fixed IO Buffer Optimization * Fix output names issue * Fix 2021.3 branch * Bug Fix for Multiple inputs/outputs - Assigns the right output_name and input_name for the graph when returned by CompiledModel::inputs() OV function. - Also takes care of output mismatch issue b/w openvino output and onnx output Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add comments for the changes made Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * IO Buffer Changes * Commit for Disabling GPU Throttling for 2021.4 * Updated branch * Fix windows build ->Fixed windows build in debug mode ->Disabled scatternd3_tensor_int64 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed CPP Unit tests for CPU -Fixed shrink, MVN, ReduceL2, Maxpool, upsample, scatter, slice, reshape, unsqueeze. 
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed first set of GPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed additional failing tests on GPU ->Added conditions to disable certain ops under certain conditions ->Disabled certain tests ->Added some op supports for no_dimension supported Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added Expand op support for CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added condition for squeeze op ->Shape can't have empty axes attribute Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add support for LessOrEqual op function Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * OV Interface wait for replaced by indefinite wait call * use names from ONNX model to access OV tensors This change is to use the input/output names retrieved from original onnx model to access OV tensors and to check if there's any input or output names mismatch b/w ONNX naming and OV naming. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes Myriad unit tests and other issues ->Fixes Myriad CPP unit tests ->Fixes output mismatch issue with models with sub graph partitioning Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix segfault issue ->Fixed case 3b condition in get_capability() which was causing the segfault issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed build issue with ov 2021.4 with I/O buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disables performance counters for I/O Buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed inputs/outputs mismatch for HDDL with 2022.1 Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com> * Fix to enable GPU FP16 * Enabled mlperf_ssd_mobilenet_300 model fully on CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added ov version specific dll packaging for nuget * Fixed conditions for few ops Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Dockerfile updates * Updated License Info 
-Updated the copyrights License Info -modified FP16 transformations with OV 2022.1 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling mlperf_ssd_mobilenet_300 model ->Disabled this model for openvino. The test is failing in Internal_CI pipelines. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling failing python CPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed flake8 python errors Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: hdgx <harinix.d.g@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com> Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>	2022-04-06 13:30:33 -07:00
Vincent Wang	3b6cee8059	[CUDA] Optimize Conv and ConvGrad for Training (#10999 ) * Optimize Conv and ConvGrad for Training * add provider option to control * fix typo	2022-03-29 07:31:36 +08:00
Chi Lo	8ba52b0a05	Bump master version to 1.12 (#10797 ) * bump master version to 1.11 * bump master version to 1.12	2022-03-28 12:30:11 -07:00
Scott McKay	47c09e6701	Clarify usage of kOnnxDomainAlias. (#10962 ) * Clarify usage of kOnnxDomainAlias.	2022-03-25 09:52:59 +10:00
Leandro Gracia Gil	1cc2cfb7b8	Move #ifndef ORT_CXX_API_THROW to the no exceptions case. (#10937 ) This is related to https://github.com/microsoft/onnxruntime/issues/10564 which introduced a fix in the wrong case where exceptions are enabled.	2022-03-21 11:12:56 -07:00
Valery Chernov	625a1f7673	[TVM EP] code refactor (#10655 ) * rename info to options for TVM EP * transfer options processing from TVMExecutionProvider to TVMEPOptions * transfer TVMRunner to separated files * implement TVMCompiler class * replace CompileFunc by TVMCompiler object. update TVMRunner. now it does not depend on TvmExecutionProvider * correct logging of TVM EP options * RunnerImpl, GERunnerImpl and VMRunnerImpl were implemented * add prepareComputeInfo method * remove update_output_shapes flag * embed all TVM EP dependences to tvm namespace. transfer model compilation from TVMRunner. connect TVMRunnerImpl to TVMRunner * refactor compileModel method * small cleaning * separate TVM EP options data store and processing * replace TvmTensorShape by InlinedVector with max_size 5 * correct indentation * update TVM hash Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-03-16 13:55:04 +01:00
Edward Chen	f468ea40e5	Refactor Node::AddAttribute() (#10869 )	2022-03-16 14:53:00 +10:00
Edward Chen	e53422c6d0	Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765 ) Add runtime optimization support to ONNX -> ORT format conversion script. Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option.	2022-03-14 16:50:41 -07:00
Hariharan Seshadri	e80ff63274	Fix bug in MemcpyToHost (#10816 )	2022-03-10 07:02:27 -08:00
Edward Chen	c147c9dda6	Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD. (#10778 ) Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD as it is now implied by ORT_EXTENDED_MINIMAL_BUILD. Remove related CMake option.	2022-03-08 16:18:49 -08:00
Vincent Wang	4a38f9e31d	enable strided tensor for training only (#10748 )	2022-03-08 08:31:28 +08:00
Fei Hu	60acfd3dd8	Support CUDA Graph in the CUDA EP (#9978 )	2022-03-06 20:47:31 -08:00
Scott McKay	e337f5faf3	Enable QDQ cleanup and NHWC optimizers in an extended minimal build. (#10729 ) * Enable QDQ cleanup and NHWC optimizers in an extended minimal build.	2022-03-04 15:45:42 +10:00
Rachel Guo	a9dc50ba8b	Add option to force QDQIsInt8Allowed to return true when exporting to ORT format (#10719 ) * wip * save * minor update * fix * fix * Revert "fix" This reverts commit `a76f364b2d`. * revert * revert * revert submodule removal * address pr comments * minor fix * address cr comments * fix format Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2022-03-02 23:26:14 -08:00
Yulong Wang	f4b2d3af2b	Upgrade emsdk to 3.1.3 (#10577 )	2022-02-28 23:52:41 -08:00
Vincent Wang	9a22b5d253	Strided Tensor Support for Eager Mode (#10578 ) * strided tensor for eager mode * fix build and resolve comments * fix win x86 build	2022-03-01 14:25:31 +08:00
Dmitri Smirnov	e23a224518	Fix CUDA 10.2 compile error due to inlined_containers.h inclusion (#10702 ) Fix CUDA 10.2 compile error due to inlined_containers.h inclusion into a common CUDA header. Use NumberOfNodes() to reserve space in a hash table Prefer separate call to reserve() rather than passing in the hash table constructor. They have somewhat different meaning.	2022-02-28 19:56:44 -08:00
cloudhan	3243c9579f	Fix VLOG?_DEFAULT macros usability. (#10568 ) * Add `set_default_logger_verbosity` api. * fix docs * make flake8 happy	2022-03-01 13:16:26 +10:00
Scott McKay	1f6d8248da	Add optional optimizer to remove leftover Q->DQ pairs after all other QDQ processing has completed (#10659 ) Add an optimizer that can remove leftover Q->DQ pairs. Depending on the model this may help with performance and/or improve accuracy. Optional as it could make things worse so user needs to be aware of this and test what works best for their scenario. Enable with SessionOptions config param `session.enable_quant_qdq_cleanup`	2022-03-01 08:05:02 +10:00
Thiago Crepaldi	e788cc2a23	Convert com.microsoft::ATen into org.pytorch.aten::ATen onnx op (#10060 ) Signed-off-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-02-28 14:14:45 -05:00
Ryan Hill	eb116595d4	Add ability to customize ORT_CXX_API_THROW (#10688 )	2022-02-28 00:15:10 -08:00
Dmitri Smirnov	b30e0e2283	Remove inline_containers include from tensor_shape (#10682 ) Hide Inlined Hash set and maps guts behind template forward declarations. Currently CUDA 10.2 compiler can not compile abseil but provider interfaces use those types in their signatures. InlinedVector seems to be fine. Introduce core/common/inlined_containers_fwd.h header	2022-02-26 20:07:18 -08:00
Dmitri Smirnov	2679711bee	Refactor transformers and other code to reduce memory allocation calls (#10523 ) Work on minimizing memory management calls by reducing number of allocations and copies. Replace std::unordered_set with InlinedHashSet and add usage of InlinedVector. Employ std::move() to minimize copying and memory allocations. Remove copying of the const shared data into each of the PropagateCast transformer instances. Move inlined_containers.h header to include/common Adjust AsSpan implementation for C++ < 17	2022-02-24 16:17:14 -08:00
RandySheriffH	e056fbaa51	Add restrictions for hybrid cpus for thread pool task distribution (#10393 ) * add restrictions for hybrid cpus * add unit test to mock hybrid cpu * attach hybrid flag * add mocking interface to CpuInfo * make is_hybrid * make mock function const * add force_hybrid for thread pool * remove header	2022-02-17 14:34:09 -08:00
Ashwini Khade	f436d3437e	Add layout transformer for NNAPI (#10371 ) * Add layout transformer for NNAPI * plus merge fixes * plus some more merge fixes * test fixes * comments + cleanup * plus updates * post merge changes * enable layout transformer in extended minimal build * plus more comments * more tests + fix CI * plus updates per review * more updates per review * fix file name * fix qdq tests * plus more updates * plus updates * typo fix * fix qdq selection in 2nd optimization pass * fix typo * fix a test * update dependency structure for layout transformer * plus updates * more updates * plus change * more updates to fix linker error in minimal build * remove unnecessary headers	2022-02-15 20:25:29 -08:00
Vincent Wang	ceb1e2b1a6	[ROCm] Bugfix of BFloat16-float conversion and Add FastGelu Kernel for AMD (#10557 ) * bf16 bugfix on amd * enable fastgelu ut on amd	2022-02-16 11:11:08 +08:00
Valery Chernov	1cdc23aba4	[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (#10260 ) * update java API for STVM EP. Issue is from PR#10019 * use_stvm -> use_tvm * rename stvm worktree * STVMAllocator -> TVMAllocator * StvmExecutionProviderInfo -> TvmExecutionProviderInfo * stvm -> tvm for cpu_targets. resolve onnxruntime::tvm and origin tvm namespaces conflict * STVMRunner -> TVMRunner * StvmExecutionProvider -> TvmExecutionProvider * tvm::env_vars * StvmProviderFactory -> TvmProviderFactory * rename factory funcs * StvmCPUDataTransfer -> TvmCPUDataTransfer * small clean * STVMFuncState -> TVMFuncState * USE_TVM -> NUPHAR_USE_TVM * USE_STVM -> USE_TVM * python API: providers.stvm -> providers.tvm. clean TVM_EP.md * clean build scripts #1 * clean build scripts, java frontend and others #2 * once more clean #3 * fix build of nuphar tvm test * final transfer stvm namespace to onnxruntime::tvm * rename stvm->tvm * NUPHAR_USE_TVM -> USE_NUPHAR_TVM * small fixes for correct CI tests * clean after rebase. Last renaming stvm to tvm, separate TVM and Nuphar in cmake and build files * update CUDA support for TVM EP * roll back CudaNN home check * ERROR for not positive input shape dimension instead of WARNING * update documentation for CUDA * small corrections after review * update GPU description * update GPU description * misprints were fixed * cleaned up error msgs Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>	2022-02-15 10:21:02 +01:00
Chi Lo	0f5d0a091a	Make user capable of adding new field in OrtTensorRTProviderOptionsV2 as new provider option (#10450 ) * modify code for add additional field in OrtTensorRTProviderOptionsV2 * add include file * fix typo * fix bug * add comment * fix code * revert change	2022-02-05 11:15:12 -08:00
Edward Chen	c43c1691ad	Enable transpose optimizer in minimal extended build (#10349 ) Enable transpose optimizer and infrastructure it depends on in a minimal extended build.	2022-01-31 09:41:04 -08:00
Dwayne Robinson	b02f4ece5e	Remove cbegin and cend calls which do not exist in std::span or gsl::span (#10426 )	2022-01-28 14:25:12 -08:00
Edward Chen	0e951d7d6b	Add some more documentation for the C/C++ API tensor creation functions. (#10394 )	2022-01-27 13:19:11 -08:00
Changming Sun	ec4362f8f3	Enable more static analysis warnings and enable the analyzer for training cpu (#10176 )	2022-01-27 11:17:20 -08:00
Dmitri Smirnov	3367ddc5ba	Add abseil cgmanifest declaration. Update coding standards. (#10374 ) Add abseil cgmanifest declaration. Update coding standards for InlinedContainers Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use. Rename T from InlinedShapeVectorT. Fix Eager build Add LLVM Copyright with modified derived code notice.	2022-01-27 08:32:05 -08:00
Weixing Zhang	ea9c8a7cdc	support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Support MIGraphXEP to work with ROCMEP for inference on AMD GPU	2022-01-26 15:52:56 -08:00
Edward Chen	df16c605e8	Add "available since" message for C API additions since v1.10.0. (#10348 )	2022-01-25 10:15:34 -08:00
Edward Chen	4b87d2c172	Fix dockerfiles/Dockerfile.arm32v7 build. (#10360 ) Install CMake, ignore some Eigen warnings.	2022-01-24 19:06:09 -08:00
Dmitri Smirnov	7e092a7e3f	Reduce number of memory allocations based on a customer profiling case (#10193 ) Add abseil and inlined containers typedefs Introduce TensorShapeVector for shape building. Use gsl::span<const T> to make interfaces accept different types of vector like args. Introduce InlinedShapeVectorT for shape capacity typed instantiations Refactor cuda slice along with provider shared interfaces Refactor Concat, Conv, Pad Build with Conv Einsum and ConvTranspose refactored. Remove TensorShape::GetDimsAsVector() Refactor SliceIterator and SliceIteratorBase Refactor broadcast Refactor Pads for twice as long Remove memory planner intermediate shapes vector Refactor orttraining Fix passing TensorShapeVector to tests Remove abseil copy and submodule, use FetchContent_Declare/Fetch Path with separate command Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.	2022-01-24 10:40:46 -08:00
Vincent Wang	44e2db9397	CUDA BFloat16 Refactor (#10085 )	2022-01-14 19:38:56 +08:00
RandySheriffH	79d2a0d185	Dynamic cost model to mitigate high E2E perf variance (#9833 ) * commit dynamic block size * summarize granularity * add configure * add test case * call std stoi * add comments * fix typo * rename var * update comment * reset default * better comments * extend LoopCounter for dynamic blocking * fix comments and add more UT * update comments * switch type to std::ptrdiff_t * format code with better indention * cast ptrdiff_t * fix typo	2022-01-11 17:26:41 -08:00
Shucai Xiao	ce103ace93	Amdmigraphx fix build error (#9272 ) * fix build error * rename a missing api for the MIGraphX EP	2022-01-10 15:18:43 -08:00
Dwayne Robinson	1f5b073508	Minor DirectML EP provider factory comments (#9965 )	2022-01-10 02:06:31 -08:00
Nat Kershaw (MSFT)	d52d3c0052	Update C/C++ API docs automation to create a PR (instead of push to publish branch) (#10093 )	2022-01-07 16:16:47 -08:00
Hariharan Seshadri	0552a47ec2	Enable CUDA provider option configuration for C# (#10188 )	2022-01-06 11:03:14 -08:00
Edward Chen	792db33f01	Enable loading of ORT format model graph runtime optimizations (#9901 ) Initial implementation of load/replay of runtime optimizations in an ORT format model.	2022-01-04 12:09:07 -08:00
stevenlix	05d20343ee	Remove duplicated constant initializer copies for TensorRT nodes (#10105 ) * add new field constant_initializers in metadef and remove constant initializers from trt node inputs * remove redundancy * use GetConstantInitializer() to get constant initializers * add ORT_ENFORCE check Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2021-12-22 12:19:56 -08:00
Changming Sun	4e9e01cb3c	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
Edward Chen	3466ee45a3	Add hash value typedef. (#9710 ) Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.	2021-12-15 19:07:17 -08:00
Valery Chernov	b327e89efa	Standalone TVM Executor Provider (#10019 ) * squashed commit for standalone tvm execution provider * critical fix for correct python build with stvm ep * get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG * updates and fixes * update parsing of stvm provider options * add support of external data for onnx model * add conditional dump of subgraphs * remove unused code * get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API * support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type) * add fp16 * add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options * fix license text in header. fix log format * small fixes * fix issues from flake8 * remove model proto construction from GetCapability * reserve memory for vector of DLTensors * add simple tutorial for STVM EP * STVM docs * jroesch/tvm -> apache/tvm * remove dead code, unnecessary logs and comments * fix in readme * improve tutorial notebook * tvm update * update STVM_EP.md * fix default value * update STVM_EP.md * some TODOs for the future development * shorten long lines * add hyperlink to STVM_EP.md * fix Linux CI error * fix error in csharp test Co-authored-by: Jared Roesch <jroesch@octoml.ai> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2021-12-15 16:59:20 -08:00
Changming Sun	20f8a06f1f	Remove OpenMP code (#10032 )	2021-12-15 00:58:42 -08:00

736 commits