onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-19 21:32:23 +00:00

Author	SHA1	Message	Date
edgchen1	c270ea148a	Move 'using common::Status;' from common.h to status.h.	2022-08-26 15:05:53 -07:00
Yulong Wang	c144acc534	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
Dmitri Smirnov	9481893b58	Replace with lock_guard as a lighter class for locking (#12616)	2022-08-17 11:08:31 -07:00
Haoming Chen	8a038b9b0c	Fix a build error (#12600) The LLVM compiler complains about std::hash<const char*> and suggests std::hash<const void*>. But the intention is to hash the name string instead of the pointer. So use std::hash<std::string> to be explicit.	2022-08-17 10:49:54 -07:00
Scott McKay	0b0c51e028	Support direct usage of ORT format model flatbuffer for initializers (#12465) * Add ability to use ORT format model flatbuffer directly for initializers by leveraging the TensorProto external data infrastructure. Requires user to provide ORT format model bytes when creating the session, and set both `session.use_ort_model_bytes_directly` and `session.use_ort_model_bytes_for_initializers` to 1 in SessionOptions config entries (AddSessionConfigEntry in C API).	2022-08-12 18:31:43 +10:00
Changming Sun	ac7538b909	Remove CUDA 10.2 support (#12541 )	2022-08-10 22:46:41 -07:00
Dmitri Smirnov	c10704a501	Use alignas instead of naive padding to avoid false cache sharing (#12514 ) PerThread and ChildThreadStat alignas	2022-08-10 11:23:20 -07:00
Cheng	64e991a9fc	[Qlinearsoftmax] contrib cpu (#12177) * [Qlinearsoftmax] contrib cpu * int8 implementation * contrib operator md * qdq transformer test * new attribute: opset * doc * quantized tool * remove template to reduce binary size * doc of contrib operators * enforce x_shape is valid * fix reduce_size if input-shape is dynamic * add UT * register one op for reducing binary size * kernel hash update * docs/ContribOperators.md	2022-08-10 10:52:02 +08:00
Hector Li	730240d2a5	remove the link in the comments (#12510)	2022-08-08 15:20:40 -07:00
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Ryan Hill	52d4699788	Minor doc fixes (#12388 )	2022-08-03 19:47:36 -07:00
Hariharan Seshadri	d5a1c01b38	Add C++ Session ctor taking model bytes and OrtPrepackedWeightsContainer (#12333 )	2022-07-29 12:32:43 -07:00
Yateng Hong	c579497134	Fix TRT custom op issue (#12283 ) * Pass schema registry on CreateModel. * Fix ORT_MINIMAL_BUILD. * Fix build issue.	2022-07-29 03:39:56 -07:00
Ryan Hill	3e014a5e5d	Fix C header to stop people accidentally copying the OrtApi by value (#12297 ) * Fix C header to stop people accidentally copying the OrtApi by value * Remove api_ from KernelTwo	2022-07-25 19:19:40 -07:00
Dmitri Smirnov	3bf614fd47	Eliminate memory allocations per recent profiling (#12225 ) * Alloc begin FeedsFetches refactoring Refactor Tensor class Fix buffer deletor Remove new/delete deleted Adjust alloc move Fix up xnnpack provider Clarifying the comment on Create()	2022-07-25 14:14:38 -07:00
Ashwini Khade	ceb76429db	Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc Merge On-Device-Training Offline Tooling and C/C++ APIs	2022-07-21 15:09:48 -07:00
Baiju Meswani	cbf08c7a7b	Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments	2022-07-21 18:11:48 +00:00
Dmitri Smirnov	4f106d2b3b	Eliminate unnecessary status lock acquisition in TP (#12196 ) Eliminate unnecessary status lock acquisition in the Thread Pool	2022-07-19 14:16:12 -07:00
Chen Fu	040c2f4517	x86/64 U8S8 Gemm Precision Fix (#12088) Add a graph optimization that converts u8s8 matrix multiplication to u8u8 if needed. On x86/64 platforms, specifically SSE4.1, AVX2 and AVX512 CPUs provide better performance computing u8s8 matrix multiplications. Unfortunately, the higher performance comes with value overflow problems, as described in: https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/advanced-topics/nuances-of-int8-computations.html In this change we added a session option "session.x64quantprecision" (default off). For operators that call u8s8 matrix multiplications, e.g. QAttention, we convert them to u8u8 when the following conditions are all satisfied: 1. Current CPU is SSE4.1, AVX2 or AVX512 with no VNNI support. 2. Session option "session.x64quantprecision" is on. 3. Constant weight tensor contains values outside of the [-64, 63] range. Note that when the weight tensor is not constant, QDQS8ToU8Transformer should already convert it to u8.	2022-07-13 10:12:25 -07:00
Baiju Meswani	a457ddc41d	Merge branch 'master' of https://github.com/microsoft/onnxruntime into bmeswani/merge_pr	2022-06-30 21:53:07 +00:00
Baiju Meswani	6e8edfff0c	Separate training apis from shared core apis (#12027 )	2022-06-29 14:12:29 -07:00
RandySheriffH	d5fcb432fa	Generalize native op creation (#11539 ) * create op from ep * read input count from context * create holder to host nodes * fix typo * cast type before comparison * throw error on API fail * silence warning from minimal build * switch to unique_ptr with deleter to host nodes * fix typo * fix build err for minimal * fix build err for minimal * add UT for conv * enable test on CUDA * add comment * fix typo * use gsl::span and string view for Node constructor * Added two APIs - CopyKernelInfo and ReleaseKernelInfo * pass gsl::span by value * switch to span<NodeArg* const> to allow for reference to const containers * fix typo * fix reduced build err * fix reduced build err * refactoring node construction logic * rename exceptions * add input and output count as arguments for op creation * refactor static member * use ORT_CATCH instead of catch * cancel try catch * add static value name map * format input definition and set err code * fix comments * fix typo	2022-06-27 21:12:15 -07:00
Baiju Meswani	d25cf4df26	Merge branch 'master' into training_dev/on_device_poc	2022-06-24 20:18:19 +00:00
Dmitri Smirnov	088bc7494b	Deprecate APIs returning raw ptrs and provide replacements (#11922) Provide better documentation	2022-06-24 09:50:04 -07:00
G. Ramalingam	b1411c8357	Restructure function inliner (#11731 ) * Add nested function call tests * Add overload for Specialize * Pass symboltable to onnx shape inference * Avoid renaming empty names * Enable sequence_map tests which failed before this change	2022-06-24 09:21:31 -07:00
Dmitri Smirnov	607b7df060	Allow saving on CPU usage for infrequent inference requests by reducing thread spinning (#11841 ) Introduce Start/Stop threadpool spinning switch Add a session config option to force spinning stop at the end of the Run()	2022-06-23 10:04:37 -07:00
sfatimar	61a74f2f4d	Mohsin/enable dynamic shapes (#11867 ) * Add pypi build changes to latest Master * Add ORT training part of OV build * Disabling SqueezeOpTest.BadAxes * Add ONNXruntime branch ARG to Docker build * Changes to include file details versions * Commit File Version Updates * Change naming for linux build * Add fix for pylint format errors * Fix pylint warnings. * Enable Dynamic Shapes for OV_API_20 * Update requirements.txt whl version- internal_ci fix * Update backend_manager.cc MYRIAD Fix * Update wheel version in requirements.txt * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update setup.py * Fix pylint warnings * Fix pylint warnings 2 * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>	2022-06-21 08:03:58 -07:00
Dmitri Smirnov	267a424e52	Retry Rework execution frame to reduce memory allocations (#11897 ) * Revert "Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)" This reverts commit `d2cbae3a04`. * Revert prepacked_weights to avoid indirect inclusion in CUDA and TRT code that breaks the build.	2022-06-20 10:29:43 -07:00
Edward Chen	a93fe7824a	Update EP compile API deprecation warning message. (#11808 ) Minor wording update to warning message to clarify that the function style Compile API is deprecated now and will be removed soon. Also updated some code comments.	2022-06-17 12:49:24 -07:00
Yi Zhang	d2cbae3a04	Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888 ) Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)" This reverts commit `2ecba6fd25`.	2022-06-17 17:07:21 +08:00
stevenlix	bd65acd08d	Share execution context memory between TensorRT subgraphs (#11859 ) * share trt context memory * update parser to 8.4-EA * update parser to 8.4-GA * add context memory sharing enable option * update parser to 8.2-GA * fix format issue * reverse orders * fix format * fix format * fix issues	2022-06-16 22:42:40 -07:00
Dmitri Smirnov	2ecba6fd25	Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804 ) Refactor ExecutionFrame and SessionState for better data locality and less memory allocations.	2022-06-16 16:50:48 -07:00
Scott McKay	d64f23fec0	EP factory creation cleanup and enhancements. (#11798 ) * Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places. Convert append EP for SNPE to be generic, and also use for XNNPACK. Add XNNPACK to C# API * Don't need stub for MIGraphX as it's using provider bridge. * Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries. * Only use EPs that require the layout transform if the opset is supported by the layout transformer. * Update wasm registration of xnnpack.	2022-06-16 07:01:41 +10:00
Ashwini Khade	f63e28c92f	C API version 0.001 (#11758 ) * C API version 0.001 * fix linker issues * fixes for save checkpoint api * plus fixes based on tests * plus test_runner and other changes * Plus cosmetic updates * remove unnecessary headers * plus some updates * plus more changes Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-06-15 11:13:35 -07:00
Chen Fu	d936751aad	QlinearConv threading adjustments (#11228) * Reserve the first core for the main thread Currently in "auto affinity" mode the worker threads are affinized to cores 0..(N-1), leaving the very last core for the main thread. This patch preserves core #0 for the main thread, and affinizes the worker threads to cores 1..N. * Avoid unneeded spin_pause in thread pool's worker threads Remove unneeded PAUSE instruction (0.1-0.2 usec latency) after a worker thread finds a task to execute. * MLAS/x86: optimize QLinearConv on hybrid CPUs Existing 4x task granularity for task partitioning on hybrid CPUs is not sufficient to compensate for the difference in VNNI instruction throughput between performance and efficient cores. This patch... * Increases granularity for QLinearConv by 2x, to have 2x more tasks with 2x smaller output count * Limits QLinearConv task count from above, to avoid output count per task getting smaller than kernel's capability * Remove hardcoded task count for QLinearConv as it limited scaling on 16+ core CPUs * Addressing comments * combining x86 ARM branches in qlinearconv threaded job partition * revert first core assignment Co-authored-by: Saurabh <saurabh.tangri@intel.com> Co-authored-by: Chen Fu <fuchen@microsoft.com>	2022-06-14 14:42:12 -07:00
Vincent Wang	5ecfaef042	ATen Fallback for Inference (#11597 ) * aten op for inference * fix build error * more some code to training only * remove domain from operator name * move aten_op_executor ext out from ortmodule * add pipeline * add exec mode * fix script * fix ut script * fix test pipeline * failure test * rollback * bugfix * resolve comments * enable aten for python build only * fix win build * use target_compile_definitions * support io binding * turn off aten by default * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: zhijxu <zhijxu@microsoft.com>	2022-06-09 16:07:30 +08:00
Scott McKay	927bac0f86	Rework allocator sharing to work for multiple devices. (#11700 ) * Rework allocator sharing to work for multiple devices. * Update SessionState to not use allocator name in matching for consistency with IExecutionProvider. The name doesn't have any clear meaning (e.g. we use the same name for the per-thread allocator in the CUDA EP as the shared allocate there and in the TRT EP). * NOTE: this means we will have one allocator per OrtMemType+OrtDevice. * Reverse order when doing allocator setup in SessionState. This will result in the CPU and CUDA EPs allocators being preferred (they are the most configurable), and also means the per-thread CUDA allocator for default GPU memory will be used even when TRT is enabled. * NOTE: Combined with the change to remove the allocator name from the key this will mean that if CUDA and TRT or ROCM and MIGraphX are both enabled the CUDA/ROCM per-thread allocator will be used to allocate GPU memory. * Use InsertAllocator instead of TryInsertAllocator. Each EP should be registered once, and we should only enter RegisterAllocator once, so the 'try' should not be required and would indicate an unexpected setup was involved. i.e. better to fail and figure out if we need to support that setup. * Add some clarifying comments around how replace allocator works. * Add unit testing for setup where EP has local allocator that may get out of sync with values in the IExecutionProvider base class. * Fix invalid check of whether data is on CPU to use device info instead of allocator name.	2022-06-09 17:38:38 +10:00
Changming Sun	3c1dd9514d	Revert "fixed point based requantization on arm64 (#11540 )" (#11732 ) This reverts commit `1f2c926`. Because it makes our packaging pipeline crash Error message: [ RUN ] QLinearConvTest.Conv3D_S8S8_Depthwise Test #1: onnxruntime_test_all ...................Subprocess killed***Exception: 838.24 sec We haven't successfully reproduced the bug on a real ARM64 hardware. Currently we only saw it showed up with qemu. More investigations are on-going.	2022-06-03 19:12:25 -07:00
Hector Li	95a16c1ffe	Snpe ep (#11665) * Initiate Ort SNPE EP * fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master * correct the source path for libonnxruntime.so while building for android package * add AdditionalDependencies for arm64 * On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given. * fix build failure if snpe is not enabled * update doc for contrib op * separate out snpe ep settings to onnxruntime_snpe_provider.cmake * renaming according to review comments * update according to review comments	2022-06-03 14:10:02 -07:00
Scott McKay	4445dd6bc1	XNNPACK EP (#11445 ) * Implement XNNPACK support via an EP. * Layout transform uses the GraphPartitioner infrastructure. * Node fusion is supported. * Conv and MaxPool implementations were ported from Changming's PR. * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled	2022-06-03 20:22:34 +10:00
Yufeng Li	1f2c92673b	fixed point based requantization on arm64 (#11540 ) * fixed point based requantization on arm64 * reverse MlasConvSymDepthwiseKernel u8s8 and s8s8 order	2022-06-02 12:34:17 -07:00
Edward Chen	738d9b153c	Consolidate several types into onnxruntime::ArgType. (#11430 )	2022-05-09 14:44:28 -07:00
Tang, Cheng	3f3c5fcd68	Unify the Compile API for mobile build and normal build (#10632) * use the lightweight compile api as default; use dnnl ep for testing * apply to tensorrt ep * fix the missing files * fix build * fix the copy issue on linux * migrate migraphx and openvino ep * fix openvino build break * fix linux build * fix unused parameter * fix coreml build * use graph view's filtered initializers * fix openvino break * fix tvm compile api * fix tvm / rknpu / vitisai ep build * add IsInitializedTensor in graph_viewer; fix nuphar build * use serializer directly as tvm ep is still static lib * fix the type mismatch * fix the type mismatch * fix merge conflict * add a comment * fix minimal build * fix the DML EP's legacy approach * save type/shape in dnnl IR * fix linux break * fix tvm failure * dnnl ep: move initializer referenced out of dnnl subgraph * Revert "add IsInitializedTensor in graph_viewer; fix nuphar build" This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587. * add IsInitializedTensor to graph viewer * add the legacy code for nuphar build to temporarily make nuphar build work * ignore internal test for nuphar * remove the out of date tests * keep the legacy API in EP for a while * turn serializer into a static function * update comments * fix tvm build * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update onnxruntime/core/framework/execution_provider.cc Co-authored-by: Pranav Sharma <prs@microsoft.com> * update comments; add warning message for legacy compile call * add a flag to control out of scope arg in serialization * fix trt build; improve the test * resolve merge errors * fix a typo Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Pranav Sharma <prs@microsoft.com>	2022-05-05 08:30:07 -07:00
Changming Sun	963e1ace4e	Fix SAL annotations for custom op (#11432 ) Fix SAL annotations for custom op. For example, "_In_" only applies to pointers, not integers.	2022-05-04 10:47:28 -07:00
RandySheriffH	8d69b9398b	APIs for custom op to invoke ort operator directly (#10713 ) * draft kernel creation * setup eager context * call into kernel in eager mode * redefine test case * refact eager context * add comment * remove header * rename argument * redefine API definition with types * list outputs as argument * switch to int to represent length * fix compile err * create attribute API * add test case for topk * remove bool from c api * add gru test case * remove var * fix compile warnings * rename status * fix compile err * exclude sparse tensor * fix comments * fix comments * fix build err * rename file and move location * format code * move file to session folder * fix comments Co-authored-by: Randy <Randy@randysmac.attlocal.net>	2022-05-03 14:16:30 -07:00
Changming Sun	5023f6750b	Revert "Call pluggable EP's shutdown function in Environment::~Environment() (#11120 )" (#11393 ) This reverts commit `4983d6e5d6`. We can't destroy OrtEnv through python's atexit function, because at that time there might be many other ORT python objects alive.	2022-05-02 14:38:31 -07:00
Tang, Cheng	4b875e3543	Re-implement the function support in onnxruntime (#11167) * initial fix * refactor the function handle * update the implementation * fix linux build break * fix training build * fix minimal build * fix gradient checker * deprecate the local function members in graph. host it in model * fix changming's comments * fix comments about inlined containers * fix a missed inlined container * fix training build * avoid const for std string_view Co-authored-by: Cheng Tang <chenta@microsoft.com>	2022-04-29 10:15:58 -07:00
Vincent Wang	1c64351e09	Create Tensor with Strides (#11294 ) * create tensor with strides * resolve comments * refactor Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2022-04-28 16:49:37 +08:00
Dmitri Smirnov	a7d0158c24	Introduce a way to disable Abseil library (#11353 ) Introduce a way to disable Abseil library. Use cmake extra args, no new build switch.	2022-04-27 08:57:52 -07:00

694 commits
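
Note on commit 040c2f4517 (U8S8 Gemm precision fix): the message names a weight range of [-64, 63] but does not spell out the arithmetic behind it. The sketch below is one way to see it. It assumes, as an illustration and not as a statement about the actual MLAS kernels, that on non-VNNI CPUs two adjacent u8 x s8 products are added into one signed 16-bit lane with saturation, which is the behavior the linked Intel page describes. The function names `saturate_int16`, `u8s8_pair_sum`, and `can_saturate` are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_int16(value: int) -> int:
    """Clamp an exact integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def u8s8_pair_sum(a0: int, a1: int, w0: int, w1: int) -> tuple[int, int]:
    """Model one 16-bit lane (assumed behavior): a0*w0 + a1*w1 with saturation.

    a0, a1 are unsigned 8-bit activations; w0, w1 are signed 8-bit weights.
    Returns (saturated_result, exact_result).
    """
    if not (0 <= a0 <= 255 and 0 <= a1 <= 255):
        raise ValueError("activations must fit in u8")
    if not (-128 <= w0 <= 127 and -128 <= w1 <= 127):
        raise ValueError("weights must fit in s8")
    exact = a0 * w0 + a1 * w1
    return saturate_int16(exact), exact


def can_saturate(w0: int, w1: int) -> bool:
    """True if some u8 activation pair pushes the exact sum outside int16.

    The sum is linear in each activation, so its extremes occur when each
    activation is either 0 or 255; checking those four corners is enough.
    """
    corners = [a0 * w0 + a1 * w1 for a0 in (0, 255) for a1 in (0, 255)]
    return min(corners) < INT16_MIN or max(corners) > INT16_MAX


if __name__ == "__main__":
    saturated, exact = u8s8_pair_sum(255, 255, 127, 127)
    print("exact:", exact, "saturated:", saturated)
    # Weights limited to the range named in the commit message never saturate.
    safe = not any(
        can_saturate(w0, w1) for w0 in range(-64, 64) for w1 in range(-64, 64)
    )
    print("weights in [-64, 63] are always safe:", safe)
```

Under this model, a weight pair inside [-64, 63] cannot saturate for any activation values, while full-range s8 weights can, which is consistent with the third condition listed in the commit message.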