onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-18 21:21:17 +00:00

Author	SHA1	Message	Date
Edward Chen	2ecd1d6622	Switch GSL to MS GSL 4.0.0 (#13416 )	2022-10-29 04:15:20 -07:00
Fei Hu	943e156f4c	Allow custom ops to set input memory type (#10879 )	2022-10-28 21:45:26 -07:00
cloudhan	fc12abf6b1	Enable/Disable tunable GEMM by using tunable switch in provider options and env var (#13116 ) Related PRs #12853 This allows the user to enable/disable tunable GEMM on demand.	2022-10-19 22:35:08 -07:00
Scott McKay	565da71275	Make 'env' argument to Session const (#13362 ) ### Description The Env argument does not need to be mutable to call the underlying C API. Update the Ort::Session ctor to have a const Env. All other changes are from clang-format running. ### Motivation and Context Cleanup	2022-10-19 14:23:24 +10:00
Dmitri Smirnov	f5e3165cc3	Fix move Base::operator= (#13355 ) ### Description Base::operator= move is broken, loses a valid ptr. ### Motivation and Context Address https://github.com/microsoft/onnxruntime/pull/13215#discussion_r997814275	2022-10-18 13:07:40 -07:00
Dmitri Smirnov	4a63cd0290	Improve thread pool creation failure handling. (#13313 ) ### Description Detect and report thread creation failure on Windows. Do not throw out of the constructor after the thread is created: the thread handle is lost and cannot be joined, resulting in a deadlock. Make setting a thread priority on Linux consistent with Windows. Set thread priority in the thread itself. Log failure properly, but do not exit the thread. ### Motivation and Context Address issues https://github.com/microsoft/onnxruntime/issues/13291 And https://github.com/microsoft/onnxruntime/issues/13285#issuecomment-1278063223	2022-10-15 17:57:19 -07:00
Dmitri Smirnov	f0fbff6dd4	Adjust docs to comply with Doxygen requirements (#13302 ) ### Description Fix up param names in docs ### Motivation and Context Make pipelines pass	2022-10-12 18:07:18 -07:00
cloudhan	1e55949a70	Fix unsound hipify in ROCm EP (#13269 ) Some CUDA-related things are still left in the ROCm EP's statically hipified code. Eliminate them to avoid confusion.	2022-10-12 08:32:42 +08:00
cloudhan	2cf5d04e3d	Fix clang-tidy(cppcoreguidelines-pro-bounds-array-to-pointer-decay) (#13241 ) clang-tidy says "Do not implicitly decay an array into a pointer; consider using gsl::array_view or an explicit cast instead" It is a false positive scattered throughout our codebase wherever helper macros are used. It is because, for a function with a 4-char name, say `main`, the type of __FUNCTION__ and __PRETTY_FUNCTION__ is `char [5]`.	2022-10-11 13:16:48 +08:00
Dmitri Smirnov	5dae0c477d	Deprecate CustomApi and refactor public API for better safety and consistency (#13215 )	2022-10-06 14:57:37 -07:00
### Description
Deprecate CustomOpApi and refactor dependencies for exception safety and eliminate memory leaks. Refactor API classes for clear ownership and semantics. Introduce `InitProviderOrtApi()`
### Motivation and Context
Make public API better and safer. Special note about `Ort::Unowned`. The class suffers from the following problems:
1. It is not able to hold const pointers to the underlying C objects. This forces users to `const_cast` and circumvent constness of the returned object. The user is now able to call mutating interfaces on the object, which violates invariants and may be a thread-safety issue. It also makes it possible to take ownership of the pointer and destroy it unintentionally (see examples below).
2. The objects that are unowned cannot be copied, and that makes coding inconvenient and at times unsafe.
3. It directly inherits from the type it `unowns`.

All of the above create great conditions for inadvertent unowned object mutations and destructions. Consider the following examples of object slicing; one of them is from a real customer issue and the other one I accidentally coded myself (and I am supposed to know how this works). None of the below can be solved by aftermarket patches, and they can be hard to diagnose.
#### Example 1 slicing of argument
```cpp
void SlicingOnArgument(Ort::Value& value) {
  // This will take possession of the input and if the argument
  // is Ort::Unowned<Ort::Value> it would again double free the ptr
  // regardless if it was const or not since we cast it away.
  Ort::Value output_values[] = {std::move(value)};
}

int main() {
  const OrtValue* ptr = nullptr;  // some value does not matter
  Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)};
  // unowned is destroyed when the call returns.
  SlicingOnArgument(unowned);
}
```
#### Example 2 slicing of return value
```cpp
// The return will be sliced to Ort::Value that would own and release (double free the ptr)
Ort::Value SlicingOnReturn() {
  const OrtValue* ptr = nullptr;  // some value does not matter
  Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)};
  return unowned;
}
```
Edward Chen	5c89c37f7f	Consolidate enabled/default kernel def type constraints (#13034 ) Consolidate enabled/default kernel def type constraint types into enabled.	2022-09-27 14:04:15 -07:00
RandySheriffH	a83a9ed6b0	Remove miscellaneous nuphar configs (#13070 ) Remove a handful of nuphar related configurations after deprecation. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-26 13:41:28 -07:00
Edward Chen	5f611b63a1	Make classes IKernelTypeStrResolver and IKernelLookup have protected destructors. (#13059 )	2022-09-23 09:16:45 -07:00
wangxiyuan	952c99304a	Add CANN EP (#12416 ) Description: This PR adds Ascend CANN execution provider support. Motivation and Context - Why is this change required? What problem does it solve? As described in the linked issue, CANN is the API layer for Ascend processors. Adding the CANN EP allows users to run ONNX models on Ascend hardware via onnxruntime. The detailed changes: 1. Added CANN EP framework. 2. Added the basic operators to support ResNet and VGG models. 3. Added C/C++ and Python API support - If it fixes an open issue, please link to the issue here. https://github.com/microsoft/onnxruntime/issues/11477 Author: lijiawei <lijiawei19@huawei.com> wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: FFrog <ljw1101.vip@gmail.com>	2022-09-22 14:53:40 -07:00
sfatimar	cccbe90764	Openvino ep 2022.2 v4.2 (#13023 ) These changes align the OV 2022.2 release with ORT. Changes: CPU FP16 Support, dGPU Support, RHEL Dockerfile, Ubuntu 20 Dockerfile Motivation and Context - This change is required to ensure ORT-OpenVINO Execution Provider is aligned with latest changes. - If it fixes an open issue, please link to the issue here. Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: shamaksx <shamax.kshirsagar@intel.com> Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com> Co-authored-by: pratiksha <mohsinx.mohammad@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: nmaajidk <n.maajid.khan@intel.com> Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com> Co-authored-by: intel <intel@iotgecsp-nuc04.iind.intel.com>	2022-09-22 12:31:40 -07:00
Edward Chen	454f77cd94	Update kernel matching logic: decouple from op schemas and remove kernel def hashes (#12791 ) # Motivation Currently, ORT minimal builds use kernel def hashes to map from nodes to kernels to execute when loading the model. As the kernel def hashes must be known ahead of time, this works for statically registered kernels. This works well for the CPU EP. For this approach to work, the kernel def hashes must also be known at ORT format model conversion time, which means the EP with statically registered kernels must also be enabled then. This is not an issue for the always-available CPU EP. However, we do not want to require that any EP which statically registers kernels is always available too. Consequently, we explore another approach to match nodes to kernels that does not rely on kernel def hashes. An added benefit of this is the possibility of moving away from kernel def hashes completely, which would eliminate the maintenance burden of keeping the hashes stable. # Approach In a full build, ORT uses some information from the ONNX op schema to match a node to a kernel. We want to avoid including the ONNX op schema in a minimal build to reduce binary size. Essentially, we take the necessary information from the ONNX op schema and make it available in a minimal build. We decouple the ONNX op schema from the kernel matching logic. The kernel matching logic instead relies on per-op information which can either be obtained from the ONNX op schema or another source. This per-op information must be available in a minimal build when there are no ONNX op schemas. We put it in the ORT format model. Existing uses of kernel def hashes to look up kernels are replaced with the updated kernel matching logic. We no longer store kernel def hashes in the ORT format model’s session state and runtime optimization representations. We no longer keep the logic to generate and ensure stability of kernel def hashes.	2022-09-20 14:24:59 -07:00
Cheng	f26054deca	[XNNPACK] Support running in multi-thread with separate pthreadpool (#11762 ) Description: XNNPACK uses pthreadpool as its internal threadpool implementation, which couples calculation and parallelization. Thus it is impossible to leverage ORT's threadpool (EIGEN/OPENMP based), so we enabled pthreadpool in the XNNPACK EP in this PR. Case 1: Pthreadpool simply coexists with the ORT threadpool. Experiment setup: hardware is a RedMi8A with 8 cores, ARMv7. The two threadpools have the same pool size, from 1 to 8. Two models: mobilenet_v2 and mobilenet_edgetpu. From the picture below we can conclude that latency gets even higher from 5 threads upward. ![image](https://user-images.githubusercontent.com/9417365/190550127-2304adfe-97ac-4aeb-91a0-4606b5305a82.png) Case 2: The performance regression with 5 or more threads happens because ORT threads keep spinning on the CPU and do not release it after computation has finished. This is equivalent to creating 5x2 threads for parallelization while we have only 8 CPU cores. So I manually disabled spinning after the ORT threadpool finished and enabled it again on entering the ORT threadpool. The result is quite normal now. ![image](https://user-images.githubusercontent.com/9417365/190675230-0d85dd02-01f0-4255-967d-e3dbb2a1fe52.png) Case 3: Even though we achieved reasonable results by disabling spinning, will the ORT threadpool still impact the performance of pthreadpool? We set up an experiment with the ORT threadpool size (intra_thread_num) set to 1, so that only pthreadpool is created. Note that almost a third of the ops are run by the CPU EP. Surprisingly, we find that disabling the ORT threadpool performs even better than creating two threadpools. ![image](https://user-images.githubusercontent.com/9417365/190556480-d6507396-d777-44fc-94e1-938d2b9bb7d7.png) Case 4: Use a unified threadpool between the CPU EP and the XNNPACK EP. It is the fastest of all.
But if we took a workload partition strategy similar to the ORT threadpool's, it could be faster. ![image](https://user-images.githubusercontent.com/9417365/190674908-a68fd20f-bdf4-41f9-bf0a-76b304cda490.png) Motivation and Context - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-09-20 16:02:15 +08:00
Tang, Cheng	739b5675c8	remove legacy compile api (#12932 ) Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-09-15 13:18:40 -07:00
Dmitri Smirnov	bc2df1bf95	Remove previously deprecated API (#12935 ) Remove previously deprecated API Format JS code, address review comments NPM Formatting	2022-09-14 10:58:03 -07:00
Scott McKay	1016c33519	Fix prefast warning in upsample.cc. (#12938 ) * Fix prefast warning. * Fix some other static analysis warnings.	2022-09-14 08:14:33 +10:00
Cheng	8cedafe250	[xnnpack] Have `Initializer` in Mobile related EPs in Minimal_build and creating EP specific dynamic-schema (#12555 ) * Remove the dependence of Qlinearsoftmax schema * refactor initializerview && create shared schema * Dynamic Create EP specific schema * Have Initializer in minimal_build * address comments * remove CancelFuseSubGraph	2022-09-06 14:32:15 +08:00
ashbhandare	27dde0b51f	Csharp bindings for on-device training APIs (#12404 )	2022-09-02 13:13:48 -07:00
Yulong Wang	82a28cc2c3	upgrade emsdk to 3.1.19 (#12690 ) * upgrade emsdk to 3.1.19 * fix build break * ignore '-Wunused-but-set-variable' in eigen * add malloc and free in exported functions * EXPORTED_FUNCTIONS	2022-08-30 13:42:45 -07:00
Baiju Meswani	b83ea3c2ff	Address prefast static analysis warnings (#12756 )	2022-08-29 10:09:32 -07:00
mwootton	817dc94345	Add first pass of rocm kernel profiler (#10911 ) * Add first pass of rocm kernel profiler * Clean up rocm_profiler. Format args. Demangle kernel names. Add Api EventRecords * Remove debug output * Temporarily disable profiling unit test 'api record check' for cupti * Fix compile error for non-gpu builds * Use common file for demangle and pid/tid. Namespace ThreadUtil. Fix gpu buffer clearing. * Merge demangle into profiler_common * Merge demangle into profiler_common part 2 * Style cleanup * Resolve linking issues via ProviderHost interface * Demangle cuda kernel names * Clean up comments * Fix formatting * Fix anal retentive formatting	2022-08-26 19:38:03 -07:00
edgchen1	c270ea148a	Move 'using common::Status;' from common.h to status.h.	2022-08-26 15:05:53 -07:00
Yulong Wang	c144acc534	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
Dmitri Smirnov	9481893b58	Replace with lock_guard as a lighter class for locking (#12616 ) Replace with lock_guard as a lighter class	2022-08-17 11:08:31 -07:00
Haoming Chen	8a038b9b0c	Fix a build error (#12600 ) The LLVM compiler complains about std::hash<const char*> and suggests std::hash<const void*>. But the intention is to hash the name string instead of the pointer, so use std::hash<std::string> to be explicit.	2022-08-17 10:49:54 -07:00
Scott McKay	0b0c51e028	Support direct usage of ORT format model flatbuffer for initializers (#12465 ) * Add ability to use ORT format model flatbuffer directly for initializers by leveraging the TensorProto external data infrastructure. Requires user to provide ORT format model bytes when creating the session, and set both `session.use_ort_model_bytes_directly` and `session.use_ort_model_bytes_for_initializers` to 1 in SessionOptions config entries (AddSessionConfigEntry in C API).	2022-08-12 18:31:43 +10:00
Changming Sun	ac7538b909	Remove CUDA 10.2 support (#12541 )	2022-08-10 22:46:41 -07:00
Dmitri Smirnov	c10704a501	Use alignas instead of naive padding to avoid false cache sharing (#12514 ) PerThread and ChildThreadStat alignas	2022-08-10 11:23:20 -07:00
Cheng	64e991a9fc	[Qlinearsoftmax] contrib cpu (#12177 ) * [Qlinearsoftmax] contrib cpu * int8 implementation * contrib operator md * qdq transformer test * new attribute: opset * doc * quantized tool * remove template to reduce binary size * doc of contrib operators * enforce x_shape is valid * fix reduce_size if input-shape is dynamic * add UT * register one op for reducing binary size * kernel hash update * docs/ContribOperators.md	2022-08-10 10:52:02 +08:00
Hector Li	730240d2a5	remove the link in the comments (#12510 )	2022-08-08 15:20:40 -07:00
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Ryan Hill	52d4699788	Minor doc fixes (#12388 )	2022-08-03 19:47:36 -07:00
Hariharan Seshadri	d5a1c01b38	Add C++ Session ctor taking model bytes and OrtPrepackedWeightsContainer (#12333 )	2022-07-29 12:32:43 -07:00
Yateng Hong	c579497134	Fix TRT custom op issue (#12283 ) * Pass schema registry on CreateModel. * Fix ORT_MINIMAL_BUILD. * Fix build issue.	2022-07-29 03:39:56 -07:00
Ryan Hill	3e014a5e5d	Fix C header to stop people accidentally copying the OrtApi by value (#12297 ) * Fix C header to stop people accidentally copying the OrtApi by value * Remove api_ from KernelTwo	2022-07-25 19:19:40 -07:00
Dmitri Smirnov	3bf614fd47	Eliminate memory allocations per recent profiling (#12225 ) * Alloc begin FeedsFetches refactoring Refactor Tensor class Fix buffer deletor Remove new/delete deleted Adjust alloc move Fix up xnnpack provider Clarifying the comment on Create()	2022-07-25 14:14:38 -07:00
Ashwini Khade	ceb76429db	Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc Merge On-Device-Training Offline Tooling and C/C++ APIs	2022-07-21 15:09:48 -07:00
Baiju Meswani	cbf08c7a7b	Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments	2022-07-21 18:11:48 +00:00
Dmitri Smirnov	4f106d2b3b	Eliminate unnecessary status lock acquisition in TP (#12196 ) Eliminate unnecessary status lock acquisition in the Thread Pool	2022-07-19 14:16:12 -07:00
Chen Fu	040c2f4517	x86/64 U8S8 Gemm Precision Fix (#12088 ) Add a graph optimization that converts u8s8 matrix multiplication to u8u8 if needed. On x86/64 platforms, specifically SSE4.1, AVX2 and AVX512 CPUs, u8s8 matrix multiplication offers better performance. Unfortunately, the higher performance comes with value overflow problems, as described in: https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/advanced-topics/nuances-of-int8-computations.html In this change we added a session option "session.x64quantprecision" (default off). For operators that call u8s8 matrix multiplications, e.g. QAttention, we convert them to u8u8 when all of the following conditions are satisfied: 1. Current CPU is SSE4.1, AVX2 or AVX512 with no VNNI support 2. Session option "session.x64quantprecision" is on. 3. Constant weight tensor contains values outside of [-64, 63] range Note that when the weight tensor is not constant, QDQS8ToU8Transformer should already convert it to u8.	2022-07-13 10:12:25 -07:00
Baiju Meswani	a457ddc41d	Merge branch 'master' of https://github.com/microsoft/onnxruntime into bmeswani/merge_pr	2022-06-30 21:53:07 +00:00
Baiju Meswani	6e8edfff0c	Separate training apis from shared core apis (#12027 )	2022-06-29 14:12:29 -07:00
RandySheriffH	d5fcb432fa	Generalize native op creation (#11539 ) * create op from ep * read input count from context * create holder to host nodes * fix typo * cast type before comparison * throw error on API fail * silence warning from minimal build * switch to unique_ptr with deleter to host nodes * fix typo * fix build err for minimal * fix build err for minimal * add UT for conv * enable test on CUDA * add comment * fix typo * use gsl::span and string view for Node constructor * Added two APIs - CopyKernelInfo and ReleaseKernelInfo * pass gsl::span by value * switch to span<NodeArg* const> to allow for reference to const containers * fix typo * fix reduced build err * fix reduced build err * refactoring node construction logic * rename exceptions * add input and output count as arguments for op creation * refactor static member * use ORT_CATCH instead of catch * cancel try catch * add static value name map * format input definition and set err code * fix comments * fix typo	2022-06-27 21:12:15 -07:00
Baiju Meswani	d25cf4df26	Merge branch 'master' into training_dev/on_device_poc	2022-06-24 20:18:19 +00:00
Dmitri Smirnov	088bc7494b	Deprecate APIs returning raw ptrs and provide replacements (#11922 ) Provide better documentation	2022-06-24 09:50:04 -07:00
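
The slicing hazard described in commit 5dae0c477d (#13215) can be reproduced without ONNX Runtime. The following is a minimal sketch, not the library's code: `Handle`, `Value`, `Unowned`, `Sink` and the two `CountReleases...` functions are hypothetical stand-ins, and a release counter replaces the real free so the effect can be observed safely. It shows that binding a non-owning wrapper to a reference of its owning base lets a callee move the pointer out and release an object the caller never owned.

```cpp
#include <utility>

// Hypothetical stand-in for a C object such as OrtValue; it only counts releases.
struct Handle {
  int release_count = 0;
};

// Hypothetical owning wrapper: releases the handle on destruction.
class Value {
 public:
  explicit Value(Handle* p) : p_(p) {}
  Value(Value&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
  Value(const Value&) = delete;
  Value& operator=(const Value&) = delete;
  ~Value() {
    if (p_ != nullptr) ++p_->release_count;  // stands in for freeing the object
  }
  // Gives up ownership without releasing.
  Handle* release() {
    Handle* p = p_;
    p_ = nullptr;
    return p;
  }

 private:
  Handle* p_;
};

// Hypothetical non-owning wrapper that inherits from the type it "unowns".
template <typename T>
class Unowned : public T {
 public:
  explicit Unowned(Handle* p) : T(p) {}
  ~Unowned() { this->release(); }  // detach so the base destructor releases nothing
};

// Accepts the owning base by reference and moves from it (the slicing step).
void Sink(Value& value) {
  Value stolen{std::move(value)};
}  // `stolen` releases the handle here, although the caller never owned it

int CountReleasesAfterSlicing() {
  Handle handle;
  {
    Unowned<Value> unowned{&handle};
    Sink(unowned);  // derived-to-base binding compiles without any cast
  }
  return handle.release_count;
}

int CountReleasesWithoutSlicing() {
  Handle handle;
  {
    Unowned<Value> unowned{&handle};
  }
  return handle.release_count;
}
```

In this sketch the non-owning wrapper alone leaves the counter at zero, while passing it through `Sink` releases the handle once. With a real allocation, the true owner would later free it a second time.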
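
Commit 8a038b9b0c (#12600) distinguishes hashing a name's characters from hashing its address. A small sketch of that difference follows; `HashName` and `HashAddress` are hypothetical helper names, not functions from the repository.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hashes the characters of the name, so equal strings stored in different
// buffers produce the same value.
std::size_t HashName(const char* name) {
  return std::hash<std::string>{}(std::string{name});
}

// Hashes only the address; two buffers holding the same text are treated as
// different keys because their pointers differ.
std::size_t HashAddress(const char* name) {
  return std::hash<const void*>{}(static_cast<const void*>(name));
}
```

Building a temporary `std::string` costs a possible allocation per call. Where that matters, `std::hash<std::string_view>` is an alternative that also hashes the contents.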
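
Commit c10704a501 (#12514) replaces hand-written padding with `alignas` to avoid false cache sharing. The sketch below illustrates the general technique only: `PaddedCounter` and `kAssumedCacheLineSize` are hypothetical names, and the 64-byte line size is an assumption that does not hold on every CPU.

```cpp
#include <atomic>
#include <cstddef>

// Assumed cache line size; typical for x86-64, not universal.
constexpr std::size_t kAssumedCacheLineSize = 64;

// alignas raises both the alignment and, through tail padding, the size of
// the struct, so adjacent array elements never share a cache line.
struct alignas(kAssumedCacheLineSize) PaddedCounter {
  std::atomic<long> value{0};
};

static_assert(alignof(PaddedCounter) == kAssumedCacheLineSize,
              "each counter starts on its own cache line");
static_assert(sizeof(PaddedCounter) % kAssumedCacheLineSize == 0,
              "array elements stay cache-line aligned");

// One counter per worker thread; writes to one slot do not invalidate the
// cache line that holds a neighbouring slot.
PaddedCounter per_thread_counters[4];
```

Compared with a manual `char padding[...]` member, the compiler computes the padding itself, so the layout stays correct when fields are added or removed.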

719 commits