onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-26 03:00:54 +00:00

Author	SHA1	Message	Date
Jian Chen	6662ece4a1	increase timeout to 5 hours (#13226 ) ### Description Increase MacOS pipeline timeout to 5 hours ### Motivation and Context It blocks Release pipeline	2022-10-07 13:02:48 -04:00
Baiju Meswani	04ba8a7e6e	Introduce Training C++ Apis (#12994 )	2022-10-06 20:13:37 -07:00
cloudhan	51ac6617f5	Fix warnings and enable dev mode for ROCm CI (#13223 ) Fix warnings and enable dev mode for ROCm CI: * Fix ROCm headers complaining "This file is deprecated. Use the header file from ..." * Disable warning signed and unsigned compare for kernel explorer * Fix unused and nodiscard warnings * Enable dev mode for ROCm CI * Work around error "unknown warning option '-Wno-nonnull-compare'" in kernel explorer by using '-Wno-unknown-warning-option' to ignore the unknown option * Fix error "unused parameter 'mask'" * Fix warning "instantiation of variable 'onnxruntime::rocm::Consts<float>::One' required here, but no definition is available", etc. Fixed by using C++17's inline (implied by constexpr) static initialization. * Remove unused variable * Add the missing `override` specifier	2022-10-07 09:45:01 +08:00
Dmitri Smirnov	5dae0c477d	Deprecate CustomApi and refactor public API for better safety and consistency (#13215 ) ### Description Deprecate CustomOpApi and refactor dependencies for exception safety and to eliminate memory leaks. Refactor API classes for clear ownership and semantics. Introduce `InitProviderOrtApi()` ### Motivation and Context Make public API better and safer. Special note about `Ort::Unowned`. The class suffers from the following problems: 1. It is not able to hold const pointers to the underlying C objects. This forces users to `const_cast` and circumvent constness of the returned object. The user is now able to call mutating interfaces on the object which violates invariants and may be a thread-safety issue. It also makes it possible to take ownership of the pointer and destroy it unintentionally (see examples below). 2. The objects that are unowned cannot be copied and that makes coding inconvenient and at times unsafe. 3. It directly inherits from the type it `unowns`. All of the above creates great conditions for inadvertent unowned object mutations and destructions. Consider the following examples of object slicing, one of them is from a real customer issue and the other one I accidentally coded myself (and I am supposed to know how this works). None of the below can be solved by aftermarket patches and can be hard to diagnose. #### Example 1 slicing of argument ```cpp void SlicingOnArgument(Ort::Value& value) { // This will take possession of the input and if the argument // is Ort::Unowned<Ort::Value> it would again double free the ptr // regardless if it was const or not since we cast it away. Ort::Value output_values[] = {std::move(value)}; } int main() { const OrtValue* ptr = nullptr; // some value does not matter Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)}; // unowned is destroyed when the call returns. SlicingOnArgument(unowned); } ``` #### Example 2 slicing of return value ```cpp // The return will be sliced to Ort::Value that would own and release (double free the ptr) Ort::Value SlicingOnReturn() { const OrtValue* ptr = nullptr; // some value does not matter Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)}; return unowned; } ```	2022-10-06 14:57:37 -07:00
Ti-Tai Wang	87f55505b3	[ONNX] Support huggingface BART to ONNX (#12779 ) Add BART into transformer support, specifically for `BartForConditionalGeneration` Motivation and Context - fixes #11210 Currently, the custom op beam search is not working in nightly, this PR should be run with a [custom commit](`10f3d46d92`)	2022-10-06 12:20:03 -07:00
Rachel Guo	814e5cfa4c	[rn] Support UINT8 type for onnxruntime-react-native on iOS (#13210 ) ### Description As title. ### Motivation and Context Uint8 type might be required for some model used in sample application. To match supported data types for onnxruntime-react-native for Android. Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2022-10-06 11:35:25 -07:00
ashari4	b09dd11ece	BFP schemas: Change block dimension type to Int (#13169 ) * Change block dimension type to Int from Ints. * In response to feedback that the block dimension corresponds to the reduction dimension of the consuming matrix multiplication. There is always only 1 reduction dimension.	2022-10-06 11:11:43 -07:00
Scott McKay	cf075fcbad	Handle edge case in CumSum causing overflow (#13174 ) ### Description Add special case handling for exclusive + reverse where axis has dim value of 1. ### Motivation and Context #13165	2022-10-06 07:18:02 +10:00
Edward Chen	4e37464cc5	Add build configuration to binary size checks pipeline. (#13208 ) Add another build configuration to binary size checks pipeline. Enable additional configurations to be added more easily.	2022-10-05 12:39:19 -07:00
Tony Xia	c7522e547a	Fixed a minor typo (#13194 ) ### Description binraries ==> binaries	2022-10-05 12:10:14 -07:00
Zhang Lei	dca941795e	Fix prefast bugs: 1944959 1997925 1997926 1997927 1997928 (#13203 )	2022-10-05 08:59:40 -07:00
cloudhan	72076b1eb2	Update ROCm CI to use HIP LANGUAGE (#13214 ) Update for ROCm CI before relanding tunable GEMM #12853. This PR also updates composable kernel to use CMake's HIP language support so that we can mix C/C++ compiler with HIP compiler instead of locking to hip-clang	2022-10-05 16:15:16 +08:00
Ashwini Khade	4fc8f7139a	Bug Fix - C# API order incompatible with C API (#13191 ) ### Description Training C# bindings (ReleaseTrainingSession and ReleaseCheckpointState) broke after an API order change in Training C API. This PR fixes this issue. ### Motivation and Context Bug Fix for Training C# bindings	2022-10-04 09:29:20 -07:00
Justin Chu	595a0c8658	Disable clang-tidy CI (#13207 ) Disable clang-tidy CI for now because it is creating a lot of false positives like in https://github.com/microsoft/onnxruntime/pull/12998	2022-10-04 07:37:49 -07:00
Tianlei Wu	b6c04f48c1	Fix reshape fusion (#13150 ) (1) Hot fixes reshape fusion, which causes stable diffusion unet model invalid. (2) Update remove_cascaded_cast_nodes to make it faster	2022-10-04 00:26:29 -07:00
Faith Xu	2d50d4be24	Update TSA path to new ADO project (#12902 ) Updates TSA item path to new ADO project area paths	2022-10-03 22:54:42 -07:00
Ashwini Khade	c780c4a2b9	Fix two prefast warnings (#13211 )	2022-10-03 20:00:57 -07:00
Tony Xia	962fee5fe5	Fix typo enviroment => environment (#13195 )	2022-10-03 17:02:26 -07:00
Justin Stoecker	9cf98dacb3	Enable command list reuse for Xbox (#13173 ) Removes the workaround introduced in #12063, which disabled DML command list reuse for Xbox builds. The ID3D12CommandList created in FusedGraphKernel takes a pointer to an ID3D12CommandAllocator that is local to the `BuildReusableCommandList` function. On PC it would seem the command list is keeping the command allocator alive, but this is highly suspect logic that definitely doesn't work on Xbox. I find no documentation indicating this logic should work (a section on [reference counting](https://learn.microsoft.com/en-us/windows/win32/direct3d12/recording-command-lists-and-bundles#reference-counting) makes it clear command lists take no refs on D3D objects passed as args to its APIs; however, it's unclear if this also applies to its construction). A second (small) change is constructing the command list straight into `ID3D12GraphicsCommandList` and removing an unnecessary QI.	2022-10-03 16:03:29 -07:00
Ryan Hill	81a4efee6c	Prefast Fixes (#12952 ) Description: Fixes these TSA issues (no actual bugs fixed, but just changing code to make TSA happy) To fix 1944982 and 1944973 I changed DeleteOnUnloadPtr to not use 'new' and to just use placement new to go into a fixed buffer. This required changing the rocm usage of it also (probably a separate TSA bug on that one that I don't have) 1944982 Ryan Hill [prefast:Warning]: C26426 (in onnxruntime/core/providers/cuda/tensor/cast_op.cc) Global initializer calls a non-constexpr function 'operator new' (i.22). 1944973 Ryan Hill [prefast:Warning]: C26426 (in onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc) Global initializer calls a non-constexpr function 'operator new' (i.22). 1944929 Ryan Hill [prefast:Warning]: C26436 (in onnxruntime/core/providers/cuda/cuda_provider_factory.cc) The type 'struct onnxruntime::ProviderInfo_CUDA_Impl' with a virtual function needs either public virtual or protected non-virtual destructor (c.35).	2022-10-03 15:50:44 -07:00
Yulong Wang	82786baed1	[js/web] add 'xnnpack' to EP list (#12723 ) Description: This PR adds support for "XNNPACK EP" in ORTWeb and changes the behavior of how ORTWeb deals with "backends", or "EPs" in API. Background: Term "backend" is introduced in ONNX.js to represent a TypeScript type which implements a "backend" interface, which is a similar but different concept to ORT's EP (execution provider). There were 3 backends in ONNX.js: "cpu", "wasm" and "webgl". When ORT Web is launched, the concept is derived to help users to integrate smoothly. Technically, when "wasm" backend is used, users need to also specify "EP" in the session options. Considering it may get complicated and confusing for users to figure out the difference between "backend" and "EP", the JS API hides the "backend" concept and makes a mapping between names, backends and EPs: "webgl" (Name) <==> "onnxjsBackend" (Backend) "wasm" (Name) <==> "wasmBackend" (Backend) <==> "CPU" (EP) Details: The following changes are applied in this PR: 1. allow multi-registration for backends using the same name. This is for use scenarios where both "onnxruntime-node" and "onnxruntime-web" are consumed in a Node.js App ( so "cpu" will be registered twice in this scenario. ) 2. re-assign priority values to backends. I give 100 as base to "cpu" for node and react_native, and 10 as base to "cpu" in web. 3. add "cpu", "xnnpack" as new names of backends. 4. update onnxruntime wasm exported functions to support EP registration. 5. update implementations in ort web to handle execution providers in session options. 6. add '--use_xnnpack' as default build flag for ort-web	2022-10-03 10:38:45 -07:00
Yufeng Li	1342baf1c7	refine QuantConfig (#13155 ) Refine the QuantConfig: 1. Remove the default EP config. 2. Pass QuantConfig to quantize API directly.	2022-10-03 08:34:49 -07:00
Baiju Meswani	0cf17b1921	Add linux debug training package to nightly pipeline (#13192 )	2022-10-01 06:58:43 -07:00
Nat Kershaw (MSFT)	68218935b9	Fix syntax error in labeler.yml (#13193 )	2022-09-30 23:10:21 -07:00
Edward Chen	a86b8329d9	Update unsupported ORT format version error message to link to doc on rel-1.13.0 branch. (#13187 )	2022-09-30 17:13:52 -07:00
Yulong Wang	054464dce2	fix XNNPACK on WebAssembly SIMD (#13161 ) ### Description fix XNNPACK on WebAssembly SIMD. Flag "-msimd128" needs to be applied to every source file when compiling WASM SIMD. Currently only a part of the source files are compiled with this flag so we get inconsistent results for `sizeof(xnn_f32_minmax_params)` because the type definition includes a `#ifdef` for `__wasm_simd128__`. The inconsistency causes garbage data to be written to a stack variable and eventually causes the crash. XNNPACK libraries are C libraries so we need to apply the build flags not only to `CMAKE_CXX_FLAGS` but also to `CMAKE_C_FLAGS`.	2022-09-30 16:34:15 -07:00
Nat Kershaw (MSFT)	0bf0991fa2	Update labeler.yml (#13186 )	2022-09-30 15:43:33 -07:00
Numfor Tiapo	56387c3c31	Fix SDL Unmatched Annotation Errors (#13162 ) Fixes 3 SDL unmatched annotation errors. Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>	2022-09-30 15:36:30 -07:00
Edward Chen	aae35f2759	Binary size reduction in KernelTypeStrResolver and GraphPartitioner (#13172 ) Reduce binary size for minimal Android builds. - reduce places where Status objects are created in KernelTypeStrResolver::LoadFromOrtFormat() - remove some unused parameters (in a base minimal build) and code in graph_partitioner.cc	2022-09-30 13:50:39 -07:00
George Nash	b76a65c784	Upgrade the oneDNN ep to use oneDNNv2.7 (#13175 ) ### Description This updates the oneDNN library used by oneDNN ep from version 2.6 to version 2.7 ### Motivation and Context This brings in the many improvements incorporated into the oneDNN library to the oneDNN execution provider. Signed-off-by: George Nash <george.nash@intel.com>	2022-09-30 12:29:17 -07:00
Nat Kershaw (MSFT)	eef0a98cae	Update and fix labeling rules (#13129 )	2022-09-30 10:19:23 -07:00
Brian Martin	c20abcab87	User/brianma/eo (#13152 ) fixing SDL issues. One was a SAL mismatch, the other was handling an optional null pointer.	2022-09-30 09:43:56 -07:00
Justin Chu	402e1995f0	Create clang-tidy CI (#12653 ) Update clang-tidy config to prepare for creating a CI workflow to run clang-tidy. Added clangtidy check in CI Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2022-09-30 08:05:38 -07:00
Changming Sun	5f1bc8ff56	Add "--parallel" to the build flags of WASM pipeline (#13179 )	2022-09-30 06:54:39 -07:00
Yi Zhang	a862b0cad1	increase ios_CI_coreml stage timeout limit (#13157 ) ### Description As title. ### Motivation and Context Recently, the workflow has been canceled due to timeout more frequently.	2022-09-30 14:45:14 +08:00
Changming Sun	dd2aec170d	Update Coding_Conventions_and_Standards.md (#7705 )	2022-09-29 23:32:37 -07:00
sumitsays	f3180f3ac8	[DML EP] Enable graph inside DML Graph (#13073 ) ### Description Kernels like Attention, BatchNormalization15, etc, can be implemented by using multiple DML APIs. This PR paves the path for graph-based kernel implementation. As part of this PR, every kernel in DML EP will now wrap their DML_OPERATOR_DESC into a graph and send it to FusedGraphKernel. FusedGraphKernel will stitch this smaller graph into its main DML_GRAPH. All onnxconformance test and Winml model tests passed. Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2022-09-29 23:32:20 -07:00
cloudhan	c93cb8f949	Revert "Enable ROCm to use tunable GEMM" (#13160 ) Reverts microsoft/onnxruntime#12853 due to CI pipeline problem.	2022-09-30 14:01:16 +08:00
Ye Wang	c8781b77f6	Decouple use_sequence_as_input_ids from has_hidden_states (#13130 ) ### Description A fix for parity issue in huggingface bart model with beam search https://github.com/microsoft/onnxruntime/pull/12779	2022-09-29 22:45:52 -07:00
Scott McKay	32395e2e16	Add handling for variadic inputs/outputs in a function. (#13140 ) ### Description Add handling for variadic inputs/outputs in a function. ### Motivation and Context #13121	2022-09-30 14:51:17 +10:00
Scott McKay	4d8510611b	Update find_optimizer_opset_version_updates_required.py to use the ONNX headers to determine the latest opset. (#12484 ) Description: Use the onnx headers to find the latest opset for each operator. This allows the script to detect optimizers with `graph_utils::IsSupportedOptypeVersionAndDomain` calls that need updating when run during the update of the onnx commit id. Without this change issues are not detected until a new kernel is registered. Motivation and Context Detect optimizers that need updates as part of the ONNX update process.	2022-09-29 16:55:22 +10:00
Vincent Wang	6c63c1c9ee	Multiple Gather to Split Fusion (#13095 ) For the code below in some transformers models: ``` fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads, 3, self.head_dim) return fused_qkv[..., 0, :], fused_qkv[..., 1, :], fused_qkv[..., 2, :] ``` The exported graph will contain 3 Gather nodes, and currently ORT's GatherGrad CUDA implementation is slow. This pattern can be fused to use one Split, so that we can launch fewer kernels for the compute; the perf of Split/Concat (for grad) is also better than Gather/GatherGrad. In a real example, one GatherGrad will take 15ms and there are 3 for each layer in the graph; after the fusion, one Concat takes only 35us. The total time of a step is improved from 1.5s to 0.4s.	2022-09-29 11:09:57 +08:00
PeixuanZuo	3157cdb19a	[ROCm] Fix MIGraphX ciagent user Permissions issues (#13137 ) ### Description Fix the MIGraphX CI pipeline failure. The MIGraphX pipeline is disabled for now. It will be enabled when this PR merges.	2022-09-29 10:25:02 +08:00
Baiju Meswani	5182d6610d	Upgrade pytorch to 1.12.1 for training pipelines (#13128 )	2022-09-28 17:59:49 -07:00
sfatimar	c9a86fa27f	Openvino GPU Unit/Python Tests fix failure (#13122 ) ### Description We fix iGPU Unit and Python tests with this PR. We add packaging pip pkg to build Many Linux DockerFile. ### Motivation and Context This change is required to make sure iGPU Unit Test/Python Tests with OV are fixed. Co-authored-by: shamaksx <shamax.kshirsagar@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com> Co-authored-by: pratiksha <mohsinx.mohammad@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: nmaajidk <n.maajid.khan@intel.com> Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com>	2022-09-28 16:00:06 -07:00
Adam Pocock	388d3cf847	[Java] Fix OnnxSequence semantics (#13012 ) Previously OnnxSequence would flatten out a list of tensors into a single output array assuming they were all scalar values. This doesn't accurately represent the semantics of an ONNX sequence, but was what the semantics appeared to be years ago when I first wrote that class. This PR changes it so that the `getValue` method on `OnnxSequence` unwraps the sequence and returns `List<? extends OnnxValue>` allowing the user to process the individual ONNX values separately. It's done this way rather than returning a multidimensional array for a tensor and a Java map for a map as multidimensional arrays are very inefficient in Java and best practice when operating with an OnnxTensor in Java is to use a `java.nio.ByteBuffer`. So allowing users to access each `OnnxTensor` individually allows them to control how the data is materialised on the Java heap.	2022-09-28 15:53:30 -07:00
Edward Chen	55ae71c160	Reduce Objective-C static analysis build time. (#13149 )	2022-09-28 15:49:48 -07:00
PeixuanZuo	c26bb1bb19	Allow fastgelu/skiplayernorm profile by pass args from commandline (#13025 ) Description: This allows us to quickly launch a microbench session by, for example: `python skip_layer_norm_test.py 8 128 128 float32`	2022-09-28 15:48:59 -07:00
cloudhan	32c2c4b480	Change ROCm to use tunable GEMM (#12853 ) Change ROCm to use tunable GEMM. It is not enabled in this PR. This will drastically improve GEMM performance in some shapes and dtypes configuration. This will benefit the overall performance for BERT inference and hopefully, training, when enabled.	2022-09-28 16:21:54 +08:00
PeixuanZuo	5e4ebbd9d9	[ROCm] add MIGraphX ci pipeline (#11569 ) Description: Add migraphx ci pipeline, test build and unit tests. This PR is based on #11492 Pipeline : https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=765711&view=results	2022-09-28 10:59:30 +08:00


7863 commits