onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Author	SHA1	Message	Date
Hector Li	db32eacda1	make the UNSIGNEDPD_CHECK for Windows only (#13260 ) Fix issue reported from https://github.com/microsoft/onnxruntime/issues/13247 The UNSIGNEDPD_CHECK should apply to Windows only	2022-10-13 11:08:35 -07:00
Vincent Wang	807b2f4dd5	[ORTModule] Use Env Variable to Set Provider Option cudnn_conv_algo_search (#13296 ) This PR is to add support of using env variable to set provider option cudnn_conv_algo_search so that user can choose better conv algo search method to run model. This is a quick fix to unblock the test of MoE model. Will have another PR to design and implement the ORTModule config so that we can config ORTModule using Python script or config file instead of env variable.	2022-10-13 15:36:21 +08:00
Vincent Wang	6fb70a82df	[ORTModule] Update Supported DeepSpeed Version for FP16_Optimizer (#13305 ) Update supported deepspeed highest version from 0.7.1 to 0.7.3 for FP16_Optimizer. Also add version info to warning log.	2022-10-13 13:03:01 +08:00
Vincent Wang	afb5f76770	[ORTModule] ATen Support for torch.nn.GroupNorm (#13293 ) Models in [huggingface's diffusers library](https://github.com/huggingface/diffusers) use torch.nn.GroupNorm, which is exported to a sub-graph containing ONNX's InstanceNormalization, which lacks a gradient. ORT's InstanceNormalization implementation calls cuDNN's BatchNorm for part of the computation, which is less efficient than PyTorch's implementation. This PR uses the ATen fallback to support this torch module, including its forward and backward passes.	2022-10-13 11:59:03 +08:00
pengwa	79ac0231a9	Fix scalar sharing bug (#13299 ) ### Description float and half initializers with the same value were merged into the same initializer. The bug arose because the data type in the pattern key was always -1 (a naive mistake made during an earlier code refactoring), and because float and half are both stored as float in the constant store for easier data comparison. Added test coverage. ` [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) of Optype (Mul) bound to different types (tensor(float) and tensor(float16) in node `	2022-10-13 11:19:00 +08:00
Dmitri Smirnov	f0fbff6dd4	Adjust docs to comply with Doxygen requirements (#13302 ) ### Description Fix up param names in docs ### Motivation and Context Make pipelines pass	2022-10-12 18:07:18 -07:00
PeixuanZuo	6895918b1c	[ROCm] Revert CI pipeline to ROCm5.2.3 (#13297 ) ### Description Unit tests run slower with ROCm5.3 than with ROCm5.2.3, so revert to ROCm5.2.3. We will update to ROCm5.3 when the issue is resolved by AMD.	2022-10-12 10:47:33 -07:00
Edward Chen	9422438782	Objective-C static analysis - use different llvm path to try to find clang-tidy. (#13280 ) Use different llvm path to try to find clang-tidy. Sometimes the build fails because it can't find clang-tidy. Hopefully this path works better.	2022-10-12 10:16:26 -07:00
Scott McKay	cbe4eb65b3	Add backwards compatibility for all versions of ORT format model in full build. (#13242 ) ### Description Add ability to upgrade an ORT format model when loaded in a full build by inserting the kernel constraint info and ignoring the kernel hashes. This also allows upgrading the model to the latest format by saving the model after loading. ### Motivation and Context Provide an official path to upgrading an ORT format model directly (vs. reconverting). Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2022-10-12 17:45:52 +10:00
Yi Zhang	67bde18d0d	Update Win_GPU_CI trigger (#13290 ) ### Description Supplement to #13248. Add PR trigger https://learn.microsoft.com/en-us/azure/devops/pipelines/repos/github?view=azure-devops&tabs=yaml#pr-triggers fix: master -> main Tested with #13289 #13292 NB: the real pipeline is always triggered if the workflow yaml changed, even if it's added in the path filter. ### Motivation and Context Make sure the real pipeline does not run in the backend.	2022-10-12 15:22:42 +08:00
Pranav Sharma	5b0d28b5b5	Check for null input (#13286 ) ### Description Check for null input ### Motivation and Context This has been reported at least twice (once by the Windows team and once by Speech team). Currently we just segfault.	2022-10-12 00:15:27 -07:00
Vincent Wang	a2658f0784	[ORTModule] Fix Graph Builder for Eval Mode (#13255 ) The current graph builder for ORTModule applies the training graph optimizations in both training and eval mode. Take BatchNorm as an example: one of the training graph optimizations replaces the BatchNormalization op with BatchNormInternal, which is for training only. This PR fixes this: in eval mode, we no longer apply the training graph optimizations. The inference graph optimizations are applied during InferenceSession initialization.	2022-10-12 14:39:54 +08:00
Yi Zhang	0d672e9112	Enable C# test load models with more complex directories. (#13251 ) ### Description Currently, the C# test only loads models with the directory structure `{modelroot}->{opsetXX}->{modelname}->{.onnx}`. With this PR, the C# test can also load models from `{modelroot}->{model-source}->{opsetXX}->{modelname}->{.onnx}`. ### Motivation and Context There are multiple sources of testing models: 1. model zoo (not in the official image) 2. 1st party models 3. models with contrib-ops 4. others. It would be better to insert a mid-level directory for new sources. This PR is compatible with current models. From https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=776643&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=e7d9f128-b630-5ee6-a99e-2fca70d04619&l=79 the test result is the same as the master build: `Passed: 583, Skipped: 14, Total: 597`. Model zoo models (mounted in ..\models\zoo) can be loaded. This test workflow shows that it can load both existing models and models from the model zoo: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=777018&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=e7d9f128-b630-5ee6-a99e-2fca70d04619 Skipping failed models will be done in other PRs.	2022-10-12 13:53:58 +08:00
Yi Zhang	8a3407d54f	update file name in the comment (#13275 ) ### Description Correct the file name in the comments of the generated yaml.	2022-10-12 08:35:42 +08:00
cloudhan	1e55949a70	Fix unsound hipify in ROCm EP (#13269 ) Some CUDA-related things are still left in the statically hipified code of the ROCm EP. Eliminate them to avoid confusion.	2022-10-12 08:32:42 +08:00
PeixuanZuo	b2353fa737	[ROCm] Add ROCm5.3 to python package pipeline (#13249 ) ### Description 1. Remove ROCm5.1.1 and ROCm5.2 from ROCm python package pipeline 2. Add ROCm5.3 to ROCm python package pipeline pipeline: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=237172&view=results	2022-10-12 07:23:42 +08:00
Nat Kershaw (MSFT)	fb86edb19f	Update publish-c-apidocs.yml to use main instead of master (#13281 )	2022-10-11 15:37:59 -07:00
Prathik Rao	93e0a15117	implement cos gradient as a function op (#13227 ) ### Description Implemented gradient of cos as per the function below. ![image](https://user-images.githubusercontent.com/31260940/193900310-b62a3e77-06d5-45af-ad28-a1d41920bad0.png) ### Motivation and Context Cos gradient required for [huggingface's diffusers library](https://github.com/huggingface/diffusers) ### Testing built ORT from source: `./build.sh --config RelWithDebInfo --enable_training --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda --build_wheel --parallel --skip_tests` tested CosGrad implementation: `cd build/Linux/RelWithDebInfo/ && ./onnxruntime_test_all --gtest_filter=GradientCheckerTest.CosGrad` Co-authored-by: Prathik Rao <prathikrao@microsoft.com>	2022-10-11 10:11:19 -07:00
Prathik Rao	05acd20a88	convert singrad to function op and remove cpu kernel (#13263 ) ### Description Implemented gradient of sin as a function op. ### Motivation and Context Sin gradient currently implemented as cpu op which could hurt performance. ### Testing built ORT from source: `./build.sh --config RelWithDebInfo --enable_training --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda --build_wheel --parallel --skip_tests` tested SinGrad implementation: `cd build/Linux/RelWithDebInfo/ && ./onnxruntime_test_all --gtest_filter=GradientCheckerTest.SinGrad` Co-authored-by: Prathik Rao <prathikrao@microsoft.com> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2022-10-11 10:11:08 -07:00
Yi Zhang	cd2e8b306c	Replace or remove some characters to meet gtest name convention (#13266 ) ### Description To construct the test name, replace whitespace with underscores and remove parentheses. ### Motivation and Context gtest names only accept '_' and alphanumeric characters.	2022-10-11 16:23:54 +08:00
petermcaughan	febd5facce	Change head_size parameter dependent on qkv_hidden_size (#12933 ) Description: Add qkv_hidden_size support in CUDA Attention Layer implementation. Changes include: - Modify UT to test GPU and CPU implementation - Add overload for CUDA kernel `AddBiasTransposeQKV` to support scenario where V_HIDDEN_SIZE != QK_HIDDEN_SIZE - Update variable names from `head_size` to `qkv_head_sizes[0]` or `qkv_head_sizes[2]` - Modify function definitions to allow communication of `qkv_hidden_sizes` or `qkv_head_sizes` Note that this feature is not supported in Rocm EP or quantized attention right now. Motivation and Context - Why is this change required? What problem does it solve? The current CUDA implementation of attention layer doesn't support the parameter qkv_hidden_size added in the CPU implementation in PR [8039](https://github.com/microsoft/onnxruntime/pull/8039) - If it fixes an open issue, please link to the issue here. Co-authored-by: Peter Mcaughan <petermca@microsoft.com>	2022-10-11 00:25:47 -07:00
Vincent Wang	b9e23bd086	[ORTModule] Fix Custom Op Registry for Torch 1.13+ (#13250 ) This PR has two fixes: - https://github.com/pytorch/pytorch/pull/85636 change the behavior of register_custom_op_symbolic to only register the symbolic function at a single version. For ORTModule we need to pass the op_set version when calling it. - Since torch_1.13 the signature of einsum is changed to have a new argument, need to change our custom op symbolic registry code accordingly. Without the fixes, ORTModule will not work with the nightly torch, and the new torch version will be released.	2022-10-11 15:20:51 +08:00
Yi Zhang	6b499db7e1	increase ios pipeline timeout limit (#13268 ) ### Motivation and Context Timeout issues have increased.	2022-10-11 14:07:04 +08:00
Yi Zhang	ea128cdb18	skip windows GPU check if changes only in doc (#13248 ) ### Description Use a path filter and a fake workflow to skip the Windows GPU check if there are only changes in docs. Refs: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/troubleshooting-required-status-checks#handling-skipped-but-required-checks The fake GitHub yaml is generated by code. ### Verifications In this PR: since win-gpu-ci-pipeline.yml and .github are updated, the real Windows GPU workflows are always triggered. In #13256: to avoid updating win-gpu-ci-pipeline.yml, I added the path filter in the devops page. The fake Windows GPU workflows are triggered, and the real workflows are skipped.	2022-10-11 13:51:44 +08:00
PeixuanZuo	4d25b9c8f0	[ROCm] Update ROCm and MIGraphX CI pipeline to ROCm5.3 (#13257 ) ### Description 1. Update ROCm pipeline and MIGraphX pipeline to ROCm5.3 ROCm pipeline run ortmodule test one time and disable it : https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=777794&view=logs&j=48b14a85-ff1a-5ca4-53fa-8ea420d27feb&t=9c199f35-fc50-565d-6c65-5162c9bb1b04 2. Add `workspace: clean: all `.	2022-10-11 13:47:22 +08:00
cloudhan	2cf5d04e3d	Fix clang-tidy(cppcoreguidelines-pro-bounds-array-to-pointer-decay) (#13241 ) clang-tidy says "Do not implicitly decay an array into a pointer; consider using gsl::array_view or an explicit cast instead". It is a false positive scattered throughout our codebase wherever helper macros are used. It is because for a function with a 4-char name, say `main`, the type of __FUNCTION__ and __PRETTY_FUNCTION__ is `char [5]`.	2022-10-11 13:16:48 +08:00
Edward Chen	00146b2541	Add onnxruntime_BUILD_UNIT_TESTS=OFF definition to iOS package build options. (#13238 ) Add onnxruntime_BUILD_UNIT_TESTS=OFF definition to iOS package build options. The `--skip_tests` option is already specified.	2022-10-10 18:00:17 -07:00
Dmitri Smirnov	25c0a66934	Natvis adjustments to make debugging bearable (#13237 ) ### Description - Fix Abseil::InlinedVector inlined storage visualization - Fix typo in protobuf natvis. - Add basic gsl.natvis ### Motivation and Context Debugging is hard.	2022-10-10 10:06:55 -07:00
pengwa	0668600255	Share scalar constant initializer (#12878 ) Description: 1. Share scalar constants that have the same data type, value and shape. 2. Fix the order of Graph resolve context clear and CleanUnusedInitializersAndNodeArgs(). Share initializers that hold the same value with the same type and shape; currently only scalar values or 1-D single-value arrays are handled. The transformation itself does not have much impact on memory/perf; instead, it helps simplify the graph, making common subexpression elimination (CSE) easier. Imagine graphs like this: ![image](https://user-images.githubusercontent.com/10530022/188895598-e06f9bf9-5466-4009-a68c-6b339133936c.png) Add is NOT shared as an input of Clip after the CSE transformation, because each Add's second constant input is a different NodeArg. If we change all constant initializers to share the same NodeArg, then only one Add will be preserved after the CSE transformation. There are a few other similar cases in one of the 1P DeBERTa models. In an E2E measurement on a 1P DeBERTa model, we see an increase from SamplesPerSec=562.041593991271 to 568.0106130440271, a 1.07% gain. Fix the order of Graph resolve context clear and CleanUnusedInitializersAndNodeArgs(). The Graph resolve context is cleared at the end of every Graph.Resolve(); one of the things cleared is "inputs_and_initializers", which holds string_views of all initializers. Because CleanUnusedInitializersAndNodeArgs removes some initializers, some strings referenced by string_views in "inputs_and_initializers" remain there BUT in an invalid state.	2022-10-10 13:32:33 +08:00
sumitsays	e01a8519e0	[DML EP] Re-architect | Partitioning as Transformer (#13131 ) ### Description Re-architect DML EP to allow ORT L2/L3 transformers. This change includes: - During ORT graph partitioning, DML EP will only set the dmlExecutionProvider to all eligible nodes. - Moved DML specific operator transformer as L2 transformer - Introduced a new DMLGraphFusionTransformer, applicable only for DML EP, which is responsible to - partition the graph - fuse each partition into a IDMLCompiledOperator - register the kernel for each partition ### Motivation and Context - Why is this change required? What problem does it solve? It enables ORT L2/L3 transformers for DML EP, which will increase the perf of Transformer-based models. - If it fixes an open issue, please link to the issue here. N/A Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2022-10-07 22:35:47 -07:00
garanews	38906625a3	fix some typo in docs (#13212 ) ### Description fix some typo in docs ### Motivation and Context singed vs signed succeding vs succeeding fileter vs filter kernal vs kernel libary vs library	2022-10-07 15:58:18 -07:00
Edward Chen	d411bd277e	Increase iOS packaging pipeline timeout. (#13233 ) Increase iOS packaging pipeline timeout to 300 minutes.	2022-10-07 14:49:16 -07:00
Dmitri Smirnov	bb1c133245	[MicroGraph] Address ROCM warning and build failure (#13234 ) ### Description Address build failures after Public API refactoring ### Motivation and Context Make pipelines healthy.	2022-10-07 14:30:19 -07:00
Jian Chen	6662ece4a1	increase timeout to 5 hours (#13226 ) ### Description Increase MacOS pipeline timeout to 5 hours ### Motivation and Context It blocks Release pipeline	2022-10-07 13:02:48 -04:00
Baiju Meswani	04ba8a7e6e	Introduce Training C++ Apis (#12994 )	2022-10-06 20:13:37 -07:00
cloudhan	51ac6617f5	Fix warnings and enable dev mode for ROCm CI (#13223 ) Fix warnings and enable dev mode for ROCm CI: * Fix ROCm headers complaining "This file is deprecated. Use the header file from ..." * Disable the signed/unsigned compare warning for kernel explorer * Fix unused and nodiscard warnings * Enable dev mode for ROCm CI * Work around the error "unknown warning option '-Wno-nonnull-compare'" in kernel explorer by using '-Wno-unknown-warning-option' to ignore the unknown option * Fix error "unused parameter 'mask'" * Fix warning "instantiation of variable 'onnxruntime::rocm::Consts<float>::One' required here, but no definition is available", etc. Fixed by using C++17's inline (implied by constexpr) static initialization. * Remove unused variable * Add the missing `override` specifier	2022-10-07 09:45:01 +08:00
Dmitri Smirnov	5dae0c477d	Deprecate CustomApi and refactor public API for better safety and consistency (#13215 ) ### Description Deprecate CustomOpApi and refactor dependencies for exception safety and to eliminate memory leaks. Refactor API classes for clear ownership and semantics. Introduce `InitProviderOrtApi()` ### Motivation and Context Make public API better and safer. Special note about `Ort::Unowned`. The class suffers from the following problems: 1. It is not able to hold const pointers to the underlying C objects. This forces users to `const_cast` and circumvent the constness of the returned object. The user is then able to call mutating interfaces on the object, which violates invariants and may be a thread-safety issue. It also makes it possible to take ownership of the pointer and destroy it unintentionally (see examples below). 2. The objects that are unowned cannot be copied, which makes coding inconvenient and at times unsafe. 3. It directly inherits from the type it `unowns`. All of the above creates great conditions for inadvertent unowned object mutations and destructions. Consider the following examples of object slicing; one of them is from a real customer issue and the other one I accidentally coded myself (and I am supposed to know how this works). None of the below can be solved by aftermarket patches, and they can be hard to diagnose. #### Example 1 slicing of argument
```cpp
void SlicingOnArgument(Ort::Value& value) {
  // This will take possession of the input, and if the argument
  // is Ort::Unowned<Ort::Value> it would again double free the ptr,
  // regardless of whether it was const or not, since we cast it away.
  Ort::Value output_values[] = {std::move(value)};
}

int main() {
  const OrtValue* ptr = nullptr;  // some value, does not matter
  Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)};
  // unowned is destroyed when the call returns.
  SlicingOnArgument(unowned);
}
```
#### Example 2 slicing of return value
```cpp
// The return value will be sliced to an Ort::Value that would own and
// release (double free) the ptr.
Ort::Value SlicingOnReturn() {
  const OrtValue* ptr = nullptr;  // some value, does not matter
  Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)};
  return unowned;
}
```	2022-10-06 14:57:37 -07:00
Ti-Tai Wang	87f55505b3	[ONNX] Support huggingface BART to ONNX (#12779 ) Add BART to the transformer support, specifically for `BartForConditionalGeneration`. Motivation and Context - fixes #11210 Currently, the custom op beam search is not working in nightly; this PR should be run with a [custom commit](`10f3d46d92`)	2022-10-06 12:20:03 -07:00
Rachel Guo	814e5cfa4c	[rn] Support UINT8 type for onnxruntime-react-native on iOS (#13210 ) ### Description As title. ### Motivation and Context The Uint8 type might be required for some models used in sample applications. This also matches the data types supported by onnxruntime-react-native for Android. Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2022-10-06 11:35:25 -07:00
ashari4	b09dd11ece	BFP schemas: Change block dimension type to Int (#13169 ) * Change block dimension type to Int from Ints. * In response to feedback that the block dimension corresponds to the reduction dimension of the consuming matrix multiplication. There is always only 1 reduction dimension.	2022-10-06 11:11:43 -07:00
Scott McKay	cf075fcbad	Handle edge case in CumSum causing overflow (#13174 ) ### Description Add special case handling for exclusive + reverse where axis has dim value of 1. ### Motivation and Context #13165	2022-10-06 07:18:02 +10:00
Edward Chen	4e37464cc5	Add build configuration to binary size checks pipeline. (#13208 ) Add another build configuration to binary size checks pipeline. Enable additional configurations to be added more easily.	2022-10-05 12:39:19 -07:00
Tony Xia	c7522e547a	Fixed a minor typo (#13194 ) ### Description binraries ==> binaries	2022-10-05 12:10:14 -07:00
Zhang Lei	dca941795e	Fix prefast bugs: 1944959 1997925 1997926 1997927 1997928 (#13203 )	2022-10-05 08:59:40 -07:00
cloudhan	72076b1eb2	Update ROCm CI to use HIP LANGUAGE (#13214 ) Update the ROCm CI before relanding tunable GEMM #12853. This PR also updates composable kernel to use CMake's HIP language support so that we can mix the C/C++ compiler with the HIP compiler instead of locking to hip-clang.	2022-10-05 16:15:16 +08:00
Ashwini Khade	4fc8f7139a	Bug Fix - C# API order incompatible with C API (#13191 ) ### Description Training C# bindings (ReleaseTrainingSession and ReleaseCheckpointState) broke after an API order change in Training C API. This PR fixes this issue. ### Motivation and Context Bug Fix for Training C# bindings	2022-10-04 09:29:20 -07:00
Justin Chu	595a0c8658	Disable clang-tidy CI (#13207 ) Disable clang-tidy CI for now because it is creating a lot of false positives like in https://github.com/microsoft/onnxruntime/pull/12998	2022-10-04 07:37:49 -07:00
Tianlei Wu	b6c04f48c1	Fix reshape fusion (#13150 ) (1) Hot fix for reshape fusion, which made the Stable Diffusion UNet model invalid. (2) Update remove_cascaded_cast_nodes to make it faster	2022-10-04 00:26:29 -07:00
Faith Xu	2d50d4be24	Update TSA path to new ADO project (#12902 ) Updates TSA item path to new ADO project area paths	2022-10-03 22:54:42 -07:00
Ashwini Khade	c780c4a2b9	Fix two prefast warnings (#13211 )	2022-10-03 20:00:57 -07:00
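
The gtest naming rule described in commit cd2e8b306c above (whitespace becomes an underscore, parentheses are removed, and only '_' and alphanumerics are accepted) can be illustrated with a minimal Python sketch. This is not the code from that PR: the function name `sanitize_gtest_name` and the sample input are hypothetical, and dropping every other disallowed character is an assumption that goes beyond what the commit message states.

```python
import re


def sanitize_gtest_name(raw_name: str) -> str:
    """Hypothetical helper: turn an arbitrary label into a gtest-safe name."""
    # Replace every whitespace character with an underscore.
    name = re.sub(r"\s", "_", raw_name)
    # Remove parentheses entirely.
    name = name.replace("(", "").replace(")", "")
    # Assumption: drop anything else that is not alphanumeric or '_'.
    return re.sub(r"[^0-9A-Za-z_]", "", name)


if __name__ == "__main__":
    # Hypothetical model label, used only for illustration.
    print(sanitize_gtest_name("Conv (opset 11)"))
```

Replacing whitespace before stripping the remaining characters keeps word boundaries visible in the resulting test name.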


7546 commits