onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-13 18:08:13 +00:00

Author	SHA1	Message	Date
Changming Sun	db4fc12318	Add support for building the code on Windows ARM64 natively (#15371 ) ### Description Recently Visual Studio and python started to provide native Windows ARM64 packages. This PR is to provide better support for building on Windows ARM64. You can do it as what you did for x64. Like: ``` python tools\ci_build\build.py --config Debug --update --skip_submodule_sync --build_dir b --cmake_generator "Visual Studio 17 2022" ``` You do not need to append the "--arm64" build arg, and do not need to cross-compile protoc for a different arch as you are not cross-compiling. caveat: it does not work with the latest cmake release(3.26.x). It only works fine with cmake 3.25.x and below. Filed a bug to them: https://gitlab.kitware.com/cmake/cmake/-/issues/24797 ### Motivation and Context Provide better support for building on Windows ARM64.	2023-04-11 17:14:54 -07:00
Rachel Guo	9c42d5e31f	[CoreML EP]Add broadcasting support for binary ops (#15187 ) ### Description As title ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/15110 --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-11 13:50:45 -07:00
Yulong Wang	0fbf715824	[build] add script to validate generated NPM packages (#15453 ) ### Description add script to validate generated NPM packages and publish it to artifacts, so that release pipeline can use it. once this PR is merged, I will update the NPM package release pipeline.	2023-04-11 11:04:55 -07:00
Dmitri Smirnov	ce3b4eabd3	Implement Optional Metadata support and C# test support (#15314 ) ### Description Implement Optional Type metadata support in the library. Implement optional support in C# API along with metadata. Implement Sequence, Map, Optional test data support and test execution. Prune tests and provide more details for failing tests in C# code. Note, this PR does not enable running onnx test models in C++. ### Motivation and Context Opset18 optional type support.	2023-04-11 09:41:59 -07:00
Edward Chen	0497ac0432	Support additional op domains in op reduction script. (#15424 ) Add support for kMSInternalNHWCDomain and kPytorchAtenDomain op domains to op reduction script. Make it an error if the op reduction script encounters unknown op domains.	2023-04-11 08:57:51 -07:00
Patrice Vignola	3be5bfe363	[DML EP] Add MatMul + SoftMax fusion (#15240 )	2023-04-11 08:31:04 -07:00
Patrice Vignola	7c927bb95c	[DML EP] Add BiasSplitGelu (#15197 )	2023-04-11 08:30:37 -07:00
Yi Zhang	311f84d00c	Fix one nuget packaging pipeline error (#15458 ) ### Description Fix one typo in #14965 ### Motivation and Context Fix the error `"onnxruntime_providers_shared.dll not found for win-x64"`	2023-04-11 18:00:10 +08:00
zhijiang	29c74d3c43	softmax perf improvement pr1 - add more softmax related test (#15176 ) 1. add fp16 test 2. add test for shape is not power of two.	2023-04-11 17:02:40 +08:00
Ye Wang	ef42fd09fb	google/mt5 optimization and fix (#15454 ) ### Description 1. enabled self-attention fusion in mt-5 decoder graph 2. fix a parity issue https://github.com/microsoft/onnxruntime/issues/15042 ### Motivation and Context --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-04-11 00:09:11 -07:00
Patrice Vignola	c5b6ee1a99	[DML EP] Add NhwcConv (#15194 )	2023-04-10 23:16:09 -07:00
cloudhan	9acbfc6a29	ROCm MHA (#15279 ) Add MultiHeadAttention for ROCm EP. Before: ``` 'engine': 'onnxruntime' 'version': '1.15.0' 'height': 512 'width': 512 'steps': 50 'batch_size': 1 'batch_count': 5 'num_prompts': 1 'average_latency': 3.878769588470459 'median_latency': 3.8792178630828857 'first_run_memory_MB': -1 'second_run_memory_MB': -1 'model_name': 'runwayml/stable-diffusion-v1-5' 'directory': './sd-v1-5-onnx-fp16-nomha' 'provider': 'ROCMExecutionProvider' 'disable_safety_checker': True ``` After: ``` 'engine': 'onnxruntime' 'version': '1.15.0' 'height': 512 'width': 512 'steps': 50 'batch_size': 1 'batch_count': 5 'num_prompts': 1 'average_latency': 2.364924430847168 'median_latency': 2.3650705814361572 'first_run_memory_MB': -1 'second_run_memory_MB': -1 'model_name': 'runwayml/stable-diffusion-v1-5' 'directory': './sd-v1-5-onnx-fp16' 'provider': 'ROCMExecutionProvider' 'disable_safety_checker': True ```	2023-04-11 13:20:44 +08:00
Yi Zhang	feafbc4263	Refactor all Mac build steps (#15440 ) ### Description ### Motivation and Context Make the compilation cache steps easy to use and maintain Reduce cache storage.	2023-04-11 12:12:46 +08:00
Changming Sun	d175e87a1f	Delete eager mode code and increase minimal required python version to 3.8 (#15450 ) ### Description 1. Delete eager mode code. 2. Increase the minimal required python version to 3.8.	2023-04-10 16:00:04 -07:00
Patrice Vignola	4a676b011a	[DML EP] Add BiasAdd (#15211 )	2023-04-10 14:46:33 -07:00
Sheil Kumar	ce9ad8c8bc	For HLSL shader ops in the DirectML EP (STFT,DFT) FP16 ops should fal… (#15448 ) CP: [For HLSL shader ops in the DirectML EP (STFT,DFT) FP16 ops should fallback to CPU when there is no hardware support #15414 ](https://github.com/microsoft/onnxruntime/pull/15414) For HLSL shader ops in the DirectML EP (STFT,DFT) FP16 ops should fallback to CPU when there is no hardware support.	2023-04-10 13:21:40 -07:00
Shukant Pal	6657df9212	[CoreML EP] Add support for LeakyReLU activation layers (#15327 ) ## Description Implements support for LeakyReLU in ActivationOpBuilder for CoreML's EP. ### Motivation and Context This speeds up inference on macOS significantly for models using LeakyReLU.	2023-04-10 13:01:55 -07:00
Yulong Wang	0205b63756	[wasm] optimize default session options parsing (#15428 ) ### Description optimize default session options parsing. - do minimal property assignment to the passed in `options` object. - modify default value of `enableCpuMemArena` and `enableMemPattern` to `false`. We don't get benefits from enabling these 2 flags in web assembly	2023-04-10 11:09:09 -07:00
Changming Sun	c8524d2dab	Refactor web-ci pipeline and delete eager mode CI pipeline (#15416 ) ### Description 1. Move it to a separate pool that uses the same image as [the public hosted pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml). Also, create a beta pool which contains the next version image of the hosted pool, and add jobs in our post merge pipeline to test if the next version image will break our CI. So, usually we will have at least one week to prepare. 2. Change the cmake generator in use in our pipelines from "Ninja" to "MingW Makefile", because the latest version of cmake doesn't work with the latest version of Ninja. People who prefer Ninja could still use ninja in their local build by passing "--cmake_generator ninja" to [build.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py). 3. Delete eager mode CI pipeline. ### Motivation and Context I need to update the software we have in our CI build machines, and I need to resolve this incompatibility issue. In more detail, the build error I hit was: em++: error: CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o: No such file or directory ("CMakeFilesonnxruntime_mlas_test.dirC_a_work1sonnxruntimetestmlasunittesttest_activation.cpp.o" was expected to be an input file, based on the commandline arguments provided) After this PR we will deprecate python 3.7 support. The eager mode CI pipeline is the last one that still uses python 3.7. Then we can rework the PR #10953 made by [fs-eire](https://github.com/fs-eire) last year. Fixed [AB#14435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14435)	2023-04-10 10:41:04 -07:00
Hector Li	9ef11f1c6a	[QNN EP] Qnn batchnorm Op support (#15222 ) ### Description Support BatchNorm Op in Qnn EP Node Unit group support for BatchNorm, Exp ops ### Motivation and Context Enable more models.	2023-04-10 10:36:57 -07:00
Yi Zhang	0ea965c541	clear cache stat. after building (#15439 ) ### Description Run `ccache -z` after every build. ### Motivation and Context The uploaded cache shouldn't include cache statistics.	2023-04-10 13:56:55 +08:00
stevenlix	6d126f8996	Add FP16 support for Whisper model (#15427 ) Current ORT can only run inference for Whisper FP32 model. This PR adds FP16 support.	2023-04-08 21:36:10 -07:00
Ye Wang	34f22daf25	Support T5 Beam Search with DecoderMaskedMHA (#15386 ) ### Description tldr: Latency improvement t5-small: 37.8% t5-base: 24.5% Benchmark on V100 Before: T5-small ORT {'test_times': 1, 'latency_variance': '0.00', 'latency_90_percentile': '104.74', 'latency_95_percentile': '104.74', 'latency_99_percentile': '104.74', 'average_latency_ms': '104.74', 'QPS': '19.10', 'parity': True} T5-base ORT {'test_times': 1, 'latency_variance': '0.00', 'latency_90_percentile': '200.93', 'latency_95_percentile': '200.93', 'latency_99_percentile': '200.93', 'average_latency_ms': '200.93', 'QPS': '9.95', 'parity': True} After: T5-small ORT {'test_times': 1, 'latency_variance': '0.00', 'latency_90_percentile': '76.01', 'latency_95_percentile': '76.01', 'latency_99_percentile': '76.01', 'average_latency_ms': '76.01', 'QPS': '26.31', 'parity': True} T5-base ORT {'test_times': 1, 'latency_variance': '0.00', 'latency_90_percentile': '161.40', 'latency_95_percentile': '161.40', 'latency_99_percentile': '161.40', 'average_latency_ms': '161.40', 'QPS': '12.39', 'parity': True} ### Motivation and Context --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-04-08 12:50:18 -07:00
Hariharan Seshadri	f77c8f4863	Fix Npm packaging pipeline (#15425 ) ### Description It seems like https://github.com/microsoft/onnxruntime/pull/15329 re-worked some jobs in `react-native-ci.yml` into stages. When this template is used from within `npm-packaging-pipeline.yml`, there is a problem in that there is a stage that contains multiple stages as jobs. Per my understanding, this is not acceptable to Azure DevOps. So, re-working some portion of `npm-packaging-pipeline.yml` to accommodate changes in https://github.com/microsoft/onnxruntime/pull/15329 ### Motivation and Context Fix NPM packaging pipeline Validating test run with fix: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=297391&view=results	2023-04-07 22:13:39 -07:00
Ryan Hill	56beac4b5b	VIT model handling in the Benchmark.sh file (#15045 ) ### Description Adds VIT model type to the benchmark Also adds Swin (v1) model type ### Motivation and Context Image models are important and we should verify these work as expected at the performance we expect.	2023-04-07 20:17:29 -07:00
Pranav Prakash	3c5d02a9ce	Implement BatchNormGradient kernel for CPU EP (#7622 ) Description: Register an implementation for BatchNormInternal and add a CPU kernel for BatchNormGradient. This is the third in a series of PRs to implement BN training on CPU (first was #6946, second was #7539). Motivation and Context Support training networks with BatchNorm (e.g. convnets). Also note that there exists a CUDA kernel for BN (forward training & backwards) but it's currently disabled due to flaky failures; someone more familiar with those parts can register the implementation for BNInternal on CUDA (gradient kernel doesn't have to change). --------- Co-authored-by: Simon Zirui Guo <simonguozirui@berkeley.edu> Co-authored-by: mindest <linminuser@gmail.com> Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>	2023-04-08 09:20:26 +08:00
Rui Ren	5e2f46df2b	update deepspeed version 0.8.3 (#15415 ) ### Description Update the supported DeepSpeed version to 0.8.3, as it's the latest version. ### Motivation and Context This will fix the error `Skip modifying optimizer because of unsupported DeepSpeed version`. Co-authored-by: ruiren <ruiren@microsoft.com>	2023-04-07 17:59:50 -07:00
Edward Chen	666aff56a4	Add workflow to update Objective-C docs. (#15413 ) Add workflow to update Objective-C API docs. Remove the Objective-C API doc generation step from the packaging pipeline. There are similar workflows for automatically updating other language API docs. This change enables this for Objective-C too.	2023-04-07 15:00:15 -07:00
Edward Chen	8db86f2c52	Use fixed version of Android NDK in binary size checks pipeline. (#15422 ) Ensure that we build with a known version of NDK and are not surprised when the default version on the build machine changes. A similar change was made for other Android build pipelines previously, but this one was missed.	2023-04-07 14:53:54 -07:00
Yateng Hong	9bb4e4bef4	Fix masm flags (#15417 ) ### Description Fix onnxruntime_mlas build failure with cmake 3.26. Updated the CMake generator expression to make sure certain compiler flags only apply to the C/CXX compiler. ### Motivation and Context CMake changed the behavior of ASM_MASM in version 3.26. See https://gitlab.kitware.com/cmake/cmake/-/issues/24639. This also fixed the issue of #15101	2023-04-07 10:20:03 -07:00
Adrian Lizarraga	c294040bac	[QNN EP] Support AveragePool operator (#15419 ) ### Description Adds support for the AveragePool operator to QNN EP. ### Motivation and Context This is needed to enable more models to run with QNN EP.	2023-04-07 10:09:55 -07:00
Edward Chen	139f3df4d2	Update binary size checks pipeline to use stages for separate checks. (#15408 ) Allow running of any single check instead of all of them.	2023-04-07 09:55:40 -07:00
Chen Fu	8dce83a818	Fuse 'Add' operator into FP16 Conv (#15213 ) ### Description Adding 'Add' functionality to the FP16 Conv operator. It takes a tensor that has the same shape as the output tensor and adds it to the result tensor. ### Motivation and Context Needed to run Resnet 50	2023-04-07 09:51:03 -07:00
Hector Li	bb21031cbb	[QNN EP]Fix issue in LeakyRelu Opbuilder for HTP backend. (#15356 ) ### Description Fix issue in LeakyRelu Opbuilder for HTP backend. Qnn Prelu (Onnx LeakyRelu) requires alpha data as the 2nd input, while Onnx sets it as an attribute. The HTP backend requires the input to be quantized, so setting the 2nd input as float32 data type caused Qnn Op validation to fail. Fix: Need to set the 2nd input as quantized input for HTP backend. Calculate the quantization parameter and quantize the alpha data into uint8. ### Motivation and Context Unblock models with the LeakyRelu execution on Qualcomm HTP backend.	2023-04-07 09:15:07 -07:00
pengwa	16f5909f2d	Introduce shrunken gather operator (#15396 ) ### Introduce shrunken gather operator The existing Gather operator schema doesn't guarantee that the output element count will be smaller than the input element count. It is possible for the output element count to be >, =, or < the input element count. For cases where we know for sure the output element count MUST be <= the input element count, we will upstream those Gather operators to reduce compute flops. So this PR introduces a ShrunkenGather operator, which explicitly guarantees that the output count will be smaller than the input count. The operator adds additional restrictions on inputs, but still re-uses the existing Gather implementations plus an input check at runtime. This is a requirement for the subsequent optimization (Draft PR: https://github.com/microsoft/onnxruntime/pull/15401) we will do for label sparsity and embedding sparsity.	2023-04-07 15:12:58 +08:00
Adrian Lizarraga	d31dd5935a	[QNN EP] Support Resize's pytorch_half_pixel coordinate transformation mode on HTP (#15390 ) ### Description - Now uses QNN's Resize operator for quantized models - Still uses QNN's ResizeBilinear or ResizeNearestNeighbor for non-quantized models. ### Motivation and Context This update is necessary to support more models on QNN HTP backend. Specifically, we need to support Resize's `pytorch_half_pixel` coordinate transformation mode on HTP.	2023-04-06 23:56:33 -07:00
Hector Li	03dd4e6da3	[QNN EP]fix bug in DlError (#15412 ) ### Description fix bug in DlError. nullptr returned from DlError() will cause crash.	2023-04-06 20:01:08 -07:00
Changming Sun	df11c85955	Download protoc.exe from nuget when cross-compiling (#15395 ) ### Description 1. The protoc package on nuget.org contains binaries for Windows_x86/Windows_x64/Linux_x86/Linux_x64/MacOS_x64, which can cover most use cases. Though it doesn't have binaries for ARM64, they are only needed when we cross-compile for Intel CPUs on ARM CPUs, which is rare. When you have such a need, you can always build protoc from source yourself and pass it to build.py as "--path_to_protoc_exe". If you have security concerns and don't want to use prebuilt binaries from outside, you can do the same thing. 2. Remove GoogleTestAdapter-related code. That part of the code is no longer maintained. ### Motivation and Context As a follow-up of PR #15190.	2023-04-06 17:06:59 -07:00
Yuriy Chernyshov	65579021ee	Remove UTF-8 BOM (#15026 )	2023-04-06 16:09:17 -07:00
Aditya Goel	e5617617fc	Float to float label encoder (#15400 )	2023-04-06 16:05:36 -07:00
Hector Li	276c0a00e4	Reuse QDQConv for ConvTranspose to generate the QDQ model (#15385 ) ### Description Reuse QDQConv for ConvTranspose to generate the QDQ model ### Motivation and Context Generate the correct QDQ model	2023-04-06 15:07:44 -07:00
petermcaughan	2bd8e4a130	Petermca/whisper dedup (#15365 ) ### Description Apply `get_shared_initializers()` to the encoder and decoder subgraphs of Whisper before chaining and exporting the full, final model. ### Motivation and Context The Whisper export process has some overlap between the encoder and decoder subgraphs due to the format of the BeamSearch contrib op. Consequently, there is some shared model data that is duplicated in the final exported product, which can result in a file size increase of ~40%. This PR takes the methods in `convert_generation.py` and applies them during the whisper export process. --------- Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-04-06 13:27:05 -07:00
Dmitri Smirnov	dc1845a9c8	Update mimalloc dependency to the latest release (2.1.1) for Windows build. (#15382 ) ### Description Update mimalloc dependency. ### Motivation and Context The latest release contains important fixes, including for memory leaks, and is used by customers.	2023-04-06 13:07:00 -07:00
petermcaughan	d0cca91cfb	Fix token_id values for whisper export (#15362 ) ### Description The current ONNX export of Whisper utilizes hard-coded values for token_ids when configuring the BeamSearch node. This PR removes these literals and instead takes these values straight from the WhisperConfig. ### Motivation and Context Hard-coding these values can cause some parity issues when comparing to default PyTorch behavior - this change to take from WhisperConfig resolves these. Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-04-06 11:01:21 -07:00
Deokhwan Kim	55495cc809	Do not apply QuickGeluFusion if an intermediate tensor is a graph output (#15109 )	2023-04-06 10:17:06 -07:00
Stephan Gocht	026fb3ca1e	Fix compilation error when CUDNN_HOME is defined. (#15348 )	2023-04-06 08:56:20 -07:00
Sheil Kumar	0fbbb6a43e	WindowsAI build failing due to deprecated .NET5 SDK missing in build image (#15383 ) WindowsAI build failing due to deprecated .NET5 SDK missing in build image .NET5 was deprecated last year, and recently the build machine images have been updated to not include this SDK. Unblock failing builds by force installing .NET5 SDK as part of the build pipeline.	2023-04-06 08:51:07 -07:00
Changming Sun	a5b4d2a8a7	XNNPack: allow users to choose whether to enable CPU MEM arena or not (#15392 ) ### Description XNNPack: allow users to choose whether to enable CPU MEM arena or not. Right now it is hardcoded to true and it is not impacted by the on/off switch in SessionOption. We should make it work. ### Motivation and Context As we have such a switch in SessionOption, it should work as expected.	2023-04-06 15:43:13 +08:00
Hariharan Seshadri	ca68ab6126	Support decoder masked self attention for greedy sampling (#15319 )	2023-04-05 23:08:43 -07:00
cloudhan	71a4e7eb97	Automatically enable tunable op usage for production models (#15156 ) Split `IsTunbaleOpEnable` semantics into enabling tunable op for use and enabling tunable op for tuning. Both remain disabled in general for safety. But - if the session is created with an onnx model with tuning results embedded - the embedded tuning results are set to the EP without an error `Status` then we automatically enable use, while tuning remains disabled. The planned options will be - `tunable_op_enable`: The top-level switch of `TunableOp`, indicating whether we will run into `TunableOp` related logic. NOTE: most of our impls have a bottom impl that acts as a fallback and is set as the default. In this case, we still call into the `TunableOp`, but no kernel selection, kernel tuning or caching is involved. This reduces our maintenance burden of a duplicate code path. - `tunable_op_tuning_enable`: The secondary switch of `TunableOp`, indicating whether we will run into the tuning related logic of `TunableOp` Then for the possible future options: - `tunable_op_tuning_max_iteration`: blahblah - `tunable_op_tuning_max_duration_ms`: blahblah - `tunable_op_flash_attention_enable`: blahblah, for example only, we will not have this. The developer oriented envvars are for developers' convenience, to inspect the performance impact of tuning. So there are only `ORT_ROCM_TUNABLE_OP_ENABLE` and `ORT_ROCM_TUNABLE_OP_TUNING_ENABLE` to take fine-grained control of the combinations.	2023-04-06 13:52:47 +08:00

8537 commits