onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-13 18:08:13 +00:00

Author	SHA1	Message	Date
Hector Li	bb21031cbb	[QNN EP]Fix issue in LeakyRelu Opbuilder for HTP backend. (#15356 ) ### Description Fix issue in LeakyRelu Opbuilder for HTP backend. Qnn Prelu (Onnx LeakyRelu) requires alpha data as the 2nd input, while Onnx sets it as an attribute. The HTP backend requires inputs to be quantized, so Qnn Op validation failed when the 2nd input was set as float32 data. Fix: set the 2nd input as a quantized input for the HTP backend. Calculate the quantization parameters and quantize the alpha data into uint8. ### Motivation and Context Unblock models with LeakyRelu execution on the Qualcomm HTP backend.	2023-04-07 09:15:07 -07:00
pengwa	16f5909f2d	Introduce shrunken gather operator (#15396 ) ### Introduce shrunken gather operator The existing Gather operator schema does not guarantee that the output element count will be smaller than the input element count. The output element count can be greater than, equal to, or less than the input element count. For cases where we know for sure that the output element count MUST be <= the input element count, we will upstream those Gather operators to reduce compute flops. So this PR introduces a ShrunkenGather operator, which explicitly guarantees that the output count will be smaller than the input count. The operator adds additional restrictions on inputs, but still reuses the existing Gather implementation plus an input check at runtime. This is a requirement for a subsequent optimization (Draft PR: https://github.com/microsoft/onnxruntime/pull/15401) that we will do for label sparsity and embedding sparsity.	2023-04-07 15:12:58 +08:00
Adrian Lizarraga	d31dd5935a	[QNN EP] Support Resize's pytorch_half_pixel coordinate transformation mode on HTP (#15390 ) ### Description - Now uses QNN's Resize operator for quantized models - Still uses QNN's ResizeBilinear or ResizeNearestNeighbor for non-quantized models. ### Motivation and Context This update is necessary to support more models on QNN HTP backend. Specifically, we need to support Resize's `pytorch_half_pixel` coordinate transformation mode on HTP.	2023-04-06 23:56:33 -07:00
Hector Li	03dd4e6da3	[QNN EP]fix bug in DlError (#15412 ) ### Description fix bug in DlError. nullptr returned from DlError() will cause crash.	2023-04-06 20:01:08 -07:00
Changming Sun	df11c85955	Download protoc.exe from nuget when cross-compiling (#15395 ) ### Description 1. The protoc package on nuget.org contains binaries for Windows_x86/Windows_x64/Linux_x86/Linux_x64/MacOS_x64, which cover most use cases. It doesn't have binaries for ARM64, but they are only needed when we cross-compile for Intel CPUs on ARM CPUs, which is rare. When you have such a need, you can always build protoc from source yourself and pass it to build.py as "--path_to_protoc_exe". You can do the same if you have security concerns about using prebuilt binaries from outside. 2. Remove GoogleTestAdapter related code, which is no longer maintained. ### Motivation and Context As a follow-up of PR #15190.	2023-04-06 17:06:59 -07:00
Yuriy Chernyshov	65579021ee	Remove UTF-8 BOM (#15026 )	2023-04-06 16:09:17 -07:00
Aditya Goel	e5617617fc	Float to float label encoder (#15400 )	2023-04-06 16:05:36 -07:00
Hector Li	276c0a00e4	Reuse QDQConv for ConvTranspose to generate the QDQ model (#15385 ) ### Description Reuse QDQConv for ConvTranspose to generate the QDQ model ### Motivation and Context Generate the correct QDQ model	2023-04-06 15:07:44 -07:00
petermcaughan	2bd8e4a130	Petermca/whisper dedup (#15365 ) ### Description Apply `get_shared_initializers()` to the encoder and decoder subgraphs of Whisper before chaining and exporting the full, final model. ### Motivation and Context The Whisper export process has some overlap between the encoder and decoder subgraphs due to the format of the BeamSearch contrib op. Consequently, there is some shared model data that is duplicated in the final exported product, which can result in a file size increase of ~40%. This PR takes the methods in `convert_generation.py` and applies them during the whisper export process. --------- Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-04-06 13:27:05 -07:00
Dmitri Smirnov	dc1845a9c8	Update mimalloc dependency to the latest release (2.1.1) for Windows build. (#15382 ) ### Description Update mimalloc dependency. ### Motivation and Context The latest release contains important fixes, including for memory leaks, and is used by customers.	2023-04-06 13:07:00 -07:00
petermcaughan	d0cca91cfb	Fix token_id values for whisper export (#15362 ) ### Description The current ONNX export of Whisper utilizes hard-coded values for token_ids when configuring the BeamSearch node. This PR removes these literals and instead takes these values straight from the WhisperConfig. ### Motivation and Context Hard-coding these values can cause some parity issues when comparing to default PyTorch behavior - this change to take from WhisperConfig resolves these. Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-04-06 11:01:21 -07:00
Deokhwan Kim	55495cc809	Do not apply QuickGeluFusion if an intermediate tensor is a graph output (#15109 )	2023-04-06 10:17:06 -07:00
Stephan Gocht	026fb3ca1e	Fix compilation error when CUDNN_HOME is defined. (#15348 )	2023-04-06 08:56:20 -07:00
Sheil Kumar	0fbbb6a43e	WindowsAI build failing due to deprecated .NET5 SDK missing in build image (#15383 ) WindowsAI build failing due to deprecated .NET5 SDK missing in build image. .NET5 was deprecated last year, and recently the build machine images have been updated to not include this SDK. Unblock failing builds by force installing the .NET5 SDK as part of the build pipeline.	2023-04-06 08:51:07 -07:00
Changming Sun	a5b4d2a8a7	XNNPack: allow users to choose whether enable CPU MEM arena or not (#15392 ) ### Description XNNPack: allow users to choose whether enable CPU MEM arena or not. Right now it is hardcoded to true and it is not impacted by the on/off switch in SessionOption. We should make it work. ### Motivation and Context As we have such a switch in SessionOption, it should work as expected.	2023-04-06 15:43:13 +08:00
Hariharan Seshadri	ca68ab6126	Support decoder masked self attention for greedy sampling (#15319 )	2023-04-05 23:08:43 -07:00
cloudhan	71a4e7eb97	Automatically enable tunable op usage for production models (#15156 ) Split `IsTunbaleOpEnable` semantics into enabling tunable op for use and enabling tunable op for tuning. Both remain disabled in general for safety. But if - the session is created with an onnx model with tuning results embedded - the embedded tuning results are set to the EP without an error `Status` then we automatically enable usage, while tuning remains disabled. The planned options will be - `tunable_op_enable`: The top-level switch of `TunableOp`, indicating whether we will run into `TunableOp` related logic. NOTE: most of our impls have a bottom impl that acts as a fallback and is set as the default. In this case, we still call into the `TunableOp`, but no kernel selection, kernel tuning, or caching is involved. This reduces our maintenance burden of a duplicate code path. - `tunable_op_tuning_enable`: The secondary switch of `TunableOp`, indicating whether we will run into the tuning related logic of `TunableOp`. Then for the possible future options: - `tunable_op_tuning_max_iteration`: blahblah - `tunable_op_tuning_max_duration_ms`: blahblah - `tunable_op_flash_attention_enable`: blahblah, for example only, we will not have this. The developer-oriented envvars are for developers' convenience in inspecting the performance impact of tuning. So there are only `ORT_ROCM_TUNABLE_OP_ENABLE` and `ORT_ROCM_TUNABLE_OP_TUNING_ENABLE` for fine-grained control of the combinations.	2023-04-06 13:52:47 +08:00
Jian Chen	2e52de265a	Upgrade remaining python to 3.11 removing 3.7 (#15321 ) ### Description Upgrade remaining python to 3.11, removing 3.7	2023-04-05 21:43:51 -07:00
Thuy Dao	6e1e808ec8	fix error unqualified call to 'std::move' (#15347 )	2023-04-05 20:40:30 -07:00
Yi Zhang	962d8d2b19	Add compilation cache in react native CI (#15329 ) ### Description 1. Replaced jobs with stages for better debugging and maintenance. 2. Added a compilation cache to accelerate the workflow. 3. Split building protobuf and the major code into 2 tasks. ### Motivation and Context Reduced compilation time by about one hour. test run: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=943695&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=8b360243-7783-51da-8079-2304089d3d1d	2023-04-06 10:39:14 +08:00
Aditya Goel	a7d321e9dc	String to string label encoder (#15379 )	2023-04-05 14:04:34 -07:00
Leso_KN	ea6b32fea8	Fix: Add def main() in onnxruntime_test.py (#15208 )	2023-04-05 12:31:39 -07:00
Adam Pocock	ef11032c89	[java] Allows the creation and extraction of zero length tensors (#15116 ) ### Description Allows the creation of zero length tensors via the buffer path (the array path with zero length arrays still throws as the validation logic to check it's not ragged would require more intrusive revision), and allows the `tensor.getValue()` method to return a Java multidimensional array with a zero dimension. Also added a test for the creation and extraction behaviour. ### Motivation and Context The Python interface can return zero length tensors (e.g. if object detection doesn't find any objects), and before this PR in Java calling `tensor.getValue()` throws an exception with a confusing error message. Fixes #7270 & #15107.	2023-04-05 10:49:59 -07:00
Patrice Vignola	9191e04259	[DML EP] Add QuickGelu (#15220 )	2023-04-05 10:49:34 -07:00
Justin Chu	a96e19abc4	Add type annotations to `onnxruntime_inference_collection.py` (#15364 ) ### Description Add type annotations to `onnxruntime_inference_collection.py` ### Motivation and Context Fixes #15334	2023-04-05 10:32:49 -07:00
Chen Fu	764e489a00	Adding FP16 Global Average Pool operator (#15324 ) ### Description Adding FP16 Global Average Pool operator ### Motivation and Context Supporting fp16 cpu inference	2023-04-05 09:38:02 -07:00
Aditya Goel	a4e9a48345	Reduce operators support for int64 type (#15358 )	2023-04-05 09:19:43 -07:00
Edward Chen	9f5aa8e021	Add clog back to onnxruntime_EXTERNAL_LIBRARIES. (#15363 ) ### Description Add clog back to onnxruntime_EXTERNAL_LIBRARIES. ### Motivation and Context Fix iOS packaging pipeline build failure.	2023-04-05 09:11:19 -07:00
Hector Li	a0d8dbe28d	Register Resize op into nhwc schema for Qnn EP (#15373 ) ### Description Register Resize op into nhwc schema for Qnn EP. ### Motivation and Context Resize op is identified as layout sensitive op for Qnn EP, need to register it into nhwc schema	2023-04-05 08:41:16 -07:00
George Wu	4db10c93d1	[TensorRT EP] make --use_tensorrt_builtin_parser the default behavior in build.py (#15320 ) Change the default behavior to link against the nvonnxparser library (onnx-tensorrt parser) that is included with the TensorRT package. Previously, the default behavior was to build and statically link against the OSS onnx-tensorrt parser. Historically, we wanted to incorporate the latest commits/fixes from OSS parser. These days the OSS parser is not significantly different from the included parser library so there is less reason to build against it by default. By linking with parser shared library from TensorRT library, the major benefit is it's much easier to build/link against a minor version update of TensorRT. And OnnxRuntime can be updated with a new TensorRT minor version by simply replacing TensorRT libraries with the newer version. (because the parser is no longer statically linked into onnxruntime) Added --use_tensorrt_oss_parser to build.py to support the previous default behavior. (build + static link OSS parser)	2023-04-05 07:53:29 -07:00
pengwa	fe0db63dee	Upstream reshape of merging batch/sequence (#15023 ) ### Upstream reshape of merging batch/sequence For a Reshape node that fulfills the following requirements: - input data rank = 3 - input shape is a constant initializer, and the untouched dim value MUST be a constant value. - Reshape is merging the first dimension, so output data rank = 2. We upstream it to make it run as early as possible. Doing this will allow us to upstream other operators (Gather) that are blocked by this kind of Reshape node. Currently, we did not enable it in graph_transformer_utils, since the combined upstream gather changes are not ready yet. Before: ![image](https://user-images.githubusercontent.com/10530022/224698252-f9705082-9710-4385-95ec-f1ccf50dc0e3.png) After: ![image](https://user-images.githubusercontent.com/10530022/224698381-7e124d0d-ba47-4f35-8e37-6015014cd1c4.png)	2023-04-05 18:51:07 +08:00
Baiju Meswani	6b755debbc	Miscellaneous updates to training artifact generation (#15315 )	2023-04-04 20:09:51 -07:00
Nhat Nguyen	198994d01d	Register PytorchAtenDomain in RegisterOrtOpSchemas (#14567 )	2023-04-04 17:34:13 -07:00
Hariharan Seshadri	5294cd0c55	Print value errors in ort.InferenceSession to user (#15360 )	2023-04-04 16:01:24 -07:00
Anton Korablin	207c57219a	Add support for full ViT optimization (#15289 ) Add support for ViT optimization in optimizer.py As ViT architecture follows BERT rather closely, we can easily reuse BERT fusions for ViT. The only difference is that ViT does not have attention mask, which means there is no Add node in qk paths. Make the necessary changes in onnx_exporter.py to be able to cover optimizations with test.	2023-04-04 14:05:24 -07:00
Aditya Goel	1c1d386561	Adds int32_t and uint32_t clip kernels (#15306 )	2023-04-04 13:44:50 -07:00
Hariharan Seshadri	adb3d5dcb9	Allow constant folding nodes that have missing optional inputs (#15344 )	2023-04-04 11:55:37 -07:00
Severin Simmler	4400e80452	Allow `Path` objects for deserialization of ONNX models (#15307 )	2023-04-04 11:38:00 -07:00
Jian Chen	af28754e6f	Update python package pipeline to support 3.11 (#15311 ) ### Description Update python package pipeline to support 3.11	2023-04-04 10:55:32 -07:00
Ye Wang	0412bffbb4	fix build bug when enabling DEBUG_GENERATION (#15338 )	2023-04-04 09:44:07 -07:00
petermcaughan	1251964f96	Petermca/beamsearch whisper (#15339 ) ### Description Adjust various code paths to allow Whisper model to function with BeamSearch op. Approach: Add a new kModelType enum value in IGenerationParameters as so: #### Old: 0 = GPT2, 1 = T5 #### New: 0 = GPT2, 1 = T5, 2 = Whisper When the user assigns this attribute value to 2, various shape and type checks are changed to accommodate Whisper inputs. ### Motivation and Context BeamSearch is currently designed to function with BERT-based models with inputs as vocab tokens, and needs changes to function with Whisper inputs (3-D float values processed from audio data). --------- Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-04-04 09:09:10 -07:00
Yi Zhang	b54ca9a041	Read the cache in main build if it's a (Intermediate)merge branch. (#15330 ) ### Description In a merge branch, the run only reads the cache generated in the main build. As a result, runs in a merge branch will not upload a new cache except the first time. ### Motivation and Context 1. Reduce the cache storage. If there are big changes, devs should trigger the specific builds manually in https://dev.azure.com/onnxruntime/onnxruntime/_build. Those builds still read their own branch cache.	2023-04-04 20:21:05 +08:00
pengwa	5baf5f506b	log level control + fix typos (#15302 ) ### log level control + fix typos	2023-04-04 20:19:13 +08:00
petermcaughan	f30e2d4387	Whisper Export (#15247 ) ### Description Add scripts to export Whisper model to ONNX and integrate the ORT BeamSearch op with the resulting graphs. Example command to execute this script: python convert_to_onnx.py -m openai/whisper-large --output whisper -e --------- Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-04-04 05:01:04 -07:00
Tianlei Wu	3cf3fa0467	Fix prefast warnings (#15340 ) Fix prefast warnings: (1) Arithmetic overflow: Using operator '*' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '*' to avoid overflow (io.2). (2) Dereferencing NULL pointer 'key'.	2023-04-03 22:29:13 -07:00
Ye Wang	dec11afb83	Fix a prefast warning (#15343 ) ### Motivation and Context https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/14272/?triage=true	2023-04-03 18:25:25 -07:00
Hector Li	44027797b0	[QNN EP] Gather support int64 indices input (#15317 ) ### Description Gather support int64 indices input ### Motivation and Context Support more scenario	2023-04-03 17:51:42 -07:00
Matthieu Darbois	85bb13345d	Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON` (#15323 ) ### Description Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON` This will allow package managers to more easily provide an onnxruntime package by reducing the amount of patching needed downstream at each version. ### Motivation and Context Availability of onnxruntime in some C++ package managers https://github.com/microsoft/onnxruntime/issues/7150 https://github.com/conan-io/conan-center-index/issues/16699 https://github.com/microsoft/vcpkg/issues/20548 My initial intent is to get this in conan but the PR would most likely be useful (though not tested) to vcpkg as well (and maybe others). I tried to get only a first batch of not too specific patches (i.e. not specific to conan). The first commit reworks `flatbuffers` and just extends what @snnn did in https://github.com/microsoft/onnxruntime/pull/13991 The second commit reworks `pytorch_cpuinfo` The third commit reworks `google_nsync`	2023-04-03 17:45:12 -07:00
RandySheriffH	e4aae94f20	Remove azure build to unblock PRs (#15336 ) Temporarily remove Azure build check to unblock PR(s). We need to investigate the sudden build failure and reenable. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-04-03 12:47:14 -07:00
Ye Wang	fbfe92f66a	DecoderMaskedMultiHeadAttention enhancement (#15292 )	2023-04-02 21:53:03 -07:00
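
The LeakyRelu fix for the HTP backend (commit bb21031cbb) describes calculating a quantization parameter and quantizing the alpha value into uint8. A minimal sketch of that idea follows, assuming a generic asymmetric (affine) uint8 scheme whose range always includes 0.0. The function names are hypothetical and this is not the QNN EP implementation, which may choose its parameters differently.

```python
from typing import Tuple


def compute_uint8_qparams(value: float) -> Tuple[float, int]:
    """Return (scale, zero_point) for one float value.

    Assumption: asymmetric uint8 quantization over [min(value, 0), max(value, 0)],
    so that 0.0 is always exactly representable.
    """
    rmin = min(value, 0.0)
    rmax = max(value, 0.0)
    scale = (rmax - rmin) / 255.0
    if scale == 0.0:
        # value is exactly 0.0; any positive scale can represent it.
        scale = 1.0
    zero_point = int(round(-rmin / scale))
    return scale, max(0, min(255, zero_point))


def quantize_uint8(value: float, scale: float, zero_point: int) -> int:
    """Map a float to the uint8 range [0, 255] using the given parameters."""
    q = int(round(value / scale)) + zero_point
    return max(0, min(255, q))


def dequantize_uint8(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate float value from its uint8 representation."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    alpha = 0.1  # hypothetical LeakyRelu alpha attribute
    scale, zero_point = compute_uint8_qparams(alpha)
    q_alpha = quantize_uint8(alpha, scale, zero_point)
    print(scale, zero_point, q_alpha, dequantize_uint8(q_alpha, scale, zero_point))
```

Because alpha is a single scalar, the quantization range here is derived from that one value, so the round trip through uint8 loses almost no precision.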


8504 commits