onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-01 03:45:06 +00:00

Author	SHA1	Message	Date
cloudhan	9110e5b9bd	[ROCm] Add attention kv cache for decoding (#16076 )	2023-06-16 14:17:56 +08:00
Tianlei Wu	96471491d7	Fix test failure in debug CUDA build (#16370 ) Fix assertion failure in onnxruntime_test_all in debug build with CUDA, which is caused by a test case added in https://github.com/microsoft/onnxruntime/pull/16075. Remove an assumption that bias exists in MultiHeadAttention.	2023-06-15 23:16:16 -07:00
Tianlei Wu	1866a9d818	Use the lowest float for causal mask (#16369 ) Always set causal mask to the lowest float. Note that since huggingface transformers v4.21, gpt2 uses lowest half for FP16, and lowest float for FP32: `66fd3a8d62/src/transformers/models/gpt2/modeling_gpt2.py (L199)` Assume that most fp16 ONNX models are converted from fp32 models. We decided to use lowest float32 for both half and float models for consistency. The mask_filter_value only applies to a raw attention mask (2D, 3D or 4D). For a 1D mask, the masked item is 0.0 after softmax, so the mask filter value is the lowest float for a 1D mask. * For the BERT model, when users use a 1D mask (required by FMHA), mask_filter_value is not applicable. * For BERT or GPT-2, when a fused kernel is used, mask_filter_value has no impact. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/12843 https://github.com/microsoft/onnxruntime/issues/14363	2023-06-15 21:32:29 -07:00
PeixuanZuo	bcdb81c563	[Whisper] add a fusion option to split input bias from MHA/DMHA (#16049 ) The Whisper model contains five different types of attention, and the q, k, v bias was fused into the Attention/MHA/DMHA op. encoderdecoderinit subgraph - Attention: encoder attention - Attention: decoder self attention + present k, v - MultiHeadAttention: decoder cross attention + present k and v. q and v have bias. decoder subgraph - DecoderMultiHeadAttention: decoder cross attention + past k, v. q has bias - DecoderMultiHeadAttention: decoder self attention + past/present k, v. q, k, v have bias. For ROCm EP, MHA/DMHA doesn't support additional bias. This PR adds a fusion option `disable_multi_head_attention_bias` to split the q, k, v bias from MHA/DMHA.	2023-06-16 10:29:48 +08:00
Jeff Bloomfield	6949cfaf94	Fix MS domain QuantizeLinear and DequantizeLinear type registrations … (#16298 ) This fixes the type lists used to register DML kernels for Microsoft domain QuantizeLinear and DequantizeLinear. These previously did not include FP16 and incorrectly used the same type list for both operators. The new type lists are the same as opset 19 ONNX which aren't implemented yet in the DML EP.	2023-06-15 18:21:56 -07:00
Changming Sun	188d5f5398	Fix Linux Multi GPU build pipeline (#16368 ) ### Description The build pipeline runs on Azure NV12 machines that will be deprecated soon because the SKU is too old. So this PR will move the pipeline to a Windows machine with two A10 GPUs.	2023-06-15 16:24:46 -07:00
Changming Sun	5754cd7d1d	Add fp16 support to CPU EP gemm op (#15506 )	2023-06-15 14:38:17 -07:00
Skand Hurkat	67093b204d	Clean up aarch64 quantized GEMM dispatch (#16120 ) ### Description - Add a new field to `MLAS_PLATFORM` for S8S8 GEMM dispatch. - Set this field to either dot product instructions or NEON MLA in platform.cpp. - Clean up dispatch selector in qgemm.h. ### Motivation and Context This will allow future extensibility as other functions that use other ARM64 extensions for quantized matrix multiplication. --------- Co-authored-by: Skand Hurkat <skhurkat@microsoft.com>	2023-06-15 14:24:40 -07:00
Guenther Schmuelling	5c0d5768e7	make package.json more robust (#16366 ) "default" should be the last element for exports. This fixes "Module not found: Error: Default condition should be last one" when importing the onnxruntime-web package in some conditions.	2023-06-15 14:17:37 -07:00
Hariharan Seshadri	63f5573354	Relax node placement check for CUDA Graph usage (#16358 )	2023-06-15 14:03:08 -07:00
Dipanjan Sengupta	681a0d084d	Removing AMX build flag (#16086 ) ### Description 1. Replacing AMX intrinsics with machine code macro instructions in QGEMM kernel. 2. Removing AMX build flags for GCC in cmake file. ### Motivation and Context The additional AMX flag in cmake adds an extra layer of dependency on the GCC version to use the feature. These changes should allow the usage of the AMX feature with just the CPU ID check.	2023-06-15 11:22:59 -07:00
Rachel Guo	65434dce57	Bump decode-uri-component from 0.2.0 to 0.2.2 in /js/react_native/e2e (#16329 ) ### Description As the title says. Similar to this PR: https://github.com/microsoft/onnxruntime/pull/13846 ### Motivation and Context To resolve a component governance alert. https://aiinfra.visualstudio.com/Lotus/_componentGovernance/97926/alert/8087084?typeId=16589570 Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-06-15 10:30:48 -07:00
Yulong Wang	4f7900b553	[js/web] enable ONNX Runtime Web error messages in JS (#16335 ) ### Description enabling passing error messages from C++ to JavaScript so that when ORT Web API fails it generates more verbose errors.	2023-06-15 09:45:41 -07:00
Yi Zhang	3e99e43a1d	extend Final AAR testing timeout limit (#16340 ) ### Motivation and Context improve nuget pipeline stability	2023-06-15 17:27:45 +08:00
pengwa	735a32fee1	Introduce memory observer for ORTModule (#16213 ) ### Introduce memory observer for ORTModule To analyze memory usage for ORTModule training, we need to collect the per-iteration memory footprint in different stages (pre-forward, post-forward, pre-backward, and post-backward). Currently we only collect the data using torch.cuda APIs. As a next step, we could collect the detailed stashed activation list and its percentage within the ORT backend, which is beyond this PR. Sample output: ``` [0/8] step 0 memory (MiB) | phase: pre_forward | allocated: 1866 | max allocated: 1866 | cached: 1874 | max cached: 1874 | inactive: 8 | max inactive: 8 [0/8] step 0 memory (MiB) | phase: post_forward | allocated: 23277 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 193 | max inactive: 405 [0/8] step 0 memory (MiB) | phase: pre_backward | allocated: 23277 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 193 | max inactive: 405 [0/8] step 0 memory (MiB) | phase: post_backward | allocated: 2932 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 6158 | max inactive: 6158 0%|█ | 1/200 [00:26<1:26:18, 26.02s/it] [0/8] step 1 memory (MiB) | phase: pre_forward | allocated: 2356 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 2454 | max inactive: 6165 [0/8] step 1 memory (MiB) | phase: post_forward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165 [0/8] step 1 memory (MiB) | phase: pre_backward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165 [0/8] step 1 memory (MiB) | phase: post_backward | allocated: 3422 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 5284 | max inactive: 6165 1%|██ | 2/200 [00:26<36:47, 11.15s/it] [0/8] step 2 memory (MiB) | phase: pre_forward | allocated: 2356 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2454 | max inactive: 6165 [0/8] step 2 memory (MiB) | phase: post_forward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165 [0/8] step 2 memory (MiB) | phase: pre_backward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165 [0/8] step 2 memory (MiB) | phase: post_backward | allocated: 3422 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 5284 | max inactive: 6165 ```	2023-06-15 15:45:36 +08:00
pengwa	574e17ade4	Fix Reshape check (#16349 ) ### Fix Reshape check 3D->2D reshape by merging the first dims. There is a bug in this case. ```mermaid stateDiagram [768,12,64] --> Reshape (-1,768) --> Reshape Reshape --> [768,768] ``` The Reshape passes the upstream Reshape check, but it should not.	2023-06-15 13:50:53 +08:00
PeixuanZuo	097346be9d	[ROCm] Add clean step for ROCm CI pipeline (#16336 ) 1. Add clean step for ROCm CI pipeline 2. Fix error "device or resource busy" bug by setting umount dataset step as `always()` step.	2023-06-15 13:44:12 +08:00
Baiju Meswani	5eec24837f	Fix for AMD GPU pipeline (#16357 )	2023-06-14 20:36:16 -07:00
Wanming Lin	73dad4452b	[WebNN EP] Support Shape op (#16282 ) Since WebNN API doesn't support shape op, in the WebNN EP, we calculate the ONNX Shape node output and pass the values to a WebNN's constant + slice as workaround.	2023-06-14 20:31:01 -07:00
Changming Sun	dbc7a195b1	Update win-ci-pipeline.yml: enable xnnpack tests (#16244 ) 1. Enable xnnpack test 2. Change TSA database name from onnxruntime_master to onnxruntime_main. This is a leftover of renaming the "master" branch to "main" 3. Add two static analysis jobs for WinML and DML 4. Rename the machine pool "aiinfra-dml-winbuild" to "onnxruntime-Win2019-GPU-dml-A10", so that the internal and public ADO instances use the same machine pool name. 5. Move Windows GPU CI build pipeline from "onnxruntime-Win2022-GPU-T4" to "onnxruntime-Win2022-GPU-A10" machine pool, because we do not have enough T4 GPUs.	2023-06-14 19:12:42 -07:00
Tianlei Wu	9be133231f	Fix cuda graph capture (#15005 ) Fix two issues related to cuda graph capture: https://github.com/microsoft/onnxruntime/issues/14942 and https://github.com/microsoft/onnxruntime/issues/15002 Issue 1: Previously, graph capture started at the second run. However, memory pattern optimization allocates memory from the second run, and cudaMalloc is not allowed during graph capture. In this PR, graph capture starts after 2 runs to avoid the issue. Issue 2: https://github.com/microsoft/onnxruntime/pull/13495 introduced multiple stream support. But stream cleanup calls cudaStreamSynchronize, which is not allowed in cuda graph capture. In this PR, we move stream cleanup after cuda graph capture. Update the squeeze net test model with dynamic axis so that we can test with larger batch size. Add a test that could reproduce the bug (when changing min runs from 2 back to 1).	2023-06-14 18:10:20 -07:00
Baiju Meswani	8a3de16d14	Temporary fix to make the training pipeline green (#16353 )	2023-06-14 13:11:35 -07:00
Baiju Meswani	ed2482667b	Fix training pipeline (#16342 )	2023-06-13 15:06:38 -07:00
zesongw	c5176ed122	[WebNN EP] Add several new unary Ops (Ceil, Exp, Identity, Reciprocal, Tan) (#16302 ) ### Description - Add new Ops: Ceil, Exp, Identity, Reciprocal, Tan. - Set MinSupportedOpSet for unary Ops. ### Motivation and Context Support more Ops for other models. The legacy optimization attribute "consumed_inputs" is not supported in WebNN EP.	2023-06-13 08:14:55 -07:00
Edward Chen	4f23577cb5	[React Native] Publish E2E test logs on build failure too. (#16327 ) ### Description Publish E2E test logs on build failure too. ### Motivation and Context Get more information about intermittent test failures.	2023-06-12 17:56:46 -07:00
Yulong Wang	e3e4926d00	[js/common] allow import onnxruntime-common as ESM and CJS (#15772 ) ### Description allow import onnxruntime-common as ESM and CJS.	2023-06-12 12:05:11 -07:00
Sheil Kumar	0df9e42960	User/sheilk/register div nonzero (#16309 ) [DML EP] NonZero supported datatypes has incorrect number of template datatypes 2 should be 1	2023-06-12 10:11:59 -07:00
satyajandhyala	889f80082f	[js/web] Added Reduce operators support (#16122 ) ### Description Added support for ReduceL1, ReduceL2, ReduceMean, ReduceMin, ReduceMax, ReduceSum, ReduceLogSum, ReduceLogSumExp, ReduceProd and ReduceSquareSum. --------- Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com> Co-authored-by: guschmue <guschmue@microsoft.com>	2023-06-12 07:46:27 -07:00
pengwa	40bcc0441b	Enhance StatisticsSubscriber (#16098 ) ### Enhance StatisticsSubscriber There are a few improvements for `StatisticsSubscriber`: - Reduce the peak memory impact for tensors with very many elements (which consumed too much GPU memory and caused the original recipe run to fail with OOM) by splitting the statistics into two phases (split into buckets, then merge results across buckets). - Allow dumping intermediate tensors. Originally only the return values of nn.Module forward() were dumped; there are cases where we want to inspect a specific intermediate tensor inside the forward() function, and this is now supported. - Add documents for collecting dumps on multiple ranks. Docs link on this branch for better view: https://github.com/microsoft/onnxruntime/blob/pengwa/conv_tool_v2/docs/ORTModule_Convergence_Notes.md --------- Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>	2023-06-12 18:32:08 +08:00
JiCheng	eed02a3f78	Xnnpack QDQ test (#16281 ) ### Description A few QDQ tests failed on XNNPACK EP. The likely reason is that the range of input_data doesn't fit the scale and zero_point.	2023-06-12 14:00:42 +08:00
zhangsibo1129	97751ad516	[CANN] Fix registration of Identity operator (#16210 ) ### Description This [PR](`e726151b5c (diff-6957596681c25d78e7f3f56485f307fb7e66369309523240209a62c8fa21646b)`) introduces a missing registration of Identity operator for version greater than 14. ### Motivation and Context It broke the CANN CI. I added the registration of identity operator.	2023-06-10 17:23:21 -07:00
JiCheng	5ab51694ab	gather OP with scalar indice in NNAPI EP (#16279 ) ### Description NNAPI doesn't support a scalar as the indices input of Gather. This PR works around it.	2023-06-10 09:32:07 +08:00
Yulong Wang	59f42cccb8	[js/common] refactor tensor type in onnxruntime-common (#15843 ) ### Description refactor tensor type in onnxruntime-common. ### Motivation and Context The major motivation is that I am doing a local change to address the API part of #15312, and I am doing a refactoring of onnxruntime-common anyway (#15772). The `tensor.ts` and `tensor-impl.ts` files are too large, so I split their contents into multiple files to make the type declarations clearer. The original target of this change is the API only (i.e. do not refactor any implementation). However, there are a few type/implementation inconsistencies, so I also made minimal changes to fix them. ### Changes - extract `TensorUtils` for non-template interfaces - extract `TensorFactory` for all overloads of `Tensor.fromImage()` - refactor the options type used for `Tensor.fromImage()` - fix JSDoc comments to make option descriptions consistent with actual type declarations - fix an inconsistency for `options.format` and `options.bitmapFormat`; change all `bitmapFormat` to `format` - extract `ConversionUtils` for `tensor.toDataURL()` and `tensor.toImageData()` - put implementations into multiple files from `tensor-impl.ts` - fix a bug that causes a unit test to fail. put comments for future fix.	2023-06-09 16:19:29 -07:00
Yulong Wang	f274bbb0c8	[js] add API that allows to get package version (#16207 ) ### Description Add an API for users to get version of current package. example usage: ```js import { env } from 'onnxruntime-node'; console.log(env.versions.node); // output "1.16.0" ``` ```js import { env } from 'onnxruntime-web'; console.log(env.versions.web); // output "1.16.0" console.log(env.versions.common); // output "1.16.0" console.log(env.versions.node); // output "undefined" ``` #16156	2023-06-09 16:18:53 -07:00
Yi Zhang	3b5a8352c1	CodeSign Mac packages in nuget pipeline (#16291 ) ### Description 1. Updated Mac package workflow for easier debugging. 2. Changed Archive type from tgz to zip since zip is supported by ESRP. 3. .../dylib.dSYM/Contents/Resources/DWARF/libonnxruntime.1.16.0.dylib is a debug symbol file, so it couldn't be signed. ### Motivation and Context It's required by VS Code. Mac binaries in nuget should be signed	2023-06-10 06:35:47 +08:00
Adrian Lizarraga	1a22d245e2	[QNN EP] Fix auto_pad handling for Conv operator (#16299 ) ### Description Correctly sets padding when the `auto_pad` attribute is specified for Conv operator. ### Motivation and Context Needed to correctly translate ONNX Conv to QNN Conv2d.	2023-06-09 09:23:08 -07:00
Edward Chen	b668a6da96	Treat Objective-C static analysis warnings as errors (#16293 ) - Update Objective-C static analysis check to fail on warnings. - Address warning. - Clean up build definition.	2023-06-09 08:51:49 -07:00
Scott McKay	443f553782	Fix native onnxruntime library not loading in Azure App Service (#16286 ) ### Description SetThreadDescription isn't available in an Azure App Service sandbox. #15219 removed a check that it was available, making it a hard dependency. When it's not available the dll load fails with a 'procedure not found' error. Add back the check. ### Motivation and Context #15375 - although note this has nothing to do with the original issue. This is just for https://github.com/microsoft/onnxruntime/issues/15375#issuecomment-1579464889	2023-06-09 18:40:51 +10:00
Hector Li	a9d47f72a4	[QNN EP] Add model description into context binary file metadata for validation (#16248 ) ### Description Add model description into context binary file metadata for validation ### Motivation and Context Dump more information for validation --------- Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>	2023-06-08 22:13:43 -07:00
Hector Li	d1e8d4a261	[QNN EP] Fix an issue for Conv with dynamic weights (#16235 ) ### Description Fix an issue for Conv with dynamic weights. Root cause: the Conv op builder creates the weight input tensor with the wrong name. With dynamic weights, a Transpose node is inserted. The Conv op builder should use the new name, which is the Transpose output. The wrong name causes the weight producer to have the wrong output shape.	2023-06-08 17:09:35 -07:00
Jhen-Jie Hong	ac8444f299	[js/rn] Implement dispose native method (#16131 ) ### Description Implement the `dispose` react native method. ### Motivation and Context Currently we are not able to release the memory used by a model in the JS runtime when we no longer want to use it; we can only do that by reloading the app in debug or restarting the app in release.	2023-06-09 09:17:33 +10:00
Adrian Lizarraga	b48628f1cd	[QNN EP] Add tests for large inputs that trigger memory alloc errors (#16223 ) ### Description Adds tests for operators that return error 1002 (QNN_COMMON_ERROR_MEM_ALLOC) when the call to graphFinalize() fails. This seems to happen for large input sizes. Operators: - Sub - Div - Conv - MaxPool ### Motivation and Context This documents bugs that need to be addressed with unit tests.	2023-06-08 15:47:51 -07:00
Changming Sun	b72fe664c1	Refactor prepack buffer code (#16280 ) ### Description 1. Use IAllocatorUniquePtr to replace BufferUniquePtr. It will ensure the deleter is always right. 2. Change some std::unique_ptr to std::optional 3. Bypass Arena allocator when allocating the prepack buffers for mlas. In this special case, Arena doesn't help any. And this change is just an internal implementation change, it doesn't affect our public interface.	2023-06-08 14:42:02 -07:00
Sheil Kumar	9d52632da9	[DML EP] Register Div with int64 and NonZero with bool (#16276 ) [DML] Register Div with int64 and NonZero with bool These data types are supported by DML	2023-06-08 13:49:39 -07:00
kunal-vaishnavi	79e0230002	Add vocab masks to Whisper export with beam search (#16180 ) ### Description This PR adds flags for exporting Whisper with vocab masks for logits processing. This PR also sets `input_features` back to FP32 precision for the user and casts `input_features` to FP16 precision when needed. ### Motivation and Context This helps enable specific logits processing for the exported Whisper model.	2023-06-08 12:36:35 -07:00
Yuriy Chernyshov	a3a443c804	Support re2 == 2023-06-02 (#16257 ) ### Description google/re2 [was switched](`49d776b9d2`) to absl::string_view in version 2023-06-02. As `absl::string_view` is a drop-in replacement for `std::string_view` it does not have `as_string()` method. This PR ensures the forward compatibility with the newest versions of re2 library.	2023-06-08 11:26:26 -07:00
Scott McKay	b07b647f66	Fix some issues with NNAPI Softmax (#16095 ) ### Description Update NNAPI Softmax to coerce to 2D when opset is < 13. This prevents the layout change to NHWC from breaking the implementation, as well as making it work correctly when the ONNX node's axis != 1. Add check for opset 13+ that axis is inner-most dimension as we don't currently handle any other value correctly. Update tests to add model to check NHWC layout, as well as 4D tests. We didn't notice the issues with the NNAPI EP as it was only processing input shapes that were 2D or 4D (which was overly restrictive as well). ### Motivation and Context #15949	2023-06-08 13:56:06 +10:00
Artur	dc1312cfb1	[web] fix: Provide typings for exports (#16249 ) ### Description Adds typings to be compatible with `moduleResolution: bundler` ### Motivation and Context Fixes #16242	2023-06-07 14:52:36 -07:00
Changming Sun	fe0cc8ce62	Remove some usages of CUDA_VERSION macro (#16199 ) ### Description We should avoid using the macro since the value of the macro is inaccurate. For example, our prebuilt packages are built with CUDA 11.8 but people may run the binaries with CUDA 11.4. (The minimal CUDA version we support is CUDA 11.4) A runtime function should be used to determine CUDA version. Like: ```C++ int cuda_runtime_version = 0; CUDA_CALL_THROW(cudaRuntimeGetVersion(&cuda_runtime_version)); ORT_ENFORCE(cuda_runtime_version >= 11040, "ONNX Runtime needs cuda runtime higher than 11.4"); ```	2023-06-07 14:34:22 -07:00
Dmitri Smirnov	908e940660	[CPP Api] Remove deprecated CustomOp API (#16256 ) ### Description Custom Op API has been deprecated in 1.15 release. We are removing it.	2023-06-07 14:03:13 -07:00
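
Note on b07b647f66 (NNAPI Softmax): the distinction that commit relies on, between ONNX Softmax before opset 13 (input coerced to 2D at `axis`) and from opset 13 on (softmax along a single axis), can be illustrated with a small NumPy sketch. This is only an illustration of the operator semantics, not the NNAPI EP code; the function names `softmax_pre_opset13` and `softmax_opset13` are hypothetical.

```python
import numpy as np


def softmax_pre_opset13(x: np.ndarray, axis: int = 1) -> np.ndarray:
    """Softmax as defined before opset 13: coerce to 2D at `axis`, then normalize each row."""
    axis = axis % x.ndim                      # normalize a negative axis
    rows = int(np.prod(x.shape[:axis]))       # product of the leading dims (1 if axis == 0)
    flat = x.reshape(rows, -1)                # [prod(dims[:axis]), prod(dims[axis:])]
    shifted = flat - flat.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(shifted)
    return (e / e.sum(axis=1, keepdims=True)).reshape(x.shape)


def softmax_opset13(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax as defined from opset 13: normalize along one axis only."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
legacy = softmax_pre_opset13(x, axis=1)   # normalizes over the trailing 3*4 elements
modern = softmax_opset13(x, axis=-1)      # normalizes over the last dimension only
```

The two definitions coincide when `axis` is the inner-most dimension, which is consistent with the commit adding a check for that case under opset 13+.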


8979 commits