onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-04 04:07:22 +00:00

Author	SHA1	Message	Date
Sumit Agarwal	f4f49535a4	[ORT 1.18.2] Cherry Pick Pad Optimizations + Update DML to 1.15.1 (#21670 ) ### Description This change cherry-picks 2 Pad fusion optimizations: https://github.com/microsoft/onnxruntime/pull/21640 and https://github.com/microsoft/onnxruntime/pull/21556. It also cherry-picks 2 extra changes to unblock pipeline and dependency failures: https://github.com/microsoft/onnxruntime/pull/21300 and https://github.com/microsoft/onnxruntime/pull/21662 (the tests, which are part of the 1.18.1 payload, are not included). It also uploads a new version of [onnxruntime_build_dependencies:10.177](https://dev.azure.com/onnxruntime/onnxruntime/_artifacts/feed/onnxruntime/UPack/onnxruntime_build_dependencies/overview/1.0.177) and updates `download-deps.yml` to match. Additionally, it updates the DML binary to 1.15.1. --------- Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2024-08-12 07:02:00 -07:00
Adrian Lizarraga	387127404e	[ORT 1.18.1 Release] Update ORT numpy dependency to >=1.21.6,<2.0 (#21141 ) ### Description Updates the version of numpy required by onnxruntime to >=1.21.6,<2.0 ### Motivation and Context Numpy released version 2.0. The onnxruntime 1.18.1 release is using numpy < 2.0, so we need to update requirement files to only install versions between 1.21.6 and 2.0 (non-inclusive).	2024-06-24 11:24:55 -07:00
Yifan Li	d0aee204af	[ORT 1.18.1 Release] Cherry pick 3rd round (#21129 ) ### Description <!-- Describe your changes. --> Adding critical TensorRT EP support ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: pengwa <pengwa@microsoft.com> Co-authored-by: wejoncy <wejoncy@163.com> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: Yi Zhang <your@email.com> Co-authored-by: Pranav Sharma <prs@microsoft.com> Co-authored-by: Adam Pocock <adam.pocock@oracle.com> Co-authored-by: cao lei <jslhcl@gmail.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: inisis <46103969+inisis@users.noreply.github.com> Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com> Co-authored-by: mo-ja <60505697+mo-ja@users.noreply.github.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com> Co-authored-by: Atanas Dimitrov <70822030+neNasko1@users.noreply.github.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: Dhruv Matani <dhruvbird@gmail.com> Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com> Co-authored-by: wangshuai09 <391746016@qq.com> Co-authored-by: Xiaoyu <85524621+xiaoyu-work@users.noreply.github.com> Co-authored-by: Xu Xing <xing.xu@intel.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com> Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> 
Co-authored-by: Jian Chen <cjian@microsoft.com> Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Andrew Fantino <15876180+afantino951@users.noreply.github.com> Co-authored-by: Thomas Boby <thomas@boby.uk> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Michal Guzek <mguzek@nvidia.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2024-06-24 10:02:38 -07:00
Yifan Li	8bfcf14b42	[ORT 1.18.1 Release] update 1.18.1 patch release version (#21143 )	2024-06-24 09:20:31 -07:00
Yifan Li	25ab935664	[ORT 1.18.1 Release] Cherry pick 2nd round (#21111 ) ### Description [#21062](https://github.com/microsoft/onnxruntime/pull/21062), [#21096](https://github.com/microsoft/onnxruntime/pull/21096) to fix Xamarin; [#21095](https://github.com/microsoft/onnxruntime/pull/21095) and [#21109](https://github.com/microsoft/onnxruntime/pull/21109) to fix python on NuGet_Packaging stages; [#21104](https://github.com/microsoft/onnxruntime/pull/21104) to remove failing roslynanalyzer on NuGet_Packaging stages --------- Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: Changming Sun <chasun@microsoft.com>	2024-06-20 17:01:57 -07:00
Yifan Li	91fb865058	[ORT 1.18.1 Release] Cherry pick 1st round (#21105 ) --------- Co-authored-by: Jian Chen <cjian@microsoft.com> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Your Name <you@example.com>	2024-06-19 22:10:58 -05:00
Yi-Hong Lyu	45737400a2	[ORT 1.18.0 Release] Cherry pick 3rd/Final round (#20677 ) Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Jian Chen <cjian@microsoft.com>	2024-05-15 00:14:29 -07:00
Yi-Hong Lyu	ed349b9d9d	Mark end of version 17 and 18 C API (#20671 ) Additionally, these versions are safeguarded by the `static_assert`.	2024-05-14 02:26:15 -07:00
Yi-Hong Lyu	d72b476723	[ORT 1.18.0 Release] Cherry pick 2nd round (#20620 )	2024-05-10 01:23:14 -07:00
Yi-Hong Lyu	65f3fbf137	[ORT 1.18.0 Release] Cherry pick 1st round (#20585 )	2024-05-08 08:42:07 -07:00
Yi-Hong Lyu	204f1f59b9	Run fuzz testing before the CG task cleans up the build directory (#20500 ) (#20516 ) ### Description Update order of steps ### Motivation and Context Fix CI Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-04-30 02:51:13 -07:00
Satya Kumar Jandhyala	21b3cbc3af	[WIP][JS/WebGPU] Inputs Key and Value could be 4-dims. (#20470 ) ### Description The Key and Value inputs could be 4-dims	2024-04-25 13:33:46 -07:00
Edward Chen	2c19db0af1	Put x64 specific benchmark code into ifdefs. (#20456 )	2024-04-25 12:33:12 -07:00
Frank Dong	227c4419fc	add bf16 support for few ops (#20385 ) ### Description Add bf16 support for the following ops: ConstantOfShape, Exp, Erf, convolution, PythonOp ### Motivation and Context The phimm model runs in bf16, so ORT needs bf16 support for the ops above to work with phimm in bf16	2024-04-25 11:28:34 -07:00
Yi Zhang	464f199b95	Extend mac package jobs time out limit (#20459 )	2024-04-25 10:13:13 -07:00
Yi-Hong Lyu	edffa2a180	Optimize MlasComputeSoftmax with prefetch (#20393 ) The prefetch instruction (_mm_prefetch) is used to anticipate memory accesses by prefetching the next row of the input buffer. This optimization is designed to reduce the impact of memory latency, thereby enhancing the performance of the MlasComputeSoftmax function. As a result, the worst-case performance of the OCR model has improved by approximately 50ms, which equates to a 3% improvement.	2024-04-25 08:28:59 -07:00
Chi Lo	a077330c3e	[TensorRT] adapt for TRT lib name change after TRT 10 GA (#20445 ) For TensorRT 10 GA onwards, the TensorRT libraries will have major version appended to the end on Windows, for example, nvinfer_10.dll, nvinfer_plugin_10.dll, nvonnxparser_10.dll ... Change cmake file accordingly.	2024-04-24 21:46:54 -07:00
Yi Zhang	e5947f5729	Two improvements in pipelines (#20449 ) ### Description 1. Update the image name to avoid docker image overwrite issues. There was a mistake: variables.CUDA_VERSION_MAJOR is always empty `14fcf0a52d/tools/ci_build/github/azure-pipelines/stages/nuget-linux-cuda-packaging-stage.yml (L120)` 2. Set one artifact name as a variable to make the job rerunnable	2024-04-25 10:15:40 +08:00
Xavier Dupré	218b6b0a73	Fix missing argument when calling _get_quantize_input_nodes (#20245 ) ### Description The current code is calling one method with a missing argument. ### Motivation and Context It breaks Olive's unittests. --------- Co-authored-by: Xavier Dupré <xavier.dupre@gmail.com>	2024-04-25 00:46:48 +02:00
Yulong Wang	a5182a2ef3	[js/web] update test condition for '--force-localhost' (#20450 ) ### Description Fixes the NPM packaging pipeline failure.	2024-04-24 12:14:03 -07:00
Edward Chen	9cc5badc49	Fix Objective-C static analysis warnings. (#20417 ) Replace most usages of [NSString stringWithUTF8String:] with checked helper function. The issue is that the former can return nil.	2024-04-24 11:48:29 -07:00
maggie1059	dfd4bce36e	Use compute queues by default in DML EP (#20438 ) ### Description We originally only used compute queues for compute-only devices; this change sets the default for DX12 devices to use compute queues as well. ### Motivation and Context There have been issues with TDRs occurring when using the current default queues, which don't happen on compute queues.	2024-04-24 10:44:16 -07:00
Xavier Dupré	f78215adad	Fix quantization tools for issue #19529 (#19591 ) ### Description Fix issue #19529: the code was using a loop variable outside its loop.	2024-04-24 19:16:27 +02:00
Scott McKay	a46bab6364	Update podspec url to use AFD hostname (#20452 ) Update to use AFD url when generating podspec	2024-04-24 09:37:24 -07:00
Satya Kumar Jandhyala	ae78cdb5d7	[JS/WebGPU] MultiheadAttention bugfix (#20447 ) ### Description Fixed pastkey, key and pastvalue, value concatenation condition and fixed index error. Added new test cases.	2024-04-24 08:43:14 -07:00
Guenther Schmuelling	33d5ea39b3	[js/webgpu] fixes for fp16 attention (#20440 )	2024-04-24 08:01:28 -07:00
Xavier Dupré	80213a9e66	Add implementation for ScatterND (#19540 ) ### Description onnxruntime switches to CPU for ScatterND after opset 13. This extends the implementation to higher opsets.	2024-04-24 14:08:50 +02:00
Rachel Guo	14fcf0a52d	Support visionos build (#20365 ) ### Description This PR supports a build of onnxruntime.xcframework for xros/xrsimulator for visionos via the build command of `python3 tools/ci_build/github/apple/build_apple_framework.py --config Release/Debug tools/ci_build/github/apple/default_vision_os_framework_build_settings.json`. Officially including visionos in the ios cocoapods package and testing it in CI would require separate work to upgrade the Xcode version and the macOS CI agent to macos-13-arm64 or higher. ### Motivation and Context visionos support: https://github.com/microsoft/onnxruntime/discussions/19313 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-23 18:15:07 -07:00
Adam Louly	4ce7bbf6f1	Add LayerSpec Support to ORTPipelineModule (#20410 ) ### Description In Deepspeed's Pipeline Parallel Implementation, there is a class used to instantiate the object after it's moved to the device and assigned in a stage. This approach helps reduce peak memory usage. In this PR, we're adding support to ORT for wrapping this LayerSpec.	2024-04-23 17:57:08 -07:00
Yulong Wang	5055dc0aa8	[js/web] add diagnose log for chrome (#20439 ) ### Description Add logs to further diagnose the pipeline issue.	2024-04-23 17:18:54 -07:00
Maximilian Müller	b4e50758c0	Fix shape conv fuse opt (#20282 ) Fix: - Multiple Convs feeding into an Add+Relu were fused even though the intermediates are needed ![image](https://github.com/microsoft/onnxruntime/assets/44298237/0c85a30c-5f41-4e62-ae2e-f41eada6c2c3) - Also fixes an issue with Shape Initializers Merge as input, which occurs when the input initializer is the same across multiple nodes but not all nodes are Shape nodes.	2024-04-23 16:19:57 -07:00
Yulong Wang	8f53957bcf	[js/web] add "browser" field to support parcel v2 (#20422 ) ### Description As described in the latest discussion in #19915, parcel v2 without the [new resolver](https://parceljs.org/blog/v2-9-0/#new-resolver) will not work correctly with onnxruntime-web. There are still users who use parcel with the default resolver, so add the deprecated field "browser" back for backward compatibility. This PR also corrects the "main" field, which is for the old resolver for Node.js.	2024-04-23 13:10:11 -07:00
Yulong Wang	13bda11583	[Node.js binding] Fix install script (#20416 ) ### Description Fix a few bugs of the install script of onnxruntime-node package. This change is integrated from branch `rel-1.17.3` (#20397)	2024-04-23 13:01:16 -07:00
Satya Kumar Jandhyala	d42ac7f0c6	[JS/WebGPU] Multihead attention improvements (#20286 ) ### Description Enabled more usecases	2024-04-23 12:39:49 -07:00
Edward Chen	76461c8f4d	Increase timeout for iOS packaging pipeline jobs. (#20434 )	2024-04-23 11:55:55 -07:00
Guenther Schmuelling	b8e6684313	more conservative gpu-buffer cache algo (#20312 ) tuned based on 80 models to keep performance impact minimal	2024-04-23 09:07:04 -07:00
guyang3532	ffb9c8d598	fix embedding sparsity log bug of -1% density (#20420 ) ### Description When embedding sparsity is not checked for validity, the log prints a wrong message of "-1% density". This PR fixes it.	2024-04-23 20:37:50 +08:00
Scott McKay	ed6f1adcb8	Fix overflow causing test failure on x86 (#20425 ) ### Description Fix comparison that was not updated when the threshold was converted to bytes. ### Motivation and Context Fix CI failure	2024-04-23 21:33:59 +10:00
Maximilian Müller	5eae33fc6b	[CUDA EP] RNN check if tf32 is allowed (#20338 ) Respect the use_tf32 flag.	2024-04-23 00:19:09 -07:00
Yi Zhang	7ebc653f04	Revert "Nuget .NET changes for Mac Catalyst (#19923 )" (#20418 ) This reverts commit `f396748ed6`.	2024-04-23 15:08:12 +08:00
Adrian Lizarraga	e6a677f6b7	[QNN EP] Download QNN SDK from azure blob in packaging pipelines (#20359 ) ### Description - Updates Windows QNN Nuget and Python packaging pipelines to download QNN SDK from blob storage. - Makes the QNN SDK version configurable when launching the python packaging pipeline. ### Motivation and Context Removes the need to rebuild images to update QNN SDK. Only applies to Windows pipelines. Linux pipelines still get the SDK from disk.	2024-04-22 22:32:55 -07:00
aciddelgado	94c69f55d4	GQA 4 CPU (#20299 ) ### Description Support GQA operator on CPU with FP32. ### Motivation and Context Right now, models generated for CPU and GPU must be different. GQA CPU allows these models to be the same.	2024-04-22 19:57:05 -07:00
Scott McKay	c47a6ce70b	XNNPACK: Support 1D input for Conv and ConvTranspose (#20349 ) ### Description Support 1D input to XNNPACK Conv and ConvTranspose by faking a height of 1 to convert to 2D input. ### Motivation and Context Enable speech model with 1D input to use XNNPACK. There is no CPU EP quantized ConvTranspose, so this fills that gap.	2024-04-23 11:50:31 +10:00
Edward Chen	3270a002fa	Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. (#20379 ) Fix handling of nodes that get assigned to kMSInternalNHWCDomain when loading an ORT format model. The ORT format model doesn't contain information about kMSInternalNHWCDomain since it is set during layout transformation. Fall back to known domains instead.	2024-04-22 18:34:01 -07:00
Preetha Veeramalai	c7de4de501	OVEP Bug fix 1.18 (#20408 ) ### Description Contains critical bug fix ### Motivation and Context This PR handles the bug fix wrt OV caching and blob generation. This also handles the precision for AUTO plugin. --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2024-04-22 18:31:05 -07:00
pengwa	a7787a0bad	Introduce memory efficient topological sort (#20258 ) ### Introduce memory efficient topo sort (for training) ~~and laze initialize Priority-Based and Memory-Efficient topo sort. Because in most cases, they are not needed, so we free the overheads of GraphViewer construction for most use cases.~~	2024-04-23 08:00:23 +08:00
Scott McKay	9372e9a0a3	Support >2GB of Tensor data in training checkpoint (#20077 ) ### Description Add ability to store initializer data in an external file. Update training checkpoint code to use external file if data > ~2GB. I don't see a way for the flatbuffers 64-bit offsets to be used, as they don't support storing 'table' types with 64-bit offsets (and our Tensor is a 'table' type not a simple struct). `0cfb7eb80b/tests/64bit/test_64bit.fbs (L38-L39)` Allowing a Tensor to have its raw_data in an external file should hopefully work with the least friction. As it's an extra field it's backwards compatible. Please feel free to suggest alternative approaches. Side note: the diffs in the generated *.fbs.h files are unexpectedly large. Maybe they weren't re-generated when the new flatbuffers version was checked in. I updated by running: `python .\compile_schema.py -f <build output dir>\_deps\flatbuffers-build\Debug\flatc.exe` from onnxruntime\core\flatbuffers\schema which I thought was the correct way but maybe that's out of date. I think you can ignore all the diffs in the generated files and just worry about the changes to the .fbs files in onnxruntime/core/flatbuffers/schema. Basically start at the bottom of the files changed and work up as all the 'real' diffs are there. --------- Co-authored-by: carzh <wolfivyaura@gmail.com>	2024-04-22 15:17:43 -07:00
Yulong Wang	4385602386	[js/web] fix test runner with optional input/output (#20399 ) ### Description Fix test runner with optional input/output. This change fixes the OP test runner (.jsonc format test) with optional input(s) and/or output(s). This fix reveals a problem in dealing with optional outputs: > Take SkipSimplifiedLayerNorm as an example: > > if in the ONNX model, the node's outputs are: [ 'output_0', '' ] instead of [ 'output_0' ], the current implementation will fail. The difference is, in the first case, context.outputCount == 2, and then the typescript implementation will try to create a tensor for output[1]. It will eventually call the C++ function (OpKernelContext::Output), and the output.DataRaw() will be nullptr. The WebGPU backend will fail because it cannot deal with a TensorView with data == 0. > This problem may need to be fixed or worked around in a separate PR. This PR does not fix this problem. Failed test cases are modified to work. Please note this PR does not break those test cases, as they never worked.	2024-04-22 12:53:10 -07:00
aamajumder	d0e33d2078	[DML EP] Register opset 20 operators (#20092 ) ### Description This PR registers the following opset 20 operators to the DML EP: IsNaN-20, IsInf-20, ReduceMax-20	2024-04-22 12:01:59 -07:00
Yi Zhang	197b3f1d90	Enable Whisper Test with OMP_FFMPEG (#20402 ) ### Description Install OMP_FFMPEG in the docker image and re-add the Whisper test. OMP_FFMPEG is downloaded from a restricted-access Azure blob.	2024-04-22 10:55:56 -07:00


10986 commits