onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Tianlei Wu	8d81e56166	support bfloat16	2025-01-14 06:01:55 +00:00
Jiajia Qin	80d8931f1d	[webgpu] Use subgroup for matmulnbits (#23224 ) ### Description This PR uses subgroups to implement MatMulNBits when tile_m > 1 on Intel devices. With this PR, prefill for a 500-token prompt on phi3 drops from 8.5s to 3.5s on Intel Meteor Lake.	2025-01-13 08:20:42 -08:00
Tianlei Wu	73f5b0c597	LayerNormalization broadcast (limited support for axis=2) (#23297 ) ### Description The LayerNormalization spec supports broadcasting (tensors Scale and B should be unidirectionally broadcastable to tensor X). https://onnx.ai/onnx/operators/onnx__LayerNormalization.html However, the current implementation only allows the scale and bias size to be X.shape()[axis:]. Examples of input tensors normalized with axis=2, where X has shape (B, S, D): Scale and B of shape (D) or (1, 1, D) were supported before and are still supported; Scale and B of shape (B, 1, D), (1, S, D), or (B, S, D) were not supported before and are supported now. This PR adds limited support under three conditions: axis=2; Scale and B have the same shape; Scale, B, and X have the same number of dimensions. This can cover common use cases in LLM and vision models. ### Motivation and Context Support Stable Diffusion 3.x and Flux models.	2025-01-10 21:57:18 -08:00
Yulong Wang	a74817ab10	add missing build dependency for onnxruntime_providers_webgpu (#23324 ) ### Description Fixes the build when the flag `--target onnxruntime_providers_webgpu` is specified. Otherwise, the following error occurs: ``` range.cc D:\code\onnxruntime\build\Windows\Debug\_deps\onnx-src\onnx\onnx_pb.h(65,10): error C1083: Cannot open include file: 'onnx/onnx-ml.pb.h': No such file or directory [D:\code\onnxruntime\build\Windows\Debug\onnxruntime_providers_webgpu.vcxproj] (compiling source file '../../../onnxruntime/core/providers/webgpu/math/binary_elementwise_ops.cc') ```	2025-01-10 18:07:12 -08:00
Changming Sun	b461f06a15	Remove a hack in adjust_global_compile_flags.cmake (#23313 ) ### Description Remove a hack in adjust_global_compile_flags.cmake because the issue should have been resolved.	2025-01-10 18:05:43 -08:00
Xiaoyu	6e5efb5dba	Fix quant modelproto error (#23322 ) ### Description Fixes [issue](https://github.com/microsoft/onnxruntime/issues/23268#issuecomment-2579010227). Saving a `ModelProto` with `save_as_external_data=True` updates its metadata, which could lead to issues later if not managed carefully. This PR uses a deep copy to prevent such problems.	2025-01-10 17:48:01 -08:00
Changming Sun	ecdeecae61	Update MACOSX_DEPLOYMENT_TARGET (#23308 ) Fix some inconsistencies. All our iOS builds should target iOS 15.1. All our macOS desktop builds should target macOS 13.3 to align with the changes made in #17361.	2025-01-10 14:25:32 -08:00
Satya Kumar Jandhyala	436dfc3c9d	[Native WebGPU] Fix the error when past and present key/value share buffer (#23315 ) ### Description Fix an error that caused incorrect output when the past key/value shares a buffer with the present key/value.	2025-01-10 13:31:50 -08:00
Changming Sun	e7d8596c7c	Update docker images: remove python 3.8 and 3.9 (#23310 ) Python 3.8 and 3.9 are removed from the new manylinux images to reduce image size.	2025-01-10 13:09:04 -08:00
Changming Sun	1ce59577d5	Add VCPKG triplet files (#23298 ) Add VCPKG triplet files. All the triplet files are automatically generated by gen.py. The generated files are checked in for ease of use.	2025-01-09 16:18:51 -08:00
Jiajia Qin	7be006c466	[js/webgpu] Optimize convtranspose (#23302 ) ### Description BUG #23273 With this change, the ConvTranspose time in that bug drops from ~90s to ~7s on my Meteor Lake machine. This PR does the following: 1. Uses the stride as the loop increment. In the bug, the stride is 1024, which greatly reduces the number of loop iterations. 2. Supports components for A to reduce the number of memory accesses. 3. When the number of output channels is 1, B can use the same components as A to further reduce the number of memory accesses.	2025-01-09 11:24:42 -08:00
Yulong Wang	0627a6cb93	[js/web] fix package export for bundlers (#23257 ) ### Description This PR tries to fix #22615 (see the detailed description in the issue). A perfect solution would be too difficult to make, because there is a huge number of usage scenarios, including combinations of development framework, bundler, dev/prod mode, and so on. This PR uses the following approach: - Introduce a new type of end-to-end test: the export test. Tests of this type are complete web apps that use popular web development frameworks; the tests use puppeteer to run the apps and check whether they run without error. - Added one Next.js-based web app and one Vite-based web app. - Each test performs the following steps: - `npm install` for packages built locally - `npm run dev` to start the dev server, then use puppeteer to launch the browser to test - `npm run build && npm run start` to test the prod build, then use puppeteer to launch the browser to test - Make changes to ort-web, including: - special handling of Webpack's behavior of rewriting `import.meta.url` to a `file://` string - revised build definitions - a fix for the wasm URL for the proxy, if used in a bundled build	2025-01-09 11:01:00 -08:00
Changming Sun	0ec2171b9f	Update Linux docker images (#23244 ) The new images contain the following updates: 1. Added Git, Ninja and VCPKG to all docker images 2. Updated CPU containers' GCC version from 12 to 14 3. Pinned CUDA 12 images' CUDNN version to 9.5 (the latest one is 9.6) 4. Addressed container supply chain warnings by building CUDA 12 images from scratch (avoiding Nvidia's prebuilt images) 5. Updated manylinux commit id to 75aeda9d18eafb323b00620537c8b4097d4bef48 This PR also updates some of the CPU EP's source code to make it compatible with GCC 14.	2025-01-09 10:20:33 -08:00
Corentin Maravat	16a246dc1c	Add Gradient for Atan (#23172 )	2025-01-09 09:30:53 -08:00
Satya Kumar Jandhyala	d0c7438f5a	[JSEP/WebGPU] Add a fatal error message for unsupported GQA do_rotary attribute. (#23287 ) ### Description Added a fatal error message for the unsupported GroupQueryAttention do_rotary attribute. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/22987 This helps users understand that this attribute is not supported.	2025-01-09 08:52:17 -08:00
Vincent Wang	3b1a9002f5	Fix Build Error (#23299 ) Fix build error.	2025-01-09 13:34:19 +08:00
Vincent Wang	4134cd9e42	Add Optional Redundant Clip Node to NodeUnit (#22888 ) Currently, Clip/Relu + Q fusion runs at optimization level 2, but these optimizers are not applied for EPs that use NodeUnit. Removing such redundant Clip/Relu nodes would otherwise require separate code in each EP. This PR detects when a Clip/Relu is made redundant by a Q node and adds this information to the corresponding QDQ NodeUnit, so that EPs can ignore the Clip/Relu and handle only the target node in the QDQ NodeUnit.	2025-01-09 10:25:32 +08:00
Preetha Veeramalai	ca77de54d2	Updated the Documentation for nuget packages (#23182 ) ### Description Update the documentation for the NuGet packages for OVEP. Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>	2025-01-08 17:19:39 -08:00
Changming Sun	3328eb3bb3	Update min iOS version to 15.1 to align with React Native 0.76 (#23292 ) Update min iOS version to 15.1 to align with React Native 0.76. We need to update React Native. See https://github.com/react-native-community/discussions-and-proposals/discussions/812 for background. Similar to PR #20773.	2025-01-08 16:02:45 -08:00
Changming Sun	ccbe66d422	Update NDK (#23280 ) Similar to #21989	2025-01-08 13:57:23 -08:00
Sam Webster	080f87fa0b	[QNN EP] Make sure everything gets cleaned up (#23275 ) ### Description Always make sure resources and callbacks are cleaned up. ### Motivation and Context We've seen problems where the log callback isn't deregistered, which can lead to crashes. --------- Co-authored-by: Adrian Lizarraga <adrianlm2@gmail.com>	2025-01-08 12:56:30 -08:00
Hector Li	76d6345f0b	Fix the issue for Gather int64 indices handling (#23274 ) ### Description Fix the handling of int64 indices for Gather: still insert a Cast node if the Gather node is non-quantized.	2025-01-08 12:52:08 -08:00
PARK DongHa	5b9c968eaa	Correct ONNX and Protobuf version in vcpkg build (#23285 ) ### Description Changes the vcpkg manifest and configuration files (vcpkg.json & vcpkg-configuration.json) * Update vcpkg version to https://github.com/microsoft/vcpkg/releases/tag/2024.12.16 * Use protobuf 3.21.12 (= `v21.12`) to sync with [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) * Resolve https://github.com/microsoft/onnxruntime/issues/22750 * Add `onnx` to the vcpkg manifest so `find_package(ONNX)` and `find_dependency(Protobuf)` can work as expected. * Currently, it uses 1.16.2 * v1.17.0 will become available after https://github.com/microsoft/vcpkg/pull/42942 However, `onnx` in vcpkg doesn't configure the `ONNX_DISABLE_STATIC_REGISTRATION` build option. * https://github.com/microsoft/vcpkg/pull/38879 * Create the "cmake/vcpkg-triplets/" folder and triplet files that use `VCPKG_CMAKE_CONFIGURE_OPTIONS` for the option * This requires the `VCPKG_OVERLAY_TRIPLETS` environment variable in CI steps, which is a bit inconvenient. I will try to find a simpler way to get the same result ### Motivation and Context * Help #23158 * "ONNX is not consumed from vcpkg" * "Mismatch protobuf version. When vcpkg is enabled , we should not fetch protoc from Github which may cause version mismatches." * https://github.com/microsoft/vcpkg/pull/43126 * #21348	2025-01-08 12:25:17 -08:00
Jian Chen	da35cceac9	Add a temporary path to RN 0.69.3 to update the boost url (#23281 ) ### Description Add a temporary patch to RN 0.69.3 to update the Boost URL ### Motivation and Context Fix the React Native CI until we update RN to 0.70.15 or 0.73.3+.	2025-01-08 09:28:35 -08:00
Vincent Wang	34d70f5fae	[QNN] MatMul Op Builder to Handle All Cases of ONNX's MatMul (#22639 ) ONNX's MatMul has the same semantics as numpy.matmul, which supports input tensors of rank >= 1, but QNN's MatMul only supports input tensors of rank >= 2. This PR adds a MatMulOpBuilder for the QNN EP that builds a QNN graph supporting all cases of ONNX's MatMul by inserting Reshape nodes where necessary, e.g., reshaping any 1D input to 2D and reshaping the output to the expected shape at the end. This PR also tries to use the FullyConnected op for MatMul if the 2nd input is a 2D initializer or a 1D tensor, because FullyConnected is faster than MatMul on the QNN EP. If the 2nd input is a 2D tensor, it must be an initializer: FullyConnected requires the 2nd input in [n, k] shape, and an initializer can be transposed during graph building (we don't want to add an extra Transpose node). For example, the swin_base model contains several MatMul nodes whose 2nd input is a 2D initializer (not followed by Add); on a Gen3 mobile device, it took 34.8876 ms before this change and 27.0639 ms after.	2025-01-08 10:15:55 +08:00
Vincent Wang	ff0ab0a8a5	Quantize Weight for Gemm/Conv on Quantized Model (#22969 ) Some quantized models have QDQ around Conv/Gemm, but the weight and/or bias are not quantized. This PR adds a WeightBiasQuantization optimizer that quantizes float weights and/or biases to INT8 and INT32 tensors, respectively. We only do this for weight and/or bias initializers so that ConstantFolding folds the subgraph into real quantized initializers in the next round of graph optimization.	2025-01-08 10:00:24 +08:00
wonchung-microsoft	c75681a404	Address CodeQL security issues on comparison of different types (#23276 ) ### Description Fix comparisons of a narrow type with a wide type in loop conditions. ### Motivation and Context Comparison between types of different widths in a loop condition can cause the loop to fail to terminate.	2025-01-07 17:30:44 -08:00
Prathik Rao	d8e8d4fac0	disable scatternd op for jsep (#23277 ) Mitigates https://github.com/microsoft/onnxruntime/issues/23183 while we investigate a final solution.	2025-01-07 16:50:06 -08:00
Matthieu Darbois	4b0cee3adb	fix: Pad/AveragePool fusion (#23190 ) ### Description Fusing Pad & AveragePool requires AveragePool to use `count_include_pad=1`. If the AveragePool already sets some padding and `count_include_pad=0`, fusion can't happen. This PR adds a condition to perform fusion depending on those attributes. If fusion occurs, `count_include_pad` is always set to `1`. ### Motivation and Context Fix #22177 (mislabelled as a performance issue, but there's an actual bug in the implementation). Bug introduced in #21556.	2025-01-07 15:48:38 -08:00
Jiajia Qin	4883ec50c4	[webgpu] Use override shape in shader key (#23188 ) ### Description This PR 1) uses the override shape instead of the tensor's original shape in the shader key to reduce the number of shader variants; 2) adds the indices shape rank to the shader key to avoid potential errors.	2025-01-07 15:36:02 -08:00
Wanming Lin	519fae019b	[WebNN] Fix bug in SkipSimplifiedLayerNormalization (#23236 ) The skip and bias (if it exists) should first be added to the input.	2025-01-07 14:24:26 -08:00
Jian Chen	655b3efee4	Separating result processor out from profiler.py (#23251 ) ### Description Separate the result processor out of profiler.py without changing the current behavior of profile.py. ### Motivation and Context Fewer dependencies and less code for processing profiles from other scenarios. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2025-01-07 09:17:33 -08:00
Changming Sun	704523c2d8	[build] Be compatible with the latest protobuf (#23260 ) Resolve #21308	2025-01-06 13:10:43 -08:00
Changming Sun	c6cbda3257	Update Python-Cuda-Publishing-Pipeline (#23253 ) ### Description 1. Currently, Python-Cuda-Publishing-Pipeline only publishes Linux wheels, not Windows wheels, because we recently refactored the upstream pipeline ("Python-CUDA-Packaging-Pipeline") to use 1ES PT. This PR fixes the issue. 2. tools/ci_build/github/azure-pipelines/stages/py-win-gpu-stage.yml no longer includes component-governance-component-detection-steps.yml, because 1ES PT already inserts such a step. 3. Delete tools/ci_build/github/windows/eager/requirements.txt because it is no longer used. ### Motivation and Context The "Python-CUDA-Packaging-Pipeline" is for CUDA 12. "Python CUDA ALT Packaging Pipeline" is for CUDA 11. The two pipelines are very similar, except that the CUDA versions are different. Each of them has three parts: build, test, publish. "Python-CUDA-Packaging-Pipeline" is the first part: build. "Python CUDA12 Package Test Pipeline" is the second part. "Python-Cuda-Publishing-Pipeline" is the third part that publishes the packages to an internal ADO feed.	2025-01-06 11:50:58 -08:00
Yulong Wang	c53c9caf17	[js] update mocha to v11.0.1 (#23254 ) ### Description Update `mocha` to v11.0.1 and `fs-extra` to v11.2.0 ``` # npm audit report nanoid <3.3.8 Severity: moderate Predictable results in nanoid generation when given non-integer values - https://github.com/advisories/GHSA-mwcw-c2x4-8c55 fix available via `npm audit fix` node_modules/nanoid mocha 8.2.0 - 10.2.0 Depends on vulnerable versions of nanoid node_modules/mocha 2 moderate severity vulnerabilities ```	2025-01-05 22:29:02 -08:00
Yulong Wang	21b4d2ac9f	fix pipeline build-perf-test-binaries (#23255 )	2025-01-05 22:28:41 -08:00
Wu, Junze	2a16ad0215	[js/node] add proxy agent support for onnxruntime-node install script (#23232 ) ### Description Add a proxy agent to the fetch request. ### Motivation and Context Fixes #23231 --------- Signed-off-by: Junze Wu <junze.wu@intel.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2025-01-04 20:27:55 -08:00
Changming Sun	b7ef81a034	Move Linux GPU CI pipeline to A10 (#23235 ) Move the Linux GPU CI pipeline to A10 machines, which are more advanced. Retire the onnxruntime-Linux-GPU-T4 machine pool. Disable the run_lean_attention test because the new machines do not have enough shared memory. ``` skip loading trt attention kernel fmha_mhca_fp16_128_256_sm86_kernel because no enough shared memory [E:onnxruntime:, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_0' Status Message: CUDA error cudaErrorInvalidValue:invalid argument ```	2025-01-04 19:11:37 -08:00
Jiajia Qin	4247153bb2	[webgpu] Add kernel type to profile info (#23167 ) ### Description This PR makes it easier to post-process the JSON file generated when profiling is enabled. The kernel type can be used to aggregate the overall time of kernels of the same type.	2025-01-03 14:28:48 -08:00
Yulong Wang	5c2e60c5af	[js/node] update install script to allow use proxy (#23242 ) ### Description Use `https.get` instead of `fetch` in the install script of the ORT Node.js binding package. ### Motivation and Context According to the discussion in #23232, the package `global-agent` cannot work with the `fetch` API. To make it work with the proxy agent, this PR replaces the `fetch` API with `https.get` in the install script.	2025-01-03 14:27:15 -08:00
Changming Sun	5d692b0136	Merge web machine pools (#23243 ) ### Description The Web CI pipeline uses three different Windows machine pools: 1. onnxruntime-Win2022-webgpu-A10 2. onnxruntime-Win2022-VS2022-webgpu-A10 3. onnxruntime-Win-CPU-2022-web This PR merges them together to reduce ongoing maintenance cost.	2025-01-03 13:53:17 -08:00
Yueqing Zhang	aedb49beb4	[VitisAI] change all support tensor type from ir 9 to ir 10 (#23204 ) ### Description Changed all supported tensor types from IR 9 to IR 10. ### Motivation and Context - See issue https://github.com/microsoft/onnxruntime/issues/23205 Co-authored-by: Yueqing Zhang <yueqingz@amd.com>	2025-01-02 06:45:21 -08:00
Yifan Li	bc91f5c72e	[TensorRT EP] Fix to build ORT on legacy TRT8.5 (#23215 ) ### Description For legacy Jetson users on JetPack 5.x, the latest TRT version is 8.5. Add version checks around newer TRT features to fix the build on JetPack 5.x (CUDA 11.8 and GCC 11 are required).	2025-01-01 19:24:24 -08:00
xhcao	a3833a5e79	[js/webgpu] validate transpose perm if specified (#23197 )	2025-01-01 15:58:54 -08:00
Dmitry Deshevoy	0b87bccca8	[CUDA] Make cubins const (#23225 ) ### Description Make arrays with cubin data const. ### Motivation and Context Non-const arrays are put into the .data section which might cause excessive memory usage in some scenarios. Making cubin arrays const allows them to be put into the .rodata section.	2024-12-31 16:20:21 -08:00
Changming Sun	afd3e81c94	Remove PostBuildCleanup (#23233 ) Remove PostBuildCleanup tasks since they are deprecated. This addresses a warning in our pipelines: "Task 'Post Build Cleanup' version 3 (PostBuildCleanup@3) is dependent on a Node version (6) that is end-of-life. Contact the extension owner for an updated version of the task. Task maintainers should review Node upgrade guidance: https://aka.ms/node-runner-guidance" Now the cleanup is controlled in another place: https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/workspace?view=azure-pipelines The code change was generated by the following Linux command: ```bash find . -name \*.yml -exec sed -i '/PostBuildCleanup/,+2d' {} \; ```	2024-12-31 13:12:33 -08:00
Jean-Michaël Celerier	2116fd1999	Update onnxruntime_c_api.h to work with MinGW (#23169 ) The SAL2 macros are not always available there. ### Description Make SAL2 macros only available on MSVC. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/1175	2024-12-31 11:05:10 -08:00
Changming Sun	69bb53db85	Enable delay loading hooker for python packages (#23227 ) ### Description Enable delay loading hooker for python packages	2024-12-31 10:12:31 -08:00
wejoncy	86870114eb	[CoreML] support coreml model cache (#23065 ) ### Description Refactor compute plan profiling. Support caching the CoreML model to speed up session initialization. This is only supported through a user-provided entry, and the user is responsible for managing the cache. With the cache, session initialization time can be reduced by 50% or more: for yolo11.onnx it drops from 0.6s to 0.1s, and for yolo11-fp16.onnx from 1.8s to 0.1s. --------- Co-authored-by: wejoncy <wejoncy@.com> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-12-31 09:29:41 +08:00
Wanming Lin	2d05c4bcd9	[WebNN] Support SkipSimplifiedLayerNormalization op (#23151 ) The algorithm of `SkipSimplifiedLayerNormalization` is quite similar to that of `SimplifiedLayerNormalization`; the only difference is that `SkipSimplifiedLayerNormalization` provides an additional output holding the sum of the input, skip, and bias (if it exists). This PR also fixes a bug in `SimplifiedLayerNormalization` by adding the bias if it exists.	2024-12-24 12:44:14 -08:00
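
The shape conditions in the LayerNormalization broadcast commit (73f5b0c597) can be written as a small acceptance check. The Python sketch below is hypothetical. `is_supported_layernorm_shapes` is not an onnxruntime API, and the sketch encodes only what that commit message states:

- the previous rule, under which Scale and B each hold as many elements as X.shape()[axis:];
- the new axis=2 case, under which Scale and B share a shape with the same rank as X.

It assumes a non-negative `axis`. It also reads "unidirectionally broadcastable" as meaning that each Scale/B dimension is either 1 or equal to the matching X dimension.

```python
from math import prod
from typing import Sequence


def is_supported_layernorm_shapes(
    x: Sequence[int], scale: Sequence[int], bias: Sequence[int], axis: int
) -> bool:
    """Hypothetical check mirroring the conditions described in commit 73f5b0c597.

    Assumes axis is non-negative; this is not the onnxruntime kernel code.
    """
    normalized_size = prod(x[axis:])
    # Previous behavior: Scale and B each hold exactly prod(X.shape[axis:]) elements,
    # which covers shapes such as (D) and (1, 1, D).
    if prod(scale) == normalized_size and prod(bias) == normalized_size:
        return True
    # New limited support: axis == 2, Scale and B have the same shape, and that
    # shape has the same rank as X.
    if axis != 2 or tuple(scale) != tuple(bias) or len(scale) != len(x):
        return False
    # Unidirectional broadcast: every Scale/B dimension is 1 or matches X.
    return all(s in (1, d) for s, d in zip(scale, x))
```

For example, with X of shape (2, 3, 4) and axis=2, Scale/B of shape (2, 1, 4) are accepted through the new branch. A shape of (2, 3, 5) is rejected, because 5 is neither 1 nor 4.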


12176 commits
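
The Reshape strategy in the QNN MatMul commit (34d70f5fae) relies on numpy.matmul's rank-1 rules:

- A 1D first input is treated as a row vector by prepending a 1.
- A 1D second input is treated as a column vector by appending a 1.
- Those added dimensions are removed from the result.

The Python sketch below computes only the output shape under those rules. `matmul_output_shape` and `broadcast_batch` are hypothetical helpers for illustration, not the QNN EP's MatMulOpBuilder.

```python
from itertools import zip_longest
from typing import List, Sequence


def broadcast_batch(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """NumPy-style broadcast of two batch shapes, aligned from the right."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {list(a)} and {list(b)}")
        out.append(y if x == 1 else x)
    return out[::-1]


def matmul_output_shape(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Output shape of numpy.matmul / ONNX MatMul via rank-2 promotion (hypothetical helper)."""
    if not a or not b:
        raise ValueError("MatMul inputs must have rank >= 1")
    # Step 1 ("reshape 1D input to 2D"): promote rank-1 inputs to rank 2.
    a2 = [1, a[0]] if len(a) == 1 else list(a)
    b2 = [b[0], 1] if len(b) == 1 else list(b)
    if a2[-1] != b2[-2]:
        raise ValueError(f"inner dimensions differ: {a2[-1]} vs {b2[-2]}")
    # Step 2: rank >= 2 matmul; the leading (batch) dimensions broadcast.
    out = broadcast_batch(a2[:-2], b2[:-2]) + [a2[-2], b2[-1]]
    # Step 3 ("reshape output to expected shape"): drop the dimensions added in step 1.
    if len(a) == 1:
        out.pop(-2)
    if len(b) == 1:
        out.pop(-1)
    return out
```

Under these rules, `matmul_output_shape([3], [2, 3, 4])` gives `[2, 4]`, and two 1D inputs give `[]` (a scalar). This matches the idea of reshaping 1D inputs to 2D and reshaping the output back at the end.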