onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-11 17:48:34 +00:00

Author	SHA1	Message	Date
Chi Lo	56e4fda8a8	[TensorRT EP] Revert "Add new provider option to exclude nodes from running on TRT" (#22878) - Revert https://github.com/microsoft/onnxruntime/pull/22681 - But still implicitly exclude DDS ops for TRT 10. A better PR that adds the trt_op_types_to_exclude provider option will follow later.	2024-11-19 09:08:54 -08:00
Changming Sun	a0d36a508c	Move C# doc Github Action to Windows (#22880) ### Description Move the C# doc GitHub Action to Windows machines to avoid depending on Mono, which I think is getting deprecated.	2024-11-18 23:56:59 -08:00
Adrian Lizarraga	497b06f0a9	[QNN EP] QNN SDK 2.28.2 (#22844) ### Description - Updates pipelines to use QNN SDK 2.28.2.241116. - Re-enables LayerNormalization unit tests that failed with accuracy errors under the previous QNN SDK (2.28.0). - Updates QNN EP to no longer provide a dummy bias for LayerNorm if the QNN SDK version is >= 2.28.0. ### Motivation and Context Use the latest QNN SDK. This version improves inference latency for certain customer models.	2024-11-18 20:10:36 -08:00
Jiajia Qin	e597eaed4a	[js/webgpu] Optimize transpose as reshape when suitable (#22870) BUG #22031	2024-11-18 12:52:48 -08:00
Tianlei Wu	c4f3742bb4	Replace INFINITY by std::numeric_limits<float>::infinity() (#22868) Replace INFINITY with `std::numeric_limits<float>::infinity()` to avoid build errors with Visual Studio 2022 v17.12 Preview 5. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/22728	2024-11-18 09:16:41 -08:00
Yi-Hong Lyu	02a0be3599	Optimize Transpose around QLinearSoftmax (#22849) ### Description - Improved Transpose around QLinearSoftmax in the Level 3 NHWC Transformer. - Removed the redundant HandleQLinearConcat and HandleQLinearBinaryOp code. ### Motivation and Context By merging and eliminating redundant Transpose nodes, the Image Segmentation i8 model (MobileNetv2 + DeepLabv3) achieves a 2.34x speedup.	2024-11-18 06:58:21 -08:00
Yi Zhang	135d8b2beb	Fix CUDA/DML package exception caused by ENABLE_CUDA_NHWC_OPS (#22851) ### Description ENABLE_CUDA_NHWC_OPS is now enabled by default, which adds a new path through which a CUDA provider can be created when both CUDA and DML are enabled.	2024-11-18 10:46:23 +08:00
liqun Fu	101ed10e5e	Refactor SkipLayerNorm and handle beta properly (#22862) Signed-off-by: Liqun Fu <liqfu@microsoft.com> Signed-off-by: Liqun Fu <liqun.fu@microsoft.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2024-11-17 14:51:16 -08:00
Peishen Yan	5928009553	[WebNN EP] Support Einsum op (#19558) Adds support for einsum via WebNN matmul, transpose, reshape, reducesum, identity and element-wise binary ops.	2024-11-15 17:58:35 -08:00
Jing Fang	c73a3d1804	[ARM] MatMulNBits fp16 support - connect kernels (#22856) ### Description A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651	2024-11-15 14:59:11 -08:00
Po-Wei (Vincent)	bbe7c87738	Fix 1.20 cuda minimal build failure (#22751) ### Description Fixes the build failure for the CUDA minimal build. ### Motivation and Context [This change](https://github.com/microsoft/onnxruntime/pull/19470) in 1.20 causes build failures for the CUDA minimal build. Essentially, some cuDNN logic was not guarded by `USE_CUDA_MINIMAL`. The build also looks for cuDNN even though the CUDA minimal build should not depend on it, resulting in a linking error. cc @gedoensmax @chilo-ms	2024-11-15 10:50:55 -08:00
Preetha Veeramalai	ac9c135b95	Ovep develop 1.21 (#22824) ### Description OVEP development changes for the ORT 1.21 release. ### Motivation and Context - Critical bug fixes - Support for concurrent execution of models - Support for OV 2024.5 - Memory optimizations for the NPU platform --------- Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com> Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com> Co-authored-by: TejalKhade28 <tejal.khade@intel.com> Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>	2024-11-14 20:10:07 -08:00
Jian Chen	632a36a233	Update Gradle version 8.7 and java version 17 within onnxruntime/java (#22771) ### Description This change updates the Gradle version within the Java project to 8.7 and upgrades Java to 17. The Gradle version used by react-native was also updated to 7.5 to keep it compatible with the changes in the Java directory; however, its target Java version remains the same and will be upgraded in a separate PR. This is split from #22206. ### Motivation and Context This is the first step toward upgrading the React Native version.	2024-11-14 17:10:44 -08:00
Adrian Lizarraga	0733733307	[Quant tool] Handle input models with pre-quantized weights (#22633) ### Description Allows the QDQ quantizer to handle input models that already have some pre-quantized weights. In this case, the qdq quantizer will properly skip/handle the pre-quantized weights. Also handles an operator (e.g., Conv) with a pre-quantized weight and a float bias. The tool will read the pre-quantized weight's quantization scale to compute the bias's scale (`bias_scale = input_scale * weight_scale`). Input model (pre-quantized Conv weight): ![image](https://github.com/user-attachments/assets/7d2626e4-49ad-47ae-bd0e-6339ac590435) Output QDQ model (everything is quantized): ![image](https://github.com/user-attachments/assets/393804d3-f042-47bd-895f-3d667fb2ae94) ### Motivation and Context Customers may use external tools to quantize some weights (e.g., int4 for Conv/MatMul). The qdq quantizer should still be able to quantize the rest of the model (float weights and activations) in this case.	2024-11-14 13:48:46 -08:00
Yifan Li	562ddce270	Re-enable test symbolic shape infer (#22737) ### Description It seems that after CI was updated to py310, numpy was updated to 2.0, and sympy 1.2 failed to cast float numpy arrays. Point sympy to 1.13 when py>=3.9 and re-enable the unit test. ### Motivation and Context Error: Linux CPU CI	2024-11-14 11:28:00 -08:00
Jing Fang	c02b398980	[ARM] MatMulNBits Fp16 support - API change only (#22826) ### Description A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651 Op API change only. - add template to functions and classes that support fp32 and fp16 - rename functions, classes and files that support fp32 and fp16 from SQNBxxx to QNBxxx	2024-11-14 10:38:59 -08:00
Jian Chen	c645bd202c	Fix spellchecks from Optional Lint (#22802)	2024-11-14 10:27:33 -08:00
dtang317	12dfe2859c	Register groupnorm for opset 21 (#22830) ### Description This PR registers GroupNormalization for opset 21.	2024-11-14 10:06:30 -08:00
Jian Chen	5659d055ee	Fix Linux CI pipeline where ep was not provided for py-packaging-linux-test-cpu.yml (#22828) ### Description The current linux-ci-pipeline was broken because parameters were missing from the `py-packaging-linux-test-cpu.yml` template. ### Motivation and Context Fix Linux CI pipeline	2024-11-14 09:41:37 -08:00
Tianlei Wu	09c98433e7	[CUDA] stable diffusion benchmark allows IO binding for optimum (#22834) ### Description Update stable diffusion benchmark: (1) allow IO binding for optimum. (2) do not use num_images_per_prompt across all engines, for a fair comparison. Example to run benchmark of optimum on stable diffusion 1.5:
```
git clone https://github.com/tianleiwu/optimum
cd optimum
git checkout tlwu/diffusers-io-binding
pip install -e .
pip install -U onnxruntime-gpu
git clone https://github.com/microsoft/onnxruntime
cd onnxruntime/onnxruntime/python/tools/transformers/models/stable_diffusion
git checkout tlwu/benchmark_sd_optimum_io_binding
pip install -r requirements/cuda12/requirements.txt
optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 --task text-to-image ./sd_onnx_fp32
python optimize_pipeline.py -i ./sd_onnx_fp32 -o ./sd_onnx_fp16 --float16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16 --use_io_binding
```
Example output on H100_80GB_HBM3: 572 ms with IO binding; 588 ms without IO binding; IO binding gains 16 ms, or 2.7%. ### Motivation and Context Optimum is working on enabling I/O binding: https://github.com/huggingface/optimum/pull/2056. This could help test the impact of I/O binding on the performance of stable diffusion.	2024-11-14 00:09:07 -08:00
Michael Tyler	dd99e34d66	Enable ConvReplaceWithQLinear when using ACL (#22823) ### Description Enable the ConvReplaceWithQLinear graph optimization when using the ACL execution provider. ### Motivation and Context Fixes an issue where quantized Conv nodes followed by ReLU don't get converted to QLinearConv, so ACL sees the weights as mutable and therefore cannot run the Conv node. Signed-off-by: Michael Tyler <michael.tyler@arm.com>	2024-11-13 21:44:50 -08:00
Wanming Lin	82681205e4	[WebNN] Fix MLTensorUsage is undefined issue (#22831) `MLTensorUsage` has been removed from Chromium: https://chromium-review.googlesource.com/c/chromium/src/+/6015318, but we still need to stay compatible with old Chrome versions, so just make it `undefined` for the latest Chrome versions.	2024-11-13 20:22:22 -08:00
Jian Chen	f423b737a9	Fix Linux python CUDA package pipeline (#22803) ### Description Make ::p optional in the Linux python CUDA package pipeline. ### Motivation and Context The Linux stage of Python-CUDA-Packaging-Pipeline has failed since the merge of #22773.	2024-11-13 14:20:21 -08:00
microsoft-github-policy-service[bot]	6d7603f054	Auto-generated baselines by 1ES Pipeline Templates (#22817)	2024-11-13 13:50:52 -08:00
Bin Miao	a15381d7fc	[WebNN EP] Fix issues of GRU operator (#22123) ### Description This PR fixes the spelling of the key value of the GRU operator in the map in the `GetSupportedNodes` function (Gru -> GRU) and removes the data type check for the fifth input (sequence_lens) of the GRU operator. PTAL, thanks!	2024-11-13 13:34:34 -08:00
Hector Li	a9b62fa8da	Keep the model metadata on the generated EP context model (#22825) ### Description Keep the model metadata on the generated EP context model	2024-11-13 11:52:21 -08:00
Chi Lo	fa4cbcd36b	[TensorRT EP] Add new provider option to exclude nodes from running on TRT (#22681) Add new provider option `trt_op_types_to_exclude`: - Users can provide a list of op types to exclude from running on TRT - e.g. `trt_op_types_to_exclude="MaxPool"` There is a known performance issue with the DDS ops (NonMaxSuppression, NonZero and RoiAlign) from TRT versions 10.0 to 10.7. TRT EP excludes DDS ops from running on TRT by default; users can override the default value with an empty string to include all ops.	2024-11-13 11:34:43 -08:00
shiyi	3adcf4d714	[WebNN] Remove validation for coordinate_transformation_mode (#22811) The performance cost of falling back to the CPU EP is high for several resampling nodes and causes multiple partitions in SD Turbo and the VAE decoder. Since the asymmetric mode with nearest to floor and integer scales is identical to half_pixel anyway, stick with the WebNN EP.	2024-11-13 11:12:00 -08:00
Xu Xing	ff57ac4f3d	[js/webgpu] Add scatterND (#22755)	2024-11-13 09:13:00 -08:00
liqun Fu	bc2b1b5e37	Fix issue #22796 - a typo: (__GNUC__ > 9) -> (__GNUC__ > 10) (#22807) ### Description fix #22796 Signed-off-by: liqunfu <liqun.fu@microsoft.com>	2024-11-12 18:56:35 -08:00
Xiang Zhang	69a36eb231	Revert Implement DML copy for Lora Adapters (#22814) Revert https://github.com/microsoft/onnxruntime/pull/22396	2024-11-12 17:45:59 -05:00
Jing Fang	7fa69461fd	[ARM] MatMulNBits FP16 support - kernels only (#22806) ### Description A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651 Add fp16 kernels.	2024-11-12 14:28:47 -08:00
Jiajia Qin	7e0dd9d433	[js/webgpu] Optimize Expand (#22752) Use components = 4 if possible. llama3.2-1B improves from 18 tokens/s to 20 tokens/s on my iGPUs.	2024-11-12 12:37:19 -08:00
Jiajia Qin	05c8dc9d1c	[js/webgpu] Optimize ConvTranspose (#22774) BUG #22031 The overall time of ConvTranspose in the Demucs model drops from 1415.65 ms to 517.41 ms on my iGPUs.	2024-11-12 12:37:07 -08:00
Bin Miao	67f5be0da2	[WebNN EP] Support LRN operator (#22775) WebNN doesn't provide a dedicated op for LRN, so the WebNN EP emulates it with a sequence of WebNN ops: pow -> transpose -> pad -> averagePool -> transpose -> mul -> add -> pow -> div @Honry @fdwr PTAL, thanks!	2024-11-12 11:53:52 -08:00
junchao-zhao	fd5b1a18ee	Fix LARCH64 compile error (#22759) ### Description LoongArch has not yet implemented the AIsSigned qgemm, so I added a bypass for it.	2024-11-12 11:47:43 -08:00
Jian Chen	75a44582ba	Update all JDK version to 17 (#22786)	2024-11-12 11:42:18 -08:00
Ted Themistokleous	2b0f3435d2	[MIGraphX EP] Add support for Gelu, BiasGelu, FastGelu operators (#22808) ### Description Adds support for the different flavours of Gelu already supported in MIGraphX.	2024-11-12 11:04:15 -08:00
dtang317	9836ef1c89	register Identity and QLinearMatmul for opset21 (#22804) ### Description This PR registers the following opset 21 operators: Identity-21, QLinearMatMul-21	2024-11-12 09:36:19 -08:00
amarin16	f0ac5e0d3d	Update skip layer norm (#22719) Update the `SkipLayerNorm` implementation to address issues.	2024-11-12 07:01:30 -08:00
Wanming Lin	cdc8db9984	[WebNN] Fixed WebNN Module undefined issue (#22795) `Module.jsepRegisterMLConstant` is shortened by the Closure Compiler in official releases, which causes an undefined error. Fix it by using `Module['jsepRegisterMLConstant']`.	2024-11-11 21:31:24 -08:00
Adrian Lizarraga	0ad44d0f79	[Quant Tool] Flaky test due to Pad reflect bug (#22798) ### Description Fixes a unit test that would fail intermittently due to an existing bug with Pad (reflect mode). When the number of padded values is >= the inner dimension size, the ORT Pad implementation accesses invalid memory. This PR makes the number of padding values less than the inner dimension size to avoid triggering the bug. ### Motivation and Context See related issues: https://github.com/microsoft/onnxruntime/issues/8265 https://github.com/microsoft/onnxruntime/issues/11828 https://github.com/microsoft/onnxruntime/issues/20801 Here's a valgrind trace obtained on a Linux machine (with `sess_options.enable_cpu_mem_arena = False`):
```
==864228== Invalid read of size 4
==864228==    at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int, unsigned int, long, unsigned long) (pad.cc:370)
==864228==    by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551)
==864228==    by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext) const (pad.cc:725)
==864228==    by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484)
==864228==    by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73)
...
```
The above is obtained with the basic Pad(reflect) example on the [ONNX Pad operator spec page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary):
```python
from math import inf  # used below to show the invalid values ORT produced

data = [
    [1.0, 1.2],
    [2.3, 3.4],
    [4.5, 5.7],
]
pads = [0, 2, 0, 0]
mode = 'reflect'

# Expected output by ONNX spec
expected_output = [
    [1.0, 1.2, 1.0, 1.2],
    [2.3, 3.4, 2.3, 3.4],
    [4.5, 5.7, 4.5, 5.7],
]

# Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension
# invalid data may be 0.0, inf, nan, etc.
ort_output = [
    [inf, 1.2, 1.0, 1.2],
    [inf, 3.4, 2.3, 3.4],
    [inf, 5.7, 4.5, 5.7],
]
```	2024-11-11 19:49:27 -08:00
shiyi	f7d1f0fc5e	Reland "[WebNN] Fallback the node when its output doesn't have shape info" (#22685) The previous PR was reverted because it caused the whole model to fall back when output shape info was missing. This PR fixes the issue by removing redundant fallbacks.	2024-11-11 16:30:10 -08:00
Adrian Lizarraga	b1e0930eab	Fix build for linux python wheel (#22801) ### Description Fixes the command for building Linux python packages by preventing an empty `-p` command-line option from being passed to a subsequent build script: `1f3b675453/tools/ci_build/github/linux/run_python_dockerbuild.sh (L37)` ### Motivation and Context A recent [PR](https://github.com/microsoft/onnxruntime/pull/22773) introduced a new optional command-line option (`-p`) to pass custom python exe paths. We need to check if the option is empty before forwarding it to a separate build script.	2024-11-11 15:20:07 -08:00
Jian Chen	885a7acd45	Fix warning - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (#22800) ### Description This PR fixes the warning `LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format` in all Dockerfiles.	2024-11-11 13:05:34 -08:00
Xavier Dupré	1f3b675453	Fix MatMulBnFusion to exclude cases when tensors are not 2D tensors (#22762) ### Description Fixes #22512: MatMul and Add could be fused into a single Gemm even when the tensors have more than 2 dimensions. This PR excludes those cases. ### Motivation and Context ORT crashes on valid models due to that unexpected fusion.	2024-11-11 19:48:25 +01:00
Dmitri Smirnov	c5276ac448	Revert "enable serialize prepacked weights into data file (#22256)" (#22788) This reverts commit `c5b6be045f`. ### Description Revert ### Motivation and Context This needs a simpler and more robust approach.	2024-11-11 09:59:05 -08:00
sheetalarkadam	e8f1d73b0b	Add Android QNN Browserstack test (#22434) Add Android QNN Browserstack test ### Motivation and Context Real-device test in CI	2024-11-10 16:10:29 -08:00
Preetha Veeramalai	c9ed016b12	OVEP Dynamic WorkloadType support (#22779) ### Description Support for setting EP dynamic options in OVEP ### Motivation and Context Related to https://github.com/microsoft/onnxruntime/pull/22282 --------- Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>	2024-11-09 23:26:29 -08:00
shiyi	63cb53257b	[WebNN] Support steps >= 1 for slice operator (#22708) Co-authored-by: Wanming Lin <wanming.lin@intel.com>	2024-11-09 18:20:52 -08:00


12015 commits
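Commit e597eaed4a (#22870) rewrites a WebGPU Transpose as a reshape "when suitable" but does not state the condition. The minimal Python sketch below checks one common sufficient condition, which is assumed here and not taken from that PR. If the permutation preserves the relative order of every axis whose size is not 1, the row-major element order does not change, so a reshape yields the same data. The helper name `transpose_is_reshape` is hypothetical.

```python
from typing import Sequence


def transpose_is_reshape(shape: Sequence[int], perm: Sequence[int]) -> bool:
    """Hypothetical helper: True if Transpose(shape, perm) keeps the
    row-major element order, so it could be replaced by a reshape.

    Size-1 axes may move freely; all other axes must keep their order.
    """
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the axes")
    # Axes of size != 1, in the order the transpose would emit them.
    non_unit = [axis for axis in perm if shape[axis] != 1]
    return all(a < b for a, b in zip(non_unit, non_unit[1:]))


print(transpose_is_reshape([1, 3, 1, 4], [1, 0, 2, 3]))  # only size-1 axes move
print(transpose_is_reshape([2, 3], [1, 0]))              # a real 2D transpose
```

The check is conservative: it never reports a reshape where the data order would change, but a real implementation may recognize additional cases.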
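Commit 0733733307 (#22633) derives the scale of a float bias from a pre-quantized weight as `bias_scale = input_scale * weight_scale`. The sketch below applies that formula. It assumes per-tensor scales and the usual int32 bias with a zero point of 0. `quantize_bias` is a hypothetical helper, not the quantization tool's API.

```python
from typing import List, Tuple

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def quantize_bias(bias: List[float], input_scale: float,
                  weight_scale: float) -> Tuple[List[int], float]:
    """Hypothetical helper: quantize a float bias to int32 (zero point 0)
    using bias_scale = input_scale * weight_scale (per-tensor scales assumed)."""
    bias_scale = input_scale * weight_scale
    quantized = []
    for b in bias:
        q = round(b / bias_scale)
        quantized.append(max(INT32_MIN, min(INT32_MAX, q)))  # clamp to int32 range
    return quantized, bias_scale


q_bias, scale = quantize_bias([0.5, -0.25], input_scale=0.01, weight_scale=0.05)
print(q_bias, scale)
```

With per-channel weight scales, the same formula would be applied separately for each output channel.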
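Commit 3adcf4d714 (#22811) relies on asymmetric Resize with floor rounding and integer scales matching half_pixel. The check below uses the ONNX Resize coordinate formulas: asymmetric is x_original = x_resized / scale, and half_pixel is x_original = (x_resized + 0.5) / scale - 0.5. It assumes half_pixel is paired with the default nearest_mode `round_prefer_floor`, and it compares only the nearest-neighbor source index along one axis. The function names are hypothetical.

```python
import math


def nearest_src_asymmetric_floor(i: int, scale: int) -> int:
    # asymmetric: x_original = i / scale, then nearest_mode="floor"
    return i // scale  # exact integer floor for non-negative i


def nearest_src_half_pixel(i: int, scale: int) -> int:
    # half_pixel: x_original = (i + 0.5) / scale - 0.5,
    # then nearest_mode="round_prefer_floor" (ties round down)
    x = (i + 0.5) / scale - 0.5
    return math.ceil(x - 0.5)


# Compare the two mappings for a few integer upsampling factors.
for scale in range(1, 5):
    a = [nearest_src_asymmetric_floor(i, scale) for i in range(3 * scale)]
    h = [nearest_src_half_pixel(i, scale) for i in range(3 * scale)]
    print(scale, a == h)
```

For an integer scale s, the half_pixel offset (r + 0.5) / s - 0.5 stays strictly inside (-0.5, 0.5). Rounding therefore always lands on the same index that floor picks in asymmetric mode.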
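Commit 67f5be0da2 (#22775) emulates LRN in the WebNN EP with pow -> transpose -> pad -> averagePool -> transpose -> mul -> add -> pow -> div. The pure-Python sketch below checks that chain against the ONNX LRN formula, Y = X / (bias + alpha / size * square_sum)^beta, over the channel values at a single spatial position. It assumes the averagePool counts padded zeros and therefore divides by `size`. That is what turns the pooled value into `square_sum / size`. The two transposes only move the channel axis into a poolable position, so they are omitted here. The function names are hypothetical.

```python
import math
from typing import List


def lrn_reference(x: List[float], size: int, alpha: float = 1e-4,
                  beta: float = 0.75, bias: float = 1.0) -> List[float]:
    """Direct ONNX LRN formula over the channel values at one position."""
    channels = len(x)
    lo_off, hi_off = (size - 1) // 2, math.ceil((size - 1) / 2)
    out = []
    for c in range(channels):
        lo, hi = max(0, c - lo_off), min(channels - 1, c + hi_off)
        square_sum = sum(x[i] ** 2 for i in range(lo, hi + 1))
        out.append(x[c] / (bias + alpha / size * square_sum) ** beta)
    return out


def lrn_emulated(x: List[float], size: int, alpha: float = 1e-4,
                 beta: float = 0.75, bias: float = 1.0) -> List[float]:
    """pow -> pad -> averagePool -> mul -> add -> pow -> div (transposes omitted)."""
    squared = [v ** 2 for v in x]                              # pow(x, 2)
    pad_lo, pad_hi = (size - 1) // 2, math.ceil((size - 1) / 2)
    padded = [0.0] * pad_lo + squared + [0.0] * pad_hi         # zero-pad channels
    # averagePool: window `size`, stride 1, padded zeros counted in the average
    avg = [sum(padded[c:c + size]) / size for c in range(len(x))]
    denom = [(bias + alpha * a) ** beta for a in avg]          # mul, add, pow
    return [v / d for v, d in zip(x, denom)]                   # div


x = [1.0, -2.0, 3.0, 0.5, 4.0]
print(lrn_reference(x, 3, alpha=0.5))
print(lrn_emulated(x, 3, alpha=0.5))
```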
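The expected output in commit 0ad44d0f79 (#22798) follows from the reflect index mapping. A begin pad of 2 on an inner dimension of size 2 forces index -2 to wrap past the opposite edge. The pure-Python sketch below folds indices with period 2 * (n - 1), which reproduces the ONNX spec example. It is an illustration, not ORT's `PadInnermostAxis`. The names `reflect_index` and `pad_reflect_2d` are hypothetical, and the size-1 case is handled by assumption.

```python
from typing import List


def reflect_index(i: int, n: int) -> int:
    """Map a possibly out-of-range index onto [0, n) by mirror reflection
    that does not repeat the edge element ('reflect' mode)."""
    if n == 1:
        return 0  # assumption: a size-1 axis just repeats its only element
    period = 2 * (n - 1)
    i %= period  # Python's % is non-negative for a positive period
    return i if i < n else period - i


def pad_reflect_2d(data: List[List[float]], pads: List[int]) -> List[List[float]]:
    """pads = [x1_begin, x2_begin, x1_end, x2_end], the ONNX Pad layout."""
    rows, cols = len(data), len(data[0])
    b0, b1, e0, e1 = pads
    return [
        [data[reflect_index(r, rows)][reflect_index(c, cols)]
         for c in range(-b1, cols + e1)]
        for r in range(-b0, rows + e0)
    ]


data = [[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]]
print(pad_reflect_2d(data, [0, 2, 0, 0]))
```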