### Description
The current Linux CI pipeline was broken due to missing parameters in the
`py-packaging-linux-test-cpu.yml` template.
### Motivation and Context
Fix Linux CI pipeline
### Description
Update the stable diffusion benchmark:
(1) allow I/O binding for optimum;
(2) do not use num_images_per_prompt across all engines, for a fair
comparison.
Example of running the optimum benchmark on stable diffusion 1.5:
```
git clone https://github.com/tianleiwu/optimum
cd optimum
git checkout tlwu/diffusers-io-binding
pip install -e .
pip install -U onnxruntime-gpu
git clone https://github.com/microsoft/onnxruntime
cd onnxruntime/onnxruntime/python/tools/transformers/models/stable_diffusion
git checkout tlwu/benchmark_sd_optimum_io_binding
pip install -r requirements/cuda12/requirements.txt
optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 --task text-to-image ./sd_onnx_fp32
python optimize_pipeline.py -i ./sd_onnx_fp32 -o ./sd_onnx_fp16 --float16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16 --use_io_binding
```
Example output on H100_80GB_HBM3: 572 ms with I/O binding; 588 ms without
I/O binding. I/O binding saves 16 ms, or 2.7%.
### Motivation and Context
Optimum is working on enabling I/O binding:
https://github.com/huggingface/optimum/pull/2056. This change helps test
the impact of I/O binding on the performance of stable diffusion.
### Description
Enable the ConvReplaceWithQLinear graph optimization when using the ACL
execution provider.
### Motivation and Context
Fixes an issue where quantized Conv nodes followed by ReLU don't get
converted to QLinearConv, so ACL sees the weights as mutable and
therefore cannot run the Conv node.
Signed-off-by: Michael Tyler <michael.tyler@arm.com>
### Description
Making `-p` optional in the Linux python CUDA package pipeline.
### Motivation and Context
The Linux stage of the Python-CUDA-Packaging-Pipeline has failed since
the merge of #22773.
### Description
This PR fixes the spelling of the GRU operator's key in the map in the
`GetSupportedNodes` function (Gru -> GRU) and removes the data type check
for the fifth input (sequence_lens) of the GRU operator.
PTAL, thanks!
Add new provider option `trt_op_types_to_exclude`:
- Users can provide a list of op types to be excluded from running on TRT,
e.g. `trt_op_types_to_exclude="MaxPool"`.
There is a known performance issue with the DDS ops (NonMaxSuppression,
NonZero and RoiAlign) in TRT versions 10.0 to 10.7, so the TRT EP excludes
DDS ops from running on TRT by default; users can override the default
with an empty string to include all ops (see the usage sketch below).
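As a rough illustration, here is how the option could be passed through
the standard ONNX Runtime Python API (the option name comes from this PR;
the model path is a placeholder):
```python
import onnxruntime as ort

# Exclude MaxPool from TRT; an empty string would re-include the DDS ops.
providers = [
    ("TensorrtExecutionProvider", {"trt_op_types_to_exclude": "MaxPool"}),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)
```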
The performance cost of falling back to the CPU EP is high for several
resampling nodes and causes multiple partitions in SD Turbo and the VAE
decoder. Since asymmetric mode with nearest-to-floor and integer scales
is identical to half_pixel anyway, keep these nodes on the WebNN EP.
### Description
A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651.
Adds fp16 kernels.
WebNN doesn't provide a dedicated op for LRN, so emulate it in the WebNN
EP with a chain of WebNN ops (see the sketch below):
pow -> transpose -> pad -> averagePool -> transpose -> mul -> add -> pow -> div
@Honry @fdwr PTAL, thanks!
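For intuition, here is a minimal NumPy sketch of that emulation, assuming
NCHW input and standard ONNX LRN semantics
(`y = x / (bias + alpha/size * square_sum)^beta`); the function and
variable names are illustrative, not the EP's actual code:
```python
import numpy as np

def lrn_emulated(x, size=5, alpha=1e-4, beta=0.75, bias=1.0):
    """Emulate LRN for an NCHW input with the op chain above."""
    n, c, h, w = x.shape
    sq = np.power(x, 2.0)                       # pow: square the input
    t = np.transpose(sq, (0, 2, 3, 1))          # transpose: NCHW -> NHWC
    half = (size - 1) // 2
    p = np.pad(t, ((0, 0), (0, 0), (0, 0), (half, size - 1 - half)))  # pad channels
    # averagePool: mean over a window of `size` along the (former channel) axis
    pooled = np.stack([p[..., i:i + size].mean(axis=-1) for i in range(c)], axis=-1)
    back = np.transpose(pooled, (0, 3, 1, 2))   # transpose: NHWC -> NCHW
    denom = np.power(bias + alpha * back, beta) # mul -> add -> pow
    return x / denom                            # div
```
Note that `alpha * mean` equals `alpha/size * square_sum`, so the
averagePool output only needs a multiply by alpha rather than by
alpha/size.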
### Description
This PR registers the following opset 21 operators:
- Identity-21
- QLinearMatMul-21
`Module.jsepRegisterMLConstant` will be shortened by the Closure Compiler
in the official release build, which would cause an undefined error.
Fix it by using `Module['jsepRegisterMLConstant']`; bracket access keeps
the property name from being renamed.
### Description
Fixes a unit test that would fail intermittently due to an existing bug
with Pad (reflect mode). When the number of padded values is >= the
inner dimension size, the ORT Pad implementation accesses invalid
memory. This PR makes the number of padding values less than the inner
dimension size to avoid triggering the bug.
### Motivation and Context
See related issues:
- https://github.com/microsoft/onnxruntime/issues/8265
- https://github.com/microsoft/onnxruntime/issues/11828
- https://github.com/microsoft/onnxruntime/issues/20801
Here's a valgrind trace obtained on a Linux machine (with
`sess_options.enable_cpu_mem_arena = False`)
```
==864228== Invalid read of size 4
==864228== at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370)
==864228== by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551)
==864228== by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725)
==864228== by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484)
==864228== by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73)
...
```
The above is obtained with the basic Pad(reflect) example on the [ONNX
Pad operator spec
page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary):
```python
data = [
    [1.0, 1.2],
    [2.3, 3.4],
    [4.5, 5.7],
]
pads = [0, 2, 0, 0]
mode = 'reflect'

# Expected output by ONNX spec
expected_output = [
    [1.0, 1.2, 1.0, 1.2],
    [2.3, 3.4, 2.3, 3.4],
    [4.5, 5.7, 4.5, 5.7],
]

# Bugged output from onnxruntime has invalid/uninitialized data for the
# first element in the inner dimension; invalid data may be 0.0, inf, nan, etc.
ort_output = [
    [inf, 1.2, 1.0, 1.2],
    [inf, 3.4, 2.3, 3.4],
    [inf, 5.7, 4.5, 5.7],
]
```
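For reference, NumPy's reflect padding reproduces the ONNX-spec expected
output above (a quick independent check, not ORT code):
```python
import numpy as np

data = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]])
# ONNX pads = [0, 2, 0, 0] means 2 leading values on axis 1,
# i.e. ((0, 0), (2, 0)) in NumPy's pad-width order.
out = np.pad(data, ((0, 0), (2, 0)), mode="reflect")
print(out)  # [[1.  1.2 1.  1.2] [2.3 3.4 2.3 3.4] [4.5 5.7 4.5 5.7]]
```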
The previous PR was reverted because it caused the whole model to fall
back when output shape info was missing. This PR fixes the issue by
removing redundant fallbacks.
### Description
Fixes command for building Linux python packages by preventing an empty
`-p` command-line option from being passed to a subsequent build script:
1f3b675453/tools/ci_build/github/linux/run_python_dockerbuild.sh (L37)
### Motivation and Context
A recent
[PR](https://github.com/microsoft/onnxruntime/pull/22773) introduced a
new optional command-line option (`-p`) for passing custom python exe
paths. We need to check whether the option is empty before forwarding it
to a separate build script.
### Description
This PR fixes the warning `LegacyKeyValueFormat: "ENV key=value" should be
used instead of legacy "ENV key value" format` in all Dockerfiles (e.g.,
`ENV FOO=bar` instead of `ENV FOO bar`).
### Description
Fixes #22512. MatMul and Add could be fused into a single Gemm even if
tensor dimensions were > 2. This PR excludes those cases.
### Motivation and Context
ORT crashes on valid models due to that unexpected fusion.
### Description
Support setting EP dynamic options in OVEP.
### Motivation and Context
Related to https://github.com/microsoft/onnxruntime/pull/22282
---------
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
For per-axis quantization/dequantization, WebNN requires the scale and
zero_point inputs to be broadcastable. The axis should be used to reshape
these two inputs (see the sketch below).
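A minimal NumPy sketch of the idea, with illustrative names (this is not
the WebNN EP code): for per-axis quantization on a given axis, the 1-D
scale and zero_point must be reshaped so they broadcast against the input.
```python
import numpy as np

def reshape_for_broadcast(param_1d, input_rank, axis):
    """Reshape a 1-D per-axis scale or zero_point so it broadcasts."""
    shape = [1] * input_rank
    shape[axis] = param_1d.shape[0]
    return param_1d.reshape(shape)

x = np.ones((2, 3, 4, 4), dtype=np.float32)
scale = np.array([0.1, 0.2, 0.3], dtype=np.float32)     # per-channel, axis=1
scale_b = reshape_for_broadcast(scale, x.ndim, axis=1)  # shape (1, 3, 1, 1)
dequantized = x * scale_b                               # now broadcastable
```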
### Description
[VitisAI] Cache node subgraph when necessary
---------
Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
Co-authored-by: zhenzew <zhenzew@amd.com>
### Description
1. Add an XNNPack build on Linux ARM64.
2. Build only one Python wheel for PR builds.
[AB#49763](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/49763)
### Motivation and Context
Why add an XNNPack build on Linux ARM64 rather than Windows ARM64?
Because KleidiAI doesn't support Windows:
```
IF(XNNPACK_TARGET_PROCESSOR STREQUAL "arm64" AND XNNPACK_ENABLE_ARM_I8MM AND NOT CMAKE_C_COMPILER_ID STREQUAL "MSVC")
  IF (XNNPACK_ENABLE_KLEIDIAI)
    MESSAGE(STATUS "Enabling KleidiAI for Arm64")
  ENDIF()
ELSE()
  SET(XNNPACK_ENABLE_KLEIDIAI OFF)
ENDIF()
```
---------
### Description
Ignore all whitespace lint messages for cpplint. Remove redundant
configs in dml/.
### Motivation and Context
They are handled automatically by clang-format and create too much noise
in the PR files tab.
### Description
Adds `reduce_range` option to `get_qdq_config()`
### Motivation and Context
Makes it easier to set this option when calling `get_qdq_config()`.
Otherwise, the user has to set the option manually (see the sketch below).
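A hypothetical usage sketch (the model paths and `data_reader` are
placeholders; `reduce_range` is the option added by this PR):
```python
from onnxruntime.quantization import get_qdq_config, quantize

# data_reader is assumed to be a CalibrationDataReader over representative inputs.
qdq_config = get_qdq_config(
    "model_fp32.onnx",
    data_reader,
    reduce_range=True,  # new option: use a reduced (7-bit) quantization range
)
quantize("model_fp32.onnx", "model_qdq.onnx", qdq_config)
```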
### Description
This PR makes the MatMul shaders independent of the inputs' broadcasting
pattern: they depend only on the input ranks, with shapes provided in
uniforms. This fixes an issue where the shader code differed for
different broadcasting patterns but had an identical cache key, which
resulted in wrong cache hits (see the conceptual sketch below).
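Conceptually (an illustrative sketch, not the actual WebGPU EP code), a
shader cache key must capture everything that changes the generated
shader source; once shapes come in via uniforms, only the ranks belong in
the key:
```python
def matmul_shader_cache_key(a_rank: int, b_rank: int) -> str:
    # Shapes (and hence the broadcasting pattern) are supplied at runtime
    # through uniforms, so they no longer affect the shader source and
    # must not be part of the cache key.
    return f"MatMul|rank_a={a_rank}|rank_b={b_rank}"
```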
### Description
Fix a build error seen with GCC 11 when building at Homebrew on our
Linux x86_64 Ubuntu 22.04 CI (GitHub Actions runner).
### Motivation and Context
When building latest v1.20.0 at Homebrew
(https://github.com/Homebrew/homebrew-core/pull/196547), we hit a build
failure with GCC 11:
```
[ 65%] Building CXX object CMakeFiles/onnxruntime_optimizer.dir/tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc.o
/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/shims/linux/super/g++-11 -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DHAS_STRING_VIEW=1 -DNSYNC_ATOMIC_CPP11 -DONLY_C_LOCALE=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DORT_NO_RTTI -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -D_GNU_SOURCE -I/tmp/onnxruntime-20241103-6403-lh3bwj/build/_deps/utf8_range-src -I/tmp/onnxruntime-20241103-6403-lh3bwj/include/onnxruntime -I/tmp/onnxruntime-20241103-6403-lh3bwj/include/onnxruntime/core/session -I/tmp/onnxruntime-20241103-6403-lh3bwj/build/_deps/pytorch_cpuinfo-src/include -I/tmp/onnxruntime-20241103-6403-lh3bwj/build -I/tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime -I/tmp/onnxruntime-20241103-6403-lh3bwj/build/_deps/onnx-src -I/tmp/onnxruntime-20241103-6403-lh3bwj/build/_deps/onnx-build -ffunction-sections -fdata-sections -Wno-restrict -DCPUINFO_SUPPORTED -O3 -DNDEBUG -fPIC -fno-rtti -Wall -Wextra -Wno-deprecated-copy -Wno-tautological-pointer-compare -Wno-nonnull-compare -Wno-ambiguous-reversed-operator -Wno-deprecated-anon-enum-enum-conversion -Wno-undefined-var-template -Wno-deprecated-builtins -Wshorten-64-to-32 -Werror -MD -MT CMakeFiles/onnxruntime_optimizer.dir/tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc.o -MF CMakeFiles/onnxruntime_optimizer.dir/tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc.o.d -o CMakeFiles/onnxruntime_optimizer.dir/tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc.o -c /tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc
/tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc: In function ‘void onnx_transpose_optimization::Permute1DConstant(onnx_transpose_optimization::api::GraphRef&, onnx_transpose_optimization::api::NodeRef&, onnx_transpose_optimization::api::TensorRef&, size_t, std::string_view, const std::vector<long int>&)’:
/tmp/onnxruntime-20241103-6403-lh3bwj/onnxruntime/core/optimizer/transpose_optimization/onnx_transpose_optimization.cc:1114:10: error: ‘memcpy’ is not a member of ‘std’; did you mean ‘wmemcpy’?
1114 | std::memcpy(dst, src, bytes_per_val);
| ^~~~~~
| wmemcpy
```
It is possible this error may not occur on different GCC versions if
`cstring` has been indirectly included by another header.
### Description
This PR sets the default python to 3.10 everywhere except
tools/ci_build/github/azure-pipelines/bigmodels-ci-pipeline.yml. This is
needed because we are no longer using python 3.8.
This PR excludes changes for the Big Models CI because it will require
additional changes, which will be tracked in USER STORY 52729.
### Description
With recent changes, the build error below is seen under AIX.
```
ld: 0706-012 The -p flag is not recognized.
ld: 0706-012 The -a flag is not recognized.
ld: 0706-012 The -t flag is not recognized.
ld: 0706-012 The -h flag is not recognized.
ld: 0706-012 The -= flag is not recognized.
ld: 0706-012 The -$ flag is not recognized.
ld: 0706-012 The -$ flag is not recognized.
ld: 0706-012 The -O flag is not recognized.
ld: 0706-027 The -R IGIN flag is ignored.
collect2: error: ld returned 255 exit status
```
### Motivation and Context
The AIX linker doesn't support the -rpath option, so this option is
blocked under AIX.
### Description
Skip `MatMulIntegerToFloat` fusion for the DML EP in cases where the
model uses quantization before `MatMulInteger`. This is mainly done to
be resource efficient, and we have better `MatMulInteger` metacommand
coverage, which computes in the int data type.
This CL makes the WebGPU backend support subgroup features, allowing
subgroup optimizations in the future.
### Description
With this CL, the WebGPU backend will create devices with the subgroups
and subgroups-f16 features (both under origin trial in Chrome) or the
chromium-experimental-subgroups feature enabled whenever available.
### Motivation and Context
This CL allows WebGPU operator shaders to use subgroup optimizations in
the future, which might yield significant speedups.
### Description
* Update CI with TRT 10.6
* Update oss parser to [10.6-GA-ORT-DDS](https://github.com/onnx/onnx-tensorrt/tree/10.6-GA-ORT-DDS) and update dependency version
* Update Py-cuda11 CI to use TRT 10.6
### Motivation and Context
(A third PR will follow to further reduce trt_version hardcoding.)
### Description
Updates python quantization tool:
- Ensures QDQ Pad has equal quantization parameters across input and
output for certain Pad configurations.
- Ensures QDQ Slice always has equal quantization parameters across
input and output.
- Fixes a bug when Softmax is _excluded_ from quantization.
### Motivation and Context
QDQ Pad and Slice have lower latency on QNN EP when their quantization
parameters are equal.
### Description
- Changes the E2E iOS tests to run in BrowserStack instead of App Center
- Steps for running locally can be found in the OneNote
### Motivation and Context
- Follow-up of #22117
- App Center (the previous platform for running E2E mobile tests) is
getting deprecated in 2025
### Misc info
Additional build steps were required to get the necessary testing
artifacts for BrowserStack. App Center consumed an entire folder, while
BrowserStack requests the following:
1. a ZIP file of all the tests
2. an IPA file of the test app
#### Flow
Here is a rough outline of what is happening in the pipeline:
1. The build_and_assemble_apple_pods.py script builds the relevant
frameworks (currently, this means packages for iOS and Mac).
2. The test_apple_packages.py script installs the necessary CocoaPods for
later steps.
3. The Xcode build-for-testing task builds the iOS target for the test
app.
4. Now that the test app and the tests have been built, we can zip them,
creating the tests .zip file.
5. To create the IPA file, we need to create a .plist XML file, which is
generated by the generate_plist.py script.
   - Attempts to use the Xcode@5 task to automatically generate the plist
file failed.
   - Also, building for testing generates some plist files -- these cannot
be used to export an IPA file.
6. We run the Xcode task to build an .xcarchive file, which is required
for creating an IPA file.
7. We use xcodebuild in a script step to build an IPA file with the
xcarchive and plist files from the last two steps.
8. Finally, we can run the tests using the BrowserStack script.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
Fixes a scenario in which a bias input quantized to int32 has a scale
that is too small. A bias with a scale smaller than a certain threshold
overflows the int32 range when quantized, which significantly decreases
accuracy.
Credit to @yihonglyu for finding this issue and the fix.
### Motivation and Context
Consider the following Convolution with very small weights and a
constant bias input of `[5, -4.5]`.

The QDQ quantizer first computes the following quantization scales for
`input_0` and `weight`:
- `input_0`: scale=0.5
- `weight`: scale=7.843e-11 **[really small]**
The QDQ quantizer then computes the bias input's scale as follows:
```
bias_scale = input_0_scale * weight_0_scale = 0.5 * 7.843e-11 = 3.9215686274509805e-11
```
This `bias_scale` is too small. Before this PR, the QDQ quantizer would
quantize the f32 bias with this `bias_scale`:
```
bias_quant = round(bias_f32 / bias_scale) = round([5.0/bias_scale, -4.5/bias_scale]) = [127500000000, -114750000000]
```
These quantized bias values exceed the range of int32, and so are
clipped to [int32.min(), int32.max()], which is very inaccurate.
#### New approach
This PR increases the `weight_0_scale` by the necessary amount to ensure
that `bias_scale` (which equals `weight_0_scale * input_0_scale`) is
appropriate for the int32 quantization type.
The smallest valid bias scale is given by the normal scale formula:
`bias_smallest_valid_scale = (bias_f32_max - bias_f32_min) / (int32_max - int32_min)`
Then, we compute the candidate bias scale:
`bias_scale_candidate = input_0_scale * weight_0_scale`
If the candidate scale is smaller than the smallest valid scale, we
increase the `weight_0_scale` by the necessary ratio:
```python
if bias_scale_candidate < bias_smallest_valid_scale:
    ratio = bias_smallest_valid_scale / bias_scale_candidate
    weight_0_scale = ratio * weight_0_scale
```
Then, we recompute the final bias scale:
```python
bias_scale = input_0_scale * weight_0_scale
```
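Putting the steps together, here is a runnable sketch with this example's
numbers (variable names are illustrative, not the quantizer's internals):
```python
import numpy as np

int32_min, int32_max = np.iinfo(np.int32).min, np.iinfo(np.int32).max
input_0_scale = 0.5
weight_0_scale = 7.843e-11  # the really small weight scale from the example
bias_f32 = np.array([5.0, -4.5])

# Smallest bias scale for which the bias range still fits into int32.
bias_smallest_valid_scale = (bias_f32.max() - bias_f32.min()) / (int32_max - int32_min)

bias_scale_candidate = input_0_scale * weight_0_scale
if bias_scale_candidate < bias_smallest_valid_scale:
    # Widen the weight scale so the derived bias scale becomes valid.
    ratio = bias_smallest_valid_scale / bias_scale_candidate
    weight_0_scale = ratio * weight_0_scale

bias_scale = input_0_scale * weight_0_scale
print(bias_scale >= bias_smallest_valid_scale)  # True: quantized bias fits int32
```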
#### Impact on accuracy
Here's the above model's quantized output compared to the f32
(ground-truth) output.
- Before PR:
- f32 model output[0]: **5.0f**
- qdq model output[0]: **0.075**
- SNR: 0.1369 (higher is better)
- After PR:
- f32 model output[0]: **5.0f**
- qdq model output[0]: **4.992**
- SNR: 55.656 (higher is better)
### Description
Introduces the `get_qdq_config()` function to get a quantization
configuration for a full integer QDQ model. This function provides an
easier way of specifying commonly used options and sets convenient
defaults. Specifically:
- Instead of requiring the user to pass a dictionary of `extra_options`,
the new interface adds function parameters for common settings:
  - All calibrator settings
  - Whether activations/weights are symmetric
  - Whether to keep or fuse relu/clip into Q
  - Minimum real range for quantization
  - Dictionary of tensor quantization overrides.
- Automatically scans the input floating-point model and fills out the
operator types to quantize. Otherwise, only a limited number of operator
types would be quantized by default.
- Detects if the input model uses external data. If so, ensures that the
generated QDQ model also uses external data.
- Detects if the model will use newly introduced quantization types
(int4/int16) with an older opset. If so, forces the use of the
`com.microsoft` domain for Q/DQ ops, which support all types.
- Automatically enables the "extra option" called
`ForceQuantizeNoInputCheck` to ensure data movement operators (e.g.,
Transpose) are always quantized.
- User can pass a function to indicate which nodes to exclude from
quantization.
- The user can still pass their own `extra_options` to override any of
the above if necessary.
```python
from onnxruntime.quantization import (
    CalibrationMethod,
    QuantType,
    get_qdq_config,
    quantize,
)

# Get QDQ configuration
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    calibrate_method=CalibrationMethod.Percentile,
    calibrate_args={"percentile": 99.98},  # Converted to extra_options
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,
    nodes_to_exclude=["Mul"],  # Could also be a function. Ex: `lambda model, node: node.op_type == "Softmax"`
    # Other options converted to extra_options:
    min_real_range=0.0001,
    keep_removable_activations=True,
    activation_symmetric=True,
    weight_symmetric=True,
)

# Quantize model
quantize(float_model_path, qdq_model_path, qdq_config)
```
### Motivation and Context
Need a version of `get_qnn_qdq_config()` that is not EP-specific.