onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-27 03:11:28 +00:00

Author	SHA1	Message	Date
Adrian Lizarraga	cdc5d72ba9	[QDQ Quant] Support mixed-precision integer quantization via overrides (#19925 ) ### Description Adds support for specifying mixed precision QDQ models via tensor quantization overrides. ### Motivation and Context This PR implements an approach for supporting "mixed precision" models. The following figure demonstrates an example mixed precision model as defined in this PR. ![image](https://github.com/microsoft/onnxruntime/assets/19691973/40ae3bf9-b21a-4ba5-a1cd-41c1e08c21e7) A mixed precision QDQ model consists of regions with different activation/weight quantization data types. The boundary between regions converts between activation quantization data types (e.g., uint8 to uint16) using a DQ to Q sequence. The ability to specify regions with different quantization data types enables exploring the tradeoffs between accuracy and latency. A higher integer precision may improve accuracy at the expense of latency, so selectively promoting certain regions to a higher precision can aid in achieving a desirable balance in key metrics. #### Current support By default, the ORT quantizer supports specifying default activation and weight quantization data types for the entire model. A recent PR added support for specifying basic quantization overrides at the tensor level via the `extra_options["TensorQuantOverrides"]` configuration: ``` TensorQuantOverrides = dictionary : Default is {}. Set tensor quantization overrides. The key is a tensor name and the value is a list of dictionaries. For per-tensor quantization, the list contains a single dictionary. For per-channel quantization, the list contains a dictionary for each channel in the tensor. Each dictionary contains optional overrides with the following keys and values. 'quant_type' = QuantType : The tensor's quantization data type. 'scale' = Float : The scale value to use. Must also specify `zero_point` if set. 'zero_point' = Int : The zero-point value to use. Must also specify `scale` if set. 
'symmetric' = Bool : If the tensor should use symmetric quantization. Invalid if also set `scale` or `zero_point`. 'reduce_range' = Bool : If the quantization range should be reduced. Invalid if also set `scale` or `zero_point`. 'rmax' = Float : Override the maximum real tensor value in calibration data. Invalid if also set `scale` or `zero_point`. 'rmin' = Float : Override the minimum real tensor value in calibration data. Invalid if also set `scale` or `zero_point`. ``` The tensor-level overrides are currently used to override the quantization type for weights/initializers or to set specific scale/zero-point values for a tensor (e.g., QNN requires Sigmoid to use a specific scale/zero-point at its output). However, these overrides are not typically used to override activation quantization types due in large part to operator data type constraints. Consider, for example, that all inputs and outputs to an Add operator must be of the same data type. Consequently, using tensor-level overrides to promote the Add’s output to 16-bits would force the inputs to also be overridden to 16-bit. In turn, this would have a cascading effect on potentially the entire graph. The solution implemented by this PR is to allow the specification of tensor boundaries where the activation quantization data type changes. #### The approach The following figure shows a model with a region that has been promoted to 16-bit from the default 8-bit activation type. ![image](https://github.com/microsoft/onnxruntime/assets/19691973/5998c301-ae20-4ac9-8a43-37f335cfcf8b) Note the following observations: - Op2’s output is consumed by Op4, Op7, and Op8. Op4 consumes the converted u16 type, while Op7 and Op8 consume the original u8 type. - Op3’s output is converted from u8 to u16. Op5 consumes the converted u16 type. - Op4’s output is just u16 (not converted). - Op5’s output is converted from u16 to u8. Op6 consumes the u8 type. 
The approach implemented by this PR uses the tensor-level quantization overrides to specify a tensor’s quantization type at both the producer and consumer ends. The following shows the overrides necessary to create this mixed precision QDQ model. ```python3 overrides = { "Op2_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op4"}}}], "Op3_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op5"}}}], "Op4_out": [{"quant_type": QUInt16}], "Op5_out": [{"quant_type": QUInt16, "convert": {"quant_type": QUInt8, "recv_nodes": {"Op6"}}}] } ```	2024-03-23 11:05:08 -07:00
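The constraints that the `TensorQuantOverrides` description places on each override dictionary can be sketched as a small validator. This is only an illustration of the quoted rules, not the quantizer's actual checking code: the helper name `check_override` is hypothetical, and quantization types are written as plain strings (an assumption) so that the sketch runs without onnxruntime installed.

```python
# Keys that the description declares invalid once 'scale' or 'zero_point'
# is given explicitly.
EXCLUSIVE_WITH_SCALE_ZP = ("symmetric", "reduce_range", "rmax", "rmin")


def check_override(override):
    """Return a list of rule violations for one override dictionary.

    Hypothetical helper: it encodes only the rules quoted in the commit
    message, not the real validation performed by the ORT quantizer.
    """
    errors = []
    has_scale = "scale" in override
    has_zp = "zero_point" in override
    # 'scale' and 'zero_point' must be given together or not at all.
    if has_scale != has_zp:
        errors.append("'scale' and 'zero_point' must be specified together")
    # The calibration-related keys conflict with explicit scale/zero-point.
    if has_scale or has_zp:
        for key in EXCLUSIVE_WITH_SCALE_ZP:
            if key in override:
                errors.append(f"'{key}' is invalid when 'scale' or 'zero_point' is set")
    return errors


# Part of the mixed-precision example, with quantization types as strings
# (an assumption made to keep the sketch standalone).
overrides = {
    "Op2_out": [{"quant_type": "QUInt8",
                 "convert": {"quant_type": "QUInt16", "recv_nodes": {"Op4"}}}],
    "Op4_out": [{"quant_type": "QUInt16"}],
}

for tensor_name, entries in overrides.items():
    for entry in entries:
        print(tensor_name, check_override(entry))
```

Each entry of the example passes because it sets only `quant_type` and `convert`, neither of which is restricted by the quoted rules.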
Changming Sun	3b4b99b90b	Fix a bug in WASM's GEMM (#20023 ) ### Description Fix a bug in WASM's GEMM. The bug was found when running the "ConvAddActivationFusionTests.ConvGemmDirect" unit test in a wasm build with address sanitizer enabled. When CountK=25, CountN=1, lda=25, ldc=1, the function I am modifying triggered an out-of-bounds read error. The bug fix was provided by @fs-eire.	2024-03-23 08:53:50 -07:00
Xiaoyu	71551dacd5	Add ModelProto support for transformers optimize_model (#19990 ) ### Description Add `ModelProto` support as an input to transformers `optimize_model` API. ### Motivation and Context Currently, the `optimize_model` API only accepts a model path as the input model. However, for large models, saving and loading from disk can be time-consuming. By adding `ModelProto` as an input option to the `optimize_model` API, significant time can be saved.	2024-03-22 18:40:58 -07:00
Dmitri Smirnov	3076b56947	Make MS Debug engine SymInitialize() called as needed. (#20036 ) ### Description Initialize the symbol engine as needed, with no duplicate calls. ### Motivation and Context Currently the abseil library may call SymInitialize more than once when shared libraries are involved. However, it can be called only once per process. Our debug_alloc may also call it when enabled. This change allows initialization to proceed only when needed, with no duplicate effort.	2024-03-22 16:17:47 -07:00
kunal-vaishnavi	f9cddd2cf5	Remove early stopping from LLaMA end-to-end benchmarking (#20033 ) ### Description This PR removes early stopping from the end-to-end LLaMA-2 benchmark script. ### Motivation and Context This allows models to always generate the requested number of new tokens.	2024-03-22 14:44:34 -07:00
Abhishek Jindal	7e84ba0ea3	remove const cast for DLManagedTensor (#20015 ) ### Description Remove the const_cast, as it might lead to unknown behavior. Specifying DLManagedTensor as const doesn't seem to be necessary; I tested this by running torch_ort.configure. It is not clear which other tests need to be run. Background can be found in this [PR](https://github.com/microsoft/onnxruntime/pull/19982).	2024-03-22 10:39:19 -07:00
Baiju Meswani	2bc29244b4	Support model with multiple SCE loss nodes (#20016 )	2024-03-22 10:28:44 -07:00
kunal-vaishnavi	6238e9c0af	Add LLaMA end-to-end benchmarking (#19985 ) ### Description This PR adds a benchmarking script to measure end-to-end performance and saves the results in a CSV file. ### Motivation and Context With this PR, end-to-end performance can be easily measured for many large-language models such as LLaMA-2. The performance numbers for LLaMA-2 are located [here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/models/llama).	2024-03-21 18:59:05 -07:00
sfatimar	eab35c20fc	Ort openvino npu 1.17 master (#19966 ) ### Description Add NPU to the list of supported devices. Add changes to support OV 2024.0. NuGet packages no longer package the OpenVINO DLL. Bug fixes for the Python API. Revert Dockerfiles that are not being maintained. ### Motivation and Context Intel has introduced the NPU device in its latest client systems, and the OpenVINO 2024.0 release is out. --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com> Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com> Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com> Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>	2024-03-21 18:44:00 -07:00
Yi Zhang	cd6d3aea45	Refactor Python CUDA packaging pipeline to fix random hangs in building (#19989 ) ### Description 1. Move the build to CPU machines. 2. Optimize the pipeline. 3. Since there isn't an official ONNX package for Python 3.12, the Python 3.12 test stage uses the packages built from ONNX source in the build stage. ### Motivation and Context 1. Resolve the random hang in compilation. 2. Save a lot of GPU resources.	2024-03-22 09:16:00 +08:00
Changming Sun	dafbef3a21	CMake: support reading dependency zip files from a local mirror (#20005 ) ### Description To test this feature, run ```bat python cmake\deps_update_and_upload.py --root-path mirror ``` Then run build.py as usual. The zip files will be cached locally, so they are not downloaded again and again.	2024-03-21 17:58:59 -07:00
TP Boudreau	983fd8393a	Recognize NaN operands in Min and Max ops (#19984 ) ### Description Update the Min and Max CUDA math operations on float/double types to propagate NaNs: if either operand is NaN, the result should be NaN. TODO: float16/bfloat16 need a similar change. ### Motivation Currently, results differ between the CPU and CUDA implementations of the floating point Min and Max operators: the CPU operators correctly return NaN results if either operand is NaN. This PR updates the CUDA implementations to conform with this correct behavior. See the issue and comments raised [here](https://github.com/onnx/onnx/issues/6003). ### Context Same behavior in numpy, torch and Java: ``` >>> numpy.min([numpy.nan, 1]) nan >>> numpy.max([numpy.nan, 1]) nan >>> torch.min(torch.tensor([1, float('nan')])) tensor(nan) >>> torch.max(torch.tensor([1, float('nan')])) tensor(nan) ``` The C language [fmin](https://en.cppreference.com/w/c/numeric/math/fmin) and [fmax](https://en.cppreference.com/w/c/numeric/math/fmax) have different behavior: ``` fmax(NaN,1) = 1 fmin(NaN,1) = 1 ``` https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf ![image](https://github.com/microsoft/onnxruntime/assets/30328909/62446cf1-f252-4ddc-8118-5ce605252331) https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf	2024-03-21 16:08:18 -07:00
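The difference between the two conventions in the commit above can be sketched in plain Python. This is an illustration only, not the CUDA kernel code touched by the PR; the function names `nan_propagating_min` and `fmin_like` are hypothetical.

```python
import math


def nan_propagating_min(a, b):
    """Min that returns NaN if either operand is NaN (the behavior the PR adopts)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b


def fmin_like(a, b):
    """Min in the style of C's fmin: a NaN operand is ignored if the other is a number."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


print(nan_propagating_min(math.nan, 1.0))  # NaN wins regardless of operand order
print(fmin_like(math.nan, 1.0))            # the numeric operand wins
```

The explicit `isnan` checks matter because every ordered comparison with NaN is false, so a bare `a if a < b else b` would return a result that depends on operand order.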
Yi Zhang	30a0d80925	Fix exception in Publish unit test results step (#20007 ) ### Description Test result files are all in the RelWithDebInfo\RelWithDebInfo directory. It's not necessary to stat the _deps directory. ### Motivation and Context Recently this exception has occurred many times in the zip-nuget pipeline. `##[error]Error: Failed find: EPERM: operation not permitted, stat 'D:\a\_work\1\b\RelWithDebInfo\_deps\flatbuffers-src\java\src\test\java\DictionaryLookup'` https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=426981&view=logs&j=75fc0348-fe99-522b-3acb-90fd80ac5271&t=5d4ebcc1-bcde-574d-6f4e-8abd0f04ae4b	2024-03-22 06:53:59 +08:00
Tianlei Wu	06fe4f3113	Increase MNIST test tolerance (#20000 ) ### Description Found multiple occurrence of failures: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1321061&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=56a04c0b-9e7f-5c69-cb7b-c2a7b1a7392a&l=17537 https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1329701&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=4f6ef737-111d-50d1-a46b-5f86d9a970bc&s=3618b4c0-1011-591a-85b8-671e72e2cff1 1: [ RUN ] ModelTests/ModelTest.Run/ cuda__models_zoo_opset7_MNIST_model 1: D:\a\_work\1\s\onnxruntime\test\providers\cpu\model_tests.cc(358): error: Expected equality of these values: 1: COMPARE_RESULT::SUCCESS 1: Which is: 4-byte object <00-00 00-00> 1: ret.first 1: Which is: 4-byte object <01-00 00-00> 1: expected -2.33638 (c0158735), got -2.30239 (c0135a47), diff: 0.0339923, tol=0.0243638 idx=9	2024-03-20 23:40:27 -07:00
Prathik Rao	0b958bb421	add random seed to layernorm tests (#19998 ) Adds random seed to layernorm tests to prevent random failure. ### Motivation and Context Fixes https://github.com/microsoft/onnxruntime/issues/19983	2024-03-20 21:00:25 -07:00
Yi Zhang	175f149b30	Remove downloading deps in CUDA package test stage (#19993 ) ### Motivation and Context Downloading deps is not needed in the test stage. Remove it to reduce random download errors.	2024-03-21 10:01:03 +08:00
Justin Chu	0335ea9f1e	Use Java 11 to build project in the codeql pipeline (#19999 ) Codeql uses Java 8 by default, which is too old for the project. Related: https://learn.microsoft.com/en-us/java/openjdk/reasons-to-move-to-java-11 https://github.com/actions/setup-java	2024-03-20 17:53:48 -07:00
Yufeng Li	15219e2e71	turn on neural_speed by default (#19627 ) ### Description The crash caused by neural_speed turns out to be a rare corner case. Turn it on by default.	2024-03-20 12:49:58 -07:00
Rachel Guo	6b305f95e0	Support xcframework for mac catalyst builds. (#19534 ) ### Motivation and Context MAUI on macOS uses mac-catalyst which requires a different native binary. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-03-20 10:55:19 -07:00
Adam Pocock	19ff4a6d6c	String Tensor SplitToSequence fix (#19942 )	2024-03-20 10:52:00 -07:00
Markus Tavenrath	0af5eacc8b	Fix broken Pooling CUDA NHWC Ops and ensure NCHW / NHWC parity. (#19889 ) ### Description Fixed all CUDA NHWC Pooling operations which were broken and enabled the NHWC CUDA pooling tests. Disabled all pooling tests which are not supported by the CUDA EP. ### Motivation and Context Ensure parity between CUDA NHWC / NCHW and work towards 100% tests enabled for the CUDA EP / CUDA NHWC EP. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2024-03-20 09:57:29 -07:00
Yi Zhang	8adbc09314	[Fix] Error Python Packaging Pipeline (Training CPU) (#19992 ) ### Description fix the error caused by https://github.com/microsoft/onnxruntime/pull/19973	2024-03-20 09:02:50 -07:00
zesongw	7e18cb4c35	[WebNN EP] Support MatMul 1D (#19862 ) ### Description Support MatMul 1D inputs by combining Reshape and ReduceMean. ### Motivation and Context ONNX MatMul can support 1D inputs, which is disabled in `IsOpSupportedImpl`.	2024-03-20 08:32:57 -07:00
Ye Wang	6ff31e06d5	[MoE] Add TP and Mixtral MoE (#19945 ) ### Description 1. Support Tensor Parallelism in ShardedMoE. 2. Make necessary code changes to support Mixtral MoE. 3. Fix a bug related to using IOBinding in the test script. 4. Fix the input size limitation.	2024-03-19 21:28:15 -07:00
mindest	3dfe4a5e6d	[ROCm] Remove MPI dependency and collectives to use NCCL (#19830 ) ### Description * Remove MPI dependency to use NCCL AllReduce, etc. * Exclude unsupported collectives in hipify	2024-03-19 17:35:18 -07:00
Abhishek Jindal	6fe02068af	Add const cast for DLManagedTensor (#19982 ) ### Description Add a const_cast for DLManagedTensor, as PyTorch has changed its [code](https://github.com/pytorch/pytorch/pull/121102), which creates an incompatibility. ### Motivation and Context Fix the error below, which occurs while configuring ORT-training with nightly PyTorch: ``` aten_op_executor.cc:60:40: error: invalid conversion from ‘const DLManagedTensor’ to ‘DLManagedTensor’ [-fpermissive] 60 | at::Tensor tensor = at::fromDLPack(dlpack); | ^~~~~~ | | | const DLManagedTensor* ```	2024-03-19 17:00:44 -07:00
Guenther Schmuelling	c45cff60cf	[js/webgpu] fix maxpool / fp16 (#19981 )	2024-03-19 16:15:49 -07:00
Tianlei Wu	597e828aae	Adjust test tolerance (#19947 ) ### Description Improve the precision of tests. Changes include: (1) Update checkers.cc to use a consistent default tolerance. (2) Allow different default tolerances for different providers at runtime (previously, the threshold of a test was decided at compile time). (3) Explicitly set absolute and relative error tolerances for tests that failed to pass the new default threshold. #### Default Thresholds Change Note that the test formula is `abs(expected - value) < absolute + relative * expected`. Default test thresholds when neither absolute nor relative tolerance is set:

| type | provider | absolute (before) | absolute (after) | relative (before) | relative (after) |
| -- | -- | -- | -- | -- | -- |
| double | CPU | 0.001 | 0.00001 | 0 | 0.00001 |
| double | CUDA | 0.005 | 0.00001 | 0 | 0.00001 |
| double | TRT | 0.005 | 0.00001 | 0 | 0.00001 |
| double | ROCM | 0.005 | 0.00001 | 0 | 0.00001 |
| double | DML | 0.005 | 0.00001 | 0 | 0.00001 |
| float | CPU | 0.0001 | 0.00001 | 0 | 0.0001 |
| float | CUDA | 0.005 | 0.00001 | 0 | 0.0001 |
| float | TRT | 0.005 | 0.00001 | 0 | 0.0001 |
| float | ROCM | 0.005 | 0.00001 | 0 | 0.0001 |
| float | DML | 0.005 | 0.00001 | 0 | 0.0001 |
| float | Training* | 0.005 | 0.001 | 0 | 0.0001 |
| half | CPU | 0.001 | 0.0025 | 0 | 0.001 |
| half | CUDA | 0.005 | 0.0025 | 0 | 0.001 |
| half | TRT | 0.005 | 0.0025 | 0 | 0.001 |
| half | ROCM | 0.005 | 0.0025 | 0 | 0.001 |
| half | DML | 0.02 | 0.005 | 0 | 0.001 |
| half | Training* | 0.005 | 0.005 | 0 | 0.001 |
| bfloat16 | CPU | 0.0001 | 0.02 | 0 | 0.01 |
| bfloat16 | CUDA | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | TRT | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | ROCM | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | DML | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | Training* | 0.0001 | 0.02 | 0.05 | 0.01 |

*Training means that the build flag ENABLE_TRAINING_CORE is defined. The provider can be any one.

#### Threshold for provider Previously, the threshold might change according to build flags:
```
#if defined(USE_CUDA) || defined(USE_ROCM) || defined(USE_DML)
constexpr float threshold = 0.005f;
#else
constexpr float threshold = 0.0001f;
#endif
```
For a CPU-only build, the threshold is 0.0001. For a CUDA build, the threshold for the CPU provider (some tests in a CUDA build actually run with the CPU provider) is changed to 0.005. After this change, the threshold depends only on the data type and the provider used in the test. It will not change with build flags for non-training builds. Default thresholds for training might be different from inference (please refer to the table above). There are a few factors here: training has gradient outputs; TF32 is not disabled in training; some training tests have iterations, and error might accumulate. How to set different thresholds based on these factors could be a future task.	2024-03-19 15:50:13 -07:00
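The tolerance formula quoted in the commit above can be sketched as a small check. This is an illustration, not the code in checkers.cc: the function name `within_tolerance` is hypothetical, and using `abs(expected)` in the relative term is an assumption made so that the bound stays non-negative for negative expected values (the formula as quoted uses `expected` directly).

```python
def within_tolerance(expected, value, absolute, relative):
    """Return True if value is close enough to expected.

    Follows the quoted formula; abs(expected) is an assumption of this sketch.
    """
    return abs(expected - value) < absolute + relative * abs(expected)


# Default thresholds for float on the CPU provider, taken from the table
# in the commit message.
OLD_ABS, OLD_REL = 0.0001, 0.0
NEW_ABS, NEW_REL = 0.00001, 0.0001

# A small-magnitude result: the purely absolute old bound was the looser one.
print(within_tolerance(0.01, 0.01005, OLD_ABS, OLD_REL))  # True
print(within_tolerance(0.01, 0.01005, NEW_ABS, NEW_REL))  # False

# A large-magnitude result: the new relative term scales the bound up.
print(within_tolerance(100.0, 100.005, OLD_ABS, OLD_REL))  # False
print(within_tolerance(100.0, 100.005, NEW_ABS, NEW_REL))  # True
```

The two cases show why the change is not simply tighter or looser: the new defaults are stricter near zero and more permissive for large values.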
Hariharan Seshadri	cd6ec50b50	Switch a portion of CI/packaging jobs to MacOS12 (#19908 )	2024-03-19 14:54:58 -07:00
Adrian Lizarraga	18a7f34ba0	[NhwcTransformerTests] Fix linker error due to explicit template instantiation of ModelBuilder methods (#19980 ) Currently, the nhwc_transformer_test.cc compilation unit defines explicit FP16 versions of `ModelTestBuilder::MakeInput<MLFloat16>` and `ModelTestBuilder::MakeInitializer<MLFloat16>` outside of the ModelTestBuilder class's header file. These explicit template instantiations cause linker errors when other compilation units also instantiate these functions due to duplicate definitions. Additionally, the versions defined in nhwc_transformer_test.cc do not really conform to the expected behavior in the original ModelTestBuilder class, which is to make random input/initializer values. Instead, the versions in nhwc_transformer_test.cc create a range of values. The solution is to edit nhwc_transformer_test.cc to use stand-alone static functions that do not change the ModelTestBuilder class. Note: This linker error cannot currently be replicated in our CIs because it requires a QNN-HTP-enabled Windows ARM64 environment with `MLAS_F16VEC_INTRINSICS_SUPPORTED` defined. I can replicate it on a local build. The linker error/conflict happens with this new FP16 QNN test: `d4c8bc359e/onnxruntime/test/providers/qnn/clip_op_test.cc (L186)`	2024-03-19 13:48:04 -07:00
Yulong Wang	01c7aaf6aa	[js/webgpu] allow setting env.webgpu.adapter (#19940 ) ### Description Allow user to set `env.webgpu.adapter` before creating the first inference session. Feature request: https://github.com/microsoft/onnxruntime/pull/19857#issuecomment-1999984753 @xenova	2024-03-19 12:55:00 -07:00
Tianlei Wu	8293aa1564	Exclude TRT provider in tests crashed in A100 (#19972 ) TensorRT EP segmentation fault on A100 for some tests. Exclude TRT EP in those tests on A100 to unblock developing. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/19530	2024-03-19 11:36:42 -07:00
Yi Zhang	d4c8bc359e	Fix Training CPU docker image name to avoid unnecessary rebuilding (#19973 ) ### Description The docker image name was fixed, but the docker argument differed between jobs. This triggered a rebuild of the docker image almost every time!	2024-03-19 09:33:24 -07:00
Prathik Rao	26cd3c1fb0	add kernel tests for ops that changed in opset18 (#19767 ) ### Description <!-- Describe your changes. --> - [x] Pad operator has introduced a new input called "axes" which specifies which axis to pad. But it defaults to input_rank if axes is not provided which was the behavior before the opset upgrade. - [x] ReduceMean - [x] ReduceL2 - [x] ReduceLogSumExp - [x] ReduceSum - Reduction ops all had the axes attribute switched to an input and a new attribute called "noop_with_empty_axes" was added to define what to do when axes is not specified. - [x] Resize has had two new attributes introduced: antialias and keep_aspect_ratio_policy. From Operators.md I've gathered: "Antialiasing is achieved by stretching the resampling filter by a factor max(1, 1 / scale), which means that when downsampling, more input pixels contribute to an output pixel." keep_aspect_ratio_policy "describes how to interpret the `sizes` input with regard to keeping the original aspect ratio of the input." there are a couple enum-type options that specify different policies and what to do in each case. - NOTE: Baiju already included opset18 tests in https://github.com/microsoft/onnxruntime/pull/17772 - [x] ScatterElements/ScatterND has had a new attribute introduced called "reduction." This specifies the type of reduction to apply: none (default), add, mul, max, min. - [x] Split introduced a new attribute called "num_outputs" which specifies how many outputs to split the input tensor into. This is in contrast to the previous, default behavior of specifying a "split" input which defines the size of each resultant tensor of the output. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-19 09:33:06 -07:00
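The `reduction` attribute described for ScatterElements in the commit above can be sketched for the one-dimensional case. This is a simplified illustration in plain Python, not the ORT kernel or its tests; the function name `scatter_elements_1d` is hypothetical, and only a single axis is handled.

```python
import operator

# Reduction modes named in the commit message (besides the default "none").
REDUCERS = {
    "add": operator.add,
    "mul": operator.mul,
    "max": max,
    "min": min,
}


def scatter_elements_1d(data, indices, updates, reduction="none"):
    """Scatter updates into a copy of data at the given indices (1-D only)."""
    out = list(data)
    for idx, upd in zip(indices, updates):
        if reduction == "none":
            # Plain overwrite: the update replaces the existing element.
            out[idx] = upd
        else:
            # Combine the existing element with the update.
            out[idx] = REDUCERS[reduction](out[idx], upd)
    return out


data = [1.0, 2.0, 3.0]
indices = [0, 0, 2]
updates = [10.0, 20.0, 5.0]

print(scatter_elements_1d(data, indices, updates, "add"))
print(scatter_elements_1d(data, indices, updates, "max"))
```

With a reduction, repeated indices such as the two writes to position 0 accumulate into one result instead of overwriting each other.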
Xu Xing	4c6a6a37f7	[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387 ) The added case will be NAN because of the un-initialized buffer.	2024-03-18 22:59:32 -07:00
Ted Themistokleous	6bb64683f8	Use version instead of version-dev for ROCm (#19967 )	2024-03-19 10:40:40 +08:00
Guenther Schmuelling	a4ac727cbb	handle fp16 for where op (#19969 ) this prevents falling back from webgpu to cpu, aka helps performance	2024-03-18 13:42:51 -07:00
Tianlei Wu	141966bb69	Disable TF32 in tests of CUDA ep (#19963 ) Operator or model test results shall not depend on whether the NVIDIA_TF32_OVERRIDE environment variable is set. This makes test results more deterministic.	2024-03-18 11:17:34 -07:00
Dmitri Smirnov	a033df8c31	Implement CustomOp Output Type Inference function (#19906 ) ### Description <!-- Describe your changes. --> This change addresses the following issues with the current CustomOP Output Type inference - The function does not take into account optional inputs. When input is absent the inference is silently aborted, and no output type is inferred (P1 customer issue) - Inferring output type based on the input type for multi-kernel custom ops is done based on the latest in sequence kernel definition. There is not an attempt made to match the kernel based on the input type. - Inference is aborted when variadic inputs/outputs are detected when the generated input/output names fail to obtain type constraints. This is not immediately clear from the code, because custom op schema is not available within the inference function. - No error reporting. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Most of CustomOPs lack their own type and shape inference function as it was recently introduced. For that reason, it is important to fix this. This change is inspired by a customer issue. This is a follow up on: - https://github.com/microsoft/onnxruntime/pull/15184 - https://github.com/cbourjau/ort-custom-op/pull/11 - https://github.com/microsoft/onnxruntime-extensions/issues/451	2024-03-18 10:28:39 -07:00
Edward Chen	4d31076d68	[objc] Add check for ORTValue being a tensor in ORTValue methods that should only be used with tensors. (#19946 ) Add check to report error instead of crashing.	2024-03-18 08:54:24 -07:00
Guenther Schmuelling	7e0d424934	accumulate in fp32 for Reduce* (#19868 )	2024-03-18 08:28:43 -07:00
dependabot[bot]	28ad6c3955	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="`35a517c586`"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="`c4f847f851`"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="`8526b4a1b2`"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="`b1677ce001`"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="`d8914f7982`"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:54:53 -07:00
dependabot[bot]	4e55242a30	Bump follow-redirects from 1.15.4 to 1.15.6 in /onnxruntime/test/wasm (#19950 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="`35a517c586`"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="`c4f847f851`"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="`8526b4a1b2`"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="`b1677ce001`"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="`d8914f7982`"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:54:06 -07:00
dependabot[bot]	afdab62f53	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. Commits: - `35a517c` Release version 1.15.6 of the npm package. - `c4f847f` Drop Proxy-Authorization across hosts. - `8526b4a` Use GitHub for disclosure. - `b1677ce` Release version 1.15.5 of the npm package. - `d8914f7` Preserve fragment in responseUrl. - See full diff in [compare view](https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6). 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:53:17 -07:00
wangshuai09	1eb67a07ca	Add cann_dependencies (#19929 ) ### Description Add `cann_dependencies`. ### Motivation and Context The previous [PR](https://github.com/microsoft/onnxruntime/pull/17365) avoided using patchelf but lost `cann_dependencies`. This PR adds `cann_dependencies` to avoid requiring the CANN libraries when repairing the wheel.	2024-03-15 20:28:43 -07:00
Yulong Wang	b29849a287	[js/common] fix typedoc warnings (#19933 ) ### Description Fix a few warnings in typedoc (for generating JS API): ``` [warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used. [warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation. [warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation. [warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation. [warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation. [warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device. ``` Changes highlighted: - Merge `CoreMlExecutionProviderOption` and `CoreMLExecutionProviderOption`. They expose two different sets of options for React Native and the ORT Node.js binding. This should be fixed in the future. - Fix a few naming inconsistencies between JSDoc and parameters - Fix broken type links - Exclude trace functions	2024-03-15 19:01:50 -07:00
Belem Zhang	acb0df2280	Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932 ) ### Description Fix #19931: the "Get Started" link on the "ONNX Runtime JavaScript API" page returned HTTP 404. Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-03-15 19:00:30 -07:00
Hector Li	d5c6a2cecf	Enable code in QNN UT to verify the fix for partition issue (#19939 ) ### Description Enable code in QNN UT to verify the fix for the partition issue related to QDQ models. https://github.com/microsoft/onnxruntime/pull/19723	2024-03-15 17:02:01 -07:00
enximi	7b46b31558	fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime sup… (#19845 ) fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only." ### Description Include Windows 11 in the version check. With this change, the warning "Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only." no longer appears. ### Motivation and Context On Windows 11, the warning states that only Windows 10 and above are supported, which is misleading because Windows 11 satisfies that requirement.	2024-03-15 12:41:44 -07:00
Yulong Wang	79e50aeef3	[js/web] rewrite backend resolve to allow multiple EPs (#19735 ) ### Description This PR rewrites the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web carried over some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), including the "backend" concept. The original "backend" in ONNX.js was designed on the assumption that only one backend from the user's backend hint list would be used. For example, in ONNX.js, if the user specifies a backend hint of `['webgl', 'wasm']`, ONNX.js first tries the WebGL backend: if it loads successfully (the browser supports WebGL), the "webgl" backend is used and "wasm" is ignored; otherwise, "webgl" is ignored and ONNX.js tries to load the "wasm" backend. In short: only one backend is used when initializing a session. #### Execution Provider An Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allowed to specify multiple EPs, and if one does not support a particular kernel, execution can fall back to another EP. This is very common when using a GPU EP in ONNX Runtime. #### Current Status: Backend vs. EP For the historical reasons mentioned above, the current status is quite confusing. There are real backends, which are distinct implementations in code; there are backend hints, which are string names used to select a backend; and there are EPs in the ONNX Runtime sense. Currently there are only 2 backends in our code base: the "onnxjs backend" and the "wasm backend". The "onnxjs backend" currently only powers the backend hint "webgl", which goes into the old onnx.js code path. All other backend hints, including "wasm", "cpu" (alias of wasm), "webgpu" and "webnn", are powered by the "wasm backend". And because ORT Web treats "backend" as an internal concept and wants to align with ONNX Runtime, those backend hint names are becoming EP names. 
The following table shows today's status:

| Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT |
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP |
| "webgl" | OnnxjsBackend | \* technically not an EP |
| "webgpu" | WasmBackend | JSEP |
| "webnn" | WasmBackend | WebNN EP |

#### Problem While the API allows specifying multiple EPs, backend resolution only allows one backend. When a user specifies multiple EP names in session options, the backend resolve behavior and the EP registration behavior are therefore inconsistent. Specifically, in this issue: https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908: the EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to the 'wasm' backend, but the full EP list is still passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need the WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of only doing that for the first successful one. - for the first resolved backend, keep only the EPs that use that exact backend, and remove all EPs not using it from session options - for every explicitly specified EP that is removed, show a warning message in the console	2024-03-15 11:47:45 -07:00
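
The three changes listed in the Solution of #19735 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `EP_TO_BACKEND`, `ResolveResult`, `resolveBackendAndEps` and the `isAvailable` callback are hypothetical names, the EP-to-backend mapping is copied from the table in the commit message, and the sketch assumes that an EP whose backend fails to initialize is dropped along with the EPs that belong to a different backend.

```typescript
// Hypothetical sketch; none of these names come from the onnxruntime-web source.
type BackendName = 'WasmBackend' | 'OnnxjsBackend';

// EP name -> backend that powers it, as listed in the commit message table.
const EP_TO_BACKEND: Record<string, BackendName> = {
  wasm: 'WasmBackend',
  cpu: 'WasmBackend',
  webgl: 'OnnxjsBackend',
  webgpu: 'WasmBackend',
  webnn: 'WasmBackend',
};

interface ResolveResult {
  backend: BackendName; // the first backend that resolved
  eps: string[]; // EPs kept in session options
  warnings: string[]; // one message per explicitly requested EP that was removed
}

// `isAvailable` stands in for real backend initialization (an assumption of this sketch).
function resolveBackendAndEps(
  requestedEps: string[],
  isAvailable: (ep: string) => boolean,
): ResolveResult {
  // 1. Try every requested EP instead of stopping at the first successful one.
  const usable = requestedEps.filter(
    (ep) => Object.prototype.hasOwnProperty.call(EP_TO_BACKEND, ep) && isAvailable(ep),
  );
  if (usable.length === 0) {
    throw new Error(`no available backend found for: ${requestedEps.join(', ')}`);
  }

  // 2. The first resolved backend wins; keep only the EPs powered by that backend.
  const backend = EP_TO_BACKEND[usable[0]];
  const eps = usable.filter((ep) => EP_TO_BACKEND[ep] === backend);

  // 3. Collect a warning for every explicitly requested EP that was removed.
  const warnings = requestedEps
    .filter((ep) => eps.indexOf(ep) === -1)
    .map((ep) => `removing requested execution provider "${ep}" from session options`);

  return { backend, eps, warnings };
}

// The scenario from the linked issue: a browser without WebGPU support.
const noWebGpu = resolveBackendAndEps(['webgpu', 'wasm'], (ep) => ep !== 'webgpu');
for (const message of noWebGpu.warnings) {
  console.warn(message);
}
```

In this sketch, `['webgpu', 'wasm']` on a browser without WebGPU keeps only `'wasm'` in the EP list, so a JSEP-style provider would no longer be requested for a backend that cannot serve it.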


10780 commits