onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-09 00:30:53 +00:00

Author	SHA1	Message	Date
Changming Sun	dafbef3a21	CMake: support reading dependency zip files from a local mirror (#20005 ) ### Description To test this feature, run ```bat python cmake\deps_update_and_upload.py --root-path mirror ``` Then run build.py as usual. The zip files will be cached local. To avoid being downloaded again and again.	2024-03-21 17:58:59 -07:00
TP Boudreau	983fd8393a	Recognize NaN operands in Min and Max ops (#19984 ) ### Description Update the Min and Max CUDA math operations on float/double types to propagate NaNs: if either operand is NaN, the result should be NaN. TODO: float16/bfloat16 need similar change. ### Motivation Currently, results differ between the CPU and CUDA implementations of the floating point Min and Max operators: the CPU operators correctly return NaN results if either operand is NaN. This PR updates the CUDA implementations to conform with this correct behavior. See the the issue and comments raised [here](https://github.com/onnx/onnx/issues/6003). ### Context Same behavior in numpy, torch and Java: ``` >>> numpy.min([numpy.NAN, 1]) nan >>> numpy.max([numpy.NAN, 1]) nan >>> torch.min(torch.tensor([1, float('nan')])) tensor(nan) >>> torch.max(torch.tensor([1, float('nan')])) tensor(nan) ``` C languguage [fmin](https://en.cppreference.com/w/c/numeric/math/fmin) and [fmax](https://en.cppreference.com/w/c/numeric/math/fmax) has different behavior: ``` fmax(NaN,1) = 1 fmin(NaN,1) = 1 ``` https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf ![image](https://github.com/microsoft/onnxruntime/assets/30328909/62446cf1-f252-4ddc-8118-5ce605252331) https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf	2024-03-21 16:08:18 -07:00
Yi Zhang	30a0d80925	Fix exception in Publish unit test results step (#20007 ) ### Description Test results files are all in RelWithDebInfo\RelWithDebInfo directory. It's not necessary to stat the directory of _deps ### Motivation and Context Recently this exception in zip-nuget pipleine occurs many times. `##[error]Error: Failed find: EPERM: operation not permitted, stat 'D:\a\_work\1\b\RelWithDebInfo\_deps\flatbuffers-src\java\src\test\java\DictionaryLookup'` https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=426981&view=logs&j=75fc0348-fe99-522b-3acb-90fd80ac5271&t=5d4ebcc1-bcde-574d-6f4e-8abd0f04ae4b	2024-03-22 06:53:59 +08:00
Tianlei Wu	06fe4f3113	Increase MNIST test tolerance (#20000 ) ### Description Found multiple occurrence of failures: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1321061&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=56a04c0b-9e7f-5c69-cb7b-c2a7b1a7392a&l=17537 https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1329701&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=4f6ef737-111d-50d1-a46b-5f86d9a970bc&s=3618b4c0-1011-591a-85b8-671e72e2cff1 1: [ RUN ] ModelTests/ModelTest.Run/ cuda__models_zoo_opset7_MNIST_model 1: D:\a\_work\1\s\onnxruntime\test\providers\cpu\model_tests.cc(358): error: Expected equality of these values: 1: COMPARE_RESULT::SUCCESS 1: Which is: 4-byte object <00-00 00-00> 1: ret.first 1: Which is: 4-byte object <01-00 00-00> 1: expected -2.33638 (c0158735), got -2.30239 (c0135a47), diff: 0.0339923, tol=0.0243638 idx=9	2024-03-20 23:40:27 -07:00
Prathik Rao	0b958bb421	add random seed to layernorm tests (#19998 ) Adds random seed to layernorm tests to prevent random failure. ### Motivation and Context Fixes https://github.com/microsoft/onnxruntime/issues/19983	2024-03-20 21:00:25 -07:00
Yi Zhang	175f149b30	Remove downloading deps in CUDA package test stage (#19993 ) ### Description <!-- Describe your changes. --> ### Motivation and Context downloading deps is not needed in test stage remove it to reduce random downloading errors	2024-03-21 10:01:03 +08:00
Justin Chu	0335ea9f1e	Use Java 11 to build project in the codeql pipeline (#19999 ) Codeql uses Java 8 by default, which is too old for the project. Related: https://learn.microsoft.com/en-us/java/openjdk/reasons-to-move-to-java-11 https://github.com/actions/setup-java	2024-03-20 17:53:48 -07:00
Yufeng Li	15219e2e71	turn on neural_speed by default (#19627 ) ### Description <!-- Describe your changes. --> the crash caused by the neural_speed turns out to be a very corn case. Turn it on by default. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-20 12:49:58 -07:00
Rachel Guo	6b305f95e0	Support xcframework for mac catalyst builds. (#19534 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> MAUI on macOS uses mac-catalyst which requires a different native binary. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-03-20 10:55:19 -07:00
Adam Pocock	19ff4a6d6c	String Tensor SplitToSequence fix (#19942 )	2024-03-20 10:52:00 -07:00
Markus Tavenrath	0af5eacc8b	Fix broken Pooling CUDA NHWC Ops and ensure NCHW / NHWC parity. (#19889 ) ### Description Fixed all CUDA NHWC Pooling operations which were broken and enabled the NHWC CUDA pooling tests. Disabled all pooling tests which are not supported by the CUDA EP. ### Motivation and Context Ensure parity between CUDA NHWC / NCHW and work towards 100% tests enabled for the CUDA EP / CUDA NHWC EP. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2024-03-20 09:57:29 -07:00
Yi Zhang	8adbc09314	[Fix] Error Python Packaging Pipeline (Training CPU) (#19992 ) ### Description fix the error caused by https://github.com/microsoft/onnxruntime/pull/19973	2024-03-20 09:02:50 -07:00
zesongw	7e18cb4c35	[WebNN EP] Support MatMul 1D (#19862 ) ### Description Support MatMul 1D inputs by combining Reshape and ReduceMean. ### Motivation and Context ONNX MatMul can support 1D inputs, which is disabled in `IsOpSupportedImpl`.	2024-03-20 08:32:57 -07:00
Ye Wang	6ff31e06d5	[MoE] Add TP and Mixtral MoE (#19945 ) ### Description <!-- Describe your changes. --> 1.Support Tensor Parallelism in ShardedMoE. 2.Make necessary code changes to support Mixtral MoE. 3.Fix a bug related to using IOBinding in test script. 4.Fix the input size limitation ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-19 21:28:15 -07:00
mindest	3dfe4a5e6d	[ROCm] Remove MPI dependency and collectives to use NCCL (#19830 ) ### Description * Remove MPI dependency to use NCCL AllReduce, etc. * Exclude unsupported collectives in hipify	2024-03-19 17:35:18 -07:00
Abhishek Jindal	6fe02068af	Add const cast for DLManagedTensor (#19982 ) ### Description <!-- Describe your changes. --> Add Const Cast for DLManagedTensor as PyTorch has changed it's [code](https://github.com/pytorch/pytorch/pull/121102) which creates incompatibility. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix the below error while configuring ORT-training with nightly PyTorch ``` aten_op_executor.cc:60:40: error: invalid conversion from ‘const DLManagedTensor’ to ‘DLManagedTensor’ [-fpermissive] 60 \| at::Tensor tensor = at::fromDLPack(dlpack); \| ^~~~~~ \| \| \| const DLManagedTensor* ```	2024-03-19 17:00:44 -07:00
Guenther Schmuelling	c45cff60cf	[js/webgpu] fix maxpool / fp16 (#19981 )	2024-03-19 16:15:49 -07:00
Tianlei Wu	597e828aae	Adjust test tolerance (#19947 ) ### Description Improve the precision of tests. Changes include: (1) Update checkers.cc to use consistent default tolerance. (2) Allow different default tolerances for different providers at runtime (Previously, threshold of a test is decided during compiling). (3) Explicitly set absolute and relative error tolerances for tests that failed to pass new default threshold. #### Default Thresholds Change Note that the formula of testing is `abs(expected - value) < absolute + relative * expected` Default test thresholds when both absolute and relative tolerance are not set: type \| provider \| absolute (before) \| absolute (after) \| relative (before) \| relative (after) -- \| -- \| -- \| -- \| -- \| -- double \| CPU \| 0.001 \| 0.00001 \| 0 \| 0.00001 double \| CUDA \| 0.005 \| 0.00001 \| 0 \| 0.00001 double \| TRT \| 0.005 \| 0.00001 \| 0 \| 0.00001 double \| ROCM \| 0.005 \| 0.00001 \| 0 \| 0.00001 double \| DML \| 0.005 \| 0.00001 \| 0 \| 0.00001 \| \| \| \| \| float \| CPU \| 0.0001 \| 0.00001 \| 0 \| 0.0001 float \| CUDA \| 0.005 \| 0.00001 \| 0 \| 0.0001 float \| TRT \| 0.005 \| 0.00001 \| 0 \| 0.0001 float \| ROCM \| 0.005 \| 0.00001 \| 0 \| 0.0001 float \| DML \| 0.005 \| 0.00001 \| 0 \| 0.0001 float \| Training* \| 0.005 \| 0.001 \| 0 \| 0.0001 \| \| \| \| \| half \| CPU \| 0.001 \| 0.0025 \| 0 \| 0.001 half \| CUDA \| 0.005 \| 0.0025 \| 0 \| 0.001 half \| TRT \| 0.005 \| 0.0025 \| 0 \| 0.001 half \| ROCM \| 0.005 \| 0.0025 \| 0 \| 0.001 half \| DML \| 0.02 \| 0.005 \| 0 \| 0.001 half \| Training* \| 0.005 \| 0.005 \| 0 \| 0.001 \| \| \| \| \| bfloat16 \| CPU \| 0.0001 \| 0.02 \| 0 \| 0.01 bfloat16 \| CUDA \| 0.0001 \| 0.02 \| 0.05 \| 0.01 bfloat16 \| TRT \| 0.0001 \| 0.02 \| 0.05 \| 0.01 bfloat16 \| ROCM \| 0.0001 \| 0.02 \| 0.05 \| 0.01 bfloat16 \| DML \| 0.0001 \| 0.02 \| 0.05 \| 0.01 bfloat16 \| Training* \| 0.0001 \| 0.02 \| 0.05 \| 0.01 *Training mean a build flag ENABLE_TRAINING_CORE is defined. The provider can be any one. #### Threshold for provider Previously, the threshold might change according to build flags: ``` #if defined(USE_CUDA) \|\| defined(USE_ROCM) \|\| defined(USE_DML) constexpr float threshold = 0.005f; #else constexpr float threshold = 0.0001f; #endif ``` For a cpu only build, the threshold is 0.0001. For a cuda build, the threshold for CPU provider (some tests in cuda build actually run with CPU provider) is changed to 0.005. After this change, the threshold only depends on data type and provider used in the test. It will not change by build flags for non-training builds. Default thresholds for training might be different from inference (please refer to the above table). There are a few factors there: Training has gradient outputs; TF32 is not disabled in training; Some training tests has iterations, and error might accumulate. How to set different thresholds based on these factors could be a future task.	2024-03-19 15:50:13 -07:00
Hariharan Seshadri	cd6ec50b50	Switch a portion of CI/packaging jobs to MacOS12 (#19908 )	2024-03-19 14:54:58 -07:00
Adrian Lizarraga	18a7f34ba0	[NhwcTransformerTests] Fix linker error due to explicit template instantiation of ModelBuilder methods (#19980 ) Currently, the nhwc_transformer_test.cc compilation unit defines explicit FP16 versions of `ModelTestBuilder::MakeInput<MLFloat16>` and `ModelTestBuilder::MakeInitializer<MLFloat16>` outside of the ModelTestBuilder class's header file. These explicit template instantiations cause linker errors when other compilation units also instantiate these functions due to duplicate definitions. Additionally, the versions defined in nhwc_transformer_test.cc do not really conform to the expected behavior in the original ModelTestBuilder class, which is to make random input/initializer values. Instead, the versions in nhwc_transformer_test.cc create a range of values. The solution is to edit nhwc_transformer_test.cc to use stand-alone static functions that do not change the ModelTestBuilder class. Note: This linker error cannot currently be replicated in our CIs because it requires a QNN-HTP-enabled Windows ARM64 environment with `MLAS_F16VEC_INTRINSICS_SUPPORTED` defined. I can replicate on a local build. The linker error/conflict happens with with this new FP16 QNN test: `d4c8bc359e/onnxruntime/test/providers/qnn/clip_op_test.cc (L186)`	2024-03-19 13:48:04 -07:00
Yulong Wang	01c7aaf6aa	[js/webgpu] allow setting env.webgpu.adapter (#19940 ) ### Description Allow user to set `env.webgpu.adapter` before creating the first inference session. Feature request: https://github.com/microsoft/onnxruntime/pull/19857#issuecomment-1999984753 @xenova	2024-03-19 12:55:00 -07:00
Tianlei Wu	8293aa1564	Exclude TRT provider in tests crashed in A100 (#19972 ) TensorRT EP segmentation fault on A100 for some tests. Exclude TRT EP in those tests on A100 to unblock developing. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/19530	2024-03-19 11:36:42 -07:00
Yi Zhang	d4c8bc359e	Fix Training CPU docker image name to avoid unnecessary rebuilding (#19973 ) ### Description The docker image name was fixed, but the docker argument was different in different job. It would trigger rebuilding the docker image almost every time!!!	2024-03-19 09:33:24 -07:00
Prathik Rao	26cd3c1fb0	add kernel tests for ops that changed in opset18 (#19767 ) ### Description <!-- Describe your changes. --> - [x] Pad operator has introduced a new input called "axes" which specifies which axis to pad. But it defaults to input_rank if axes is not provided which was the behavior before the opset upgrade. - [x] ReduceMean - [x] ReduceL2 - [x] ReduceLogSumExp - [x] ReduceSum - Reduction ops all had the axes attribute switched to an input and a new attribute called "noop_with_empty_axes" was added to define what to do when axes is not specified. - [x] Resize has had two new attributes introduced: antialias and keep_aspect_ratio_policy. From Operators.md I've gathered: "Antialiasing is achieved by stretching the resampling filter by a factor max(1, 1 / scale), which means that when downsampling, more input pixels contribute to an output pixel." keep_aspect_ratio_policy "describes how to interpret the `sizes` input with regard to keeping the original aspect ratio of the input." there are a couple enum-type options that specify different policies and what to do in each case. - NOTE: Baiju already included opset18 tests in https://github.com/microsoft/onnxruntime/pull/17772 - [x] ScatterElements/ScatterND has had a new attribute introduced called "reduction." This specifies the type of reduction to apply: none (default), add, mul, max, min. - [x] Split introduced a new attribute called "num_outputs" which specifies how many outputs to split the input tensor into. This is in contrast to the previous, default behavior of specifying a "split" input which defines the size of each resultant tensor of the output. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-19 09:33:06 -07:00
Xu Xing	4c6a6a37f7	[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387 ) The added case will be NAN because of the un-initialized buffer.	2024-03-18 22:59:32 -07:00
Ted Themistokleous	6bb64683f8	Use version instead of version-dev for ROCm (#19967 )	2024-03-19 10:40:40 +08:00
Guenther Schmuelling	a4ac727cbb	handle fp16 for where op (#19969 ) this prevents falling back from webgpu to cpu, aka helps performance	2024-03-18 13:42:51 -07:00
Tianlei Wu	141966bb69	Disable TF32 in tests of CUDA ep (#19963 ) Operator or model test result shall not depend on whether NVIDIA_TF32_OVERRIDE environment variable is set or not. This make test results more deterministic.	2024-03-18 11:17:34 -07:00
Dmitri Smirnov	a033df8c31	Implement CustomOp Output Type Inference function (#19906 ) ### Description <!-- Describe your changes. --> This change addresses the following issues with the current CustomOP Output Type inference - The function does not take into account optional inputs. When input is absent the inference is silently aborted, and no output type is inferred (P1 customer issue) - Inferring output type based on the input type for multi-kernel custom ops is done based on the latest in sequence kernel definition. There is not an attempt made to match the kernel based on the input type. - Inference is aborted when variadic inputs/outputs are detected when the generated input/output names fail to obtain type constraints. This is not immediately clear from the code, because custom op schema is not available within the inference function. - No error reporting. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Most of CustomOPs lack their own type and shape inference function as it was recently introduced. For that reason, it is important to fix this. This change is inspired by a customer issue. This is a follow up on: - https://github.com/microsoft/onnxruntime/pull/15184 - https://github.com/cbourjau/ort-custom-op/pull/11 - https://github.com/microsoft/onnxruntime-extensions/issues/451	2024-03-18 10:28:39 -07:00
Edward Chen	4d31076d68	[objc] Add check for ORTValue being a tensor in ORTValue methods that should only be used with tensors. (#19946 ) Add check to report error instead of crashing.	2024-03-18 08:54:24 -07:00
Guenther Schmuelling	7e0d424934	accumulate in fp32 for Reduce* (#19868 )	2024-03-18 08:28:43 -07:00
dependabot[bot]	28ad6c3955	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="`35a517c586`"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="`c4f847f851`"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="`8526b4a1b2`"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="`b1677ce001`"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="`d8914f7982`"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:54:53 -07:00
dependabot[bot]	4e55242a30	Bump follow-redirects from 1.15.4 to 1.15.6 in /onnxruntime/test/wasm (#19950 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="`35a517c586`"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="`c4f847f851`"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="`8526b4a1b2`"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="`b1677ce001`"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="`d8914f7982`"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:54:06 -07:00
dependabot[bot]	afdab62f53	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="`35a517c586`"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="`c4f847f851`"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="`8526b4a1b2`"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="`b1677ce001`"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="`d8914f7982`"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:53:17 -07:00
wangshuai09	1eb67a07ca	Add cann_dependencies (#19929 ) ### Description <!-- Describe your changes. --> Add `cann_dependencies` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> The previous [PR](https://github.com/microsoft/onnxruntime/pull/17365) avioded using patchelf but lost `cann_dependencies`, This PR adds `cann_dependencies` to avoid require cann libraries when repairing wheel.	2024-03-15 20:28:43 -07:00
Yulong Wang	b29849a287	[js/common] fix typedoc warnings (#19933 ) ### Description Fix a few warnings in typedoc (for generating JS API): ``` [warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used. [warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation. [warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation. [warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation. [warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation. [warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device. ``` Changes highlighted: - Merge `CoreMlExecutionProviderOption` and `CoreMLExecutionProviderOption`. They expose 2 set of different options for React-native and ORT nodejs binding. This should be fixed in future. - Fix a few inconsistency of names between JSDoc and parameters - Fix broken type links - Exclude trace functions	2024-03-15 19:01:50 -07:00
Belem Zhang	acb0df2280	Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932 ) ### Description Fix #19931 broken Get Started link HTTP 404 for "Get Started" link in "ONNX Runtime JavaScript API" page Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-03-15 19:00:30 -07:00
Hector Li	d5c6a2cecf	Enable code in QNN UT to verify the fix for partition issue (#19939 ) ### Description Enable code in QNN UT to verify the fix for partition issue relate to QDQ model. https://github.com/microsoft/onnxruntime/pull/19723	2024-03-15 17:02:01 -07:00
enximi	7b46b31558	fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime sup… (#19845 ) fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only." ### Description Include Windows 11 in the version check. Now, you will not see the warning “Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.” ### Motivation and Context Warning on Windows 11: Only supports systems above Windows 10, which is somewhat strange.	2024-03-15 12:41:44 -07:00
Yulong Wang	79e50aeef3	[js/web] rewrite backend resolve to allow multiple EPs (#19735 ) ### Description This PR rewrite the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web actually carried some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes the "backend" concept. The original "backend" in ONNX.js is designed in a way assuming there is only one backend from user's backend hint list will be used. For example, in ONNX.js, if user specify a backend hint as `['webgl', 'wasm']`, ONNX.js will first try to use WebGL backend - if it loads successfully (the browser supports webgl), then "webgl" backend will be used and "wasm" will be ignored; otherwise, "webgl" will be ignored and try to load "wasm" backend. In short: only one backend will be used when initializing a session. #### Execution Provider Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allow to specify multiple EPs, and if one does not support a particular kernel, it can fallback to other EP. This is a very common case when using a GPU EP in ONNX Runtime. #### Current Status: Backend v.s. EP Because of the history reasons mentioned above, the current status is quite confusing. There are real backends, which means it's different implementation in code; and there are backend hints, which are used as string names for backend hint; and there are EPs of the ONNX Runtime concepts. currently there are only 2 backends in our code base: The "onnxjs backend", and the "wasm backend". The "onnxjs backend" currently only powers backend hint "webgl", which go into the old onnx.js code path. All other backend hints including "wasm", "cpu"(alias to wasm), "webgpu" and "webnn" are all powered by "wasm backend". And because ORT Web treat "backend" as an internal concept and want to align with ONNX Runtime, so those names of backend hints are becoming EP names. The following table shows today's status: \| Execution Provider Name (public) / Backend Hint (internal) \| Backend \| EP in ORT \| -------- \| ------- \| ------- \| \| "wasm"/"cpu" \| WasmBackend \| CPU EP \| "webgl" \| OnnxjsBackend \| \* technically not an EP \| "webgpu" \| WasmBackend \| JSEP \| "webnn" \| WasmBackend \| WebNN EP #### Problem While the API allows to specify multiple EPs, the backend resolving only allows one backend. This causes issues when user specify multiple EP names in session options, the backend resolve behavior and EP registration behavior is inconsistent. Specifically, in this issue: https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908: EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of only do that for the first successful one. - for the first resolved backend, filter all EP using the exact same backend. Remove all EPs not using this backend from session options - for every explicitly specified EP, if it's removed, show a warning message in console	2024-03-15 11:47:45 -07:00
Yifan Li	0b2a75b274	[EP Perf] Add concurrency test (#19804 ) ### Description <!-- Describe your changes. --> * Add concurrency test to EP Perf CI panel (impl. by onnx_test_runner) * Model: FasterRCNN-10 model within CI image * `-c` param configurable via CI panel when kicking off CI tasks * Auto-replicate test input/outputs according to `-c` param * By default, the model test will be executed in 100 iterations (~2min added to T4 CI task load overall) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> To monitor potential concurrency issues of ORT-TRT	2024-03-15 07:41:21 -07:00
Hariharan Seshadri	42399dfd2b	Fix a potential race in the CUDA TopK kernel (#19917 ) ### Description If the `K` value is flowing through as a tensor, we are updating a mutable member of the `TopK` class and basing the compute off that - which is likely to cause data race issues with concurrent Run() calls and `K` value changes. ### Motivation and Context Fix potential race in CUDA TopK kernel	2024-03-14 18:13:47 -07:00
Justin Chu	bcf47d3546	Update install_deps_lort.sh to fix onnxscript installation (#19922 ) Install onnxscript correctly with `pip install`. Dev dependencies are not required. ### Motivation and Context Fix build breaks.	2024-03-14 17:05:50 -07:00
Adam Louly	32558134a9	[On-Device-Training] Upgrade Flatbuffers to Support 2GB+ Checkpoints. (#19770 ) ### Description Modifications to support 2GB+ checkpoint & Upgrading Flatbuffers ### Motivation and Context This PR includes changes that will make ort handle 2GB+ checkpoints. To do that we need to upgrade flatbuffers to 23.5.9 - https://github.com/google/flatbuffers/pull/7945 - Modified the commitHash and the hash for the new version - Removed the patch for rust generator's unused variable warning as it is no longer producing this - [Check it out here](`d121e09d89/src/idl_gen_rust.cpp`) - Updated the VerifyField calls with alignment values that were introduced in the new version. --------- Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2024-03-14 16:36:24 -07:00
Yi Zhang	87a9f77c56	Refactor Python Packaing Pipeline (Training Cuda 11.8) (#19910 ) ### Description 1. Use stage to organize the pipeline and split building and testing 2. Move compilation on CPU machine 3. test stage can leverage existing artifacts 4. check wheel size, it gives warning if the size above 300M 5. docker image name wasn't change even the argument changed, which caused the docker image was always rebuilt. So update the docker image name according to the argument can save the docker build time. Pipeline duration reduced by 60% (2 hours -> 50 minutes) Compilation time reduced by 75% (1.5hours -> 20 minutes) GPU time reduced by 87% ( 8 hours to 1 hours) for debugging, the GPU time could be reduced by above 95%, because we can choose run only one test stage and skip building. ### Motivation and Context Make the pipeline efficient. Optimized https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=424177&view=results Curent https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=422393&view=results ---------	2024-03-15 06:47:41 +08:00
Changming Sun	8b766bd24e	Change nuget pipeline's "Windows_Packaging_combined_GPU" job to download TRT binaries in every build (#19919 ) ### Description Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build. Now all the other build jobs are already doing this. This is the only one left. Similar to #19909 ### Motivation and Context As a follow up of #19118	2024-03-14 15:07:56 -07:00
Tianlei Wu	a2ffc3740b	[Cuda] Demo multiple cuda graphs and user compute stream (#19883 ) Update stable diffusion demo to add options `--max-cuda-graphs` and `--user-compute-stream`. * Add python class GpuBindingManager to manage IO Binding based on input shape and max number of cuda graphs setting. The benefit is that one inference session could enable or disable cuda graph in different runs. * When `--user-compute-stream`, the demo will use custom compute stream.	2024-03-14 13:48:37 -07:00
Edward Chen	0b90363acb	[MLAS][AArch64] SQ4BitGemm CompInt8 multi-block implementation (#19826 ) Update SQ4BitGemm CompInt8 implementation to process multiple blocks along a single column instead of processing single blocks from multiple columns.	2024-03-14 13:05:42 -07:00
Baiju Meswani	226f60f2f1	Add support for SGD optimizer in minimal build (#19901 )	2024-03-14 11:31:20 -07:00
Changming Sun	1fb6cbddee	Add a build patch for Windows ARM64EC (#19898 ) ### Description Add a patch for Windows ARM64EC ### Motivation and Context Will need more changes in onnxruntime/core/common/cpuid_arch_definition.h and onnxruntime/core/common/cpuid_info.cc	2024-03-14 08:50:42 -07:00

1 2 3 4 5 ...

10770 commits