onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-10 17:37:14 +00:00

Author	SHA1	Message	Date
Guenther Schmuelling	c45cff60cf	[js/webgpu] fix maxpool / fp16 (#19981 )	2024-03-19 16:15:49 -07:00
Tianlei Wu	597e828aae	Adjust test tolerance (#19947 ) ### Description Improve the precision of tests. Changes include: (1) Update checkers.cc to use a consistent default tolerance. (2) Allow different default tolerances for different providers at runtime (previously, the threshold of a test was decided at compile time). (3) Explicitly set absolute and relative error tolerances for tests that failed to pass the new default threshold. #### Default Thresholds Change Note that the test formula is `abs(expected - value) < absolute + relative * expected`. Default test thresholds when neither absolute nor relative tolerance is set:

type | provider | absolute (before) | absolute (after) | relative (before) | relative (after)
-- | -- | -- | -- | -- | --
double | CPU | 0.001 | 0.00001 | 0 | 0.00001
double | CUDA | 0.005 | 0.00001 | 0 | 0.00001
double | TRT | 0.005 | 0.00001 | 0 | 0.00001
double | ROCM | 0.005 | 0.00001 | 0 | 0.00001
double | DML | 0.005 | 0.00001 | 0 | 0.00001
float | CPU | 0.0001 | 0.00001 | 0 | 0.0001
float | CUDA | 0.005 | 0.00001 | 0 | 0.0001
float | TRT | 0.005 | 0.00001 | 0 | 0.0001
float | ROCM | 0.005 | 0.00001 | 0 | 0.0001
float | DML | 0.005 | 0.00001 | 0 | 0.0001
float | Training* | 0.005 | 0.001 | 0 | 0.0001
half | CPU | 0.001 | 0.0025 | 0 | 0.001
half | CUDA | 0.005 | 0.0025 | 0 | 0.001
half | TRT | 0.005 | 0.0025 | 0 | 0.001
half | ROCM | 0.005 | 0.0025 | 0 | 0.001
half | DML | 0.02 | 0.005 | 0 | 0.001
half | Training* | 0.005 | 0.005 | 0 | 0.001
bfloat16 | CPU | 0.0001 | 0.02 | 0 | 0.01
bfloat16 | CUDA | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | TRT | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | ROCM | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | DML | 0.0001 | 0.02 | 0.05 | 0.01
bfloat16 | Training* | 0.0001 | 0.02 | 0.05 | 0.01

*Training means the build flag ENABLE_TRAINING_CORE is defined. The provider can be any one. #### Threshold for provider Previously, the threshold might change according to build flags:

```
#if defined(USE_CUDA) || defined(USE_ROCM) || defined(USE_DML)
constexpr float threshold = 0.005f;
#else
constexpr float threshold = 0.0001f;
#endif
```

For a CPU-only build, the threshold was 0.0001. For a CUDA build, the threshold for the CPU provider (some tests in a CUDA build actually run with the CPU provider) was changed to 0.005. After this change, the threshold depends only on the data type and the provider used in the test. It no longer changes with build flags for non-training builds. Default thresholds for training might differ from inference (please refer to the table above). Several factors contribute: training has gradient outputs; TF32 is not disabled in training; some training tests have iterations, and error might accumulate. How to set different thresholds based on these factors could be a future task.	2024-03-19 15:50:13 -07:00
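The tolerance formula quoted in the commit above can be sketched in a few lines of Python. This is only an illustration, not the code in checkers.cc: the function name `within_tolerance`, the `DEFAULT_TOLERANCE` dictionary and the `tolerance` parameter are hypothetical, the defaults are the "after" values for non-training, non-DML rows of the table, and using `abs(expected)` in the relative term is an assumption that keeps the bound positive for negative expected values.

```python
# Hypothetical sketch of the check `abs(expected - value) < absolute + relative * expected`.
# Defaults are the "after" columns for CPU/CUDA/TRT/ROCM in non-training builds.
DEFAULT_TOLERANCE = {
    "double": (1e-5, 1e-5),
    "float": (1e-5, 1e-4),
    "half": (0.0025, 1e-3),
    "bfloat16": (0.02, 0.01),
}


def within_tolerance(expected, value, dtype, tolerance=None):
    """Return True if `value` matches `expected` under (absolute, relative) tolerance.

    When `tolerance` is None, the per-type default pair is used, mirroring the
    "neither tolerance is set" case described in the commit message.
    """
    absolute, relative = DEFAULT_TOLERANCE[dtype] if tolerance is None else tolerance
    # Assumption: abs(expected) is used so the bound stays positive for negative values.
    return abs(expected - value) < absolute + relative * abs(expected)


if __name__ == "__main__":
    print(within_tolerance(1.0, 1.001, "float"))  # uses the float defaults
    print(within_tolerance(1.0, 1.001, "half"))   # uses the looser half defaults
```

Passing an explicit pair, for example `tolerance=(0.005, 0.0)`, corresponds to a test that sets its own absolute and relative thresholds instead of relying on the defaults.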
Hariharan Seshadri	cd6ec50b50	Switch a portion of CI/packaging jobs to MacOS12 (#19908 )	2024-03-19 14:54:58 -07:00
Adrian Lizarraga	18a7f34ba0	[NhwcTransformerTests] Fix linker error due to explicit template instantiation of ModelBuilder methods (#19980 ) Currently, the nhwc_transformer_test.cc compilation unit defines explicit FP16 versions of `ModelTestBuilder::MakeInput<MLFloat16>` and `ModelTestBuilder::MakeInitializer<MLFloat16>` outside of the ModelTestBuilder class's header file. These explicit template instantiations cause linker errors when other compilation units also instantiate these functions, due to duplicate definitions. Additionally, the versions defined in nhwc_transformer_test.cc do not conform to the expected behavior of the original ModelTestBuilder class, which is to make random input/initializer values. Instead, the versions in nhwc_transformer_test.cc create a range of values. The solution is to edit nhwc_transformer_test.cc to use stand-alone static functions that do not change the ModelTestBuilder class. Note: This linker error cannot currently be replicated in our CIs because it requires a QNN-HTP-enabled Windows ARM64 environment with `MLAS_F16VEC_INTRINSICS_SUPPORTED` defined. I can replicate it on a local build. The linker error/conflict happens with this new FP16 QNN test: `d4c8bc359e/onnxruntime/test/providers/qnn/clip_op_test.cc (L186)`	2024-03-19 13:48:04 -07:00
Yulong Wang	01c7aaf6aa	[js/webgpu] allow setting env.webgpu.adapter (#19940 ) ### Description Allow user to set `env.webgpu.adapter` before creating the first inference session. Feature request: https://github.com/microsoft/onnxruntime/pull/19857#issuecomment-1999984753 @xenova	2024-03-19 12:55:00 -07:00
Tianlei Wu	8293aa1564	Exclude TRT provider in tests crashed in A100 (#19972 ) TensorRT EP segmentation fault on A100 for some tests. Exclude TRT EP in those tests on A100 to unblock developing. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/19530	2024-03-19 11:36:42 -07:00
Yi Zhang	d4c8bc359e	Fix Training CPU docker image name to avoid unnecessary rebuilding (#19973 ) ### Description The docker image name was fixed, but the docker argument differed between jobs, which triggered a rebuild of the docker image almost every time.	2024-03-19 09:33:24 -07:00
Prathik Rao	26cd3c1fb0	add kernel tests for ops that changed in opset18 (#19767 ) ### Description - [x] Pad operator has introduced a new input called "axes" which specifies which axis to pad. But it defaults to input_rank if axes is not provided which was the behavior before the opset upgrade. - [x] ReduceMean - [x] ReduceL2 - [x] ReduceLogSumExp - [x] ReduceSum - Reduction ops all had the axes attribute switched to an input and a new attribute called "noop_with_empty_axes" was added to define what to do when axes is not specified. - [x] Resize has had two new attributes introduced: antialias and keep_aspect_ratio_policy. From Operators.md I've gathered: "Antialiasing is achieved by stretching the resampling filter by a factor max(1, 1 / scale), which means that when downsampling, more input pixels contribute to an output pixel." keep_aspect_ratio_policy "describes how to interpret the `sizes` input with regard to keeping the original aspect ratio of the input." There are a couple of enum-type options that specify different policies and what to do in each case. - NOTE: Baiju already included opset18 tests in https://github.com/microsoft/onnxruntime/pull/17772 - [x] ScatterElements/ScatterND has had a new attribute introduced called "reduction." This specifies the type of reduction to apply: none (default), add, mul, max, min. - [x] Split introduced a new attribute called "num_outputs" which specifies how many outputs to split the input tensor into. This is in contrast to the previous, default behavior of specifying a "split" input which defines the size of each resultant tensor of the output.	2024-03-19 09:33:06 -07:00
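To make the "reduction" attribute mentioned in the commit above concrete, here is a minimal 1-D sketch in plain Python. It is not the ONNX Runtime kernel: the function name `scatter_elements_1d` and the `REDUCTIONS` table are hypothetical, only a single axis is handled, and updates are applied sequentially, so with `"none"` and duplicate indices the last update wins (an assumption of this sketch).

```python
import operator

# Maps each reduction name listed in the commit message to a binary function
# combining the value already in the output with the incoming update.
REDUCTIONS = {
    "none": lambda old, new: new,  # plain overwrite
    "add": operator.add,
    "mul": operator.mul,
    "max": max,
    "min": min,
}


def scatter_elements_1d(data, indices, updates, reduction="none"):
    """Scatter `updates` into a copy of `data` at `indices`, applying `reduction`."""
    combine = REDUCTIONS[reduction]
    out = list(data)  # the input tensor itself is left untouched
    for idx, upd in zip(indices, updates):
        out[idx] = combine(out[idx], upd)
    return out


if __name__ == "__main__":
    data, indices, updates = [1, 2, 3, 4], [1, 1, 3], [10, 20, 30]
    for name in REDUCTIONS:
        print(name, scatter_elements_1d(data, indices, updates, name))
```

Index 1 appears twice in the example, which is where the reductions differ: `add` accumulates both updates onto the original value, while `none` simply overwrites it.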
Xu Xing	4c6a6a37f7	[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387 ) The added case will be NAN because of the un-initialized buffer.	2024-03-18 22:59:32 -07:00
Ted Themistokleous	6bb64683f8	Use version instead of version-dev for ROCm (#19967 )	2024-03-19 10:40:40 +08:00
Guenther Schmuelling	a4ac727cbb	handle fp16 for where op (#19969 ) this prevents falling back from webgpu to cpu, aka helps performance	2024-03-18 13:42:51 -07:00
Tianlei Wu	141966bb69	Disable TF32 in tests of CUDA ep (#19963 ) Operator or model test results shall not depend on whether the NVIDIA_TF32_OVERRIDE environment variable is set. This makes test results more deterministic.	2024-03-18 11:17:34 -07:00
Dmitri Smirnov	a033df8c31	Implement CustomOp Output Type Inference function (#19906 ) ### Description This change addresses the following issues with the current CustomOP Output Type inference - The function does not take into account optional inputs. When input is absent the inference is silently aborted, and no output type is inferred (P1 customer issue) - Inferring output type based on the input type for multi-kernel custom ops is done based on the latest in sequence kernel definition. There is not an attempt made to match the kernel based on the input type. - Inference is aborted when variadic inputs/outputs are detected when the generated input/output names fail to obtain type constraints. This is not immediately clear from the code, because custom op schema is not available within the inference function. - No error reporting. ### Motivation and Context Most of CustomOPs lack their own type and shape inference function as it was recently introduced. For that reason, it is important to fix this. This change is inspired by a customer issue. This is a follow up on: - https://github.com/microsoft/onnxruntime/pull/15184 - https://github.com/cbourjau/ort-custom-op/pull/11 - https://github.com/microsoft/onnxruntime-extensions/issues/451	2024-03-18 10:28:39 -07:00
Edward Chen	4d31076d68	[objc] Add check for ORTValue being a tensor in ORTValue methods that should only be used with tensors. (#19946 ) Add check to report error instead of crashing.	2024-03-18 08:54:24 -07:00
Guenther Schmuelling	7e0d424934	accumulate in fp32 for Reduce* (#19868 )	2024-03-18 08:28:43 -07:00
dependabot[bot]	28ad6c3955	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. Commits: `35a517c` Release version 1.15.6 of the npm package. `c4f847f` Drop Proxy-Authorization across hosts. `8526b4a` Use GitHub for disclosure. `b1677ce` Release version 1.15.5 of the npm package. `d8914f7` Preserve fragment in responseUrl. See full diff in [compare view](https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:54:53 -07:00
dependabot[bot]	4e55242a30	Bump follow-redirects from 1.15.4 to 1.15.6 in /onnxruntime/test/wasm (#19950 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. Commits: `35a517c` Release version 1.15.6 of the npm package. `c4f847f` Drop Proxy-Authorization across hosts. `8526b4a` Use GitHub for disclosure. `b1677ce` Release version 1.15.5 of the npm package. `d8914f7` Preserve fragment in responseUrl. See full diff in [compare view](https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:54:06 -07:00
dependabot[bot]	afdab62f53	Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. Commits: `35a517c` Release version 1.15.6 of the npm package. `c4f847f` Drop Proxy-Authorization across hosts. `8526b4a` Use GitHub for disclosure. `b1677ce` Release version 1.15.5 of the npm package. `d8914f7` Preserve fragment in responseUrl. See full diff in [compare view](https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-16 18:53:17 -07:00
wangshuai09	1eb67a07ca	Add cann_dependencies (#19929 ) ### Description Add `cann_dependencies` ### Motivation and Context The previous [PR](https://github.com/microsoft/onnxruntime/pull/17365) avoided using patchelf but lost `cann_dependencies`. This PR adds `cann_dependencies` to avoid requiring CANN libraries when repairing the wheel.	2024-03-15 20:28:43 -07:00
Yulong Wang	b29849a287	[js/common] fix typedoc warnings (#19933 ) ### Description Fix a few warnings in typedoc (for generating JS API): ``` [warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used. [warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation. [warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation. [warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation. [warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation. [warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device. ``` Changes highlighted: - Merge `CoreMlExecutionProviderOption` and `CoreMLExecutionProviderOption`. They expose 2 set of different options for React-native and ORT nodejs binding. This should be fixed in future. - Fix a few inconsistency of names between JSDoc and parameters - Fix broken type links - Exclude trace functions	2024-03-15 19:01:50 -07:00
Belem Zhang	acb0df2280	Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932 ) ### Description Fix #19931 broken Get Started link HTTP 404 for "Get Started" link in "ONNX Runtime JavaScript API" page Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-03-15 19:00:30 -07:00
Hector Li	d5c6a2cecf	Enable code in QNN UT to verify the fix for partition issue (#19939 ) ### Description Enable code in QNN UT to verify the fix for the partition issue related to QDQ models. https://github.com/microsoft/onnxruntime/pull/19723	2024-03-15 17:02:01 -07:00
enximi	7b46b31558	fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime sup… (#19845 ) fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only." ### Description Include Windows 11 in the version check. Now, you will not see the warning “Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.” ### Motivation and Context Warning on Windows 11: Only supports systems above Windows 10, which is somewhat strange.	2024-03-15 12:41:44 -07:00
Yulong Wang	79e50aeef3	[js/web] rewrite backend resolve to allow multiple EPs (#19735 ) ### Description This PR rewrites the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web carried some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes the "backend" concept. The original "backend" in ONNX.js is designed on the assumption that only one backend from the user's backend hint list will be used. For example, in ONNX.js, if the user specifies a backend hint as `['webgl', 'wasm']`, ONNX.js will first try to use the WebGL backend. If it loads successfully (the browser supports webgl), the "webgl" backend will be used and "wasm" will be ignored; otherwise, "webgl" will be ignored and ONNX.js will try to load the "wasm" backend. In short: only one backend will be used when initializing a session. #### Execution Provider Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allowed to specify multiple EPs, and if one does not support a particular kernel, it can fall back to another EP. This is a very common case when using a GPU EP in ONNX Runtime. #### Current Status: Backend vs. EP For the historical reasons mentioned above, the current status is quite confusing. There are real backends, which are different implementations in code; there are backend hints, which are the string names used to refer to backends; and there are EPs, which are an ONNX Runtime concept. Currently there are only 2 backends in our code base: the "onnxjs backend" and the "wasm backend". The "onnxjs backend" currently only powers the backend hint "webgl", which goes into the old onnx.js code path. All other backend hints, including "wasm", "cpu" (alias to wasm), "webgpu" and "webnn", are powered by the "wasm backend". Because ORT Web treats "backend" as an internal concept and wants to align with ONNX Runtime, those backend hint names are becoming EP names. The following table shows today's status:

| Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT |
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP |
| "webgl" | OnnxjsBackend | \* technically not an EP |
| "webgpu" | WasmBackend | JSEP |
| "webnn" | WasmBackend | WebNN EP |

#### Problem While the API allows specifying multiple EPs, backend resolving only allows one backend. This causes issues when the user specifies multiple EP names in session options: the backend resolve behavior and the EP registration behavior are inconsistent. Specifically, in this issue: https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908: EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to the 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need the WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of doing so only for the first successful one. - for the first resolved backend, filter all EPs using the exact same backend. Remove all EPs not using this backend from session options - for every explicitly specified EP, if it's removed, show a warning message in console	2024-03-15 11:47:45 -07:00
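The three-step solution described in the commit above can be sketched as follows. This is one reading of the description, written in Python rather than the TypeScript of the actual change; `resolve_eps`, `EP_TO_BACKEND`, and the `try_init` and `warn` callbacks are hypothetical names, and treating initialization as a per-EP-name check is an assumption made so that the `['webgpu', 'wasm']` example from the issue behaves as described.

```python
# EP name (backend hint) -> backend, taken from the table in the commit message.
EP_TO_BACKEND = {
    "wasm": "WasmBackend",
    "cpu": "WasmBackend",
    "webgl": "OnnxjsBackend",
    "webgpu": "WasmBackend",
    "webnn": "WasmBackend",
}


def resolve_eps(ep_names, try_init, warn):
    """Pick one backend and return (backend, EP names to keep in session options).

    try_init(ep_name) -> bool is a hypothetical callback reporting whether the
    backend behind that EP name initialized in the current environment.
    """
    # Step 1: attempt initialization for every requested EP, not just the first success.
    usable = [ep for ep in ep_names if try_init(ep)]
    if not usable:
        raise RuntimeError("no requested execution provider is available")
    # Step 2: the first usable EP decides the backend; keep only EPs sharing it.
    backend = EP_TO_BACKEND[usable[0]]
    kept = [ep for ep in usable if EP_TO_BACKEND[ep] == backend]
    # Step 3: warn about every explicitly requested EP that was dropped.
    for ep in ep_names:
        if ep not in kept:
            warn(f"removing requested execution provider '{ep}'")
    return backend, kept


if __name__ == "__main__":
    # A browser without WebGPU support: only 'wasm' should remain.
    print(resolve_eps(["webgpu", "wasm"], lambda ep: ep != "webgpu", print))
```

With this filtering, the session options no longer carry an EP whose backend failed to initialize, which is the inconsistency the commit message identifies as the cause of the runtime error.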
Yifan Li	0b2a75b274	[EP Perf] Add concurrency test (#19804 ) ### Description * Add concurrency test to EP Perf CI panel (impl. by onnx_test_runner) * Model: FasterRCNN-10 model within CI image * `-c` param configurable via CI panel when kicking off CI tasks * Auto-replicate test input/outputs according to `-c` param * By default, the model test will be executed in 100 iterations (~2min added to T4 CI task load overall) ### Motivation and Context To monitor potential concurrency issues of ORT-TRT	2024-03-15 07:41:21 -07:00
Hariharan Seshadri	42399dfd2b	Fix a potential race in the CUDA TopK kernel (#19917 ) ### Description If the `K` value is flowing through as a tensor, we are updating a mutable member of the `TopK` class and basing the compute off that - which is likely to cause data race issues with concurrent Run() calls and `K` value changes. ### Motivation and Context Fix potential race in CUDA TopK kernel	2024-03-14 18:13:47 -07:00
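The pattern behind the race described in the commit above can be illustrated in Python. This is not the CUDA kernel code: `RacyTopK` and `SafeTopK` are hypothetical classes, and `heapq.nlargest` stands in for the real TopK computation. The point is only that a per-call value such as `K` should live in a local variable rather than in a member shared by concurrent `run` calls.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class RacyTopK:
    """Stores K on the shared kernel object, the pattern the commit removes."""

    def __init__(self):
        self.k = 0

    def run(self, values, k):
        self.k = k  # another thread may overwrite this before the next line reads it
        return heapq.nlargest(self.k, values)


class SafeTopK:
    """Keeps K in a local variable, so concurrent calls cannot interfere."""

    def run(self, values, k):
        local_k = k  # per-call state only
        return heapq.nlargest(local_k, values)


def run_concurrently(kernel, values, ks, workers=8):
    """Call kernel.run from several threads, one call per requested K."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: kernel.run(values, k), ks))


if __name__ == "__main__":
    results = run_concurrently(SafeTopK(), list(range(100)), range(1, 50))
    print(all(len(r) == k for r, k in zip(results, range(1, 50))))
```

With `SafeTopK`, every result has exactly the requested length regardless of thread scheduling; with `RacyTopK`, that guarantee depends on timing.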
Justin Chu	bcf47d3546	Update install_deps_lort.sh to fix onnxscript installation (#19922 ) Install onnxscript correctly with `pip install`. Dev dependencies are not required. ### Motivation and Context Fix build breaks.	2024-03-14 17:05:50 -07:00
Adam Louly	32558134a9	[On-Device-Training] Upgrade Flatbuffers to Support 2GB+ Checkpoints. (#19770 ) ### Description Modifications to support 2GB+ checkpoint & Upgrading Flatbuffers ### Motivation and Context This PR includes changes that will make ort handle 2GB+ checkpoints. To do that we need to upgrade flatbuffers to 23.5.9 - https://github.com/google/flatbuffers/pull/7945 - Modified the commitHash and the hash for the new version - Removed the patch for rust generator's unused variable warning as it is no longer producing this - [Check it out here](`d121e09d89/src/idl_gen_rust.cpp`) - Updated the VerifyField calls with alignment values that were introduced in the new version. --------- Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2024-03-14 16:36:24 -07:00
Yi Zhang	87a9f77c56	Refactor Python Packaing Pipeline (Training Cuda 11.8) (#19910 ) ### Description 1. Use stages to organize the pipeline and split building from testing. 2. Move compilation to a CPU machine. 3. The test stage can leverage existing artifacts. 4. Check the wheel size; a warning is given if the size is above 300M. 5. The docker image name did not change even when the argument changed, which caused the docker image to always be rebuilt. Updating the docker image name according to the argument saves docker build time. Pipeline duration reduced by 60% (2 hours -> 50 minutes). Compilation time reduced by 75% (1.5 hours -> 20 minutes). GPU time reduced by 87% (8 hours -> 1 hour). For debugging, the GPU time could be reduced by more than 95%, because we can choose to run only one test stage and skip building. ### Motivation and Context Make the pipeline efficient. Optimized https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=424177&view=results Current https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=422393&view=results	2024-03-15 06:47:41 +08:00
Changming Sun	8b766bd24e	Change nuget pipeline's "Windows_Packaging_combined_GPU" job to download TRT binaries in every build (#19919 ) ### Description Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build. Now all the other build jobs are already doing this. This is the only one left. Similar to #19909 ### Motivation and Context As a follow up of #19118	2024-03-14 15:07:56 -07:00
Tianlei Wu	a2ffc3740b	[Cuda] Demo multiple cuda graphs and user compute stream (#19883 ) Update stable diffusion demo to add options `--max-cuda-graphs` and `--user-compute-stream`. * Add python class GpuBindingManager to manage IO Binding based on input shape and the max number of cuda graphs setting. The benefit is that one inference session could enable or disable cuda graph in different runs. * When `--user-compute-stream` is set, the demo will use a custom compute stream.	2024-03-14 13:48:37 -07:00
Edward Chen	0b90363acb	[MLAS][AArch64] SQ4BitGemm CompInt8 multi-block implementation (#19826 ) Update SQ4BitGemm CompInt8 implementation to process multiple blocks along a single column instead of processing single blocks from multiple columns.	2024-03-14 13:05:42 -07:00
Baiju Meswani	226f60f2f1	Add support for SGD optimizer in minimal build (#19901 )	2024-03-14 11:31:20 -07:00
Changming Sun	1fb6cbddee	Add a build patch for Windows ARM64EC (#19898 ) ### Description Add a patch for Windows ARM64EC ### Motivation and Context Will need more changes in onnxruntime/core/common/cpuid_arch_definition.h and onnxruntime/core/common/cpuid_info.cc	2024-03-14 08:50:42 -07:00
Changming Sun	ea4a5eea18	Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build (#19909 ) ### Description Change nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build. Now all the other build jobs are already doing this. This is the only one left. ### Motivation and Context As a follow up of #19118	2024-03-14 07:55:00 -07:00
cao lei	966fa74597	Add 2 C API for ort extension (#19808 ) ### Description Add 2 C API for ORT extension: - KernelInfo_GetAllocator - OrtCustomOp::GetMayInplace ### Motivation and Context Add 2 C API for ORT extension project, which will leverage these 2 APIs for GroupQueryAttention custom op.	2024-03-14 06:00:41 -07:00
pengwa	409b811325	Refine logging for execution plan print (#19777 ) ### Refine logging for execution plan print Printing NodeIndex only is not enough for us to debug the execution order. Keep the original behaviour for ORT_MINIMAL_BUILD builds in case of any CPU memory concerns.	2024-03-14 16:31:32 +08:00
Scott McKay	0be0791fcc	Update MAUI model tester tool to .net8 (#19907 ) ### Description Update to .net8. Didn't want to build with the latest VS2022 using net6 (which was EOL last year).	2024-03-14 15:19:19 +10:00
Jeff Daily	9443366009	[ROCm] fix build failure when nccl is enabled (#19900 ) Building onnxruntime ROCm EP with --enable_nccl --use_mpi fails due to inclusion of MOE source files but MOE is not supported. The error observed is `error: contrib_ops/rocm/moe/ft_moe/moe_kernel.h: No such file or directory` The fix is to exclude collective/sharded_moe.* files when nccl is requested.	2024-03-13 21:16:54 -07:00
Adrian Lizarraga	9c3242ab70	[QNN EP] Copy security catalog file for HtpV73Skel.so from QNN SDK (#19903 ) ### Description Copies the `QNN_HOME/lib/hexagon-v73/unsigned/libqnnhtpv73.cat` file from QNN SDK to the unittest build directory. This is necessary in order to be able to load the `libQnnHtpV73Skel.so` file on Windows for modern versions of QNN SDK. ### Motivation and Context A [digitally-signed catalog file](https://learn.microsoft.com/en-us/windows-hardware/drivers/install/catalog-files) (.cat) can be used as a digital signature for an arbitrary collection of files.	2024-03-13 20:52:59 -07:00
cao lei	2c525a79b1	Add new API KernelContext_GetScratchBuffer (#19809 ) ### Description add new API KernelContext_GetScratchBuffer to get scratch buffer from kernel context ### Motivation and Context add new API KernelContext_GetScratchBuffer to get scratch buffer from kernel context which will be used in ORT extension project for GroupQueryAttention custom op	2024-03-13 19:41:15 -07:00
Jake Mathern	18ad8587a6	[CP] Fix for xfgcheck and Fix WAI ARM64 build (#19634 ) (#19644 ) ### Description Fix WAI build by only conditionally copying linker flags ### Motivation and Context I broke the WAI build that contains ORT on ARM64	2024-03-13 17:54:06 -07:00
Markus Tavenrath	f42e6ad61e	Add support for LRN NHWC OPs (#19866 ) Support LRN NHWC in the CUDA EP. ### Motivation and Context Add support for all NHWC OPs to avoid NHWC/NCHW Layout transformation	2024-03-13 17:52:07 -07:00
raoanag	9f08f8d5b2	Set seed for DynamicQuantizeMatMul tests (#19896 ) Seed for DynamicQuantizeMatMul tests to avoid pipeline failures with marginal mismatches.	2024-03-13 17:49:55 -07:00
kunal-vaishnavi	4ac98d6d65	Update replacing MultiHeadAttention with GroupQueryAttention (#19882 ) ### Description This PR updates the replacement of MultiHeadAttention (MHA) with GroupQueryAttention (GQA). It is related to the changes in [this PR](https://github.com/microsoft/onnxruntime/pull/18906). ### Motivation and Context The updated replacement of MHA with GQA includes the following fusion changes. - Apply sliding window within GQA - Fuse the rotary embeddings within GQA - Fuse the 3 MatMuls into 1 packed MatMul if possible - Fuse the 3 Adds into 1 packed Add if possible	2024-03-13 14:10:52 -07:00
aciddelgado	8eb49c5f00	fix gqa rotary dim 1 (#19874 ) ### Description GQA Rotary Dimension 1 incorrectly assumed to be based on head size. ### Motivation and Context This change should enable us to run phi-2 with GQA and Rotary Embedding fused.	2024-03-13 14:09:54 -07:00
Yulong Wang	e771a763c3	[js/test] align web test runner flags with ort.env (#19790 ) ### Description The `npm test` flags are difficult to memorize because they differ from the `ort.env` flags. This change aligns those flags with the ort JS API, e.g. `--wasm-enable-proxy` became `--wasm.proxy`. Old flags are marked as deprecated except `-x` (as a shortcut of `--wasm.numThreads`)	2024-03-13 12:00:36 -07:00
Yi Zhang	d5d9dbd51d	reuse T4 on Linux GPU (#19879 ) ### Motivation and Context Linux GPU test on A10 isn't very stable	2024-03-13 10:41:36 -07:00
Satya Kumar Jandhyala	ed250b88c3	[JS/WebGPU] Optimize MatMulNBits (#19852 ) ### Description Use vec<2> or vec<4> operands in MatMulNBits ### Motivation and Context Improve performance	2024-03-13 10:33:14 -07:00
Hariharan Seshadri	ed306b4f97	Fix Android CI pipeline (#19877 )	2024-03-13 10:09:43 -07:00
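
Note on commit 4ac98d6d65 above ("Fuse the 3 MatMuls into 1 packed MatMul if possible"): the fusion relies on the identity that multiplying one input by three weight matrices separately gives the same result as multiplying it once by their column-wise concatenation and splitting the output. The following is a minimal pure-Python sketch of that identity only, not the onnxruntime fusion code. The helper names (`matmul`, `hconcat`, `hsplit`) and the tiny integer weights are hypothetical, chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hconcat(*mats):
    """Concatenate matrices column-wise; all must have the same row count."""
    return [sum(rows, []) for rows in zip(*mats)]


def hsplit(mat, widths):
    """Split a matrix column-wise into consecutive blocks of the given widths."""
    blocks, start = [], 0
    for width in widths:
        blocks.append([row[start:start + width] for row in mat])
        start += width
    return blocks


# Illustrative shapes only: 3 tokens with hidden size 2. K and V are made
# narrower than Q here as an assumption, to mimic fewer key/value heads.
x = [[1, 2], [3, 4], [5, 6]]
w_q = [[1, 0, 2, 1], [0, 1, 1, 3]]
w_k = [[2, 1], [1, 0]]
w_v = [[0, 1], [1, 1]]

# Unfused: three separate projections of the same input.
separate = (matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))

# Fused: one projection against the packed weight, then split the columns.
w_packed = hconcat(w_q, w_k, w_v)
q, k, v = hsplit(matmul(x, w_packed), [4, 2, 2])

assert (q, k, v) == separate
```

The same reasoning applies to the "fuse the 3 Adds into 1 packed Add" item: concatenating the three bias vectors and adding once is equivalent to adding each bias to its own slice.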

10754 commits