onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Jing Fang	fbe22fdac7	[ARM CPU] Fix flaky hqnbitgemm UT (#23010 ) ### Description Increase fp16 qnbitgemm UT tolerance and use fixed seeds.	2024-12-04 14:55:52 -08:00
Yulong Wang	7b0fa407eb	fix requirements.txt path (#22946 ) ### Description #22380 removes the file `tools/ci_build/github/linux/docker/inference/x86_64/python/cpu/scripts/requirements.txt` but it is still used in `dockerfiles/Dockerfile.cuda`. This change updates the path to requirements.txt. Fixes #22945.	2024-12-04 13:08:29 -08:00
Yulong Wang	d0dde4f7d4	[wasm/test] update packages versions (#23008 ) ### Description Upgrade packages version to resolve the following dependabot alerts: - https://github.com/microsoft/onnxruntime/security/dependabot/269 - https://github.com/microsoft/onnxruntime/security/dependabot/268 - https://github.com/microsoft/onnxruntime/security/dependabot/275 - https://github.com/microsoft/onnxruntime/security/dependabot/306 ``` # npm audit report braces <3.0.3 Severity: high Uncontrolled resource consumption in braces - https://github.com/advisories/GHSA-grv7-fg5c-xmjg fix available via `npm audit fix` node_modules/braces cookie <0.7.0 cookie accepts cookie name, path, and domain with out of bounds characters - https://github.com/advisories/GHSA-pxg6-pf52-xh8x fix available via `npm audit fix` node_modules/cookie engine.io 0.7.8 - 0.7.9 \|\| 1.8.0 - 6.6.1 Depends on vulnerable versions of cookie Depends on vulnerable versions of ws node_modules/engine.io socket.io 1.6.0 - 4.7.5 Depends on vulnerable versions of engine.io node_modules/socket.io ws 8.0.0 - 8.17.0 Severity: high ws affected by a DoS when handling a request with many HTTP headers - https://github.com/advisories/GHSA-3h5v-q93c-6h6q fix available via `npm audit fix` node_modules/ws socket.io-adapter 2.5.2 - 2.5.4 Depends on vulnerable versions of ws node_modules/socket.io-adapter 6 vulnerabilities (1 low, 1 moderate, 4 high) ```	2024-12-04 13:08:13 -08:00
Yulong Wang	fdf5ffe2cf	[js/node] fix TypeScript declaration in onnxruntime-node (#23000 ) ### Description fix TypeScript declaration in onnxruntime-node ### Motivation and Context Fixes #22978	2024-12-04 11:29:27 -08:00
Xu Xing	c19617a24a	[js/webgpu] Add GatherND (#22847 )	2024-12-04 09:57:32 -08:00
Yulong Wang	a615bd6688	Bump version of Dawn to 12a3b24c4 (#23002 ) ### Description Upgrade version of Dawn. Removed dawn.patch, because all patches are included in upstream. Updated code affected by API changes (`const char*` -> `WGPUStringView`).	2024-12-04 09:47:16 -08:00
Yulong Wang	50b38ca9d5	[js/web] update default export to include webgpu (#22754 ) ### Description This PR changes the following exports: - `onnxruntime-web` is now the same as `onnxruntime-web/webgpu`. - `onnxruntime-web/webgpu` is deprecated. ### Migration instructions: - use `onnxruntime-web` instead of `onnxruntime-web/webgpu`. - use `onnxruntime-web/wasm` if you want to use onnxruntime-web without webgpu/webnn. ### Export table \| file name \| export entry \| includes WASM \| includes JSEP (WebGPU & WebNN) \| includes WebGL \| ------------- \| ------------- \| ----- \| ----- \| ----- \| ort.all.min.js<br/>ort.all.js<br/>ort.all.min.mjs<br/>ort.all.mjs \| `onnxruntime-web/all` \| ✔️\| ✔️\| ✔️ \| ort.min.js<br/>ort.js<br/>ort.min.mjs<br/>ort.mjs \| `onnxruntime-web` \| ✔️\| ❌ --> ✔️\| ✔️ -->❌ \| ort.webgpu.min.js<br/>ort.webgpu.js<br/>ort.webgpu.min.mjs<br/>ort.webgpu.mjs \| `onnxruntime-web/webgpu` \| ✔️ \| ✔️ \|❌ \| ort.wasm.min.js<br/>ort.wasm.js<br/>ort.wasm.min.mjs<br/>ort.wasm.mjs \| `onnxruntime-web/wasm` \| ✔️ \| ❌ \|❌	2024-12-04 09:46:45 -08:00
Chi Lo	9b9f881475	[TensorRT EP] Use TRT/CUDA/ORT version from runtime instead of build time to generate hash value (#22921 ) Use the TensorRT and CUDA versions fetched at runtime to compute the hash value that determines the cache name. Previously the versions were taken at compile/build time, which can cause problems: TRT EP used the TRT version that we or users built against, but users can switch to a different TRT version at run time. Because TRT EP always checked the "fixed" build-time version rather than the TRT version actually in use, it could pick up an incompatible TRT engine cache. See the GitHub issue here: https://github.com/microsoft/onnxruntime/issues/22382#issuecomment-2404140754	2024-12-03 21:58:43 -08:00
dependabot[bot]	bd701e4f33	Bump cross-spawn from 7.0.3 to 7.0.6 in /js (#23003 )	2024-12-04 05:07:21 +00:00
Yulong Wang	06526af346	[js/webgpu] fix a bug in transpose shader (#22997 ) ### Description Fix a bug in transpose shader, when input/output rank is 1. ### Motivation and Context Fixes #22994	2024-12-03 20:21:08 -08:00
Yulong Wang	e84b8e7bd5	allow specify a custom local source path for Dawn (#22999 ) ### Description Allows building ONNX Runtime with a custom local path to Dawn's source code. Usage: ```sh build --use_webgpu --cmake_extra_defines "onnxruntime_CUSTOM_DAWN_SRC_PATH=C:/src/dawn" ```	2024-12-03 19:25:22 -08:00
dependabot[bot]	4497c97d54	Bump cross-spawn from 7.0.3 to 7.0.6 in /js/node (#22998 ) Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from 7.0.3 to 7.0.6. Changelog: 7.0.6 (2024-11-18): update cross-spawn version to 7.0.5 in package-lock.json (f700743). 7.0.5 (2024-11-07): fix escaping bug introduced by backtracking (640d391). 7.0.4 (2024-11-07): disable regexp backtracking (#160) (5ff3a07). Additional commits viewable in the [compare view](https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.6). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-03 18:48:22 -08:00
Yulong Wang	d3bc3180d8	[js/node] fix CUDA artifact installation script for Linux/x64 (#22984 ) ### Description This PR updates installation script to fix it for CUDA v12. However, it may be difficult for CUDA v11 since the steps are quite complicated to automate. Added a few lines of instructions instead. fixes #22877	2024-12-03 16:07:43 -08:00
Prathik Rao	5c644d3747	[WebGPU EP] Flatten implementation (#22964 ) Implements flatten operator for native webgpu.	2024-12-03 14:40:57 -08:00
Jian Chen	9ed0c7fe26	Redo "Update Gradle version 8.7 and java version 17 within onnxruntime/java" (#22923 )	2024-12-02 18:34:25 -08:00
Edward Chen	e2356a0403	Use UTF8 string encoding in ORTSaveCodeAndDescriptionToError(). (#22982 ) Update from ASCII to UTF8 string encoding when creating the `NSString` description.	2024-12-02 17:41:52 -08:00
Kee	8c52fa3924	[VSINPU]Split/Pad and some element-wise OPs support (#22916 ) ### Description -Add split/pad/neg/not/ceil/round/min/max op support -Fix conv2d op default pads value issue -Add VSINPU EP to support python bindings ### Motivation and Context -New OPs support for VSINPU EP --------- Signed-off-by: Kee <xuke537@hotmail.com>	2024-12-02 13:57:30 -08:00
Satya Kumar Jandhyala	e8bf46a70e	[WebGPU EP] Support GroupQueryAttention (#22658 ) ### Description Support GroupQueryAttention operator for native webgpu ep. ### Motivation and Context This is required for inferencing some LLMs.	2024-12-02 12:40:03 -08:00
Jian Chen	6c2ff5fc55	Refactor emulator start and stop functions for clarity and efficiency (#22861 ) ### Description This pull request introduces several enhancements and new functionalities to the `tools/python/util/android/android.py` file, focusing on improving the management of Android emulators. The most important changes include adding a timeout parameter to the `start_emulator` function, adding checks to prevent multiple emulators from running simultaneously, and introducing new utility functions to manage emulator processes more effectively. Enhancements to `start_emulator` function: * Added a `timeout_minutes` parameter to the `start_emulator` function to make the startup timeout configurable. * Added a check to prevent starting a new emulator if one with the same AVD name is already running. * Included additional emulator arguments `-verbose` for better control and debugging. * Added a final verification step to ensure the emulator has started successfully. New utility functions for managing emulator processes: * Introduced `check_emulator_running_using_avd_name`, `check_emulator_running_using_process`, and `check_emulator_running_using_pid` to check if an emulator is running based on AVD name, process instance, or PID, respectively. * Added `stop_emulator_by_proc` and `stop_emulator_by_pid` functions to stop the emulator process using a `subprocess.Popen` instance or PID, with a configurable timeout. * Updated the `stop_emulator` function to use the new utility functions for stopping the emulator process. These changes enhance the robustness and flexibility of the emulator management utilities, making it easier to handle different scenarios in CI environments and development workflows. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2024-12-02 09:29:17 -08:00
Chi Lo	e234023d11	[TensorRT EP] Fix wrong input order when generating IndexedSubGraph (#22857 ) The input order of generated indexedSubGraph needs to be consistent with the input order of original graph. This PR will also fix the github issue https://github.com/microsoft/onnxruntime/issues/22729	2024-12-02 01:45:29 -08:00
Chi Lo	49a80df77f	Keep the model metadata on the generated EP context model (use bridge api) (#22860 ) In addition to the [PR](https://github.com/microsoft/onnxruntime/pull/22825) which directly uses internal graph api, this PR updates the bridge api for the case of TRT EP and OpenVINO EP.	2024-12-01 21:57:45 -08:00
Vincent Wang	1128882bfd	Quantize Bias for Conv/Gemm on Quantized Model (#22889 ) Some quantized models leave the bias of Conv/Gemm nodes in float instead of quantizing it. This PR creates a sub-graph that quantizes the bias for Conv/Gemm nodes with scale = scale_input_0 * scale_input_1 and zp = 0. We only do this for bias initializers, so that ConstantFolding will fold the sub-graph into a real quantized int32 bias initializer during the next round of graph optimization.	2024-11-28 10:10:24 +08:00
Vincent Wang	42ecb05080	[QNN] ReduceL2 Support (#22636 ) Add ReduceL2 support to QNN EP. Some of the QNN AI Hub models contain ReduceL2, such as openai_clip_CLIPTextEncoder and openai_clip_CLIPImageEncoder. Without this PR, the ReduceL2 node is assigned to CPU and the graph is split into 2 QNN graphs; with this PR, all nodes are in QNN EP.	2024-11-28 10:09:13 +08:00
Jing Fang	08abab0b14	[CPU] Fix matmulnbits accuracy level (#22963 ) ### Description Fix matmulnbits accuracy level	2024-11-27 17:40:04 -08:00
wejoncy	a24723df16	[CoreML ] ML Program more operators support [3/N] (#22710 ) ### Description - Erf - Round - Max - ReduceMax - ReduceMean - ReduceSum - Unsqueeze - Squeeze - Softmax --------- Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2024-11-28 09:21:02 +08:00
Yi Zhang	b930b4ab5b	Limit PipAuthenticate in Private Project now (#22954 ) ### Description Fixes regression in post merge pipeline caused by #22612 ### Motivation and Context So far, the Public Project has no artifactFeeds.	2024-11-27 13:32:35 +08:00
Wanming Lin	fe749a88a5	[WebNN EP] Fixed bug in usage of Array.reduce() (#22944 ) In JS, calling reduce on an empty array with no initial value throws a TypeError. Fix it by checking the array length first.	2024-11-26 19:03:44 -08:00
wejoncy	c284a686f2	[CoreML] Create EP by AppendExecutionProvider (#22675 ) ### Description AppendExecutionProvider("CoreML", {{"MLComputeUnits","MLProgram"}}) --------- Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2024-11-27 09:26:31 +08:00
Chen Feiyue	487184fa42	[VSINPU] update crosscompiling patch (#22937 ) ### Description Update this patch because the original file has changed.	2024-11-26 14:35:16 -08:00
amancini-N	8826e39a81	#22890 Fix profiling on empty Optional (#22891 ) ### Description Fix sequential_executor.cc to avoid segfault when profiling is used on model with empty Optional ### Motivation and Context Fixes #22890	2024-11-26 11:18:47 -08:00
shiyi	afbb53937c	[WebNN] Support negative steps for slice (#22871 ) Slice with negative steps can be emulated by reverse+slice.	2024-11-25 23:06:23 -08:00
Bin Miao	558ae8621c	[WebNN EP] Fix an issue of CumSum operator (#22936 ) This PR limits the axis of the CumSum operator to be a constant when using WebNN EP. @Honry @fdwr PTAL.	2024-11-25 21:05:53 -08:00
sheetalarkadam	f80afeb9a1	Override android qnn sdk version with pipeline param (#22895 ) We need to be able to control/override the exact version of the QNN SDK used for the Android build, because qnn-runtime (maven package) releases lag behind QNN SDK releases.	2024-11-25 21:01:05 -08:00
Tianlei Wu	09d2ee6274	Update pipeline status (#22924 ) ### Description Update pipeline status: (1) replace dead link of cuda pipeline (2) remove dead link of training distributed pipeline (3) add webgpu pipeline Before: https://github.com/microsoft/onnxruntime/blob/main/README.md#builtin-pipeline-status After: `8ec473d013/README.md (builtin-pipeline-status)` ### Motivation and Context Some pipelines were removed and need to be replaced with new ones.	2024-11-24 21:26:27 -08:00
Yi Zhang	85751e7276	Build DML in Windows GPU CI pipeline (#22869 ) ### Description Add a new stage to build cuda and dml in Windows GPU CI pipeline (PR checks) to prevent regressions introduced by new cuda tests. Update the name prefix of all tests in cuda/testcases to CudaEp so they can be skipped easily. ### Motivation and Context 1. CudaNhwcEP is added by default when using cuda ep. 2. If onnxruntime_ENABLE_CUDA_EP_INTERNAL_TES is enabled, the tests in tests/provider/cuda/testcases are added too. ### To do Add enable_pybind in the new stage. Currently, --enable_pybind triggers some Python tests, such as onnxruntime_test_python.py, which uses the get_available_providers() API. More discussion is needed to decide how to make it work.	2024-11-25 10:50:52 +08:00
Xavier Dupré	a2ba3cb547	Implementation of TreeEnsemble ai.onnx.ml==5 (#22333 ) ### Description Merges PR #21851, #21222. Implements TreeEnsemble from ai.onnx.ml==5 (CPU). --------- Co-authored-by: Bilyana Indzheva <bilyana2002@gmail.com> Co-authored-by: Bilyana Indzheva <36890669+bili2002@users.noreply.github.com> Co-authored-by: Christian Bourjau <cbourjau@users.noreply.github.com>	2024-11-22 19:48:23 +01:00
Tianlei Wu	c97dd6e3c1	Update transformers test requirements (#22911 ) ### Description * Install PyTorch for transformers tests. The installation is before python tests so that it can use torch if needed. * Update protobuf and numpy versions used in transformers test. ### Motivation and Context Currently, transformers tests are enabled in the following CI pipelines: * Linux CPU CI Pipeline (torch for cpu-only) * Linux GPU CI Pipeline (torch for cuda 12) * Windows GPU CUDA CI Pipeline (torch for cpu-only right now, note that we might change it to torch for cuda 12 in the future). For ROCm CI Pipeline, transformer tests are enabled but skipped since onnx package is not installed in CI. Previously, torch was not installed before python tests, so some tests depending on torch were skipped like [test_bind_onnx_types_not_supported_by_numpy](`f6e1d44829/onnxruntime/test/python/onnxruntime_test_python_iobinding.py (L199)`) or [test user_compute_stream](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python.py#L465-L476). In this PR, we changed build.py to install torch before running python tests.	2024-11-22 09:45:12 -08:00
Scott McKay	b1ccbe2a8e	Minor update to onnxruntime_perf_test usage info for `-I` (#22810 ) ### Description Update comment for `-I` to mention that symbolic dim values can be provided with `-f`.	2024-11-22 16:38:25 +11:00
Aleksei Nikiforov	f6e1d44829	Add option to force generic algorithms on x86 (#22917 ) Option is named onnxruntime_FORCE_GENERIC_ALGORITHMS Follow up to https://github.com/microsoft/onnxruntime/pull/22125. ### Description This change adds compile-time option to disable optimized algorithms and use generic algorithms (exclude AVX* and SSE etc in GEMM) on x86. This new option is intended only for testing these algorithms, not for production use. Following build command on linux x86_64 builds onnxruntime with new option enabled: `./build.sh --parallel --cmake_extra_defines onnxruntime_FORCE_GENERIC_ALGORITHMS=1` ### Motivation and Context This change allows testing generic algorithms. This may be needed for platforms which don't have optimized implementations available, like in https://github.com/microsoft/onnxruntime/pull/22125.	2024-11-21 13:45:46 -08:00
Tianlei Wu	8d99b1a8dc	reduce GQA test combinations (#22918 ) ### Description * Reduce GQA test combinations to save about 35 minutes test time in CI pipelines. * Show latency of transformers tests * Use seed in DMMHA test to avoid random failure. * For test_flash_attn_rocm.py, change the test skipping condition from "has cuda ep" to "not has rocm ep", so that it does not run in cpu build. * For test_flash_attn_cuda.py, move flash attention and memory efficient attention tests to different classes, so that we can skip a test suite instead of checking in each test. ### Motivation and Context It takes too long to run GQA tests in CI pipelines since there are too many combinations. ###### Linux GPU CI Pipeline Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34) After: 150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50) Time Saved: 1424 seconds (0:23:44) ###### Windows GPU CUDA CI Pipeline Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05) After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35) Time Saved: 330 seconds (0:05:30) ###### Linux CPU CI Pipeline Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47) - 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past - 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past - 26.45s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33) - 0.97s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past - 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past - 2.41s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch Time Saved: 374 seconds (0:06:14).	2024-11-21 12:26:46 -08:00
Tianlei Wu	55f0559e5d	Update attention fusion to support SDPA pattern (#22629 ) ### Description Match the new SDPA pattern for huggingface BERT models exported from the latest transformers package. Some changes of transformers tests in CI pipeline: (1) Enable tests for bert, distilbert and roberta models in CI. (2) Remove out-of-date tests for huggingface models that were marked as slow and not enabled in CI pipeline. (3) Upgrade transformers package version to the latest. ### Motivation and Context Recent huggingface transformers use torch SDPA in bert modeling. The resulting graph pattern change breaks attention fusion. Update the fusion script to match the new pattern.	2024-11-21 09:42:41 -08:00
kailums	1e605be166	bigmodel pipeline update cp38 to cp310 (#22793 ) ### Description Updating from cp38 to cp310 caused issues for the bigmodel pipeline. Two jobs failed: stable_diffusion and whisper. 1. stable_diffusion used the "nvcr.io/nvidia/pytorch:22.11-py3" image from the NVIDIA repo, which targets CUDA 11 and Python 3.8. NVIDIA does not provide a Python 3.10 version for CUDA 11; the latest version of this Docker image targets CUDA 12 and Python 3.10. To solve this, the job now uses an Ubuntu 22.04 Docker image and installs all the Python packages it needs. 2. whisper used an Ubuntu 20.04 Docker image, which doesn't have Python 3.10, so it had to be updated to Ubuntu 22.04.	2024-11-21 07:25:01 -08:00
Jian Chen	369d7bf887	Update the Docker image version (#22907 )	2024-11-21 19:38:39 +08:00
Yi Zhang	a28246a994	Revert "Update Gradle version 8.7 and java version 17 within onnxruntime/java (#22771)" (#22914 ) This reverts commit `632a36a233`. ### Motivation and Context E2E tests run with Browserstack failed due to this PR.	2024-11-21 18:12:28 +08:00
Aleksei Nikiforov	e430795332	Fix MlasSgemmKernel: properly process more than 2 rows (#22125 ) This change fixes multiple tests like QDQTransformerTests.MatMul_U8S8S8, for all architectures where architecture-specific optimized function is not available yet, like s390x. ### Description Matrix B is packed by 16 elements, thus new row starts 16 items later. Also, for next C increment index only by 1 for each increment of C. ### Motivation and Context This change fixes mlas sgemm fallback implementation for all architectures which don't have architecture-specific implementations available, like s390x.	2024-11-20 16:00:23 -08:00
Kyle	712bee13db	Fix Pipeline Timeout Issue (#22901 ) ### Description Extend the timeout for a job that always fails.	2024-11-20 17:18:50 +01:00
Edward Chen	af0303f9b4	Simplify CPU allocator arena usage helper function, fix unit tests that check old ifdefs. (#22876 )	2024-11-19 14:24:52 -08:00
Changming Sun	13346fdf18	Cleanup code (#22827 ) ### Description 1. Delete TVM EP because it is no longer maintained. 2. Delete ortmodule related docker files and scripts.	2024-11-19 14:13:33 -08:00
Wanming Lin	5b787121e8	[WebNN] Check split's output name (#22884 ) Chromium will rename split's output name from "output" to "outputs" in `OpSupportLimits` to align with spec, the EP should check which name is available to make it compatible.	2024-11-19 12:44:23 -08:00
Wanming Lin	8a06f13301	[WebNN] Remove wasm.currentContext check (#22886 ) If a WebNN session throws early, this check for `wasm.currentContext` breaks all the following WebNN sessions. This often happens in npm tests.	2024-11-19 12:22:02 -06:00
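
Commit afbb53937c ([WebNN] Support negative steps for slice) notes that a slice with negative steps can be emulated by reverse+slice. The following is a minimal 1-D sketch of that index mapping on a plain Python list; it is not the WebNN EP code, and the function name `slice_negative_step` is hypothetical.

```python
def slice_negative_step(data, start, stop, step):
    """Emulate data[start:stop:step] for step < 0 using reverse + positive-step slice."""
    if step >= 0:
        raise ValueError("this sketch only covers negative steps")
    n = len(data)
    # Normalize start/stop the same way Python slicing does for this length.
    norm_start, norm_stop, _ = slice(start, stop, step).indices(n)
    # Element data[j] sits at index n - 1 - j in the reversed list,
    # so both bounds are mirrored and the step changes sign.
    reversed_data = data[::-1]
    return reversed_data[n - 1 - norm_start : n - 1 - norm_stop : -step]


values = list(range(6))
print(slice_negative_step(values, 4, 1, -2))  # same elements as values[4:1:-2]
```

Mirroring both bounds keeps the half-open interval semantics: the mirrored start stays inclusive and the mirrored stop stays exclusive, so no off-by-one correction is needed.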


12066 commits
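
Commit 1128882bfd (Quantize Bias for Conv/Gemm on Quantized Model) describes quantizing a float bias with scale = scale_input_0 * scale_input_1 and zp = 0. The arithmetic can be sketched as below, assuming per-tensor (scalar) scales and saturation to the int32 range; the helper `quantize_bias` is hypothetical and is not taken from the ONNX Runtime source.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def quantize_bias(bias, input_scale, weight_scale):
    """Quantize a float bias to int32 using scale = input_scale * weight_scale, zp = 0."""
    bias_scale = input_scale * weight_scale
    zero_point = 0
    quantized = [
        # Python's round() rounds ties to even; clamp the result to int32.
        max(INT32_MIN, min(INT32_MAX, round(b / bias_scale) + zero_point))
        for b in bias
    ]
    return quantized, bias_scale, zero_point


q, scale, zp = quantize_bias([0.5, -0.25, 1.0], input_scale=0.5, weight_scale=0.25)
print(q, scale, zp)
```

Using the product of the two input scales means the int32 bias can be added directly to the integer accumulator of the quantized Conv/Gemm, which is the usual reason for choosing this scale.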