12111 commits

Author SHA1 Message Date
Yulong Wang
ae6dcc839e
Revert "[js/webgpu] disable failed tests temporarily (#23127)" (#23130)
### Description

This reverts commit 9115682d69.

### Motivation and Context
2024-12-18 18:07:50 -08:00
Prathik Rao
31e6e1010c
gather elements webgpu implementation (#23137)
Increases operator coverage for WebGPU EP.
2024-12-18 16:29:26 -08:00
Changming Sun
5d7030e4c6
Revert DML pipeline changes (#23135)
### Description
Previously we wanted to add DirectML EP to existing onnxruntime Windows
CUDA packages. After careful consideration, we will postpone the change.
This PR reverts some pipeline changes previously made by @mszhanyi and
@jchen351 .
2024-12-18 10:42:10 -08:00
Changming Sun
e76bd2f5e9
Update CODEOWNERS: remove onnxruntime-es (#21677)
Removing this restriction for now.
2024-12-17 13:39:13 -08:00
Wanming Lin
a5b60ec03f
[WebNN] Add limit to QDQ ops (#23076)
WebNN requires the `scale_shape` to be a subsample of the `input_shape`.
2024-12-17 12:52:08 -08:00
Enrico Galli
54edb43e77
[WebNN] Fixes MLTensor caching across different contexts (#23100)
We weren't checking that MLTensors were from the same context before
reusing them.

Found while debugging microsoft/webnn-developer-preview#69
2024-12-17 12:51:16 -08:00
Tianlei Wu
5afab787db
Update python version metadata (remove 3.7, 3.8, 3.9; add 3.13). (#23067)
### Description

* Update python version metadata to be in sync with latest python
packages (onnxruntime, onnxruntime-gpu and onnxruntime-qnn).
* Update black format target-version to 3.10, and use lintrunner to
format all files.
* Update the lintrunner installation command line to be consistent.
* Include `requirements-lintrunner.txt` in `requirements-dev.txt` to
avoid duplicated settings.

### Motivation and Context

https://github.com/microsoft/onnxruntime/issues/22993

Python support by numpy:
https://numpy.org/neps/nep-0029-deprecation_policy.html#drop-schedule
```
On Apr 05, 2024 drop support for Python 3.9
On Apr 04, 2025 drop support for Python 3.10
```
2024-12-17 10:59:20 -08:00
Jiajia Qin
0981bbf4ca
[webgpu] Optimize matmulnbits with M > 1 (#23102)
This is the webgpu native ep implementation of #23092.

I used https://github.com/fs-eire/ort-webgpu-nodejs-chatapp-prototype to
test. Meanwhile, applied
https://github.com/fs-eire/ort-webgpu-nodejs-chatapp-prototype/pull/2 to
print the first token time.

The result is like below:
The latest main branch:
Intel Arc Graphics
```
659 tokens in 24.8sec, 26.57 tokens/sec
    Decoding first token with input 449 tokens: 13.0 sec
    Decoding remaining 210 tokens:
        11.8 sec
        17.79 tokens/sec
```
NV RTX 2000
```
659 tokens in 14.4sec, 45.85 tokens/sec
    Decoding first token with input 449 tokens: 7.3 sec
    Decoding remaining 210 tokens:
        7.0 sec
        29.81 tokens/sec
```

-------------------------------------------------------------------------
With this PR:
Intel Arc Graphics
```
657 tokens in 20.6sec, 31.92 tokens/sec
    Decoding first token with input 449 tokens: 8.5 sec
    Decoding remaining 208 tokens:
        12.1 sec
        17.23 tokens/sec
```
NV RTX 2000
```
659 tokens in 11.4sec, 57.93 tokens/sec
    Decoding first token with input 449 tokens: 4.1 sec
    Decoding remaining 210 tokens:
        7.2 sec
        28.98 tokens/sec
```

The data above shows that this PR improves the first-token time on both
the Intel (13.0 s -> 8.5 s) and NV (7.3 s -> 4.1 s) GPUs.
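
As a quick check of the figures quoted above, the first-token speedup can be computed directly from the reported times. This is only a sketch: the `FIRST_TOKEN_SEC` table and the `speedup` helper are illustrative names, not part of the PR.

```python
# First-token times in seconds, copied from the benchmark output above.
# Each entry is (latest main branch, with this PR).
FIRST_TOKEN_SEC = {
    "Intel Arc Graphics": (13.0, 8.5),
    "NV RTX 2000": (7.3, 4.1),
}


def speedup(before: float, after: float) -> float:
    """Return the ratio before/after; values above 1.0 mean `after` is faster."""
    return before / after


for gpu, (before, after) in FIRST_TOKEN_SEC.items():
    print(f"{gpu}: {before} s -> {after} s ({speedup(before, after):.2f}x)")
```

The ratios come out at roughly 1.5x on the Intel GPU and 1.8x on the NV GPU. The decode rate for the remaining tokens changes only slightly in the logs above.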
2024-12-16 20:47:40 -08:00
Yulong Wang
9115682d69
[js/webgpu] disable failed tests temporarily (#23127)
### Description


Those test cases started to fail for unknown reasons.

To unblock the CI, I disabled those tests temporarily to buy time to
investigate the root cause.
2024-12-16 15:35:47 -08:00
Dmitri Smirnov
ae97068137
Fix Pybind memory leak (#23105)
### Description
Array GETITEM returns a new reference, which was being leaked.


### Motivation and Context
Addresses https://github.com/microsoft/onnxruntime/issues/22271
2024-12-16 10:38:23 -08:00
tianf-fff
a4eb8f27b6
[VitisAI] Add profiler interface for vitisai (#23032)
### Description
Add common interfaces for vitis ep profiler.


### Motivation and Context
The Vitis EP can collect API and kernel timestamps and record them to a
file when the onnxruntime '-p' option is enabled.
2024-12-16 09:09:48 -08:00
Changming Sun
2ff66b80e0
Fix a deadlock bug in EigenNonBlockingThreadPool.h (#23098)
### Description
This PR fixes a deadlock bug in EigenNonBlockingThreadPool.h. It only happens on platforms with a weakly ordered memory model, such as ARM64.
2024-12-16 09:05:12 -08:00
Yulong Wang
3a0b958586
add 2 CMake build options of Dawn (#23096)
### Description

This change adds the following CMake build options for Dawn:
- onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY
  - OFF by default
  - when enabled, builds Dawn as a monolithic library (webgpu_dawn.dll)
- onnxruntime_ENABLE_DAWN_BACKEND_VULKAN
  - OFF by default
  - when enabled, build with Vulkan backend for Dawn on Windows
- onnxruntime_ENABLE_DAWN_BACKEND_D3D12
  - ON by default
  - when enabled, build with DirectX 12 backend for Dawn on Windows



### File Size Comparison (Windows)

| Build | cmdline | File Size (bytes) |
|---|---|---|
| Baseline | --config Release<br/> --build_shared_lib | `12,755,456 onnxruntime.dll` |
| WebGPU D3D12 (default) | --use_webgpu<br/> --config Release<br/> --build_shared_lib | `17,082,368 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`18,708,480 onnxruntime.dll` |
| WebGPU D3D12+Vulkan | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_ENABLE_DAWN_BACKEND_D3D12=1<br/> onnxruntime_ENABLE_DAWN_BACKEND_VULKAN=1 | `17,081,344 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`19,388,416 onnxruntime.dll` |
| WebGPU Vulkan | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_ENABLE_DAWN_BACKEND_D3D12=0<br/> onnxruntime_ENABLE_DAWN_BACKEND_VULKAN=1 | `17,615,872 onnxruntime.dll` |
| Monolithic | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY=1 | `17,082,368 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`13,277,696 onnxruntime.dll`<br/>` 5,616,640 webgpu_dawn.dll` |
| External Dawn | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_USE_EXTERNAL_DAWN=1<br/> --skip_tests | `17,081,344 dxcompiler.dll`<br/>` 1,508,472 dxil.dll`<br/>`13,277,184 onnxruntime.dll` |
2024-12-13 16:05:48 -08:00
genmingz@AMD
62e7e24f17
Add attrProto.release_s interface (#22977)
### Description
Add AttributeProto.release_s interface, which is used to obtain the
string in the attribute using move semantics instead of copying it



### Motivation and Context
The ep_context node stores a lot of information in attributes, which can
increase memory usage. Using this interface avoids the extra copy and
the memory it wastes.

---------

Co-authored-by: GenMing Zhong <genmingz@xlnx.xilinx.com>
Co-authored-by: genmingz <genmingz@amd.com>
2024-12-12 21:13:43 -08:00
Hector Li
2a36fd4f6e
Fix the ctx_gen tool to make sure all generated ctx.onnx have max_size (#23097)
### Description
Fix the qnn_ctx_gen tool to make sure all generated ctx.onnx have max_size
2024-12-12 21:12:02 -08:00
Hector Li
f43f40facf
Backward compatible with old QNN version (#23095)
### Description
Make the QNN EP compatible with old QNN versions.
2024-12-12 17:04:20 -08:00
Yulong Wang
01539ee7ab
[js/webgpu] fix Conv2DMatMul shader's out-of-bound read (#23085)
### Description

Fix a bug caused by potential out-of-bound reads of `W` in the
Conv2DMatMul shader.

### Motivation and Context

Fixes #22983
2024-12-12 11:33:53 -08:00
Dmitri Smirnov
890a719c91
Remove deprecated static from Eigen that contributes to size increase (#23084)
### Description
This patches Eigen source to remove an unused deprecated static var.

### Motivation and Context
Internal customer request.
2024-12-12 10:19:47 -08:00
Ankit Maheshkar
1f88284f96
OVEP 1.21.0 Development Updates (#23080)
### Description
OVEP development changes for ORT 1.21 Release
 
 
### Motivation and Context
- Has Critical Bug Fixes
- Improved Performance optimizations for both memory & inference latency
(https://github.com/intel/onnxruntime/pull/513)
- Enabled Model Compilation using NPUW
(https://github.com/intel/onnxruntime/pull/508)
- Fixed support for EPContext embed mode 0 for lower memory utilization
- Updated NuGet package name as `Intel.ML.OnnxRuntime.OpenVino`
- Fixed QDQ Stripping logic on NPU
2024-12-11 22:26:32 -08:00
Hector Li
ebb968d34a
disable the EP context embed model by default in session option (#23070)
change the default value for session option ep.context_embed_mode to 0 to avoid the model loading memory overhead
2024-12-11 17:26:29 -08:00
Yulong Wang
e605870783
[js/web] Update API for ort.env.webgpu (#23026)
### Description

This PR is a replacement of #21671. It offers a new way of accessing
the following:
- `ort.env.webgpu.adapter`:
  - **deprecating**. There is no point in getting its value. Once
    `GPUDevice.adapterInfo` is supported, there is no point in setting it
    either.
- `ort.env.webgpu.device`:
  - set: assign a `GPUDevice` if the user created it. Use at user's own risk.
  - get: returns a `Promise<GPUDevice>`. If a device does not exist, a new
    one is created; if it exists, it is returned.
- `ort.env.webgpu.powerPreference`:
  - **deprecating**. Users are encouraged to set `ort.env.webgpu.device` if
    necessary.
- `ort.env.webgpu.forceFallbackAdapter`:
  - **deprecating**. Users are encouraged to set `ort.env.webgpu.device` if
    necessary.
2024-12-11 10:24:14 -08:00
sushraja-msft
8800830a44
Implement 2d tiled matmulnbits specialized for prefill (#23058)
### Description
This change implements matmul4bits with tiling both for A and B. This is
beneficial for prefill scenarios on Intel integrated GPUs, because each
row of A has to run through the same set of shared rows of B. This
change should improve core occupancy and model_benchmark does indicate
improvements for prefill.

The same shader is not used for generation because when A has just a
single row, the other threads in the workgroup go unused, which hurts
performance.
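
The tiling idea can be sketched on the CPU in plain Python. This is not the shader from this change: it ignores the 4-bit quantization of B, and the function name and the `tile_m` / `tile_n` parameters are hypothetical. It only illustrates how one tile of B is loaded once and shared by every row of A in the tile.

```python
def tiled_matmul(a, b, tile_m=2, tile_n=2):
    """Multiply a (M x K) by b (K x N), producing one output tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for row0 in range(0, m, tile_m):
        rows = range(row0, min(row0 + tile_m, m))
        for col0 in range(0, n, tile_n):
            cols = range(col0, min(col0 + tile_n, n))
            # Stand-in for a workgroup's shared memory: this slice of B is
            # gathered once and reused by every row of A in the tile.
            b_tile = [[b[kk][c] for c in cols] for kk in range(k)]
            for r in rows:
                for ci, c in enumerate(cols):
                    out[r][c] = sum(a[r][kk] * b_tile[kk][ci] for kk in range(k))
    return out


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 0, 2], [0, 1, 3]]
print(tiled_matmul(a, b))
```

With a single row in A (the generation case), each tile of B is gathered for only one row, so the sharing that helps prefill has nothing to amortize over.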

```
-- Baseline run on an Alderlake GPU --

C:\onnxruntime>C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web -l 500
Batch size: 1, prompt tokens: 501, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       1.72338e+07
        avg (tokens/s): 29.0707                          << 
        p50 (us):       1.72548e+07
        stddev (us):    57012.8
        n:              5 * 501 token(s)
Token generation:
        avg (us):       79227.5
        avg (tokens/s): 12.6219
        p50 (us):       79284.4
        stddev (us):    2109.72
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       15.8198
        avg (tokens/s): 63211.8
        p50 (us):       14.3
        stddev (us):    8.67178
        n:              640 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       27297.8
        p50 (ms):       27269.8
        stddev (ms):    89.4322
        n:              5
Peak working set size (bytes): 5490987008
WebGPU device lost (2): Device was destroyed.

----------------------------------- With Prefill Optimization ----

C:\onnxruntime>C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web -l 500                                                                                                                                                               
Batch size: 1, prompt tokens: 501, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       1.2135e+07
        avg (tokens/s): 41.2856                                 << 
        p50 (us):       1.21288e+07
        stddev (us):    21282.1
        n:              5 * 501 token(s)
Token generation:
        avg (us):       78945.3
        avg (tokens/s): 12.667
        p50 (us):       78900.7
        stddev (us):    2232.43
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       20.5608
        avg (tokens/s): 48636.3
        p50 (us):       18.7
        stddev (us):    19.0409
        n:              640 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       22163.8
        p50 (ms):       22160.1
        stddev (ms):    31.3122
        n:              5
Peak working set size (bytes): 5478862848
WebGPU device lost (2): Device was destroyed.
```
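
The `avg (tokens/s)` lines marked with `<<` follow from the `avg (us)` values and the prompt length. A small sketch of that conversion, using only numbers from the output above (the helper name and constants are illustrative):

```python
def tokens_per_second(num_tokens: int, avg_microseconds: float) -> float:
    """Convert an average latency in microseconds into tokens per second."""
    return num_tokens / (avg_microseconds / 1_000_000)


PROMPT_TOKENS = 501      # "prompt tokens: 501" in both runs
BASELINE_US = 1.72338e7  # baseline "avg (us)" for prompt processing
OPTIMIZED_US = 1.2135e7  # same field with the prefill optimization

baseline = tokens_per_second(PROMPT_TOKENS, BASELINE_US)
optimized = tokens_per_second(PROMPT_TOKENS, OPTIMIZED_US)
print(f"baseline:  {baseline:.2f} tokens/s")
print(f"optimized: {optimized:.2f} tokens/s")
print(f"prefill speedup: {optimized / baseline:.2f}x")
```

The computed rates agree with the logged 29.0707 and 41.2856 tokens/s to within rounding of the printed latencies, which corresponds to a prefill speedup of about 1.42x.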
2024-12-10 17:07:11 -08:00
amancini-N
d8de3c4096
[CUDA EP] Fix BeamSearch on T5 with sequence_as_input_ids (#20667) (#20668)
### Description
Change the implementation of BeamSearch op when using CUDA EP: in case
of T5 model, and in case the decoder input_ids are sequences, copy the
sequences device-to-device instead of host-to-device

### Motivation and Context
- Fixes #20667
2024-12-10 16:20:47 -08:00
shiyi
02f0af0d08
[WebNN] Improve data type check of slice op (#22988)
A follow-up of [[WebNN] Support negative steps for
slice](https://github.com/microsoft/onnxruntime/pull/22871#discussion_r1847929774).
Slice op is emulated by reverse+slice when steps < 0 so
`SliceOpBuilder::HasSupportedInputsImpl()` should also check the
supported data types of reverse.
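
The reverse+slice emulation can be illustrated on a 1-D Python list. This is a sketch of the index arithmetic only, not the WebNN EP code; both function names are hypothetical, and it assumes `start` and `end` are already normalized (`start` in `[0, n-1]`, `end` in `[-1, n-1]`, with `end` exclusive).

```python
def slice_negative_step(x, start, end, step):
    """Reference behavior: walk from start down to (but excluding) end."""
    assert step < 0
    return [x[i] for i in range(start, end, step)]


def slice_via_reverse(x, start, end, step):
    """Same result using a reverse followed by a slice with a positive step."""
    assert step < 0
    n = len(x)
    reversed_x = x[::-1]
    # Index i in x sits at index n - 1 - i in reversed_x.
    new_start = n - 1 - start
    new_end = n - 1 - end
    return reversed_x[new_start:new_end:-step]


x = list(range(10))
print(slice_negative_step(x, 8, 2, -2))
print(slice_via_reverse(x, 8, 2, -2))
```

Because the emulated path runs a reverse first, any data type that reverse cannot handle cannot be handled by the emulated slice either, which is the reason for the extra type check described above.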

---------

Co-authored-by: Wanming Lin <wanming.lin@intel.com>
2024-12-10 15:48:16 -08:00
Edward Chen
fa6ad202aa
Minor updates to onnxruntime_java.cmake (#23068)
- Use `ANDROID` instead of `CMAKE_SYSTEM_NAME STREQUAL "Android"`.
- Put common gradle arguments into `COMMON_GRADLE_ARGS` to make them easier to reuse.
2024-12-10 15:44:36 -08:00
Jiajia Qin
defcc4f819
[webgpu] Optimize Expand (#23052)
### Description
Use components = 4 if possible.

This is the webgpu native implementation from #22752
2024-12-10 14:58:57 -08:00
Misha Chornyi
bf4d3e1a5b
Update vcpkg.json - lock flatbuffer version (#23046)
### Description
Locking version introduced in:

03ea5dc495/onnxruntime/core/flatbuffers/schema/ort_training_checkpoint.fbs.h (L11-L13)

### Motivation and Context
Resolve issue for version `>=1.20.` 
https://github.com/microsoft/onnxruntime/issues/22666
2024-12-10 11:23:01 -08:00
Jian Chen
5f7b9d0245
Upgrade gradle to 8.7 (#23016)
### Description
This PR only upgrades the gradle version and the
`com.android.tools.build:gradle` version in build.gradle.

It only updates the react-native library gradle version, not the e2e
test.



### Motivation and Context
2024-12-10 10:49:03 -08:00
A-Satti
b14b4ec703
Restore Qspectre flag (#23060)
Restore a removed Qspectre flag and update comment

### Motivation and Context
Adjustment for PR
f5293d253c
2024-12-09 21:52:21 -08:00
Scott McKay
708ee8556e
Reduce default logger usage (#23030)
### Description
We have use cases where multiple sessions are created concurrently.
Minimizing the usage of the default logger is important for these
scenarios.

Wire through the session logger to as many places as possible. The EP
logger can also be used once the session is created (can't be used
during EP construction/kernel registration but can be used in
GetCapability and Compile).

### Motivation and Context
Improve logging when there are concurrent sessions.
2024-12-10 12:54:14 +11:00
wejoncy
e12421be30
[CoreML] more performance flag (#22975)
### Description
- Refactor unsqueeze's implementation.
- Add more flags to boost performance.
- Add a profile flag.


### Motivation and Context

---------

Co-authored-by: jicwen <jicwen@YiMacBook-Pro.local>
Co-authored-by: wejoncy <wejoncy@.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-12-10 09:35:05 +08:00
amancini-N
8f3384b4c1
Fix BeamSearch T5 if initializers are on outer scope (#23044)
### Description
This PR adds the logic needed to consider only the needed implicit
inputs on BeamSearch op in case of T5 model (encoder/decoder, 2 graphs).
The logic added is similar to what happens in the _If_ kernel setup.


### Motivation and Context
Fixes #23043
2024-12-09 15:15:20 -08:00
Scott McKay
2f2c73bdde
Miscellaneous cleanups (#23048)
### Description
- fix some missing end of version markers and since_version info
- fix include to use onnx_protobuf.h which handles minimal builds
- we should always prefer that header over directly using the onnx ones


### Motivation and Context
2024-12-10 09:24:16 +11:00
Yulong Wang
22ae97c7dc
[webgpu] Add Alias def for Flatten (#23038)
### Description

Add `Alias` definition for Flatten in WebGPU EP.

also add int32/uint32 in type constraint T.
2024-12-09 14:19:43 -08:00
Wanming Lin
6d9636f07c
[WebNN] Allow ops to handle ignoring an empty tensor as input (#22972)
### Description
Some ops should allow an empty tensor as input, e.g. the roi and scales
inputs of Resize.
### Motivation and Context
This avoids an unexpected fallback for optional inputs given as empty
tensors. For example, roi and scales are both optional inputs of Resize;
in some models they have a non-empty name but an empty initializer,
presented as `[0]`. WebNN currently falls back on all nodes with a 0
dimension, which is not expected.

![image](https://github.com/user-attachments/assets/599ba351-b5f6-49ac-8a1f-69fb28dbaf9b)
2024-12-06 17:58:15 -08:00
A-Satti
f5293d253c
Update Intel Thread Counts (#22894)
### Description
The default thread count methodology in onnxruntime did not account for
upcoming Intel microarchitectures, leading to a suboptimal thread
count. Optimizing the thread count for new Intel microarchitectures
yields gains on the majority of models across datatypes, with speedups
of up to ~1.5x.


### Motivation and Context
Applications should run on Intel with the most performant thread
configuration for the majority of models. With new microarchitectures,
adjusting the thread count methodology is required to take advantage of
their differences.
2024-12-06 13:56:50 -08:00
Jing Fang
bd5a759d0c
[ARM CPU] Add rotary embedding fp16 kernel (#23013)
### Description
Add fp16 kernel to rotary embedding to boost performance.


### Motivation and Context
Part of performance optimization work for group query attention
2024-12-06 13:25:48 -08:00
Hector Li
401d16c671
Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853)
### Description
Enable the QNN HTP spill fill buffer setting to save RAM usage.
This feature is available after QNN 2.28. The QNN context binary needs
to be regenerated.

https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api

Requirements:
1. Regenerate the Onnx model with QNN context binary by setting the
EP option enable_htp_spill_fill_buffer = 1.
2. Works for a model with multiple context binaries. Manually merge the
2 Onnx models with context binary into 1 Onnx model.
3. Requires a Linux platform if generating the context binary offline, since
the QnnSystem lib is not available for the Windows x86_64 platform.

Nothing extra is needed when running model inference.

The generated EPContext node will have a max_size attribute with the
maximum spill fill buffer size for the context binary
<img width="353" alt="image"
src="https://github.com/user-attachments/assets/a3bf48be-a8da-4381-8a1d-3f2558eea37d">

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-12-06 11:36:52 -08:00
dependabot[bot]
d27fecd3d3
Bump cross-spawn from 6.0.5 to 6.0.6 in /js/web (#23019)
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from
6.0.5 to 6.0.6.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/moxystudio/node-cross-spawn/blob/v6.0.6/CHANGELOG.md">cross-spawn's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/moxystudio/node-cross-spawn/compare/v6.0.5...v6.0.6">6.0.6</a>
(2024-11-18)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)
(<a
href="https://github.com/moxystudio/node-cross-spawn/commit/ba5aaef">ba5aaef</a>)</li>
<li><strong>core:</strong> support worker threads (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/127">#127</a>)
(<a
href="https://github.com/moxystudio/node-cross-spawn/commit/f4af31c">f4af31c</a>)</li>
</ul>
<p><!-- raw HTML omitted --><!-- raw HTML omitted --></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d35c865b87"><code>d35c865</code></a>
chore(release): 6.0.6</li>
<li><a
href="5a37e19173"><code>5a37e19</code></a>
chore: update package.json and package.lock</li>
<li><a
href="ba5aaef783"><code>ba5aaef</code></a>
fix: disable regexp backtracking (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)</li>
<li><a
href="f4af31c8ee"><code>f4af31c</code></a>
fix(core): support worker threads (<a
href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/127">#127</a>)</li>
<li>See full diff in <a
href="https://github.com/moxystudio/node-cross-spawn/compare/v6.0.5...v6.0.6">compare
view</a></li>
</ul>
</details>
<br />



[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once it's up-to-date and CI passes on it,
as requested by @fs-eire.

[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-05 10:07:08 -08:00
Yi Zhang
6ed77cc374
Deprecate macos-12 (#23017)
### Description



### Motivation and Context
The ESRP code-sign task now supports .NET 8, so we can remove macos-12.
2024-12-05 14:07:21 +08:00
Yulong Wang
1c79a4c9dd
[js/common] use TS type inference to eliminate unknown (#23012)
### Description

This change uses a TypeScript trick to infer global types in
onnxruntime-common. Thanks to the strong type system of TypeScript, we
are able to refer to types that may not be available in the context.

This keeps onnxruntime-common free of dependencies like
"@webgpu/types" while still allowing those types to be used in its
declarations. See the comments on `TryGetGlobalType` in `type-helper.ts`.
2024-12-04 19:01:26 -08:00
Jian Chen
f340b3cad3
Adding DML to python cuda package (#22606)
2024-12-04 21:20:12 -05:00
Yulong Wang
3234487385
[js] remove more unused training types (#22753)
### Description

remove more unused training types
2024-12-04 16:44:09 -08:00
dependabot[bot]
3975e79303
Bump axios from 1.6.1 to 1.7.9 in /js/node (#23009)
2024-12-04 23:52:24 +00:00
Wanming Lin
cacd97dba3
[WebNN] Improve the util function of creating WebNN constant MLOperand (#22935)
Merge the util functions to create or retrieve:
- A WebNN constant MLOperand filled with the specified value, data type,
and shape.
- A WebNN scalar constant MLOperand with the specified value and data
type.
2024-12-04 15:09:54 -08:00
Jing Fang
fbe22fdac7
[ARM CPU] Fix flaky hqnbitgemm UT (#23010)
### Description
Increase fp16 qnbitgemm UT tol and use fixed seeds.
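
Why a fixed seed and a looser tolerance help with flakiness can be sketched in Python. This is not the actual unit test: the helper names, the seed, the input range and the `abs_tol` value are all assumptions chosen for illustration. Half precision is simulated with the `struct` module's `e` format.

```python
import math
import random
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def make_inputs(seed: int, n: int) -> list[float]:
    """Draw n values in [-1, 1] from a seeded generator, so every run sees the same data."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def fp16_dot_matches(seed: int = 1234, n: int = 64, abs_tol: float = 0.1) -> bool:
    """Compare a dot product on fp16-rounded inputs against the full-precision one."""
    a = make_inputs(seed, n)
    b = make_inputs(seed + 1, n)
    reference = dot(a, b)
    half = dot([to_fp16(x) for x in a], [to_fp16(x) for x in b])
    return math.isclose(reference, half, rel_tol=0.0, abs_tol=abs_tol)


print(fp16_dot_matches())
```

With a fixed seed the comparison is made on the same inputs every run, so a failure is reproducible. The tolerance has to cover the rounding error that fp16 inputs accumulate over the reduction.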


### Motivation and Context
2024-12-04 14:55:52 -08:00
Yulong Wang
7b0fa407eb
fix requirements.txt path (#22946)
### Description

#22380 removes the file
`tools/ci_build/github/linux/docker/inference/x86_64/python/cpu/scripts/requirements.txt`
but it is still used in `dockerfiles/Dockerfile.cuda`.

This change updates the file path of the requirements.txt

fixes #22945.
2024-12-04 13:08:29 -08:00
Yulong Wang
d0dde4f7d4
[wasm/test] update packages versions (#23008)
### Description

Upgrade packages version to resolve the following dependabot alerts:
- https://github.com/microsoft/onnxruntime/security/dependabot/269
- https://github.com/microsoft/onnxruntime/security/dependabot/268
- https://github.com/microsoft/onnxruntime/security/dependabot/275
- https://github.com/microsoft/onnxruntime/security/dependabot/306



```
# npm audit report

braces  <3.0.3
Severity: high
Uncontrolled resource consumption in braces - https://github.com/advisories/GHSA-grv7-fg5c-xmjg
fix available via `npm audit fix`
node_modules/braces

cookie  <0.7.0
cookie accepts cookie name, path, and domain with out of bounds characters - https://github.com/advisories/GHSA-pxg6-pf52-xh8x
fix available via `npm audit fix`
node_modules/cookie
  engine.io  0.7.8 - 0.7.9 || 1.8.0 - 6.6.1
  Depends on vulnerable versions of cookie
  Depends on vulnerable versions of ws
  node_modules/engine.io
    socket.io  1.6.0 - 4.7.5
    Depends on vulnerable versions of engine.io
    node_modules/socket.io


ws  8.0.0 - 8.17.0
Severity: high
ws affected by a DoS when handling a request with many HTTP headers - https://github.com/advisories/GHSA-3h5v-q93c-6h6q
fix available via `npm audit fix`
node_modules/ws
  socket.io-adapter  2.5.2 - 2.5.4
  Depends on vulnerable versions of ws
  node_modules/socket.io-adapter

6 vulnerabilities (1 low, 1 moderate, 4 high)

```
2024-12-04 13:08:13 -08:00
Yulong Wang
fdf5ffe2cf
[js/node] fix TypeScript declaration in onnxruntime-node (#23000)
### Description
fix TypeScript declaration in onnxruntime-node

### Motivation and Context

Fixes #22978
2024-12-04 11:29:27 -08:00
Xu Xing
c19617a24a
[js/webgpu] Add GatherND (#22847)
### Description



### Motivation and Context
2024-12-04 09:57:32 -08:00