Commit graph

11930 commits

Author SHA1 Message Date
Yulong Wang
bd5dbf86fe
support WebGPU EP in Node.js binding (#22660)
### Description

This change enhances the Node.js binding with the following features:
- support WebGPU EP
- lazy initialization of `OrtEnv`
- being able to initialize ORT with default log level setting from
`ort.env.logLevel`.
- session options:
  - `enableProfiling` and `profileFilePrefix`: support profiling.
  - `externalData`: explicit external data (optional in Node.js binding)
- `optimizedModelFilePath`: allow dumping optimized model for diagnosis
purpose
  - `preferredOutputLocation`: support IO binding.

======================================================
`Tensor.download()` is not implemented in this PR.
Build pipeline update is not included in this PR.
2024-11-04 21:09:07 +00:00
Wanming Lin
6c21ab7337
[WebNN] Support SimplifiedLayerNormalization op (#22674)
WebNN doesn't provide dedicate op for SimplifiedLayerNormalization, use
a couple of WebNN ops to emulate it in WebNN EP.

X --> Pow --> ReduceMean --> Add --> Sqrt --> Div -> Mul
2024-11-04 12:25:11 -08:00
Wanming Lin
57668339e4
[WebNN EP] Align QDQ ops with latest Chromium implementation (#22180)
- Pass inputs to WebNN directly, WebNN will handle the broadcasting
- If `zero_point` is not provided, make a WebNN Constant with 0 values
and same shape as `scale` input
2024-11-04 12:21:49 -08:00
Kyle
74adfc2099
Nuget Windows AI Pipeline, Disable SDL Submodules. (#22711)
### Description
<!-- Describe your changes. -->
Set SDL's git submodule to false. 


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
* Previous job's SDL logs:It has 'git submodule sync' command, which
means 'git submodule sync synchronizes all submodules while git
submodule sync'

* After set sdl git submodules to false, the logs don't have 'git
submodule sync' command.
2024-11-04 08:39:28 -08:00
Wanming Lin
c64459fdd7
[WebNN] Don't skip scalar tensor registration (#22688)
ORT will optimize same scalar initializers into one, we should not skip
such scalar registration as a WebNN Constant.
2024-11-03 21:33:34 -08:00
Bin Miao
777fe7922c
[WebNN EP] Support Sign and CumSum operators (#22616)
This PR supports Sign and CumSum operators for WebNN EP. @Honry @fdwr
PTAL, thanks.
2024-11-03 20:08:16 -08:00
Christian Larson
ac6fe48375
Adjust max chunk size to fix error limit check from DX12 for large resources that are CPU accessible. (#22680)
Adjust max chunk size to fix error limit check from DX12 for large
resources that are CPU accessible.

### Description
Current agility SDK restricts CPU visible buffers to 0xFFFF0000 bytes or
slightly smaller than 4GiB. Verified restriction is still in latest
Agility SDK 1.614.1.


### Motivation and Context
Allocation of Resources 4GiB or larger fail in DX12 verification layer.

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2024-11-03 18:08:12 -08:00
Tianlei Wu
120cb5a804
[Doc] Add I/O binding example using onnx data type in python API summary (#22695)
### Description

Add I/O binding example using onnx data type in python API summary. The
API is available since 1.20 release.

### Motivation and Context

Follow up of https://github.com/microsoft/onnxruntime/pull/22306 to add
some documentation.
2024-11-02 12:51:37 -07:00
mindest
4ffc1ff3b4
DMMHA: add unit tests; fix CPU, CUDA kernel (#22567)
### Description

Fixes:
(1) cpu kernel: applying scale before bias and mask like other MHA ops
(2) cpu kernel: correct offset during appending past to present.
(3) cuda kernel: apply mask if provided; fix output_qk offset.

Add DMMHA unit tests
2024-11-02 21:05:56 +08:00
ivberg
2e4e221da8
Fix crash when running ORT under low integrity process like Edge where ETW registration can fail (#22699)
### Description
Make ETW provider registration non-fatal and not throw an exception

Needs to work under build with exceptions enabled & --disable_exceptions

### Motivation and Context
ORT should not crash

Addresses #22475. Private tested by filer of that issue
2024-11-01 22:58:47 -07:00
Xavier Dupré
d419df4076
Fix too strict assert in onnx_quantizer.py (#22283)
### Description
Partial answer to issue #19997. The example succeeds after this change.
2024-11-01 19:08:35 -07:00
Justin Chu
6d5e970642
[CI] Set up proper permissions for linting workflow (#22696)
Allow writing security events to post lint messages on PRs.
2024-11-01 18:14:52 -07:00
Wanming Lin
62c3476822
[WebNN] Remove some useless verbose logs (#22690)
These logs are not quite useful and create a lot of noise during
debugging, especially when working with large models.
2024-11-01 14:33:25 -07:00
Justin Chu
f7cabf6d4c
Use suggest-changes@v2 (#22667)
Use suggest-changes@v2
(https://github.com/parkerbxyz/suggest-changes/issues/36#issuecomment-2447605058)
to post suggested changes as comments instead of requested changes to
streamline the review process.

- Also updated the script to `set +e` to ignore exit code only for the
linter run. So that if there is errors in dependency installation we can
still get signals.
2024-11-01 13:13:04 -07:00
Scott McKay
ba0bb43b00
Rework the native library usage so that a pre-built ORT native package can be easily used (#22345)
### Description
The local build of the native library was being included by almost every
project, but is only needed to run tests. Due to the multiple inclusions
attempting to use a pre-built package was clashing with any local builds
that were available.

Create a helper file to include either a local built of a pre-built
package and include that in the two test projects.

Cleanup various miscellaous things.

### Motivation and Context

Create setup to simplify running on-device tests with the nuget
packages.
2024-11-01 11:03:33 -07:00
Jiajia Qin
8fbbf2fd4f
[js/webgpu] Optimize MatMul with M = 1 (#22577)
### Description
<!-- Describe your changes. -->
BUG #22031

In the demucs model, there are lots of MatMul ops with shapes like
below:
`input[0]: [3448,1,512] | float32, input[1]: [512,1536] | float32,
output[0]: [3448,1,1536] | float32`

We can see that for this kind of shape, the batch size is a big value,
but M = 1. Our current algorithm is based on [M, N] to partition tiles,
which is not efficient for such kind of shapes. This PR reshapes the
inputs to improve the matmul performance.
Before:  [3448,1,512] x [512,1536] =  [3448,1,1536]
After: [1, 3448, 512] x [512, 1536] = [1, 3448, 1536] , then the output
can be reshaped to [3448, 1, 1536]

The overall MatMul time in demucs model becomes 1778.45 ms from 4418.17
ms on my iGPUs.

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-11-01 08:04:42 -07:00
wejoncy
9daf7664fc
[CoreML] ML Program more ops (2/N) (#22480)
- cast 
 - argmax
 - gelu 
 - cast 
 - LayerNorm 
 - GroupNorm 
 - InstanceNorm

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-01 08:37:56 +08:00
shiyi
c7ecc081ca
Revert "[WebNN] Fallback the node when its output doesn't have shape info" (#22669)
Reverts microsoft/onnxruntime#22556 since it causes incorrect fallback.
2024-10-31 16:07:56 -07:00
Changming Sun
f9bc24e1a7
Add concurrency setting to codeql workflow (#22678)
### Description
1. Add concurrency setting to codeql workflow
2. Modify lint workflow's PATH setting.


### Motivation and Context
To save machine resource.
2024-10-31 16:01:07 -07:00
Yi Zhang
8e8b62b8b5
Build CUDA and DML together (#22602)
### Description
Now, we need to build cuda and dml in one package.
But CUDA EP and DML EP can't run in one process.
It will throw the exception of `the GPU device instance has been
suspended`
So the issue is CUDA EP and DML EP coexist in compile time but can't
exist in run time.

This PR is to split cuda ep test and dml ep test in all unit tests.
The solution is to use 2 environment variable, NO_CUDA_TEST and
NO_DML_TEST, in CI.

For example, if NO_CUDA_TEST is set, the DefaultCudaExecutionProvider
will be nullptr, and the test will not run with CUDA EP.
In debugging, the CUDAExecutionProvider will not be called. 
I think, as long as cuda functions, like cudaSetDevice, are not called,
DML EP tests can pass.

Disabled java test of testDIrectML because it doesn't work now even
without CUDA EP.
2024-10-31 15:51:13 -07:00
Wanming Lin
7b9db658c3
Fixed a minor bug in layout transformation for Resize (#21954)
Since opset 18, 'scales' and 'sizes' constant inputs can be 2D tensors,
transpose for 2D tensors are not supported at current implementation,
fix it by only allowing 4D constant inputs.
2024-10-31 14:20:10 -07:00
dtang317
55e0128b13
[DML EP] Cast to bool correctly, adding explicit clip after cast (#22665)
### Description
The CastNonStringTester test in CastOpTest was failing due to bitwise
mismatches when casting other types to bool. This was caused by bool
being represented as uint8 in DML. Added a clipping step in
DmlOperatorCast to ensure correct bitwise matching after casting to bool
ref: https://dev.azure.com/microsoft/OS/_workitems/edit/44572678


### Motivation and Context
2024-10-31 13:27:37 -07:00
Tianlei Wu
1b60209938
[CUDA/ROCm/Migraphx] consolidate gpu data transfer (#22609)
### Description
Consolidate the gpu data transfer in CUDA, ROCm and Migraphx EP.
(1) Remove some redundant stream synchronize on default stream according
to spec of cudaMemcpy
(2) consolidate CUDA, ROCm and MigrphaX to try use same logic.

### Motivation
This is a follow up on reviewing
https://github.com/microsoft/onnxruntime/pull/22589.

### Context


https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior
##### cudaMemcpy()
* For transfers from pageable host memory to device memory, a stream
sync is performed before the copy is initiated. The function will return
once the pageable buffer has been copied to the staging memory for DMA
transfer to device memory, **but the DMA to final destination may not
have completed**.
* For transfers from pinned host memory to device memory, the function
is synchronous with respect to the host.
* For transfers from device to either pageable or pinned host memory,
the function returns only once the copy has completed.
* For transfers from device memory to device memory, **no host-side
synchronization is performed**.
* For transfers from any host memory to any host memory, the function is
fully synchronous with respect to the host.

#### cudaMemcpyAsync

* For transfers between device memory and pageable host memory, the
function might be synchronous with respect to host.
* For transfers from any host memory to any host memory, the function is
fully synchronous with respect to the host.
* If pageable memory must first be staged to pinned memory, the driver
may synchronize with the stream and stage the copy into pinned memory.
 * For all other transfers, the function should be fully asynchronous.


https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___memory.html

##### hipMemcpyAsync()

If host or dest are not pinned, the memory copy will be performed
synchronously. For best performance, use hipHostMalloc to allocate host
memory that is transferred asynchronously.
on HCC hipMemcpyAsync does not support overlapped H2D and D2H copies.
For hipMemcpy, the copy is always performed by the device associated
with the specified stream.

##### hipMemcpy()
For hipMemcpy, the copy is always performed by the current device (set
by hipSetDevice).

https://github.com/ROCm/ROCm/blob/roc-5.7.x/tools/autotag/templates/rocm_changes/5.6.1.md

ROCm 5.6.1 release note: hipMemcpy device-to-device (intra device) is
now asynchronous with respect to the host
2024-10-31 09:52:50 -07:00
Pranav Sharma
69fe58b0ef
Fix formatting of DML EP files that was disturbed in an earlier PR. (#22672) 2024-10-31 07:37:28 -07:00
sstamenk
a2070bf091
Fix input shape related compile logs for MIGraphX EP to be semantically correct (#22624)
As the title suggests, recompilation is done if a mismatch is detected.
Changed the logs to reflect that behavior.
2024-10-30 19:46:35 -07:00
Changming Sun
60bfa7fab1
Update publish-python-apidocs.yml (#22655)
To fix a permission error
2024-10-30 19:25:29 -07:00
Wanming Lin
eb66bfa7b4
[WebNN] Convert MLOperand methods into readonly attributes (#22653)
Adapt to spec change at
https://github.com/webmachinelearning/webnn/pull/774
2024-10-30 17:54:49 -07:00
Wanming Lin
fc375a6f58
[WebNN] Support And, Or and Xor ops (#22598)
Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-10-30 17:52:10 -07:00
Yi Zhang
5200e098aa
Not using predefined marco to check EP (#22654)
### Description
We'll build CUDA EP and DML  EP in one package.
As a result, USE_DML and USE_CUDA will coexist.
We can't use predefined macros to check EP any more


### Motivation and Context
Other changes are in test code, so I make this change of core runtime
into one PR.
2024-10-31 08:46:13 +08:00
Pranav Sharma
03ea5dc495
Distinguish between DML and the generic 'GPU' term. This is needed for packaging DML EP in the same ORT GPU pkg. (#22657)
### Description
Distinguish between DML and the generic 'GPU' term. This is needed for
packaging DML EP in the same ORT GPU pkg.

### Motivation and Context
Customer requirement.
2024-10-30 11:58:34 -07:00
Enrico Galli
df236c7894
[WebNN EP] Add cache for MLContexts in the WebNNBackend (#22510)
### Description
This change adds a cache of `MLContext`s keyed by their options to the
`WebNNBackend`. This makes is so that multiple `InferenceSession`s
create with the same options will share the same context.

### Motivation and Context
Since `MLTensor`s are tied `MLContext`s, developer can't easily share
tensors between `InferenceSession` (outside of manually an `MLContext`
and specifying the `context` options). This leads strange behaviors such
as,
```js
const sessionsA = ort.InferenceSession.create(urlA, {
  executionProviders: ["webnn"],
  preferredOutputLocation: "ml-buffer",
});
const sessionsB = ort.InferenceSession.create(urlB, {
  executionProviders: ["webnn"],
});
const temp = await sessionA.run({/* arguments */});
const result = await sessionB.run({"input":temp["output"]}); // ERROR: Failed to execute 'dispatch' on 'MLContext': Invalid inputs: The context of MLGraph doesn't match the context of the MLTensor with name "input".
```
We encountered this behavior when updating the transformers.js version
in the developer preview demos. microsoft/webnn-developer-preview#46
2024-10-30 10:26:33 -07:00
shiyi
46ff240821
[WebNN] Add ScatterElements and GatherElements (#22534) 2024-10-30 10:20:21 -07:00
shiyi
86b3b89a6b
[WebNN EP] Check if the tensor shape has 0 dimension (#22573)
WebNN doesn't support empty tensor.
2024-10-30 08:18:21 -07:00
Yulong Wang
7a8fa12850
Add implementation of WebGPU EP (#22591)
### Description

This PR adds the actual implementation of the WebGPU EP based on
https://github.com/microsoft/onnxruntime/pull/22318.

This change includes the following:

<details>
<summary><b>core framework of WebGPU EP</b></summary>

  - WebGPU EP factory classes for:
    - handling WebGPU options
    - creating WebGPU EP instance
    - creating WebGPU context
  - WebGPU Execution Provider classes
    - GPU Buffer allocator
    - data transfer
  - Buffer management classes
    - Buffer Manager
    - BufferCacheManager
      - DisabledCacheManager
      - SimpleCacheManager
      - LazyReleaseCacheManager
      - BucketCacheManager
  - Program classes
    - Program (base)
    - Program Cache Key
    - Program Manager
  - Shader helper classes
    - Shader Helper
    - ShaderIndicesHelper
    - ShaderVariableHelper
  - Utils
    - GPU Query based profiler
    - compute context
    - string utils
  - Miscs
    - Python binding webgpu support (basic)
 
</details>

<details>
<summary><b>Kernel implementation</b></summary>


  - onnx.ai (default opset):
- Elementwise (math): Abs, Neg, Floor, Ceil, Reciprocal, Sqrt, Exp, Erf,
Log, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh,
Tanh, Not, Cast
- Elementwise (activation): Sigmoid, HardSigmoid, Clip, Elu, Relu,
LeakyRelu, ThresholdedRelu, Gelu
- Binary (math): Add, Sub, Mul, Div, Pow, Equal, Greater,
GreaterOrEqual, Less, LessOrEqual
    - (Tensors): Shape, Reshape, Squeeze, Unsqueeze
    - Where
    - Transpose
    - Concat
    - Expand
    - Gather
    - Tile
    - Range
    - LayerNormalization
  - com.microsoft
    - FastGelu
    - MatMulNBits
    - MultiHeadAttention
    - RotaryEmbedding
    - SkipLayerNormalization
    - LayerNormalization
    - SimplifiedLayerNormalization
    - SkipSimplifiedLayerNormalization

</details>

<details>
<summary><b>Build, test and CI pipeline integration</b></summary>

  - build works for Windows, macOS and iOS
  - support onnxruntime_test_all and python node test
  - added a new unit test for `--use_external_dawn` build flag.
  - updated MacOS pipeline to build with WebGPU support
  - added a new pipeline for WebGPU Windows

</details>

This change does not include:

- Node.js binding support for WebGPU (will be a separate PR)
2024-10-29 18:29:40 -07:00
Prathik Rao
5cc7fb4a74
[JSEP] Upgrade to ONNX Opset 21 (#22595)
### JSEP Ops that need updating

- [x] Cast
- [x] ReduceMax
- [x] ReduceMin
- [x] Squeeze
- [x] Unsqueeze
- [x] Transpose
- [x] AveragePool
- [x] Flatten
- [x] Pad
- [x] If
2024-10-29 17:44:38 -07:00
Indy Zhu
e2e837584f
[DML EP] Update DML to 1.15.4 (#22635)
### Description
[DML EP] Update DML to 1.15.4



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We want the customer to use the latest DirectML.
2024-10-29 17:13:57 -07:00
Jiajia Qin
04e696d8e0
[js/webgpu] Optimize InstanceNorm in some shapes (#22637)
BUG #22031

Optimize below two situations:
1. Increase workgroupSize if only one workgroup is dispatched.
2. Avoid transpose if not necessary.

The overall time of demucs model becomes 106.36 ms from 154.60 ms on my
dGPUs with this PR and PR #22577
2024-10-29 17:10:14 -07:00
dependabot[bot]
4850bcd896
Bump onnx from 1.16.1 to 1.17.0 in /onnxruntime/python/tools/transformers/models/whisper (#22641)
Bumps [onnx](https://github.com/onnx/onnx) from 1.16.1 to 1.17.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/onnx/onnx/releases">onnx's
releases</a>.</em></p>
<blockquote>
<h2>v1.17.0</h2>
<p>ONNX v1.17.0 is now available with exciting new features! We would
like to thank everyone who contributed to this release!
Please visit <a href="https://onnx.ai/">onnx.ai</a> to learn more about
ONNX and associated projects.</p>
<h1>Key Updates</h1>
<h2>ai.onnx Opset 22</h2>
<ul>
<li>Update to support bfloat16:
<ul>
<li><a
href="https://onnx.ai/onnx/operators/onnx__Acos.html#acos-22">Acos</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Acosh.html#acosh-22">Acosh</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Asin.html#asin-22">Asin</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Asinh.html#asinh-22">Asinh</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Atan.html#atan-22">Atan</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Atanh.html#atanh-22">Atanh</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__AveragePool.html#averagepool-22">AveragePool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Bernoulli.html#bernoulli-22">Bernoulli</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Conv.html#conv-22">Conv</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__ConvTranspose.html#convtranspose-22">ConvTranspose</a>,
<a href="https://onnx.ai/onnx/operators/onnx__Cos.html#cos-22">Cos</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Cosh.html#cosh-22">Cosh</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__DeformConv.html#deformconv-22">DeformConv</a>,
<a href="https://onnx.ai/onnx/operators/onnx__Det.html#det-22">Det</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Dropout.html#dropout-22">Dropout</a>,
<a href="https://onnx.ai/onnx/operators/onnx__Elu.html#elu-22">Elu</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__EyeLike.html#eyelike-22">EyeLike</a>,
<a href="https://onnx.ai/onnx/operators/onnx__GRU.html#gru-22">GRU</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__GlobalAveragePool.html#globalaveragepool-22">GlobalAveragePool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__GlobalLpPool.html#globallppool-22">GlobalLpPool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__GlobalMaxPool.html#globalmaxpool-22">GlobalMaxPool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__GridSample.html#gridsample-22">GridSample</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__HardSigmoid.html#hardsigmoid-22">HardSigmoid</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__HardSwish.html#hardswish-22">HardSwish</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__InstanceNormalization.html#instancenormalization-22">InstanceNormalization</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__LSTM.html#lstm-22">LSTM</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__LpNormalization.html#lpnormalization-22">LpNormalization</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__LpPool.html#lppool-22">LpPool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__MaxPool.html#maxpool-22">MaxPool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__MaxRoiPool.html#maxroipool-22">MaxRoiPool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__MaxUnpool.html#maxunpool-22">MaxUnpool</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Mish.html#mish-22">Mish</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Multinomial.html#multinomial-22">Multinomial</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__NegativeLogLikelihoodLoss.html#negativeloglikelihoodloss-22">NegativeLogLikelihoodLoss</a>,
<a href="https://onnx.ai/onnx/operators/onnx__RNN.html#rnn-22">RNN</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__RandomNormal.html#randomnormal-22">RandomNormal</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__RandomNormalLike.html#randomnormallike-22">RandomNormalLike</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__RandomUniform.html#randomuniform-22">RandomUniform</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__RandomUniformLike.html#randomuniformlike-22">RandomUniformLike</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__RoiAlign.html#roialign-22">RoiAlign</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Round.html#round-22">Round</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Selu.html#selu-22">Selu</a>,
<a href="https://onnx.ai/onnx/operators/onnx__Sin.html#sin-22">Sin</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Sinh.html#sinh-22">Sinh</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Softplus.html#softplus-22">Softplus</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__Softsign.html#softsign-22">Softsign</a>,
<a href="https://onnx.ai/onnx/operators/onnx__Tan.html#tan-22">Tan</a>,
<a
href="https://onnx.ai/onnx/operators/onnx__ThresholdedRelu.html#thresholdedrelu-22">ThresholdedRelu</a></li>
</ul>
</li>
</ul>
<h2>Python Changes</h2>
<ul>
<li>Support for numpy &gt;= 2.0</li>
</ul>
<h1>Bug fixes and infrastructure improvements</h1>
<ul>
<li>Fix Check URLs errors <a
href="https://redirect.github.com/onnx/onnx/pull/5972">5972</a></li>
<li>Use CMAKE_PREFIX_PATH in finding libprotobuf <a
href="https://redirect.github.com/onnx/onnx/pull/5975">5975</a></li>
<li>Bump main VERSION_NUMBER to 1.17.0 <a
href="https://redirect.github.com/onnx/onnx/pull/5968">5968</a></li>
<li>Fix source and pip tar.gz builds on s390x systems <a
href="https://redirect.github.com/onnx/onnx/pull/5984">5984</a></li>
<li>Fix unique_name <a
href="https://redirect.github.com/onnx/onnx/pull/5992">5992</a></li>
<li>Fix SegFault bug in shape inference <a
href="https://redirect.github.com/onnx/onnx/pull/5990">5990</a></li>
<li>Fix onnx.compose when connecting subgraphs <a
href="https://redirect.github.com/onnx/onnx/pull/5991">5991</a></li>
<li>Fix conversion from split 11 to split 18 <a
href="https://redirect.github.com/onnx/onnx/pull/6020">6020</a></li>
<li>Update error messages for NegativeLogLikelihoodLoss inference
function <a
href="https://redirect.github.com/onnx/onnx/pull/6021">6021</a></li>
<li>Generalize input/output number check in shape inference <a
href="https://redirect.github.com/onnx/onnx/pull/6005">6005</a></li>
<li>Replace rank inference with shape inference for Einsum op <a
href="https://redirect.github.com/onnx/onnx/pull/6010">6010</a></li>
<li>build from source instruction with latest cmake change <a
href="https://redirect.github.com/onnx/onnx/pull/6038">6038</a></li>
<li>Handle OneHot's depth value during shape inference <a
href="https://redirect.github.com/onnx/onnx/pull/5963">5963</a></li>
<li>Not to install cmake in pyproject.toml on Windows <a
href="https://redirect.github.com/onnx/onnx/pull/6045">6045</a></li>
<li>fix a skipped shape infer code <a
href="https://redirect.github.com/onnx/onnx/pull/6049">6049</a></li>
<li>Include the &quot;.onnxtext&quot; extension in supported
serialization format <a
href="https://redirect.github.com/onnx/onnx/pull/6051">6051</a></li>
<li>Allow ReferenceEvaluator to return intermediate results <a
href="https://redirect.github.com/onnx/onnx/pull/6066">6066</a></li>
<li>Fix 1 typo in numpy_helper.py <a
href="https://redirect.github.com/onnx/onnx/pull/6041">6041</a></li>
<li>Remove benchmarking code <a
href="https://redirect.github.com/onnx/onnx/pull/6076">6076</a></li>
<li>Prevent crash on import after GCC 8 builds <a
href="https://redirect.github.com/onnx/onnx/pull/6048">6048</a></li>
<li>Check graph outputs are defined <a
href="https://redirect.github.com/onnx/onnx/pull/6083">6083</a></li>
<li>Enable additional ruff rules <a
href="https://redirect.github.com/onnx/onnx/pull/6032">6032</a></li>
<li>Add missing shape inference check for DequantizeLinear <a
href="https://redirect.github.com/onnx/onnx/pull/6080">6080</a></li>
<li>Add bfloat16 to all relevant ops <a
href="https://redirect.github.com/onnx/onnx/pull/6099">6099</a></li>
<li>fix(ci): install python dependencies with --only-binary :all: in
manylinux <a
href="https://redirect.github.com/onnx/onnx/pull/6120">6120</a></li>
<li>fix: install google-re2 with --only-binary option <a
href="https://redirect.github.com/onnx/onnx/pull/6129">6129</a></li>
<li>Specify axis parameter for DequantizeLinear when input rank is 1 <a
href="https://redirect.github.com/onnx/onnx/pull/6095">6095</a></li>
<li>Pin onnxruntime to 1.17.3 for release CIs <a
href="https://redirect.github.com/onnx/onnx/pull/6143">6143</a></li>
<li>Fix INT4 TensorProto byte size is 5x larger than expected with
negative values <a
href="https://redirect.github.com/onnx/onnx/pull/6161">6161</a></li>
<li>Mitigate tarball directory traversal risks <a
href="https://redirect.github.com/onnx/onnx/pull/6164">6164</a></li>
<li>Fix reference implementation for ScatterND with 4D tensors <a
href="https://redirect.github.com/onnx/onnx/pull/6174">6174</a></li>
<li>Addition of group &gt; 1 in test and in backend for ConvTranspose <a
href="https://redirect.github.com/onnx/onnx/pull/6175">6175</a></li>
<li>Support for bfloat16 for binary, unary operators in reference
implementation <a
href="https://redirect.github.com/onnx/onnx/pull/6166">6166</a></li>
<li>Refactor windows workflow to work on standard windows <a
href="https://redirect.github.com/onnx/onnx/pull/6190">6190</a></li>
<li>Fix a few crashes while running shape inference <a
href="https://redirect.github.com/onnx/onnx/pull/6195">6195</a></li>
<li>Update onnx to work with numpy&gt;=2.0 <a
href="https://redirect.github.com/onnx/onnx/pull/6196">6196</a></li>
<li>Use sets to improve performance of dfs search <a
href="https://redirect.github.com/onnx/onnx/pull/6213">6213</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b8baa84466"><code>b8baa84</code></a>
Set version 1.17.0 for official release (<a
href="https://redirect.github.com/onnx/onnx/issues/6405">#6405</a>)</li>
<li><a
href="6d77b80821"><code>6d77b80</code></a>
[Cherry-Pick] Fix main url checks (<a
href="https://redirect.github.com/onnx/onnx/issues/6312">#6312</a>) (<a
href="https://redirect.github.com/onnx/onnx/issues/6327">#6327</a>)</li>
<li><a
href="174938d8b7"><code>174938d</code></a>
[Cherry-Pick] Fix protobuf pkg 5.28.0 failing on Windows (<a
href="https://redirect.github.com/onnx/onnx/issues/6342">#6342</a>) (<a
href="https://redirect.github.com/onnx/onnx/issues/6347">#6347</a>)</li>
<li><a
href="f18d5931ad"><code>f18d593</code></a>
[Cherry-Pick] Remove unused variables (<a
href="https://redirect.github.com/onnx/onnx/issues/6303">#6303</a>) (<a
href="https://redirect.github.com/onnx/onnx/issues/6324">#6324</a>)</li>
<li><a
href="c58890537f"><code>c588905</code></a>
Set version in rel-1.17.0 to 1.17.0rc1 (<a
href="https://redirect.github.com/onnx/onnx/issues/6317">#6317</a>)</li>
<li><a
href="4392c2c9ae"><code>4392c2c</code></a>
Prepare for rel-1.17.0 (<a
href="https://redirect.github.com/onnx/onnx/issues/6281">#6281</a>)</li>
<li><a
href="cb54169e4f"><code>cb54169</code></a>
Update ort filter to 1.20.0 to skip tests known to fail with ort 1.19.0
(<a
href="https://redirect.github.com/onnx/onnx/issues/6306">#6306</a>)</li>
<li><a
href="99e1fd352c"><code>99e1fd3</code></a>
Bump reviewdog/action-misspell from 1.21.0 to 1.23.0 (<a
href="https://redirect.github.com/onnx/onnx/issues/6268">#6268</a>)</li>
<li><a
href="1920565505"><code>1920565</code></a>
Bump ossf/scorecard-action from 2.3.3 to 2.4.0 (<a
href="https://redirect.github.com/onnx/onnx/issues/6273">#6273</a>)</li>
<li><a
href="2e8f2289b9"><code>2e8f228</code></a>
Bump mypy from 1.10.1 to 1.11.1 (<a
href="https://redirect.github.com/onnx/onnx/issues/6275">#6275</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/onnx/onnx/compare/v1.16.1...v1.17.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=onnx&package-manager=pip&previous-version=1.16.1&new-version=1.17.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-29 15:17:44 -07:00
ivberg
43e6296075
Fix reliability issues in LogAllSessions. (#22568)
### Description
Issue can happen with multiple sessions and when ETW captureState /
rundown is triggered.

Resolves use after free issue.

Tested with local unit test creating/destroying multiple sessions while
continually enabling & disabling ETW. This currently requires Admin
prompt so not checking in

### Motivation and Context
ORT should not crash
2024-10-29 14:22:35 -07:00
Dmitri Smirnov
e106131260
Enable Ort objects to be stored in a resizable std::vector (#22608)
### Description
<!-- Describe your changes. -->
Allow some classes to be default constructed.
The effect is the same as constructing it with nullptr.
Make default ctor visible from the base classes.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Multiple customers complained that when storing Ort::Value
in an instance of std::vector, vector can not be resized.

We enable that with allowing it default constructed.
2024-10-29 09:59:59 -07:00
Yifan Li
951d9aa99f
[TensorRT EP] Refactor TRT version update logic & apply TRT 10.5 (#22483)
### Description
<!-- Describe your changes. -->
* Leverage template `common-variables.yml` and reduce usage of hardcoded
trt_version

8391b24447/tools/ci_build/github/azure-pipelines/templates/common-variables.yml (L2-L7)
* Among all CI yamls, this PR reduces usage of hardcoding trt_version
from 40 to 6, by importing trt_version from `common-variables.yml`
* Apply TRT 10.5 and re-enable control flow op test


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- Reduce usage of hardcoding trt_version among all CI ymls

### Next refactor PR 
will work on reducing usage of hardcoding trt_version among
`.dockerfile`, `.bat` and remaining 2 yml files
(download_win_gpu_library.yml & set-winenv.yml, which are step-template
yaml that can't import variables)
2024-10-29 09:23:41 -07:00
Yulong Wang
dbe8c83893
[js/web] remove "node": null in export table (#22618)
### Description

This change resolves issue No.3 described in #22615
2024-10-29 04:01:26 -07:00
Changming Sun
3641d184f8
Add pipauth to more ADO pipelines and enable CSV (#22612)
### Description
1. Add pipauth to more ADO pipeline. (We will use a private ADO feed to
fetch python packages in these pipeline, to improve security)
2. Enforce codeSignValidation(CSV).

### Motivation and Context
Fulfill some internal compliance requirements.
2024-10-28 16:39:22 -07:00
shiyi
dcf91266bd
[WebNN EP] Support GatherND and ScatterND op (#22181) 2024-10-28 15:04:45 -07:00
Tianlei Wu
975d3dffcf
Update bert benchmark: replace deprecated API (#22611)
### Description
(1) tokenizer.max_model_input_sizes was deprecated. Use
tokenizer.model_max_length to replace it.
(2) onnx opset updated to 16 instead of 11/12 for models.
(3) Update a few comments related to torch installation.
(4) Test gpu instead of cpu in dev_benchmark.cmd.

### Motivation and Context
Update bert benchmark script so that it can run with latest huggingface
transformers package.
2024-10-28 13:24:17 -07:00
kailums
dd28f09ce2
fix issue when build with hipblasLt on rocm6.1 (#22553)
### Description
<!-- Describe your changes. -->

hipblasLt library is released with rocm6.x, and current onnxruntime's
code need some modifications to match new hipblasLt API.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-10-28 13:57:08 +08:00
Ted Themistokleous
7ad78733e6
Add support for softmaxcrossentropy loss to MIGraphX EP (#64) (#22603)
Add support for softmaxcrossentropy loss. This is already enabled on our
ROCm Fork of the MIGraphX EP


### Motivation and Context
Adds support for the SoftmaxCrossEntropyLoss operator and removes the
filtering of inputs here.
2024-10-27 13:59:35 -07:00
Satya Kumar Jandhyala
05fbb43b34
[JSEP/WebGPU] Fix data causing output mismatch resulting in CI build failures occasionally (#22596)
### Description
<!-- Describe your changes. -->
Test case failing sometimes and passing other times.


### Motivation and Context
Prevent unnecessary CI build failures requiring manually rerunning tests
2024-10-26 01:37:12 -07:00
Wanming Lin
008c9090b4
[WebNN] Support int4 and uint4 data types (#22575) 2024-10-25 17:44:46 -07:00
shiyi
c547306d5f
[WebNN] Fallback the node when its output doesn't have shape info (#22556)
WebNN requires that each input and output must have shape info.
2024-10-25 17:41:45 -07:00