Commit graph

10770 commits

Author SHA1 Message Date
Changming Sun
dafbef3a21
CMake: support reading dependency zip files from a local mirror (#20005)
### Description
To test this feature, run:
```bat
python cmake\deps_update_and_upload.py --root-path mirror
```
Then run build.py as usual.

The zip files are cached locally so that they are not downloaded again and
again.
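Below is a minimal Python sketch of the lookup-then-download idea. The mirror layout (`<mirror>/<host>/<url path>`) and the helper name `resolve_dependency` are hypothetical. The actual CMake logic and `deps_update_and_upload.py` may organize the mirror differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_dependency(url: str, mirror_root: Path) -> str:
    """Return a cached mirror path for `url` if present, else the URL itself.

    Hypothetical layout: https://host/a/b.zip is expected at
    <mirror_root>/host/a/b.zip.
    """
    parsed = urlparse(url)
    candidate = mirror_root / parsed.netloc / parsed.path.lstrip("/")
    if candidate.is_file():
        return str(candidate)  # cache hit: no network access needed
    return url  # cache miss: fall back to downloading from the original URL
```

Checking the mirror first keeps a build working offline once the archives are cached. Falling back to the URL keeps it working when the mirror is incomplete.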
2024-03-21 17:58:59 -07:00
TP Boudreau
983fd8393a
Recognize NaN operands in Min and Max ops (#19984)
### Description
Update the Min and Max CUDA math operations on float/double types to
propagate NaNs: if either operand is NaN, the result should be NaN.

TODO: float16/bfloat16 need similar change.

### Motivation
Currently, results differ between the CPU and CUDA implementations of
the floating point Min and Max operators: the CPU operators correctly
return NaN results if either operand is NaN. This PR updates the CUDA
implementations to conform with this correct behavior.

See the issue and comments raised
[here](https://github.com/onnx/onnx/issues/6003).

### Context
Same behavior in numpy, torch and Java:
```
>>> numpy.min([numpy.NAN, 1])
nan
>>> numpy.max([numpy.NAN, 1])
nan

>>> torch.min(torch.tensor([1, float('nan')]))
tensor(nan)
>>> torch.max(torch.tensor([1, float('nan')]))
tensor(nan)
```

The C language functions [fmin](https://en.cppreference.com/w/c/numeric/math/fmin)
and [fmax](https://en.cppreference.com/w/c/numeric/math/fmax) have
different behavior:
```
fmax(NaN,1) = 1
fmin(NaN,1) = 1
```
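
A small Python sketch can contrast the two conventions. The function names are illustrative and are not ONNX Runtime APIs. `c_style_fmin`/`c_style_fmax` only mimic the C library behavior shown above.

```python
import math


def nan_propagating_min(a: float, b: float) -> float:
    """Min that returns NaN if either operand is NaN (the behavior this PR adopts)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a <= b else b


def nan_propagating_max(a: float, b: float) -> float:
    """Max that returns NaN if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a >= b else b


def c_style_fmin(a: float, b: float) -> float:
    """Mimics C fmin: a NaN operand is ignored in favor of the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a <= b else b


def c_style_fmax(a: float, b: float) -> float:
    """Mimics C fmax: a NaN operand is ignored in favor of the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b
```

The explicit `math.isnan` checks matter because a bare comparison is order-dependent with NaN. For example, Python's built-in `min(math.nan, 1.0)` returns `nan`, while `min(1.0, math.nan)` returns `1.0`.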

https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

![image](https://github.com/microsoft/onnxruntime/assets/30328909/62446cf1-f252-4ddc-8118-5ce605252331)

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf
2024-03-21 16:08:18 -07:00
Yi Zhang
30a0d80925
Fix exception in Publish unit test results step (#20007)
### Description
Test result files are all in the RelWithDebInfo\RelWithDebInfo directory,
so it is not necessary to stat the _deps directory.

### Motivation and Context
Recently, this exception has occurred many times in the zip-nuget pipeline:
`##[error]Error: Failed find: EPERM: operation not permitted, stat 'D:\a\_work\1\b\RelWithDebInfo\_deps\flatbuffers-src\java\src\test\java\DictionaryLookup'`

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=426981&view=logs&j=75fc0348-fe99-522b-3acb-90fd80ac5271&t=5d4ebcc1-bcde-574d-6f4e-8abd0f04ae4b
2024-03-22 06:53:59 +08:00
Tianlei Wu
06fe4f3113
Increase MNIST test tolerance (#20000)
### Description

Found multiple occurrences of failures:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1321061&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=56a04c0b-9e7f-5c69-cb7b-c2a7b1a7392a&l=17537

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1329701&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=4f6ef737-111d-50d1-a46b-5f86d9a970bc&s=3618b4c0-1011-591a-85b8-671e72e2cff1

1: [ RUN      ] ModelTests/ModelTest.Run/cuda__models_zoo_opset7_MNIST_model
1: D:\a\_work\1\s\onnxruntime\test\providers\cpu\model_tests.cc(358): error: Expected equality of these values:
1:   COMPARE_RESULT::SUCCESS
1:     Which is: 4-byte object <00-00 00-00>
1:   ret.first
1:     Which is: 4-byte object <01-00 00-00>
1: expected -2.33638 (c0158735), got -2.30239 (c0135a47), diff: 0.0339923, tol=0.0243638 idx=9
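
The logged tolerance is consistent with a check of the form `tol = atol + rtol * |expected|`. The sketch below reproduces it, assuming `atol = 0.001` and `rtol = 0.01`. Those two values are inferred because they reproduce `tol=0.0243638`; they are not taken from the test source.

```python
# Values copied from the failing log line above.
expected = -2.33638
actual = -2.30239

# Assumed tolerances: inferred because they reproduce the logged tol=0.0243638.
atol = 0.001
rtol = 0.01

tol = atol + rtol * abs(expected)  # 0.001 + 0.01 * 2.33638
diff = abs(expected - actual)

print(f"diff={diff:.7f} tol={tol:.7f} within tolerance: {diff <= tol}")
```

The difference computed from these rounded values (about 0.03399) is slightly off from the logged 0.0339923. This is because the log prints the expected and actual values rounded to six significant digits. Either way, the difference exceeds the tolerance.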
2024-03-20 23:40:27 -07:00
Prathik Rao
0b958bb421
add random seed to layernorm tests (#19998)
Adds random seed to layernorm tests to prevent random failure.

### Motivation and Context
Fixes https://github.com/microsoft/onnxruntime/issues/19983
2024-03-20 21:00:25 -07:00
Yi Zhang
175f149b30
Remove downloading deps in CUDA package test stage (#19993)
### Description

### Motivation and Context
Downloading deps is not needed in the test stage.
Remove it to reduce random download errors.
2024-03-21 10:01:03 +08:00
Justin Chu
0335ea9f1e
Use Java 11 to build project in the codeql pipeline (#19999)
CodeQL uses Java 8 by default, which is too old for the project.

Related:

https://learn.microsoft.com/en-us/java/openjdk/reasons-to-move-to-java-11
https://github.com/actions/setup-java
2024-03-20 17:53:48 -07:00
Yufeng Li
15219e2e71
turn on neural_speed by default (#19627)
### Description
The crash caused by neural_speed turned out to be a very rare corner case.
Turn it on by default.


### Motivation and Context
2024-03-20 12:49:58 -07:00
Rachel Guo
6b305f95e0
Support xcframework for mac catalyst builds. (#19534)
### Description

### Motivation and Context

MAUI on macOS uses mac-catalyst which requires a different native
binary.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-03-20 10:55:19 -07:00
Adam Pocock
19ff4a6d6c
String Tensor SplitToSequence fix (#19942) 2024-03-20 10:52:00 -07:00
Markus Tavenrath
0af5eacc8b
Fix broken Pooling CUDA NHWC Ops and ensure NCHW / NHWC parity. (#19889)
### Description
Fixed all CUDA NHWC Pooling operations which were broken and enabled the
NHWC CUDA pooling tests. Disabled all pooling tests which are not
supported by the CUDA EP.



### Motivation and Context
Ensure parity between CUDA NHWC / NCHW and work towards 100% tests
enabled for the CUDA EP / CUDA NHWC EP.

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2024-03-20 09:57:29 -07:00
Yi Zhang
8adbc09314
[Fix] Error Python Packaging Pipeline (Training CPU) (#19992)
### Description
Fix the error caused by
https://github.com/microsoft/onnxruntime/pull/19973
2024-03-20 09:02:50 -07:00
zesongw
7e18cb4c35
[WebNN EP] Support MatMul 1D (#19862)
### Description
Support MatMul 1D inputs by combining Reshape and ReduceMean.



### Motivation and Context
ONNX MatMul supports 1D inputs, but they were disabled in
`IsOpSupportedImpl`.
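
ONNX MatMul follows `numpy.matmul` semantics:

- A 1D first input is treated as a row vector (a 1 is prepended to its shape).
- A 1D second input is treated as a column vector (a 1 is appended).
- The inserted dimension is then removed from the result.

The Python sketch below computes only the resulting output shape, which any reshape-based emulation has to reproduce. It is not the WebNN EP code.

```python
from typing import List


def matmul_output_shape(a: List[int], b: List[int]) -> List[int]:
    """Output shape of numpy-style MatMul, including the 1D promotion rules."""
    a_is_1d, b_is_1d = len(a) == 1, len(b) == 1
    if a_is_1d:
        a = [1] + a  # (K,) -> (1, K)
    if b_is_1d:
        b = b + [1]  # (K,) -> (K, 1)
    if a[-1] != b[-2]:
        raise ValueError(f"inner dimensions differ: {a[-1]} vs {b[-2]}")
    # Broadcast the batch dimensions (everything except the last two axes).
    batch_a, batch_b = a[:-2], b[:-2]
    rank = max(len(batch_a), len(batch_b))
    batch_a = [1] * (rank - len(batch_a)) + batch_a
    batch_b = [1] * (rank - len(batch_b)) + batch_b
    batch = []
    for x, y in zip(batch_a, batch_b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {x} and {y}")
        batch.append(x if y == 1 else y)
    out = batch + [a[-2], b[-1]]
    # Remove the dimensions that were inserted for 1D operands.
    if a_is_1d:
        del out[-2]
    if b_is_1d:
        del out[-1]
    return out
```

For example, `(2, 3) x (3,)` yields shape `(2,)`, and two 1D vectors yield a scalar (empty shape).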
2024-03-20 08:32:57 -07:00
Ye Wang
6ff31e06d5
[MoE] Add TP and Mixtral MoE (#19945)
### Description
1. Support Tensor Parallelism in ShardedMoE.
2. Make necessary code changes to support Mixtral MoE.
3. Fix a bug related to using IOBinding in the test script.
4. Fix the input size limitation.

### Motivation and Context
2024-03-19 21:28:15 -07:00
mindest
3dfe4a5e6d
[ROCm] Remove MPI dependency and collectives to use NCCL (#19830)
### Description
* Remove MPI dependency to use NCCL AllReduce, etc.
* Exclude unsupported collectives in hipify
2024-03-19 17:35:18 -07:00
Abhishek Jindal
6fe02068af
Add const cast for DLManagedTensor (#19982)
### Description
Add a const cast for DLManagedTensor, because PyTorch changed its
[code](https://github.com/pytorch/pytorch/pull/121102) in a way that creates an
incompatibility.



### Motivation and Context
Fix the following error when configuring ORT training with nightly PyTorch:
```
aten_op_executor.cc:60:40: error: invalid conversion from ‘const DLManagedTensor*’ to ‘DLManagedTensor*’ [-fpermissive]
   60 |     at::Tensor tensor = at::fromDLPack(dlpack);
      |                                        ^~~~~~
      |                                        |
      |                                        const DLManagedTensor*
```
2024-03-19 17:00:44 -07:00
Guenther Schmuelling
c45cff60cf
[js/webgpu] fix maxpool / fp16 (#19981) 2024-03-19 16:15:49 -07:00
Tianlei Wu
597e828aae
Adjust test tolerance (#19947)
### Description
Improve the precision of tests. 

Changes include:
(1) Update checkers.cc to use consistent default tolerance.
(2) Allow different default tolerances for different providers at
runtime (previously, the threshold of a test was decided at compile time).
(3) Explicitly set absolute and relative error tolerances for tests that
failed to pass new default threshold.

#### Default Thresholds Change

Note that the testing formula is `abs(expected - value) < absolute + relative * expected`.
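
Here is a minimal Python version of that check. One assumption: the sketch uses the magnitude `abs(expected)` in the relative term, so a negative expected value cannot shrink the allowed error. The formula as written above multiplies by `expected` directly.

```python
def within_tolerance(expected: float, actual: float,
                     absolute: float, relative: float) -> bool:
    """Check |expected - actual| < absolute + relative * |expected|."""
    # Assumption: abs(expected) is used so that a negative expected value
    # cannot reduce (or negate) the allowed error.
    return abs(expected - actual) < absolute + relative * abs(expected)
```

With the new float/CPU defaults (absolute 0.00001, relative 0.0001), an expected value of 100.0 tolerates an error just over 0.01. Near zero, the absolute term dominates.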

Default test thresholds when both absolute and relative tolerance are
not set:

| type | provider | absolute (before) | absolute (after) | relative (before) | relative (after) |
| -- | -- | -- | -- | -- | -- |
| double | CPU | 0.001 | 0.00001 | 0 | 0.00001 |
| double | CUDA | 0.005 | 0.00001 | 0 | 0.00001 |
| double | TRT | 0.005 | 0.00001 | 0 | 0.00001 |
| double | ROCM | 0.005 | 0.00001 | 0 | 0.00001 |
| double | DML | 0.005 | 0.00001 | 0 | 0.00001 |
| float | CPU | 0.0001 | 0.00001 | 0 | 0.0001 |
| float | CUDA | 0.005 | 0.00001 | 0 | 0.0001 |
| float | TRT | 0.005 | 0.00001 | 0 | 0.0001 |
| float | ROCM | 0.005 | 0.00001 | 0 | 0.0001 |
| float | DML | 0.005 | 0.00001 | 0 | 0.0001 |
| float | Training\* | 0.005 | 0.001 | 0 | 0.0001 |
| half | CPU | 0.001 | 0.0025 | 0 | 0.001 |
| half | CUDA | 0.005 | 0.0025 | 0 | 0.001 |
| half | TRT | 0.005 | 0.0025 | 0 | 0.001 |
| half | ROCM | 0.005 | 0.0025 | 0 | 0.001 |
| half | DML | 0.02 | 0.005 | 0 | 0.001 |
| half | Training\* | 0.005 | 0.005 | 0 | 0.001 |
| bfloat16 | CPU | 0.0001 | 0.02 | 0 | 0.01 |
| bfloat16 | CUDA | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | TRT | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | ROCM | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | DML | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | Training\* | 0.0001 | 0.02 | 0.05 | 0.01 |

\*Training means that the build flag ENABLE_TRAINING_CORE is defined. The
provider can be any one.

#### Threshold for provider
 
Previously, the threshold might change according to build flags:
```cpp
#if defined(USE_CUDA) || defined(USE_ROCM) || defined(USE_DML)
  constexpr float threshold = 0.005f;
#else
  constexpr float threshold = 0.0001f;
#endif
```
For a CPU-only build, the threshold is 0.0001. For a CUDA build, the
threshold for the CPU provider (some tests in a CUDA build actually run with
the CPU provider) becomes 0.005.

After this change, the threshold depends only on the data type and the
provider used in the test. For non-training builds, it no longer changes with
build flags.
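
The runtime selection can be sketched in Python as a lookup keyed by data type and provider. It uses a few of the "after" values from the table above. The dictionary and function names are hypothetical; the actual change lives in the C++ test checkers (checkers.cc).

```python
from typing import Dict, Tuple

# (data type, provider) -> (absolute, relative), "after" values from the table.
DEFAULT_TOLERANCES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("double", "CPU"): (0.00001, 0.00001),
    ("double", "CUDA"): (0.00001, 0.00001),
    ("float", "CPU"): (0.00001, 0.0001),
    ("float", "CUDA"): (0.00001, 0.0001),
    ("half", "CPU"): (0.0025, 0.001),
    ("half", "DML"): (0.005, 0.001),
    ("bfloat16", "CUDA"): (0.02, 0.01),
}


def default_tolerance(dtype: str, provider: str) -> Tuple[float, float]:
    """Pick defaults at runtime from the provider that actually runs the test."""
    try:
        return DEFAULT_TOLERANCES[(dtype, provider)]
    except KeyError:
        raise ValueError(f"no default tolerance for {dtype} on {provider}") from None
```

The key is the provider that runs the test, not a compile-time macro. As a result, a CPU-provider test in a CUDA build gets the same tolerance as in a CPU-only build.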


Default thresholds for training might differ from those for inference
(see the table above). There are a few factors:

- Training has gradient outputs.
- TF32 is not disabled in training.
- Some training tests run multiple iterations, so error might accumulate.

How to set different thresholds based on these factors could be a future task.
2024-03-19 15:50:13 -07:00
Hariharan Seshadri
cd6ec50b50
Switch a portion of CI/packaging jobs to MacOS12 (#19908) 2024-03-19 14:54:58 -07:00
Adrian Lizarraga
18a7f34ba0
[NhwcTransformerTests] Fix linker error due to explicit template instantiation of ModelBuilder methods (#19980)
Currently, the nhwc_transformer_test.cc compilation unit defines
explicit FP16 versions of `ModelTestBuilder::MakeInput<MLFloat16>` and
`ModelTestBuilder::MakeInitializer<MLFloat16>` outside of the
ModelTestBuilder class's header file.

These explicit template instantiations cause linker errors when other
compilation units also instantiate these functions due to duplicate
definitions. Additionally, the versions defined in
nhwc_transformer_test.cc do not really conform to the expected behavior
in the original ModelTestBuilder class, which is to make random
input/initializer values. Instead, the versions in
nhwc_transformer_test.cc create a range of values.

The solution is to edit nhwc_transformer_test.cc to use stand-alone
static functions that do not change the ModelTestBuilder class.

**Note**: This linker error cannot currently be replicated in our CIs
because it requires a QNN-HTP-enabled Windows ARM64 environment with
`MLAS_F16VEC_INTRINSICS_SUPPORTED` defined. I can replicate it on a local
build. The linker error/conflict happens with this new FP16 QNN
test:

d4c8bc359e/onnxruntime/test/providers/qnn/clip_op_test.cc (L186)
2024-03-19 13:48:04 -07:00
Yulong Wang
01c7aaf6aa
[js/webgpu] allow setting env.webgpu.adapter (#19940)
### Description
Allow user to set `env.webgpu.adapter` before creating the first
inference session.

Feature request:
https://github.com/microsoft/onnxruntime/pull/19857#issuecomment-1999984753

@xenova
2024-03-19 12:55:00 -07:00
Tianlei Wu
8293aa1564
Exclude TRT provider in tests crashed in A100 (#19972)
The TensorRT EP hits a segmentation fault on A100 for some tests. Exclude the
TRT EP in those tests on A100 to unblock development.

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/19530
2024-03-19 11:36:42 -07:00
Yi Zhang
d4c8bc359e
Fix Training CPU docker image name to avoid unnecessary rebuilding (#19973)
### Description
The Docker image name was fixed, but the Docker arguments differed between
jobs, which triggered rebuilding the Docker image almost every time.
2024-03-19 09:33:24 -07:00
Prathik Rao
26cd3c1fb0
add kernel tests for ops that changed in opset18 (#19767)
### Description
- [x] The Pad operator introduced a new input called "axes", which
specifies which axes to pad. If axes is not provided, it defaults to
input_rank, which matches the behavior before the opset upgrade.
- [x] ReduceMean
- [x] ReduceL2
- [x] ReduceLogSumExp
- [x] ReduceSum
- Reduction ops all had the axes attribute switched to an input and a
new attribute called "noop_with_empty_axes" was added to define what to
do when axes is not specified.
- [x] Resize has had two new attributes introduced: antialias and
keep_aspect_ratio_policy. From Operators.md I've gathered:
"Antialiasing is achieved by stretching the resampling filter by a
factor max(1, 1 / scale), which means that when downsampling, more input
pixels contribute to an output pixel."
keep_aspect_ratio_policy "describes how to interpret the `sizes` input
with regard to keeping the original aspect ratio of the input." There
are a couple of enum-type options that specify different policies and what
to do in each case.
- NOTE: Baiju already included opset18 tests in
https://github.com/microsoft/onnxruntime/pull/17772
- [x] ScatterElements/ScatterND has had a new attribute introduced
called "reduction." This specifies the type of reduction to apply: none
(default), add, mul, max, min.
- [x] Split introduced a new attribute called "num_outputs" which
specifies how many outputs to split the input tensor into. This is in
contrast to the previous, default behavior of specifying a "split" input
which defines the size of each resultant tensor of the output.
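
For the `num_outputs` form of Split, the chunk sizes follow from the input dimension. The Python sketch below is based on the ONNX specification's rule that the last chunk is smaller when the tensor is not evenly splittable. In that case, every other chunk has `ceil(dim / num_outputs)` elements. The function name is illustrative.

```python
from typing import List


def split_sizes(dim: int, num_outputs: int) -> List[int]:
    """Chunk sizes for Split (opset 18) when only `num_outputs` is given."""
    if num_outputs <= 0:
        raise ValueError("num_outputs must be positive")
    if dim % num_outputs == 0:
        return [dim // num_outputs] * num_outputs
    # Not evenly divisible: each chunk gets ceil(dim / num_outputs) elements
    # and the last chunk takes whatever remains, so it is smaller.
    chunk = dim // num_outputs + 1
    sizes = [chunk] * num_outputs
    sizes[-1] = dim - chunk * (num_outputs - 1)
    return sizes
```

For example, splitting an axis of length 7 into 3 outputs gives chunks of 3, 3, and 1. The older `split` input would instead list those sizes explicitly.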

### Motivation and Context
2024-03-19 09:33:06 -07:00
Xu Xing
4c6a6a37f7
[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387)
The added test case produces NaN because of the uninitialized buffer.
2024-03-18 22:59:32 -07:00
Ted Themistokleous
6bb64683f8
Use version instead of version-dev for ROCm (#19967) 2024-03-19 10:40:40 +08:00
Guenther Schmuelling
a4ac727cbb
handle fp16 for where op (#19969)
This prevents falling back from WebGPU to CPU, which helps performance.
2024-03-18 13:42:51 -07:00
Tianlei Wu
141966bb69
Disable TF32 in tests of CUDA ep (#19963)
Operator or model test results should not depend on whether the
NVIDIA_TF32_OVERRIDE environment variable is set. This makes test results more deterministic.
2024-03-18 11:17:34 -07:00
Dmitri Smirnov
a033df8c31
Implement CustomOp Output Type Inference function (#19906)
### Description
This change addresses the following issues with the current custom op
output type inference:
- The function does not take into account optional inputs. When input is
absent the inference is silently aborted, and no output type is inferred
(P1 customer issue)
- Inferring the output type from the input type for multi-kernel custom
ops uses the last kernel definition in the sequence. No attempt is
made to match the kernel based on the input type.
- Inference is aborted when variadic inputs/outputs are detected when
the generated input/output names fail to obtain type constraints. This
is not immediately clear from the code, because custom op schema is not
available within the inference function.
- No error reporting.
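
The kernel-matching idea can be illustrated with a small Python model. Everything here is hypothetical (`KernelDef`, `infer_output_types`, and the use of `None` for an absent optional input) and only mirrors the issues listed above. The real implementation is C++ inside ONNX Runtime's custom op support.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class KernelDef:
    """Hypothetical description of one kernel registered for a custom op."""
    input_types: List[str]
    output_types: List[str]


def infer_output_types(
    kernels: Sequence[KernelDef], inputs: Sequence[Optional[str]]
) -> List[str]:
    """Infer output types from the first kernel whose inputs match.

    `None` marks an absent optional input; it matches any declared type
    instead of aborting inference.
    """
    for kernel in kernels:
        if len(kernel.input_types) != len(inputs):
            continue
        if all(
            actual is None or actual == declared
            for declared, actual in zip(kernel.input_types, inputs)
        ):
            return list(kernel.output_types)
    # Report the failure instead of silently inferring nothing.
    raise TypeError(f"no kernel matches input types {list(inputs)}")
```

Because matching is done on input types, a float input selects the float kernel even when an int64 kernel is registered later in the sequence.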

### Motivation and Context

Most custom ops lack their own type and shape inference function, because
that feature was introduced only recently. For that reason, it is important
to fix this. This change is inspired by a customer issue.

This is a follow up on:
- https://github.com/microsoft/onnxruntime/pull/15184
- https://github.com/cbourjau/ort-custom-op/pull/11
- https://github.com/microsoft/onnxruntime-extensions/issues/451
2024-03-18 10:28:39 -07:00
Edward Chen
4d31076d68
[objc] Add check for ORTValue being a tensor in ORTValue methods that should only be used with tensors. (#19946)
Add check to report error instead of crashing.
2024-03-18 08:54:24 -07:00
Guenther Schmuelling
7e0d424934
accumulate in fp32 for Reduce* (#19868) 2024-03-18 08:28:43 -07:00
dependabot[bot]
28ad6c3955
Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951)
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.4 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li><a
href="b1677ce001"><code>b1677ce</code></a>
Release version 1.15.5 of the npm package.</li>
<li><a
href="d8914f7982"><code>d8914f7</code></a>
Preserve fragment in responseUrl.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-16 18:54:53 -07:00
dependabot[bot]
4e55242a30
Bump follow-redirects from 1.15.4 to 1.15.6 in /onnxruntime/test/wasm (#19950)
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.4 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li><a
href="b1677ce001"><code>b1677ce</code></a>
Release version 1.15.5 of the npm package.</li>
<li><a
href="d8914f7982"><code>d8914f7</code></a>
Preserve fragment in responseUrl.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-16 18:54:06 -07:00
dependabot[bot]
afdab62f53
Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949)
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.4 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li><a
href="b1677ce001"><code>b1677ce</code></a>
Release version 1.15.5 of the npm package.</li>
<li><a
href="d8914f7982"><code>d8914f7</code></a>
Preserve fragment in responseUrl.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-16 18:53:17 -07:00
wangshuai09
1eb67a07ca
Add cann_dependencies (#19929)
### Description
Add `cann_dependencies`


### Motivation and Context

The previous [PR](https://github.com/microsoft/onnxruntime/pull/17365)
avoided using patchelf but lost `cann_dependencies`. This PR adds
`cann_dependencies` back so that CANN libraries are not required when
repairing the wheel.
2024-03-15 20:28:43 -07:00
Yulong Wang
b29849a287
[js/common] fix typedoc warnings (#19933)
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```

Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose two different sets of options
for React Native and the ORT Node.js binding. This should be fixed in the future.
- Fix a few naming inconsistencies between JSDoc and parameters
- Exclude trace functions
2024-03-15 19:01:50 -07:00
Belem Zhang
acb0df2280
Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932)
### Description
Fix #19931 broken Get Started link

The "Get Started" link on the "ONNX Runtime JavaScript API" page returns HTTP 404.

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-03-15 19:00:30 -07:00
Hector Li
d5c6a2cecf
Enable code in QNN UT to verify the fix for partition issue (#19939)
### Description
Enable code in QNN UT to verify the fix for the partition issue related to
QDQ models.
https://github.com/microsoft/onnxruntime/pull/19723
2024-03-15 17:02:01 -07:00
enximi
7b46b31558
fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime sup… (#19845)
fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime
supports Windows 10 and above, only."

### Description

Include Windows 11 in the version check. Now, you will not see the
warning “Unsupported Windows version (11). ONNX Runtime supports Windows
10 and above, only.”
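
A version check that treats Windows 11 as supported can compare the major release number numerically. This Python sketch is hypothetical: the helper name is illustrative, and it assumes the release string comes from something like `platform.release()` (for example `"10"`, `"11"`, or `"8.1"`).

```python
import warnings


def check_windows_release(release: str) -> bool:
    """Return True if the Windows release is 10 or newer, else warn.

    Hypothetical helper: `release` is assumed to look like "10", "11" or "8.1".
    """
    try:
        major = int(release.split(".")[0])
    except ValueError:
        warnings.warn(f"Unrecognized Windows version ({release}).")
        return False
    if major >= 10:
        return True  # Windows 10, 11 and later pass the numeric comparison
    warnings.warn(
        f"Unsupported Windows version ({release}). "
        "ONNX Runtime supports Windows 10 and above, only."
    )
    return False
```

A numeric comparison avoids having to list each new release explicitly.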

### Motivation and Context

On Windows 11, the warning says that only Windows 10 and above are
supported, which is confusing.
2024-03-15 12:41:44 -07:00
Yulong Wang
79e50aeef3
[js/web] rewrite backend resolve to allow multiple EPs (#19735)
### Description

This PR rewrites the backend resolve logic to support specifying multiple
EPs.

#### Backend

The first version of ONNX Runtime Web carried over some existing
code from [ONNX.js](https://github.com/microsoft/onnxjs), including
the "backend" concept. The original "backend" in ONNX.js was designed
on the assumption that only one backend from the user's backend hint list
will be used. For example, if the user specifies the backend hint
`['webgl', 'wasm']`, ONNX.js first tries to use the WebGL backend. If it
loads successfully (the browser supports WebGL), the "webgl" backend
is used and "wasm" is ignored. Otherwise, "webgl" is
ignored and ONNX.js tries to load the "wasm" backend.
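
The ONNX.js-style resolution described above amounts to "the first hint that initializes wins". Below is a Python sketch. The `available` mapping of hint names to init callbacks is a hypothetical stand-in for ONNX.js's TypeScript backend registry.

```python
from typing import Callable, Dict, List


def resolve_single_backend(
    hints: List[str], available: Dict[str, Callable[[], bool]]
) -> str:
    """Return the first hint whose backend initializes; ignore the rest."""
    for hint in hints:
        init = available.get(hint)
        if init is not None and init():
            return hint  # every later hint is ignored, even if it would work
    raise RuntimeError(f"no backend from {hints} could be initialized")
```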

In short: only one backend will be used when initializing a session.

#### Execution Provider

Execution Provider, or EP, in ONNX Runtime is a different concept. One
of the differences is that users are allowed to specify multiple EPs, and
if one does not support a particular kernel, execution can fall back to another
EP. This is a very common case when using a GPU EP in ONNX Runtime.

#### Current Status: Backend v.s. EP

Because of the historical reasons mentioned above, the current status is
quite confusing. There are three related but distinct things:

- **Real backends**: different implementations in code.
- **Backend hints**: the string names used to select a backend.
- **EPs**: the ONNX Runtime concept.

Currently there are only 2 **backends** in our code base: the "onnxjs
backend" and the "wasm backend". The "onnxjs backend" only
powers the backend hint "webgl", which goes into the old ONNX.js code path.
All other backend hints, including "wasm", "cpu" (alias to wasm), "webgpu"
and "webnn", are powered by the "wasm backend".

Because ORT Web treats "backend" as an internal concept and wants to
align with ONNX Runtime, the backend hint names are becoming EP
names.

The following table shows today's status:

| Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT |
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP |
| "webgl" | OnnxjsBackend | \* technically not an EP |
| "webgpu" | WasmBackend | JSEP |
| "webnn" | WasmBackend | WebNN EP |

#### Problem

While the API allows specifying multiple EPs, the backend resolving only
allows one backend. As a result, when a user specifies multiple EP
names in session options, the backend resolve behavior and the EP
registration behavior are inconsistent. Specifically, in this issue:
https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908:

The EP list `['webgpu', 'wasm']` on a browser without WebGPU support
resolves to the 'wasm' backend, but the full EP list is passed in session
options, so JSEP is still enabled, causing the runtime error.

#### Solution

Since we still need the WebGL backend, we cannot totally remove the backend
register/resolve system. In this PR I made the following changes:
- initialize every backend from the EP list, instead of only the
first successful one.
- for the first resolved backend, keep only the EPs that use the exact same
backend. Remove all EPs not using this backend from session options.
- for every explicitly specified EP that is removed, show a warning
message in the console.
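The three changes above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the ORT Web (TypeScript) implementation. `try_init` is a hypothetical callback standing in for initializing the backend behind one EP name, and the EP-to-backend mapping is copied from the status table. For brevity, every EP in the list is treated as explicitly specified, so every removed EP triggers a warning.

```python
import warnings
from typing import Callable, Dict, List, Optional, Set, Tuple

# EP name -> backend that powers it, following the status table above.
EP_TO_BACKEND: Dict[str, str] = {
    "wasm": "WasmBackend",
    "cpu": "WasmBackend",
    "webgl": "OnnxjsBackend",
    "webgpu": "WasmBackend",
    "webnn": "WasmBackend",
}


def resolve_backend_and_eps(
    eps: List[str], try_init: Callable[[str], bool]
) -> Tuple[str, List[str]]:
    """Return (backend, EPs kept in session options).

    try_init(ep) is a hypothetical hook that initializes the backend for one
    EP name and returns False on failure (e.g. "webgpu" without WebGPU).
    """
    backend: Optional[str] = None
    available: Set[str] = set()
    # 1. Initialize every EP's backend, not just the first successful one.
    for ep in eps:
        if not try_init(ep):
            continue
        if backend is None:
            backend = EP_TO_BACKEND[ep]  # first resolved backend wins
        # 2. Only EPs served by that same backend stay available.
        if EP_TO_BACKEND[ep] == backend:
            available.add(ep)
    if backend is None:
        raise RuntimeError(f"no available backend found for EPs {eps}")
    kept = [ep for ep in eps if ep in available]
    # 3. Warn about every requested EP that was dropped.
    for ep in eps:
        if ep not in available:
            warnings.warn(f"removing EP '{ep}' from session options")
    return backend, kept


# The reported issue: ['webgpu', 'wasm'] on a browser without WebGPU.
print(resolve_backend_and_eps(["webgpu", "wasm"], lambda ep: ep != "webgpu"))
# ('WasmBackend', ['wasm'])  (plus a warning about 'webgpu')
```

Because "webgpu" is dropped from the session options, JSEP is no longer registered in this scenario, which is the inconsistency described in the problem section.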
2024-03-15 11:47:45 -07:00
Yifan Li
0b2a75b274
[EP Perf] Add concurrency test (#19804)
### Description
* Add concurrency test to EP Perf CI panel (impl. by onnx_test_runner)
  * Model: FasterRCNN-10 model within CI image
  * `-c` param configurable via CI panel when kicking off CI tasks
  * Auto-replicate test input/outputs according to `-c` param
* By default, the model test runs for 100 iterations (adding ~2 min
to the overall T4 CI task load)
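The replicate-and-run idea can be illustrated with a small Python sketch. It does not reproduce onnx_test_runner or the CI scripts, which replicate test input/output files on disk. `run_concurrently` and the `double` stand-in model are hypothetical. The sketch only shows the pattern of copying one test case per concurrent worker and counting mismatched outputs.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Tensor = List[float]


def run_concurrently(
    run: Callable[[Tensor], Tensor],
    test_input: Tensor,
    expected: Tensor,
    concurrency: int,
    iterations: int,
) -> int:
    """Replicate one test case `concurrency` times, run the copies in parallel
    for `iterations` rounds, and return how many outputs mismatched."""
    inputs = [copy.deepcopy(test_input) for _ in range(concurrency)]
    mismatches = 0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(iterations):
            for output in pool.map(run, inputs):
                if output != expected:
                    mismatches += 1
    return mismatches


# Hypothetical stand-in for a model session: doubles every element.
def double(xs: Tensor) -> Tensor:
    return [2 * x for x in xs]


print(run_concurrently(double, [1.0, 2.0], [2.0, 4.0], concurrency=4, iterations=100))  # 0
```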

### Motivation and Context
To monitor potential concurrency issues in ORT-TRT.
2024-03-15 07:41:21 -07:00
Hariharan Seshadri
42399dfd2b
Fix a potential race in the CUDA TopK kernel (#19917)
### Description
If the `K` value flows in as a tensor, the kernel updates a
mutable member of the `TopK` class and bases the computation on it,
which is likely to cause a data race when concurrent Run() calls
pass different `K` values.
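The race can be modeled in Python to show why per-call state matters. This is an illustrative sketch only. The real fix lives in the C++/CUDA `TopK` kernel, and the class names, the `threading.Barrier` used to force the bad interleaving, and `run_two_calls` are all hypothetical.

```python
import threading
from typing import List, Tuple


class TopKRacy:
    """Bug pattern: K read from the input is stored in a mutable member."""

    def __init__(self, barrier: threading.Barrier):
        self.k = 0
        self._barrier = barrier

    def compute(self, values: List[int], k: int) -> List[int]:
        self.k = k              # shared state, written by every concurrent call
        self._barrier.wait()    # force both calls to write before either reads
        return sorted(values, reverse=True)[: self.k]


class TopKFixed:
    """Fix pattern: K lives in a local variable owned by this call."""

    def __init__(self, barrier: threading.Barrier):
        self._barrier = barrier

    def compute(self, values: List[int], k: int) -> List[int]:
        local_k = k             # per-call state, invisible to other threads
        self._barrier.wait()
        return sorted(values, reverse=True)[:local_k]


def run_two_calls(op) -> Tuple[List[int], List[int]]:
    """Run compute() concurrently with K=1 and K=3 on the same instance."""
    values = [5, 1, 4, 2, 3]
    results = {}

    def worker(k: int) -> None:
        results[k] = op.compute(values, k)

    threads = [threading.Thread(target=worker, args=(k,)) for k in (1, 3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[1], results[3]


print(run_two_calls(TopKFixed(threading.Barrier(2))))  # ([5], [5, 4, 3])
print(run_two_calls(TopKRacy(threading.Barrier(2))))   # both lists have the same length
```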

### Motivation and Context
Fix potential race in CUDA TopK kernel
2024-03-14 18:13:47 -07:00
Justin Chu
bcf47d3546
Update install_deps_lort.sh to fix onnxscript installation (#19922)
Install onnxscript correctly with `pip install`. Dev dependencies are
not required.

### Motivation and Context

Fix build breaks.
2024-03-14 17:05:50 -07:00
Adam Louly
32558134a9
[On-Device-Training] Upgrade Flatbuffers to Support 2GB+ Checkpoints. (#19770)
### Description
Modifications to support 2GB+ checkpoints and upgrade FlatBuffers.


### Motivation and Context
This PR includes changes that make ORT handle 2GB+ checkpoints.
To do that, we need to upgrade FlatBuffers to 23.5.9:
https://github.com/google/flatbuffers/pull/7945

- Modified the commitHash and the hash for the new version
- Removed the patch for the Rust generator's unused-variable warning, as
the generator no longer produces this warning - [Check it out
here](d121e09d89/src/idl_gen_rust.cpp)
- Updated the VerifyField calls with alignment values that were
introduced in the new version.

---------

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2024-03-14 16:36:24 -07:00
Yi Zhang
87a9f77c56
Refactor Python Packaging Pipeline (Training Cuda 11.8) (#19910)
### Description
1. Use stages to organize the pipeline and split building and testing.
2. Move compilation to a CPU machine.
3. The test stage can leverage existing artifacts.
4. Check the wheel size, and give a warning if the size is above 300M.
5. The Docker image name did not change even when the build argument changed,
which caused the Docker image to always be rebuilt. Updating the Docker image
name according to the argument saves Docker build time.
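Items 4 and 5 can be sketched in Python as below. The pipeline itself is Azure DevOps YAML plus scripts not shown here, so `check_wheel_size`, `docker_image_name`, the hashing scheme, and the example names are hypothetical. The sketch only illustrates warning on an oversized wheel and keying the image name on the build arguments.

```python
import hashlib
import os
import warnings
from typing import Dict

# "300M" from the pipeline description, interpreted here as 300 MiB (assumption).
WHEEL_SIZE_LIMIT = 300 * 1024 * 1024


def check_wheel_size(path: str, limit: int = WHEEL_SIZE_LIMIT) -> bool:
    """Warn (rather than fail) when the wheel is larger than the limit."""
    size = os.path.getsize(path)
    if size > limit:
        warnings.warn(f"{path} is {size} bytes, above the {limit}-byte limit")
        return False
    return True


def docker_image_name(base: str, build_args: Dict[str, str]) -> str:
    """Derive the image name from the build arguments, so changing an
    argument yields a new name and an unchanged one reuses the cached image."""
    canonical = ";".join(f"{k}={v}" for k, v in sorted(build_args.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{base}-{digest}"


# Hypothetical base name and build argument.
print(docker_image_name("training-build", {"CUDA_VERSION": "11.8"}))
```

Sorting the arguments before hashing makes the name independent of the order in which the arguments are passed.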

- Pipeline duration reduced by 60% (2 hours -> 50 minutes)
- Compilation time reduced by 75% (1.5 hours -> 20 minutes)
- GPU time reduced by 87% (8 hours -> 1 hour)
- For debugging, GPU time can be reduced by more than 95%, because we
can choose to run only one test stage and skip building.

### Motivation and Context
Make the pipeline efficient.
Optimized

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=424177&view=results
Current

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=422393&view=results

---------
2024-03-15 06:47:41 +08:00
Changming Sun
8b766bd24e
Change nuget pipeline's "Windows_Packaging_combined_GPU" job to download TRT binaries in every build (#19919)
### Description
Change the nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. All the other build jobs already do this;
this is the only one left.

Similar to #19909

### Motivation and Context

As a follow up of #19118
2024-03-14 15:07:56 -07:00
Tianlei Wu
a2ffc3740b
[Cuda] Demo multiple cuda graphs and user compute stream (#19883)
Update the stable diffusion demo to add options `--max-cuda-graphs` and
`--user-compute-stream`.

* Add a Python class GpuBindingManager to manage IO binding based on input
shape and the max number of CUDA graphs setting. The benefit is that one
inference session can enable or disable CUDA graphs in different runs.
* When `--user-compute-stream` is specified, the demo uses a custom compute stream.
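The shape-keyed bookkeeping behind such a manager can be sketched in Python. This is not the demo's GpuBindingManager. `GraphSlotManager`, `graph_id_for`, and the convention that `-1` means "run without a CUDA graph" are assumptions for illustration, and the sketch omits the actual IO binding and session calls.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


class GraphSlotManager:
    """Hypothetical bookkeeping: one CUDA graph slot per distinct input shape,
    capped at max_cuda_graphs. Returns -1 when a run should not use a graph."""

    def __init__(self, max_cuda_graphs: int):
        self.max_cuda_graphs = max_cuda_graphs
        self._slots: Dict[Shape, int] = {}

    def graph_id_for(self, shape: Shape) -> int:
        if shape in self._slots:
            return self._slots[shape]          # reuse the graph for this shape
        if len(self._slots) < self.max_cuda_graphs:
            graph_id = len(self._slots)
            self._slots[shape] = graph_id      # new shape: claim the next slot
            return graph_id
        return -1                              # cap reached (or 0): run without a graph


manager = GraphSlotManager(max_cuda_graphs=2)
print([manager.graph_id_for(s) for s in [(1, 77), (2, 77), (1, 77), (4, 77)]])
# [0, 1, 0, -1]
```

Setting `max_cuda_graphs=0` makes every run skip CUDA graphs. This mirrors the idea that one session can run with or without them.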
2024-03-14 13:48:37 -07:00
Edward Chen
0b90363acb
[MLAS][AArch64] SQ4BitGemm CompInt8 multi-block implementation (#19826)
Update SQ4BitGemm CompInt8 implementation to process multiple blocks along a single column instead of processing single blocks from multiple columns.
2024-03-14 13:05:42 -07:00
Baiju Meswani
226f60f2f1
Add support for SGD optimizer in minimal build (#19901) 2024-03-14 11:31:20 -07:00
Changming Sun
1fb6cbddee
Add a build patch for Windows ARM64EC (#19898)
### Description
Add a patch for Windows ARM64EC


### Motivation and Context
More changes will be needed in onnxruntime/core/common/cpuid_arch_definition.h and onnxruntime/core/common/cpuid_info.cc.
2024-03-14 08:50:42 -07:00