Commit graph

11997 commits

Author SHA1 Message Date
cao lei
f430600432
Enable streams for DML EP. This change is to revert PR 19481 since the bug 19480 is fixed by PR 19515 (#19609)
### Description
<!-- Describe your changes. -->
Enable streams for DML EP. This change is to revert PR 19481 since the
bug 19480 is fixed by PR 19515


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable streams for DML EP. This change is to revert PR 19481 since the
bug 19480 is fixed by PR 19515
2024-02-23 06:02:05 -08:00
satyajandhyala
ae3d73c981
[JS/WebGPU] Fix Split and Where to handle corner cases. (#19613)
### Description
<!-- Describe your changes. -->
1. Fix Where operator to handle Boolean input less than 4 bytes.
2. Fix JSEP test harness to use tensor names consistently.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-23 00:21:15 -08:00
Markus Tavenrath
5e432a3ae6
Add support for NHWC GridSample in the CUDA EP and enable grid_sample_test for all EPs (#19562)
I've added NHWC GridSample support to the CUDA EP to reduce the number
of layout transforms. Also I've enabled the full set of GridSampleTests
for all EPs. I've also added the GridSample OpSet 16 to the registered
kernels.

### Motivation and Context
This is the first PR is a series of enhancements of the CUDA EP
improving NHWC support to avoid costly layout transforms between NWHC
and NCHW nodes which are layout sensitive. Also testing was quite
rudimentary for the CUDA EP while it was great for the CPU path. I've
regenerated grid_sample_test.cc enabling tests for other platforms as
well. Those tests resurfaced #10607 again which is fixed as well.
2024-02-22 19:47:15 -08:00
pengwa
ae92d593c0
ONNX Gelu Op in Opset 20 (#19560)
### ONNX Gelu Op in Opset 20

Refactor code to support MSDomain Gelu and ONNX Gelu-opset20 Op

1. Move CPU-GELU implmentation from
`onnxruntime/contrib_ops/cpu/activations.h/cc` to
`onnxruntime/core/providers/cpu/tensor/gelu.h/cc`, as the implementation
for approximate attribute to be 'none'.
2. Dumplicate some logic from
`onnxruntime/contrib_ops/cpu/bert/bias_gelu.cc` to
`onnxruntime/core/providers/cpu/tensor/gelu.h/cc`, as the implementation
for approximate attribute to be 'tanh'.
3. Register ONNX domain Gelu CPU kernel from opset 20 in
`onnxruntime/core/providers/cpu/cpu_execution_provider.cc`.
4. Move `onnxruntime/contrib_ops/cuda/bert/fast_gelu_impl.h/cu` to
`onnxruntime/core/providers/cuda/tensor/gelu_impl.h` and
`onnxruntime/core/providers/cuda/tensor/gelu_approximate_impl.cu`
respectively, as the implementation for approximate attribute to be
'tanh'.
5. Implement the logic for approximate attribute to be 'none' in
`onnxruntime/core/providers/cuda/tensor/gelu_impl.cu`.
6. Register ONNX domain Gelu CUDA kernel from opset 20 in
`onnxruntime/core/providers/cuda/cuda_execution_provider.cc`.
7. ROCM ep related changes. 
8. Enrich the tests for ONNX domain Gelu in
`onnxruntime/test/providers/cpu/activation/activation_op_test.cc`.
2024-02-23 11:05:16 +08:00
Segev Finer
29b1106033
[node] Switch to setImmediate to avoid starving the Node.js event loop (#19610)
### Description
<!-- Describe your changes. -->
Switch to setImmediate to avoid starving the Node.js event loop

There should really be a true async version though, running
computationally intensive things on the event loop will stop everything
else from happening while it is running, e.g. a web server from
answering requests.

This can be done by wrapping `RunAsync` behind a
[`napi::Promise`](https://github.com/nodejs/node-addon-api/blob/main/doc/promises.md)
to run on the onnxruntime thread pool or [`AsyncWorker`](
https://github.com/nodejs/node-addon-api/blob/main/doc/async_worker.md)
for the Node.js/libuv thread pool.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Without this, if you run inference in a tight loop, without anything
else in between that is async/deferred, `process.nextTick` will lead to
starving the event loop and not letting anything else run,
`setImmediate` at least lets the event loop spin between calls to `run`.

See
https://dev.to/ynmanware/setimmediate-settimeout-and-process-nexttick-3mfd

Contributed on behalf of [Swimm](https://swimm.io/)
2024-02-22 18:53:50 -08:00
Hector Li
4ab497603e
Enable user to set QNN HTP performance mode for every session run (#19521)
### Description
Currently, the QNN HTP performance mode is set during session creation, there's no way to change it afterwards. There's requirement to set it high performance mode for high priority request and set it back to low performance mode later to save the power when the incoming request is idle for example.

Now, still keeps the performance mode at the session level in QNN EP options which is used at the default one. Ort QNN EP will set it once if user set it.
And there are setting (qnn.htp_perf_mode and qnn.htp_perf_mode_post_run) in run option to change the performance mode before and after session run. There's recommended scenario that user set the mode to high performance mode before the the inference sun so that user can get the result back ASAP. And set the mode to low performance mode after the inference to save the power.
2024-02-22 17:04:59 -08:00
AtomicVar
5e5c36f6df
Fix citation author name issue (#19597)
Use `name` rather than `given-names` to set author name.

### Motivation and Context
The old CITATION.cff uses `given-names` to set author names, which won't
be rendered properly with some bibtex style of LaTeX:

<img width="680" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/22856433/c509400e-5b16-4400-8950-550b05186369">

The problem is that **the `"ONNX Runtime developers"` is regarded as a
human name**.

How to fix: by using `name` to set author name, the generated Bibtex
entry will use `{}` to enclose the `"ONNX Runtime developers"`. Then it
is displayed literally:

<img width="742" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/22856433/94083c9f-0daa-4c51-92e1-c966b88d09d2">
2024-02-22 17:03:56 -08:00
dependabot[bot]
76a2a487a1
Bump ip from 1.1.8 to 1.1.9 in /js/react_native/e2e (#19583)
Bumps [ip](https://github.com/indutny/node-ip) from 1.1.8 to 1.1.9.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1ecbf2fd8c"><code>1ecbf2f</code></a>
1.1.9</li>
<li><a
href="6a3ada9b47"><code>6a3ada9</code></a>
lib: fixed CVE-2023-42282 and added unit test</li>
<li>See full diff in <a
href="https://github.com/indutny/node-ip/compare/v1.1.8...v1.1.9">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ip&package-manager=npm_and_yarn&previous-version=1.1.8&new-version=1.1.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-22 13:58:17 -08:00
Hector Li
09622418c4
Add special handling if there is only 1 graph inside the cached QNN context binary (#19594)
Add special handling if there is only 1 graph inside the cached QNN context binary. No need to make the EPContext node name match the QNN graph name. This is for better backward compatibility in case the QNN context model is generated before the PR for QNN context binary model support multi-partition.
2024-02-22 13:15:13 -08:00
Xu Xing
fe82fccf1a
[js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596)
This is used in sam-h-decoder-f16.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-22 13:09:28 -08:00
jingyanwangms
3bdb10d5ca
Move import to when needed to avoid circular dependency error (#19579)
### Description
Move import to when needed to avoid circular dependency error


### Motivation and Context
Fixes dependency error described here:
https://github.com/microsoft/DeepSpeed/issues/5140

---------

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2024-02-22 10:56:25 -08:00
cao lei
6b73ab3e3e
Introduce reused_buffer_index_per_stream in allocation planner which will be reset after computing the reuse buffer for each stream (#19515)
### Description
<!-- Describe your changes. -->
Introduce reused_buffer_index_per_stream in allocation planner which
will be reset after computing the reuse buffer for each stream. So if a
NodeArg is an input of several Ops across different streams and reuses
other NodeArg, the reused NodeArg won't be involved when computing the
second stream's reuse plan.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This is to fix https://github.com/microsoft/onnxruntime/issues/19480,
which is a crash for the scenario mentioned above.

---------

Co-authored-by: Lei Cao <leca@microsoft.com>
2024-02-22 10:19:08 -08:00
PeixuanZuo
05ed89f469
[ROCm] Add excluded libs for ROCm python package (#19586)
The rocm lib version has changed in rocm 6.0

Using libs packaged in whl might cause errors. 
For example, `libamdhip64.so.6` packaged in whl will cause compute error
when training gpt2 model.

The root cause still in investigating.
2024-02-22 13:34:55 +08:00
PeixuanZuo
8354329086
[ROCm] SkipGroupNorm triton (#19408)
Change GroupNorm triton to support SkipGroupNorm
2024-02-22 13:34:45 +08:00
Vincent Wang
3d88487c96
Minor Triton Fix (#19589)
Including removing a unnecessary assert, and add support of passing
string attribute from ONNX node attribute to python functoin kwargs
(mainly for passing debug info from graph to python for now).
2024-02-22 10:35:26 +08:00
Sheil Kumar
5197db1980
Diable __cpuid call for ARM64EC (#19592)
Diable __cpuid call for ARM64EC

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2024-02-21 15:45:44 -08:00
dependabot[bot]
38c3432393
Bump ip from 1.1.8 to 1.1.9 in /js/react_native (#19582)
Bumps [ip](https://github.com/indutny/node-ip) from 1.1.8 to 1.1.9.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1ecbf2fd8c"><code>1ecbf2f</code></a>
1.1.9</li>
<li><a
href="6a3ada9b47"><code>6a3ada9</code></a>
lib: fixed CVE-2023-42282 and added unit test</li>
<li>See full diff in <a
href="https://github.com/indutny/node-ip/compare/v1.1.8...v1.1.9">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ip&package-manager=npm_and_yarn&previous-version=1.1.8&new-version=1.1.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-21 13:58:53 -08:00
Matttttt
ebd220b073
Misspelling in README.md (#19433)
Fixed a misspelling.
2024-02-21 13:38:18 -08:00
Tianlei Wu
3afb38cfb7
[CUDA] Add use_tf32 cuda provider option (for FP32 Conv) (#19426)
Follow up of https://github.com/microsoft/onnxruntime/pull/19357 to apply the use_tf32 option on fp32 cuDNN convolution.

When use_tf32 = 0, we will disable TF32 in cuDNN convolution for FP32 inputs.

https://docs.nvidia.com/deeplearning/cudnn/api/cudnn-graph-library.html#cudnnmathtype-t
**CUDNN_FMA_MATH**
- Restricted to only kernels that use FMA instructions.
- On pre-NVIDIA A100 GPU devices, CUDNN_DEFAULT_MATH and CUDNN_FMA_MATH
have the same behavior: Tensor Core kernels will not be selected.
- With NVIDIA Ampere architecture and CUDA toolkit 11,
CUDNN_DEFAULT_MATH permits TF32 Tensor Core operation and CUDNN_FMA_MATH
does not.
- The TF32 behavior for CUDNN_DEFAULT_MATH and the other Tensor Core
math types can be explicitly disabled by the environment variable
NVIDIA_TF32_OVERRIDE=0.
2024-02-21 12:46:16 -08:00
Adam Pocock
e5ce81ae84
[java] Adding ML program flag for CoreML (#19551)
### Description
Adds the new CoreML enum flags to enable ML Program support in Java.

### Motivation and Context
Adds support for #19347 to the Java API.
2024-02-21 12:24:41 -08:00
Xu Xing
57d6819212
[js/web] Fix fused-conv is not included in npm test (#19581)
BUG: https://github.com/microsoft/onnxruntime/issues/18855

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-21 08:08:47 -08:00
Yulong Wang
58f4921686
[js] changes to allow Float16Array if any polyfill is available (#19305)
### Description

This change adds only necessary code to enable ort-web works with any
Float16Array polyfill. Unlike #19302, in this PR, ort-web does not
include any specific polyfill; instead, it's user's choice for how to
use a polyfill.

ORT-web uses Float16Array if it's available; otherwise, fallback to use
Uint16Array.

```js
// case 1: user does not use polyfill:
import * as ort from 'onnxruntime-web';

const myF16Data = new Uint16Array(...);  // need to use Uint16Array
const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
```

```js
// case 2: user use polyfill:
import * as ort from 'onnxruntime-web';
import {
  Float16Array, isFloat16Array, isTypedArray,
  getFloat16, setFloat16,
  f16round,
} from "@petamoriken/float16";
globalThis.Float16Array = Float16Array;  // ort-web will pick the global Float16Array

const myF16Data = new Float16Array(...);  // Use the polyfilled Float16Array type
const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
```
2024-02-21 00:31:06 -08:00
satyajandhyala
8092a89688
Changed command line argpasrse to process '--symmetric [True|False]'. (#19577)
### Description
<!-- Describe your changes. -->
Accept the command line option --symmetric and its optional value
correctly. If the optional value matches uncased to 'True' then set
symmetric to True else set symmetric to False. Asymmetric quantization
will generate zero_point input.
```
usage: matmul_4bits_quantizer.py [-h] --input_model INPUT_MODEL --output_model OUTPUT_MODEL [--block_size BLOCK_SIZE] [--symmetric [{True,False}]] [--accuracy_level ACCURACY_LEVEL] [-v]
                                 [--nodes_to_exclude NODES_TO_EXCLUDE [NODES_TO_EXCLUDE ...]]
```
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-20 21:18:54 -08:00
Baiju Meswani
124bde985a
Bring QAT POC back to a functional state (#19290) 2024-02-20 19:20:42 -08:00
PeixuanZuo
6226c5f62f
[ROCm] Add SkipGroupNorm for ROCm EP (#19303)
Add SkipGroupNorm for ROCm EP.

---------

Co-authored-by: Peixuan Zuo <peixuanzuo@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2024-02-21 11:08:48 +08:00
zhijiang
8fadc6c913
Zhijxu/cleanup cached tensors when oom (#19306)
in pytorch, when oom happens at bp, user could decrease the batch size
and rerun it without restarting the process.

while in ORT, the intermediate tensors are kept even OOM, so decrease
batch size still fail.


this is torch run, we can see after oom failure, torch will release
tensor before next step

![image](https://github.com/microsoft/onnxruntime/assets/43435212/92b8a2e3-454b-448a-a223-17cb91d463c2)

this is from ort, we can see ort not release its tensors after OOM
failure.

![image](https://github.com/microsoft/onnxruntime/assets/43435212/bb6a3882-8e14-4f37-8079-e7f70fc2546b)

ort with the PR, we can see memory is released, **the 4GB memory is not
own by ort, and will be released by torch at the end**.

![image](https://github.com/microsoft/onnxruntime/assets/43435212/7f39d711-4e36-47d5-aecf-3805433a6d01)
2024-02-21 10:41:42 +08:00
Markus Tavenrath
0c4421cb78
Fix compile warnings (as errors) for functions which miss returning required return value (#19079)
Added dummy return values to functions which specify a return value, but
do not return an value value.

### Motivation and Context
Fix compiler errors with 'warnings as errors' enabled.
2024-02-21 12:39:43 +10:00
Scott McKay
45e20bf781
Use build.py to build in py-win-gpu.yml so parallelization parameters are set (#19578)
### Description
<!-- Describe your changes. -->
build.py sets a few parallelization parameters when building. Using
msbuild directly lacks those.


7a5860e490/tools/ci_build/build.py (L1665-L1669)

Changed to use build.py. If there's a concern with that we _could_ set
the parameters in the yaml, but that will be uglier due to duplicating
logic in multiple places.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-21 10:38:37 +08:00
Yulong Wang
6e04e36e3f
[js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317)
### Description
upgrade tsc in common from 4.9.5 to 5.2.2
2024-02-20 17:33:37 -08:00
Yulong Wang
70567a4b3a
[js/web] use ApiTensor insteadof onnxjs Tensor in TensorResultValidator (#19358)
### Description
use ApiTensor insteadof onnxjs Tensor in TensorResultValidator. Make
test runner less depend on onnxjs classes.
2024-02-20 17:33:21 -08:00
Yulong Wang
3fe2c137ee
[js] small fix to workaround formatter (#19400)
### Description
Rename shader variable names to snake_case naming and also to avoid
formatter behaving inconsistently in win/linux.
2024-02-20 17:23:01 -08:00
Yulong Wang
97ff17c2cb
update script of run CI for external PRs to add "Big Models" (#19576)
### Description
update script of run CI for external PRs to add "Big Models"
2024-02-20 17:02:11 -08:00
Jake Mathern
7a5860e490
Fix cmake function duplicate lib (#19547)
### Description
Fixes cmake function definition in winml.cmake to copy link flags.



### Motivation and Context
XFGCheck errors in WindowsAI because this function does not transfer
linker flags
2024-02-20 13:41:40 -08:00
Scott McKay
ec9c8cbdc9
Use xcode parallel build flags to speed up iOS CI that is timing out (#19570)
### Description
<!-- Describe your changes. -->
Provide specific xcodebuild flags instead of depending on cmake to do
the right thing.

This built in just over an hour with a ccache miss. Previous CIs with a
ccache miss were timing out after 150 minutes.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-21 07:40:35 +10:00
Sheil Kumar
3c49aacd56
Disable __cpuid check on arm64 builds as intrinsic is not available (#19574)
Disable __cpuid check on arm64 builds as intrinsic is not available

Motivation
Breaking the arm64 build.

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2024-02-20 13:13:40 -08:00
Jiajie Hu
1b48054e1b
[js/webgpu] Create Split indices helpers by rank, not by shape (#19554)
### Description
This is required to make shape uniforms really work.

### Motivation and Context
The bug was unveiled in a model with multiple Split nodes. The later
nodes would try to reuse a previous pipeline cache, while the old shapes
were hardcoded as constants in cache.
2024-02-20 09:24:34 -08:00
Xavier Dupré
7efb0dbe12
add option DefaultTensorType to specify the default tensor type to quantize (#19455)
### Description
The current quantization tool relies on shape inference to provide the
type of every intermediate tensor, then the tool knows which type it
must dequantize into (float32, float16). However, this information is
not available if shape inference fails. That happens every time the
model include an operator from a custom domain such as com.microsoft.

This PR introduces an extra option `DefaultTensorType` as a fall back
when the quantizer cannot find the type it needs.

### Motivation and Context
This fixes issue #19409.
2024-02-20 08:22:44 -08:00
Markus Tavenrath
e832562d70
Fix invalid usage of designated initializers. (#19497)
### Description
I've replaces all ocurances of C++ designated initializers in the CUDA
NHWC Tests by member initialization.



### Motivation and Context
C++ designated initializers have been introduced in C++ 20. Yet GCC
accepts designated initializers in C++17 which is the standard used to
compile onnxruntime. Yet MSVC is standard conform and accepts this
feature starting C++20 which leads to compile failures on Windows
without this change.
2024-02-20 18:06:03 +10:00
PeixuanZuo
f3e3b531fe
Update build directory clean up stage for python package pipeline (#19553)
Fix to make clean up stage take effect.

If the `SourceFolder ` is empty, the task deletes files from the root
folder of the repository as though
[$(Build.SourcesDirectory)](https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables)
was specified.
2024-02-20 10:31:39 +08:00
pengwa
b55260d076
Minor fix for cmake (#19552)
### Minor fix for cmake

When build on Linux, get a warning saying "
CMake Warning at CMakeLists.txt:1603 (message):
  MPI and NCCL disabled on Win build.
"

This message is not correct. So have such a fix to avoid any
misunderstanding from users.


![image](https://github.com/microsoft/onnxruntime/assets/10530022/848c2d77-a538-4e31-8e0d-4b539233e515)




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-19 10:21:19 +08:00
satyajandhyala
dfeda9019c
[JS/WebGPU] Add MatMulNBits (#19446)
### Description
Add MatMulNBits to support MatMul using 4-bit quantized weights



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-17 09:19:17 -08:00
Yulong Wang
06269a3952
[js/webgpu] allow uint8 tensors for webgpu (#19545)
### Description
allow uint8 tensors for webgpu
2024-02-16 18:28:27 -08:00
Adrian Lizarraga
4874a41008
[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546)
### Description
Updates the default QNN SDK version to 2.19.2.240210.

### Motivation and Context
Build and test the latest version of QNN SDK in our pipelines.
2024-02-16 16:59:43 -08:00
kunal-vaishnavi
44d8ad93b2
Whisper Timestamps and Temperature (#19509)
### Description
This PR updates exporting and running the Whisper model with beam search
by adding the following.

- Adds temperature as a graph input to the exported model
- Fixes the token ids by adding them as attributes to
`WhisperBeamSearch`
- Fixes the timestamps test cases so they pass now
- Fixes a bug with invoking `torch.onnx.export`
- Cleans up the Whisper scripts and groups the arguments in
`convert_to_onnx.py`
- Adds a `requirements.txt` file to specify package dependencies
- Adds `whisper-large-v3` to list of pretrained models
- Fixes a bug with missing cross-attention KV cache inputs in the
decoder subgraph

### Motivation and Context

- This is a follow-up to [this
PR](https://github.com/microsoft/onnxruntime/pull/19188).
- The incorrect token ids in the timestamps processor were first noticed
during [this PR
review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333520007).
When they were originally added in [this
PR](https://github.com/microsoft/onnxruntime/pull/15853), the offsets
were previously constant across the Whisper model sizes. When comparing
the new `whisper-large-v3` variant, the English-only variants (e.g.
`whisper-tiny.en`), and the original variants (e.g. `whisper-tiny`),
both the values and the offsets differ. Therefore, it is easier to set
the token ids as attributes to `WhisperBeamSearch` when exporting to
ensure the right values are used in the timestamps processor.
- The Hugging Face API for returning timestamps and the expected outputs
from the PyTorch model have both changed.
- The fix for `torch.onnx.export` is a follow-up to [this PR
review](https://github.com/microsoft/onnxruntime/pull/17179#issuecomment-1683001470).
- The argument grouping is a follow-up to [this PR
review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333521721).
- Specific package versions are needed to run the Whisper scripts and
the `requirements.txt` file ensures that these versions are installed.
- The `whisper-large-v3` variant is released and should be in the list
of official pretrained models.
- After the changes from [this
PR](https://github.com/microsoft/onnxruntime/pull/17316), the exported
model is not loading in an ORT inference session because the
cross-attention KV cache inputs are missing in the decoder subgraph.
2024-02-16 15:21:43 -08:00
Tianlei Wu
1dce5e1732
Disable TF32 in Linux_Test stage of Linux GPU CI Pipeline (#19541)
### Description
Some test thresholds that previously worked in T4 GPU does not work
anymore. The reason is current pipeline uses A10, and TF32 is enabled by
default.

Disable TF32 in Linux GPU CI Pipeline in testing to avoid such random
test failure.

### Motivation and Context
Linux Test has random failure at tests:

ProviderOptionsTest > testCUDAOptions() FAILED
org.opentest4j.AssertionFailedError: array contents differ at index
[446], expected: <0.0419757> but was: <0.041948937>
at
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at
app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at
app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at
app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
at
app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)
 
org.opentest4j.AssertionFailedError: array contents differ at index [6],
expected: <0.0225981> but was: <0.022587791>
at
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at
app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
at
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
at
app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)
2024-02-16 14:41:11 -08:00
Adrian Lizarraga
b84712151c
QNN EP: Fuse DQ -> Q sequences into a QNN Convert op (#19511)
### Description
Fuses DQ -> Q sequences into a QNN Convert operator if:
- Converting from one qtype to another. Ex: Dequantize(uint8 to float)
-> Quantize(float to uint16)
- The DQ and Q operators are not part of another node unit (i.e.,
standalone)
- The Q operator is the only consumer for the DQ operator.



### Motivation and Context
Allows faster execution of QDQ models with mixed activation types by
leveraging the QNN Convert operator, which converts between quantization
types. For certain models, this results in inference latency speed-ups
of up to 2x (depends on the number of DQ -> Q sequences).

#### Example for Add node unit with 16-bit I/O:

Original:
```
u8 ----> DQ ---> Q ---u16--> Add ---u16-->
                              ^
                              |
u16 --------------------------+
```

After fusing DQ -> Q:
```
u8 ----> Convert ---u16--> Add ---u16-->
                            ^
                            |
u16 ------------------------+
```
2024-02-16 14:36:05 -08:00
Sheil Kumar
ef0b71308c
Optimize KahnsTopologicalSort and PriorityNodeCompare (#19475)
**Description**
1) During SessionInitialization, KahnsTopologicalSort is a major cause
of perf degradation.
The main cause of slow down is that the TopologicalSort needs to keep
track of nodes to visit in order, and reorder them based on priority (as
informed by a comparator). The existing implementation uses a
priority_queue that is backed by a std::vector container. However,
vectors are not good for insertion and reordering. The appropriate data
type for this operation is a linked list. However, linked lists like
std::list are not usable as a container for std::priority_queue. This is
because std::priority_queue requires random access, which linked lists
do not have. However, for this simple implementation, we can leverage a
std::list under the hood and perform insertions manually using
std::upper_bound. This drastically reduces the time taken by the method,
which currently instead causes numerous recopies and a lot of movement
inside the graph nodes to visit list.

2) In the comparator, I hide forward and backward attribute checking
behind the #ifdef ENABLE_TRAINING macro, as I believe it should only be
valid in the training scenario.

3) In noopelimination transformer, I prevent the creation of Initializer
(which unpacks tensorproto data) in every node and only create
initializers when Add/Sub/Mul/Div op nodes are detected.

**Motivation and Context**
Session creation time of many models is quite slow.

---------

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2024-02-16 05:34:55 -08:00
Tianlei Wu
4bfa69def8
Speed Up DecoderMaskedSelfAttentionTest (#19531)
### Description
The unit tests take 19 minutes to run (in debug build) because of too
many combinations. I reduce the combinations and remain good test
coverage. After the change, the test can finish in 51 seconds.

Before:
[----------] 2 tests from DecoderMaskedSelfAttentionTest
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp32
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (394086 ms)
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp16
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (747035 ms)
[----------] 2 tests from DecoderMaskedSelfAttentionTest (1141122 ms
total)

After:
[----------] 2 tests from DecoderMaskedSelfAttentionTest
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp32
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (21057 ms)
[ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp16
[       OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (30653 ms)
[----------] 2 tests from DecoderMaskedSelfAttentionTest (51710 ms
total)


### Motivation and Context
Reduce test time, and improve build pipeline efficiency.
2024-02-15 20:22:36 -08:00
sophies927
d0061d6fb1
Update stale.yml to use old version as a bug fix (#19532)
### Description
Changed the actions/stale version back to v8 from v9.



### Motivation and Context
There is a well-documented issue w/ the new actions/stale version
(v9.0.0) that causes the following error: "Error delete _state: [403]
Resource not accessible by integration". See
https://github.com/actions/stale/issues/1133 for more context.

This issue is preventing the stale bot from labeling stale issues since
the version was updated b/c the action can no longer access the cache
and cannot apply labels to all issues due to GH API rate limiting.

There are two potential fixes if we continue to use the new version: (1)
run the action on all PRs/issues to avoid using the cache or (2) give
write access to the endpoints listed in
https://docs.github.com/en/rest/authentication/permissions-required-for-fine-grained-personal-access-tokens?apiVersion=2022-11-28#repository-permissions-for-actions.
Neither of these options is preferable, so I am going to wait until the
bug is fixed.

Note: The old version (v8.0.0) uses Node 16, which will be deprecated in
Spring 2024, instead of Node 20, so we should keep an eye on [this
issue](https://github.com/actions/stale/issues/1133) to see when they
make the fix and we can switch back to the new version.
2024-02-15 17:03:11 -08:00
rui-ren
d63c664ca0
fix rocm ci pipeline (#19525)
### Description
<!-- Describe your changes. -->

ROCm CI pipeline issue.
```
Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20...
    main()
  File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main
    datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset
    builder_instance.download_and_prepare(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare
    self._download_and_prepare(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators
    data_file = dl_manager.download_and_extract(self.config.data_url)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download
    downloaded_path_or_paths = map_nested(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested
    return function(data_struct)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 220, in _download
    return cached_path(url_or_filename, download_config=download_config)
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path
    output_path = get_from_cache(
  File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache
    raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip

```


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Update the `datasets` pipeline to latest version `2.17.0`.
2024-02-15 00:02:08 -08:00