Commit graph

1676 commits

Author SHA1 Message Date
liqun Fu
2be4dc6d04
ONNX 1.15 integration (#17125)
### Description
this is for ORT 1.17.0 - make ORT to use ONNX release 1.15.0 branch. Eventually will update to the release tag once ONNX 1.15.0 is released


### Motivation and Context
Prepare for ORT 1.17.0 release. People can start work on new and updated ONNX ops in ORT.
---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-09-26 14:44:48 -07:00
Changming Sun
a942bbf489
Update nodejs to 18.x (#17657)
1. Upgrade nodejs from 16.x to 18.x for Windows pipelines
2. Avoid using Azure DevOps "NodeTool" on Linux. The tool installs
nodejs from internet or local disk cache. But we already moved all Linux
tests to docker. So we do not need the installer anymore.
3. Remove some other unused code.
2023-09-25 14:12:11 -07:00
PeixuanZuo
216214b7d3
[ROCm] Remove ROCm5.4.2, ROCm 5.5 and add ROCm5.7 to python package pipeline (#17668)
- Remove ROCm5.4.2, ROCm 5.5 and add ROCm5.7 to python package pipeline

- Remove redundant arg
2023-09-25 10:35:28 +08:00
PeixuanZuo
5b9cd91a9c
[ROCm] fix CI (#17648)
fix CI, follow #17621
2023-09-21 07:37:50 -07:00
Changming Sun
57dfd15d7b
Remove dnf update from docker build scripts (#17551)
### Description
1. Remove 'dnf update' from docker build scripts, because it upgrades TRT
packages from CUDA 11.x to CUDA 12.x.
To reproduce it, you can run the following commands in a CentOS CUDA
11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8.
```
export v=8.6.1.6-1.cuda11.8
dnf  install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v} 
dnf update -y
```
The last command will generate the following outputs:
```
========================================================================================================================
 Package                                     Architecture       Version                          Repository        Size
========================================================================================================================
Upgrading:
 libnvinfer-devel                            x86_64             8.6.1.6-1.cuda12.0               cuda             542 M
 libnvinfer-headers-devel                    x86_64             8.6.1.6-1.cuda12.0               cuda             118 k
 libnvinfer-headers-plugin-devel             x86_64             8.6.1.6-1.cuda12.0               cuda              14 k
 libnvinfer-plugin-devel                     x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-plugin8                          x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-vc-plugin-devel                  x86_64             8.6.1.6-1.cuda12.0               cuda             107 k
 libnvinfer-vc-plugin8                       x86_64             8.6.1.6-1.cuda12.0               cuda             251 k
 libnvinfer8                                 x86_64             8.6.1.6-1.cuda12.0               cuda             543 M
 libnvonnxparsers-devel                      x86_64             8.6.1.6-1.cuda12.0               cuda             467 k
 libnvonnxparsers8                           x86_64             8.6.1.6-1.cuda12.0               cuda             757 k
 libnvparsers-devel                          x86_64             8.6.1.6-1.cuda12.0               cuda             2.0 M
 libnvparsers8                               x86_64             8.6.1.6-1.cuda12.0               cuda             854 k
Installing dependencies:
 cuda-toolkit-12-0-config-common             noarch             12.0.146-1                       cuda             7.7 k
 cuda-toolkit-12-config-common               noarch             12.2.140-1                       cuda             7.9 k
 libcublas-12-0                              x86_64             12.0.2.224-1                     cuda             361 M
 libcublas-devel-12-0                        x86_64             12.0.2.224-1                     cuda             397 M

Transaction Summary
========================================================================================================================

```
As you can see from the output,  they are CUDA 12 packages. 

The problem can also be solved by lock the packages' versions by using
"dnf versionlock" command right after installing the CUDA/TRT packages.
However, going forward, to get the better reproducibility, I suggest
manually fix dnf package versions in the installation scripts like we do
for TRT now.

```bash
v="8.6.1.6-1.cuda11.8" &&\
    yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\
    yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\
        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v}
```
When we have a need to upgrade a package due to security alert or some
other reasons, we manually change the version string instead of relying
on "dnf update". Though this approach increases efforts, it can make our
pipeines more stable.

2. Move python test to docker
### Motivation and Context
Right now the nightly gpu package mixes using CUDA 11.x and CUDA 12.x
and the result package is totally not usable(crashes every time)
2023-09-21 07:33:29 -07:00
Pranav Sharma
038c76378f
Include onnxruntime_float16.h in the package. (#17637)
### Description
Include onnxruntime_float16.h in the package.

### Motivation and Context
This was missed in the recently released 1.16 pkgs (except Nuget).
2023-09-21 00:08:10 -07:00
PeixuanZuo
1f991f27f1
[ROCm] add manylinux build test for ROCm CI (#17621)
manylinux build is used for nightly packaging generation and it's hard
to capture issue in time when related files change. This PR add
manylinux build in CI.
2023-09-21 10:45:16 +08:00
Changming Sun
dd561f2015
Upgrade sympy (#17639)
AB#17015
2023-09-20 18:44:23 -07:00
Yulong Wang
d522cc7cc4
Update npm-packaging-pipeline.yml to always use artifacts from main branch (#17604)
### Description
Update npm-packaging-pipeline.yml to always use artifacts from main
branch
2023-09-19 14:42:08 -07:00
Wei-Sheng Chin
068300d97e
Pin beartype version (#17599)
PyTorch doesn't like the latest beartype:
https://github.com/pytorch/pytorch/pull/109510
2023-09-18 19:31:04 -07:00
Yi Zhang
7116e66c4b
Improve Win QNNEP pipeline (#17586)
### Description
1. use standard win build template
2. enable compiler cache

### Motivation and Context
Make win build task easy to maintain and accelerate the pipeline.
2023-09-19 07:36:17 +08:00
Yi Zhang
377f959c69
Run Final_Jar_Testing_Linux_GPU in docker (#17533)
### Description
1. Create a package test image based on [RedHat
UBI](https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image)
2. Install TensorRT 8.6.1.6 in RedHat. (Ref.
https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#maclearn-net-repo-install-rpm)
3. Run Final_Jar_Testing_Linux_GPU in docker (base image:
nvidia/cuda:11.8.0-cudnn8-devel-ubi8)

### Motivation and Context

[AB#18470](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18470)

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=354004&view=logs&j=8939b564-1402-57b5-92dc-510eba75e069&t=8939b564-1402-57b5-92dc-510eba75e069
2023-09-15 08:35:55 -07:00
Yulong Wang
7af2f68ef3
[js/web] add a test flag to customize chromium flags (#17545)
### Description
add a test flag to customize chromium flags.

Usage:
npm test -- \<other flags> --chromium-flags=<...>
2023-09-14 10:05:31 -07:00
Changming Sun
5d3786206b
Fix ROCM's nightly build (#17518)
### Description
PR 15470 updated some C/C++ dependencies. The change caused ROCM EP's
nightly build to fail. see issue
https://github.com/ROCm-Developer-Tools/HIP/issues/2082 for a
background. So, the root cause is HIP compiler has a special requirement
that HIP's include dirs must be used before the operating system's
include folder: /usr/include. HIP adds "-isystem" in front of
"/usr/include". gcc or clang will search the folders added with "-I"
first, then the "-isystem" folder. It works fine as long as we do not
add "-I/usr/include" to the compile commands for *.cu files. It would be wrong if
we already have installed an open source library to /usr and want to use the
prebuilt library from there instead of the current build dir. 


### Motivation and Context
2023-09-13 08:50:14 -07:00
Yi Zhang
c0a4fe777f
Move Linux python test into docker (#17479)
### Description
supplement of #17417



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-13 15:21:28 +08:00
rui-ren
b52127d22d
update acpt image for the training ci nightly (#17521)
### Description
<!-- Describe your changes. -->

The name of nightly ACPT image has been updated to
`ptebic.azurecr.io/internal/aifx/acpt/nightly-ubuntu-cuda-torch-dev`

As the previous image alias had `cu118`, `torch210dev` or `py38`, any
version update will break the training nightly pipeline



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Using constant image alias to avoid pipeline failure.
2023-09-12 22:32:20 -07:00
Changming Sun
9b755dce9f
Delete all Prefast tasks (#17522)
### Description
Delete all Prefast tasks because the new VS 17.7 version crashes every
time when we run the task on our CI build servers. However, we cannot
reproduce it locally. And this problem blocks us installing security
patches to our CI build machines.

Will use [CodeQL](https://codeql.github.com/) instead. 

### Motivation and Context
Address some security alerts.
2023-09-12 17:40:49 -07:00
Edward Chen
cf672c5887
Use name of temporary provisioning profile. (#17459)
The old provisioning profile no longer works. Switched to a temporary one that we can use before a new one is available. The temporary one has a different name.
2023-09-12 10:56:35 -07:00
Adrian Lizarraga
f20e475e67
[QNN EP] Update QNN SDK to version 2.14.1 (#17467)
### Description
Updates the version of QNN SDK used by CI Pipelines. Enables some tests
fixed by 2.14.1, but still need to look into Resize in a separate PR.

### Motivation and Context
Test latest version of QNN SDK.
2023-09-11 21:07:50 -07:00
Yulong Wang
850baced33
[web] a few updates to web pipeline (#17485)
### Description

Update the Web CI pipelines:

- remove parameter 'WebTemplate': Since we start to support webgpu, the
linux-web-ci.yml is no longer working and it is already out-of-date.
remove this file and parameter so that we always use win-web-ci.yml

- change flag `RunWebGpuTests` into 2 flags, for release and debug.
Currently for CI we only run webgpu tests on release build. But we want
to have the capability to run webgpu tests on debug build as well.


After this PR is merged, next step is to enable both Debug and Release
webgpu tests in PostMerge pipeline.
2023-09-11 11:43:42 -07:00
Caroline Zhu
dcc93909b4
Add training WASM generation to Web CI pipeline (#17319)
### Description
[Successful pipeline
run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results)

Added flag to build the training artifacts & updated the
pull-wasm-artifacts script to pull the training artifacts as well.

Bundled into this PR are minor formatting fixes + naming fixes.

### Motivation and Context
[This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended
the WASM API wrapper to build training WASM artifacts as well.
The ORT training WASM artifacts are required to support ORT training web
bindings.
2023-09-08 15:49:47 -07:00
Changming Sun
bc84f52633
Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470)
### Description
Update C/C++ dependencies abseil, date, nsync, googletest, wil, mp11,
cpuinfo and safeint to newer versions per request of @
mayeut. He created the following PRs to update the deps:
https://github.com/microsoft/onnxruntime/pull/15432
https://github.com/microsoft/onnxruntime/pull/15434
https://github.com/microsoft/onnxruntime/pull/15435
https://github.com/microsoft/onnxruntime/pull/15436
https://github.com/microsoft/onnxruntime/pull/15437

However, our build system needs to fetch the dependencies from an
internal mirror that only Microsoft employees have write access to. So I
closed his PRs and created this one.

This PR also updates abseil to a newer version. This is to prepare for
upgrading re2.
2023-09-08 13:35:04 -07:00
Ashwini Khade
c5dbd5c919
Updates to training pipelines (#17292) 2023-09-08 11:57:12 -07:00
Yi Zhang
ae74a517b6
Run Nuget_Test_Linux_GPU in container (#17452)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results
2023-09-08 13:41:20 +08:00
Yi Zhang
0a3eb60b01
Fix Bug: Step failed but not exited with error (#17442)
### Description
Add "set -ex" in the script.


### Motivation and Context
Build failed but it still passed.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1132003&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=39e3f98f-7fe5-578c-20bd-5ae5a4590bda
2023-09-07 14:33:31 +08:00
Changming Sun
b38fb0da06
Revert the yaml file changes in "Nodejs_Packaging_CPU" build job (#17441)
### Description
The yaml file changes made in #16050 do not really work. Currently the
pipeline is failing with error:
```
Error: Not found SourceFolder: C:\a\_work\5\b\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib
```

So, I will revert the yaml changes first to bring the pipeline back.
Some people are waiting for our nightly packages.

Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=351104&view=results

### Motivation and Context
2023-09-06 20:20:55 -07:00
Yi Zhang
ede339f304
Move dotnet build and test into docker in Linux CPU CI (#17417)
### Description
install dotnet 6.0 in the docker image.
move C# build and test into docker.

### Motivation and Context

### Note
The Unit tests and Symbolic shape infer's migration will be in another
PR.
2023-09-07 09:28:16 +08:00
Edward Chen
a3a1237270
Disable xcpretty filtering of xcodebuild output in iOS packaging pipeline. (#17429) 2023-09-06 09:04:17 -07:00
Changming Sun
c6b0d185b4
Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856)
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.

2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
2023-09-05 18:12:10 -07:00
aciddelgado
44101e8771
Flash Attention v2 MHA (#17227)
### Description
Integrate Flash Attention V2 to PackedMultiHeadAttention,
MultiHeadAttention and Attention operators.

Flash Attention v2 source code is from
https://github.com/Dao-AILab/flash-attention/tree/main/csrc/flash_attn/src.
We did some change to remove dependency on Torch, then removed backward
and bfloat16 related code.

Add benchmark script (see benchmark_mha.sh) to compare different
attention kernels for MultiHeadAttention operator.

Current limitations for Flash Attention in PackedMultiHeadAttention,
MultiHeadAttention and Attention operators:
* Relative Position Bias is not supported
* Different hidden size for Q and V is not supported
* Only float16 is supported
* Padding/attention mask is not supported
* For MultiHeadAttention, when there is past or present input, bias
shall be provided to activate flash attention
* For Attention, past or present inputs will deactivate flash attention
* Causal is not supported

Some limitations (like attention mask and causal) might be removed
later.

Currently, Flash Attention v2 only works in Linux. For Windows, we will
enable later with Cutlass 3.2.

Two environment variables can be used for testing purpose:
(1) `ORT_DISABLE_FLASH_ATTENTION` to disable flash attention. Default
value is 0 (enable). Set it to "1" to disable it.
(2) `ORT_MIN_SEQ_LEN_FLASH_ATTENTION_PACKED_QKV`. Default value is
"513", which means that we only enable flash attention when sequence
length is larger than 512 for packed QKV format. Set it to "0" if you
want to use flash attention v2 whenever possible.

### Speedup

The following result is from Standard_ND96amsr_A100_v4 VM
(A100-SXM4-80GB GPU) using benchmark_mha.sh. The metric is TFLOPs per
second for MultiHeadAttention operator.

There are 3 input formats:
* `Q,K,V` means separated inputs query, key and value of BxSxNH
* `Q,KV` means packed KV, where key is 5D: BxSxNx2xH
* `QKV` means packed QKV, where query is 5D: BxSxNx3xH

Note that flash attention cannot use packed QKV format, so extra
Transpose is needed. We found that TensorRT kernel is faster for
sequence length <= 512 for packed QKV. The reason might be no transpose
is needed for TensorRT kernel in this format.

We also notice that, TensorRT kernel is faster for stable diffusion
512x512 image (see seq_len=4096, heads=8, head_dim=40 below), while
flash attention v2 is faster for 1024x1024 image (see seq_len=16384,
heads=8, head_dim=40 below).

input format | batch size | sequence length | heads | head dim |
flash_v2 (TFLOPs/s) | TensorRT (TFLOPs/s) | Memory Efficient Attention
(TFLOPs/s)
-- | -- | -- | -- | -- | -- | -- | --
Q,K,V | 32 | 512 | 64 | 32 | 78.1 | 60.0 | 39.3
Q,K,V | 32 | 512 | 128 | 16 | 46.8 | 44.1 | 21.7
Q,K,V | 16 | 1024 | 64 | 32 | 99.0 | 72.8 | 44.3
Q,K,V | 16 | 1024 | 128 | 16 | 54.7 | 49.2 | 23.4
Q,K,V | 8 | 2048 | 64 | 32 | 113.8 | 81.2 | 47.8
Q,K,V | 8 | 2048 | 128 | 16 | 59.7 | 51.9 | 24.7
Q,K,V | 4 | 4096 | 64 | 32 | 122.5 | 85.6 | 49.7
Q,K,V | 4 | 4096 | 128 | 16 | 62.5 | 53.3 | 25.3
Q,K,V | 2 | 8192 | 64 | 32 | 127.4 | 87.5 | 50.7
Q,K,V | 2 | 8192 | 128 | 16 | 64.0 | 54.2 | 25.6
Q,K,V | 1 | 16384 | 64 | 32 | 129.5 | 91.0 | 51.2
Q,K,V | 1 | 16384 | 128 | 16 | 64.7 | 54.5 | 25.8
Q,K,V | 1 | 4096 | 8 | 40 | 51.0 | 43.6 | 36.8
Q,K,V | 1 | 4096 | 8 | 80 | 97.7 | 77.0 | 55.5
Q,K,V | 1 | 4096 | 8 | 160 | 120.0 | 39.7 | 57.8
Q,K,V | 4 | 4096 | 8 | 40 | 89.0 | 84.4 | 49.2
Q,K,V | 4 | 4096 | 8 | 80 | 133.0 | 92.2 | 63.2
Q,K,V | 4 | 4096 | 8 | 160 | 164.8 | 42.7 | 63.8
Q,K,V | 1 | 16384 | 8 | 40 | 96.9 | 91.3 | 52.1
Q,K,V | 1 | 16384 | 8 | 80 | 142.9 | 101.5 | 65.6
Q,K,V | 1 | 16384 | 8 | 160 | 177.4 | 44.2 | 65.7
Q,K,V | 128 | 128 | 12 | 64 | 29.0 | 26.9 | 25.7
Q,K,V | 64 | 128 | 12 | 64 | 23.1 | 10.8 | 21.3
Q,K,V | 128 | 384 | 12 | 64 | 83.5 | 60.8 | 55.7
Q,K,V | 64 | 384 | 12 | 64 | 72.6 | 40.5 | 52.8
Q,K,V | 128 | 512 | 12 | 64 | 98.9 | 77.9 | 62.1
Q,K,V | 64 | 512 | 12 | 64 | 94.7 | 75.6 | 60.4
Q,KV | 32 | 512 | 64 | 32 | 85.9 | 41.1 | 41.1
Q,KV | 32 | 512 | 128 | 16 | 47.1 | 21.6 | 21.6
Q,KV | 16 | 1024 | 64 | 32 | 104.4 | 45.8 | 45.8
Q,KV | 16 | 1024 | 128 | 16 | 54.7 | 23.6 | 23.6
Q,KV | 8 | 2048 | 64 | 32 | 116.8 | 48.5 | 48.5
Q,KV | 8 | 2048 | 128 | 16 | 59.8 | 24.7 | 24.7
Q,KV | 4 | 4096 | 64 | 32 | 124.2 | 50.1 | 50.1
Q,KV | 4 | 4096 | 128 | 16 | 62.6 | 25.3 | 25.3
Q,KV | 2 | 8192 | 64 | 32 | 128.5 | 50.8 | 50.9
Q,KV | 2 | 8192 | 128 | 16 | 64.1 | 25.6 | 25.6
Q,KV | 1 | 16384 | 64 | 32 | 129.4 | 51.2 | 51.2
Q,KV | 1 | 16384 | 128 | 16 | 64.8 | 25.8 | 25.8
Q,KV | 1 | 4096 | 8 | 40 | 67.5 | 37.7 | 37.5
Q,KV | 1 | 4096 | 8 | 80 | 101.3 | 56.7 | 56.6
Q,KV | 1 | 4096 | 8 | 160 | 124.0 | 58.6 | 58.6
Q,KV | 4 | 4096 | 8 | 40 | 90.8 | 49.8 | 49.8
Q,KV | 4 | 4096 | 8 | 80 | 135.6 | 63.8 | 63.8
Q,KV | 4 | 4096 | 8 | 160 | 166.3 | 64.5 | 64.5
Q,KV | 1 | 16384 | 8 | 40 | 97.5 | 52.3 | 52.3
Q,KV | 1 | 16384 | 8 | 80 | 143.5 | 65.9 | 65.8
Q,KV | 1 | 16384 | 8 | 160 | 178.4 | 65.9 | 65.8
Q,KV | 128 | 128 | 12 | 64 | 26.8 | 48.1 | 30.9
Q,KV | 64 | 128 | 12 | 64 | 28.0 | 38.9 | 25.0
Q,KV | 128 | 384 | 12 | 64 | 97.7 | 61.1 | 61.0
Q,KV | 64 | 384 | 12 | 64 | 89.5 | 57.8 | 57.9
Q,KV | 128 | 512 | 12 | 64 | 111.9 | 66.7 | 66.9
Q,KV | 64 | 512 | 12 | 64 | 107.2 | 64.9 | 64.8
QKV | 32 | 512 | 64 | 32 | 77.2 | 84.7 | 39.3
QKV | 32 | 512 | 128 | 16 | 43.4 | 53.1 | 20.9
QKV | 16 | 1024 | 64 | 32 | 98.8 | 87.4 | 44.6
QKV | 16 | 1024 | 128 | 16 | 52.0 | 54.1 | 23.2
QKV | 8 | 2048 | 64 | 32 | 113.1 | 89.0 | 47.9
QKV | 8 | 2048 | 128 | 16 | 58.2 | 54.6 | 24.5
QKV | 4 | 4096 | 64 | 32 | 120.6 | 89.7 | 49.7
QKV | 4 | 4096 | 128 | 16 | 61.7 | 54.6 | 25.2
QKV | 2 | 8192 | 64 | 32 | 125.9 | 89.5 | 50.7
QKV | 2 | 8192 | 128 | 16 | 63.6 | 54.8 | 25.5
QKV | 1 | 16384 | 64 | 32 | 128.5 | 92.0 | 51.2
QKV | 1 | 16384 | 128 | 16 | 64.6 | 54.8 | 25.7
QKV | 1 | 4096 | 8 | 40 | 60.2 | **69.8** | 38.1
QKV | 1 | 4096 | 8 | 80 | 101.6 | 75.2 | 56.7
QKV | 1 | 4096 | 8 | 160 | 130.2 | 41.2 | 58.4
QKV | 4 | 4096 | 8 | 40 | 90.6 | **91.0** | 49.5
QKV | 4 | 4096 | 8 | 80 | 133.6 | 98.1 | 62.8
QKV | 4 | 4096 | 8 | 160 | 165.3 | 43.7 | 63.9
QKV | 1 | 16384 | 8 | 40 | 97.2 | 92.8 | 52.1
QKV | 1 | 16384 | 8 | 80 | 143.0 | 103.1 | 65.6
QKV | 1 | 16384 | 8 | 160 | 177.6 | 44.5 | 65.7
QKV | 128 | 128 | 12 | 64 | 31.1 | 65.9 | 27.6
QKV | 64 | 128 | 12 | 64 | 26.1 | 49.8 | 23.5
QKV | 128 | 384 | 12 | 64 | 84.6 | 88.5 | 56.1
QKV | 64 | 384 | 12 | 64 | 79.1 | 80.3 | 53.5
QKV | 128 | 512 | 12 | 64 | 97.3 | 114.2 | 62.2
QKV | 64 | 512 | 12 | 64 | 95.9 | 110.7 | 60.6
QKV | 4 | 2048 | 32 | 128 | 125.26 | 44.72 | 78.15
QKV | 4 | 4096 | 32 | 128 | 141.62 | 46.29 | 85.84
QKV | 8 | 2048 | 32 | 128 | 127.40 | 45.49 | 78.75
QKV | 8 | 4096 | 32 | 128 | 144.24 | 46.60 | 86.95

### Known Issues

NVCC uses huge memory while compiling flash attention CUDA kernel. Linux
build with CUDA might fail when machine has limited memory while number
of CPUs is large. Walkaround is to use a build machine with larger
memory, or use argument like `--nvcc_threads 1` to limit nvcc threads in
build.

### Motivation and Context
Increases speed and efficiency of MHA or Packed MHA.

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: tlwu@microsoft.com <tlwu@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
2023-08-31 13:52:21 -07:00
Rachel Guo
b54619509f
Refine build script for adding disable selected data types option (#17284)
### Description
<!-- Describe your changes. -->

As title. 

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Now we have multiple data types that we want to disable for minimal
build and to reduce binary size. may be worth adding an argument in the
build script for specifying that.

Also for fp16 type stuff, it may be too restrict to disable that for all
minimal build.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2023-08-31 13:32:55 -07:00
Yi Zhang
507a40e1e9
Add compiler cache in Linux GPU TensorRT CI. (#17348)
### Description
Add the compiler cache in linux GPU tensorRT CI.
Save about 30 minutes in the GPU machine. (52 minutes -> 24 minutes)

PS. 
There're only white-space differences in the dockerfile.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-31 08:13:26 +08:00
Jian Chen
081c0692a4
Update to nodejs version from 16 to 18.17.1 (#17351)
### Description
Update to nodejs version from 16 to 18.17.1



### Motivation and Context
Nodejs will reach EOL in September 2023
2023-08-30 12:41:48 -07:00
Changming Sun
71da0824f3
Upgrade binskim and fix an error in nuget packaging pipeline (#17340)
### Description
Upgrade binskim and fix an error in nuget packaging pipeline.
2023-08-30 07:52:06 -07:00
Jian Chen
922629aad8
Upgrade Centos7 to Alamlinux8 (#16907)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Get the latest gcc 12 by default

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2023-08-29 21:05:36 -07:00
Yi Zhang
d4a61ac71f
Pr trggiers generated by code (#17247)
### Description
1. Refactor the trigger rules generation.
2. Skip all doc changes in PR pipelines.


### Motivation and Context 
Make all trigger rules generated by running set-trigger-rules.py to
reduce inconsistences.
It's easily to make mistakes to copy&paste manually. 

For example: these 2 excludes are different, Why?

4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml (L16-L18)


4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml (L27-L29)


### Note
All changes in workflow yamls are generated by code.
Please review the **skip-js.yml, skip-docs.yml and
set-trigger-rules.py**.

@fs-eire, please double check the 
filter rules in skip-js.yml
and the skipped workflows

7023c2edff/tools/ci_build/set-trigger-rules.py (L14-L41)
2023-08-30 05:57:03 +08:00
Yi Zhang
0e9e9b2a67
Fix one exception in post merge (#17327)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-29 19:24:50 +08:00
cloudhan
bf8b1681f9
Build nuget pkg for ROCm (#16791)
Add nuget pkg building and publishing for ROCm EP

---------
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
2023-08-28 13:35:08 +08:00
Yifan Li
808215366d
Fix Multi GPU TensorRT tests (#17269)
### Description
* Integrate `trt_multi_gpu` test stage in ORT post merge CI (Win-2xA10
vm)
* Deprecate Linux MultiGPU TRT CI (This vm will be deprecated soon)
* Add multi gpu support to existing C# test cases
* Deprecate unfunctional flag `--enable_multi_device_tests`

### Motivation and Context
* Two contexts of replacing Linux MultiGPU TRT CI:
* Flag `--enable_multi_device_tests` is not functional, which cannot
detect issues like #17036
* The Linux-2xM60 VM of this CI pool is about to be deprecated 9/6/23.
Need to enable this test in other dualGPU vm pool.
2023-08-25 20:30:45 -07:00
Arthur Islamov
c262879214
Added DML and CUDA provider support in onnxruntime-node (#16050)
### Description
I've added changes to support CUDA and DML (only on Windows, on other
platforms it will throw an error)



### Motivation and Context
It fixes this feature request
https://github.com/microsoft/onnxruntime/issues/14127 which is tracked
here https://github.com/microsoft/onnxruntime/issues/14529

I was working on StableDiffusion implementation for node.js and it is
very slow on CPU, so GPU support is essential.

Here is a working demo with a patched and precompiled version
https://github.com/dakenf/stable-diffusion-nodejs

---------
2023-08-25 16:57:06 -07:00
Yi Zhang
9cd33e07b4
Readd Tests in Window GPU Reduced Ops workflow (#17294)
### Description
Add single test step in Window GPU Reduced Ops workflow


### Motivation and Context
The old workflow's building and testing were running in one command.
In PR #17263, the test step was removed by mistake.
So, readd it.
How to consolidate the test step is in consideration.
2023-08-25 15:56:59 +08:00
Yi Zhang
756eda2cc4
Windows CI build steps template (#17263)
### Description
1. New windows ci build steps template.
2. Remove useless variables.

### Motivation and Context
1. Make it easier to apply build cache to all windows CIs.
2. Other team's devs only need to take care of build options


###Comparision
Before: 

9f21f694cf/tools/ci_build/github/azure-pipelines/win-gpu-tensorrt-ci-pipeline.yml (L19-L82)

After:
b4c1f2261b/tools/ci_build/github/azure-pipelines/win-gpu-tensorrt-ci-pipeline.yml (L35-L54)
2023-08-25 05:58:49 +08:00
Jian Chen
33415b9da4
Removing 10.14 suffix from osx nuget package (#17277)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-24 08:51:54 -07:00
cloudhan
87bef1f3f2
Move composable_kernel to deps.txt (#17245) 2023-08-23 17:39:16 -07:00
Yi Zhang
61a79436e2
Common pre-build steps of Windows CI (#16970)
### Description
Unify some pre-build common steps.

### Motivation and Context
In the long run, other devs should only focus on build option and test
commands.
It would reduce mistakes and maintenance cost to use common template
steps.
There will be more PRs to achieve the goal.
2023-08-22 18:09:55 +08:00
cloudhan
4e6cec4d09
Update ck and enable test (#16383)
Apply the fix in https://github.com/ROCmSoftwarePlatform/composable_kernel/issues/728
Introduce more kernel instances and allow the introduction of streamk and splitk.
2023-08-22 11:08:55 +08:00
Baiju Meswani
aae9a52e8b
Avoid pushing cpu package to https://download.onnxruntime.ai/ (#17238) 2023-08-21 15:47:07 -07:00
Changming Sun
e2b6827a59
Add a CUDA 12.x pipeline and improve install_third_party_deps.ps1 (#17231)
### Description
1. Add a CUDA 12.x pipeline
2. Improve install_third_party_deps.ps1: avoid using Start-process.
Directly call the command instead.

### Motivation and Context
Since our official packages and all CI pipelines still use CUDA 11.x, we need extra pipelines to validate our source code level compatibility with CUDA 12.x. BTW for sure the prebuilt binaries in our release page are not compatible with CUDA 12.x. Do not report bugs for that. 

AB#15152
2023-08-21 13:04:36 -07:00
Chi Lo
9445539e2c
Update dependency for deps.txt (#17220)
https://github.com/microsoft/onnxruntime/pull/17059 updates deps.txt and
we also need to update cgmanifest.json and upload the files to Azure
DevOps


https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342803&view=results
for testing
2023-08-19 00:43:25 -07:00
Edward Chen
d6cd41cfc1
[CoreML EP] Add Shape, Gather, and Slice ops (#17153)
Add CoreML EP shape related ops:
- Shape
- Gather
- Slice

Add support for int64/int32 inputs in CoreML EP.
2023-08-18 22:34:34 -07:00
Yulong Wang
3426954525
disable browser stack tests (#17224)
### Description
disable browser stack tests
2023-08-18 17:14:12 -07:00
Changming Sun
6db72165eb
Fix python packaging test pipeline (#17204)
### Description
1. Fix python packaging test pipeline. There was an error in
tools/ci_build/github/linux/run_python_tests.sh that it installed a
released version of onnxruntime python package from pypi.org to run the
test. Supposedly it should pick one from the current build.
2. Refactor the pipeline to allow choosing cmake build type from the web
UI when manually trigger a build. Now this feature is for Linux only.
Because I don't want to change too much when we are about to cut a
release branch. After that I will expand it to all platforms. This
feature is useful for debugging pipeline issues, also, we may consider
having a nightly pipeline to run all tests in Debug mode which may catch
extra bugs because in debug mode we can enforce range check.

Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342674&view=results

### Motivation and Context
Currently the pipeline has a crash error. 

AB#18580
2023-08-18 14:51:26 -07:00
Adrian Lizarraga
6ee4be724b
Update LICENSE name in NuGet packaging pipelines (#17183)
### Description
Updates NuGet packaging pipelines to use the correct license name.

### Motivation and Context
The license name changed. See https://github.com/microsoft/onnxruntime/pull/17170
The QNN_Windows_Nuget and Zip-Nuget-* pipelines will not run without this update.
2023-08-17 22:22:19 -07:00
Changming Sun
0cccbcc47b
Move DML build job's Prefast task to a CPU machine pool (#17192)
### Description
Move DML build job's Prefast task to a CPU machine pool which has larger
memory. The current one runs out of memory in every run.

### Motivation and Context
To fix the broken python packaging pipeline.
2023-08-17 13:16:29 -07:00
Jian Chen
e0022d061f
Set web-ci-pipeline.yml only triggered when related fields are updated (#17148)
- 'js/web'
    - 'js/node'
    - 'onnxruntime/core/providers/js'
    is updated

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-17 12:55:35 -07:00
Adrian Lizarraga
96b1ff610b
Add CI and PR validation triggers to QNN Windows x64 Pipeline yaml (#17178)
### Description
Adds continuous integration and pull-requestion validation triggers
directly to the yaml file for the Windows x64 QNN CI Pipeline.


### Motivation and Context
There have been various unit tests failures that break the
QNN_Windows_Nuget pipeline, which builds QNN EP for Windows x64. This PR
ensures that QNN EP is built and tested on a Windows x64 image for every
pull request.
2023-08-16 11:44:54 -07:00
Jian Chen
8998b6811d
Fix NPM Packaging Pipeline (#17182)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-15 22:56:38 -07:00
Adam Louly
c647e3e8ab
Run nightly pipeline tests from the commit id. (#17162)
### Description

The onnxruntime-CI-nightly-ort-pipeline encounters occasional failures
due to synchronization discrepancies between the ACPT nightly image and
the repository. We are addressing this by executing tests using the
commit ID associated with the ort build within the ACPT image.

---------

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-08-15 12:07:38 -07:00
Changming Sun
8e203efc69
Cleanup cmake file (#17154)
### Description
1. Clean up cmake files. Remove some unused code
2. Remove the "Semmle" task from
tools/ci_build/github/azure-pipelines/templates/win-ci.yml. Semmle is
deprecated and replaced by CodeQL.
2023-08-15 10:51:33 -07:00
Changming Sun
2a22325005
Explicitly set JDK version when building ORT java package (#17147)
### Description
Explicitly set JDK version when building ORT java package. This is to fix an internal build error.
2023-08-15 10:36:05 -07:00
Adrian Lizarraga
b734db1924
[QNN EP] Fix CI build on Windows x64 pipelines (#17152)
### Description
- Disables Resize tests that use nearest mode on QNN CPU.
- Fixes indentation problems on yaml for win x64 qnn pipeline.


### Motivation and Context
The QNN windows Nuget pipeline does not run due to failing unit tests on
Windows x64. These tests should not be enabled until we determine the
rounding behavior of QNN's ResizeNearestNeighbor operator.
2023-08-14 21:03:14 -07:00
Baiju Meswani
289600b47d
ONNX Runtime training cpu package name for ADO (#17109) 2023-08-14 11:32:35 -07:00
PeixuanZuo
be2200c00b
[ROCm] fix python package pipeline (#17136)
ROCm python package pipeline failed because this
PR(https://github.com/microsoft/onnxruntime/pull/16325) changed onnx
version to a commit and we need to build onnx from source. Low protobuf
version will cause build errors.
This PR remove `cmake ` and `protobuf ` from Dockerfile, these two will
install by `install_os_deps.sh`.
2023-08-14 11:22:43 -07:00
Jian Chen
45f52987a2
Web CI Pipeline Isolation (#17005)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-14 10:37:37 -07:00
Jian Chen
68ea9631af
Fix typo onnxruntimecpubuilpython (#17120)
### Description
The correct name should be  onnxruntimecpubuildpython



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-08-14 08:34:43 -07:00
Changming Sun
4728f20f9a
Fix CI build (#17118)
### Description
Some pipelines are failing. It is because PR #16325 set ONNX version to
`rel-1.14.1` . It is a branch name, not a commit or tag name. It means
whenever the branch got a new commit, we will auto pick it and use it.
2023-08-11 10:56:38 -07:00
Edward Chen
e7e974b23f
Use double quotes so variable gets expanded. (#17105) 2023-08-11 09:05:41 -07:00
Hector Li
344c41fdb9
[QNN EP] Update QNN to v2.13 (#17079)
### Description
Update QNN SDK to v2.13, update some UTs accordingly
2023-08-10 20:47:55 -07:00
Yulong Wang
9cd4e5af68
[wasm] upgrade emsdk to 3.1.44 (#17069)
### Description
This change upgrade emsdk to 3.1.44.

Because backend is upgraded to LLVM 16, so need to fix a lot of build
failures caused by "-Wshorten-64-to-32".

most of the build failures comes from generated `onnx.pb.h`, and this
can be fixed by including "core/graph/onnx_protobuf.h", which detects
and ignore shorten-64-to-32 warnings.
2023-08-10 16:08:36 -07:00
Bowen Bao
6986981482
Bump ONNX version (#16325)
### Description
Bump ONNX version to https://github.com/onnx/onnx/tree/rel-1.14.1 to
include a fix for segfault when shape inferencing nested onnx functions.



### Motivation and Context
Resolves #16170
2023-08-10 11:27:28 -07:00
PeixuanZuo
12837ba5c7
[ROCm] Update CI based on ubuntu 22.04 (#17076)
- Update ROCm version to ROCm5.6
- Update CI based on ubuntu 22.04
2023-08-10 09:51:29 -07:00
RandySheriffH
a7542f48d6
Make AzureEP default for python and c# packaging (#17025)
Make AzureEP default for python and c# packaging, with UT.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-08-09 12:36:52 -07:00
Yulong Wang
56bced0581
[js/web] enable webgpu in browser unit test (#16310)
### Description
enable webgpu in browser unit test.

The CI pipeline uses Edge v113+ which enables WebGPU.

===

**UPDATE on 08/07/2023:**
- add flags to Edge browser launch commandline so that Edge on CI agents
can initialize WebGPU correctly.
- ONLY enable webgpu on web release build. Other pipelines are using
flag `-b=wasm,webgl,xnnpack` to specify the other 3 backends explicitly.
- disable "Resize" related test failures. Once they are fixed the tests
can be re-enabled.

---------

Co-authored-by: Satya Jandhyala <satya.k.jandhyala@gmail.com>
2023-08-08 11:45:04 -07:00
Edward Chen
50719d2f8e
[iOS] Add script to get simulator device info. (#17012)
Add script to get iOS simulator device info so we don't need to use hardcoded specifiers which may or may not refer to a valid simulator device.

Add use-xcode-version step to a packaging pipeline so it uses a consistent version of Xcode.
2023-08-08 09:04:06 -07:00
Baiju Meswani
249917a093
Add mac and windows python packages for onnxruntime-training (#16993) 2023-08-07 20:32:55 -07:00
Yifan Li
d6ce43db5e
[EP Perf] MemTest: Add Valgrind and fix addressSanitizer (#16930)
### Description
1. Add valgrind to existing ep_perf CI MemTest and parse ORT-TRT memLeak
details
1. General Valgrind logs and logs related to ORT-TRT will be parsed in
[CI
artifacts](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=334122&view=artifacts&pathAsName=false&type=publishedArtifacts)
      1. Logic:
1. Run valgrind with `onnxruntime-perf-test -e tensorrt` and export log
to `valgrind.log`
         2. Identify if any `definitely lost` memleak happened
1. For log paragraphs which show `definitely lost`, parse if they have
keyword `TensorrtExecutionProvider`.
2. If so, extract these details to `ort_trt_memleak_detail.log`, and
return `build failure` to EP Perf CI
3. Fix existing addressSanitizer and sync the squeezenet testcase with
latest update from
[ort-inference-example](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/squeezenet/main.cpp)
1. Updates in short: Upgrade main.cpp to be using
OrtTensorRTProviderOptionsV2
4. Reorder the 7-min-MemTest to be ahead of 9-hr-model-tests, and enable
MemTest by default
2023-08-04 16:58:57 -07:00
Yulong Wang
5af8774a0b
[build] do init and precheck first (#16961)
### Description
This change allows Web CI to do some check as the first step, so that if
there are errors it won't launch the task to build web assembly, which
is heavy.

Checks includes:
- "npm ci" in /js, /js/common and /js/web. this implicitly include:
    - typescript compiler in /js
    - typescript compiler in /js/common
    - webpack build in /js/common
    - typescript compiler in /js/web
- ESLint on typescripts
- clang-format formatter (.js, .ts, .cc, .h, .mm)
- Prettier formatter (.json, .jsonc, .md)

---------

Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-08-04 16:44:45 -07:00
Yi Zhang
555414f1aa
Set PR trigger rules (#16987)
### Description
Add a script to insert the trigger rules to workflow yamls.
First step, skipp windows gpu and linux gpu workflow when there's only
doc change

### Motivation and Context
Make skipping workflows for doc change easily.

[AB#18201](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18201)
2023-08-04 08:21:07 -07:00
Edward Chen
06096fcb31
Hardcode xcodebuild destination iOS simulator OS to 16.4. (#16982) 2023-08-03 14:49:54 -07:00
Dmitri Smirnov
bd4d011142
[C#] Rename unreleased API, add utilities (#16806)
### Description
1. rename OrtValue.FillStringTensorElement to StringTensorSetElementAt .
To the API user I think we're conceptually setting the string at an
offset in the tensor with is roughly equivalent to `List<string> list
... list[index] = "value"`.
2. While working on new inference examples, I noticed that I am still
inclined to use `DenseTensor` for N-D indexing. Added `GetStrides()` and
`GetIndex()` from strides for long dims, so the user can obtain strides
and translate N-D indices into a flat index to operate directly on the
native `OrtValue` buffers. Expose these functions to the user.
3. Make sure we generate docs for C# public static  functions.
2023-08-02 10:06:42 -07:00
Yulong Wang
4a2a248dd7
remove unused comments in mac CI yml file (#16964)
### Description
remove unused comments in mac CI yml file
2023-08-01 20:52:12 -07:00
Yulong Wang
afac67bcc3
[build] fix the CI pipeline (#16962)
### Description
There are currently multiple failures that blocking the CI pipelines so
this PR has all of the fixes in order to make sure it passes the CI.
Otherwise a single fix will still fail the CI.

includes:
#16960
#16958

Please help to make sure this PR get merged once CI passed.

@snnn @carzh @guschmue 

Fixed:
[AB#18118](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18118)

---------

Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-08-01 16:22:45 -07:00
Yulong Wang
969c95f73f
[js/common] a few fixes/revises to onnxruntime-common (#16853)
### Description
- enable unit test for js/common in CI
- add debug config in js/.vscode/launch.json
- enable source map for js/common/test for debugging purposes; add
source map files to ignore list
- ignore js/common/test folder for npm packaging
2023-08-01 11:17:39 -07:00
Yi Zhang
c4e4b98fb2
replace one pool with onnxruntime-Win2022-GPU-T4 (#16953)
### Description
replace one pool

### Motivation and Context
onnxruntime-gpu-tensorrt8-winbuild-t4 would be deprecated
2023-08-01 21:02:56 +08:00
Changming Sun
73ddba964f
Update the MacOS/Linux build scripts that build/install protobuf from source (#16906)
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
2023-07-31 10:51:48 -07:00
Yi Zhang
28a099fca8
unify the steps of downloading cuda sdk and setup env (#16896)
### Description
The `%AGENT_TEMPDIRECTORY%\v11.8` is created in azcopy step.
So, the set env step should be after the azcopy step.

### Motivation and Context
Correct the previous logic
Unify the step since multiple jobs are using it.
2023-07-31 10:25:04 -07:00
Scott McKay
21a71d52bd
Enable CodeQL for Android build as per 1CS requirement. (#16875)
### Description
<!-- Describe your changes. -->
Split stages for CPU and CPU+NNAPI builds as CodeQL is enabled at the
stage level.
We run it for CPU+NNAPI as that covers all the Android code. 
We don't want to run it for both as duplicate issues would be created
for a problem in code included in both builds.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-28 17:54:23 +10:00
Yi Zhang
9f21f694cf
stop support to VS 2019 (#16892)
### Description
Remove VS 2019 code.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-28 13:09:35 +08:00
Changming Sun
9dcbcf1d2f
Delete unused files (#16887)
### Description
These yaml files and docker files are not used by any pipeline. If I
were wrong, feel free to submit a PR to get the wrongly deleted file
back from git history (git keeps everything forever).
2023-07-27 16:46:09 -07:00
Yi Zhang
bd95a8ea77
update onnxruntime-gpu-winbuild-T4 to onnxruntime-Win2022-GPU-T4 (#16838)
### Description


### Motivation and Context
It's also used to upgrade visual studio to VS2022.
onnxruntime-gpu-winbuild-T4 and onnxruntime-gpu-tensorrt8-winbuild-t4
are using the image based on one dev branch and VS2019

To avoid breaking the current CIs, we move jobs running on
onnxruntime-gpu-winbuild-T4/onnxruntime-gpu-tensorrt8-winbuild-t4 to
onnxruntime-Win2022-GPU-T4.
2023-07-27 08:38:20 -07:00
Wang, Mengni
fe463d4957
Support SmoothQuant for ORT static quantization (#16288)
### Description

Support SmoothQuant for ORT static quantization via intel neural
compressor

> Note:
Please use neural-compressor==2.2 to try SmoothQuant function.

### Motivation and Context
For large language models (LLMs) with gigantic parameters, the
systematic outliers make quantification of activations difficult. As a
training free post-training quantization (PTQ) solution, SmoothQuant
offline migrates this difficulty from activations to weights with a
mathematically equivalent transformation. Integrating SmoothQuant into
ORT quantization can benefit the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
2023-07-26 18:56:45 -07:00
Justin Chu
0c1a5098dc
Disable PERF* rules in ruff to allow better readability (#16834)
### Description

Disable two PERF* rules in ruff to allow better readability. Rational
commented inline. This change also removes the unused noqa directives
because of the rule change.

### Motivation and Context

Readability
2023-07-25 15:38:22 -07:00
Edward Chen
e01365f80b
Update upload_pod_archive_and_update_podspec.sh to take path pattern (#16810)
Update upload_pod_archive_and_update_podspec.sh to take a pod archive path glob pattern. The actual pod archive path has a version suffix which changes.
2023-07-25 08:55:31 -07:00
Yi Zhang
38db5eca65
replace onnxruntime-Win-CPU-2019 with onnxruntime-Win-CPU-2022 (#16844)
### Description
<!-- Describe your changes. -->



### Motivation and Context
upgrade to VS2022
2023-07-25 23:05:34 +08:00
Yi Zhang
f88f0d8e36
Upgrade 4 stages in nuget pipeline to VS2022 (#16825)
### Description


### Motivation and Context
Continue upgrading to VS2022

### Verfication

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=331377&view=results

N.B.
In practice, SDLNativeRules@3 doesn't support VS2019.
2023-07-25 14:22:39 +08:00
PeixuanZuo
8ede2f139e
[ROCm] Optimize ROCm CI pipeline 2 (#16691)
- Set `KERNEL_EXPLORER_TEST_USE_CUPY=1` to replace numpy with cupy on
kernel explorer test.

KERNEL_EXPLORER_TEST_USE_CUPY=0 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/91724b78-0b4e-4cbd-ad88-83cad9976472)

KERNEL_EXPLORER_TEST_USE_CUPY=1 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/58239911-667c-4d5f-bb78-deca60d0266f)


- Use `Bash@3`.
- Update shell script.
2023-07-24 13:57:48 +08:00
Yi Zhang
3252ff2cb7
Change DML GPU pool in Windows GPU workflow use Visual Studio 2022 (#16784)
### Description
1. use the pool with VS2022
2. upgrade System.Memory to 4.5.5


### Motivation and Context
Solve the build error while using VS2022:
`[Failure] Msbuild failed when processing the file
'D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\Microsoft.ML.OnnxRuntime.csproj'
with message: Method not found: 'System.ReadOnlySpan`1<Char>
Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'`

Ref:
https://stackoverflow.com/questions/73399777/azure-build-failing-due-to-method-not-found-system-readonlyspan1char-micros
2023-07-23 10:07:21 +08:00
Justin Chu
d79515041c
[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #16789

Bump ruff to 0.0.278 and fix new lint errors. I added noqa to all
existing RUF012 errors which requires mutable class variables to be
annotated with `ClassVar`, as well as all PERF issues.

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-07-21 12:53:41 -07:00
Baiju Meswani
538d2412ef
Objective-C Add Support to Create and Query String ORTValues (#16764)
This pull request contains a few changes:

1. Adds support for string ort values.
2. Fixes the training minimal build (that was broken with #16601) by
putting custom op registration behind #ifdefs
3. Fixes the iOS pod package generation (that was again broken with
#16601) by explicitly providing paths to be copied during pod creation.
2023-07-20 17:39:29 -07:00
Adrian Lizarraga
a8c263f92c
[QNN EP] Update QNN SDK to 2.12 (#16750)
### Description
- Updates the default QNN SDK to 2.12 for CI pipelines
- Adds a disabled InstanceNormalization test for regression on QNN SDK
2.12
- Cleans up logs for unsupported ops.

### Motivation and Context
Test with the latest QNN SDK.
2023-07-20 16:22:14 -07:00