Commit graph

637 commits

Author SHA1 Message Date
Abhishek Jindal
680a526e73
Training packaging pipeline for cuda12 (#18524)
### Description
<!-- Describe your changes. -->
Build ORT-training packaging pipeline for CUDA 12.2


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This will help any customer using CUDA 12 and would not need to build
ORT-training from source

Test run:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=382993&view=logs&s=130be951-c2f3-5601-5709-434b5e50ddb0
2023-11-21 13:19:21 -08:00
Jian Chen
d97fc1824f
Create a new Python Package pipeline for CUDA 12 (#18348)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-11-20 09:48:28 -08:00
Wei-Sheng Chin
3bcc137eb4
Tiny change to trigger the update of DORT's CI image (#18507)
Recent PyTorch breaks DORT CI and [a
patch](https://github.com/pytorch/pytorch/pull/113697) has been merged
into PyTorch main. In order to update DORT's CI, we made dummy change in
this PR.
2023-11-19 22:09:11 -08:00
PeixuanZuo
37d8bed53d
[ROCm] add migraphx into onnxruntime-training-rocm package (#18339) 2023-11-14 11:54:22 +08:00
RandySheriffH
59262dfc63
Add cuda context headers to zip (#18330)
Expose cuda context headers for cuda custom ops.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-11-09 14:53:58 -08:00
Changming Sun
398ef677ba
Update protobuf python package's version (#18203)
1. Now we use a released version of ONNX, so we can directly download a
prebuilt package from pypi.org. We do not need to build one from source.
2. Update protobuf python package's version to match the C/C++ version
we are using.
3. Update tensorboard python python because the current one is
incompatible with the newer protobuf version.
2023-11-06 09:22:54 -08:00
liqun Fu
20f2dd8b6b
use onnx rel-1.15.0, update cgman, cmake/external and requirement hash (#18177) 2023-10-31 14:58:21 -07:00
Xavier Dupré
c10b83eb68
Update python cryptography version to 41.0.4 (#18056)
### Description

Version 41.0.0 currently used has vulnerabilities.

### Motivation and Context

See [Vulnerable OpenSSL included in cryptography
wheels](https://github.com/advisories/GHSA-v8gr-m533-ghj9)
2023-10-27 12:06:38 +02:00
Jian Chen
7c18c60bc2
Change cuda image for tensorRT to the one with cudnn8 (#18102)
### Description
copilot:summary


### Motivation and Context
copliot::walkthrough
2023-10-26 16:28:57 -07:00
Jian Chen
76e275baf4
Merge Cuda docker files into a single one (#18020)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-10-24 15:17:36 -07:00
Jian Chen
cbb0e0f83c
Create a new Dockerfile for cuda 12 and trt 8.6.1.6-1.cuda12.0 (#18000) 2023-10-18 14:46:02 -07:00
PeixuanZuo
2ef6ee674c
[ROCm] Update ROCm and MIGraphX CI to ROCm5.7 (#17834)
- Update ROCm and MIGraphX CI to ROCm5.7
- Simplify test exculde file. Some tests will output `registered
execution providers ROCMExecutionProvider were unable to run the model.`
if they cannot run.
- Add `enable_training` build argument for MIGraphX pipeline.
2023-10-09 10:29:11 +08:00
Wei-Sheng Chin
b5a103ae16
Upgrade transformers to fix CI (#17823)
Python package pipeline fails due to "tokenizers" compilation. Since
"tokenizers" is a dep of "transformers", we update its version and hope
a new solution had been there.

```
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> tokenizers-lib/src/models/bpe/trainer.rs:517:47
```
2023-10-07 09:51:24 -07:00
Chi Lo
569876fb16
[TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617)
Two major modifications of this PR:

1. Refactor OrtTensorRTProviderOptions initialization and make it easy
to add new field.
2. Make Python API capable of using TensorRT plugins by adding new
Python binding api `register_tensorrt_plugins_as_custom_ops`. (It needs
to register ep's custom op domain before model load. For C++ API, it's
slightly different, when calling
SessionOptionsAppendExecutionProvider_TensorRT_XX, it appends cutom op
domain to session option. Later ORT can register custom op domain from
session option before model loading)
2023-10-06 14:12:20 -07:00
Justin Chu
be7541ef4a
[Linter] Bump ruff and remove pylint (#17797)
Bump ruff version and remove pylint from the linter list. Fix any new
error detected by ruff.

### Motivation and Context

Ruff covers many of the pylint rules. Since pylint is not enabled in
this repo and runs slow, we remove it from the linters
2023-10-05 21:07:33 -07:00
Changming Sun
276e8733bd
Update onnx python package and setuptools (#17709)
### Description
A follow-up for #17125
2023-09-27 07:54:48 -07:00
liqun Fu
2be4dc6d04
ONNX 1.15 integration (#17125)
### Description
this is for ORT 1.17.0 - make ORT to use ONNX release 1.15.0 branch. Eventually will update to the release tag once ONNX 1.15.0 is released


### Motivation and Context
Prepare for ORT 1.17.0 release. People can start work on new and updated ONNX ops in ORT.
---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-09-26 14:44:48 -07:00
Changming Sun
57dfd15d7b
Remove dnf update from docker build scripts (#17551)
### Description
1. Remove 'dnf update' from docker build scripts, because it upgrades TRT
packages from CUDA 11.x to CUDA 12.x.
To reproduce it, you can run the following commands in a CentOS CUDA
11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8.
```
export v=8.6.1.6-1.cuda11.8
dnf  install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v} 
dnf update -y
```
The last command will generate the following outputs:
```
========================================================================================================================
 Package                                     Architecture       Version                          Repository        Size
========================================================================================================================
Upgrading:
 libnvinfer-devel                            x86_64             8.6.1.6-1.cuda12.0               cuda             542 M
 libnvinfer-headers-devel                    x86_64             8.6.1.6-1.cuda12.0               cuda             118 k
 libnvinfer-headers-plugin-devel             x86_64             8.6.1.6-1.cuda12.0               cuda              14 k
 libnvinfer-plugin-devel                     x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-plugin8                          x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-vc-plugin-devel                  x86_64             8.6.1.6-1.cuda12.0               cuda             107 k
 libnvinfer-vc-plugin8                       x86_64             8.6.1.6-1.cuda12.0               cuda             251 k
 libnvinfer8                                 x86_64             8.6.1.6-1.cuda12.0               cuda             543 M
 libnvonnxparsers-devel                      x86_64             8.6.1.6-1.cuda12.0               cuda             467 k
 libnvonnxparsers8                           x86_64             8.6.1.6-1.cuda12.0               cuda             757 k
 libnvparsers-devel                          x86_64             8.6.1.6-1.cuda12.0               cuda             2.0 M
 libnvparsers8                               x86_64             8.6.1.6-1.cuda12.0               cuda             854 k
Installing dependencies:
 cuda-toolkit-12-0-config-common             noarch             12.0.146-1                       cuda             7.7 k
 cuda-toolkit-12-config-common               noarch             12.2.140-1                       cuda             7.9 k
 libcublas-12-0                              x86_64             12.0.2.224-1                     cuda             361 M
 libcublas-devel-12-0                        x86_64             12.0.2.224-1                     cuda             397 M

Transaction Summary
========================================================================================================================

```
As you can see from the output,  they are CUDA 12 packages. 

The problem can also be solved by lock the packages' versions by using
"dnf versionlock" command right after installing the CUDA/TRT packages.
However, going forward, to get the better reproducibility, I suggest
manually fix dnf package versions in the installation scripts like we do
for TRT now.

```bash
v="8.6.1.6-1.cuda11.8" &&\
    yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\
    yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\
        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v}
```
When we have a need to upgrade a package due to security alert or some
other reasons, we manually change the version string instead of relying
on "dnf update". Though this approach increases efforts, it can make our
pipeines more stable.

2. Move python test to docker
### Motivation and Context
Right now the nightly gpu package mixes using CUDA 11.x and CUDA 12.x
and the result package is totally not usable(crashes every time)
2023-09-21 07:33:29 -07:00
Pranav Sharma
038c76378f
Include onnxruntime_float16.h in the package. (#17637)
### Description
Include onnxruntime_float16.h in the package.

### Motivation and Context
This was missed in the recently released 1.16 pkgs (except Nuget).
2023-09-21 00:08:10 -07:00
PeixuanZuo
1f991f27f1
[ROCm] add manylinux build test for ROCm CI (#17621)
manylinux build is used for nightly packaging generation and it's hard
to capture issue in time when related files change. This PR add
manylinux build in CI.
2023-09-21 10:45:16 +08:00
Changming Sun
dd561f2015
Upgrade sympy (#17639)
AB#17015
2023-09-20 18:44:23 -07:00
Wei-Sheng Chin
068300d97e
Pin beartype version (#17599)
PyTorch doesn't like the latest beartype:
https://github.com/pytorch/pytorch/pull/109510
2023-09-18 19:31:04 -07:00
Yi Zhang
377f959c69
Run Final_Jar_Testing_Linux_GPU in docker (#17533)
### Description
1. Create a package test image based on [RedHat
UBI](https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image)
2. Install TensorRT 8.6.1.6 in RedHat. (Ref.
https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#maclearn-net-repo-install-rpm)
3. Run Final_Jar_Testing_Linux_GPU in docker (base image:
nvidia/cuda:11.8.0-cudnn8-devel-ubi8)

### Motivation and Context

[AB#18470](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18470)

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=354004&view=logs&j=8939b564-1402-57b5-92dc-510eba75e069&t=8939b564-1402-57b5-92dc-510eba75e069
2023-09-15 08:35:55 -07:00
Changming Sun
5d3786206b
Fix ROCM's nightly build (#17518)
### Description
PR 15470 updated some C/C++ dependencies. The change caused ROCM EP's
nightly build to fail. see issue
https://github.com/ROCm-Developer-Tools/HIP/issues/2082 for a
background. So, the root cause is HIP compiler has a special requirement
that HIP's include dirs must be used before the operating system's
include folder: /usr/include. HIP adds "-isystem" in front of
"/usr/include". gcc or clang will search the folders added with "-I"
first, then the "-isystem" folder. It works fine as long as we do not
add "-I/usr/include" to the compile commands for *.cu files. It would be wrong if
we already have installed an open source library to /usr and want to use the
prebuilt library from there instead of the current build dir. 


### Motivation and Context
2023-09-13 08:50:14 -07:00
Changming Sun
bc84f52633
Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470)
### Description
Update C/C++ dependencies abseil, date, nsync, googletest, wil, mp11,
cpuinfo and safeint to newer versions per request of @
mayeut. He created the following PRs to update the deps:
https://github.com/microsoft/onnxruntime/pull/15432
https://github.com/microsoft/onnxruntime/pull/15434
https://github.com/microsoft/onnxruntime/pull/15435
https://github.com/microsoft/onnxruntime/pull/15436
https://github.com/microsoft/onnxruntime/pull/15437

However, our build system needs to fetch the dependencies from an
internal mirror that only Microsoft employees have write access to. So I
closed his PRs and created this one.

This PR also updates abseil to a newer version. This is to prepare for
upgrading re2.
2023-09-08 13:35:04 -07:00
Yi Zhang
ae74a517b6
Run Nuget_Test_Linux_GPU in container (#17452)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results
2023-09-08 13:41:20 +08:00
Yi Zhang
ede339f304
Move dotnet build and test into docker in Linux CPU CI (#17417)
### Description
install dotnet 6.0 in the docker image.
move C# build and test into docker.

### Motivation and Context

### Note
The Unit tests and Symbolic shape infer's migration will be in another
PR.
2023-09-07 09:28:16 +08:00
Changming Sun
c6b0d185b4
Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856)
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.

2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
2023-09-05 18:12:10 -07:00
aciddelgado
44101e8771
Flash Attention v2 MHA (#17227)
### Description
Integrate Flash Attention V2 to PackedMultiHeadAttention,
MultiHeadAttention and Attention operators.

Flash Attention v2 source code is from
https://github.com/Dao-AILab/flash-attention/tree/main/csrc/flash_attn/src.
We did some change to remove dependency on Torch, then removed backward
and bfloat16 related code.

Add benchmark script (see benchmark_mha.sh) to compare different
attention kernels for MultiHeadAttention operator.

Current limitations for Flash Attention in PackedMultiHeadAttention,
MultiHeadAttention and Attention operators:
* Relative Position Bias is not supported
* Different hidden size for Q and V is not supported
* Only float16 is supported
* Padding/attention mask is not supported
* For MultiHeadAttention, when there is past or present input, bias
shall be provided to activate flash attention
* For Attention, past or present inputs will deactivate flash attention
* Causal is not supported

Some limitations (like attention mask and causal) might be removed
later.

Currently, Flash Attention v2 only works in Linux. For Windows, we will
enable later with Cutlass 3.2.

Two environment variables can be used for testing purpose:
(1) `ORT_DISABLE_FLASH_ATTENTION` to disable flash attention. Default
value is 0 (enable). Set it to "1" to disable it.
(2) `ORT_MIN_SEQ_LEN_FLASH_ATTENTION_PACKED_QKV`. Default value is
"513", which means that we only enable flash attention when sequence
length is larger than 512 for packed QKV format. Set it to "0" if you
want to use flash attention v2 whenever possible.

### Speedup

The following result is from Standard_ND96amsr_A100_v4 VM
(A100-SXM4-80GB GPU) using benchmark_mha.sh. The metric is TFLOPs per
second for MultiHeadAttention operator.

There are 3 input formats:
* `Q,K,V` means separated inputs query, key and value of BxSxNH
* `Q,KV` means packed KV, where key is 5D: BxSxNx2xH
* `QKV` means packed QKV, where query is 5D: BxSxNx3xH

Note that flash attention cannot use packed QKV format, so extra
Transpose is needed. We found that TensorRT kernel is faster for
sequence length <= 512 for packed QKV. The reason might be no transpose
is needed for TensorRT kernel in this format.

We also notice that, TensorRT kernel is faster for stable diffusion
512x512 image (see seq_len=4096, heads=8, head_dim=40 below), while
flash attention v2 is faster for 1024x1024 image (see seq_len=16384,
heads=8, head_dim=40 below).

input format | batch size | sequence length | heads | head dim |
flash_v2 (TFLOPs/s) | TensorRT (TFLOPs/s) | Memory Efficient Attention
(TFLOPs/s)
-- | -- | -- | -- | -- | -- | -- | --
Q,K,V | 32 | 512 | 64 | 32 | 78.1 | 60.0 | 39.3
Q,K,V | 32 | 512 | 128 | 16 | 46.8 | 44.1 | 21.7
Q,K,V | 16 | 1024 | 64 | 32 | 99.0 | 72.8 | 44.3
Q,K,V | 16 | 1024 | 128 | 16 | 54.7 | 49.2 | 23.4
Q,K,V | 8 | 2048 | 64 | 32 | 113.8 | 81.2 | 47.8
Q,K,V | 8 | 2048 | 128 | 16 | 59.7 | 51.9 | 24.7
Q,K,V | 4 | 4096 | 64 | 32 | 122.5 | 85.6 | 49.7
Q,K,V | 4 | 4096 | 128 | 16 | 62.5 | 53.3 | 25.3
Q,K,V | 2 | 8192 | 64 | 32 | 127.4 | 87.5 | 50.7
Q,K,V | 2 | 8192 | 128 | 16 | 64.0 | 54.2 | 25.6
Q,K,V | 1 | 16384 | 64 | 32 | 129.5 | 91.0 | 51.2
Q,K,V | 1 | 16384 | 128 | 16 | 64.7 | 54.5 | 25.8
Q,K,V | 1 | 4096 | 8 | 40 | 51.0 | 43.6 | 36.8
Q,K,V | 1 | 4096 | 8 | 80 | 97.7 | 77.0 | 55.5
Q,K,V | 1 | 4096 | 8 | 160 | 120.0 | 39.7 | 57.8
Q,K,V | 4 | 4096 | 8 | 40 | 89.0 | 84.4 | 49.2
Q,K,V | 4 | 4096 | 8 | 80 | 133.0 | 92.2 | 63.2
Q,K,V | 4 | 4096 | 8 | 160 | 164.8 | 42.7 | 63.8
Q,K,V | 1 | 16384 | 8 | 40 | 96.9 | 91.3 | 52.1
Q,K,V | 1 | 16384 | 8 | 80 | 142.9 | 101.5 | 65.6
Q,K,V | 1 | 16384 | 8 | 160 | 177.4 | 44.2 | 65.7
Q,K,V | 128 | 128 | 12 | 64 | 29.0 | 26.9 | 25.7
Q,K,V | 64 | 128 | 12 | 64 | 23.1 | 10.8 | 21.3
Q,K,V | 128 | 384 | 12 | 64 | 83.5 | 60.8 | 55.7
Q,K,V | 64 | 384 | 12 | 64 | 72.6 | 40.5 | 52.8
Q,K,V | 128 | 512 | 12 | 64 | 98.9 | 77.9 | 62.1
Q,K,V | 64 | 512 | 12 | 64 | 94.7 | 75.6 | 60.4
Q,KV | 32 | 512 | 64 | 32 | 85.9 | 41.1 | 41.1
Q,KV | 32 | 512 | 128 | 16 | 47.1 | 21.6 | 21.6
Q,KV | 16 | 1024 | 64 | 32 | 104.4 | 45.8 | 45.8
Q,KV | 16 | 1024 | 128 | 16 | 54.7 | 23.6 | 23.6
Q,KV | 8 | 2048 | 64 | 32 | 116.8 | 48.5 | 48.5
Q,KV | 8 | 2048 | 128 | 16 | 59.8 | 24.7 | 24.7
Q,KV | 4 | 4096 | 64 | 32 | 124.2 | 50.1 | 50.1
Q,KV | 4 | 4096 | 128 | 16 | 62.6 | 25.3 | 25.3
Q,KV | 2 | 8192 | 64 | 32 | 128.5 | 50.8 | 50.9
Q,KV | 2 | 8192 | 128 | 16 | 64.1 | 25.6 | 25.6
Q,KV | 1 | 16384 | 64 | 32 | 129.4 | 51.2 | 51.2
Q,KV | 1 | 16384 | 128 | 16 | 64.8 | 25.8 | 25.8
Q,KV | 1 | 4096 | 8 | 40 | 67.5 | 37.7 | 37.5
Q,KV | 1 | 4096 | 8 | 80 | 101.3 | 56.7 | 56.6
Q,KV | 1 | 4096 | 8 | 160 | 124.0 | 58.6 | 58.6
Q,KV | 4 | 4096 | 8 | 40 | 90.8 | 49.8 | 49.8
Q,KV | 4 | 4096 | 8 | 80 | 135.6 | 63.8 | 63.8
Q,KV | 4 | 4096 | 8 | 160 | 166.3 | 64.5 | 64.5
Q,KV | 1 | 16384 | 8 | 40 | 97.5 | 52.3 | 52.3
Q,KV | 1 | 16384 | 8 | 80 | 143.5 | 65.9 | 65.8
Q,KV | 1 | 16384 | 8 | 160 | 178.4 | 65.9 | 65.8
Q,KV | 128 | 128 | 12 | 64 | 26.8 | 48.1 | 30.9
Q,KV | 64 | 128 | 12 | 64 | 28.0 | 38.9 | 25.0
Q,KV | 128 | 384 | 12 | 64 | 97.7 | 61.1 | 61.0
Q,KV | 64 | 384 | 12 | 64 | 89.5 | 57.8 | 57.9
Q,KV | 128 | 512 | 12 | 64 | 111.9 | 66.7 | 66.9
Q,KV | 64 | 512 | 12 | 64 | 107.2 | 64.9 | 64.8
QKV | 32 | 512 | 64 | 32 | 77.2 | 84.7 | 39.3
QKV | 32 | 512 | 128 | 16 | 43.4 | 53.1 | 20.9
QKV | 16 | 1024 | 64 | 32 | 98.8 | 87.4 | 44.6
QKV | 16 | 1024 | 128 | 16 | 52.0 | 54.1 | 23.2
QKV | 8 | 2048 | 64 | 32 | 113.1 | 89.0 | 47.9
QKV | 8 | 2048 | 128 | 16 | 58.2 | 54.6 | 24.5
QKV | 4 | 4096 | 64 | 32 | 120.6 | 89.7 | 49.7
QKV | 4 | 4096 | 128 | 16 | 61.7 | 54.6 | 25.2
QKV | 2 | 8192 | 64 | 32 | 125.9 | 89.5 | 50.7
QKV | 2 | 8192 | 128 | 16 | 63.6 | 54.8 | 25.5
QKV | 1 | 16384 | 64 | 32 | 128.5 | 92.0 | 51.2
QKV | 1 | 16384 | 128 | 16 | 64.6 | 54.8 | 25.7
QKV | 1 | 4096 | 8 | 40 | 60.2 | **69.8** | 38.1
QKV | 1 | 4096 | 8 | 80 | 101.6 | 75.2 | 56.7
QKV | 1 | 4096 | 8 | 160 | 130.2 | 41.2 | 58.4
QKV | 4 | 4096 | 8 | 40 | 90.6 | **91.0** | 49.5
QKV | 4 | 4096 | 8 | 80 | 133.6 | 98.1 | 62.8
QKV | 4 | 4096 | 8 | 160 | 165.3 | 43.7 | 63.9
QKV | 1 | 16384 | 8 | 40 | 97.2 | 92.8 | 52.1
QKV | 1 | 16384 | 8 | 80 | 143.0 | 103.1 | 65.6
QKV | 1 | 16384 | 8 | 160 | 177.6 | 44.5 | 65.7
QKV | 128 | 128 | 12 | 64 | 31.1 | 65.9 | 27.6
QKV | 64 | 128 | 12 | 64 | 26.1 | 49.8 | 23.5
QKV | 128 | 384 | 12 | 64 | 84.6 | 88.5 | 56.1
QKV | 64 | 384 | 12 | 64 | 79.1 | 80.3 | 53.5
QKV | 128 | 512 | 12 | 64 | 97.3 | 114.2 | 62.2
QKV | 64 | 512 | 12 | 64 | 95.9 | 110.7 | 60.6
QKV | 4 | 2048 | 32 | 128 | 125.26 | 44.72 | 78.15
QKV | 4 | 4096 | 32 | 128 | 141.62 | 46.29 | 85.84
QKV | 8 | 2048 | 32 | 128 | 127.40 | 45.49 | 78.75
QKV | 8 | 4096 | 32 | 128 | 144.24 | 46.60 | 86.95

### Known Issues

NVCC uses huge memory while compiling flash attention CUDA kernel. Linux
build with CUDA might fail when machine has limited memory while number
of CPUs is large. Walkaround is to use a build machine with larger
memory, or use argument like `--nvcc_threads 1` to limit nvcc threads in
build.

### Motivation and Context
Increases speed and efficiency of MHA or Packed MHA.

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: tlwu@microsoft.com <tlwu@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
2023-08-31 13:52:21 -07:00
Yi Zhang
507a40e1e9
Add compiler cache in Linux GPU TensorRT CI. (#17348)
### Description
Add the compiler cache in linux GPU tensorRT CI.
Save about 30 minutes in the GPU machine. (52 minutes -> 24 minutes)

PS. 
There're only white-space differences in the dockerfile.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-31 08:13:26 +08:00
Jian Chen
081c0692a4
Update to nodejs version from 16 to 18.17.1 (#17351)
### Description
Update to nodejs version from 16 to 18.17.1



### Motivation and Context
Nodejs will reach EOL in September 2023
2023-08-30 12:41:48 -07:00
Jian Chen
922629aad8
Upgrade Centos7 to Alamlinux8 (#16907)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Get the latest gcc 12 by default

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2023-08-29 21:05:36 -07:00
cloudhan
bf8b1681f9
Build nuget pkg for ROCm (#16791)
Add nuget pkg building and publishing for ROCm EP

---------
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
2023-08-28 13:35:08 +08:00
Yifan Li
808215366d
Fix Multi GPU TensorRT tests (#17269)
### Description
* Integrate `trt_multi_gpu` test stage in ORT post merge CI (Win-2xA10
vm)
* Deprecate Linux MultiGPU TRT CI (This vm will be deprecated soon)
* Add multi gpu support to existing C# test cases
* Deprecate unfunctional flag `--enable_multi_device_tests`

### Motivation and Context
* Two contexts of replacing Linux MultiGPU TRT CI:
* Flag `--enable_multi_device_tests` is not functional, which cannot
detect issues like #17036
* The Linux-2xM60 VM of this CI pool is about to be deprecated 9/6/23.
Need to enable this test in other dualGPU vm pool.
2023-08-25 20:30:45 -07:00
Changming Sun
6db72165eb
Fix python packaging test pipeline (#17204)
### Description
1. Fix python packaging test pipeline. There was an error in
tools/ci_build/github/linux/run_python_tests.sh that it installed a
released version of onnxruntime python package from pypi.org to run the
test. Supposedly it should pick one from the current build.
2. Refactor the pipeline to allow choosing cmake build type from the web
UI when manually trigger a build. Now this feature is for Linux only.
Because I don't want to change too much when we are about to cut a
release branch. After that I will expand it to all platforms. This
feature is useful for debugging pipeline issues, also, we may consider
having a nightly pipeline to run all tests in Debug mode which may catch
extra bugs because in debug mode we can enforce range check.

Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342674&view=results

### Motivation and Context
Currently the pipeline has a crash error. 

AB#18580
2023-08-18 14:51:26 -07:00
PeixuanZuo
be2200c00b
[ROCm] fix python package pipeline (#17136)
ROCm python package pipeline failed because this
PR(https://github.com/microsoft/onnxruntime/pull/16325) changed onnx
version to a commit and we need to build onnx from source. Low protobuf
version will cause build errors.
This PR remove `cmake ` and `protobuf ` from Dockerfile, these two will
install by `install_os_deps.sh`.
2023-08-14 11:22:43 -07:00
Bowen Bao
6986981482
Bump ONNX version (#16325)
### Description
Bump ONNX version to https://github.com/onnx/onnx/tree/rel-1.14.1 to
include a fix for segfault when shape inferencing nested onnx functions.



### Motivation and Context
Resolves #16170
2023-08-10 11:27:28 -07:00
PeixuanZuo
12837ba5c7
[ROCm] Update CI based on ubuntu 22.04 (#17076)
- Update ROCm version to ROCm5.6
- Update CI based on ubuntu 22.04
2023-08-10 09:51:29 -07:00
RandySheriffH
a7542f48d6
Make AzureEP default for python and c# packaging (#17025)
Make AzureEP default for python and c# packaging, with UT.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-08-09 12:36:52 -07:00
Yifan Li
d6ce43db5e
[EP Perf] MemTest: Add Valgrind and fix addressSanitizer (#16930)
### Description
1. Add valgrind to existing ep_perf CI MemTest and parse ORT-TRT memLeak
details
1. General Valgrind logs and logs related to ORT-TRT will be parsed in
[CI
artifacts](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=334122&view=artifacts&pathAsName=false&type=publishedArtifacts)
      1. Logic:
1. Run valgrind with `onnxruntime-perf-test -e tensorrt` and export log
to `valgrind.log`
         2. Identify if any `definitely lost` memleak happened
1. For log paragraphs which show `definitely lost`, parse if they have
keyword `TensorrtExecutionProvider`.
2. If so, extract these details to `ort_trt_memleak_detail.log`, and
return `build failure` to EP Perf CI
3. Fix existing addressSanitizer and sync the squeezenet testcase with
latest update from
[ort-inference-example](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/squeezenet/main.cpp)
1. Updates in short: Upgrade main.cpp to be using
OrtTensorRTProviderOptionsV2
4. Reorder the 7-min-MemTest to be ahead of 9-hr-model-tests, and enable
MemTest by default
2023-08-04 16:58:57 -07:00
Changming Sun
73ddba964f
Update the MacOS/Linux build scripts that build/install protobuf from source (#16906)
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
2023-07-31 10:51:48 -07:00
Changming Sun
9dcbcf1d2f
Delete unused files (#16887)
### Description
These yaml files and docker files are not used by any pipeline. If I
were wrong, feel free to submit a PR to get the wrongly deleted file
back from git history (git keeps everything forever).
2023-07-27 16:46:09 -07:00
Wang, Mengni
fe463d4957
Support SmoothQuant for ORT static quantization (#16288)
### Description

Support SmoothQuant for ORT static quantization via intel neural
compressor

> Note:
Please use neural-compressor==2.2 to try SmoothQuant function.

### Motivation and Context
For large language models (LLMs) with gigantic parameters, the
systematic outliers make quantification of activations difficult. As a
training free post-training quantization (PTQ) solution, SmoothQuant
offline migrates this difficulty from activations to weights with a
mathematically equivalent transformation. Integrating SmoothQuant into
ORT quantization can benefit the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
2023-07-26 18:56:45 -07:00
PeixuanZuo
8ede2f139e
[ROCm] Optimize ROCm CI pipeline 2 (#16691)
- Set `KERNEL_EXPLORER_TEST_USE_CUPY=1` to replace numpy with cupy on
kernel explorer test.

KERNEL_EXPLORER_TEST_USE_CUPY=0 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/91724b78-0b4e-4cbd-ad88-83cad9976472)

KERNEL_EXPLORER_TEST_USE_CUPY=1 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/58239911-667c-4d5f-bb78-deca60d0266f)


- Use `Bash@3`.
- Update shell script.
2023-07-24 13:57:48 +08:00
saurabh
24566058b3
ovep dockerfile and wheel docs changes (#16482)
### Description
This PR is includes changes in the documentation of _readmeOV.rst_ file
and also the changes in the dockerfile which enables to build ORT with
latest OpenVINO 2023.0.0



### Motivation and Context
Modified the dockerfile to incorporate the latest version of OpenVINO
(2023.0.0) for building Onnxruntime.
The changes in the PR aim to improve the overall user experience by
providing accurate and up-to-date documentation while leveraging latest
OpenVINO 2023.0.0
2023-07-19 09:01:09 -07:00
Edward Chen
df8843c4a7
Upgrade old Python version in packaging pipeline (#16667)
- Upgrade from Python 3.6 to 3.8 in packaging pipeline.
- Raise build.py minimum required Python version.
2023-07-17 08:24:47 -07:00
PeixuanZuo
596dbe277e
[ROCm] add upgrade to fix security issue (#16668) 2023-07-12 17:57:18 +08:00
PeixuanZuo
e2526714e2
[ROCm] Move MIGraphX build step on CPU only machine (#16582)
- Move MIGraphX build step on CPU only machine
- Use ccache on build step
- Not pass host uid into docker build process.
2023-07-05 13:55:28 +08:00
Wei-Sheng Chin
a0a5f57581
[DORT] Use new FX-to-ONNX exporter (#16450)
The ONNX exporter in DORT have been moved to PyTorch as a formal
feature. We therefore switch to consume the exporter from PyTorch
instead of maintaining two duplicates.
2023-07-04 13:13:04 -07:00
pengwa
ac100ebb64
Fix orttraining-ortmodule-distributed CI (#16569)
### Fix orttraining-ortmodule-distributed CI

https://pypi.org/project/pydantic/#history released version 2.0 1st
July, Deepspeed has known issue on newer version of it
(https://github.com/microsoft/DeepSpeed/issues/3280). So fix this by add
similar check as DS did in
https://github.com/microsoft/DeepSpeed/pull/3290
2023-07-03 13:18:59 +08:00
Yifan Li
e2c214d81f
[TensorRT EP] TRT 8.6 minor version update (#16475)
### Description
* Minor version update: TRT 8.6.0.12->8.6.1.6
  * CI pipeline ymls/dockerfiles are updated
* cgmanifest.json/deps.txt/download-deps.yml are updated; Win trt
binaries uploaded to [win img
307029](https://aiinfra.visualstudio.com/AI%20Infra%20Management/_build/results?buildId=307029&view=results)
* Re-enable unit tests which were failed in 8.6.0 and re-gained support
in 8.6.1
2023-06-26 10:44:27 -07:00
yf711
0ad0d6ebbf
Unblock Linux MultiGPU TensorRT CI (#16446)
### Description
Revert docker base image to
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e



### Motivation and Context
The default img env of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 has
minor upgrade, which make Linux MultiGPU TensorRT CI (NV12 instance with
Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider
tests (these three tests have higher error which are above the tolerance)

That minor upgrade includes cudnn 8.7.0->8.9.0, which might be a factor
that make maxwell GPU generator higher error. CIs with T4 GPU are not
affected.
2023-06-21 17:15:39 -07:00
PeixuanZuo
a95f8ae53c
[ROCm] Update ROCm/MIGraphX CI pipeline (#16215)
MIGraphX CI

- Change docker container user name to `onnxruntimedev`

ROCm CI

- Build docker image every job instead of using prebuild image.
- Every job create a container with only one GPU with command `docker
run -it --device=/dev/kfd --device=/dev/dri/renderDxxx`
- Remove tests that are unstable or use outdated interfaces.
- Enable training ortmodule test.
2023-06-05 10:28:10 +08:00
Changming Sun
6b5b79872b
Avoid taking dependency on dl.fedoraproject.org (#16202)
### Description
1. Avoid taking dependency on dl.fedoraproject.org
The website is not very stable. Our build pipelines often fail to fetch
packages from there.

2. Update manylinux to the latest version
2023-06-02 07:41:46 -07:00
Baiju Meswani
7edc4b105d
Copy missing training header files to the package archive (#16119) 2023-05-30 16:45:40 -07:00
yf711
84f1af7ff5
ort build flag fix (#16072)
### Description
* Sync and clean build flag `--use_tensorrt_builtin_parser` from
existing CI config as this becomes default flag
* cuda version update
2023-05-24 12:32:10 -07:00
RandySheriffH
d35361bf9d
Fix python pipeline for AzureEP without using root (#16023)
Fix python pipeline for AzureEP without using root, this is for 1.15.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-22 16:38:47 -07:00
PeixuanZuo
d78bbf5ef2
[ROCm] remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline (#16004)
remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline.
2023-05-19 10:29:01 +08:00
yf711
825d691617
Unify cuda & trt version on few CIs (#15943)
### Description
The cuda & trt version of some CIs didn't sync with the majority. 
Unifying cuda version as 11.8 and trt version as 8.6 on these CIs
2023-05-15 09:54:30 -07:00
PeixuanZuo
af6cb2af87
[ROCm] update ROCm/MIGraphX CI to ROCm5.5 (#15905)
update ROCm/MIGraphX CI to ROC5.5.

TODO:
two PR to fix failure on
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py
-
test_gradient_correctness_minmax/test_gradient_correctness_argmax_unfold/test_gradient_correctness_argmax_diagonal
(https://github.com/microsoft/onnxruntime/pull/15903)
- test_ortmodule_attribute_name_collision_warning
(https://github.com/microsoft/onnxruntime/pull/15884)
2023-05-15 10:28:15 +08:00
Changming Sun
7c58d013aa
Remove Ubuntu 18.04 usages (#15781)
### Description
Remove Ubuntu 18.04 usages because it will be EOL this month.

### Motivation and Context
2023-05-11 11:44:00 -07:00
liqun Fu
ac9ae9f7c5
update onnx release 1.14 for docker files (#15680)
### Description
this is for ort 1.15 release to work with onnx 1.14
It shall be merged after onnx 1.14 release and before ort 1.15 release.


### Motivation and Context

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-05-10 13:15:56 -07:00
PeixuanZuo
41457885e0
[ROCm] add rocm5.5 to python package pipeline (#15820)
add rocm5.5 to python packaging pipeline.

https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=306082&view=results

TODO:
Remove version 5.2.3, 5.3.2 and 5.4 in the next PR.
2023-05-06 10:21:15 +08:00
Changming Sun
5b826b1bc3
Update cmake version in Linux build (#15707)
### Description
All our Windows build pipelines already uses cmake 3.26 except one
pipeline: QNN ARM64.
This PR does the same for Linux build pipelines.

### Motivation and Context
This change is related to #15704 .
2023-04-27 20:02:33 -07:00
yf711
2e1f92a986
Fix EP Perf pipeline (#15507)
### Description
* Update TensorRT 8.6 lib dependencies in dockerfile of TRT EP Perf
pipeline
* Avoid using `--allow_running_as_root` and build ORT with non-root user


### Motivation and Context
To fix the build issue on EP perf pipeline

Fixed
[AB#14615]
2023-04-27 10:09:14 -07:00
yf711
28985c47b7
[TensorRT EP] Unleash opset16-17 onnx model tests (#15657)
### Description
In 2021 we restricted onnx node test CI execution in range of opset
14-15 for ORT-TRT, which was the latest opset that TRT EP could support

Update this range to opset 14-17 to improve the ORT-TRT unit test
coverage, as [Nvidia announced that TRT 8.6 supported
opset17](https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md)
2023-04-26 11:44:19 -07:00
yf711
d701dcd027
Fix Linux MultiGPU TensorRT CI (#15697)
### Description
* Reverting default TensorRT version to 8.5 as temporary fix
  
* Apart from that, this PR temporarily leaves this CI as a place to
validate user behavior that uses TRT 8.5 with latest ORT

### Context
* This CI pool equips 2xTesla M60 GPUs, which are no longer supported by
TensorRT 8.6.
* Currently, other CIs are using single-T4 VM but there's no VM with
2xT4 or other suitable dualGPU in the range.
* Once we decide which VM instance for this CI to migrate to, TRT8.6 can
be enabled on this CI

* According to
[Nvidia](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html):
* TensorRT 8.5.3 was the last release supporting NVIDIA Kepler (SM 3.x)
and NVIDIA Maxwell (SM 5.x) devices. *These devices are no longer
supported in TensorRT 8.6*. NVIDIA Pascal (SM 6.x) devices are
deprecated in TensorRT 8.6.
2023-04-26 10:01:33 -07:00
Xavier Dupré
699c9a520b
Fix TVM pipelines (#15653)
### Description
Fix TVM pipelines by adding missing dependancy of TVM (attrs).
2023-04-26 09:55:05 +02:00
Baiju Meswani
11b0a18de6
Add support for cuda 11.8 and python 3.11 for training (#15548) 2023-04-20 12:56:45 -07:00
Yi Zhang
573e4cf95f
[Fix] Python Packaging Pipeline exception. (#15568)
### Description
supplement of #15299

### Motivation and Context
It broke Python Packaging Pipeline since April 12.
2023-04-19 21:57:14 +08:00
Yi Zhang
4e1f75810c
Add compilation cache in 2 Linux CPU pipelines and refactor the Linux build step with cache (#15484)
### Description
1. Add compilation cache in Linux CPU ARM and Linux Minimal Build.
2. Integrate 4 Linux CPU build step with cache into one.
3. install ccache from source code in Linux ARM64 image.

### Motivation and Context
1. Enable more build steps with compilation cache.
2. Make it easier to add cache.

It could save 40 more minutes of compilation time in Linux ARM64.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=959619&view=logs&j=1e0830bb-fd74-5d0a-5029-1c63b4266d7b&t=75260ed7-7566-5947-2095-566660191920
2023-04-14 23:56:59 +08:00
yf711
8cd5f3ad9c
[TensorRT EP] support TensorRT 8.6-EA (#15299)
### Description

<!-- Describe your changes. -->

* Integrate TRT 8.6EA on relevant Linux/Windows/pkg pipelines
  * Update onnx-tensorrt to 8.6
  * Add new dockerfiles for TRT 8.6 and clean old ones
* Update
[CGManifest](https://github.com/microsoft/onnxruntime/tree/main/cgmanifests)
files and ort build deps version
  * yml/script update
* Enable built-in TRT parser option on TRT related pipelines by default
* Exclude test TopKOperator.Top3ExplicitAxisInfinity out of TRT EP tests
(8.6-EA has issue with topk operator)
2023-04-12 11:34:59 -07:00
Jian Chen
af28754e6f
Update python package pipeline to support 3.11 (#15311)
### Description
Update python package pipeline to support 3.11

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-04-04 10:55:32 -07:00
PeixuanZuo
d80859f63d
[ROCm] fix python packaging pipeline and add python10 (#15282)
rocm python packaging pipeline failed because manylinux version and
manylinux.patch update.
1. fix duplicate `epel-release` installation issue, ROCm pipeline
install it at the begin of the dockerfile to install rocm libs. remove
duplicate installation on install-runtime-packages.sh.
```
/var/tmp/yum-root-sMRl36/epel-release-latest-7.noarch.rpm: does not update installed package.
Error: Nothing to do
```
2. add python10 to fix error below.
```
+ /opt/python/cp310-cp310/bin/python -m venv /opt/_internal/tools
build_scripts/finalize.sh: line 40: /opt/python/cp310-cp310/bin/python: No such file or directory
```
3. add python10 to rocm pipeline.

pipeline link:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=294776&view=results
2023-03-31 10:25:21 +08:00
Changming Sun
15f7dca9fb
Update protobuf to 3.21.x (#15245)
### Description

Fixed
[AB#10092](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/10092),
[AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753),
[AB#11759](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11759)

### Motivation and Context
The one we use has a security issue in Java, though we don't use that
version's protobuf java package.
2023-03-29 14:08:18 -07:00
Jian Chen
85948d6bc6
Cjian/windows update python3.11 (#15243)
### Description
windows update python3.11



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>
2023-03-28 22:15:47 -07:00
PeixuanZuo
62b2947ac1
[ROCm] remove python3.7 from python packaging pipeline (#15230)
remove python3.7 from python packaging pipeline.

https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=289720&view=results
2023-03-28 10:37:04 +08:00
Jian Chen
792d411135
Update python 3.11 and remove 3.7 for Linux (#15214)
### Description
Update python 3.11 and remove 3.7



### Motivation and Context
Update python 3.11 and remove 3.7

---------

Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>
2023-03-27 14:46:30 -07:00
Changming Sun
ffcfb1ec98
Remove protobuf submodule (#15190)
### Description
Remove protobuf submodule as a follow-up of #13523

"Android CI Pipeline" and "Zip-Nuget-Java-Nodejs Packaging Pipeline"
need to be tested.


### Motivation and Context
It is related to
[AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753)

Fixed
[AB#14027](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14027)
2023-03-27 10:35:49 -07:00
Justin Chu
d834ec895a
Adopt linrtunner as the linting tool - take 2 (#15085)
### Description

`lintrunner` is a linter runner successfully used by pytorch, onnx and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and MacOs.
The checks are enforced by the `Python format` workflow.

This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors
in Python code. `lintrunner` now runs all required python lints
including `ruff`(replacing `flake8`), `black` and `isort`. Future lints
like `clang-format` can be added.

Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.

Lints that are more complicated to fix are applied `# noqa` for now and
should be fixed in follow up PRs.

### Notable changes

1. This PR **removed some suboptimal patterns**:

	- `not xxx in` -> `xxx not in` membership checks
	- bare excepts (`except:` -> `except Exception`)
	- unused imports
	
	The follow up PR will remove:
	
	- `import *`
	- mutable values as default in function definitions (`def func(a=[])`)
	- more unused imports
	- unused local variables

2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than
flake8 and is more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.

3. Removed the legacy flake8 ci flow and updated docs.

4. The added workflow supports SARIF code scanning reports on github,
example snapshot:
	

![image](https://user-images.githubusercontent.com/11205048/212598953-d60ce8a9-f242-4fa8-8674-8696b704604a.png)

5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Unified linting experience in CI and local.

Replacing https://github.com/microsoft/onnxruntime/pull/14306

---------

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-03-24 15:29:03 -07:00
Justin Chu
896ab94780
Remove root in run_python_dockerbuild.sh (#15169)
Running docker in root causes the pipeline to be stateful and
subsequently fail
2023-03-23 06:32:36 -07:00
Edward Chen
bd142bfb04
Gradle clean up (#14973)
- Use java/gradlew directly in .github/workflows/publish-java-apidocs.yml.
- Remove use of deleted step from tools/ci_build/github/azure-pipelines/android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml.
- Remove Gradle installations and PATH updates from Dockerfiles and scripts. Now Gradle wrapper is used so a system Gradle installation is not needed.
2023-03-10 10:50:32 -08:00
zhijiang
80e25ad6ac
fix cg issue (#14372)
### Description
tensorboard depends on rsa>=3.1.4, while rsa 4.5 has vuln issue, so pin
it to higher version as suggested

Fixed
[AB#7352](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/7352)



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-03-09 15:28:11 +08:00
Changming Sun
c1155b70c5
Remove 37 and 50 from CUDA compute archs (#14874)
### Description
To reduce CUDA package's size a little bit. 37 is for Tesla K80. Azure's
NC-series uses it, but in most cases CUDA can dynamic generate device
code .
2023-03-03 12:24:21 -08:00
Changming Sun
f3b6664384
Remove Python 3.7 from the python packaging pipeline (#14887)
### Description
1. Remove Python 3.7 from the python packaging pipeline. It is planned
for the next release and approved by the PMs. Also we will add 3.11, but
it will be addressed in another PR.
2. Stop generating python packages based on Ubuntu 18.04 which will
reach EOL next month. We will either replace them with Ubuntu 20.04 or a
CentOS 8 variant.
2023-03-02 19:44:49 -08:00
Chun-Wei Chen
70a31e047a
Consume ONNX 1.13.1 in ONNX Runtime (#14812)
### Description
<!-- Describe your changes. -->
Consume ONNX 1.13.1 in ONNX Runtime. (ONNX 1.13.0 to ONNX 1.13.1)


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
ONNX 1.13.1 patch was just released yesterday. This PR is making ORT's
ONNX submodule consistent with the latest released ONNX. Not sure
whether this PR is really needed, but let me make it ready. Previous PR
for testing ONNX 1.13.1rc2 :
https://github.com/microsoft/onnxruntime/pull/14634.

Fixed
[AB#13174](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13174)
.
2023-03-02 14:57:35 -08:00
Scott McKay
b7fde84341
Changes to support standalone custom ops in a minimal build. (#14497)
### Description
<!-- Describe your changes. -->
Changes to support standalone custom ops in a minimal build. Also
incorporates changes from #14492 (needed to test builds prior to that
being checked in).

We first need to save the schema info from the operators used by the
standalone op invoker in the ORT format model. Add mechanism for that.

Merge the kernel lookup logic so the same is used in full and minimal
build. NOTE: the version matching is now consistent with all other
kernel lookups, and the call to CreateOp MUST use the exact version for
the operator. Previously matching wasn't as strict, but this can lead to
the incorrect kernel being chosen.

Add tests.

NOTE: There is currently no way to detect the ops/types/opsets used
inside these custom ops as they don't exist until we create kernels,
which is after model loading completes (which is the point the ORT
format model is saved). Due to that they have to be manually added to
the configuration used to do the reduced ops build. That shouldn't be
too hard for the custom op author to add given the custom op
implementation is specifying the op, opset and type constraints (i.e.
they have the info and it's just a case of capturing/formatting it
correctly).


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable usage of the standalone op invoker by custom ops in a minimal
build.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-01 11:22:54 +10:00
Baiju Meswani
68a402e739
Add support for python 3.10 for onnxruntime-training cuda and cpu (#14100) 2023-02-02 11:32:41 -08:00
Baiju Meswani
7954976e0a
Fix python packaging pipeline (#14533)
fix onnx and protobuf inconsistencies in python packaging pipeline.
2023-02-02 13:11:18 +08:00
Yi Zhang
80f807c03d
upgrade protobuf to 3.20.2 and onnx to 1.13 (#14279)
### Description
upgrade protobuf to 3.20.2, same as onnx 1.13.0

### Motivation and Context
Per component governance requirement and Fixes #14060

unused-parameter error occurs in 2 conditions.
1. compile protolbuf

`onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66:
error: unused parameter ‘prototype’ [-Werror=unused-parameter]`
2. include onnx_pb.h
```
2023-01-28T10:20:15.0410853Z FAILED: CMakeFiles/onnxruntime_pybind11_state.dir/onnxruntime_src/onnxruntime/python/onnxruntime_pybind_iobinding.cc.o 
......
2023-01-28T10:20:15.0466024Z                  from /build/Debug/_deps/onnx-src/onnx/onnx_pb.h:51,
2023-01-28T10:20:15.0466958Z                  from /onnxruntime_src/include/onnxruntime/core/framework/to_tensor_proto_element_type.h:10,
....
2023-01-28T10:20:15.0609678Z /build/Debug/_deps/onnx-build/onnx/onnx-operators-ml.pb.h:1178:25:   required from here
2023-01-28T10:20:15.0610895Z /onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66: error: unused parameter ‘prototype’ [-Werror=unused-parameter]
2023-01-28T10:20:15.0611707Z cc1plus: all warnings being treated as errors

```

https://dev.azure.com/onnxruntime/2a773b67-e88b-4c7f-9fc0-87d31fea8ef2/_apis/build/builds/874605/logs/22
2023-01-31 12:55:09 -08:00
Yi Zhang
6d60dc24fe
install shared deps script (#14234)
### Description
Add a new install_shared_deps.sh

### Motivation and Context
Azcopy, Ninja, Node.js and CCache are all needed, but they are copied
everywhere.
2023-01-16 18:27:29 +08:00
sfatimar
7654cd50e8
Openvino ep 2022.3 v4.3 (#14210)
### Description
Changes to incorporate OpenVINO EP 2022.3


### Motivation and Context
This change is required to incorportate OpenVINO EP 2022.3
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: flexci <mohsinmx>
2023-01-11 16:31:26 -08:00
RandySheriffH
83ad562826
Rename CloudEP to AzureEP (#14175)
Rename CloudEP to AzureEP.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-11 12:25:04 -08:00
PeixuanZuo
33367fa2dc
[MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0 (#14184)
### Description
Update the MIGraphX version used in ORT to rocm-5.4.0

### Motivation and Context
The previous branch migraphx_for_ort has stopped updating, it is too far
away from the MIgraphX latest release branch. More discussion here:
https://github.com/microsoft/onnxruntime/issues/14126#issuecomment-1373201049

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2023-01-10 13:40:25 +08:00
Baiju Meswani
c6ff5bac9d
Update torch in eager mode CI pipeline (#14094) 2023-01-06 11:46:44 -08:00
Ashwini Khade
e5e3570ac5
fix cg issue (#14112)
### Description
Update torch version to 1.13.1 to fix CG issue:
https://dev.azure.com/aiinfra/ONNX%20Runtime/_workitems/edit/10666/
2023-01-04 09:07:13 -08:00
Baiju Meswani
0ff61f7b97
Update torch to 1.13.1 in CI and packaging pipelines for ort training (#14055) 2023-01-03 20:03:33 -08:00
RandySheriffH
587e891cae
CloudEP (#13855)
Implement CloudEP for hybrid inferencing.
The PR introduces zero new API, customers could configure session and
run options to do inferencing with Azure [triton
endpoint.](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint)
Sample configuration in python be like:

```
sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton');
sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com');
sess_opt.add_session_config_entry('cloud.model_name', 'detection2');
sess_opt.add_session_config_entry('cloud.model_version', '7'); // optional, default 1
sess_opt.add_session_config_entry('cloud.verbose', '1'); // optional, default '0', meaning no verbose
...
run_opt.add_run_config_entry('use_cloud', '1') # 0 for local inferencing, 1 for cloud endpoint.
run_opt.add_run_config_entry('cloud.auth_key', '...')
...
sess.run(None, {'input':input_}, run_opt)
```

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-03 10:03:15 -08:00
Baiju Meswani
b85878953f
Fix nightly ort training ci pipeline (#14007) 2022-12-30 12:28:57 -08:00
PeixuanZuo
ab2dd8dfaf
[ROCm] Update ROCm and MigraphX CI to ROCm5.4 (#14011)
Update ROCm and MigraphX CI to ROCm5.4
Run ortmodule_test with ROCm5.4 and all
passed(https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=824742&view=logs&j=8292f886-7946-5da9-7977-04484c342eda&t=5de68eaa-cbdc-5be5-13d0-bb946f4ddb2d).

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-12-22 10:01:05 +08:00