Commit graph

701 commits

Author SHA1 Message Date
Changming Sun
4af593a722
Add python 3.13 support (#22380)
1. Add python 3.13 to our python packaging pipelines
2. Because numpy 2.0.0 doesn't support free-threaded Python, this PR also
upgrades numpy to the latest version
3. Delete some unused files.
2024-10-14 18:07:54 -07:00
Changming Sun
9ee963110e
Update manylinux version (#22355)
### Description
Update the commit from 59600894a2c1c18290944b83e989bfe618975230 to
1887322ed36d522409a6b805d4e7942cf76a8e40


### Motivation and Context
The new one has python 3.13.

AB#50959
2024-10-08 23:11:11 -07:00
Changming Sun
d98340968e
Stop publishing python 3.8/3.9 packages (#22343)
### Description
1. Stop publishing python 3.8/3.9 packages, to align with numpy. 
2. Add a trigger for CUDA12's python test pipeline.
2024-10-08 09:50:05 -07:00
jingyanwangms
d0b0ecfdb9
[Running CI] Update TensorRT to 10.4 (#22049)
### Description
TensorRT 10.4 is GA now, update to 10.4



### Motivation and Context
2024-09-26 11:10:52 -07:00
George Wu
944d87381d
[QNN EP] set up py packaging pipeline for Linux x64 (#22132)
set up a pipeline to produce nightly Linux x64 whls for onnxruntime-qnn
this can be used for offline context binary generation.
2024-09-18 23:24:32 -07:00
mindest
30f07758a2
Add packaging version constraint. (#21814)
### Description
Newer `setuptools` requires newer version of `packaging`, due to
function update.

### Motivation and Context
Fixes #21792
2024-09-04 16:57:04 -07:00
Scott McKay
44fc7b443c
Update C# test projects (#21631)
### Description
Update various test projects to .net8 from EOL frameworks.
Replace the Xamarin based Android and iOS test projects with a MAUI
based project that uses .net 8.
Add new CoreML flags to C# bindings

### Motivation and Context
Remove usage of EOL frameworks.
2024-09-05 08:21:23 +10:00
sfatimar
8dba8e3e24
Memory Optimization for Compilation in OVEP (#21872)
Call the split Read Model + Compile Model APIs in lieu of the unified Compile Model call
in the export compile flow to reduce memory use. Free the model
proto and serialized string, and read the OV IR model later, to free up memory
for the rest of the pipeline.
Optimization during the EpCtxt flow.
All Graph-related operations require every Node Attribute to be
set when dealing with model instances internally. In the
existing implementation, these attributes are copied when a
Graph is constructed dynamically at runtime.
This change uses these attributes in place, without creating a copy, to
avoid memory allocation and copying when calling these Graph-related
functions.
Bug fixes related to the OpenVINO version and the EpCtxt
file path.
Move the compiler version to C++20 to benefit from r-value memory
optimizations.

### Motivation and Context
This change is required because memory consumption during the compilation
flow is too high.

---------

Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: ankitm3k <ankit.maheshkar@intel.com>
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
2024-09-03 13:52:31 -07:00
mindest
bfa4da4f65
Add Linux ROCm CI Pipeline (#21798)
### Description

* Add new ROCm CI pipeline (`Linux ROCm CI Pipeline`) focusing on
inference.
* Resolve test errors; disable flaky tests.

based on test PR #21614.
2024-08-30 14:50:32 +08:00
dependabot[bot]
4ac1558498
Bump torch from 1.13.1+cpu to 2.2.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/torch_eager_cpu (#21919)
Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1+cpu to
2.2.0.
2024-08-29 21:57:24 -07:00
jingyanwangms
c018ba43ef
[Running CI] [TensorRT EP] support TensorRT 10.3-GA (#21742)
### Description
- TensorRT 10.2.0.19 -> 10.3.0.26

### Motivation and Context
2024-08-18 13:26:41 -07:00
Prathik Rao
e32e3575d8
pin pytorch lightning version for training CI (#21731)
### Description

Pins pytorch-lightning package to version 2.3.3 since version >=2.4.0
requires torch > 2.1.0 which is not compatible with cu118.


### Motivation and Context

ORT 1.19 Release Preparation
2024-08-13 20:04:56 -07:00
Yi Zhang
0d1da41ca8
Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612)
### Description
Improve the docker commands so that docker image layer caching works.
This makes docker builds faster and more stable.
So far, the A100 pool's system disk is too small to use the docker cache.
We no longer use the pipeline cache for docker images, and some legacy
code is removed.

### Motivation and Context
There is often an exception like
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
This happens because the ONNX Runtime pipelines have been sending too many requests to
download Node.js during docker builds,
which is now the major cause of pipeline failures.

In fact, docker image layer caching never worked.
We can always see that the scripts are still running:
```
#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
#9 0.236 +++ tr -dc 0-9.
#9 0.236 +++ cut -d . -f1
#9 0.238 ++ os_major_version=8
....
#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
#9 60.59 + return 0
...
```

This PR improves the docker command to make image layer caching
work.
Thus, CI won't send so many redundant requests to download Node.js.
```
#9 [2/5] ADD scripts /tmp/scripts
#9 CACHED

#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#10 CACHED

#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
#11 CACHED

#12 [5/5] WORKDIR /home/onnxruntimedev
#12 CACHED
```
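
As a rough illustration of why steps show up as `CACHED` only when nothing upstream has changed, here is a toy model of layer caching. It is a simplified sketch and not Docker's actual implementation: the key format, the `layer_keys` and `cache_hits` helpers, and the digest strings are all hypothetical.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per build step; each key depends on its parent key."""
    keys = []
    parent = ""
    for instruction, inputs_digest in steps:
        # A step's key covers the parent layer, the instruction text, and a
        # digest of any files the instruction copies in (empty for RUN etc.).
        raw = f"{parent}|{instruction}|{inputs_digest}"
        parent = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        keys.append(parent)
    return keys


def cache_hits(previous_steps, current_steps):
    """For each current step, report whether an identical layer was built before."""
    known = set(layer_keys(previous_steps))
    return [key in known for key in layer_keys(current_steps)]


# Hypothetical build steps modeled on the log above.
first_build = [
    ("ADD scripts /tmp/scripts", "digest-of-scripts-v1"),
    ("RUN /tmp/scripts/install_deps.sh", ""),
    ("WORKDIR /home/onnxruntimedev", ""),
]

# Rebuilding with identical inputs reuses every layer.
unchanged = cache_hits(first_build, first_build)

# Changing the copied scripts invalidates that layer and every layer after it.
edited_scripts = [("ADD scripts /tmp/scripts", "digest-of-scripts-v2")] + first_build[1:]
after_edit = cache_hits(first_build, edited_scripts)

print(unchanged, after_edit)
```

In this model a cache miss on one step propagates to all later steps, which is why an expensive step such as the Node.js download is only skipped when the earlier layers are reused.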

### Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-08-06 21:37:09 +08:00
Edward Chen
a5ce65d87a
Clean up some mobile package related files and their usages. (#21606)
The mobile packages have been removed.
2024-08-05 16:38:20 -07:00
Yifan Li
ebcb7075eb
Set CUDA12 as default in GPU packages (#21438)
### Description
* Swap cuda version 11.8/12.2 in GPU CIs
* Set CUDA12 as default version in yamls of publishing nuget/python/java
GPU packages
* Suppress warnings as errors of flash_api.cc during ort win-build
2024-07-25 10:17:16 -07:00
Changming Sun
b04adcc381
Update copy_strip_binary.sh: use "make install" instead (#21464)
### Description
Before this change, copy_strip_binary.sh manually copied each file from
ONNX Runtime's build folder to an artifact folder. That is hard to get right when
dealing with symbolic links for shared libraries.
This PR changes the packaging pipelines to run "make install" first,
before packaging shared libs.


### Motivation and Context

Recently, because of feature request #21281, we changed
libonnxruntime.so's SONAME. Now every package that contains this shared
library must also contain libonnxruntime.so.1. Therefore we need to
change the packaging scripts to include this file. Instead of manually
constructing the symlink layout, using `make install` is much easier and
will make things more consistent because it is a standard way of making
packages.
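
To make the symlink layout concrete, here is a minimal sketch of what a packaging script would have to reproduce by hand. It assumes the conventional Linux layout of an unversioned name pointing at the SONAME, which in turn points at a fully versioned file. The `create_soname_layout` helper and the `1.0.0` version string are hypothetical, and the real file names produced by `make install` may differ.

```python
import os
import tempfile


def create_soname_layout(lib_dir, base="libonnxruntime.so", soname_suffix="1",
                         full_version="1.0.0"):
    """Create base -> base.<soname_suffix> -> base.<full_version> in lib_dir."""
    real_name = f"{base}.{full_version}"
    soname = f"{base}.{soname_suffix}"
    # Empty placeholder standing in for the real shared library.
    with open(os.path.join(lib_dir, real_name), "wb"):
        pass
    # Relative links, so the layout still resolves after the directory is moved.
    os.symlink(real_name, os.path.join(lib_dir, soname))
    os.symlink(soname, os.path.join(lib_dir, base))
    return sorted(os.listdir(lib_dir))


with tempfile.TemporaryDirectory() as tmp:
    names = create_soname_layout(tmp)
    resolved = os.path.basename(
        os.path.realpath(os.path.join(tmp, "libonnxruntime.so"))
    )

print(names, resolved)
```

A script that copies files one by one has to preserve both links and their targets; missing any one of the three entries breaks either linking or loading.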

**Breaking change:**
After this change, our **inference** tarballs that are published to our
GitHub release pages will not contain ORT **training** headers.
2024-07-24 10:02:00 -07:00
Changming Sun
f70215d4e6
Update C++ dependencies (#21410)
1. Update google benchmark from 1.8.3 to 1.8.5
2. Update google test from commit in main branch to tag 1.15.0 
3. Update pybind11 from 2.12.0 to 2.13.1
4. Update pytorch cpuinfo to include the support for Arm Neoverse V2,
Cortex X4, A720 and A520.
5. Update re2 from 2024-05-01 to 2024-07-02
6. Update cmake to 3.30.1
7. Update Linux docker images
8. Fix a warning in test/perftest/ort_test_session.cc:826:37: error:
implicit conversion loses integer precision: 'streamoff' (aka 'long
long') to 'const std::streamsize' (aka 'const long')
[-Werror,-Wshorten-64-to-32]
2024-07-23 10:00:36 -07:00
Yifan Li
bb76ead96c
[TensorRT EP] support TensorRT 10.2-GA (#21395)
### Description
* promote trt version to 10.2.0.19
* EP_Perf CI: clean config of legacy TRT<8.6, promote test env to
trt10.2-cu118/cu125
* skip two tests as Float8/BF16 are supported by TRT>10.0 but TRT CIs
are not hardware-compatible on these:
 ```
 1: [  FAILED  ] 2 tests, listed below:
 1: [  FAILED  ] IsInfTest.test_isinf_bfloat16
 1: [  FAILED  ] IsInfTest.test_Float8E4M3FN
 ```

### Motivation and Context
2024-07-18 12:11:52 -07:00
Changming Sun
fe6ef404b5
Enable LTO for Android build (#21243)
### Description
Enable LTO for Android build, which can reduce binary size by 6%.
2024-07-10 18:44:17 -07:00
Jian Chen
d1c19e79ea
Update OpenVino CI Ubuntu to 22.04 (#21127)
### Description
[Update OpenVino CI Ubuntu to
22.04](312fab5b3f)



### Motivation and Context
Ubuntu 22.04 is needed for linux C++20
2024-07-09 09:56:44 -07:00
Yi Zhang
587e92c279
Add FP32 and INT4 test in Llama2 (#21187)
### Description



### Motivation and Context
2024-06-28 06:18:26 +08:00
Changming Sun
d1ab94c2b0
Add compatibility for NumPy 2.0 (#21085)
### Description

As suggested by SciPy's doc, we will
`Build against NumPy 2.0.0, then it will work for all NumPy versions
with the same major version number (NumPy does maintain backwards ABI
compatibility), and as far back as NumPy 1.19 series at the time of
writing`

I think it works because in
[numpyconfig.h#L64](https://github.com/numpy/numpy/blob/main/numpy/_core/include/numpy/numpyconfig.h#L64)
there is a macro NPY_FEATURE_VERSION. By default it is set to
NPY_1_19_API_VERSION. And the NPY_FEATURE_VERSION macro controls ABI.

This PR only upgrades the build-time dependency; when a user installs
ONNX Runtime, they can still use numpy 1.x.
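
The compatibility window described in the quoted SciPy guidance can be sketched as a simple version check. This is only an illustration of the rule as quoted (same major version as the build, and as far back as 1.19). The `runtime_numpy_is_compatible` helper is hypothetical and is not part of ONNX Runtime or NumPy.

```python
def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.26.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runtime_numpy_is_compatible(runtime_version, oldest_supported=(1, 19),
                                built_against_major=2):
    """Apply the quoted rule: at least 1.19, and no newer major than the build."""
    parsed = parse_major_minor(runtime_version)
    return oldest_supported <= parsed and parsed[0] <= built_against_major


# A binary built against NumPy 2.0.0, checked against several runtime versions.
for candidate in ("1.18.5", "1.19.0", "1.26.4", "2.1.0", "3.0.0"):
    print(candidate, runtime_numpy_is_compatible(candidate))
```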

### Motivation and Context
Recently numpy published a new version, 2.0.0, which is incompatible with the latest ONNX Runtime release.
2024-06-27 13:50:53 -07:00
Jian Chen
05032e5e5f
Updating cudnn from 8 to 9 on existing cuda 12 docker image (#20925)
### Description
Adding support of cudnn 9 


### Motivation and Context
Keep the existing CUDA 12.2 with NVIDIA driver 535
2024-06-11 09:37:16 -07:00
liqun Fu
51bc53580d
Update to onnx 1.16.1 (#20702)
2024-06-04 11:06:28 -07:00
Changming Sun
d13cabf7f9
Upgrade GCC and remove the dependency on GCC8's experimental std::filesystem implementation (#20893)
### Description
This PR upgrades CUDA 11 build pipelines' GCC version from 8 to 11. 

### Motivation and Context

GCC8 has an experimental std::filesystem implementation which is not ABI
compatible with the formal one in later GCC releases. It didn't cause
trouble for us; however, the ONNX community has run into this issue often.
For example, https://github.com/onnx/onnx/issues/6047 . So this PR
increases the minimum supported GCC version from 8 to 9, and removes the
references to GCC's "stdc++fs" library. Please note we compile our code
on RHEL8 and RHEL8's libstdc++ doesn't have the fs library, which means
the binaries in ONNX Runtime's official packages always static link to
the fs library. It is just a matter of which version of the library, an
experimental one or a more mature one. And it is an implementation
detail that is not visible from outside. Anyway, a newer GCC is better.
It will give us the chance to use many C++20 features.

#### Why were we using GCC 8?
It is because all our Linux packages were built on RHEL8 or its
equivalents. The default GCC version in RHEL8 is 8. RHEL also provides
additional GCC versions from RH devtoolset. UBI8 is the abbreviation of
Red Hat Universal Base Image 8, which is the containerized RHEL8. UBI8
is free, which means it doesn't require a subscription (while RHEL does).
The only devtoolset that UBI8 provides is GCC 12, which is too new to
be used with CUDA 11.8. And our CUDA 11.8 build env is a docker
image from Nvidia that is based on UBI8.
#### How the problem is solved
Almalinux is an alternative to RHEL. Almalinux 8 provides GCC 11. And
the CUDA 11.8 docker image from Nvidia is open source, which means we
can rebuild the image based on Almalinux 8 to get GCC 11. I've done
this, but I cannot republish the new image due to various complicated
license restrictions. Therefore I put them at an internal location in
onnxruntimebuildcache.azurecr.io.
2024-06-03 10:14:08 -07:00
Changming Sun
67bc9438d7
Update training packaging pipeline's docker files (#20853)
### Description
Similar to #20786 . The last PR was able to update all pipelines and all
docker files. This is a follow-up to that PR.

### Motivation and Context
1. To extract the common part as a reusable build infra among different
ONNX Runtime projects.
2. Avoid hitting docker hub's limit: 429 Too Many Requests - Server
message: toomanyrequests: You have reached your pull rate limit. You may
increase the limit by authenticating and upgrading:
https://www.docker.com/increase-rate-limit
2024-05-30 23:48:42 -07:00
Changming Sun
65ef270e06
Update Aten pipeline's docker file to use UBI8 (#20856)
### Description
Now it uses CentOS 7 which is EOL. This PR updates it to UBI8.

### Motivation and Context
To deprecate CentOS 7.
2024-05-30 07:38:15 -07:00
Vincent Wang
e77f238dc6
Update Torch Version to Fix ATen CPU Pipeline Failure (#20845)
Update Torch Version to Fix ATen CPU Pipeline Failure.
2024-05-29 16:04:18 +08:00
Changming Sun
439ed92b96
Remove TVM EP's pipeline (#20813)
### Description
Temporarily remove TVM EP's pipeline until someone helps us upgrade TVM
to a newer version which is compatible with the latest ONNX.

### Motivation and Context
The ONNX version that TVM EP uses has a known security vulnerability. We
cannot continue using it in our hosted build environment. This change is temporary.
2024-05-25 20:42:41 -07:00
Changming Sun
535a030b1e
Remove manylinux build scripts from python packaging pipeline (#20786)
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
https://github.com/onnx/onnx/issues/6047 and
https://github.com/microsoft/onnxruntime-genai/issues/257 .

### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
2024-05-24 08:18:22 -07:00
Jian Chen
372974e5d6
Using CPU pool to build Linux GPU C API Package (#20648)
### Description



### Motivation and Context
2024-05-20 15:25:14 -07:00
Changming Sun
08b637350a
Remove an extra space in azure_scale_set_vm_mount_test_data.sh (#20584)
2024-05-08 09:46:50 -07:00
Yifan Li
29417762f7
[TensorRT EP] support TensorRT 10-GA (#20506)
### Description
This branch is based on rel-1.18.0 and supports TensorRT 10-GA.


### Motivation and Context
2024-05-01 11:10:53 -07:00
Yi Zhang
7ebc653f04
Revert "Nuget .NET changes for Mac Catalyst (#19923)" (#20418)
This reverts commit f396748ed6.

### Description



### Motivation and Context
2024-04-23 15:08:12 +08:00
Yi Zhang
197b3f1d90
Enable Whisper Test with OMP_FFMPEG (#20402)
### Description
Install OMP_FFMPEG in the docker image and re-add the Whisper test.
Download OMP_FFMPEG from a restricted-access Azure blob.
2024-04-22 10:55:56 -07:00
Rachel Guo
f396748ed6
Nuget .NET changes for Mac Catalyst (#19923)
### Description

Add Nuget package changes for adding new 'net6.0-maccatalyst' platform.

The output ORT Nuget package was manually tested and verified in a .NET
MAUI app setup.

### Motivation and Context

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2024-04-19 14:20:03 -07:00
liqun Fu
cd7112f800
Integration with ONNX 1.16.0 (#19745)
### Description
update with ONNX 1.16.0 branch according to
https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md

ONNX 1.16.0 release notes:
https://github.com/onnx/onnx/releases/tag/v1.16.0

#### Updated ops for CPU EP:
- DequantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block dequantization support
- QuantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block quantization support
- Cast(21)
  - Missing int4 and uint4 support
- CastLike(21)
  - Missing int4 and uint4 support
- ConstantOfShape(21)
  - Missing int4 and uint4 support
- Identity(21)
  - Missing int4 and uint4 support
- If(21)
  - Missing int4 and uint4 support
- Loop(21)
  - Missing int4 and uint4 support
- Reshape(21)
  - Missing int4 and uint4 support
- Scan(21)
  - Missing int4 and uint4 support
- Shape(21)
  - Missing int4 and uint4 support
- Size(21)
  - Missing int4 and uint4 support
- Flatten(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
    support
- Pad(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
    support
- Squeeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
    support
- Transpose(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
    support
- Unsqueeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4
    support

#### Unimplemented opset 21 features/ops
- int4 and uint4 data type
- QLinearMatMul(21)
- GroupNormalization(21)
- ai.onnx.ml.TreeEnsemble(5)

### Motivation and Context

### Disabled tests
#### ORT Training

orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py
- test_ort_custom_ops: Potential shape inference bug for custom ops

#### Python quantization unit tests
test/onnx/python/quantization (shape inference bug)
- test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16
- test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16
- test_op_gemm.py: test_quantize_qop_gemm_s8s8
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3
- test_op_matmul.py: test_quantize_matmul_u8u8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution
- test_op_relu.py: test_quantize_qop_relu_s8s8

#### ONNX tests
- test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a
maxpool output size bug and added this test. Enable this test when [ORT
PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged.
Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741).
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op
ai.onnx.ml.TreeEnsemble
- test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same
- test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same
- test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same
- test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4
yet
- test_cast_INT4_to_INT8_cpu: same
- test_cast_UINT4_to_FLOAT_cpu: same
- test_cast_UINT4_to_UINT8_cpu: same
- test_cast_INT4_to_FLOAT_cuda
- test_cast_INT4_to_INT8_cuda
- test_cast_UINT4_to_FLOAT_cuda
- test_cast_UINT4_to_UINT8_cuda
- test_constantofshape_float_ones_cuda: ConstantOfShape(21) not
implemented for cuda
- test_constantofshape_int_shape_zero_cuda: same
- test_constantofshape_int_zeros_cuda: same
- test_flatten_axis0_cuda: Flatten(21) not implemented for cuda
- test_flatten_axis1_cuda: same
- test_flatten_axis2_cuda: same
- test_flatten_axis3_cuda: same
- test_flatten_default_axis_cuda: same
- test_flatten_negative_axis1_cuda: same
- test_flatten_negative_axis2_cuda: same
- test_flatten_negative_axis3_cuda: same
- test_flatten_negative_axis4_cuda: same
- test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not
implemented in ORT yet
- test_qlinearmatmul_2D_int8_float32_cpu: same
- test_qlinearmatmul_2D_uint8_float16_cpu: same
- test_qlinearmatmul_2D_uint8_float32_cpu: same
- test_qlinearmatmul_3D_int8_float16_cpu: same
- test_qlinearmatmul_3D_int8_float32_cpu: same
- test_qlinearmatmul_3D_uint8_float16_cpu: same
- test_qlinearmatmul_3D_uint8_float32_cpu: same
- test_qlinearmatmul_2D_int8_float16_cuda: same
- test_qlinearmatmul_2D_int8_float32_cuda: same
- test_qlinearmatmul_2D_uint8_float16_cuda: same
- test_qlinearmatmul_2D_uint8_float32_cuda: same
- test_qlinearmatmul_3D_int8_float16_cuda: same
- test_qlinearmatmul_3D_int8_float32_cuda: same
- test_qlinearmatmul_3D_uint8_float16_cuda: same
- test_qlinearmatmul_3D_uint8_float32_cuda: same
- test_size_cuda: Size(21) not implemented for cuda
- test_size_example_cuda: same
- test_dequantizelinear_blocked: Missing implementation for block
dequant for DequantizeLinear(21)
- test_quantizelinear_blocked_asymmetric: Missing implementation for
block quant for QuantizeLinear(21)
- test_quantizelinear_blocked_symmetric: Missing implementation for
block quant for QuantizeLinear(21)

---------

Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
2024-04-12 09:46:49 -07:00
sfatimar
eab35c20fc
Ort openvino npu 1.17 master (#19966)
### Description
Add NPU to the list of supported devices.
Add changes to support OV 2024.0.
NuGet packages no longer package the OpenVINO DLL.
Bug fixes for the Python API.
Reverted Dockerfiles that are not being maintained.



### Motivation and Context
The NPU device has been introduced by Intel in the latest client systems.
The OpenVINO 2024.0 release is out.

---------

Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Ubuntu <ubuntu@ubuntu-118727.iind.intel.com>
Co-authored-by: hmamidix <hemax.sowjanya.mamidi@intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
2024-03-21 18:44:00 -07:00
Justin Chu
bcf47d3546
Update install_deps_lort.sh to fix onnxscript installation (#19922)
Install onnxscript correctly with `pip install`. Dev dependencies are
not required.

### Motivation and Context

Fix build breaks.
2024-03-14 17:05:50 -07:00
Hariharan Seshadri
ed306b4f97
Fix Android CI pipeline (#19877)
2024-03-13 10:09:43 -07:00
Justin Chu
faea42af95
Bump ruff to 0.3.2 and black to 24 (#19878)
### Motivation and Context

Routine updates
2024-03-13 10:00:32 -07:00
Yi Zhang
d4fa4f0276
Remove FFmpeg to meet compliance (#19859)
2024-03-12 09:06:59 -07:00
Yifan Li
069d2d6f54
[EP Perf] Update EP Perf dockerfiles with cuda12/cudnn9 (#19781)
### Description
* Update name of existing dockerfiles and add support to test latest
TensorRT EA binary located in the image
* Add cuda 12.3/cuDNN 9/TensorRT 8.6 dockerfile
* Add detail to CI prompts and configs

Instruction to test latest TRT via BIN:
1. Select `BIN` in TensorRT Version
2. In Variables, update related tarCudaVersion, **clear**
tarCudnnVersion (not required in latest TRT tar binary) , and path to
binary.
2024-03-08 13:58:22 -08:00
Yi Zhang
9460597b21
Update copying API header files (#19736)
### Description
Make the Linux logic consistent with Windows


### Motivation and Context
onnxruntime_lite_custom_op.h is in the Windows zip package but not in the Linux zip
package

acbfc29f27/tools/ci_build/github/azure-pipelines/templates/c-api-artifacts-package-and-publish-steps-windows.yml (L67)

Co-authored-by: Your Name <your@email.com>
2024-03-02 11:33:47 +08:00
Yi Zhang
3b46ab6439
Re-add testing removed by mistake. (#19647)
2024-02-27 08:46:29 -08:00
Yi Zhang
0fcc6fb760
Add Whisper model in CI (#19604)
### Description
 Add Whisper Conversion and E2E into Big Models pipeline



### Motivation and Context

---------

Co-authored-by: Your Name <your@email.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
2024-02-25 14:04:22 +08:00
Prathik Rao
3b03b2e046
Upgrade default ORTModule opset from 15 to 17 (#19315)
### Description

This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is
the final opset supported by torchscript exporter
(https://github.com/pytorch/pytorch/pull/107829)

### Motivation and Context

Engineering excellence contribution for ORT Training DRI.

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2024-02-14 11:19:33 -08:00
Yifan Li
5c7e6b2e2a
[EP Perf] Add CI option to enable TRT-OSS parser (#19448)
### Description
* Introducing CI option to enable TRT-OSS parser, during ep perf
testing:

![image](https://github.com/microsoft/onnxruntime/assets/109183385/a9ba6393-6b94-4b8f-8ca4-ba7bc7954504)

By default, open-sourced onnx-tensorrt parser listed under
[cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt#L39-L40)
will be used if enabling this option.


### To verify this option and check the difference during ORT image build:
If this option is enabled:
<img width="649" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/109183385/3b778583-451e-4617-ba8c-c064442e60fd">

If this option is not enabled (by default):
<img width="683" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/109183385/cd8383ba-eff4-4536-94ab-a1424bb858ab">

* update default usage of cmake/trt version to the latest

### Motivation and Context
Make it easier to test oss parser and find potential gap between
tensorrt builtin/oss parser.

Schedule runs with oss parser will be set after this PR gets merged
2024-02-12 23:04:08 -08:00
Justin Chu
3d2ddf96e3
Bump ruff linter to 0.2.1 (#19471)
### Motivation and Context

Include new lint rules
2024-02-08 16:08:27 -08:00
Jian Chen
75f06319d6
Change binet to bin (#19424)
### Description
This pull request includes a small change to the
`Dockerfile.manylinux2_28_cuda` file in the
`tools/ci_build/github/linux/docker` directory. The change corrects the
`PREPEND_PATH` argument from `/usr/local/cuda/binet` to
`/usr/local/cuda/bin`, ensuring the correct path to CUDA binaries is
set.
2024-02-07 09:51:02 -08:00
Changming Sun
e91d91ae4f
Fix a build issue: /MP was not enabled correctly (#19190)
### Description

In PR #19073 I misunderstood the value of "--parallel". Instead of
testing whether args.parallel is None, I should test the return
value of the number_of_parallel_jobs function.

If build.py is invoked without --parallel, then args.parallel equals
1, because that is the default value. In that case we should not add "/MP".
However, the current code adds it, because `if args.parallel` is
evaluated as `if 1`, which is True.
If build.py is invoked with --parallel but without an additional number, then
args.parallel equals 0, because the job count is unspecified. In that case we
should add "/MP". However, the current code does not add it, because `if
args.parallel` is evaluated as `if 0`, which is False.
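
The inverted truthiness check can be reproduced with a small argparse sketch. This is a minimal illustration, not the actual build.py code: the parser setup mirrors the behaviour described above (default 1, value 0 when `--parallel` is given without a number), while the `should_add_mp_*` helpers, the `cpu_count` parameter, and the "more than one job" criterion are assumptions made for the example.

```python
import argparse
import os


def make_parser():
    parser = argparse.ArgumentParser()
    # Not passed: 1. Passed without a number: 0. Passed with N: N.
    parser.add_argument("--parallel", nargs="?", const=0, default=1, type=int)
    return parser


def number_of_parallel_jobs(args, cpu_count=None):
    """Interpret 0 as 'use all CPUs'; cpu_count is injectable for testing."""
    if args.parallel == 0:
        return cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return args.parallel


def should_add_mp_buggy(args):
    # Truthiness of the raw value: 1 (no flag) is True, 0 (bare flag) is False.
    return bool(args.parallel)


def should_add_mp_fixed(args, cpu_count=None):
    # Assumed criterion: enable /MP only when more than one job will run.
    return number_of_parallel_jobs(args, cpu_count) > 1


parser = make_parser()
no_flag = parser.parse_args([])
bare_flag = parser.parse_args(["--parallel"])
four_jobs = parser.parse_args(["--parallel", "4"])

print(should_add_mp_buggy(no_flag), should_add_mp_fixed(no_flag))
print(should_add_mp_buggy(bare_flag), should_add_mp_fixed(bare_flag, cpu_count=8))
print(should_add_mp_buggy(four_jobs), should_add_mp_fixed(four_jobs))
```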

This also adds a new build flag: use_binskim_compliant_compile_flags, which is intended to be only used in ONNX Runtime team's build pipelines for compliance reasons. 

### Motivation and Context
2024-01-29 12:45:38 -08:00
Changming Sun
81d363045b
Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117)
### Description
Upgrade Ubuntu machine pool from 20.04 to 22.04
2024-01-16 17:25:18 -08:00
Changming Sun
0e8d4c3d21
Enable Address Sanitizer in CI (#19073)
### Description
1. Add two build jobs for enabling Address Sanitizer in CI. One for
Windows CPU, One for Linux CPU.
2. Set default compiler flags/linker flags in build.py for normal
Windows/Linux/MacOS build. This can help control compiler flags in a
more centralized way.
3. All Windows binaries in our official packages will be built with
"/PROFILE" flag. Symbols of onnxruntime.dll can be found at [Microsoft
public symbol
server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols).

Limitations:
1. On Linux Address Sanitizer ignores RPATH settings in ELF binaries.
Therefore once Address Sanitizer is enabled, before running tests we
need to manually set LD_LIBRARY_PATH properly, otherwise
libonnxruntime.so may not be able to find custom ops and shared EPs.
2. On Linux we also need to set LD_PRELOAD before running some tests (if
the main executable, like python, is not built with Address Sanitizer).
On Windows we do not need to.
3. On Windows before running python tests we should manually copy the
Address Sanitizer DLL to the onnxruntime/capi directory, because python
3.8 and above has enabled "Safe DLL Search Mode" that wouldn't use the
information provided by PATH env.
4. On Linux Address Sanitizer found a lot of memory leaks from our
python binding code. Therefore right now we cannot enable Address
Sanitizer when building ONNX Runtime with python binding.
5. Address Sanitizer itself uses a lot of memory address space and
delays memory deallocations, which can easily cause OOM issues in 32-bit
applications. For this reason we cannot run all the tests in
onnxruntime_test_all in 32-bit mode with Address Sanitizer. However, we
can still run individual tests that way. We just cannot run all of them
in one process.
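
The Linux environment setup described in the limitations above could be sketched roughly as follows. This is only an illustration, not code from the PR: the helper name `asan_test_env` and the example paths are hypothetical, and the location of the Address Sanitizer runtime depends on the compiler installation.

```python
import os


def asan_test_env(lib_dirs, asan_runtime=None, base_env=None):
    """Return a copy of the environment prepared for an ASan test run on Linux.

    lib_dirs: directories holding libonnxruntime.so, custom ops and shared EPs.
    asan_runtime: path to the ASan runtime library (hypothetical; compiler specific).
    """
    env = dict(os.environ if base_env is None else base_env)
    # ASan ignores RPATH, so the library directories are listed explicitly.
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = list(lib_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    # Only needed when the main executable (e.g. python) is not ASan-built.
    if asan_runtime:
        env["LD_PRELOAD"] = asan_runtime
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching a test binary.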

### Motivation and Context
To catch memory issues.
2024-01-12 07:24:40 -08:00
Jian Chen
2eb3db6bf0
Adding python3.12 support to ORT (#18814)
### Description
Adding python3.12 support to ORT



### Motivation and Context
2024-01-11 08:34:28 -08:00
Ashwini Khade
897a4163d7
Update transformer version for training CIs (#19046)
### Description
Updating version to resolve security vulnerability.
2024-01-09 12:00:34 -08:00
PeixuanZuo
efdcefcf8c
[ROCm] fix security warning (#19017)
fix security warning
2024-01-05 10:05:34 -08:00
Changming Sun
e155c66b4a
Change all macOS python packages to use universal2 (#19013)
### Description
Change all macOS python packages to use universal2, to reduce the number
of packages we have.

### Motivation and Context
According to [wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur),
macOS 11 is the first macOS version that supports universal 2. And it is
the min macOS version we support. So we no longer need to maintain
separate binaries for different CPU archs.
2024-01-04 17:44:49 -08:00
PeixuanZuo
7a454acd61
[ROCm] Update CI/Packaging pipeline to ROCm6.0 (#18985)
Update CI/Packaging pipeline to ROCm6.0
2024-01-03 17:25:15 +08:00
Yifan Li
54e471a054
[EP Perf] Display percentage of cuda/trt ops in cuda/trt ep on EP Perf Dashboard (#18868)
### Description
Display percentage of cuda/trt ops in cuda/trt ep on EP Perf Dashboard:

![image](https://github.com/microsoft/onnxruntime/assets/109183385/bafba098-1338-46fa-b10a-ca19eff2a746)

Check
[here](https://msit.powerbi.com/groups/d1ae6355-afd0-4c40-b78e-676a86cab1e2/reports/82101bbb-dad2-4f24-9ddf-a37f0d41509a/ReportSectionda402bdf6824e505a614?experience=power-bi)
to preview on ep perf dashboard


### Motivation and Context
- brief overview of op metrics towards various models
- easy to identify models which haven't reached 100% ops on cuda/trt ep.
2023-12-20 22:11:47 -08:00
Ashwini Khade
4dff154f51
Fix nightly pipeline failure (#18867)
### Description
Fixes a failure in the ortmodule nightly pipeline. 



### Motivation and Context
2023-12-19 09:18:00 -08:00
Jian Chen
6d7519ede8
Adding new pipeline for python cuda testing (#18718)
### Description



### Motivation and Context
2023-12-18 18:13:03 -08:00
Ashwini Khade
16df8377d3
Update transformers package to fix the security issue (#18730)
### Description
Updating transformers package in test pipeline to fix a security
vulnerability.



### Motivation and Context
2023-12-11 09:15:23 -08:00
cloudhan
de32baeeef
[ROCm] Add GemmFloat8 (#18488) 2023-12-11 11:37:29 +08:00
Jian Chen
3ea27c2925
Create a new Nuget Package pipeline for CUDA 12 (#18135) 2023-11-28 09:03:46 -08:00
Abhishek Jindal
680a526e73
Training packaging pipeline for cuda12 (#18524)
### Description
Build ORT-training packaging pipeline for CUDA 12.2


### Motivation and Context
This helps customers using CUDA 12, who will no longer need to build
ORT-training from source.

Test run:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=382993&view=logs&s=130be951-c2f3-5601-5709-434b5e50ddb0
2023-11-21 13:19:21 -08:00
Jian Chen
d97fc1824f
Create a new Python Package pipeline for CUDA 12 (#18348)
### Description



### Motivation and Context
2023-11-20 09:48:28 -08:00
Wei-Sheng Chin
3bcc137eb4
Tiny change to trigger the update of DORT's CI image (#18507)
Recent PyTorch breaks DORT CI and [a
patch](https://github.com/pytorch/pytorch/pull/113697) has been merged
into PyTorch main. In order to update DORT's CI, we made a dummy change in
this PR.
2023-11-19 22:09:11 -08:00
PeixuanZuo
37d8bed53d
[ROCm] add migraphx into onnxruntime-training-rocm package (#18339) 2023-11-14 11:54:22 +08:00
RandySheriffH
59262dfc63
Add cuda context headers to zip (#18330)
Expose cuda context headers for cuda custom ops.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-11-09 14:53:58 -08:00
Changming Sun
398ef677ba
Update protobuf python package's version (#18203)
1. Now we use a released version of ONNX, so we can directly download a
prebuilt package from pypi.org. We do not need to build one from source.
2. Update protobuf python package's version to match the C/C++ version
we are using.
3. Update the tensorboard python package because the current one is
incompatible with the newer protobuf version.
2023-11-06 09:22:54 -08:00
liqun Fu
20f2dd8b6b
use onnx rel-1.15.0, update cgman, cmake/external and requirement hash (#18177) 2023-10-31 14:58:21 -07:00
Xavier Dupré
c10b83eb68
Update python cryptography version to 41.0.4 (#18056)
### Description

Version 41.0.0 currently used has vulnerabilities.

### Motivation and Context

See [Vulnerable OpenSSL included in cryptography
wheels](https://github.com/advisories/GHSA-v8gr-m533-ghj9)
2023-10-27 12:06:38 +02:00
Jian Chen
7c18c60bc2
Change cuda image for tensorRT to the one with cudnn8 (#18102)
### Description
copilot:summary


### Motivation and Context
copliot::walkthrough
2023-10-26 16:28:57 -07:00
Jian Chen
76e275baf4
Merge Cuda docker files into a single one (#18020)
### Description



### Motivation and Context
2023-10-24 15:17:36 -07:00
Jian Chen
cbb0e0f83c
Create a new Dockerfile for cuda 12 and trt 8.6.1.6-1.cuda12.0 (#18000) 2023-10-18 14:46:02 -07:00
PeixuanZuo
2ef6ee674c
[ROCm] Update ROCm and MIGraphX CI to ROCm5.7 (#17834)
- Update ROCm and MIGraphX CI to ROCm5.7
- Simplify test exclude file. Some tests will output `registered
execution providers ROCMExecutionProvider were unable to run the model.`
if they cannot run.
- Add `enable_training` build argument for MIGraphX pipeline.
2023-10-09 10:29:11 +08:00
Wei-Sheng Chin
b5a103ae16
Upgrade transformers to fix CI (#17823)
Python package pipeline fails due to "tokenizers" compilation. Since
"tokenizers" is a dep of "transformers", we update its version in the hope
that a newer release already contains a fix.

```
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> tokenizers-lib/src/models/bpe/trainer.rs:517:47
```
2023-10-07 09:51:24 -07:00
Chi Lo
569876fb16
[TensorRT EP] Refactor OrtTensorRTProviderOptions initialization and make it easy to add new field (#17617)
Two major modifications of this PR:

1. Refactor OrtTensorRTProviderOptions initialization and make it easy
to add new field.
2. Make Python API capable of using TensorRT plugins by adding new
Python binding api `register_tensorrt_plugins_as_custom_ops`. (It needs
to register the EP's custom op domain before model load. For the C++ API it's
slightly different: when calling
SessionOptionsAppendExecutionProvider_TensorRT_XX, it appends the custom op
domain to the session options. Later ORT can register the custom op domain from
the session options before model loading.)
2023-10-06 14:12:20 -07:00
Justin Chu
be7541ef4a
[Linter] Bump ruff and remove pylint (#17797)
Bump ruff version and remove pylint from the linter list. Fix any new
error detected by ruff.

### Motivation and Context

Ruff covers many of the pylint rules. Since pylint is not enabled in
this repo and runs slowly, we remove it from the linters.
2023-10-05 21:07:33 -07:00
Changming Sun
276e8733bd
Update onnx python package and setuptools (#17709)
### Description
A follow-up for #17125
2023-09-27 07:54:48 -07:00
liqun Fu
2be4dc6d04
ONNX 1.15 integration (#17125)
### Description
This is for ORT 1.17.0: make ORT use the ONNX release 1.15.0 branch. We will eventually update to the release tag once ONNX 1.15.0 is released.


### Motivation and Context
Prepare for ORT 1.17.0 release. People can start work on new and updated ONNX ops in ORT.
---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-09-26 14:44:48 -07:00
Changming Sun
57dfd15d7b
Remove dnf update from docker build scripts (#17551)
### Description
1. Remove 'dnf update' from docker build scripts, because it upgrades TRT
packages from CUDA 11.x to CUDA 12.x.
To reproduce it, you can run the following commands in a CentOS CUDA
11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8.
```
export v=8.6.1.6-1.cuda11.8
dnf  install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v} 
dnf update -y
```
The last command will generate the following outputs:
```
========================================================================================================================
 Package                                     Architecture       Version                          Repository        Size
========================================================================================================================
Upgrading:
 libnvinfer-devel                            x86_64             8.6.1.6-1.cuda12.0               cuda             542 M
 libnvinfer-headers-devel                    x86_64             8.6.1.6-1.cuda12.0               cuda             118 k
 libnvinfer-headers-plugin-devel             x86_64             8.6.1.6-1.cuda12.0               cuda              14 k
 libnvinfer-plugin-devel                     x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-plugin8                          x86_64             8.6.1.6-1.cuda12.0               cuda              13 M
 libnvinfer-vc-plugin-devel                  x86_64             8.6.1.6-1.cuda12.0               cuda             107 k
 libnvinfer-vc-plugin8                       x86_64             8.6.1.6-1.cuda12.0               cuda             251 k
 libnvinfer8                                 x86_64             8.6.1.6-1.cuda12.0               cuda             543 M
 libnvonnxparsers-devel                      x86_64             8.6.1.6-1.cuda12.0               cuda             467 k
 libnvonnxparsers8                           x86_64             8.6.1.6-1.cuda12.0               cuda             757 k
 libnvparsers-devel                          x86_64             8.6.1.6-1.cuda12.0               cuda             2.0 M
 libnvparsers8                               x86_64             8.6.1.6-1.cuda12.0               cuda             854 k
Installing dependencies:
 cuda-toolkit-12-0-config-common             noarch             12.0.146-1                       cuda             7.7 k
 cuda-toolkit-12-config-common               noarch             12.2.140-1                       cuda             7.9 k
 libcublas-12-0                              x86_64             12.0.2.224-1                     cuda             361 M
 libcublas-devel-12-0                        x86_64             12.0.2.224-1                     cuda             397 M

Transaction Summary
========================================================================================================================

```
As you can see from the output,  they are CUDA 12 packages. 

The problem can also be solved by locking the packages' versions with the
"dnf versionlock" command right after installing the CUDA/TRT packages.
However, going forward, to get better reproducibility, I suggest
manually fixing dnf package versions in the installation scripts like we do
for TRT now.

```bash
v="8.6.1.6-1.cuda11.8" &&\
    yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\
    yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\
        libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v}  libnvinfer-headers-plugin-devel-${v}
```
When we need to upgrade a package due to a security alert or some
other reason, we manually change the version string instead of relying
on "dnf update". Though this approach takes more effort, it makes our
pipelines more stable.
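
As a rough illustration of the pinning approach, the install command can be generated from a single version string so that an upgrade is a one-line change. The helper name `pinned_install_command` is hypothetical; the package names and version string are the ones quoted in this description.

```python
# TensorRT packages listed in the commands above.
TRT_PACKAGES = [
    "libnvinfer8", "libnvparsers8", "libnvonnxparsers8",
    "libnvinfer-plugin8", "libnvinfer-vc-plugin8",
    "libnvinfer-devel", "libnvparsers-devel", "libnvonnxparsers-devel",
    "libnvinfer-plugin-devel", "libnvinfer-vc-plugin-devel",
    "libnvinfer-headers-devel", "libnvinfer-headers-plugin-devel",
]


def pinned_install_command(version, packages=TRT_PACKAGES):
    """Build a dnf command that installs every package at one exact version."""
    pinned = [f"{name}-{version}" for name in packages]
    return ["dnf", "install", "-y", *pinned]


if __name__ == "__main__":
    # Print the command line for the CUDA 11.8 build of TensorRT 8.6.1.6.
    print(" ".join(pinned_install_command("8.6.1.6-1.cuda11.8")))
```

Because every package carries the same suffix, a stray CUDA 12 build cannot slip in unless the version string itself is changed.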

2. Move python test to docker
### Motivation and Context
Right now the nightly gpu package mixes CUDA 11.x and CUDA 12.x,
and the resulting package is unusable (it crashes every time).
2023-09-21 07:33:29 -07:00
Pranav Sharma
038c76378f
Include onnxruntime_float16.h in the package. (#17637)
### Description
Include onnxruntime_float16.h in the package.

### Motivation and Context
This was missed in the recently released 1.16 pkgs (except Nuget).
2023-09-21 00:08:10 -07:00
PeixuanZuo
1f991f27f1
[ROCm] add manylinux build test for ROCm CI (#17621)
The manylinux build is used for nightly package generation, and it's hard
to catch issues in time when related files change. This PR adds a
manylinux build to CI.
2023-09-21 10:45:16 +08:00
Changming Sun
dd561f2015
Upgrade sympy (#17639)
AB#17015
2023-09-20 18:44:23 -07:00
Wei-Sheng Chin
068300d97e
Pin beartype version (#17599)
PyTorch doesn't like the latest beartype:
https://github.com/pytorch/pytorch/pull/109510
2023-09-18 19:31:04 -07:00
Yi Zhang
377f959c69
Run Final_Jar_Testing_Linux_GPU in docker (#17533)
### Description
1. Create a package test image based on [RedHat
UBI](https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image)
2. Install TensorRT 8.6.1.6 in RedHat. (Ref.
https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#maclearn-net-repo-install-rpm)
3. Run Final_Jar_Testing_Linux_GPU in docker (base image:
nvidia/cuda:11.8.0-cudnn8-devel-ubi8)

### Motivation and Context

[AB#18470](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18470)

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=354004&view=logs&j=8939b564-1402-57b5-92dc-510eba75e069&t=8939b564-1402-57b5-92dc-510eba75e069
2023-09-15 08:35:55 -07:00
Changming Sun
5d3786206b
Fix ROCM's nightly build (#17518)
### Description
PR 15470 updated some C/C++ dependencies. The change caused ROCM EP's
nightly build to fail. See issue
https://github.com/ROCm-Developer-Tools/HIP/issues/2082 for
background. The root cause is that the HIP compiler has a special requirement
that HIP's include dirs must be used before the operating system's
include folder: /usr/include. HIP adds "-isystem" in front of
"/usr/include". gcc or clang will search the folders added with "-I"
first, then the "-isystem" folder. It works fine as long as we do not
add "-I/usr/include" to the compile commands for *.cu files. It goes wrong if
we have already installed an open source library to /usr and want to use the
prebuilt library from there instead of the one in the current build dir.


### Motivation and Context
2023-09-13 08:50:14 -07:00
Changming Sun
bc84f52633
Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470)
### Description
Update C/C++ dependencies abseil, date, nsync, googletest, wil, mp11,
cpuinfo and safeint to newer versions per request of @mayeut.
He created the following PRs to update the deps:
https://github.com/microsoft/onnxruntime/pull/15432
https://github.com/microsoft/onnxruntime/pull/15434
https://github.com/microsoft/onnxruntime/pull/15435
https://github.com/microsoft/onnxruntime/pull/15436
https://github.com/microsoft/onnxruntime/pull/15437

However, our build system needs to fetch the dependencies from an
internal mirror that only Microsoft employees have write access to. So I
closed his PRs and created this one.

This PR also updates abseil to a newer version. This is to prepare for
upgrading re2.
2023-09-08 13:35:04 -07:00
Yi Zhang
ae74a517b6
Run Nuget_Test_Linux_GPU in container (#17452)
### Description



### Motivation and Context

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results
2023-09-08 13:41:20 +08:00
Yi Zhang
ede339f304
Move dotnet build and test into docker in Linux CPU CI (#17417)
### Description
install dotnet 6.0 in the docker image.
move C# build and test into docker.

### Motivation and Context

### Note
The Unit tests and Symbolic shape infer's migration will be in another
PR.
2023-09-07 09:28:16 +08:00
Changming Sun
c6b0d185b4
Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856)
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.

2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
2023-09-05 18:12:10 -07:00
aciddelgado
44101e8771
Flash Attention v2 MHA (#17227)
### Description
Integrate Flash Attention V2 to PackedMultiHeadAttention,
MultiHeadAttention and Attention operators.

Flash Attention v2 source code is from
https://github.com/Dao-AILab/flash-attention/tree/main/csrc/flash_attn/src.
We made some changes to remove the dependency on Torch, then removed backward
and bfloat16 related code.

Add benchmark script (see benchmark_mha.sh) to compare different
attention kernels for MultiHeadAttention operator.

Current limitations for Flash Attention in PackedMultiHeadAttention,
MultiHeadAttention and Attention operators:
* Relative Position Bias is not supported
* Different hidden size for Q and V is not supported
* Only float16 is supported
* Padding/attention mask is not supported
* For MultiHeadAttention, when there is past or present input, bias
shall be provided to activate flash attention
* For Attention, past or present inputs will deactivate flash attention
* Causal is not supported

Some limitations (like attention mask and causal) might be removed
later.

Currently, Flash Attention v2 only works in Linux. For Windows, we will
enable later with Cutlass 3.2.

Two environment variables can be used for testing purposes:
(1) `ORT_DISABLE_FLASH_ATTENTION` to disable flash attention. Default
value is 0 (enable). Set it to "1" to disable it.
(2) `ORT_MIN_SEQ_LEN_FLASH_ATTENTION_PACKED_QKV`. Default value is
"513", which means that we only enable flash attention when sequence
length is larger than 512 for packed QKV format. Set it to "0" if you
want to use flash attention v2 whenever possible.
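
The semantics of these two variables can be modeled with a small sketch. This is not the ONNX Runtime implementation: the function `use_flash_attention` is hypothetical, it only covers the two environment variables, and it ignores the other limitations listed above (data type, masks, bias and so on).

```python
import os


def use_flash_attention(seq_len, packed_qkv, env=None):
    """Model the two environment variables described above (illustrative only)."""
    env = os.environ if env is None else env
    # "1" disables flash attention entirely; the default "0" leaves it enabled.
    if env.get("ORT_DISABLE_FLASH_ATTENTION", "0") == "1":
        return False
    if packed_qkv:
        # Default 513: packed QKV uses flash attention only above length 512.
        min_len = int(env.get("ORT_MIN_SEQ_LEN_FLASH_ATTENTION_PACKED_QKV", "513"))
        return seq_len >= min_len
    return True
```

Passing a plain dictionary as `env` makes it easy to try different settings without touching the real process environment.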

### Speedup

The following result is from Standard_ND96amsr_A100_v4 VM
(A100-SXM4-80GB GPU) using benchmark_mha.sh. The metric is TFLOPs per
second for MultiHeadAttention operator.

There are 3 input formats:
* `Q,K,V` means separated inputs query, key and value of BxSxNH
* `Q,KV` means packed KV, where key is 5D: BxSxNx2xH
* `QKV` means packed QKV, where query is 5D: BxSxNx3xH
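
To make the three layouts concrete, the sketch below returns the tensor shapes for each format, with B = batch size, S = sequence length, N = number of heads and H = head dimension. The function name is hypothetical, and the query shape for the `Q,KV` format (BxSxNH) and the equal sequence length for query and key are assumptions, since the text only states the key shape for that format.

```python
def attention_input_shapes(fmt, B, S, N, H):
    """Return input tensor shapes for the three formats described above."""
    if fmt == "Q,K,V":
        # Three separate 3D tensors with a hidden size of N * H.
        shape = (B, S, N * H)
        return {"query": shape, "key": shape, "value": shape}
    if fmt == "Q,KV":
        # Key and value are packed into one 5D tensor passed as key.
        # The query shape is an assumption (not stated in the text).
        return {"query": (B, S, N * H), "key": (B, S, N, 2, H)}
    if fmt == "QKV":
        # Query, key and value are packed into one 5D tensor passed as query.
        return {"query": (B, S, N, 3, H)}
    raise ValueError(f"unknown input format: {fmt!r}")
```

For the first row of the table, `attention_input_shapes("Q,K,V", 32, 512, 64, 32)` gives three tensors of shape `(32, 512, 2048)`.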

Note that flash attention cannot use packed QKV format, so extra
Transpose is needed. We found that TensorRT kernel is faster for
sequence length <= 512 for packed QKV. The reason might be no transpose
is needed for TensorRT kernel in this format.

We also notice that, TensorRT kernel is faster for stable diffusion
512x512 image (see seq_len=4096, heads=8, head_dim=40 below), while
flash attention v2 is faster for 1024x1024 image (see seq_len=16384,
heads=8, head_dim=40 below).

input format | batch size | sequence length | heads | head dim | flash_v2 (TFLOPs/s) | TensorRT (TFLOPs/s) | Memory Efficient Attention (TFLOPs/s)
-- | -- | -- | -- | -- | -- | -- | --
Q,K,V | 32 | 512 | 64 | 32 | 78.1 | 60.0 | 39.3
Q,K,V | 32 | 512 | 128 | 16 | 46.8 | 44.1 | 21.7
Q,K,V | 16 | 1024 | 64 | 32 | 99.0 | 72.8 | 44.3
Q,K,V | 16 | 1024 | 128 | 16 | 54.7 | 49.2 | 23.4
Q,K,V | 8 | 2048 | 64 | 32 | 113.8 | 81.2 | 47.8
Q,K,V | 8 | 2048 | 128 | 16 | 59.7 | 51.9 | 24.7
Q,K,V | 4 | 4096 | 64 | 32 | 122.5 | 85.6 | 49.7
Q,K,V | 4 | 4096 | 128 | 16 | 62.5 | 53.3 | 25.3
Q,K,V | 2 | 8192 | 64 | 32 | 127.4 | 87.5 | 50.7
Q,K,V | 2 | 8192 | 128 | 16 | 64.0 | 54.2 | 25.6
Q,K,V | 1 | 16384 | 64 | 32 | 129.5 | 91.0 | 51.2
Q,K,V | 1 | 16384 | 128 | 16 | 64.7 | 54.5 | 25.8
Q,K,V | 1 | 4096 | 8 | 40 | 51.0 | 43.6 | 36.8
Q,K,V | 1 | 4096 | 8 | 80 | 97.7 | 77.0 | 55.5
Q,K,V | 1 | 4096 | 8 | 160 | 120.0 | 39.7 | 57.8
Q,K,V | 4 | 4096 | 8 | 40 | 89.0 | 84.4 | 49.2
Q,K,V | 4 | 4096 | 8 | 80 | 133.0 | 92.2 | 63.2
Q,K,V | 4 | 4096 | 8 | 160 | 164.8 | 42.7 | 63.8
Q,K,V | 1 | 16384 | 8 | 40 | 96.9 | 91.3 | 52.1
Q,K,V | 1 | 16384 | 8 | 80 | 142.9 | 101.5 | 65.6
Q,K,V | 1 | 16384 | 8 | 160 | 177.4 | 44.2 | 65.7
Q,K,V | 128 | 128 | 12 | 64 | 29.0 | 26.9 | 25.7
Q,K,V | 64 | 128 | 12 | 64 | 23.1 | 10.8 | 21.3
Q,K,V | 128 | 384 | 12 | 64 | 83.5 | 60.8 | 55.7
Q,K,V | 64 | 384 | 12 | 64 | 72.6 | 40.5 | 52.8
Q,K,V | 128 | 512 | 12 | 64 | 98.9 | 77.9 | 62.1
Q,K,V | 64 | 512 | 12 | 64 | 94.7 | 75.6 | 60.4
Q,KV | 32 | 512 | 64 | 32 | 85.9 | 41.1 | 41.1
Q,KV | 32 | 512 | 128 | 16 | 47.1 | 21.6 | 21.6
Q,KV | 16 | 1024 | 64 | 32 | 104.4 | 45.8 | 45.8
Q,KV | 16 | 1024 | 128 | 16 | 54.7 | 23.6 | 23.6
Q,KV | 8 | 2048 | 64 | 32 | 116.8 | 48.5 | 48.5
Q,KV | 8 | 2048 | 128 | 16 | 59.8 | 24.7 | 24.7
Q,KV | 4 | 4096 | 64 | 32 | 124.2 | 50.1 | 50.1
Q,KV | 4 | 4096 | 128 | 16 | 62.6 | 25.3 | 25.3
Q,KV | 2 | 8192 | 64 | 32 | 128.5 | 50.8 | 50.9
Q,KV | 2 | 8192 | 128 | 16 | 64.1 | 25.6 | 25.6
Q,KV | 1 | 16384 | 64 | 32 | 129.4 | 51.2 | 51.2
Q,KV | 1 | 16384 | 128 | 16 | 64.8 | 25.8 | 25.8
Q,KV | 1 | 4096 | 8 | 40 | 67.5 | 37.7 | 37.5
Q,KV | 1 | 4096 | 8 | 80 | 101.3 | 56.7 | 56.6
Q,KV | 1 | 4096 | 8 | 160 | 124.0 | 58.6 | 58.6
Q,KV | 4 | 4096 | 8 | 40 | 90.8 | 49.8 | 49.8
Q,KV | 4 | 4096 | 8 | 80 | 135.6 | 63.8 | 63.8
Q,KV | 4 | 4096 | 8 | 160 | 166.3 | 64.5 | 64.5
Q,KV | 1 | 16384 | 8 | 40 | 97.5 | 52.3 | 52.3
Q,KV | 1 | 16384 | 8 | 80 | 143.5 | 65.9 | 65.8
Q,KV | 1 | 16384 | 8 | 160 | 178.4 | 65.9 | 65.8
Q,KV | 128 | 128 | 12 | 64 | 26.8 | 48.1 | 30.9
Q,KV | 64 | 128 | 12 | 64 | 28.0 | 38.9 | 25.0
Q,KV | 128 | 384 | 12 | 64 | 97.7 | 61.1 | 61.0
Q,KV | 64 | 384 | 12 | 64 | 89.5 | 57.8 | 57.9
Q,KV | 128 | 512 | 12 | 64 | 111.9 | 66.7 | 66.9
Q,KV | 64 | 512 | 12 | 64 | 107.2 | 64.9 | 64.8
QKV | 32 | 512 | 64 | 32 | 77.2 | 84.7 | 39.3
QKV | 32 | 512 | 128 | 16 | 43.4 | 53.1 | 20.9
QKV | 16 | 1024 | 64 | 32 | 98.8 | 87.4 | 44.6
QKV | 16 | 1024 | 128 | 16 | 52.0 | 54.1 | 23.2
QKV | 8 | 2048 | 64 | 32 | 113.1 | 89.0 | 47.9
QKV | 8 | 2048 | 128 | 16 | 58.2 | 54.6 | 24.5
QKV | 4 | 4096 | 64 | 32 | 120.6 | 89.7 | 49.7
QKV | 4 | 4096 | 128 | 16 | 61.7 | 54.6 | 25.2
QKV | 2 | 8192 | 64 | 32 | 125.9 | 89.5 | 50.7
QKV | 2 | 8192 | 128 | 16 | 63.6 | 54.8 | 25.5
QKV | 1 | 16384 | 64 | 32 | 128.5 | 92.0 | 51.2
QKV | 1 | 16384 | 128 | 16 | 64.6 | 54.8 | 25.7
QKV | 1 | 4096 | 8 | 40 | 60.2 | **69.8** | 38.1
QKV | 1 | 4096 | 8 | 80 | 101.6 | 75.2 | 56.7
QKV | 1 | 4096 | 8 | 160 | 130.2 | 41.2 | 58.4
QKV | 4 | 4096 | 8 | 40 | 90.6 | **91.0** | 49.5
QKV | 4 | 4096 | 8 | 80 | 133.6 | 98.1 | 62.8
QKV | 4 | 4096 | 8 | 160 | 165.3 | 43.7 | 63.9
QKV | 1 | 16384 | 8 | 40 | 97.2 | 92.8 | 52.1
QKV | 1 | 16384 | 8 | 80 | 143.0 | 103.1 | 65.6
QKV | 1 | 16384 | 8 | 160 | 177.6 | 44.5 | 65.7
QKV | 128 | 128 | 12 | 64 | 31.1 | 65.9 | 27.6
QKV | 64 | 128 | 12 | 64 | 26.1 | 49.8 | 23.5
QKV | 128 | 384 | 12 | 64 | 84.6 | 88.5 | 56.1
QKV | 64 | 384 | 12 | 64 | 79.1 | 80.3 | 53.5
QKV | 128 | 512 | 12 | 64 | 97.3 | 114.2 | 62.2
QKV | 64 | 512 | 12 | 64 | 95.9 | 110.7 | 60.6
QKV | 4 | 2048 | 32 | 128 | 125.26 | 44.72 | 78.15
QKV | 4 | 4096 | 32 | 128 | 141.62 | 46.29 | 85.84
QKV | 8 | 2048 | 32 | 128 | 127.40 | 45.49 | 78.75
QKV | 8 | 4096 | 32 | 128 | 144.24 | 46.60 | 86.95

### Known Issues

NVCC uses a huge amount of memory while compiling the flash attention CUDA
kernels. A Linux build with CUDA might fail when the machine has limited memory
while the number of CPUs is large. The workaround is to use a build machine
with more memory, or use an argument like `--nvcc_threads 1` to limit nvcc
threads in the build.

### Motivation and Context
Increases speed and efficiency of MHA or Packed MHA.

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: tlwu@microsoft.com <tlwu@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
2023-08-31 13:52:21 -07:00
Yi Zhang
507a40e1e9
Add compiler cache in Linux GPU TensorRT CI. (#17348)
### Description
Add the compiler cache in linux GPU tensorRT CI.
Save about 30 minutes in the GPU machine. (52 minutes -> 24 minutes)

PS. 
There're only white-space differences in the dockerfile.

### Motivation and Context
2023-08-31 08:13:26 +08:00
Jian Chen
081c0692a4
Update nodejs version from 16 to 18.17.1 (#17351)
### Description
Update nodejs version from 16 to 18.17.1



### Motivation and Context
Nodejs 16 will reach EOL in September 2023
2023-08-30 12:41:48 -07:00
Jian Chen
922629aad8
Upgrade CentOS7 to AlmaLinux8 (#16907)
### Description



### Motivation and Context
Get the latest gcc 12 by default

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2023-08-29 21:05:36 -07:00
cloudhan
bf8b1681f9
Build nuget pkg for ROCm (#16791)
Add nuget pkg building and publishing for ROCm EP

---------
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
2023-08-28 13:35:08 +08:00
Yifan Li
808215366d
Fix Multi GPU TensorRT tests (#17269)
### Description
* Integrate `trt_multi_gpu` test stage in ORT post merge CI (Win-2xA10
vm)
* Deprecate Linux MultiGPU TRT CI (This vm will be deprecated soon)
* Add multi gpu support to existing C# test cases
* Deprecate non-functional flag `--enable_multi_device_tests`

### Motivation and Context
* Two reasons for replacing the Linux MultiGPU TRT CI:
  * Flag `--enable_multi_device_tests` is not functional, so it cannot
  detect issues like #17036
  * The Linux-2xM60 VM of this CI pool is about to be deprecated on 9/6/23.
  We need to enable this test in another dual-GPU VM pool.
2023-08-25 20:30:45 -07:00
Changming Sun
6db72165eb
Fix python packaging test pipeline (#17204)
### Description
1. Fix python packaging test pipeline. There was an error in
tools/ci_build/github/linux/run_python_tests.sh: it installed a
released version of the onnxruntime python package from pypi.org to run the
test. It is supposed to pick the one from the current build.
2. Refactor the pipeline to allow choosing the cmake build type from the web
UI when manually triggering a build. For now this feature is Linux only,
because I don't want to change too much when we are about to cut a
release branch. After that I will expand it to all platforms. This
feature is useful for debugging pipeline issues. Also, we may consider
having a nightly pipeline that runs all tests in Debug mode, which may catch
extra bugs because in debug mode we can enforce range checks.
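
The first fix amounts to installing the wheel produced by the current build rather than resolving a package name against pypi.org. A minimal sketch of that idea follows. It is not the content of `run_python_tests.sh`: the helper name and the example paths are hypothetical.

```python
import sys
from pathlib import PurePath


def local_wheel_install_command(candidates, prefix="onnxruntime"):
    """Build a pip command that installs the single locally built wheel."""
    wheels = [
        str(path) for path in candidates
        if PurePath(path).name.startswith(prefix) and PurePath(path).suffix == ".whl"
    ]
    # Refuse to guess: zero or several wheels means the build output is unexpected.
    if len(wheels) != 1:
        raise ValueError(f"expected exactly one local wheel, found {len(wheels)}")
    # Installing by file path cannot silently pull a released version of the
    # package under test from the index.
    return [sys.executable, "-m", "pip", "install", wheels[0]]
```

In a real pipeline, `candidates` would typically come from listing the build output directory, for example with `pathlib.Path(build_dir).glob("*.whl")`.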

Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342674&view=results

### Motivation and Context
Currently the pipeline has a crash error. 

AB#18580
2023-08-18 14:51:26 -07:00
PeixuanZuo
be2200c00b
[ROCm] fix python package pipeline (#17136)
The ROCm python package pipeline failed because this
PR (https://github.com/microsoft/onnxruntime/pull/16325) changed the onnx
version to a commit, so we need to build onnx from source. A low protobuf
version causes build errors.
This PR removes `cmake` and `protobuf` from the Dockerfile; these two are
installed by `install_os_deps.sh`.
2023-08-14 11:22:43 -07:00