Commit graph

62 commits

Author SHA1 Message Date
Jian Chen
3ea27c2925
Create a new Nuget Package pipeline for CUDA 12 (#18135) 2023-11-28 09:03:46 -08:00
Jian Chen
7c18c60bc2
Change cuda image for tensorRT to the one with cudnn8 (#18102)
### Description
copilot:summary


### Motivation and Context
copliot::walkthrough
2023-10-26 16:28:57 -07:00
Jian Chen
76e275baf4
Merge Cuda docker files into a single one (#18020)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-10-24 15:17:36 -07:00
Hariharan Seshadri
9356986730
Fix AMD builds and enable testing NHWC CUDA ops in one GPU CI (#17972)
### Description
This PR:

(1) Fixes AMD builds after #17200 broke them (Need to remember to run
AMD builds while trying to merge external CUDA PRs next time)

(2) Turn on the NHWC CUDA feature in the Linux GPU CI. The extra time
spent in building a few more files and running a few more tests will not
be much.

Test Linux GPU CI run :
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1170770

### Motivation and Context
Keep the NHWC CUDA ops tested
(https://github.com/microsoft/onnxruntime/pull/17200) and guard against
regressions
2023-10-17 09:23:52 -07:00
Changming Sun
c6b0d185b4
Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856)
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.

2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
2023-09-05 18:12:10 -07:00
Yi Zhang
d4a61ac71f
Pr trggiers generated by code (#17247)
### Description
1. Refactor the trigger rules generation.
2. Skip all doc changes in PR pipelines.


### Motivation and Context 
Make all trigger rules generated by running set-trigger-rules.py to
reduce inconsistences.
It's easily to make mistakes to copy&paste manually. 

For example: these 2 excludes are different, Why?

4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml (L16-L18)


4e6cec4d09/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml (L27-L29)


### Note
All changes in workflow yamls are generated by code.
Please review the **skip-js.yml, skip-docs.yml and
set-trigger-rules.py**.

@fs-eire, please double check the 
filter rules in skip-js.yml
and the skipped workflows

7023c2edff/tools/ci_build/set-trigger-rules.py (L14-L41)
2023-08-30 05:57:03 +08:00
Jian Chen
45f52987a2
Web CI Pipeline Isolation (#17005)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-14 10:37:37 -07:00
Yi Zhang
555414f1aa
Set PR trigger rules (#16987)
### Description
Add a script to insert the trigger rules to workflow yamls.
First step, skipp windows gpu and linux gpu workflow when there's only
doc change

### Motivation and Context
Make skipping workflows for doc change easily.

[AB#18201](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18201)
2023-08-04 08:21:07 -07:00
Changming Sun
6b5b79872b
Avoid taking dependency on dl.fedoraproject.org (#16202)
### Description
1. Avoid taking dependency on dl.fedoraproject.org
The website is not very stable. Our build pipelines often fail to fetch
packages from there.

2. Update manylinux to the latest version
2023-06-02 07:41:46 -07:00
Yi Zhang
6d43d51eb0
[Fix] No test result report while not using ctest (#15976)
### Description
1. Set gtest output while ctest is set to empty.
2. onnx_src in _deps shouldn't be removed because
onnx_test_pytorch_converted and onnx_test_pytorch_converted need to read
data from onnx/backend/test/data/..

### Motivation and Context
Test result report is important to find the flaky tests.

### To do
Tests are not inconsistent.
If ctest_path is empty, onnx_test_pytorch_converted and
onnx_test_pytorch_converted will not be executed, if it's not,
onnxruntime_mlas_test will not be executed.


270c09a37f/tools/ci_build/build.py (L1743-L1753)
2023-05-17 08:31:16 -07:00
Yi Zhang
b20d5e85d5
Update Cuda to 11.8 in 2 Linux GPU workflows. (#15925)
### Description
use template variable for cuda version


### Motivation and Context
2023-05-14 12:51:25 +08:00
Yi Zhang
0e7ae13e74
Run Linux GPU tests in docker container (#15872)
### Description
Run  Linux GPU tests in docker container

### Motivation and Context
2023-05-12 06:29:22 +08:00
Changming Sun
d3d232b047
Rename onnxruntime-Linux-CPU-2019 machine pool (#15691)
Rename onnxruntime-Linux-CPU-2019 machine pool to
"onnxruntime-Ubuntu2004-AMD-CPU". The old one has an internal error and
stuck there. I cannot make any change to it. It has been like this for
more than 1 week. So I created a new pool with the same setting except
the name is different.
Also, move some android pipelines to
"onnxruntime-Linux-CPU-For-Android-CI" which uses a standard image from
https://github.com/actions/runner-images
2023-04-27 12:46:18 -07:00
Yi Zhang
0ea965c541
clear cache stat. after building (#15439)
### Description
Add  `ccache -z` after every building.


### Motivation and Context
Uploaded Cache stat shouldn't include cache stat.
2023-04-10 13:56:55 +08:00
Jian Chen
792d411135
Update python 3.11 and remove 3.7 for Linux (#15214)
### Description
Update python 3.11 and remove 3.7



### Motivation and Context
Update python 3.11 and remove 3.7

---------

Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>
2023-03-27 14:46:30 -07:00
Changming Sun
63cc1bb26a
Move Linux CPU pipelines to an AMD CPU pool which is cheaper (#15144)
### Description
1. Move Linux CPU pipelines to an AMD CPU pool which is cheaper
2. Enable CCache for orttraining pipeline

### Motivation and Context
Azure AMD CPU machines are generally much cheaper than Intel CPU
machines. However, they don't have local disks.
2023-03-27 14:10:08 -07:00
Edward Chen
c46c7ccba5
Update Gradle version (#14862)
- Update Gradle version used in most places from 6.8.3 to 8.0.1. Update Android Gradle Plugin version where applicable.
  Not updated in this change: React Native Android projects (under `js/react_native/`). That can be done later along with updating the React Native projects.

- Add Gradle wrapper in `java/` to make it easier to consistently use a specific Gradle version.
2023-03-08 12:22:06 -08:00
Yi Zhang
aa9fbed3d4
Add compilation cache for Linux GPU (#13995)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2022-12-16 16:38:12 +08:00
Changming Sun
04900f96c1
Improve dependency management (#13523)
## Description
1. Convert some git submodules to cmake external projects
2. Update nsync from
[1.23.0](https://github.com/google/nsync/releases/tag/1.23.0) to
[1.25.0](https://github.com/google/nsync/releases/tag/1.25.0)
3. Update re2 from 2021-06-01 to 2022-06-01
4. Update wil from an old commit to 1.0.220914.1 tag
5. Update gtest to a newer commit so that it can optionally leverage
absl/re2 for parsing command line flags.

The following git submodules are deleted:

1. FP16
2. safeint
3. XNNPACK
4. cxxopts
5. dlpack
7. flatbuffers
8. googlebenchmark
9. json
10. mimalloc
11. mp11
12. pthreadpool

More will come.

## Motivation and Context
There are 3 ways of integrating 3rd party C/C++ libraries into ONNX
Runtime:
1. Install them to a system location, then use cmake's find_package
module to locate them.
2.  Use git submodules 
6.  Use cmake's external projects(externalproject_add). 

At first when this project was just started, we considered both option 2
and option 3. We preferred option 2 because:

1. It's easier to handle authentication. At first this project was not
open source, and it had some other non-public dependencies. If we use
git submodule, ADO will handle authentication smoothly. Otherwise we
need to manually pass tokens around and be very careful on not exposing
them in build logs.
2. At that time, cmake fetched dependencies after "cmake" finished
generating vcprojects/makefiles. So it was very difficult to make cflags
consistent. Since cmake 3.11, it has a new command: FetchContent, which
fetches dependencies when it generates vcprojects/makefiles just before
add_subdirectories, so the parent project's variables/settings can be
easily passed to the child projects.

And when the project went on,  we had some new concerns:
1. As we started to have more and more EPs and build configs, the number
of submodules grew quickly. For more developers, most ORT submodules are
not relevant to them. They shouldn't need to download all of them.
2. It is impossible to let two different build configs use two different
versions of the same dependency. For example, right now we have protobuf
3.18.3 in the submodules. Then every EP must use the same version.
Whenever we have a need to upgrade protobuf, we need to coordinate
across the whole team and many external developers. I can't manage it
anymore.
3. Some projects want to manage the dependencies in a different way,
either because of their preference or because of compliance
requirements. For example, some Microsoft teams want to use vcpkg, but
we don't want to force every user of onnxruntime using vcpkg.
7. Someone wants to dynamically link to protobuf, but our build script
only does static link.
8. Hard to handle security vulnerabilities. For example, whenever
protobuf has a security patch, we have a lot of things to do. But if we
allowed people to build ORT with a different version of protobuf without
changing ORT"s source code, the customer who build ORT from source will
be able to act on such things in a quicker way. They will not need to
wait ORT having a patch release.
9. Every time we do a release, github will also publish a source file
zip file and a source file tarball for us. But they are not usable,
because they miss submodules.
 
### New features

After this change, users will be able to:
1. Build the dependencies in the way they want, then install them to
somewhere(for example, /usr or a temp folder).
2. Or download the dependencies by using cmake commands from these
dependencies official website
3. Similar to the above, but use your private mirrors to migrate supply
chain risks.
4. Use different versions of the dependencies, as long as our source
code is compatible with them. For example, you may use you can't use
protobuf 3.20.x as they need code changes in ONNX Runtime.
6.  Only download the things the current build needs.
10. Avoid building external dependencies again and again in every build.

### Breaking change
The onnxruntime_PREFER_SYSTEM_LIB build option is removed you could think from now 
it is default ON. If you don't like the new behavior, you can set FETCHCONTENT_TRY_FIND_PACKAGE_MODE to NEVER.
Besides, for who relied on the onnxruntime_PREFER_SYSTEM_LIB build
option, please be aware that this PR will change find_package calls from
Module mode to Config mode. For example, in the past if you have
installed protobuf from apt-get from ubuntu 20.04's official repo,
find_package can find it and use it. But after this PR, it won't. This
is because that protobuf version provided by Ubuntu 20.04 is too old to
support the "config mode". It can be resolved by getting a newer version
of protobuf from somewhere.
2022-12-01 09:51:59 -08:00
Changming Sun
a396a91c9a
Move build machines with Nvidia M60 GPUs to Nvidia T4 (#13170) 2022-10-25 11:21:13 -07:00
Changming Sun
eafd67b8fd
Update CUDA version to 11.6 and refactor python packaging pipeline (#13002)
1. Update CUDA version from 11.4 to 11.6.
2. Update Manylinux version
3. Upgrade GCC version from 10 to 11 for most x86_64 pipelines. CentOS 7 ARM64 doesn't have GCC 11 yet.
4. Refactor python packaging pipeline: 
    a. Split Linux GPU build job to two parts, build and test, so that the
build part doesn't need to use a GPU machine
    b. Make the Linux GPU build job and Linux CPU build job more similar: share the same bash script and yaml file.
5. Temporarily disable Attention_Mask1D_Fp16_B2_FusedNoPadding because it is causing one of our packaging pipeline to fail. I have created an ADO task for this.
2022-09-23 00:29:27 -07:00
Changming Sun
b2b4f703a5
Move Linux GPU CI pipeline to T4 (#12996)
Move Linux GPU CI pipeline to T4
2022-09-20 20:21:32 -07:00
Changming Sun
5d610bc8eb
Disable CG task in PR pipelines (#12426) 2022-08-02 19:01:41 -07:00
Changming Sun
7b4ce0c1e1
Delete the build scripts that were copied from manylinux project (#12358)
1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead.
2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343
3. Change the cuda version of linux gpu build job of nuget packaging pipeline from cuda 11.4 to cuda 11.6 to match the TRT job within the same pipeline.. (A lot other places need be updated as well, but I'd prefer to put them in another PR)
4. Make dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu . The former one relies on a runtime variable $(DockerFile), Template Parameters are expanded early in processing a pipeline run when most variables are not available. It like C++ macros vs variables.
2022-07-29 18:24:19 -07:00
Gary Miguel
4bf22e2a40
Update ONNX to 1.12 (#11924)
Follow-ups that need to happen after this and before the next ORT release:
* Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731
* Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778

Follow-ups that need to happen after this but don't necessarily need to happen before the release:
* Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916

Fixes #11640
2022-06-21 17:19:52 -07:00
Changming Sun
57b51e72d7
Linux CI: uninstall onnx before installing it (#11428) 2022-05-04 08:49:37 -07:00
Changming Sun
588a66e221
Add cleanup steps to the build jobs which run in Linux CPU machine pool (#11078) 2022-03-31 22:34:12 -07:00
Changming Sun
cc3a3476ed
Uninstall onnxruntime-training before running local tests (#10827)
* Uninstall onnxruntime-training before running local tests
2022-03-09 18:45:04 -08:00
dependabot[bot]
4d943c9bd3
Bump numpy from 1.16.6 to 1.21.0 in /tools/ci_build/github/linux/docker/scripts/manylinux (#10387)
* Bump numpy in /tools/ci_build/github/linux/docker/scripts/manylinux
2022-03-07 20:39:49 -08:00
Dmitri Smirnov
7e092a7e3f
Reduce number of memory allocations based on a customer profiling case (#10193)
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InineShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TesnorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TenshroShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Path with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
2022-01-24 10:40:46 -08:00
RandySheriffH
9345894c82
Add build option to enable cuda profiling (#9875) 2021-11-29 22:44:50 -08:00
Chun-Wei Chen
ac57afc3a6
Update ONNX to 1.10 globally in CIs (#9751)
* Bump ONNX 1.10.2 globally

* load ONNX_VERSION from VERSION_NUMBER

* /
2021-11-15 15:28:26 -08:00
Olivia Jain
60089f7093
Cuda11.4 (#8709)
* initial update from 11.1 to 11.4

* change 11.4.1 to 11.4.0

* adjusting to match nvidia/cuda image tags

* adjusting to match nvidia/cuda image tags centos7

* correction to 11.4.0

* correction to 11.4.0

* update to cuda 11.4

* change training back to 11.1

* change training back to 11.1

* point to correct nvcr.io/nvidia/cuda 11.4.1 image

* change centos8 to centos7

* correct cudnn path

* Update linux-gpu-ci-pipeline.yml for Azure Pipelines

* Update c-api-noopenmp-packaging-pipelines.yml

* need to resolve centos images but remove space and change to 11.4

* Update linux-gpu-ci-pipeline.yml

* add cudnn to docker image

* bump devtoolset to 10

* revert cuda 11.4 change to setup_env_trt

* orttraining back to 11.1

* use nvcr.io

* Fix previous change back to cuda 11.1

* update cudnn path

* use cudnn image (revert if failure)
2021-08-17 16:36:26 -07:00
Thiago Crepaldi
83be3759bc
Add post-install command to build PyTorch CPP extensions from within onnxruntime package (#8027)
ORTModule requires two PyTorch CPP extensions that are currently JIT compiled. The runtime compilation can cause issues in some environments without all build requirements or in environments with multiple instances of ORTModule running in parallel

This PR creates a custom command to compile such extensions that must be manually executed before ORTModule is executed for the first time. When users try to use ORTModule before the extensions are compiled, an error with instructions are raised

PyTorch CPP Extensions for ORTModule can be compiled by running:
python -m onnxruntime.training.ortmodule.torch_cpp_extensions.install

Full build environment is needed for this
2021-06-28 18:11:58 -07:00
Changming Sun
cba4bc11c7
Split Linux CPU CI pipeline (#8097) 2021-06-21 10:52:30 -07:00
Changming Sun
b854f2399d
Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632)
1. Update manylinux build scripts. This will add [PEP600](https://www.python.org/dev/peps/pep-0600/)(manylinux2 tags) support. numpy has adopted this new feature, we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they has been deleted and replaced in the upstream repo. The manylinux repo doesn't have a manylinux2014 branch anymore. So I'm removing the obsolete code, sync the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1(after a discussion with PMs). 
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2.  (Merged the content to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11)
4. Modernize the cmake code of how to locate python devel files. It was suggested in https://github.com/onnx/onnx/pull/1631 .
5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use. 
6. Update Ubuntu docker images that used in our CI build from Ubuntu 18.04 to Ubuntu 20.04.
7. Update GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1
8. Split Linux GPU CI pipeline to two jobs: build the code on a CPU machine then run the tests on another GPU machines.  In the past we didn't test our python packages. We only tested the pre-packed files. So we didn't catch the rpath issue in CI build. 
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines. 
10. Rework ARM64 Linux GPU python packaging pipeline. Previously it uses cross-compiling therefore we must static link to C Runtime. But now have pluggable EP API and it doesn't support static link. So I changed to use qemu emulation instead. Now the build is 10x slower than before. But it is more extensible.
2021-06-02 23:36:49 -07:00
Edward Chen
04679e31ab
Specify CUDA compute capability 7.5 in Linux GPU build (#7203)
Recently a build agent pool was changed to use T4 GPUs (CUDA compute capability 7.5). Updating some CUDA build options to accommodate that.
2021-03-31 18:51:44 -07:00
Changming Sun
4161758058
Remove openmp related packaging pipeline (#6991)
1. Remove openmp related packaging pipelines and build jobs.
2. Set continueOnError to true for the TSAUpload tasks. Their service is unstable recently.
3. Update Ubuntu 16 docker images to Ubuntu 18, in prepare for getting C++17 support
4. Cherry-pick the changes in 1.7.1 to the master: updating CFLAGS/CXXFLAGS to strip out debug symbols
2021-03-12 10:02:59 -08:00
Changming Sun
b5bd14fc9f
Update GPU packaging pipelines to cuda11 and fix the other build break issues (#6585)
Update gpu packaging pipelines to CUDA11

In the next release we will use CUDA 11. And our CUDA 11 build suddenly became broken because recently CentOS 7 posted an update of glibc. The version of glibc was changed from 2.17-317.el7 to 2.17-322.el7_9. But the newer one isn't compatible with CUDA 11. We have to downgrade it.
2021-02-05 16:58:37 -08:00
Edward Chen
6d642a3dba
Replace direct pulls from image cache container registry with get_docker_image.py, build definition clean up. (#5906) 2020-12-01 19:10:23 -08:00
Changming Sun
2d9dcc4576
Add python 3.9 support (#5874)
1. Add python 3.9 support(except Linux ARM)
2. Add Windows GPU python 3.8 to our packaging pipeline.
2020-11-30 12:02:48 -08:00
Changming Sun
26db396b4b
Reduce the number of CI build variants (#5856) 2020-11-18 20:41:30 -08:00
Changming Sun
85f945a875
Regenerate CI build docker images (#5850) 2020-11-18 14:36:59 -08:00
Ashwini Khade
1cca903680
update onnx commit id (#5594)
* update onnx commit id

* update onnx commit for docker images

* update docker images
2020-11-02 09:46:36 -08:00
Ashwini Khade
df22611026
Update ONNX commit (#5487)
* update ONNX

* update onnx + register kernels for reduction ops

* bug fix kernel reg

* update cgmanifests

* revert unsqueeze op 13 registration

* filter ops which are not implemented yet

* filter some tests

* update onnx commit to include conv transpose bug fix

* update docker images

* undo not required test changes

* fix test failures
2020-10-21 07:22:20 -07:00
Changming Sun
17f1178c2e
Downgrade GCC (#5269)
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-09-24 21:14:54 -07:00
Changming Sun
a0a435abc6
Add sympy==1.1.1 to Linux docker image (#5177) 2020-09-15 16:08:49 -07:00
Changming Sun
c5efb0085d
Update Linux GPU build pipelines to CUDA 10.2 (#5120)
* Update Linux GPU build pipelines to CUDA 10.2
2020-09-10 17:40:51 -07:00
Changming Sun
d5d5e37e76
Build system enhancements (#5012)
1. Add a docker file for CUDA11
2. Support setting CUDA_ARCHITECTURES from command line.
2020-09-02 10:13:26 -07:00
Ashwini Khade
8679a7244e
Enable rejecting models based on onnx opset (#4912)
* enable rejecting models based on onnx opset

* enable unreleased opsets in linux and mac CI

* test fixes and more updates

* enable unreleased opsets in CI builds

* enable released opsets in linux cis

* try fix windows ci yml

* yml fixes

* update yml

* yml updates post master merge

* review comments

* bug fix
2020-08-31 13:35:36 -07:00