ROCm python package pipeline failed because this
PR(https://github.com/microsoft/onnxruntime/pull/16325) changed onnx
version to a commit and we need to build onnx from source. Low protobuf
version will cause build errors.
This PR remove `cmake ` and `protobuf ` from Dockerfile, these two will
install by `install_os_deps.sh`.
### Description
All our Windows build pipelines already uses cmake 3.26 except one
pipeline: QNN ARM64.
This PR does the same for Linux build pipelines.
### Motivation and Context
This change is related to #15704 .
rocm python packaging pipeline failed because manylinux version and
manylinux.patch update.
1. fix duplicate `epel-release` installation issue, ROCm pipeline
install it at the begin of the dockerfile to install rocm libs. remove
duplicate installation on install-runtime-packages.sh.
```
/var/tmp/yum-root-sMRl36/epel-release-latest-7.noarch.rpm: does not update installed package.
Error: Nothing to do
```
2. add python10 to fix error below.
```
+ /opt/python/cp310-cp310/bin/python -m venv /opt/_internal/tools
build_scripts/finalize.sh: line 40: /opt/python/cp310-cp310/bin/python: No such file or directory
```
3. add python10 to rocm pipeline.
pipeline link:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=294776&view=results
- Use java/gradlew directly in .github/workflows/publish-java-apidocs.yml.
- Remove use of deleted step from tools/ci_build/github/azure-pipelines/android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml.
- Remove Gradle installations and PATH updates from Dockerfiles and scripts. Now Gradle wrapper is used so a system Gradle installation is not needed.
### Description
<!-- Describe your changes. -->
1. Remove ROCm5.3 pipeline because it has rocblas bug, we don't need it.
2. We removed the dependency on centos docker image provided by
AMD(https://hub.docker.com/r/rocm/dev-centos-7) and build ROCm centos
base image by ourselves. The reference
dockerfile(https://github.com/RadeonOpenCompute/ROCm-docker/blob/master/dev/Dockerfile-centos-7)
is very redundant for our need. We simplified the ROCm manylinux
dockerfile.
3. Different versions of rocm use the same dockerfile
`Dockerfile.manylinux2014_rocm`.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
* make work for both rocm 4.2 and rocm 4.3.1
* fix rocm 4.3.1 docker image reference
* fix CUDA_VERSION to ROCM_VERSION
* fix ReduceConsts conflict def
* add ifdef to miopen_common.h as well
* trailing ws
* install protobuf from source
* fix rm command in Dockerfile
* fix options on rm command
* fix cd into protobuf source directory
* try again
* remove strip step
* debug list the files
* ls on /usr
* more debug
* more debug
* adjust LD_LIBRARY_PATH
* try remove protobuf before ORT build
1. Update SDLNativeRules from v2 to v3. The new one allows us setting excluded paths.
2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it.
3. Fix some parentheses warnings
4. Update cmake to the latest.
5. Remove "--x86" build option from pipeline yaml files. Now we can auto-detect cpu architecture from python. So we don't need to ask user to specify it.
* first attempt share docker image across python and torch versons
* set dependency between jobs
* fix yaml grammer
* remove python version from first stage
* clean deepspeed directroy
* split into two images according torch version
* fix yaml syntax
* invalidate cache
* remove DS to prevent torch 1.9.0 upgrade
1. Update manylinux build scripts. This will add [PEP600](https://www.python.org/dev/peps/pep-0600/)(manylinux2 tags) support. numpy has adopted this new feature, we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they has been deleted and replaced in the upstream repo. The manylinux repo doesn't have a manylinux2014 branch anymore. So I'm removing the obsolete code, sync the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1(after a discussion with PMs).
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2. (Merged the content to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11)
4. Modernize the cmake code of how to locate python devel files. It was suggested in https://github.com/onnx/onnx/pull/1631 .
5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use.
6. Update Ubuntu docker images that used in our CI build from Ubuntu 18.04 to Ubuntu 20.04.
7. Update GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1
8. Split Linux GPU CI pipeline to two jobs: build the code on a CPU machine then run the tests on another GPU machines. In the past we didn't test our python packages. We only tested the pre-packed files. So we didn't catch the rpath issue in CI build.
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines.
10. Rework ARM64 Linux GPU python packaging pipeline. Previously it uses cross-compiling therefore we must static link to C Runtime. But now have pluggable EP API and it doesn't support static link. So I changed to use qemu emulation instead. Now the build is 10x slower than before. But it is more extensible.
* Install and use conda on ortmodule CI pipelines
* Update build script to install onnxruntime wheel before running unit tests
* Remove python 3.5 from install_python_deps
* Pinning deepspeed version to 0.3.15
* first attempt rocm training wheel
* modifications needed to python packaging pipeline for Rocm 4.1
* changges to not conflict with cuda
missed stage1 changes
remove package push
add option r to getopt
try again without python install
try again without python install
try again without python install
split pipelines and add back push to remote storage
try on cuda gpu pool
try again
try again
try running without az subscription set
try again on original pipeline
change pool
passing AMD Rocm whl on AMD-GPU pool
split rocm pipeline from cuda pipeline
remove comments
* try adding Rocm tests as well
* try with tests in place
* fix trailing ws
* add training data
* try again as root for tests
* use python3
* typo
* try to map video, render group into container
* try again
* try again
* try to avoid yum error code
* make UID 1001
* try without yum downgrade
* define rocm_version=None
* remove CUDA related comments for Rocm Dockerfile
* Dont pin nightly torch torchvision torchtext versions as they expire (for now nightly is required for Rocm 4.1)
* missed requirements-rocm.txt from last commit
* fix whitespace