1. Update google benchmark from 1.8.3 to 1.8.5
2. Update google test from commit in main branch to tag 1.15.0
3. Update pybind11 from 2.12.0 to 2.13.1
4. Update pytorch cpuinfo to include the support for Arm Neoverse V2,
Cortex X4, A720 and A520.
5. Update re2 from 2024-05-01 to 2024-07-02
6. Update cmake to 3.30.1
7. Update Linux docker images
8. Fix a warning in test/perftest/ort_test_session.cc:826:37: error:
implicit conversion loses integer precision: 'streamoff' (aka 'long
long') to 'const std::streamsize' (aka 'const long')
[-Werror,-Wshorten-64-to-32]
### Description
<!-- Describe your changes. -->
There is a bug for kernel running on rocm6.0, so change ci docker image
to rocm6.1
For the torch installed in the docker image, change to rocm repo when it
is not 6.0 version.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
ROCm CI pipeline issue.
```
Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20...
main()
File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main
datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset
builder_instance.download_and_prepare(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare
self._download_and_prepare(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators
data_file = dl_manager.download_and_extract(self.config.data_url)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract
return self.extract(self.download(url_or_urls))
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download
downloaded_path_or_paths = map_nested(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested
return function(data_struct)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 220, in _download
return cached_path(url_or_filename, download_config=download_config)
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path
output_path = get_from_cache(
File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache
raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip
```
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Update the `datasets` pipeline to latest version `2.17.0`.
### Description
<!-- Describe your changes. -->
This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is
the final opset supported by torchscript exporter
(https://github.com/pytorch/pytorch/pull/107829)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Engineering excellence contribution for ORT Training DRI.
---------
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Fix pytest version to 7.4.4, higher version will cause error
`from onnxruntime.capi import onnxruntime_validation
ModuleNotFoundError: No module named 'onnxruntime.capi'`
### Description
Version 41.0.0 currently used has vulnerabilities.
### Motivation and Context
See [Vulnerable OpenSSL included in cryptography
wheels](https://github.com/advisories/GHSA-v8gr-m533-ghj9)
- Update ROCm and MIGraphX CI to ROCm5.7
- Simplify test exculde file. Some tests will output `registered
execution providers ROCMExecutionProvider were unable to run the model.`
if they cannot run.
- Add `enable_training` build argument for MIGraphX pipeline.
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.
2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
### Description
1. As a follow-up of #16761, this PR allows build ORT on iOS/Android
without the need to explicitly specify a protoc path. #16761 is for
WASM. This one is for iOS/Android
2. Update the MacOS/Linux build scripts that build/install protobuf from
source. Make them be more flexible. Add the support for
RedHatEnterprise(ubi), which will needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile :
the docker file's base image has preinstalled protobuf in /usr/local, we
should uninstall them to avoid conflicts.
This PR mainly optimize ROCm CI test to reduce time and CPU utilization.
- use smaller batch size on strided_batched_gemm/batched_gemm test
- disable cpu training test
- fix test_e2e_padding_elimination Occasional failures on ROCm.
- Move ROCm build step on CPU only machine
- Add the performance data of the huggingface bert-large model on the
MI200
- At the beginning of the test step, check the agent's GPU usage and
kill the threads occupying the GPU, which may be left over from previous
tasks that exited abnormally.
- Use different docker images during the build and test steps. The
difference is the `uid` and `user` when build docker image and create
docker container.
Some CI jobs may interrupted unexpectedly and didn't execute umount data
step. The data left in host device will cause `device or resource busy`
and make subsequent CI jobs fail.
Move the mount data step into docker container, the host machine will
not be occupied when CI jobs exit incorrectly.
MIGraphX CI
- Change docker container user name to `onnxruntimedev`
ROCm CI
- Build docker image every job instead of using prebuild image.
- Every job create a container with only one GPU with command `docker
run -it --device=/dev/kfd --device=/dev/dri/renderDxxx`
- Remove tests that are unstable or use outdated interfaces.
- Enable training ortmodule test.
update ROCm/MIGraphX CI to ROC5.5.
TODO:
two PR to fix failure on
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py
-
test_gradient_correctness_minmax/test_gradient_correctness_argmax_unfold/test_gradient_correctness_argmax_diagonal
(https://github.com/microsoft/onnxruntime/pull/15903)
- test_ortmodule_attribute_name_collision_warning
(https://github.com/microsoft/onnxruntime/pull/15884)
### Description
They were missed in #15707 , because they are not in common places for Dockerfiles.
Though this commit updated tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile, it won't automatically take effect. The image needs to be manually generated and pushed to a place, and before doing that our CMakeLists.txt also needs to be tweaked a little bit.
### Description
Use pytest-xdist to distribute tests across multiple CPUs to speed up
test execution.
Use pytest-rerunfailures to rerun failed test in case of pytest-xdist
crash.
`pytest -n 16` can reduce pytest time from 80 minutes to 20 minutes.
### Motivation and Context
Now kernel explorer pytest of ROCm CI takes nearly 1 hour 20 minutes. It
will take longer time when we add more tunableOp in the future.
## Description
1. Convert some git submodules to cmake external projects
2. Update nsync from
[1.23.0](https://github.com/google/nsync/releases/tag/1.23.0) to
[1.25.0](https://github.com/google/nsync/releases/tag/1.25.0)
3. Update re2 from 2021-06-01 to 2022-06-01
4. Update wil from an old commit to 1.0.220914.1 tag
5. Update gtest to a newer commit so that it can optionally leverage
absl/re2 for parsing command line flags.
The following git submodules are deleted:
1. FP16
2. safeint
3. XNNPACK
4. cxxopts
5. dlpack
7. flatbuffers
8. googlebenchmark
9. json
10. mimalloc
11. mp11
12. pthreadpool
More will come.
## Motivation and Context
There are 3 ways of integrating 3rd party C/C++ libraries into ONNX
Runtime:
1. Install them to a system location, then use cmake's find_package
module to locate them.
2. Use git submodules
6. Use cmake's external projects(externalproject_add).
At first when this project was just started, we considered both option 2
and option 3. We preferred option 2 because:
1. It's easier to handle authentication. At first this project was not
open source, and it had some other non-public dependencies. If we use
git submodule, ADO will handle authentication smoothly. Otherwise we
need to manually pass tokens around and be very careful on not exposing
them in build logs.
2. At that time, cmake fetched dependencies after "cmake" finished
generating vcprojects/makefiles. So it was very difficult to make cflags
consistent. Since cmake 3.11, it has a new command: FetchContent, which
fetches dependencies when it generates vcprojects/makefiles just before
add_subdirectories, so the parent project's variables/settings can be
easily passed to the child projects.
And when the project went on, we had some new concerns:
1. As we started to have more and more EPs and build configs, the number
of submodules grew quickly. For more developers, most ORT submodules are
not relevant to them. They shouldn't need to download all of them.
2. It is impossible to let two different build configs use two different
versions of the same dependency. For example, right now we have protobuf
3.18.3 in the submodules. Then every EP must use the same version.
Whenever we have a need to upgrade protobuf, we need to coordinate
across the whole team and many external developers. I can't manage it
anymore.
3. Some projects want to manage the dependencies in a different way,
either because of their preference or because of compliance
requirements. For example, some Microsoft teams want to use vcpkg, but
we don't want to force every user of onnxruntime using vcpkg.
7. Someone wants to dynamically link to protobuf, but our build script
only does static link.
8. Hard to handle security vulnerabilities. For example, whenever
protobuf has a security patch, we have a lot of things to do. But if we
allowed people to build ORT with a different version of protobuf without
changing ORT"s source code, the customer who build ORT from source will
be able to act on such things in a quicker way. They will not need to
wait ORT having a patch release.
9. Every time we do a release, github will also publish a source file
zip file and a source file tarball for us. But they are not usable,
because they miss submodules.
### New features
After this change, users will be able to:
1. Build the dependencies in the way they want, then install them to
somewhere(for example, /usr or a temp folder).
2. Or download the dependencies by using cmake commands from these
dependencies official website
3. Similar to the above, but use your private mirrors to migrate supply
chain risks.
4. Use different versions of the dependencies, as long as our source
code is compatible with them. For example, you may use you can't use
protobuf 3.20.x as they need code changes in ONNX Runtime.
6. Only download the things the current build needs.
10. Avoid building external dependencies again and again in every build.
### Breaking change
The onnxruntime_PREFER_SYSTEM_LIB build option is removed you could think from now
it is default ON. If you don't like the new behavior, you can set FETCHCONTENT_TRY_FIND_PACKAGE_MODE to NEVER.
Besides, for who relied on the onnxruntime_PREFER_SYSTEM_LIB build
option, please be aware that this PR will change find_package calls from
Module mode to Config mode. For example, in the past if you have
installed protobuf from apt-get from ubuntu 20.04's official repo,
find_package can find it and use it. But after this PR, it won't. This
is because that protobuf version provided by Ubuntu 20.04 is too old to
support the "config mode". It can be resolved by getting a newer version
of protobuf from somewhere.
### Description
<!-- Describe your changes. -->
Unit test with ROCm5.3 slower than ROCm5.2.3. Revert to ROCm5.2.3.
We will update to ROCm5.3 when the issue resloved by AMD.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Update for ROCm CI before reland tunable GEMM #12853. This PR also update
composable kernel to use CMakes's HIP language support so that we can
mix C/C++ compiler with HIP compiler instead of locking to hip-clang
* [UPDATE] update ci to rocm5.2 + torch1.11
* [Revert] disable ort module test
* [DELETE] delete Rocm5.1.1 ci test result
* [UPDATE] update the comments
* [UPDATE] update amd ci pipeline 2 rocm5.1.1
* [FIX] json format error
* [ERROR] disable unit tests
* [FIX] ucx error
* [FIX] cmake version
* [FIX] units test
* update to torch 1.10
* update torchvision version
* update torchtext version
* remove deprecated option enable_onnx_checker
* add unit test to test gradient of GatherElements
* add ORTMODULE_ONNX_OPSET_VERSION in a docker file
* try to run inside 4.3.1 container
* no \ in container run command
* remove networking options
* try with adding video render groups
* add job to build docker image
* try without 1st stage
* change alpha, beta to float
* try adding service connection
* retain huggingface directory
* static video and render gid
* use runtime expression for variables
* install torch-ort
* pin sacrebleu==1.5.1
* update curves for rocm 4.3.1
* try again
* disable determinism and only check tail of loss curve and with a much larger threshold of 0.05
* disable RoBERTa due to high run variablity on ROCm 4.3.1
* put reduction unit tests back in