Commit graph

12 commits

Author SHA1 Message Date
Yi Zhang
0d1da41ca8
Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612)
### Description
Improve docker commands to make docker image layer caching works.
It can make docker building faster and more stable.
So far, A100 pool's system disk is too small to use docker cache.
We won't use pipeline cache for docker image and remove some legacy
code.

### Motivation and Context
There are often an exception of
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
Because Onnxruntime pipeline have been sending too many requests to
download Nodejs in docker building.
Which is the major reason of pipeline failing now

In fact, docker image layer caching never works.
We can always see the scrips are still running
```
#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
#9 0.236 +++ tr -dc 0-9.
#9 0.236 +++ cut -d . -f1
#9 0.238 ++ os_major_version=8
....
#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
#9 60.59 + return 0
...
```

This PR is improving the docker command to make image layer caching
work.
Thus, CI won't send so many redundant request of downloading NodeJS.
```
#9 [2/5] ADD scripts /tmp/scripts
#9 CACHED

#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#10 CACHED

#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
#11 CACHED

#12 [5/5] WORKDIR /home/onnxruntimedev
#12 CACHED
```

###Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-08-06 21:37:09 +08:00
Changming Sun
67bc9438d7
Update training packaging pipeline's docker files (#20853)
### Description
Similar to #20786 . The last PR was able to update all pipelines and all
docker files. This is a follow-up to that PR.

### Motivation and Context
1. To extract the common part as a reusable build infra among different
ONNX Runtime projects.
2. Avoid hitting docker hub's limit: 429 Too Many Requests - Server
message: toomanyrequests: You have reached your pull rate limit. You may
increase the limit by authenticating and upgrading:
https://www.docker.com/increase-rate-limit
2024-05-30 23:48:42 -07:00
Yi Zhang
338e6672dd
use build.sourceversion in cache image key (#15019)
### Description
Use build.sourceversion in docker image cache key.



### Motivation and Context
We used filpath as the cache key in #14496.
In most cases, the docker base image tag is latest.
So, the hash of the files couldn't be aware of the change of base image.
As the result, the docker image restored, but the image will still be
rebuilt .
The maintenance cost would be huge if we pin image hash in docker file.
For example,
https://quay.io/repository/pypa/manylinux2014_x86_64?tab=tags&tag=latest,
it's updated almost every week.
So far, the build.sourceversion is the right way to keep cache is
updated and valid.
2023-03-24 10:01:22 +08:00
Yi Zhang
ca315b9148
Use ADO cache to cache docker image instead of ACR (#14496)
### Description
Now, we only enable image cache in pipeline cache for Linux Aten
Pipeline.
It'll be enabled in other Linux pipelines gradually.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fixed
[AB#13143](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13143)


### Verification
1. No Image Cache in Pipeline

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904531&view=results
2. Use Cached Image in Pipeline

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904533&view=results
2023-03-11 10:32:02 +08:00
Yi Zhang
d55ae490e1
detach patch manylinux from get_docker_image (#14958)
### Description
Make patch manylinux one single step.


### Motivation and Context
If we want to use hash of docker-related files as the cache key, the
files should keep consistent before and after docker build.
And changes in generated build_scripts should trigger rebuilding the
image as well.
2023-03-09 15:40:58 +08:00
Changming Sun
29ed8811e5
Move C/C++ deps' URLs to deps.txt (#13769)
### Description
1. Move C/C++ deps' URLs to deps.txt, and download the dependencies from
Azure Devops Artifacts instead of github.
2. Add "EXCLUDE_FROM_ALL" keyword to the cmake external projects, so
that we only build the parts we need and avoid installing the 3rd-party
dependencies when people run `make install` in ORT's build directory.
However, at this moment cmake itself doesn't have the feature. So I
copied their code to cmake/external/helper_functions.cmake and modified
it.

This PR is split from #13523, to make that one smaller. 

### Motivation and Context
1. Secure the supply chain
2. Make it be possible to automatically detect if ORT has an old
dependency that hasn't been updated from a long time.
2022-11-29 18:06:35 -08:00
Changming Sun
7b4ce0c1e1
Delete the build scripts that were copied from manylinux project (#12358)
1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead.
2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343
3. Change the cuda version of linux gpu build job of nuget packaging pipeline from cuda 11.4 to cuda 11.6 to match the TRT job within the same pipeline.. (A lot other places need be updated as well, but I'd prefer to put them in another PR)
4. Make dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu . The former one relies on a runtime variable $(DockerFile), Template Parameters are expanded early in processing a pipeline run when most variables are not available. It like C++ macros vs variables.
2022-07-29 18:24:19 -07:00
Chi Lo
aebbb90b79
Integrate C-API tests into Pipelines for release packages (#10794)
* add c-api test for package

* fix bug for running c-api test for package

* refine run application script

* remove redundant code

* include CUDA test

* Remove testing CUDA EP temporarily

* fix bug

* Code refactor

* try to fix YAML bug

* try to fix YAML bug

* try to fix YAML bug

* fix bug for multiple directories in Pipelines

* fix bug

* add comments and fix bug

* Update c-api-noopenmp-packaging-pipelines.yml

* Remove failOnStandardError flag in Pipelines
2022-03-15 10:18:38 -07:00
liqunfu
196e6702ad
to support multiple cuda versions in published onnxruntime-training package (#7468)
to support multiple CUDA versions in published onnxruntime-training package
2021-04-27 17:15:33 -07:00
Edward Chen
cd3a5acca0
Update get_docker_image.py to enable use without image cache container registry. (#6177)
Update get_docker_image.py to enable use without image cache container registry.
2020-12-18 19:01:02 -08:00
Edward Chen
e357486707
Fix build definition template typo, add logging (#6065)
Fix a typo in tools/ci_build/github/azure-pipelines/templates/get-docker-image-steps.yml.
Add logging to tools/ci_build/get_docker_image.py for easier debugging.
2020-12-08 15:16:50 -08:00
Edward Chen
6d642a3dba
Replace direct pulls from image cache container registry with get_docker_image.py, build definition clean up. (#5906) 2020-12-01 19:10:23 -08:00