onnxruntime/tools/ci_build/github/azure-pipelines
Yi Zhang 0d1da41ca8
Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612)
### Description
Improve the docker commands so that docker image layer caching works.
This makes docker builds faster and more stable.
For now, the A100 pool's system disk is too small to use the docker cache.
We no longer use the pipeline cache for docker images, and some legacy
code is removed.

### Motivation and Context
Docker builds often fail with an exception like the following:
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
This happens because the ONNX Runtime pipelines send too many requests to
download Node.js during docker builds.
It is currently the major cause of pipeline failures.

In fact, docker image layer caching has never worked.
The build log shows that the install scripts still run on every build:
```
#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
#9 0.236 +++ tr -dc 0-9.
#9 0.236 +++ cut -d . -f1
#9 0.238 ++ os_major_version=8
....
#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
#9 60.59 + return 0
...
```

This PR improves the docker command so that image layer caching
works.
As a result, CI no longer sends so many redundant requests to download Node.js.
The build log now reports the layers as cached:
```
#9 [2/5] ADD scripts /tmp/scripts
#9 CACHED

#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#10 CACHED

#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
#11 CACHED

#12 [5/5] WORKDIR /home/onnxruntimedev
#12 CACHED
```
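
As a rough illustration of the idea, the sketch below assembles a `docker buildx build` command line that reuses layers from a previously pushed image. It is not the command used in this PR. The function name, the image reference, and the choice of an inline cache stored in a registry are all assumptions made for the example.

```python
import shlex


def build_cached_image_command(image, dockerfile, context, push=False):
    """Return an argv list for a BuildKit build that reuses cached layers.

    All argument values are hypothetical placeholders, not names from this PR.
    """
    cmd = [
        "docker", "buildx", "build",
        "--file", dockerfile,
        "--tag", image,
        # Try to reuse layers from the previously pushed image.
        "--cache-from", f"type=registry,ref={image}",
        # Embed cache metadata in the image so later builds can reuse it.
        "--cache-to", "type=inline",
    ]
    # --push uploads the result to the registry;
    # --load keeps it in the local image store instead.
    cmd.append("--push" if push else "--load")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    argv = build_cached_image_command(
        "example.azurecr.io/ort-ci:latest",  # hypothetical image reference
        "Dockerfile",
        ".",
        push=True,
    )
    print(shlex.join(argv))
```

When the cached layers match, steps such as the `RUN` line that runs `install_deps.sh` are skipped, so the Node.js download inside it is not repeated.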

### Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-08-06 21:37:09 +08:00
nodejs/templates Adding Job names to jobs without a name (#20961) 2024-06-06 19:09:21 -07:00
nuget/templates [TensorRT EP] support TensorRT 10.2-GA (#21395) 2024-07-18 12:11:52 -07:00
stages Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
templates Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612) 2024-08-06 21:37:09 +08:00
triggers
android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml Update QNN pipeline pool (#21482) 2024-07-29 10:00:21 -07:00
android-x86_64-crosscompile-ci-pipeline.yml Fix Android CI Pipeline code coverage failure (#21504) 2024-07-26 07:36:23 +10:00
bigmodels-ci-pipeline.yml Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612) 2024-08-06 21:37:09 +08:00
binary-size-checks-pipeline.yml Clean up some mobile package related files and their usages. (#21606) 2024-08-05 16:38:20 -07:00
build-perf-test-binaries-pipeline.yml
c-api-noopenmp-packaging-pipelines.yml Split ondevice training cpu packaging pipeline to a separated pipeline (#21485) 2024-07-25 10:58:34 -07:00
c-api-training-packaging-pipelines.yml Move on-device training packages publish step (#21539) 2024-07-29 09:59:46 -07:00
clean-build-docker-image-cache-pipeline.yml
cuda-packaging-pipeline.yml [TensorRT EP] support TensorRT 10.2-GA (#21395) 2024-07-18 12:11:52 -07:00
linux-ci-pipeline.yml Update training packaging pipeline's docker files (#20853) 2024-05-30 23:48:42 -07:00
linux-cpu-aten-pipeline.yml
linux-cpu-eager-pipeline.yml
linux-cpu-minimal-build-ci-pipeline.yml Update training packaging pipeline's docker files (#20853) 2024-05-30 23:48:42 -07:00
linux-dnnl-ci-pipeline.yml Update training packaging pipeline's docker files (#20853) 2024-05-30 23:48:42 -07:00
linux-gpu-ci-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
linux-gpu-tensorrt-ci-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
linux-gpu-tensorrt-daily-perf-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
linux-migraphx-ci-pipeline.yml change ci docker image to rocm6.1 (#21296) 2024-07-18 14:50:01 +08:00
linux-openvino-ci-pipeline.yml Update OpenVino CI Ubuntu to 22.04 (#21127) 2024-07-09 09:56:44 -07:00
linux-qnn-ci-pipeline.yml [QNN EP] Update to QNN SDK 2.24.0 (#21463) 2024-07-24 10:17:12 -07:00
mac-ci-pipeline.yml Delete pyop (#21094) 2024-06-19 16:21:33 -07:00
mac-coreml-ci-pipeline.yml
mac-ios-ci-pipeline.yml Upgrade min ios version to 13.0 (#20773) 2024-06-04 10:15:20 -07:00
mac-ios-packaging-pipeline.yml Upgrade min ios version to 13.0 (#20773) 2024-06-04 10:15:20 -07:00
mac-react-native-ci-pipeline.yml
npm-packaging-pipeline.yml
nuget-cuda-publishing-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
orttraining-linux-ci-pipeline.yml
orttraining-linux-gpu-ci-pipeline.yml
orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml custom allreduce cuda kernel (#20703) 2024-06-13 11:09:49 -07:00
orttraining-linux-nightly-ortmodule-test-pipeline.yml
orttraining-mac-ci-pipeline.yml
orttraining-pai-ci-pipeline.yml Replace inline pip install with pip install from requirements*.txt (#21106) 2024-07-22 12:39:10 -07:00
orttraining-py-packaging-pipeline-cpu.yml disables qnn in ort training cpu pipeline (#21510) 2024-07-26 17:23:35 +08:00
orttraining-py-packaging-pipeline-cuda.yml Update training packaging pipeline's docker files (#20853) 2024-05-30 23:48:42 -07:00
orttraining-py-packaging-pipeline-cuda12.yml Update training packaging pipeline's docker files (#20853) 2024-05-30 23:48:42 -07:00
orttraining-py-packaging-pipeline-rocm.yml [ROCm] Update ck to use ck_tile (#21030) 2024-06-19 14:06:10 +08:00
post-merge-jobs.yml [TensorRT EP] support TensorRT 10.2-GA (#21395) 2024-07-18 12:11:52 -07:00
publish-nuget.yml Move on-device training packages publish step (#21539) 2024-07-29 09:59:46 -07:00
py-cuda-package-test-pipeline.yml
py-cuda-packaging-pipeline.yml
py-cuda-publishing-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
py-package-build-pipeline.yml
py-package-test-pipeline.yml [TensorRT EP] support TensorRT 10.2-GA (#21395) 2024-07-18 12:11:52 -07:00
py-packaging-pipeline.yml [QNN EP] Update to QNN SDK 2.24.0 (#21463) 2024-07-24 10:17:12 -07:00
qnn-ep-nuget-packaging-pipeline.yml [QNN EP] Update to QNN SDK 2.24.0 (#21463) 2024-07-24 10:17:12 -07:00
rocm-nuget-packaging-pipeline.yml Make ROCm packaging stages to a single workflow (#21235) 2024-07-04 11:07:04 +08:00
web-ci-pipeline.yml Fix typos according to reviewdog report. (#21335) 2024-07-22 13:37:32 -07:00
win-ci-fuzz-testing.yml Uppdate nuget to Use Nuget 6.10.x (#21209) 2024-06-28 19:49:54 -07:00
win-ci-pipeline.yml add vitisai ep build stage to Windows CPU Pipeline (#21361) 2024-07-15 19:34:08 -07:00
win-gpu-cuda-ci-pipeline.yml Separating all GPU stages into different Pipelines (#21521) 2024-07-26 14:54:45 -07:00
win-gpu-dml-ci-pipeline.yml Separating all GPU stages into different Pipelines (#21521) 2024-07-26 14:54:45 -07:00
win-gpu-doc-gen-ci-pipeline.yml Separating all GPU stages into different Pipelines (#21521) 2024-07-26 14:54:45 -07:00
win-gpu-reduce-op-ci-pipeline.yml Move jobs in onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10 (#21023) 2024-06-12 22:04:40 -07:00
win-gpu-tensorrt-ci-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
win-gpu-training-ci-pipeline.yml Separating all GPU stages into different Pipelines (#21521) 2024-07-26 14:54:45 -07:00
win-qnn-arm64-ci-pipeline.yml [QNN EP] Update to QNN SDK 2.24.0 (#21463) 2024-07-24 10:17:12 -07:00
win-qnn-ci-pipeline.yml [QNN EP] Update to QNN SDK 2.24.0 (#21463) 2024-07-24 10:17:12 -07:00