pytorch/.github/workflows/generated-docker-builds.yml
Michael Suo 3fcdff0615 Set pull_request checkout to head sha
**this is a re-submit of this PR, the previous version broke forked pull requests by checkout out the head ref as opposed to the head sha**

There are two commits that we test sometimes in CI:
1. The merge commit (a test merge between the PR head ref and the latest base ref)
2. The head ref (the exact commit that was at the head of the user's branch when they pushed).

This distinction is fairly subtle; in the case of 1, you are effectively running against a "rebased" version of your PR's branch. The problem is that we use *both* of these commits today, with confusing results—depending on how you put up your PR and what workflows are running, we might be testing two different commits!

We should probably consolidate on one. This would eliminate a subtle but complex part of our CI (I am mildly horrified by the complexity of [this explanation](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#which-commit-is-used-in-ci), although it's heroic that someone went and documented it lol). This PR consolidates on using the head ref (option 2).
- This is the behavior of phabricator/fbcode, which many PT devs will be more familiar with.
- This is the behavior of ghstack
- Our master branch moves quite quickly, so the chance that there is a substantial divergence between your local test runs and CI is high, with confusing results that are nondeterministic based on when you put up the PR.
- We use a linear history/squash-rebase-merge workflow, which is better modeled by option 2. Option 1 effectively emulates a merge-commit-style workflow.

The primary disadvantage is that now when re-running workflows, you will not be re-running against a "rebased" version of the PR, but the exact head ref that was pushed. Tbh I find it quite unintuitive that what you're testing changes depending on when you press the re-run button, but I know at least @malfet does this so it's worth mentioning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71974
2022-02-16 21:03:09 +00:00

174 lines
8.4 KiB
YAML
Generated

# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/docker_builds_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: docker-builds
on:
workflow_dispatch:
pull_request:
types: [opened, synchronize, reopened]
paths:
- '.circleci/docker/**'
- '.github/workflows/generated-docker-builds.yml'
schedule:
- cron: 1 3 * * 3
concurrency:
group: docker-builds-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
AWS_DEFAULT_REGION: us-east-1
jobs:
docker-build:
runs-on: linux.2xlarge
timeout-minutes: 240
strategy:
matrix:
include:
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7'
docker_image_short_name: 'pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7'
docker_image_short_name: 'pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9'
docker_image_short_name: 'pytorch-linux-bionic-py3.7-clang9'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.3.1-py3.7'
docker_image_short_name: 'pytorch-linux-bionic-rocm4.3.1-py3.7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.5-py3.7'
docker_image_short_name: 'pytorch-linux-bionic-rocm4.5-py3.7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang5-android-ndk-r19c'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang5-asan'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-asan'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang7-asan'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-onnx'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang7-onnx'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4'
docker_image_short_name: 'pytorch-linux-xenial-py3.7-gcc5.4'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-py3.7-gcc7'
env:
DOCKER_IMAGE_BASE: '${{ matrix.docker_image_base }}'
name: docker-build (${{ matrix.docker_image_short_name }})
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE}"
mkdir "${GITHUB_WORKSPACE}"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af