pytorch/.github/scripts
Huy Do d7f943ec82 [mergebot] Flaky and broken trunk should take precedence over ic (#107761)
I noticed a curious case on https://github.com/pytorch/pytorch/pull/107508 where there was one broken trunk failure and the PR was merged with `merge -ic`.  Because the failure had been classified as unrelated, I expected to see a no-op force merge here.  However, it showed up as a force merge with failure.

![Screenshot 2023-08-22 at 20 01 10](https://github.com/pytorch/pytorch/assets/475357/b9c93e24-8da8-4fc6-9b3d-61b6bd0a8937)

The record on Rockset reveals https://github.com/pytorch/pytorch/pull/107508 has:

* 0 broken trunk checks (unexpected, this should be 1, as Dr. CI clearly says so)
* 1 ignore current check (unexpected, this should be 0 and the failure should be counted as broken trunk instead)
* 3 unstable ROCm jobs (expected)

It turns out that ignore current takes precedence over flaky and broken trunk classification.  This might have been the expectation in the past, but I don't think that's the case now.  The bot should be consistent with what is shown on Dr. CI.  The change here makes flaky, unstable, and broken trunk classifications take precedence over ignore current.  Basically, we only need to ignore new or unrecognized failures that have not yet been classified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107761
Approved by: https://github.com/clee2000
2023-08-23 21:22:56 +00:00
build_triton_wheel.py [inductor] Update triton pin (#102736) 2023-06-12 22:02:13 +00:00
check_labels.py
close_nonexistent_disable_issues.py Close non existent disable issues complete rollout (#106923) 2023-08-10 16:48:14 +00:00
collect_ciflow_labels.py
comment_on_pr.py Add space to merge cancel comment (#107603) 2023-08-21 21:43:15 +00:00
convert_lintrunner_annotations_to_github.py
ensure_actions_will_cancel.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
export_pytorch_labels.py
fetch_latest_green_commit.py Update viable/strict script to ignore unstable jobs (#103899) 2023-06-20 19:24:20 +00:00
file_io_utils.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
filter_test_configs.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
generate_binary_build_matrix.py [aarch64] Add PT Docker build image for aarch64 (#106881) 2023-08-09 20:28:04 +00:00
generate_ci_workflows.py Adding aarch64 wheel CI workflows (#104109) 2023-06-29 18:58:43 +00:00
generate_pytorch_version.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
get_workflow_job_id.py
github_utils.py Use GitHub REST API to get the merge base commit SHA (#105098) 2023-07-14 04:25:45 +00:00
gitutils.py [BE]: Enable ruff rules PIE807 and PIE810 (#106218) 2023-07-28 22:35:56 +00:00
gql_mocks.json Make mergebot work with review comments (#107390) 2023-08-17 21:31:41 +00:00
kill_active_ssh_sessions.ps1
label_utils.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
lint_native_functions.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
parse_ref.py
pr-sanity-check.sh Make sure we get full file path for filtering in pr-sanity-check (#100978) 2023-05-09 18:42:42 +00:00
pytest_cache.py Preserve PyTest Cache across job runs (#100522) 2023-05-10 18:37:28 +00:00
pytest_caching_utils.py Preserve PyTest Cache across job runs (#100522) 2023-05-10 18:37:28 +00:00
README.md
report_git_status.sh
rockset_mocks.json Fix trymerge broken trunk detection when the merge base job was retried (successfully) (#107333) 2023-08-17 02:09:31 +00:00
run_torchbench.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
stop_runner_service.sh
test_check_labels.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
test_fetch_latest_green_commit.py
test_filter_test_configs.py Handle empty PR body in filter_test_configs (#104914) 2023-07-11 10:16:58 +00:00
test_gitutils.py [BE][GHF] Add retries_decorator (#101227) 2023-05-12 20:29:06 +00:00
test_label_utils.py
test_pytest_caching_utils.py Preserve PyTest Cache across job runs (#100522) 2023-05-10 18:37:28 +00:00
test_trymerge.py [mergebot] Flaky and broken trunk should take precedence over ic (#107761) 2023-08-23 21:22:56 +00:00
test_tryrebase.py Bot message changes for -f and rebase (#106150) 2023-07-28 16:13:51 +00:00
trymerge.py [mergebot] Flaky and broken trunk should take precedence over ic (#107761) 2023-08-23 21:22:56 +00:00
trymerge_explainer.py Bot message changes for -f and rebase (#106150) 2023-07-28 16:13:51 +00:00
tryrebase.py Bot message changes for -f and rebase (#106150) 2023-07-28 16:13:51 +00:00
update_commit_hashes.py [CI] Distribute bot workload (#101723) 2023-05-17 21:46:55 +00:00
wait_for_ssh_to_drain.ps1

pytorch/.github

NOTE: This README contains information for the .github directory but cannot be located there because a README placed in .github would override the repo README.

This directory contains workflows and scripts to support our CI infrastructure that runs on GitHub Actions.

Workflows

  • Pull CI (pull.yml) is run on PRs and on main.
  • Trunk CI (trunk.yml) is run on trunk to validate incoming commits. Trunk jobs are usually more expensive to run so we do not run them on PRs unless specified.
  • Scheduled CI (periodic.yml) is a subset of trunk CI that is run every few hours on main.
  • Binary CI is run to package binaries for distribution for all platforms.

Templates

Templates written in Jinja are located in the .github/templates directory and are used to generate the workflow files for binary jobs found in the .github/workflows/ directory. There are also a couple of utility templates that hold common utilities shared among the different templates.

(Re)Generating workflow files

You will need jinja2 in order to regenerate the workflow files. It can be installed using:

pip install -r .github/requirements/regenerate-requirements.txt

Workflows can be generated / regenerated using the following command:

.github/regenerate.sh

Adding a new generated binary workflow

New generated binary workflows can be added in the .github/scripts/generate_ci_workflows.py script. You can use the existing entries in that script as examples when adding a workflow for the platform or configuration you care about.

Different parameters can be used to achieve different goals, e.g. running jobs on a cron, running only on trunk, etc.
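
As a rough illustration of how such parameters could map to workflow triggers, here is a minimal Python sketch. It is not taken from generate_ci_workflows.py: the WorkflowSketch class, its cron and trunk_only fields, and the workflow names are all hypothetical. Only the shape of the returned mapping (push, pull_request, schedule) follows the GitHub Actions "on:" syntax.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class WorkflowSketch:
    """Hypothetical stand-in for the parameters of one generated workflow."""

    name: str
    cron: Optional[str] = None  # cron expression; None means no schedule
    trunk_only: bool = False  # True: run on pushes to main only, not on PRs

    def triggers(self) -> Dict[str, Any]:
        """Build the GitHub Actions `on:` mapping implied by the parameters."""
        on: Dict[str, Any] = {"push": {"branches": ["main"]}}
        if not self.trunk_only:
            # PR runs are enabled unless the workflow is trunk only.
            on["pull_request"] = {}
        if self.cron is not None:
            on["schedule"] = [{"cron": self.cron}]
        return on


# Hypothetical examples: a scheduled trunk-only job and a default job.
nightly = WorkflowSketch(name="example-binary-nightly", cron="0 3 * * *", trunk_only=True)
pr_and_trunk = WorkflowSketch(name="example-binary")

print(nightly.triggers())
print(pr_and_trunk.triggers())
```

In this sketch the trigger logic lives in one method, so adding a new parameter means changing one place; the real script may organize this differently.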

ciflow (trunk)

The label ciflow/trunk can be used to run trunk-only workflows on a PR. This is especially useful when trying to re-land a PR that was reverted for failing a non-default workflow.

Infra

Currently, most of our self-hosted runners are hosted on AWS. For a comprehensive list of available runner types, see .github/scale-config.yml.

Exceptions to AWS for self-hosted runners:

  • ROCm runners

Adding new runner types

New runner types can be added by committing changes to .github/scale-config.yml. Example: https://github.com/pytorch/pytorch/pull/70474

NOTE: New runner types can only be used once the changes to .github/scale-config.yml have made their way into the default branch.

Testing pytorch/builder changes

In order to test changes to the builder scripts:

  1. Specify your builder PR's repo and branch as builder_repo and builder_branch in .github/templates/common.yml.j2.
  2. Regenerate workflow files with .github/regenerate.sh (see above).
  3. Submit a fake PR to PyTorch. If you are changing the binary builds, add an appropriate label such as ciflow/binaries to trigger the builds.