pytorch/.github/scripts
Catherine Lee de9ddd19a5 Various CI settings (#117668)
Test [ci-verbose-test-logs] (this worked: the test logs print while the tests run, are interleaved, and are really long)

Adds settings for no timeout (the step timeout still applies; this only removes the ~30 min timeout for a shard of a test file) and for no piping logs/extra verbose test logs (good for debugging deadlocks, but results in very long and possibly interleaved logs).

Also allows these to be set via the PR body if the label name is in brackets, e.g. [label name] or the test above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117668
Approved by: https://github.com/huydhn
2024-01-26 00:17:29 +00:00
build_triton_wheel.py Update Triton pin (#117873) 2024-01-23 21:05:30 +00:00
check_labels.py [BE]: enable ruff rules PLR1722 and PLW3301 (#109461) 2023-09-18 02:07:21 +00:00
close_nonexistent_disable_issues.py Close non existent disable issues complete rollout (#106923) 2023-08-10 16:48:14 +00:00
collect_ciflow_labels.py [BE] Apply ufmt to run_test and GitHub Python util scripts (#97588) 2023-03-26 04:52:55 +00:00
comment_on_pr.py Add space to merge cancel comment (#107603) 2023-08-21 21:43:15 +00:00
convert_lintrunner_annotations_to_github.py [BE] Apply ufmt to run_test and GitHub Python util scripts (#97588) 2023-03-26 04:52:55 +00:00
drci_mocks.json.gz Improve the error message when a PR lacks the necessary approvals (#116161) 2023-12-22 00:22:43 +00:00
ensure_actions_will_cancel.py Fix concurrency limits for Create Release (#110759) 2023-10-06 23:14:12 +00:00
export_pytorch_labels.py [BE] Apply ufmt to run_test and GitHub Python util scripts (#97588) 2023-03-26 04:52:55 +00:00
file_io_utils.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
filter_test_configs.py Various CI settings (#117668) 2024-01-26 00:17:29 +00:00
generate_binary_build_matrix.py Upgrade nightly wheels to rocm6.0 (#116983) 2024-01-11 20:36:00 +00:00
generate_ci_workflows.py [CI] Build M1 conda binaries on M1 runners (#117801) 2024-01-19 14:31:12 +00:00
generate_docker_release_matrix.py Use matrix generate script for docker release workflows (#115949) 2023-12-18 20:20:59 +00:00
generate_pytorch_version.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
get_workflow_job_id.py [ci][ez] Add job_id to emit_metrics (#113099) 2023-11-08 10:32:41 +00:00
github_utils.py Handle the case when opening a reverted PR with deleted head branch (#114423) 2023-11-23 07:32:46 +00:00
gitutils.py [GHF] Add support for new style stacks (#116873) 2024-01-05 20:32:24 +00:00
gql_mocks.json.gz Improve the error message when a PR lacks the necessary approvals (#116161) 2023-12-22 00:22:43 +00:00
kill_active_ssh_sessions.ps1
label_utils.py [mergebot] Dry run for labels + easier to read Dr CI result (#118240) 2024-01-25 23:06:43 +00:00
lint_native_functions.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
parse_ref.py [BE] Apply ufmt to run_test and GitHub Python util scripts (#97588) 2023-03-26 04:52:55 +00:00
pr-sanity-check.sh Make sure we get full file path for filtering in pr-sanity-check (#100978) 2023-05-09 18:42:42 +00:00
pytest_cache.py [td] Consistent pytest cache (#113804) 2023-11-17 23:45:47 +00:00
pytest_caching_utils.py [td] Consistent pytest cache (#113804) 2023-11-17 23:45:47 +00:00
README.md Rename default branch to main (#99210) 2023-04-16 18:48:14 -07:00
report_git_status.sh
rockset_mocks.json.gz [BE] Clean up trymerge code handling broken trunk failures (#111520) 2023-10-19 02:30:56 +00:00
stop_runner_service.sh Stop runner service when its GPU crashes (#97585) 2023-03-29 21:17:13 +00:00
tag_docker_images_for_release.py [RelEng] Tag docker images for release, pin unstable and disabled jobs, apply release only changes (#114355) 2023-11-23 02:14:22 +00:00
test_check_labels.py [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) 2023-07-19 01:24:44 +00:00
test_filter_test_configs.py Various CI settings (#117668) 2024-01-26 00:17:29 +00:00
test_gitutils.py [BE][GHF] Add retries_decorator (#101227) 2023-05-12 20:29:06 +00:00
test_label_utils.py [BE] Apply ufmt to run_test and GitHub Python util scripts (#97588) 2023-03-26 04:52:55 +00:00
test_pytest_caching_utils.py Preserve PyTest Cache across job runs (#100522) 2023-05-10 18:37:28 +00:00
test_trymerge.py [mergebot] Dry run for labels + easier to read Dr CI result (#118240) 2024-01-25 23:06:43 +00:00
test_tryrebase.py Bot message changes for -f and rebase (#106150) 2023-07-28 16:13:51 +00:00
trymerge.py [mergebot] Dry run for labels + easier to read Dr CI result (#118240) 2024-01-25 23:06:43 +00:00
trymerge_explainer.py Bot message changes for -f and rebase (#106150) 2023-07-28 16:13:51 +00:00
tryrebase.py [GHF] Abort merge on rebase failure (#113960) 2023-11-17 23:11:00 +00:00
wait_for_ssh_to_drain.ps1

pytorch/.github

NOTE: This README contains information for the .github directory but cannot be located there because it will overwrite the repo README.

This directory contains workflows and scripts to support our CI infrastructure that runs on GitHub Actions.

Workflows

  • Pull CI (pull.yml) is run on PRs and on main.
  • Trunk CI (trunk.yml) is run on trunk to validate incoming commits. Trunk jobs are usually more expensive to run, so we do not run them on PRs unless requested with the ciflow/trunk label.
  • Scheduled CI (periodic.yml) is a subset of trunk CI that is run every few hours on main.
  • Binary CI is run to package binaries for distribution for all platforms.

Templates

Templates written in Jinja are located in the .github/templates directory and are used to generate the workflow files for binary jobs found in the .github/workflows/ directory. There are also a couple of utility templates that define common pieces shared among the other templates.

(Re)Generating workflow files

You will need jinja2 to regenerate the workflow files. It can be installed using:

pip install -r .github/requirements/regenerate-requirements.txt

Workflows can be generated / regenerated using the following command:

.github/regenerate.sh
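
The install and regenerate steps above can also be chained from Python. This is a minimal sketch, assuming it is run from the repository root; the helper name `regenerate_workflows` and its `dry_run` flag are hypothetical and not part of the repo. It invokes pip through the current interpreter (`python -m pip`), which is an alternative spelling of the `pip install` command shown above.

```python
import subprocess
import sys
from typing import List

# Commands taken from the README text; paths are relative to the repo root.
INSTALL_CMD = [
    sys.executable, "-m", "pip", "install", "-r",
    ".github/requirements/regenerate-requirements.txt",
]
REGENERATE_CMD = [".github/regenerate.sh"]


def regenerate_workflows(dry_run: bool = True) -> List[List[str]]:
    """Return the commands in order; execute them unless dry_run is set.

    Hypothetical helper for illustration only.
    """
    commands = [INSTALL_CMD, REGENERATE_CMD]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `regenerate_workflows(dry_run=False)` would install the requirements and then run the regeneration script, stopping if either step fails.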

Adding a new generated binary workflow

New generated binary workflows can be added in the .github/scripts/generate_ci_workflows.py script. Use the existing examples in that script as a reference when adding a workflow for the platform or configuration you care about.

Different parameters can be used to achieve different goals, e.g. running jobs on a cron, running only on trunk, etc.
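
To illustrate how such parameters might map to workflow triggers, here is a minimal sketch. The `WorkflowTriggers` class and its fields are hypothetical and do not mirror the actual data structures in generate_ci_workflows.py. Only the shape of the resulting mapping (a `schedule` list with `cron` entries, a `push` entry with `branches`) follows the GitHub Actions `on:` syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkflowTriggers:
    """Hypothetical trigger parameters for one generated workflow."""

    # Cron expression, e.g. "0 */6 * * *" runs at minute 0 of every sixth hour.
    cron: Optional[str] = None
    # Branches whose pushes trigger the workflow, e.g. ["main"] for trunk only.
    push_branches: List[str] = field(default_factory=list)

    def to_on_block(self) -> Dict[str, object]:
        """Build the mapping that would sit under `on:` in the workflow YAML."""
        on: Dict[str, object] = {}
        if self.cron is not None:
            on["schedule"] = [{"cron": self.cron}]
        if self.push_branches:
            on["push"] = {"branches": list(self.push_branches)}
        return on


# Two illustrative configurations: one scheduled, one trunk-only.
periodic = WorkflowTriggers(cron="0 */6 * * *")
trunk_only = WorkflowTriggers(push_branches=["main"])
```

Keeping the parameters in a small typed object and deriving the trigger mapping from it means a template only has to render one value, whichever combination of triggers is requested.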

ciflow (trunk)

The label ciflow/trunk can be used to run trunk-only workflows on a PR. This is especially useful when re-landing a PR that was reverted for failing a non-default workflow.
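
As a rough sketch of the idea, the function below decides from a PR's labels whether trunk-only workflows were requested. The function name and the plain string comparison are assumptions for illustration; the actual triggering mechanism lives in the CI tooling and may work differently.

```python
from typing import Iterable

# Label name taken from the README text above.
TRUNK_LABEL = "ciflow/trunk"


def wants_trunk_ci(labels: Iterable[str]) -> bool:
    """Return True if any label on the PR opts in to trunk-only workflows.

    Hypothetical helper; it only compares label names.
    """
    return any(label.strip() == TRUNK_LABEL for label in labels)
```

For example, `wants_trunk_ci(["ciflow/trunk"])` is true, while a PR labelled only with `ciflow/binaries` would not opt in.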

Infra

Currently, most of our self-hosted runners are hosted on AWS. For a comprehensive list of available runner types, see .github/scale-config.yml.

Exceptions to AWS for self hosted:

  • ROCm runners

Adding new runner types

New runner types can be added by committing changes to .github/scale-config.yml. Example: https://github.com/pytorch/pytorch/pull/70474

NOTE: New runner types can only be used once the changes to .github/scale-config.yml have landed in the default branch.
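
One way to catch a runner label that is not defined yet is to compare the labels a workflow uses against those in the config. This is a minimal sketch, assuming the config file has already been parsed into a dictionary. The function name, the `runner_types` key, and the sample labels are illustrative assumptions, not a copy of the real .github/scale-config.yml.

```python
from typing import Dict, Iterable, List


def undefined_runners(config: Dict[str, dict], used: Iterable[str]) -> List[str]:
    """Return the runner labels in `used` that the parsed config does not define.

    Hypothetical helper; assumes runner types are keys under "runner_types".
    """
    defined = set(config.get("runner_types", {}))
    return sorted(label for label in used if label not in defined)


# Illustrative data only; these are not real runner entries.
sample_config = {
    "runner_types": {
        "linux.example.large": {},
        "windows.example.large": {},
    }
}
missing = undefined_runners(
    sample_config, ["linux.example.large", "linux.example.gpu"]
)
```

Here `missing` contains only `linux.example.gpu`, the one label the sample config does not define.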

Testing pytorch/builder changes

In order to test changes to the builder scripts:

  1. Specify your builder PR's repo and branch as builder_repo and builder_branch in .github/templates/common.yml.j2.
  2. Regenerate workflow files with .github/regenerate.sh (see above).
  3. Submit a test PR to PyTorch. If you are changing the binary builds, add an appropriate label such as ciflow/binaries to trigger the builds.