pytorch/.github/scripts
Huy Do d25be63c05 [Reland] Use sudo when reset NVIDIA devices (#88605)
I accidentally deleted my remote branch, so I need to create a new PR for this fix (instead of updating the reverted PR https://github.com/pytorch/pytorch/pull/88531)

TIL, `sudo echo` doesn't do what I thought it does. The correct syntax is `echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset`, which grants sudo permission to the latter `tee` command, the one that actually performs the write.
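
A minimal sketch of the pattern, assuming a POSIX shell. `write_reset` is a hypothetical helper for illustration and is not taken from the actual script. Because `tee` opens the target file itself, a privilege wrapper such as `sudo` placed in front of it applies to the write; an output redirection, by contrast, is opened by the calling shell before the command runs.

```shell
# Hypothetical helper (not from install_nvidia_utils_linux.sh).
# $1: file to write "1" into
# remaining args: optional command prefix for tee, e.g. sudo
write_reset() {
  target="$1"
  shift
  # tee opens the target itself, so the prefix (if any) elevates the write.
  echo "1" | "$@" tee "$target" > /dev/null
}
```

Under these assumptions, the device reset would be invoked as `write_reset "/sys/bus/pci/devices/$PCI_ID/reset" sudo`.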

### Testing

Did my due diligence: actually logged in to `i-07e62045d15df3629` and made sure that the command works
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88605
Approved by: https://github.com/ZainRizvi
2022-11-08 01:17:35 +00:00
build_publish_nightly_docker.sh Advance nightly docker to 11.6 (#87858) 2022-10-28 19:55:33 +00:00
build_triton_wheel.py [CI] Add triton wheels build workflow (#87234) 2022-10-19 03:35:16 +00:00
check_labels.py Move check labels to separate workflow (#87999) 2022-10-31 16:52:30 +00:00
comment_on_pr.py
convert_lintrunner_annotations_to_github.py
ensure_actions_will_cancel.py Disable mem leak check (#88373) 2022-11-04 20:47:42 +00:00
export_pytorch_labels.py
fetch_latest_green_commit.py Allow viable/strict promotion even if periodic or docker-release-builds jobs are failing (#86827) 2022-10-13 00:38:48 +00:00
filter_test_configs.py Disable mem leak check (#88373) 2022-11-04 20:47:42 +00:00
generate_binary_build_matrix.py Stop cuda-10.2 binary builds (#85873) 2022-09-29 15:04:24 +00:00
generate_ci_workflows.py Stop cuda-10.2 binary builds (#85873) 2022-09-29 15:04:24 +00:00
generate_pytorch_version.py
get_workflow_job_id.py
gitutils.py
gql_mocks.json Move check labels to separate workflow (#87999) 2022-10-31 16:52:30 +00:00
install_nvidia_utils_linux.sh [Reland] Use sudo when reset NVIDIA devices (#88605) 2022-11-08 01:17:35 +00:00
kill_active_ssh_sessions.ps1
lint_native_functions.py
parse_ref.py [BE] Get rid of deprecation warnings in workflows (take 3) (#87152) 2022-10-18 13:53:30 +00:00
pr-sanity-check.sh .github: Improve sanity check for generated files (#86143) 2022-10-03 23:38:55 +00:00
README.md Fix typos under .github directory (#87828) 2022-10-27 00:01:10 +00:00
report_git_status.sh
run_torchbench.py Upload the benchmark result to S3 and post the URL (#84726) 2022-09-09 22:01:23 +00:00
test_check_labels.py Create workflow to make sure PRs have valid labels (#86829) 2022-10-21 17:39:29 +00:00
test_fetch_latest_green_commit.py Allow viable/strict promotion even if periodic or docker-release-builds jobs are failing (#86827) 2022-10-13 00:38:48 +00:00
test_filter_test_configs.py [BE] Get rid of deprecation warnings in workflows (take 3) (#87152) 2022-10-18 13:53:30 +00:00
test_gitutils.py
test_trymerge.py [GHF] Make EasyCLA unskippable (#86161) 2022-10-03 22:50:06 +00:00
test_tryrebase.py
trymerge.py [GHF] Remove CC line from commit message (#88252) 2022-11-01 22:17:12 +00:00
trymerge_explainer.py [ci] fix bot comment (#87127) 2022-10-17 21:27:21 +00:00
tryrebase.py print stderr for ghstack rebase (#87795) 2022-10-26 22:10:10 +00:00
update_commit_hashes.py
wait_for_ssh_to_drain.ps1

pytorch/.github

NOTE: This README contains information for the .github directory but cannot be located there because it would override the repo README.

This directory contains workflows and scripts to support our CI infrastructure that runs on GitHub Actions.

Workflows

  • Pull CI (pull.yml) is run on PRs and on master.
  • Trunk CI (trunk.yml) is run on trunk to validate incoming commits. Trunk jobs are usually more expensive to run so we do not run them on PRs unless specified.
  • Scheduled CI (periodic.yml) is a subset of trunk CI that is run every few hours on master.
  • Binary CI is run to package binaries for distribution for all platforms.

Templates

Templates written in Jinja are located in the .github/templates directory and are used to generate the workflow files for binary jobs found in the .github/workflows/ directory. There are also a couple of utility templates that define common utilities shared among the different templates.

(Re)Generating workflow files

You will need jinja2 in order to regenerate the workflow files. It can be installed using:

pip install -r .github/requirements.txt

Workflows can be generated / regenerated using the following command:

.github/regenerate.sh

Adding a new generated binary workflow

New generated binary workflows can be added in the .github/scripts/generate_ci_workflows.py script. Use the existing examples in that script as a reference for adding the workflow to the stream that is relevant to you.

Different parameters can be used to achieve different goals, e.g. running jobs on a cron schedule or running only on trunk.

ciflow (trunk)

The label ciflow/trunk can be used to run trunk-only workflows on a PR. This is especially useful when trying to re-land a PR that was reverted for failing a non-default workflow.
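
As a rough sketch, the label could be applied from the command line, assuming the GitHub CLI (gh) is installed and authenticated and that your account is allowed to label PRs in the repository. The function name add_ciflow_trunk and the GH override variable are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: add the ciflow/trunk label to a PR.
# $1: pull request number
# GH: command to invoke instead of gh (e.g. GH=echo for a dry run)
add_ciflow_trunk() {
  "${GH:-gh}" pr edit "$1" --add-label "ciflow/trunk"
}
```

Running it with GH=echo prints the command that would be executed instead of calling GitHub, which is a cheap way to check the arguments first.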

Infra

Currently most of our self-hosted runners are hosted on AWS. For a comprehensive list of available runner types, see .github/scale-config.yml.

Self-hosted runners that are not hosted on AWS:

  • ROCm runners

Adding new runner types

New runner types can be added by committing changes to .github/scale-config.yml. Example: https://github.com/pytorch/pytorch/pull/70474

NOTE: New runner types can only be used once the changes to .github/scale-config.yml have made their way into the default branch.