Some implementations, like OpenDAL, do not work with AWS IMDSv2, but this script bridges the gap and enables more recent `sccache` releases (which switched from simple-s3 to OpenDAL) to work in the current CI system.
When launched it prints something like:
```
export AWS_ACCESS_KEY_ID=XXXXX
export AWS_SECRET_ACCESS_KEY=YYYY
export AWS_SESSION_TOKEN=ZZZZ
```
which can be `eval`ed, after which sccache can use those credentials.
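For reference, a minimal sketch of how such a script can obtain the credentials (the metadata paths below are the standard EC2 IMDSv2 endpoints; the actual script in this PR may differ):
```
import json
from urllib.request import Request, urlopen

IMDS = "http://169.254.169.254"

def imds_get(path: str, token: str) -> str:
    # IMDSv2 requires the session token on every metadata request
    req = Request(f"{IMDS}{path}", headers={"X-aws-ec2-metadata-token": token})
    with urlopen(req) as resp:
        return resp.read().decode()

def main() -> None:
    # Obtain an IMDSv2 session token via a PUT request
    token_req = Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urlopen(token_req) as resp:
        token = resp.read().decode()
    # Look up the instance role and its temporary credentials
    role = imds_get("/latest/meta-data/iam/security-credentials/", token).strip()
    creds = json.loads(imds_get(f"/latest/meta-data/iam/security-credentials/{role}", token))
    print(f"export AWS_ACCESS_KEY_ID={creds['AccessKeyId']}")
    print(f"export AWS_SECRET_ACCESS_KEY={creds['SecretAccessKey']}")
    print(f"export AWS_SESSION_TOKEN={creds['Token']}")

if __name__ == "__main__":
    main()
```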
Validated in https://github.com/pytorch/pytorch/pull/121323
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121426
Approved by: https://github.com/Skylion007
25 min -> 17 + 13 min, which is still not as fast as I want it to be but I'll take it
Lintrunner provides some parallelism by default, but it's not perfect
Reducing fetch-depth from all to 1 further reduces time by ~2-3 minutes
From the non-clang job's logs:
```
2024-02-09T22:05:39.5297616Z Requirement already satisfied: PyYAML==6.0 in /opt/conda/lib/python3.11/site-packages (6.0)
2024-02-09T22:12:23.6164708Z Collecting black==23.12.1
```
I don't know why this part takes so long; maybe it's just buffering? The clang version doesn't show this issue.
See 5a750c8035
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119575
Approved by: https://github.com/huydhn, https://github.com/malfet
Due to PR_WINDOW, if the magic string exists in the body but the PR was not updated recently, the query wouldn't find it and would delete the branch. Instead, query separately for branches with the no-delete-branch label, which I created recently.
Might as well query for branches with open PRs while we're at it, so PRs with the stale label won't get their branches deleted either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119399
Approved by: https://github.com/huydhn
Example https://github.com/pytorch/pytorch/actions/runs/7562281351/job/20592425611?pr=117079 (The code to delete branches isn't being run, it's just listing the branches it wants to delete)
Internal code: https://fburl.com/code/hdvvbfkj
Threshold for a branch with a PR is 30 days, regardless of whether the PR is merged (compared to 3 days if merged and 30 days if closed). Threshold for a branch without a PR is 1.5 years (same internally).
Cap of ~400 queries to GitHub per run so it doesn't hit token usage limits. Currently this leads to about 350 branches deleted per run.
Only query for the last 90 days of updated PRs to reduce token usage, so if a branch has a PR that was updated 90+ days ago, it will be treated as having no PR and will fall back to the 1.5-year branch update check instead, regardless of whether the PR is open or closed.
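A minimal sketch of the decision logic these thresholds imply (names and structure are illustrative, not the actual script):
```
from datetime import datetime, timedelta, timezone

WITH_PR_THRESHOLD = timedelta(days=30)      # branch has a recently updated PR (merged or closed)
WITHOUT_PR_THRESHOLD = timedelta(days=548)  # ~1.5 years; no PR found in the 90-day query window

def branch_is_stale(last_updated: datetime, has_recent_pr: bool) -> bool:
    threshold = WITH_PR_THRESHOLD if has_recent_pr else WITHOUT_PR_THRESHOLD
    return datetime.now(timezone.utc) - last_updated > threshold
```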
I tested that it could delete my own branch and it worked.
labeled with test-config/crossref because I just want the smallest test config possible to reduce CI usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117079
Approved by: https://github.com/malfet
Instead rely on `GitHubPR.default_branch()`, which is the name of the repo's default branch.
Do not pass a branch name when `merge_changes` is called, as it is set to the default branch inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118530
Approved by: https://github.com/clee2000
Mention co-authors in PR body
Modify `CommitAuthors` to query the first two commit `authors`, which makes sure that authors from suggested commits are recognized.
Test plan: CI + check `get_authors()` on a few PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118347
Approved by: https://github.com/kit1980
# Changes
* introduce a `--check-mergeability` trymerge flag that attempts to merge the PR locally, using the same merge logic as the mergebot, but requires only a read-only `GITHUB_TOKEN` and a git repo (see the sketch after this list)
* change mergeability workflow to utilize the new --check-mergeability logic
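A rough sketch of the idea behind the flag, assuming it boils down to cherry-picking the PR's commits onto the default branch and treating any conflict as a failure (the real `trymerge.py` reuses the mergebot's merge logic and differs in details):
```
import subprocess
from typing import List

def check_mergeability(repo_dir: str, default_branch: str, pr_commits: List[str]) -> bool:
    def git(*args: str) -> None:
        subprocess.run(["git", "-C", repo_dir, *args], check=True)

    git("fetch", "origin", default_branch)
    git("checkout", "-B", "mergeability-check", f"origin/{default_branch}")
    try:
        for sha in pr_commits:
            # A merge conflict makes cherry-pick exit non-zero, which raises CalledProcessError
            git("cherry-pick", "-x", sha)
    except subprocess.CalledProcessError:
        subprocess.run(["git", "-C", repo_dir, "cherry-pick", "--abort"], check=False)
        return False
    return True
```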
# Alternatives considered
1.
> Rewrite `https://github.com/pytorch/test-infra/actions/workflows/pr-dependencies-check.yml` to correctly support partially merged ghstacks.
That would be a slightly better approach, but ROI is lower, as it requires reimplementing trymerge logic and additional effort to consolidate the codebase (trymerge lives in pytorch repo).
`pr-dependencies-check.yml` still produces human-readable results for partially merged ghstack PRs (even if it falsely reports them as non-mergeable).
2.
> Instead of introducing a new trymerge flag, use existing flags, including `--dry-run`.
That didn't work, as no combination of existing flags skips the rule checks and Rockset lookups.
# Testing
1. Manual testing `trymerge.py --check-mergeability` on the regular and ghstack PRs:
```
export GITHUB_TOKEN=
export GIT_REPO_DIR=`pwd`
export GITHUB_REPOSITORY=pytorch/pytorch
export GIT_REMOTE_URL=https://github.com/pytorch/pytorch
# Test 1 (2 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 117862
Skipping 1 of 2 PR (#117859) as its already been merged
echo $?
0
# Test 2 (3 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 118125
Skipping 1 of 3 PR (#117859) as its already been merged
echo $?
0
# Test 3 (3 prs, intentional conflicts introduced into `main`):
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 118125
Skipping 1 of 3 PR (#117859) as its already been merged
stdout:
Auto-merging torch/_inductor/ir.py
Auto-merging torch/_inductor/lowering.py
CONFLICT (content): Merge conflict in torch/_inductor/lowering.py
error: could not apply 66ba5b8792f... Realize inputs to DynamicScalar before unwrapping
...
RuntimeError: Command `git -C /Users/ivanzaitsev/pytorch2 cherry-pick -x 66ba5b8792fa076c4e512d920651e5b6b7e466f4` returned non-zero exit code 1
```
2. Workflow run:
https://github.com/pytorch/pytorch/actions/runs/7660736172/job/20878651852?pr=118258
<img width="516" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/28fbf0d2-ac2a-4518-b41d-b32b41373747">
<img width="621" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/ddbf8566-a417-43ec-9d0e-f623f4a71313">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118258
Approved by: https://github.com/PaliC, https://github.com/huydhn
Test [ci-verbose-test-logs] (this worked: the test logs print while running, are interleaved, and are really long)
Adds settings for no timeout (the step timeout still applies; this only gets rid of the ~30 min timeout for a shard of a test file) and for not piping logs / extra verbose test logs (good for debugging deadlocks, but results in very long and possibly interleaved logs).
Also allows these to be set via the PR body if the label name is in brackets, e.g. [label name], as in the test above.
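A hypothetical sketch of how bracketed labels could be extracted from the PR body (the actual parsing code may differ):
```
import re
from typing import Set

LABEL_IN_BRACKETS = re.compile(r"\[([a-zA-Z0-9][a-zA-Z0-9 ./_-]*)\]")

def labels_from_pr_body(body: str) -> Set[str]:
    # Collect every [label-name] occurrence from the PR body
    return {match.group(1).strip() for match in LABEL_IN_BRACKETS.finditer(body)}

# e.g. labels_from_pr_body("Test [ci-verbose-test-logs]") == {"ci-verbose-test-logs"}
```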
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117668
Approved by: https://github.com/huydhn
Add dry-run support for labels so we can run trymerge locally with dry run without actually affecting the PR.
Make Dr.CI results easier to read (previously a massive JSON dump, now just the job names + ids, in a nicer format).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118240
Approved by: https://github.com/huydhn
As usual, almost no work on the PyTorch side; all changes are on the builder end, namely:
- 8b67d32929 - depend on `blas * mkl` only on x86 machines
- eb78393f1e - install arm64 conda when running on Apple Silicon
- 0d3aea4ee0 - constrain llvmdev-9 to x86 machines only
- 6c6a33b271 - set correct DEVELOPER_DIR path
TODO:
- We should auto-detect this `DEVELOPER_DIR` via `xcode-select`
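A sketch of what that auto-detection could look like (assuming `xcode-select -p` prints the active developer directory, which it does on macOS):
```
import subprocess

def detect_developer_dir() -> str:
    # `xcode-select -p` prints the currently selected developer directory
    return subprocess.check_output(["xcode-select", "-p"], text=True).strip()
```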
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117801
Approved by: https://github.com/atalman
This handles the case where the base of the stack targets the default branch rather than the base branch. But as the default branch is likely to have advanced since the PR was made, search for the merge base before determining whether `base`..`head` is in sync with the `orig` branch.
Also, rather than hardcoding the default branch name, fetch it from `GitHubPR.default_branch()`.
Test Plan: https://github.com/malfet/deleteme/pull/77
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116873
Approved by: https://github.com/ezyang
By adding `get_ghstack_dependent_prs`, which uses `git branch --contains` to find all PRs containing the stacked branch, selecting the longest one (in terms of distance between origin and default branch) and skipping all open PRs.
Please note that reverts should be applied in the reverse of the order in which the PRs were originally landed.
Use a bit of defensive programming, i.e. revert a single PR if the attempt to fetch dependencies fails for some reason.
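A sketch of that fallback; `get_ghstack_dependent_prs` is the helper named above, while `revert_single_pr` is a hypothetical stand-in for whatever actually posts the revert:
```
from trymerge import GitRepo, GitHubPR, get_ghstack_dependent_prs

def revert_stack(repo: GitRepo, pr: GitHubPR) -> None:
    try:
        # Per the test plan below, the returned list is already in revert order
        # (dependent PRs first, the PR itself last)
        to_revert = get_ghstack_dependent_prs(repo, pr)
    except Exception as err:
        # Defensive fallback: if dependencies can't be determined, revert only this PR
        print(f"Failed to fetch dependent PRs ({err}); reverting a single PR")
        to_revert = [("", pr)]
    for _sha, dependent_pr in to_revert:
        revert_single_pr(repo, dependent_pr)  # hypothetical helper
```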
Test plan:
- Lint
- ```
>>> from trymerge import GitRepo, GitHubPR, get_ghstack_prs, get_ghstack_dependent_prs
>>> pr=GitHubPR("pytorch", "pytorch", 115188)
>>> pr1=GitHubPR("pytorch", "pytorch", 115210)
>>> repo=GitRepo("/Users/nshulga/git/pytorch/pytorch")
>>> get_ghstack_dependent_prs(repo, pr1)
[('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x100f32f40>)]
>>> get_ghstack_dependent_prs(repo, pr)
[('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x10102eaf0>), ('76b1d44d576c20be79295810904c589241ca1bd2', <trymerge.GitHubPR object at 0x10102eb50>)]
>>> rc=get_ghstack_dependent_prs(repo, pr)
>>> rc[0][1].pr_num
115210
>>> rc[1][1].pr_num
115188
```
- see: https://github.com/malfet/deleteme/pull/59#issuecomment-1869904714 and https://github.com/malfet/deleteme/pull/74#issuecomment-1870542702
Fixes https://github.com/pytorch/test-infra/issues/4845
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116447
Approved by: https://github.com/huydhn
ghstack dependencies: #116446
Prep change for allowing stacked reverts
This is a no-op that factors out some helper functions that will be useful later:
- `get_pr_commit_sha` finds a committed sha for a given PR
- `_revlist_to_prs` converts a revlist to GitHubPRs conditionally
filtering some out
- `do_revert_prs` reverts multiple PRs in a batch, but so far is
invoked with only one PR
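Illustrative-only signatures for these helpers (the real definitions in `trymerge.py` likely differ in parameters and return types):
```
from typing import Callable, List, Optional, Tuple

from trymerge import GitRepo, GitHubPR

def get_pr_commit_sha(repo: GitRepo, pr: GitHubPR) -> str:
    """Find the committed sha for a given PR."""
    ...

def _revlist_to_prs(
    repo: GitRepo,
    pr: GitHubPR,
    rev_list: List[str],
    should_skip: Optional[Callable[[GitHubPR], bool]] = None,
) -> List[Tuple[GitHubPR, str]]:
    """Convert a revlist to GitHubPRs, conditionally filtering some out."""
    ...

def do_revert_prs(repo: GitRepo, prs_with_shas: List[Tuple[GitHubPR, str]], dry_run: bool = False) -> None:
    """Revert multiple PRs in a batch (so far invoked with only one PR)."""
    ...
```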
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116446
Approved by: https://github.com/huydhn, https://github.com/seemethere
Not sure if this is a recent API change or what, but `gh_get_labels('malfet', 'deleteme')` used to raise an exception (see https://github.com/malfet/deleteme/actions/runs/7334535266/job/19971328673#step:6:37 )
```
File "/home/runner/work/deleteme/deleteme/.github/scripts/label_utils.py", line 50, in get_last_page_num_from_header
link_info[link_info.rindex(prefix) + len(prefix) : link_info.rindex(suffix)]
AttributeError: 'NoneType' object has no attribute 'rindex'
```
And with this fix it returns the expected list
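The fix presumably guards against a missing pagination header, something along these lines (the `prefix`/`suffix` values and the header access are assumptions):
```
from typing import Any

def get_last_page_num_from_header(header: Any) -> int:
    link_info = header.get("link")
    if link_info is None:
        # No "link" header means the results fit on a single page
        return 1
    prefix = "&page="
    suffix = ">;"
    return int(link_info[link_info.rindex(prefix) + len(prefix) : link_info.rindex(suffix)])
```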
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116444
Approved by: https://github.com/huydhn
Constant-time access of the first value in a collection. This is a constant-time operation instead of converting the collection to a list to get the first item, which is linear. The rule is turned on, which automatically autofixes and enforces this.
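For illustration (not taken from the PR), the pattern the rule prefers:
```
items = {"a", "b", "c"}

# Linear: materializes the whole collection just to read one element
first = list(items)[0]

# Constant time: pulls a single element straight from the iterator
first = next(iter(items))
```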
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
1. This tags docker images using docker pull/tag/push for the current release
2. Sets the RELEASE_VERSION_TAG var and regenerates the workflows using the new docker tag
3. Removes the conda token setting and binary tests release changes; these are already automated
4. Pins unstable and disabled jobs, automated in: https://github.com/pytorch/pytorch/pull/111675
Test:
```
RELEASE_VERSION=2.2 ./scripts/release/apply-release-changes.sh
Tagging pytorch/manylinux-builder:cuda11.8-main to pytorch/manylinux-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cuda12.1-main to pytorch/manylinux-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda11.8-main to pytorch/libtorch-cxx11-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda12.1-main to pytorch/libtorch-cxx11-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.6-main to pytorch/manylinux-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.7-main to pytorch/manylinux-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.6-main to pytorch/libtorch-cxx11-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.7-main to pytorch/libtorch-cxx11-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cpu-main to pytorch/manylinux-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cpu-main to pytorch/libtorch-cxx11-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-main to pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-2.2 , dry_run: enabled
Tagging pytorch/manylinuxaarch64-builder:cpu-aarch64-main to pytorch/manylinuxaarch64-builder:cpu-aarch64-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda11.8-main to pytorch/conda-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda12.1-main to pytorch/conda-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cpu-main to pytorch/conda-builder:cpu-2.2 , dry_run: enabled
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml
```
Result of pinning unstable and disabled jobs:
```
# The link to the published list of disabled jobs
DISABLED_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/disabled-jobs.json?versionid=kKJlAXdrUbk3CilXbKu.6OwNTGQB8a.B"
# and unstable jobs
UNSTABLE_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json?versionid=vzaicOxSsh55iXBXwgGrW6dFeVtPfrhr"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114355
Approved by: https://github.com/malfet
Move the pytest cache downloading into the build step and store it in additional ci files so that it stays consistent during sharding.
Only the build env is taken into account now, instead of also the test config, since we might not have the test config during build time. This makes the cache less specific, but I also think this might be better since tests are likely to fail across the same test config (I also think it might be worth not even looking at the build env, but that's a different topic).
Each cache upload should only include information from the current run. Do not merge current cache with downloaded cache during upload (shouldn't matter anyways since the downloaded cache won't exist at the time)
From what I can tell of the S3 retention policy, pytest cache files will be deleted after 30 days (cc @ZainRizvi to confirm), so we never have to worry about space or pulling old versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113804
Approved by: https://github.com/ZainRizvi
Abort merges invoked with `-r` if there is nothing to rebase
Make `rebase_onto`/`rebase_ghstack_onto` return False if the rebase is a no-op, and abort the merge in that case.
Remove the `-e` option from both trymerge and tryrebase workflows, as one should never report failures on workflow dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113960
Approved by: https://github.com/clee2000
The bot that creates the issue got changed, but the search did not, so it wasn't finding old PRs and was just making new ones.
This PR makes it reuse PRs again instead of making a new one every time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113961
Approved by: https://github.com/huydhn