Commit graph

685 commits

Author SHA1 Message Date
Huy Do
6f340c6f30 Handle the case when opening a reverted PR with deleted head branch (#114423)
When reopening a reverted PR, `422: Unprocessable Entity` is returned when the head branch has been deleted, for example https://github.com/pytorch/pytorch/pull/112889#issuecomment-1823216686

```
{
  "message": "Validation Failed",
  "errors": [
    {
      "resource": "PullRequest",
      "code": "custom",
      "field": "state",
      "message": "state cannot be changed. The commsplit branch has been deleted."
    }
  ],
  "documentation_url": "https://docs.github.com/rest/pulls/pulls#update-a-pull-request"
}
```

The revert still happens though; only reopening the PR fails, which I think is ok to ignore in this case, instead of going the more complicated route of having the merge bot try to restore the deleted branch.
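The ignore-and-continue behavior could be sketched as follows (a minimal illustration; `update_pr_state` and `reopen_pr_best_effort` are hypothetical names, not the actual trymerge helpers):

```python
from urllib.error import HTTPError

def reopen_pr_best_effort(update_pr_state, pr_number: int) -> bool:
    """Try to reopen a PR; swallow the 422 GitHub returns when the head
    branch has been deleted (the revert itself has already landed)."""
    try:
        update_pr_state(pr_number, state="open")
        return True
    except HTTPError as e:
        if e.code == 422:  # "state cannot be changed" - head branch is gone
            return False
        raise
```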
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114423
Approved by: https://github.com/malfet, https://github.com/kit1980
2023-11-23 07:32:46 +00:00
atalman
7a697c4683 [RelEng] Tag docker images for release, pin unstable and disabled jobs, apply release only changes (#114355)
1. Tags docker images for the current release using docker pull/tag/push
2. Sets the RELEASE_VERSION_TAG variable and regenerates the workflows with the new docker tag
3. Removes the conda token setting and the binary-test release changes, since these are already automated
4. Pins unstable and disabled jobs; automates: https://github.com/pytorch/pytorch/pull/111675

Test:
```
RELEASE_VERSION=2.2 ./scripts/release/apply-release-changes.sh
Tagging pytorch/manylinux-builder:cuda11.8-main to pytorch/manylinux-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cuda12.1-main to pytorch/manylinux-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda11.8-main to pytorch/libtorch-cxx11-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda12.1-main to pytorch/libtorch-cxx11-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.6-main to pytorch/manylinux-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.7-main to pytorch/manylinux-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.6-main to pytorch/libtorch-cxx11-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.7-main to pytorch/libtorch-cxx11-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cpu-main to pytorch/manylinux-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cpu-main to pytorch/libtorch-cxx11-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-main to pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-2.2 , dry_run: enabled
Tagging pytorch/manylinuxaarch64-builder:cpu-aarch64-main to pytorch/manylinuxaarch64-builder:cpu-aarch64-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda11.8-main to pytorch/conda-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda12.1-main to pytorch/conda-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cpu-main to pytorch/conda-builder:cpu-2.2 , dry_run: enabled
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml
```

Result of pinning unstable and disabled jobs:
```
# The link to the published list of disabled jobs
DISABLED_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/disabled-jobs.json?versionid=kKJlAXdrUbk3CilXbKu.6OwNTGQB8a.B"
# and unstable jobs
UNSTABLE_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json?versionid=vzaicOxSsh55iXBXwgGrW6dFeVtPfrhr"
```
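The pull/tag/push step above could be sketched like this (illustrative Python; the real `apply-release-changes.sh` is a shell script, and `release_tag_commands` is a hypothetical helper):

```python
def release_tag_commands(image: str, version: str) -> list:
    """Commands to re-tag '<image>-main' as '<image>-<version>' for a release.

    `image` is the repository plus tag prefix, e.g.
    'pytorch/manylinux-builder:cuda11.8'; in a dry run the commands are
    printed instead of executed.
    """
    src = f"{image}-main"
    dst = f"{image}-{version}"
    return [
        ["docker", "pull", src],
        ["docker", "tag", src, dst],
        ["docker", "push", dst],
    ]
```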
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114355
Approved by: https://github.com/malfet
2023-11-23 02:14:22 +00:00
atalman
995fae6060 Move small pypi build as default for linux cuda 12.1 (#114281)
This is the first PR to resolve https://github.com/pytorch/pytorch/issues/113972.
Makes our small wheel build the default.
Test:
```
pip3 install --no-cache-dir --pre torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl  --index-url https://download.pytorch.org/whl/nightly/cu121
Looking in indexes: https://download.pytorch.org/whl/nightly/cu121
Processing ./torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl
Collecting filelock (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting typing-extensions>=4.8.0 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Collecting sympy (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/sympy-1.11.1-py3-none-any.whl (6.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 253.4 MB/s eta 0:00:00
Collecting networkx (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/networkx-3.0rc1-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 387.1 MB/s eta 0:00:00
Collecting jinja2 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/Jinja2-3.1.2-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 365.3 MB/s eta 0:00:00
Collecting fsspec (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/fsspec-2023.4.0-py3-none-any.whl (153 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.0/154.0 kB 370.6 MB/s eta 0:00:00
Collecting pytorch-triton==2.1.0+6e4932cda8 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-2.1.0%2B6e4932cda8-cp310-cp310-linux_x86_64.whl (125.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.4/125.4 MB 384.1 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 404.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 402.5 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 383.9 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 406.9 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 388.2 MB/s eta 0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 410.5 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 272.9 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 381.5 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 394.6 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.19.3 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 384.7 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 281.8 MB/s eta 0:00:00
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvjitlink_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (19.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.8/19.8 MB 367.3 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0 (from jinja2->torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting mpmath>=0.19 (from sympy->torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/mpmath-1.2.1-py3-none-any.whl (532 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 532.6/532.6 kB 391.3 MB/s eta 0:00:00
Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, pytorch-triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114281
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-11-22 00:10:03 +00:00
Catherine Lee
dab272eed8 [td] Consistent pytest cache (#113804)
Moves the pytest cache download into the build step and stores it with the additional CI files so that it stays consistent across shards.

Only the build environment is taken into account now, instead of also the test config, since the test config might not be available at build time. That makes the key less specific, but this might actually be better, since tests are likely to fail across the same test config. (It might even be worth not looking at the build environment at all, but that's a different topic.)

Each cache upload should only include information from the current run. Do not merge the current cache with the downloaded cache during upload (this shouldn't matter anyway, since the downloaded cache won't exist at that time).

From what I can tell of the S3 retention policy, pytest cache files will be deleted after 30 days (cc @ZainRizvi to confirm), so we never have to worry about space or pulling old versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113804
Approved by: https://github.com/ZainRizvi
2023-11-17 23:45:47 +00:00
Nikita Shulga
3fc38e6c83 [GHF] Abort merge on rebase failure (#113960)
Abort merges invoked with `-r` if there is nothing to rebase

Make `rebase_onto`/`rebase_ghstack_onto` return False if rebase is no-op and abort merge in that case

Remove the `-e` option from both the trymerge and tryrebase workflows, as one should never report failures on workflow dispatch
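The abort-on-no-op flow could be sketched as follows (a simplified illustration; the callables stand in for the real `rebase_onto` and merge logic, whose signatures differ):

```python
def merge_with_rebase(rebase_onto, do_merge) -> str:
    """Sketch of the -r flow: rebase_onto returns False when the rebase is
    a no-op, in which case the merge is aborted instead of proceeding."""
    if not rebase_onto():
        return "aborted: nothing to rebase"
    return do_merge()
```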

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113960
Approved by: https://github.com/clee2000
2023-11-17 23:11:00 +00:00
Catherine Lee
c51827b8ce [ez] Hash update to reuse issues again (#113961)
The bot that creates the issue was changed, but the search was not, so it wasn't finding old PRs and was just creating new ones.

This PR makes it reuse PRs again instead of creating a new one every time.
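The reuse-before-create logic could be sketched like this (illustrative only; `search_prs` and `create_pr` are hypothetical stand-ins for the bot's GitHub queries):

```python
def find_or_create_pr(search_prs, create_pr, title: str):
    """Reuse an existing open PR from the bot instead of opening a new
    one every time; the original bug was that the search still queried
    under the old bot's name and so never found the existing PRs."""
    existing = search_prs(title)
    return existing[0] if existing else create_pr(title)
```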
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113961
Approved by: https://github.com/huydhn
2023-11-17 19:06:38 +00:00
albanD
25fb88cf23 Add all 3.12 binary build for wheel. Let's see how it goes. V2 (#112882)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112882
Approved by: https://github.com/malfet, https://github.com/sammcj
2023-11-16 18:20:12 +00:00
Eli Uriegas
84ee7453ad ci: Add clickable PR link to trymerge (#113712)
Adds a link to trymerge so that you can quickly click through the job to
the pull request for debugging.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113712
Approved by: https://github.com/clee2000, https://github.com/malfet
2023-11-15 01:55:33 +00:00
Catherine Lee
6e73ae2022 [ci][ez] Add job_id to emit_metrics (#113099)
As in title.

Also prints the job id in the step output, since it is otherwise hard to find
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113099
Approved by: https://github.com/seemethere
2023-11-08 10:32:41 +00:00
Huy Do
dd957138ec Pin Docker images to main (#112692)
This will help prevent a commit like 77901321d9 pushed to a release branch from overwriting the Docker images used on main.  In addition, the `DEFAULT_TAG` can easily be updated to, for example, `2.1` when doing a release branch cut.  This basically pins the Docker images like https://github.com/pytorch/pytorch/pull/111971

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112692
Approved by: https://github.com/malfet
2023-11-02 17:39:45 +00:00
Nikita Shulga
1b86d5ef2f [Ci] Add arm64 libtorch CI config (#112474)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112474
Approved by: https://github.com/ZainRizvi, https://github.com/seemethere
ghstack dependencies: #112451, #112452
2023-11-01 19:09:34 +00:00
Nikita Shulga
54c7d0d99d [GHF] Bot should reopen PR after revert (#112614)
Fixes https://github.com/pytorch/test-infra/issues/4692
Test plan, see https://github.com/malfet/deleteme/pull/58#issuecomment-1789365259 / https://github.com/malfet/deleteme/actions/runs/6723011476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112614
Approved by: https://github.com/seemethere, https://github.com/ezyang
ghstack dependencies: #112613
2023-11-01 18:03:32 +00:00
Nikita Shulga
4a2242e479 [BE] Use GITHUB_API_URL (#112613)
To avoid hardcoding the same string constant over and over again
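A minimal sketch of the pattern (the `GITHUB_API_URL` environment variable is what GitHub Actions provides; the helper name here is illustrative):

```python
import os

def gh_api_url() -> str:
    """Read the API base URL from the environment GitHub Actions sets,
    falling back to the public endpoint instead of hardcoding the
    string constant at every call site."""
    return os.environ.get("GITHUB_API_URL", "https://api.github.com")
```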
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112613
Approved by: https://github.com/seemethere
2023-11-01 18:03:32 +00:00
Nikita Shulga
8d6b4322d0 [CI] Limit libtorch builds to shared-with-deps (#112452)
As that is the only variant mentioned on https://pytorch.org/get-started/locally/

And on macOS the three flavors were just building and uploading the
same artifact three times over; see [this](https://github.com/pytorch/pytorch/actions/runs/6689661275/job/18176516410) for example:
```
upload: ../../_temp/artifacts/libtorch-macos-2.2.0.dev20231030.zip to s3://pytorch/libtorch/nightly/cpu/libtorch-macos-2.2.0.dev20231030.zip
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112452
Approved by: https://github.com/huydhn
ghstack dependencies: #112451
2023-10-31 08:40:06 +00:00
Nikita Shulga
0ce8cf7c7a Update small wheel nccl-version to 2.19.3 (#112293)
To keep it in sync with https://github.com/pytorch/pytorch/pull/110827

Added check to `scripts/generate_binary_build_matrix.py` to validate submodule and small wheel nccl versions are the same

Step one in addressing https://github.com/pytorch/pytorch/issues/112285
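The added consistency check could be sketched as follows (illustrative; the actual validation lives in `scripts/generate_binary_build_matrix.py` and the function name here is hypothetical):

```python
def validate_nccl_versions(submodule_version: str, small_wheel_version: str) -> None:
    """Fail matrix generation if the nccl version pinned in the submodule
    and the one bundled into the small wheel ever diverge."""
    if submodule_version != small_wheel_version:
        raise RuntimeError(
            f"nccl version mismatch: submodule has {submodule_version}, "
            f"small wheel pins {small_wheel_version}"
        )
```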
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112293
Approved by: https://github.com/huydhn
2023-10-31 01:20:01 +00:00
Huy Do
f6f81a5969 Update get-workflow-job-id to also return job name (#112103)
Then we can use this job name in `filter-test-configs` if it's available.  This addresses the issue in which `filter-test-configs` on GitHub runners (macOS x86) couldn't find the runner log to get the job name.  This is expected, because GitHub runners are isolated, so a job should not be able to access runner logs, which could contain information from other jobs.

This allows all missing features depending on running `filter-test-configs` on GitHub runners:
* Rerun disabled tests and memory leak check. For example, this would help avoid closing https://github.com/pytorch/pytorch/issues/110980#issuecomment-1779806466 early with the disabled test running properly on MacOS x86
* MacOS x86 jobs can now be disabled or marked as unstable

I keep the current log-parsing logic as a fallback because it works fine on self-hosted runners, and it also handles the case where `get-workflow-job-id` fails.  I also move the rest of `get-workflow-job-id` up before the test step, like https://github.com/pytorch/pytorch/pull/111483

### Testing

Spot checks some jobs to confirm they have the correct names:

* MacOS M1 test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065275722?pr=112103#step:10:8
* MacOS x86 build job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18065138137?pr=112103#step:9:14
* Linux test job has https://github.com/pytorch/pytorch/actions/runs/6648300991/job/18065354503?pr=112103#step:13:7
* Windows test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065599500?pr=112103#step:12:7
* MacOS x86 test job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18066312801#step:10:8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112103
Approved by: https://github.com/clee2000
2023-10-26 16:42:46 +00:00
Huy Do
9132734a35 Use Dr.CI GitHub checkrun summary when querying its API fails (#111628)
This will allow internal SandCastle job to access Dr.CI classification results via GitHub checkrun summary and correctly ignore unrelated failures.

### Testing

Adding `TestBypassFailuresOnSandCastle` where Dr.CI API returns nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111628
Approved by: https://github.com/clee2000
2023-10-24 01:32:30 +00:00
Aaron Gokaslan
9b499b417e [BE]: Apply subprocess check to github scripts (#111684)
Adds subprocess checks to raise exceptions in GitHub scripts
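The pattern could be sketched like this (a minimal illustration of `check=True`; the wrapper name is hypothetical, not a helper from the scripts):

```python
import subprocess

def run_checked(cmd: list) -> str:
    """Run a command and raise CalledProcessError on a non-zero exit,
    instead of silently ignoring the failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```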
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111684
Approved by: https://github.com/albanD
2023-10-20 23:37:57 +00:00
Huy Do
4ec777e9a5 [BE] Clean up trymerge code handling broken trunk failures (#111520)
This is the final part of https://github.com/pytorch/pytorch/pull/110054.  The broken trunk classification has been done on Dr.CI, so we can just check for that in trymerge for consistency when ghstack is used.

* [x] https://github.com/pytorch/pytorch/pull/110054
* [x] https://github.com/pytorch/pytorch/pull/110133
* [x] This PR to clean up the broken trunk logic.

One important change is that `get_classifications` doesn't need to query the jobs from Rockset for the head and merge base SHA anymore, saving a query there.  The function looks a lot simpler now.

### Testing

https://github.com/pytorch/pytorch/pull/111253 had 1 broken trunk failure as detected by Dr.CI from the base commit 3eb5cae3af (valid) while trymerge didn't detect that because ghstack base commit be8e517174 didn't have the same failure (miss).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111520
Approved by: https://github.com/clee2000
2023-10-19 02:30:56 +00:00
atalman
f9053877b4 Add pypi required metadata to all wheels except linux (#111042)
Will fix https://github.com/pytorch/pytorch/issues/100974 after publishing
`poetry install` requires all wheels on PyPI to have the same metadata; hence Linux dependencies are included in all non-Linux wheels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111042
Approved by: https://github.com/malfet
2023-10-12 17:40:13 +00:00
Nikita Shulga
92fea5ae3f [GHF] Re-enable test_internal_changes (#110834)
As Jon fixed the internal change status reporting after the issue is closed
Fixes https://github.com/pytorch/pytorch/issues/110218

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110834
Approved by: https://github.com/janeyx99
2023-10-09 03:23:07 +00:00
Nikita Shulga
d35e3dbd06 Fix concurrency limits for Create Release (#110759)
Also, don't run it on tags, but run on release branch and on `release` event.
Tweak linter to accept different concurrency limits for `create_release.yml`

Fixes https://github.com/pytorch/pytorch/issues/110569, as all past invocations of the workflow were cancelled by the concurrency limit due to the tag push and release happening at roughly the same time; see https://github.com/pytorch/pytorch/actions/workflows/create_release.yml?query=event%3Arelease

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110759
Approved by: https://github.com/atalman
2023-10-06 23:14:12 +00:00
Huy Do
f952551963 Handle invalid cancellation signals in trymerge (#110690)
This change is needed after https://github.com/pytorch/test-infra/pull/4579 and https://github.com/pytorch/test-infra/pull/4610.  All invalid cancelled signals have been removed from Dr.CI and HUD.  So trymerge should ignore them accordingly for a consistent experience.

### Testing

https://github.com/pytorch/pytorch/pull/110367#issuecomment-1750099960 is the PR where a bunch of invalid cancelled signals showed up and blocked merges

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110690
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
2023-10-06 22:43:33 +00:00
Huy Do
26bfb0fc21 Check for both workflow and job names from Dr.CI (#110661)
In https://github.com/pytorch/pytorch/pull/110362, the failure was flaky but the merge bot treated it as an actual failure. This is a regression after https://github.com/pytorch/test-infra/pull/4604, where the name returned by Dr.CI now includes the workflow name.  For example, the name is `trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)` in the JSON response:

```
{"FAILED": [], "FLAKY": [{"workflowId": 6372581477, "id": 17297638807, "name": "trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "jobName": "macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "conclusion": "failure", "completed_at": "2023-10-01T22:18:28Z", "html_url": "https://github.com/pytorch/pytorch/actions/runs/6372581477/job/17297638807", "head_branch": "ciflow/trunk/110362", "pr_number": 110362, "head_sha": "03f51e36dedf234931006d1db61677b229c9a119", "failure_captures": ["Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of"], "failure_line": "Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of 6291456KB for macOS", "time": "2023-10-01T22:17:53.847751Z"}], "BROKEN_TRUNK": [], "UNSTABLE": []}
```

I update the merge bot to handle this better by matching against the workflow name, the job name, and the combined full name.
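The three-way match could be sketched as follows (illustrative; the function name and argument order are assumptions, not the actual trymerge code):

```python
def matches_classification(reported_name: str, workflow: str, job: str) -> bool:
    """Dr.CI may report just the workflow name, just the job name, or the
    combined 'workflow / job' string; accept any of the three forms."""
    return reported_name in (workflow, job, f"{workflow} / {job}")
```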
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110661
Approved by: https://github.com/clee2000
2023-10-06 04:36:52 +00:00
Nikita Shulga
cd0e7d133b Migrate MacOs wheel binary builds to ephemeral M1 runners (#110432)
Surprisingly, there is no speed difference between running the cross-compilation on `macos12-xl` (x86_64 12-core machine) and `macos-13-xlarge` (M1 6-core machine)

Most of the changes are on the https://github.com/pytorch/builder side:
- 50a6e91f97 skips installing mkl on M1 machines
- bbb29b0467 same for llvm-9
- 8bcc83dbb1 bumps minimal numpy version to 1.19 (as 1.17 is not available for m1)
- cc4f1f9055 skips building tests/distributed for M1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110432
Approved by: https://github.com/kit1980
2023-10-03 17:31:28 +00:00
Huy Do
81a74457ca [BE] Clean up trymerge code handling flaky failures (#110133)
This is the 2nd part of https://github.com/pytorch/pytorch/pull/110054.  The flaky classification has been done on Dr.CI.  There is no need to download flaky rule files and do the check anymore.  Some tests are also updated with new examples because we mocked the list of flaky rules there.  Similar tests have been done on Dr.CI.

* [x] https://github.com/pytorch/pytorch/pull/110054
* [x] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110133
Approved by: https://github.com/clee2000
2023-09-30 08:01:00 +00:00
Nikita Shulga
ae546db562 [GHF] Update mergebot tests (#110221)
One should never edit `gql_mocks.json` by hand, as otherwise it does not validate mergebot behavior against actual GitHub data, but rather against a snapshot of that data frozen in time.

Unfortunately, GitHub started to delete checkrun statuses on older
PRs, so some tests need to be updated.

For example https://github.com/pytorch/pytorch/pull/77700/checks committed on May 19th 2022 has no checks at the time of the writing (Sep 28th 2023)

Deleted `test_checksuites_pagination`: its checks are gone, and it tests the same functionality as `test_get_checkruns_many_runs`, which was updated to use a more recent PR.

Deleted `test_get_classifications_pending_unstable`, because what it wants to test is inherently unreliable and therefore it must be rewritten using some different mechanisms.

Disabled `test_internal_changes` as the mechanism is broken at the moment, see https://github.com/pytorch/pytorch/issues/110218

Updated `test_pr_dependencies_ghstack` and `test_pr_dependencies` to generate `msg` using `pr.get_body()` rather than hardcoding the text (which had been updated after the tests were committed).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110221
Approved by: https://github.com/clee2000, https://github.com/huydhn
2023-09-28 21:29:17 +00:00
Nikita Shulga
a200bb5e54 [BE] Do not use assert in unit tests (#110179)
One should always use `unittest.assert*` methods rather than plain `assert`, as the latter can be turned into a no-op when the Python runtime is invoked with optimizations enabled

Fixes use of `assert` introduced by https://github.com/pytorch/pytorch/pull/105251
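A minimal illustration of the difference:

```python
# Under `python -O`, plain asserts are compiled away, so a test body like
#     assert compute() == 42
# silently passes without ever evaluating the comparison.  unittest's
# assertion methods are ordinary method calls and survive optimization:
import unittest

class TestArithmetic(unittest.TestCase):
    def test_sum(self):
        self.assertEqual(1 + 1, 2)  # still runs and fails loudly under -O
```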

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110179
Approved by: https://github.com/huydhn
2023-09-27 21:53:18 +00:00
Huy Do
955298bc40 Use Dr.CI results to classify flaky failures in trymerge (#110054)
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there.  This change queries the Dr.CI API endpoint and checks whether a failure is flaky using the `is_flaky` function.
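The shape of the check could be sketched like this (the response layout follows the Dr.CI JSON shown in an earlier commit on this page; the exact signature of the real `is_flaky` is an assumption):

```python
def is_flaky(failure_name: str, drci_classifications: dict) -> bool:
    """Return True when Dr.CI has already classified this job name as
    flaky, instead of re-deriving flaky rules locally in trymerge."""
    return any(
        job.get("name") == failure_name
        for job in drci_classifications.get("FLAKY", [])
    )
```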

Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:

* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

### Testing

* Create a new `drci_mocks.json` file to capture the JSON response from the Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
*  `pytest -v test_trymerge.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
2023-09-27 21:21:29 +00:00
Huy Do
7c1702f099 Keep JSON mocks file in gzip format (#110173)
This is to keep them smaller than the file size limit enforced in fbcode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110173
Approved by: https://github.com/malfet
2023-09-27 20:16:58 +00:00
PyTorch MergeBot
063d2572da Revert "Use Dr.CI results to classify flaky failures in trymerge (#110054)"
This reverts commit d0f82cd082.

Reverted https://github.com/pytorch/pytorch/pull/110054 on behalf of https://github.com/huydhn due to The mock gql_mocks.json file is now bigger than the file size limit on fbcode ([comment](https://github.com/pytorch/pytorch/pull/110054#issuecomment-1737727552))
2023-09-27 16:33:10 +00:00
Huy Do
d0f82cd082 Use Dr.CI results to classify flaky failures in trymerge (#110054)
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there.  This change queries the Dr.CI API endpoint and checks whether a failure is flaky using the `is_flaky` function.

Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:

* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

### Testing

* Create a new `drci_mocks.json` file to capture the JSON response from the Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
*  `pytest -v test_trymerge.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
2023-09-26 21:24:21 +00:00
Jithun Nair
86a9534165 Upgrade nightly wheels to rocm5.7 (#109571)
Follow-up to https://github.com/pytorch/builder/pull/1541

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109571
Approved by: https://github.com/ezyang
2023-09-21 13:41:23 +00:00
Aaron Gokaslan
6d725e7d66 [BE]: enable ruff rules PLR1722 and PLW3301 (#109461)
Enables two ruff rules derived from pylint:
* PLR1722 replaces any `exit()` calls with `sys.exit()`. `exit()` is only designed for REPL contexts and may not always be available by default; this rule always uses the version from the `sys` module, which is more reliable.
* PLW3301 replaces nested min/max calls with simplified versions (i.e. `min(a, min(b, c))` => `min(a, b, c)`). The new form is more idiomatic and more efficient.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109461
Approved by: https://github.com/ezyang
2023-09-18 02:07:21 +00:00
Huy Do
c9fdfafb00 Allow marking multiple unstable configs of the same job name (#109185)
This is a bug that has stayed for a surprisingly long period of time (my fault).  When there are multiple unstable configurations (`inductor`, `inductor_huggingface`, `inductor_huggingface_dynamic`) of the same job (`inductor / cuda12.1-py3.10-gcc9-sm86`), only the first one was marked as unstable.  The for loop returned too early and missed the other two, even though they were also listed as unstable, for example in https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json
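The bug pattern can be reconstructed like this (hypothetical names, not the actual `filter_test_configs.py` code); the `fixed` flag toggles between the buggy early return and the corrected behavior:

```python
def mark_unstable(test_matrix, unstable_configs, fixed=True):
    # Hypothetical reconstruction: mark every config of a job that is
    # listed as unstable. The bug was returning inside the loop.
    for cfg in test_matrix:
        if cfg["config"] in unstable_configs:
            cfg["unstable"] = "unstable"
            if not fixed:
                return test_matrix  # bug: stops after the first unstable config
    return test_matrix

matrix = [{"config": "inductor"}, {"config": "inductor_huggingface"}]
unstable = {"inductor", "inductor_huggingface"}
buggy = mark_unstable([dict(c) for c in matrix], unstable, fixed=False)
good = mark_unstable([dict(c) for c in matrix], unstable, fixed=True)
print(sum("unstable" in c for c in buggy))  # 1
print(sum("unstable" in c for c in good))   # 2
```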

### Testing

* Add an unit test
* CI run https://github.com/pytorch/pytorch/actions/runs/6169798353 shows that the configs below are all marked as unstable:
  * https://github.com/pytorch/pytorch/issues/107079
  * https://github.com/pytorch/pytorch/issues/109153
  * https://github.com/pytorch/pytorch/issues/109154
* Manually run the script to verify the test matrix output:
```
python .github/scripts/filter_test_configs.py \
    --workflow "inductor" \
    --job-name "cuda12.1-py3.10-gcc9-sm86 / build," \
    --test-matrix "{ include: [
    { config: "inductor", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_huggingface", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_torchbench", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_huggingface_dynamic", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm_dynamic", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm_dynamic", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_torchbench_dynamic", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_distributed", shard: 1, num_shards: 1, runner: "linux.g5.12xlarge.nvidia.gpu" },
  ]}
  " \
    --pr-number "" \
    --tag "" \
    --event-name "push" \
    --schedule "" \
    --branch ""
::set-output name=keep-going::False
::set-output name=is-unstable::False
::set-output name=reenabled-issues::
::set-output name=test-matrix::{"include": [{"config": "inductor", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu", "unstable": "unstable"}, {"config": "inductor_huggingface", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu", "unstable": "unstable"}, {"config": "inductor_timm", "shard": 1, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_timm", "shard": 2, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_torchbench", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_huggingface_dynamic", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu", "unstable": "unstable"}, {"config": "inductor_timm_dynamic", "shard": 1, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_timm_dynamic", "shard": 2, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_torchbench_dynamic", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_distributed", "shard": 1, "num_shards": 1, "runner": "linux.g5.12xlarge.nvidia.gpu"}]}
::set-output name=is-test-matrix-empty::False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109185
Approved by: https://github.com/clee2000
2023-09-13 17:06:37 +00:00
Huy Do
7be233f3a5 Remove commit hash when building triton wheel and conda in release mode (#108203)
This is the follow-up of https://github.com/pytorch/pytorch/pull/108187 to set the correct release version without commit hash for triton wheel and conda binaries when building them in release mode.

### Testing

* With commit hash (nightly): https://github.com/pytorch/pytorch/actions/runs/6019021716
* Without commit hash https://github.com/pytorch/pytorch/actions/runs/6019378616 (by adding `--release` into the PR)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108203
Approved by: https://github.com/atalman
2023-08-30 16:49:21 +00:00
Jack Taylor
196ef78b90 [ROCm] Use rocm manylinux builder image for triton wheels (#107600)
Update to ROCm triton pinned commit for the 2.1 branch cut off.

As part of this, we are updating `build_triton_wheel.py` and `build-triton-wheel.yml` to support building ROCm triton wheels through pytorch/manylinux-rocm. This avoids slowly downloading ROCm rpm libraries in the cpu manylinux builder image and removes the need to maintain a conditional file with hard-coded repositories from radeon.org for every ROCm release.

This new approach will allow us to build wheels faster in a more easily maintainable way.

This PR also brings in a required change: Triton on ROCm requires `device_type` to be set to `hip` so that the correct device type is passed down to triton (https://github.com/ROCmSoftwarePlatform/triton/pull/284).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107600
Approved by: https://github.com/jansel, https://github.com/jithunnair-amd
2023-08-25 10:25:29 +00:00
Xu Zhao
26ae48832e Remove run torchbench. Torchbench runs are now part of the dynamo ci. (#107826)
As the title says.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107826
Approved by: https://github.com/huydhn
2023-08-24 01:19:49 +00:00
Huy Do
d7f943ec82 [mergebot] Flaky and broken trunk should take precedence over ic (#107761)
I noticed a curious case on https://github.com/pytorch/pytorch/pull/107508 where there was one broken trunk failure and the PR was merged with `merge -ic`.  Because the failure had been classified as unrelated, I expected to see a no-op force merge here.  However, it showed up as a force merge with failures.

![Screenshot 2023-08-22 at 20 01 10](https://github.com/pytorch/pytorch/assets/475357/b9c93e24-8da8-4fc6-9b3d-61b6bd0a8937)

The record on Rockset reveals https://github.com/pytorch/pytorch/pull/107508 has:

* 0 broken trunk check (unexpected, this should be 1 as Dr. CI clearly say so)
* 1 ignore current check (unexpected, this should be 0 and the failure should be counted as broken trunk instead)
* 3 unstable ROCm jobs (expected)

It turns out that ignore current takes precedence over the flaky and broken trunk classifications.  This might have been the expectation in the past, but I think that's no longer the case.  The bot should be consistent with what is shown on Dr. CI.  The change here makes the flaky, unstable, and broken trunk classifications take precedence over ignore current.  Basically, we only need to ignore new or unrecognized failures that have not yet been classified.
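The new precedence order can be sketched as follows; the category names and set-based lookup are illustrative, not the actual trymerge internals:

```python
def classify(check: str, flaky: set, broken_trunk: set,
             unstable: set, ignore_current: set) -> str:
    # Flaky, broken trunk, and unstable classifications take precedence;
    # ignore-current only absorbs failures nothing else has classified.
    if check in flaky:
        return "FLAKY"
    if check in broken_trunk:
        return "BROKEN_TRUNK"
    if check in unstable:
        return "UNSTABLE"
    if check in ignore_current:
        return "IGNORE_CURRENT"
    return "NEW_FAILURE"

print(classify("trunk / job", flaky=set(), broken_trunk={"trunk / job"},
               unstable=set(), ignore_current={"trunk / job"}))  # BROKEN_TRUNK
```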
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107761
Approved by: https://github.com/clee2000
2023-08-23 21:22:56 +00:00
Jane (Yuan) Xu
350fb16f47 Add space to merge cancel comment (#107603)
Minor QoL improvement
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107603
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi
2023-08-21 21:43:15 +00:00
Zain Rizvi
b9c86c521d Make mergebot work with review comments (#107390)
Fixes https://github.com/pytorch/pytorch/issues/100406

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107390
Approved by: https://github.com/clee2000
ghstack dependencies: #107385
2023-08-17 21:31:41 +00:00
Zain Rizvi
4874b02379 [BE] Remove deprecated github gql param and disable inconsistent test (#107385)
Two fixes:
- Stop querying `pushDate`, which [has been deprecated ](https://docs.github.com/en/graphql/reference/objects) and now always returns null
- Disables the test `test_merge_ghstack_into`, which was recently added in https://github.com/pytorch/pytorch/pull/105251. This test used the results of another person's ghstack PR, but as the dev submitted chunks of their PR, the test's assumptions have been broken. cc @izaitsevfb for a long term fix here

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107385
Approved by: https://github.com/clee2000
2023-08-17 21:31:41 +00:00
Huy Do
4979a1b8f9 Fix trymerge broken trunk detection when the merge base job was retried (successfully) (#107333)
This fixes a discrepancy bug between Dr.CI and trymerge when detecting broken trunk failures.
Take https://github.com/pytorch/pytorch/pull/107160 as an example:

* Dr.CI correctly identifies the broken trunk failure
* while trymerge records it as a new failure

The issue is that the merge base [failure](https://github.com/pytorch/pytorch/actions/runs/5833057579/job/15820504498) was flaky.  It was retried successfully and its conclusion went from a failure to a success.  The Rockset query returns all run attempts, and while Dr.CI correctly records the failure, trymerge overwrites it with the successful retry.  Thus, the latter sees a new failure.

This change makes trymerge keep the merge base failure similar to what Dr.CI does https://github.com/pytorch/test-infra/blob/main/torchci/pages/api/drci/drci.ts#L158-L168
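The fix amounts to keeping a failed conclusion for a merge-base job even when a later run attempt succeeded; a minimal sketch (function and value names assumed, not the real trymerge code):

```python
def merge_base_conclusion(attempts: list) -> str:
    # attempts: conclusions of all run attempts of a merge-base job, in order.
    # Keep the failure if any attempt failed, mirroring Dr.CI, so a successful
    # retry does not hide a broken-trunk signal from trymerge.
    for conclusion in attempts:
        if conclusion == "failure":
            return "failure"
    return attempts[-1] if attempts else "success"

print(merge_base_conclusion(["failure", "success"]))  # failure
print(merge_base_conclusion(["success"]))             # success
```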

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107333
Approved by: https://github.com/clee2000
2023-08-17 02:09:31 +00:00
Catherine Lee
a14d99bb6c Close non existent disable issues complete rollout (#106923)
follow up to https://github.com/pytorch/pytorch/pull/105096
It seems fine, anecdotally I have seen some issues closed and they haven't been reopened
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106923
Approved by: https://github.com/huydhn
2023-08-10 16:48:14 +00:00
Mike Schneider
861ae39938 [aarch64] Add PT Docker build image for aarch64 (#106881)
# Changes
* Update `generate_binary_build_matrix.py` for aarch64 to use `pytorch/manylinuxaarch64-builder:cpu` when it is created
* Executed `generate_binary_build_matrix.py` to update `generated-linux-aarch64-binary-manywheel-nightly.yml`

Aarch64 build/test will fail until the new docker image is available for consumption.

Builder PR to build docker image : https://github.com/pytorch/builder/pull/1472

This switches nightly to use the docker build : https://hud.pytorch.org/hud/pytorch/pytorch/nightly/1?per_page=50&name_filter=aarch64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106881
Approved by: https://github.com/atalman
2023-08-09 20:28:04 +00:00
Ivan Zaitsev
d2aa3f5fa9 [GHF][mergebot] record ghstack dependencies in the commit message (#105251)
Currently all information about the dependencies of ghstack PRs (e.g. #105010) is stripped away:
c984885809/.github/scripts/trymerge.py (L1077-L1078)

This PR adds this information back in a more compact form. All dependencies (PR numbers) of each PR in ghstack are recorded.

The resulting commit message will look like this (the last line is new):

> Mock title (#123)
>
> Mock body text
> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123
> Approved by: https://github.com/Approver1, https://github.com/Approver2
> ghstack dependencies: #1, #2
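Appending the dependency line could look roughly like this (a sketch, not the actual `trymerge.py` code):

```python
def append_ghstack_dependencies(message: str, deps: list) -> str:
    # Record the PR numbers this ghstack PR depends on as a trailing line,
    # instead of stripping the dependency information away.
    if deps:
        message += "\nghstack dependencies: " + ", ".join(f"#{d}" for d in deps)
    return message

msg = append_ghstack_dependencies("Mock title (#123)\n\nMock body text", [1, 2])
print(msg.splitlines()[-1])  # ghstack dependencies: #1, #2
```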

---

### Testing

Unit tests.

---

### Note Re: `# type: ignore[assignment]` in unit tests.

I did my due diligence to find alternatives. Unfortunately mypy [doesn't](https://github.com/python/mypy/issues/6713) support this [way of patching methods](https://docs.python.org/3/library/unittest.mock-examples.html#mock-patching-methods), and the alternatives are either extremely verbose or don't work for this case. I decided it's not worth the effort (since the problem is limited only to the unit test).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105251
Approved by: https://github.com/huydhn
2023-07-29 20:32:10 +00:00
Aaron Gokaslan
52d4b1ae31 [BE]: Enable ruff rules PIE807 and PIE810 (#106218)
* Enables PIE807 + PIE810. PIE807 forbids reimplementing the `list` builtin with a lambda, and PIE810 always fuses `startswith`/`endswith` calls (I applied the autofixes for this before we had ruff enabled).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106218
Approved by: https://github.com/albanD
2023-07-28 22:35:56 +00:00
Catherine Lee
57f23ca58b Bot message changes for -f and rebase (#106150)
* Encourage people to use -i instead of -f for mergebot
* Add additional info for when rebase fails due to lacking permissions

<details><summary>dryrun</summary>

````
csl@csl-mbp ~/zzzzzzzz/pytorch [csl/errormsgs] $
(forpytorch) python3 .github/scripts/tryrebase.py 106089 --branch viable/strict --dry-run
+ git -C /Users/csl/zzzzzzzz/pytorch rev-parse --verify refs/remotes/origin/viable/strict
@pytorchbot started a rebase job onto [refs/remotes/origin/viable/strict](7c97c943fb). Check the current status [here](None)
+ git -C /Users/csl/zzzzzzzz/pytorch fetch origin pull/106089/head:pull/106089/head
+ git -C /Users/csl/zzzzzzzz/pytorch rebase refs/remotes/origin/viable/strict pull/106089/head
+ git -C /Users/csl/zzzzzzzz/pytorch rev-parse --verify pull/106089/head
+ git -C /Users/csl/zzzzzzzz/pytorch rev-parse --verify refs/remotes/origin/viable/strict
+ git -C /Users/csl/zzzzzzzz/pytorch push --dry-run -f https://github.com/Lightning-Sandbox/pytorch.git pull/106089/head:fix/spaces
stdout:
remote: Permission to Lightning-Sandbox/pytorch.git denied to clee2000.
fatal: unable to access 'https://github.com/Lightning-Sandbox/pytorch.git/': The requested URL returned error: 403

stderr:

Rebase failed due to Command `git -C /Users/csl/zzzzzzzz/pytorch push --dry-run -f https://github.com/Lightning-Sandbox/pytorch.git pull/106089/head:fix/spaces` returned non-zero exit code 128
```
remote: Permission to Lightning-Sandbox/pytorch.git denied to clee2000.
fatal: unable to access 'https://github.com/Lightning-Sandbox/pytorch.git/': The requested URL returned error: 403
```
This is likely because the author did not allow edits from maintainers on the PR or because the repo has additional permissions settings that mergebot does not qualify.
````
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106150
Approved by: https://github.com/huydhn
2023-07-28 16:13:51 +00:00
DanilBaibak
7b73b1e8a7 Fixed test_get_classifications_pending_unstable (#106203)
Fixed `test_get_classifications_pending_unstable` test. [Broken test](https://github.com/pytorch/pytorch/actions/runs/5690543018/job/15424383198) on main branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106203
Approved by: https://github.com/malfet
2023-07-28 14:15:17 +00:00
Huy Do
4fe407ad73 Add details about ic, broken, flaky, and unstable checks to merge records (#106162)
At the moment, we only record the list of pending and failed checks on Rockset merge records. This is enough to compute the force merge KPI(s), but isn't enough for more in-depth analysis of what happened at the time of the merge:

* If the number of `ok_failed_checks` is less than `ok_failed_checks_threshold`, the list of `failed_checks` would be empty (expectedly).  So Rockset would only record an empty list.
* We support retry in PR, so the classifications on Dr.CI could be different from what the dev observed at the time of the merge if a retry completed successfully
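The expanded record could be sketched like this; the four classification field names come from the commit message, while the function shape and the other fields are assumptions:

```python
def build_merge_record(pending, failed, ignore_current,
                       broken_trunk, flaky, unstable) -> dict:
    # Sketch of the richer merge record: the four classification lists are
    # stored even when the failed-checks list is empty (e.g. a no-op merge).
    return {
        "pending_checks": pending,
        "failed_checks": failed,
        "ignore_current_checks": ignore_current,
        "broken_trunk_checks": broken_trunk,
        "flaky_checks": flaky,
        "unstable_checks": unstable,
    }

record = build_merge_record([], [], [], ["trunk / job"], [], [])
print(record["broken_trunk_checks"])  # ['trunk / job']
```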

### Testing

`python .github/scripts/trymerge.py --comment-id 1654010315 106095 --dry-run` (need to comment out some of the code to actually write a test record to Rockset), then manually verify it with

```
SELECT
    *
FROM
    commons.merges
WHERE
    pr_num = 106095
```

to see that `ignore_current_checks`, `broken_trunk_checks`, `flaky_checks`, and `unstable_checks` shows up correctly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106162
Approved by: https://github.com/clee2000
2023-07-28 09:41:02 +00:00