pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Catherine Lee	de9ddd19a5	Various CI settings (#117668 ) Test [ci-verbose-test-logs] (this worked, the test logs printing while running and interleaved and are really long) Settings for no timeout (step timeout still applies, only gets rid of ~30 min timeout for shard of test file) and no piping logs/extra verbose test logs (good for debugging deadlocks but results in very long and possibly interleaved logs). Also allows these to be set via pr body if the label name is in brackets ex [label name] or the test above. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117668 Approved by: https://github.com/huydhn	2024-01-26 00:17:29 +00:00
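A minimal sketch of how a setting could be picked up from a PR body when its label name appears in brackets. The function name, the regex, and the second label name in the usage note are assumptions for illustration, not the code from the PR.

```python
import re


def labels_from_pr_body(body: str, known_labels: set[str]) -> set[str]:
    """Return the known labels that appear in the body wrapped in square brackets."""
    # Collect every "[...]" group that has no nested brackets.
    bracketed = {match.strip() for match in re.findall(r"\[([^\[\]]+)\]", body)}
    # Keep only names that correspond to labels we actually recognize.
    return bracketed & known_labels
```

For example, `labels_from_pr_body("Test [ci-verbose-test-logs]", {"ci-verbose-test-logs", "some-other-label"})` returns a set containing only `ci-verbose-test-logs`.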
Catherine Lee	02a411d4a6	[mergebot] Dry run for labels + easier to read Dr CI result (#118240 ) Dry run open for labels so we can run trymerge locally with dryrun without actually affecting the PR. Make Dr.CI results easier to read (previously a massive json dump, now just the job names + ids, in a nicer format) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118240 Approved by: https://github.com/huydhn	2024-01-25 23:06:43 +00:00
Huy Do	eebe7e1d37	Migrate update-viablestrict to test-infra (#118163 ) In https://github.com/pytorch/test-infra/pull/4905, so that ExecuTorch can use the same GHA on their CI. ### Testing https://github.com/pytorch/pytorch/actions/runs/7634906738/job/20799502532#step:2:15480 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118163 Approved by: https://github.com/clee2000	2024-01-25 07:07:34 +00:00
DanilBaibak	a545ebc870	Switched macOS runners type to macos-m1-stable (#117651 ) Switched macOS runners type to `macos-m1-stable`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117651 Approved by: https://github.com/huydhn	2024-01-24 11:55:13 +00:00
Bin Bao	c6930aad46	Update Triton pin (#117873 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117873 Approved by: https://github.com/shunting314, https://github.com/malfet	2024-01-23 21:05:30 +00:00
Nikita Shulga	98a044d33e	[CI] Build M1 conda binaries on M1 runners (#117801 ) As usual, almost no work on PyTorch side, all changes are on the builder end, namely: - `8b67d32929` - depend on `blas * mkl` only on x86 machines - `eb78393f1e` - install arm64 conda when running on Apple Silicon - `0d3aea4ee0` - constrain llvmdev-9 to x86 machines only - `6c6a33b271` - set correct DEVELOPER_DIR path TODO: - We should auto-detect this `DEVELOPER_DIR` via `xcode-select` Pull Request resolved: https://github.com/pytorch/pytorch/pull/117801 Approved by: https://github.com/atalman	2024-01-19 14:31:12 +00:00
Huy Do	cf470e7b59	Migrate update-commit-hash to test-infra (#117506 ) After https://github.com/pytorch/test-infra/pull/4885, the GHA is now reusable on `test-infra`. This tests the change and we can also land it after https://github.com/pytorch/test-infra/pull/4885 lands. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117506 Approved by: https://github.com/malfet, https://github.com/atalman	2024-01-17 00:15:04 +00:00
Jithun Nair	24c39bb5e5	Upgrade nightly wheels to rocm6.0 (#116983 ) Follow-up to https://github.com/pytorch/builder/pull/1647 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116983 Approved by: https://github.com/jeffdaily, https://github.com/atalman	2024-01-11 20:36:00 +00:00
Nikita Shulga	0f0020d76f	[GHF] Add support for new style stacks (#116873 ) Where base stack targets default branch, rather than base. But as default branch is likely to advance, since PR was made, search for mergebase before determining whether `base`..`head` are in sync with `orig` branch Also, rather than hardcode default branch name, fetch it from `GitHubPR.default_branch()` Test Plan: https://github.com/malfet/deleteme/pull/77 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116873 Approved by: https://github.com/ezyang	2024-01-05 20:32:24 +00:00
Nikita Shulga	93b86bf531	[GHF] Implement stacked revert (#116447 ) By adding `get_ghstack_dependent_prs`, which uses `git branch --contains` to find all PRs containing the stacked branch, selecting the longest one (in terms of distance between origin and default branch) and skipping all open PRs. Please note that reverts should be applied in the reverse of the order in which the PRs were originally landed. Use a bit of defensive programming, i.e. revert a single PR if the attempt to fetch dependencies fails for some reason. Test plan: - Lint - ``` >>> from trymerge import GitRepo, GitHubPR, get_ghstack_prs, get_ghstack_dependent_prs >>> pr=GitHubPR("pytorch", "pytorch", 115188) >>> pr1=GitHubPR("pytorch", "pytorch", 115210) >>> repo=GitRepo("/Users/nshulga/git/pytorch/pytorch") >>> get_ghstack_dependent_prs(repo, pr1) [('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x100f32f40>)] >>> get_ghstack_dependent_prs(repo, pr) [('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x10102eaf0>), ('76b1d44d576c20be79295810904c589241ca1bd2', <trymerge.GitHubPR object at 0x10102eb50>)] >>> rc=get_ghstack_dependent_prs(repo, pr) >>> rc[0][1].pr_num 115210 >>> rc[1][1].pr_num 115188 ``` - see: https://github.com/malfet/deleteme/pull/59#issuecomment-1869904714 and https://github.com/malfet/deleteme/pull/74#issuecomment-1870542702 Fixes https://github.com/pytorch/test-infra/issues/4845 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116447 Approved by: https://github.com/huydhn ghstack dependencies: #116446	2023-12-27 23:01:16 +00:00
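The ordering rule and the defensive fallback described in this commit can be sketched as follows. `prs_to_revert` and the `fetch_stack` callable are hypothetical stand-ins, not functions from `trymerge`; the sketch assumes `fetch_stack` returns PR numbers in the order they were landed.

```python
from typing import Callable, List


def prs_to_revert(pr_num: int, fetch_stack: Callable[[int], List[int]]) -> List[int]:
    """Return PR numbers in the order they should be reverted.

    fetch_stack (hypothetical) returns the affected PRs, including pr_num,
    in the order they were originally landed.
    """
    try:
        landed = fetch_stack(pr_num)
    except Exception:
        # Defensive fallback: if the stack cannot be resolved, revert only this PR.
        return [pr_num]
    # Reverts are applied in the reverse of the landing order.
    return list(reversed(landed))
```

With the PR numbers from the test plan above, a stack landed as 115188 then 115210 would be reverted as 115210 then 115188.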
Nikita Shulga	5fcc2519f5	[GHF] Refactors (#116446 ) Prep change for allowing stacked reverts This is a no-op that factors out some helper function that would be useful later: - `get_pr_commit_sha` finds a committed sha for a given PR - `_revlist_to_prs` converts a revlist to GitHubPRs conditionally filtering some out - `do_revert_prs` reverts multiple PRs in a batch, but so far is invoked with only one PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/116446 Approved by: https://github.com/huydhn, https://github.com/seemethere	2023-12-27 23:01:16 +00:00
Nikita Shulga	87da0e1d23	[GHF] Fix gh_get_labels for small repos (#116444 ) Not sure if this is recent API change or what but `gh_get_labels('malfet', 'deleteme')` used to raise an exception (see https://github.com/malfet/deleteme/actions/runs/7334535266/job/19971328673#step:6:37 ) ``` File "/home/runner/work/deleteme/deleteme/.github/scripts/label_utils.py", line 50, in get_last_page_num_from_header link_info[link_info.rindex(prefix) + len(prefix) : link_info.rindex(suffix)] AttributeError: 'NoneType' object has no attribute 'rindex' ``` And with this fix it returns the expected list Pull Request resolved: https://github.com/pytorch/pytorch/pull/116444 Approved by: https://github.com/huydhn	2023-12-27 15:50:42 +00:00
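The traceback above comes from calling `rindex` on a missing `Link` header, which is what happens when all labels fit on one page. A minimal sketch of a None-safe parser follows; the function name and the regex are assumptions, and this is not the fix that landed.

```python
import re
from typing import Optional


def last_page_from_link_header(link_info: Optional[str]) -> int:
    """Return the last page number from a GitHub-style Link header.

    When the header is absent (small repos with a single page of results),
    there is exactly one page.
    """
    if link_info is None:
        return 1
    # Look for the page number inside the URL tagged rel="last".
    match = re.search(r'[?&]page=(\d+)[^>]*>;\s*rel="last"', link_info)
    return int(match.group(1)) if match else 1
```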
Huy Do	d6de2df6b6	Improve the error message when a PR lacks the necessary approvals (#116161 ) The error message from https://github.com/pytorch/pytorch/pull/115329#issuecomment-1857135047 is pretty confusing because it lists some random `pytorch/metamates` folks from `superuser` merge rule. My attempt here is to make the error message clearer by pointing out: * All the matching merge rules and * Their list of approvers The message will now become: ``` Approvers from one of the follow rules are needed: - Core Reviewers (1, 2, 3, 4, 5, ...) - Core Maintainers (1, 2) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/116161 Approved by: https://github.com/malfet, https://github.com/PaliC, https://github.com/atalman, https://github.com/ZainRizvi	2023-12-22 00:22:43 +00:00
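One way such a message could be assembled is sketched below. `format_missing_approvals` and its `max_names` cutoff are hypothetical, chosen only to reproduce the shape of the message quoted above.

```python
def format_missing_approvals(rules: dict[str, list[str]], max_names: int = 5) -> str:
    """Build a message listing each matching merge rule and its approvers."""
    lines = ["Approvers from one of the following rules are needed:"]
    for rule_name, approvers in rules.items():
        shown = ", ".join(approvers[:max_names])
        if len(approvers) > max_names:
            # Truncate long approver lists so the message stays readable.
            shown += ", ..."
        lines.append(f"- {rule_name} ({shown})")
    return "\n".join(lines)
```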
atalman	7b6210e8a4	Use matrix generate script for docker release workflows (#115949 ) Enable builds for both supported CUDA versions for docker release, rather than building only one version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115949 Approved by: https://github.com/huydhn	2023-12-18 20:20:59 +00:00
Nikita Shulga	7ed2bc7c67	[GHF] Do not block reverts with internal changes (#115903 ) The check is unreliable more often than not, so it is better to just post a warning and let the revert proceed. Fixes https://github.com/pytorch/test-infra/issues/4797 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115903 Approved by: https://github.com/clee2000, https://github.com/atalman	2023-12-15 17:00:07 +00:00
Nikita Shulga	28e37d4f3b	Update Triton pin (#115743 ) To include a cherry-pick of https://github.com/openai/triton/pull/2771 that should fix cuda-11.8 runtime issues. Also, tweak the build wheel script to update both the ROCm and vanilla Triton build versions to 2.2 (even though on trunk it should probably be 3.3 already). TODO: Remove `ROCM_TRITION_VERSION` once both trunk and ROCm versions are in sync again Pull Request resolved: https://github.com/pytorch/pytorch/pull/115743 Approved by: https://github.com/davidberard98	2023-12-14 18:54:24 +00:00
Aaron Gokaslan	794545c11f	[BE]: Enable RUF015 codebase wide (#115507 ) Constant time access of first value in collection. This is a constant time operation instead of converting the item to a list to get the first item which is linear. The rule is turned on which automatically autofixes and enforces this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507 Approved by: https://github.com/malfet	2023-12-11 15:51:01 +00:00
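The pattern RUF015 steers toward can be illustrated with a small helper; `first_item` is a name made up for this sketch.

```python
def first_item(collection):
    """Return the first element of any iterable without copying it into a list."""
    # list(collection)[0] would materialize every element first (linear time);
    # next(iter(...)) stops after the first one.
    return next(iter(collection))
```

Note one behavioral difference: on an empty iterable, `next(iter(...))` raises `StopIteration`, whereas `list(...)[0]` raises `IndexError`.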
Nikita Shulga	bf16fec463	Fix up triton builds (#115039 ) Follow ups after https://github.com/pytorch/pytorch/pull/114772 and https://github.com/pytorch/pytorch/pull/108187 - Triton builds should be published from `main` rather than `nightly` branch, as: - They are independent of any PyTorch changes - Every nightly is pinned to a specific commit therefore publishing updated triton binaries will not affect previous nightlies - If this is not the case, nightly promotion will never happen as binary builds on main will continue to fail in perpetuity searching for new triton binary - `patch_setup_py` is still needed to modify name of the package for ROCm builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/115039 Approved by: https://github.com/seemethere, https://github.com/kit1980, https://github.com/huydhn	2023-12-03 23:14:41 +00:00
Nikita Shulga	a6294d8b9f	[RelEng] Enable Py312 conda builds (#114819 ) Once [sympy-1.12](https://anaconda.org/anaconda/sympy/files?version=1.12) has been added, it can be built across the board. The majority of the changes are in the builder repo: * `6b8c73fecb` tweaks numpy and openssl deps * `fc773dde97` <- tweak MKL requirements for Windows * `ca378c16f8` do not depend on Triton * `3c7404d80c` <- build without GLOO_SSL And finally, to work around the chicken-and-egg problem from [smoke_test.bat:97](`b92da8cd64/windows/internal/smoke_test.bat (L97)`) ```cmd call conda install -yq numpy pytorch %CONDA_EXTRA_ARGS% ``` Manually upload binaries to pytorch-nightly channel (will fix it akin to Nova in followup PRs) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114819 Approved by: https://github.com/huydhn	2023-12-03 01:30:03 +00:00
Bin Bao	8a90249bc2	[inductor] Update triton pin (#114772 ) Differential Revision: [D51761353](https://our.internmc.facebook.com/intern/diff/D51761353) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114772 Approved by: https://github.com/shunting314, https://github.com/atalman	2023-12-02 19:13:56 +00:00
pbialecki	386b9c2adc	build small pip wheels for CUDA 11.8 (#114620 ) As discussed, we would like to start building all wheels using the CUDA PyPI dependencies. Adding the "small wheel" workflow for CUDA 11.8 as it's already used for 12.1U1. CC @malfet @atalman Pull Request resolved: https://github.com/pytorch/pytorch/pull/114620 Approved by: https://github.com/atalman, https://github.com/malfet	2023-11-30 20:50:31 +00:00
Huy Do	6f340c6f30	Handle the case when opening a reverted PR with deleted head branch (#114423 ) When reopening a reverted PR, `422: Unprocessable Entity` is returned when the head branch has been deleted, for example https://github.com/pytorch/pytorch/pull/112889#issuecomment-1823216686 ``` { "message": "Validation Failed", "errors": [ { "resource": "PullRequest", "code": "custom", "field": "state", "message": "state cannot be changed. The commsplit branch has been deleted." } ], "documentation_url": "https://docs.github.com/rest/pulls/pulls#update-a-pull-request" } ``` The revert still happens though, only reopening PR fails, which is ok to ignore in this case I think instead of going the complicated route of trying to restore the deleted branch by merge bot. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114423 Approved by: https://github.com/malfet, https://github.com/kit1980	2023-11-23 07:32:46 +00:00
atalman	7a697c4683	[RelEng] Tag docker images for release, pin unstable and disabled jobs, apply release only changes (#114355 ) 1. This tags docker images using docker pull/tag/push for current release 2. Sets RELEASE_VERSION_TAG var and regenerates the workflows using the new docker tag 3. Remove conda token setting and Binary tests release changes these are already automated 4. Pin unstable and disabled jobs, autumate: https://github.com/pytorch/pytorch/pull/111675 Test: ``` RELEASE_VERSION=2.2 ./scripts/release/apply-release-changes.sh Tagging pytorch/manylinux-builder:cuda11.8-main to pytorch/manylinux-builder:cuda11.8-2.2 , dry_run: enabled Tagging pytorch/manylinux-builder:cuda12.1-main to pytorch/manylinux-builder:cuda12.1-2.2 , dry_run: enabled Tagging pytorch/libtorch-cxx11-builder:cuda11.8-main to pytorch/libtorch-cxx11-builder:cuda11.8-2.2 , dry_run: enabled Tagging pytorch/libtorch-cxx11-builder:cuda12.1-main to pytorch/libtorch-cxx11-builder:cuda12.1-2.2 , dry_run: enabled Tagging pytorch/manylinux-builder:rocm5.6-main to pytorch/manylinux-builder:rocm5.6-2.2 , dry_run: enabled Tagging pytorch/manylinux-builder:rocm5.7-main to pytorch/manylinux-builder:rocm5.7-2.2 , dry_run: enabled Tagging pytorch/libtorch-cxx11-builder:rocm5.6-main to pytorch/libtorch-cxx11-builder:rocm5.6-2.2 , dry_run: enabled Tagging pytorch/libtorch-cxx11-builder:rocm5.7-main to pytorch/libtorch-cxx11-builder:rocm5.7-2.2 , dry_run: enabled Tagging pytorch/manylinux-builder:cpu-main to pytorch/manylinux-builder:cpu-2.2 , dry_run: enabled Tagging pytorch/libtorch-cxx11-builder:cpu-main to pytorch/libtorch-cxx11-builder:cpu-2.2 , dry_run: enabled Tagging pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-main to pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-2.2 , dry_run: enabled Tagging pytorch/manylinuxaarch64-builder:cpu-aarch64-main to pytorch/manylinuxaarch64-builder:cpu-aarch64-2.2 , dry_run: enabled Tagging pytorch/conda-builder:cuda11.8-main to 
pytorch/conda-builder:cuda11.8-2.2 , dry_run: enabled Tagging pytorch/conda-builder:cuda12.1-main to pytorch/conda-builder:cuda12.1-2.2 , dry_run: enabled Tagging pytorch/conda-builder:cpu-main to pytorch/conda-builder:cpu-2.2 , dry_run: enabled /data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-linux-binary-conda-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-main.yml /data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-main.yml /data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-main.yml /data/users/atalman/pytorch/.github/workflows/generated-windows-binary-wheel-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-windows-binary-conda-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-main.yml /data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-main.yml /data/users/atalman/pytorch/.github/workflows/generated-macos-binary-wheel-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-macos-binary-conda-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-libtorch-cxx11-abi-nightly.yml 
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml /data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml ``` Result of pinning unstable and disabled jobs: ``` # The link to the published list of disabled jobs DISABLED_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/disabled-jobs.json?versionid=kKJlAXdrUbk3CilXbKu.6OwNTGQB8a.B" # and unstable jobs UNSTABLE_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json?versionid=vzaicOxSsh55iXBXwgGrW6dFeVtPfrhr" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/114355 Approved by: https://github.com/malfet	2023-11-23 02:14:22 +00:00
atalman	995fae6060	Move small pypi build as default for linux cuda 12.1 (#114281 ) This is first PR to resolve: https://github.com/pytorch/pytorch/issues/113972 Move our small wheel build as default Test: ``` pip3 install --no-cache-dir --pre torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl --index-url https://download.pytorch.org/whl/nightly/cu121 Looking in indexes: https://download.pytorch.org/whl/nightly/cu121 Processing ./torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl Collecting filelock (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/filelock-3.9.0-py3-none-any.whl (9.7 kB) Collecting typing-extensions>=4.8.0 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/typing_extensions-4.8.0-py3-none-any.whl (31 kB) Collecting sympy (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/sympy-1.11.1-py3-none-any.whl (6.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 253.4 MB/s eta 0:00:00 Collecting networkx (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/networkx-3.0rc1-py3-none-any.whl (2.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 387.1 MB/s eta 0:00:00 Collecting jinja2 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/Jinja2-3.1.2-py3-none-any.whl (133 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 365.3 MB/s eta 0:00:00 Collecting fsspec (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/fsspec-2023.4.0-py3-none-any.whl (153 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.0/154.0 kB 370.6 MB/s eta 0:00:00 Collecting pytorch-triton==2.1.0+6e4932cda8 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-2.1.0%2B6e4932cda8-cp310-cp310-linux_x86_64.whl (125.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 
125.4/125.4 MB 384.1 MB/s eta 0:00:00 Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 404.9 MB/s eta 0:00:00 Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 402.5 MB/s eta 0:00:00 Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 383.9 MB/s eta 0:00:00 Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 406.9 MB/s eta 0:00:00 Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 388.2 MB/s eta 0:00:00 Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 410.5 MB/s eta 0:00:00 Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB) 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 272.9 MB/s eta 0:00:00 Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 381.5 MB/s eta 0:00:00 Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 394.6 MB/s eta 0:00:00 Collecting nvidia-nccl-cu12==2.19.3 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 384.7 MB/s eta 0:00:00 Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 281.8 MB/s eta 0:00:00 Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvjitlink_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (19.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.8/19.8 MB 367.3 MB/s eta 0:00:00 Collecting MarkupSafe>=2.0 (from jinja2->torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB) Collecting mpmath>=0.19 (from sympy->torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/mpmath-1.2.1-py3-none-any.whl (532 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 532.6/532.6 kB 391.3 MB/s eta 0:00:00 Installing 
collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, pytorch-triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/114281 Approved by: https://github.com/malfet, https://github.com/huydhn	2023-11-22 00:10:03 +00:00
Catherine Lee	dab272eed8	[td] Consistent pytest cache (#113804 ) Move the pytest cache downloading into the build step and store it in additional ci files so that it stays consistent during sharding. Only build env is taken into account now instead of also test config since we might not have the test config during build time, making it less specific, but I also think this might be better since tests are likely to fail across the same test config (I also think it might be worth not even looking at build env but that's a different topic) Each cache upload should only include information from the current run. Do not merge current cache with downloaded cache during upload (shouldn't matter anyways since the downloaded cache won't exist at the time) From what I can tell of the s3 retention policy, pytest cache files will be deleted after 30 days (cc @ZainRizvi to confirm), so we never have to worry about space or pulling old versions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113804 Approved by: https://github.com/ZainRizvi	2023-11-17 23:45:47 +00:00
Nikita Shulga	3fc38e6c83	[GHF] Abort merge on rebase failure (#113960 ) Abort merges invoked with `-r` if there is nothing to rebase Make `rebase_onto`/`rebase_ghstack_onto` return False if rebase is no-op and abort merge in that case Remove `-e` option from both trymerge and tryrebase workflows as one should never report failures on workflow dispatch Pull Request resolved: https://github.com/pytorch/pytorch/pull/113960 Approved by: https://github.com/clee2000	2023-11-17 23:11:00 +00:00
Catherine Lee	c51827b8ce	[ez] Hash update to reuse issues again (#113961 ) The bot that creates the issue got changed, but the search did not, so it wasn't finding old PRs and was just making new ones. This PR makes it reuse PRs again instead of making a new one every time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113961 Approved by: https://github.com/huydhn	2023-11-17 19:06:38 +00:00
albanD	25fb88cf23	Add all 3.12 binary build for wheel. Let's see how it goes. V2 (#112882 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112882 Approved by: https://github.com/malfet, https://github.com/sammcj	2023-11-16 18:20:12 +00:00
Eli Uriegas	84ee7453ad	ci: Add clickable PR link to trymerge (#113712 ) Adds a link to trymerge so that you can quickly click through the job to the pull request for debugging. Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113712 Approved by: https://github.com/clee2000, https://github.com/malfet	2023-11-15 01:55:33 +00:00
Catherine Lee	6e73ae2022	[ci][ez] Add job_id to emit_metrics (#113099 ) As in title. Also print the job id in the step since I'm struggling to find it Pull Request resolved: https://github.com/pytorch/pytorch/pull/113099 Approved by: https://github.com/seemethere	2023-11-08 10:32:41 +00:00
Huy Do	dd957138ec	Pin Docker images to main (#112692 ) This will help prevent a commit like `77901321d9` pushed to a release branch from overwriting the Docker images used in main. In addition, the `DEFAULT_TAG` can be easily updated to `2.1` for example when doing branch cut release. This basically pins the Docker images like https://github.com/pytorch/pytorch/pull/111971 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112692 Approved by: https://github.com/malfet	2023-11-02 17:39:45 +00:00
Nikita Shulga	1b86d5ef2f	[Ci] Add arm64 libtorch CI config (#112474 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112474 Approved by: https://github.com/ZainRizvi, https://github.com/seemethere ghstack dependencies: #112451, #112452	2023-11-01 19:09:34 +00:00
Nikita Shulga	54c7d0d99d	[GHF] Bot should reopen PR after revert (#112614 ) Fixes https://github.com/pytorch/test-infra/issues/4692 Test plan, see https://github.com/malfet/deleteme/pull/58#issuecomment-1789365259 / https://github.com/malfet/deleteme/actions/runs/6723011476 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112614 Approved by: https://github.com/seemethere, https://github.com/ezyang ghstack dependencies: #112613	2023-11-01 18:03:32 +00:00
Nikita Shulga	4a2242e479	[BE] Use GITHUB_API_URL (#112613 ) To avoid hardcoding the same string constant over and over again Pull Request resolved: https://github.com/pytorch/pytorch/pull/112613 Approved by: https://github.com/seemethere	2023-11-01 18:03:32 +00:00
Nikita Shulga	8d6b4322d0	[CI] Limit libtorch builds to `shared-with-deps` (#112452 ) As that is the only variant that is being mentioned on https://pytorch.org/get-started/locally/ And for MacOS those three flavors were just building and uploading the same thing 3 times over, see [this](https://github.com/pytorch/pytorch/actions/runs/6689661275/job/18176516410) for example: ``` upload: ../../_temp/artifacts/libtorch-macos-2.2.0.dev20231030.zip to s3://pytorch/libtorch/nightly/cpu/libtorch-macos-2.2.0.dev20231030.zip ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/112452 Approved by: https://github.com/huydhn ghstack dependencies: #112451	2023-10-31 08:40:06 +00:00
Nikita Shulga	0ce8cf7c7a	Update small wheel nccl-version to 2.19.3 (#112293 ) To keep it in sync with https://github.com/pytorch/pytorch/pull/110827 Added check to `scripts/generate_binary_build_matrix.py` to validate submodule and small wheel nccl versions are the same Step one in addressing https://github.com/pytorch/pytorch/issues/112285 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112293 Approved by: https://github.com/huydhn	2023-10-31 01:20:01 +00:00
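A consistency check of this kind might look like the sketch below. Both function names and the way the versions are passed in are assumptions; the actual check in `scripts/generate_binary_build_matrix.py` is not reproduced here.

```python
import re


def small_wheel_nccl_version(requirements: str) -> str:
    """Extract the pinned NCCL version from a pip-style requirements string."""
    match = re.search(r"nvidia-nccl-cu\d+==(\d+(?:\.\d+)+)", requirements)
    if match is None:
        raise ValueError("no nvidia-nccl pin found in requirements")
    return match.group(1)


def validate_nccl_versions(submodule_version: str, requirements: str) -> None:
    """Raise if the submodule and small wheel NCCL versions disagree."""
    wheel_version = small_wheel_nccl_version(requirements)
    if wheel_version != submodule_version:
        raise RuntimeError(
            f"NCCL submodule version {submodule_version} differs from "
            f"small wheel version {wheel_version}"
        )
```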
Huy Do	f6f81a5969	Update get-workflow-job-id to also return job name (#112103 ) Then we can use this job name in `filter-test-configs` if it's available. This addresses the issue in which `filter-test-configs` on GitHub runners (MacOS x86) couldn't find the runner log to get the job name. This is expected because GitHub runners are isolated, so a job should not be able to access runner logs, which could contain information from other jobs. This allows all missing features depending on running `filter-test-configs` on GitHub runners: * Rerun disabled tests and memory leak check. For example, this would help avoid closing https://github.com/pytorch/pytorch/issues/110980#issuecomment-1779806466 early with the disabled test running properly on MacOS x86 * MacOS x86 jobs can now be disabled or marked as unstable I keep the current logic to parse the log as a fallback because it's working fine on self-hosted runners. That also handles the case where `get-workflow-job-id` fails. Also I move the rest of `get-workflow-job-id` up before the test step like https://github.com/pytorch/pytorch/pull/111483 ### Testing Spot check some jobs to confirm they have the correct names: * MacOS M1 test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065275722?pr=112103#step:10:8 * MacOS x86 build job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18065138137?pr=112103#step:9:14 * Linux test job https://github.com/pytorch/pytorch/actions/runs/6648300991/job/18065354503?pr=112103#step:13:7 * Windows test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065599500?pr=112103#step:12:7 * MacOS x86 test job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18066312801#step:10:8 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112103 Approved by: https://github.com/clee2000	2023-10-26 16:42:46 +00:00
Huy Do	9132734a35	Use Dr.CI GitHub checkrun summary when querying its API fails (#111628 ) This will allow internal SandCastle job to access Dr.CI classification results via GitHub checkrun summary and correctly ignore unrelated failures. ### Testing Adding `TestBypassFailuresOnSandCastle` where Dr.CI API returns nothing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111628 Approved by: https://github.com/clee2000	2023-10-24 01:32:30 +00:00
Aaron Gokaslan	9b499b417e	[BE]: Apply subprocess check to github scripts (#111684 ) Add subprocess checks to raise exceptions in GitHub scripts Pull Request resolved: https://github.com/pytorch/pytorch/pull/111684 Approved by: https://github.com/albanD	2023-10-20 23:37:57 +00:00
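What a subprocess check buys can be shown with the standard library alone. `run_checked` is a hypothetical wrapper written for this sketch, not a helper from the GitHub scripts.

```python
import subprocess
import sys


def run_checked(args: list[str]) -> str:
    """Run a command and return its stdout, raising if it exits non-zero."""
    # check=True turns a non-zero exit status into subprocess.CalledProcessError
    # instead of letting the failure pass silently.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

Without `check=True`, the caller would have to inspect `result.returncode` manually, and a forgotten check would let a failed command go unnoticed.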
Huy Do	4ec777e9a5	[BE] Clean up trymerge code handling broken trunk failures (#111520 ) This is the final part of https://github.com/pytorch/pytorch/pull/110054. The broken trunk classification has been done on Dr.CI, so we can just check for that in trymerge for consistency when ghstack is used. * [x] https://github.com/pytorch/pytorch/pull/110054 * [x] https://github.com/pytorch/pytorch/pull/110133 * [x] This PR to clean up the broken trunk logic. One important change is that `get_classifications` doesn't need to query the jobs from Rockset for the head and merge base SHA anymore, saving a query there. The function looks a lot simpler now. ### Testing https://github.com/pytorch/pytorch/pull/111253 had 1 broken trunk failure as detected by Dr.CI from the base commit `3eb5cae3af` (valid) while trymerge didn't detect that because ghstack base commit `be8e517174` didn't have the same failure (miss). Pull Request resolved: https://github.com/pytorch/pytorch/pull/111520 Approved by: https://github.com/clee2000	2023-10-19 02:30:56 +00:00
atalman	f9053877b4	Add pypi required metadata to all wheels except linux (#111042 ) Will fix package after publishing https://github.com/pytorch/pytorch/issues/100974 Poetry install requires all wheels on pypi to have same metadata. Hence including linux dependencies in all non-linux wheels Pull Request resolved: https://github.com/pytorch/pytorch/pull/111042 Approved by: https://github.com/malfet	2023-10-12 17:40:13 +00:00
Nikita Shulga	92fea5ae3f	[GHF] Re-enable `test_internal_changes` (#110834 ) As Jon fixed the internal change status reporting after the issue is closed Fixes https://github.com/pytorch/pytorch/issues/110218 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110834 Approved by: https://github.com/janeyx99	2023-10-09 03:23:07 +00:00
Nikita Shulga	d35e3dbd06	Fix concurrency limits for Create Release (#110759 ) Also, don't run it on tags, but run on release branch and on `release` event. Tweak linter to accept different concurrency limits for `create_release.yml` Fixes https://github.com/pytorch/pytorch/issues/110569 as all past invocations of the workflow were cancelled by the concurrency limit due to the tag push and release happening at roughly the same time, see https://github.com/pytorch/pytorch/actions/workflows/create_release.yml?query=event%3Arelease Pull Request resolved: https://github.com/pytorch/pytorch/pull/110759 Approved by: https://github.com/atalman	2023-10-06 23:14:12 +00:00
Huy Do	f952551963	Handle invalid cancellation signals in trymerge (#110690 ) This change is needed after https://github.com/pytorch/test-infra/pull/4579 and https://github.com/pytorch/test-infra/pull/4610. All invalid cancelled signals have been removed from Dr.CI and HUD. So trymerge should ignore them accordingly for a consistent experience. ### Testing https://github.com/pytorch/pytorch/pull/110367#issuecomment-1750099960 is the PR where a bunch of invalid cancelled signals showed up and blocked merges Pull Request resolved: https://github.com/pytorch/pytorch/pull/110690 Approved by: https://github.com/clee2000, https://github.com/ZainRizvi	2023-10-06 22:43:33 +00:00
Huy Do	26bfb0fc21	Check for both workflow and job names from Dr.CI (#110661 ) In https://github.com/pytorch/pytorch/pull/110362, the failure was flaky but merge bot treated it as an actual failure. This is a regression after https://github.com/pytorch/test-infra/pull/4604 where the name returned by Dr.CI now includes workflow name. For example, the name is `trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)` in the JSON response: ``` {"FAILED": [], "FLAKY": [{"workflowId": 6372581477, "id": 17297638807, "name": "trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "jobName": "macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "conclusion": "failure", "completed_at": "2023-10-01T22:18:28Z", "html_url": "https://github.com/pytorch/pytorch/actions/runs/6372581477/job/17297638807", "head_branch": "ciflow/trunk/110362", "pr_number": 110362, "head_sha": "03f51e36dedf234931006d1db61677b229c9a119", "failure_captures": ["Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of"], "failure_line": "Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of 6291456KB for macOS", "time": "2023-10-01T22:17:53.847751Z"}], "BROKEN_TRUNK": [], "UNSTABLE": []} ``` I update merge bot to handle this better by considering the workflow name, the job name, and the combined full name. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110661 Approved by: https://github.com/clee2000	2023-10-06 04:36:52 +00:00
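The name matching described in the entry above can be sketched as follows. This is an illustrative assumption about how such a comparison might look, not the actual trymerge implementation; the function `matches_drci_entry` is hypothetical, and only the `name` and `jobName` fields are taken from the JSON response quoted in the commit message.

```python
def matches_drci_entry(workflow_name, job_name, entry):
    """Return True if a Dr.CI entry refers to the given workflow/job pair.

    The entry may identify the job by its bare job name or by the
    combined "<workflow> / <job>" full name, so both forms are accepted.
    """
    full_name = f"{workflow_name} / {job_name}"
    candidates = {job_name, full_name}
    return entry.get("name") in candidates or entry.get("jobName") in candidates


# Fields copied from the FLAKY entry in the JSON response above.
flaky_entry = {
    "name": "trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)",
    "jobName": "macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)",
}
```

Accepting either form keeps the check working whether or not the reported name carries the workflow prefix.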
Nikita Shulga	cd0e7d133b	Migrate MacOs wheel binary builds to ephemeral M1 runners (#110432 ) Surprisingly, there is no speed difference between running the cross-compilation on `macos12-xl` (x86_64 12 core machine) and `macos-13-xlarge` (m1 6 core machine). Most of the changes are on the https://github.com/pytorch/builder side: - `50a6e91f97` skips installing mkl on M1 machines - `bbb29b0467` same for llvm-9 - `8bcc83dbb1` bumps minimal numpy version to 1.19 (as 1.17 is not available for m1) - `cc4f1f9055` skips building tests/distributed for M1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110432 Approved by: https://github.com/kit1980	2023-10-03 17:31:28 +00:00
Huy Do	81a74457ca	[BE] Clean up trymerge code handling flaky failures (#110133 ) This is the 2nd part of https://github.com/pytorch/pytorch/pull/110054. The flaky classification has been done on Dr.CI. There is no need to download flaky rule files and do the check anymore. Some tests are also updated with new examples because we mocked the list of flaky rules there. Similar tests have been done on Dr.CI. * [x] https://github.com/pytorch/pytorch/pull/110054 * [x] Clean up the flaky rules logic because it has already been implemented on Dr. CI * [ ] Clean up the broken trunk logic for the same reason Pull Request resolved: https://github.com/pytorch/pytorch/pull/110133 Approved by: https://github.com/clee2000	2023-09-30 08:01:00 +00:00
Nikita Shulga	ae546db562	[GHF] Update meregbot tests (#110221 ) One should never edit `gql_mocks.json` by hand, as otherwise it does not validate mergebot behavior using the actual GitHub data, but rather a snapshot of this data frozen in time. Unfortunately, GitHub started to delete checkrun statuses against older PRs, so some tests need to be updated. For example, https://github.com/pytorch/pytorch/pull/77700/checks committed on May 19th 2022 has no checks at the time of writing (Sep 28th 2023). Deleted `test_checksuites_pagination` as its checks are gone and it tests the same functionality as `test_get_checkruns_many_runs`, which was updated to use a more recent PR. Deleted `test_get_classifications_pending_unstable`, because what it wants to test is inherently unreliable and therefore it must be rewritten using some different mechanisms. Disabled `test_internal_changes` as the mechanism is broken at the moment, see https://github.com/pytorch/pytorch/issues/110218 Updated `test_pr_dependencies_ghstack` and `test_pr_dependencies` to generate `msg` using `pr.get_body()` rather than hardcode the text (which was updated after the test was committed). Pull Request resolved: https://github.com/pytorch/pytorch/pull/110221 Approved by: https://github.com/clee2000, https://github.com/huydhn	2023-09-28 21:29:17 +00:00
Nikita Shulga	a200bb5e54	[BE] Do not use `assert` in unit tests (#110179 ) One should always use `unittest.assert` methods rather than plain `assert`, as the latter can be turned into a no-op if the Python runtime is invoked with optimizations enabled. Fixes use of `assert` introduced by https://github.com/pytorch/pytorch/pull/105251 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110179 Approved by: https://github.com/huydhn	2023-09-27 21:53:18 +00:00
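The point made in the entry above can be demonstrated with a small sketch. It uses the `optimize` argument of the built-in `compile` to mimic running Python with `-O`; `plain_assert_survives` and `ArithmeticTest` are hypothetical names introduced here and are not part of the PR.

```python
import unittest


def plain_assert_survives(optimize):
    """Compile a failing plain assert at the given optimization level and
    report whether it still raises AssertionError when executed."""
    code = compile("assert 1 + 1 == 3", "<example>", "exec", optimize=optimize)
    try:
        exec(code, {})
    except AssertionError:
        return True
    return False


class ArithmeticTest(unittest.TestCase):
    def test_addition(self):
        # assertEqual is an ordinary method call, so it keeps checking
        # regardless of the interpreter's optimization level.
        self.assertEqual(1 + 1, 2)
```

At optimization level 0 the failing `assert` raises; at level 1 (the equivalent of `-O`) the statement is compiled away and the failure goes unnoticed.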
Huy Do	955298bc40	Use Dr.CI results to classify flaky failures in trymerge (#110054 ) After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there. This change queries the Dr.CI API endpoint and checks whether the failure is a flaky one using the `is_flaky` function. Because the change is relatively large, I'm breaking it down into several smaller PRs in this order: * [x] This PR queries Dr.CI and adds `is_flaky` check * [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI * [ ] Clean up the broken trunk logic for the same reason ### Testing * Create a new `drci_mocks.json` file to catch the JSON response from Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`. * `pytest -v test_trymerge.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054 Approved by: https://github.com/clee2000	2023-09-27 21:21:29 +00:00

706 commits