Some implementations, like OpenDAL, do not work with AWS IMDSv2, but this script bridges the gap and enables more recent `sccache` releases (which switched from simple-s3 to OpenDAL) to work in the current CI system.
When launched it prints something like:
```
export AWS_ACCESS_KEY_ID=XXXXX
export AWS_SECRET_ACCESS_KEY=YYYY
export AWS_SESSION_TOKEN=ZZZZ
```
which can be `eval`ed, after which sccache can use those credentials.
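For reference, a minimal sketch of how such a script can obtain the credentials (the metadata paths below are the standard EC2 IMDSv2 endpoints; the actual script in this PR may differ):
```
import json
from urllib.request import Request, urlopen

IMDS = "http://169.254.169.254"

def imds_get(path: str, token: str) -> str:
    # IMDSv2 requires the session token on every metadata request
    req = Request(f"{IMDS}{path}", headers={"X-aws-ec2-metadata-token": token})
    with urlopen(req) as resp:
        return resp.read().decode()

def main() -> None:
    # Obtain an IMDSv2 session token via a PUT request
    token_req = Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urlopen(token_req) as resp:
        token = resp.read().decode()
    # Look up the instance role and its temporary credentials
    role = imds_get("/latest/meta-data/iam/security-credentials/", token).strip()
    creds = json.loads(imds_get(f"/latest/meta-data/iam/security-credentials/{role}", token))
    print(f"export AWS_ACCESS_KEY_ID={creds['AccessKeyId']}")
    print(f"export AWS_SECRET_ACCESS_KEY={creds['SecretAccessKey']}")
    print(f"export AWS_SESSION_TOKEN={creds['Token']}")

if __name__ == "__main__":
    main()
```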
Validated in https://github.com/pytorch/pytorch/pull/121323
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121426
Approved by: https://github.com/Skylion007
25 min -> 17 + 13 min, which is still not as fast as I want it to be but I'll take it
Lintrunner provides some parallelism by default, but it's not perfect
Reducing fetch-depth from all to 1 further reduces time by ~2-3 minutes
From the non-clang job's logs:
```
2024-02-09T22:05:39.5297616Z Requirement already satisfied: PyYAML==6.0 in /opt/conda/lib/python3.11/site-packages (6.0)
2024-02-09T22:12:23.6164708Z Collecting black==23.12.1
```
I don't know why this part takes so long; maybe it's just buffering? The clang version doesn't show this issue.
See 5a750c8035
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119575
Approved by: https://github.com/huydhn, https://github.com/malfet
Due to PR_WINDOW, if the magic string exists in the body but the PR was not updated recently, the query wouldn't find it and would delete the branch. Instead, query separately for branches with the no-delete-branch label, which I created recently.
Might as well query for branches with open PRs while we're at it, so PRs with the stale label won't get their branches deleted either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119399
Approved by: https://github.com/huydhn
Example https://github.com/pytorch/pytorch/actions/runs/7562281351/job/20592425611?pr=117079 (The code to delete branches isn't being run, it's just listing the branches it wants to delete)
Internal code: https://fburl.com/code/hdvvbfkj
Threshold for a branch with a PR is 30 days, regardless of whether the PR is merged (compared to 3 days if merged and 30 days if closed). Threshold for a branch without a PR is 1.5 years (same internally).
Cap of ~400 queries to GitHub per run so it doesn't hit token usage limits. Currently this leads to about 350 branches deleted per run.
Only query for the last 90 days of updated PRs to reduce token usage, so if a branch has a PR that was updated 90+ days ago, it will be treated as having no PR and will fall back to the 1.5-year branch update check instead, regardless of whether the PR is open or closed.
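A minimal sketch of the decision logic these thresholds imply (names and structure are illustrative, not the actual script):
```
from datetime import datetime, timedelta, timezone

WITH_PR_THRESHOLD = timedelta(days=30)      # branch has a recently updated PR (merged or closed)
WITHOUT_PR_THRESHOLD = timedelta(days=548)  # ~1.5 years; no PR found in the 90-day query window

def branch_is_stale(last_updated: datetime, has_recent_pr: bool) -> bool:
    threshold = WITH_PR_THRESHOLD if has_recent_pr else WITHOUT_PR_THRESHOLD
    return datetime.now(timezone.utc) - last_updated > threshold
```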
I tested that it could delete my own branch and it worked.
labeled with test-config/crossref because I just want the smallest test config possible to reduce CI usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117079
Approved by: https://github.com/malfet
Instead rely on `GitHubPR.default_branch()`, which is the name of the repo's default branch.
Do not pass a branch name when `merge_changes` is called, as it is set to the default branch inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118530
Approved by: https://github.com/clee2000
Mention co-authors in PR body
Modify `CommitAuthors` to query the first two commit `authors`, which makes sure that authors from suggested commits are recognized.
Test plan: CI + check `get_authors()` on a few PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118347
Approved by: https://github.com/kit1980
# Changes
* introduce a `--check-mergeability` trymerge flag that attempts to merge the PR locally, using the same merge logic as the mergebot, but requires only a read-only `GITHUB_TOKEN` and a git repo (see the sketch after this list)
* change mergeability workflow to utilize the new --check-mergeability logic
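A rough sketch of the idea behind the flag, assuming it boils down to cherry-picking the PR's commits onto the default branch and treating any conflict as a failure (the real `trymerge.py` reuses the mergebot's merge logic and differs in details):
```
import subprocess
from typing import List

def check_mergeability(repo_dir: str, default_branch: str, pr_commits: List[str]) -> bool:
    def git(*args: str) -> None:
        subprocess.run(["git", "-C", repo_dir, *args], check=True)

    git("fetch", "origin", default_branch)
    git("checkout", "-B", "mergeability-check", f"origin/{default_branch}")
    try:
        for sha in pr_commits:
            # A merge conflict makes cherry-pick exit non-zero, which raises CalledProcessError
            git("cherry-pick", "-x", sha)
    except subprocess.CalledProcessError:
        subprocess.run(["git", "-C", repo_dir, "cherry-pick", "--abort"], check=False)
        return False
    return True
```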
# Alternatives considered
1.
> Rewrite `https://github.com/pytorch/test-infra/actions/workflows/pr-dependencies-check.yml` to correctly support partially merged ghstacks.
That would be a slightly better approach, but ROI is lower, as it requires reimplementing trymerge logic and additional effort to consolidate the codebase (trymerge lives in pytorch repo).
`pr-dependencies-check.yml` still produces human-readable results for partially merged ghstack PRs (even if it falsely reports them as non-mergeable).
2.
> Instead of introducing a new trymerge flag, use existing flags, including `--dry-run`.
That didn't work, as no combination of existing flags skips the rule checks and Rockset lookups.
# Testing
1. Manual testing `trymerge.py --check-mergeability` on the regular and ghstack PRs:
```
export GITHUB_TOKEN=
export GIT_REPO_DIR=`pwd`
export GITHUB_REPOSITORY=pytorch/pytorch
export GIT_REMOTE_URL=https://github.com/pytorch/pytorch
# Test 1 (2 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 117862
Skipping 1 of 2 PR (#117859) as its already been merged
echo $?
0
# Test 2 (3 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 118125
Skipping 1 of 3 PR (#117859) as its already been merged
echo $?
0
# Test 3 (3 prs, intentional conflicts introduced into `main`):
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 118125
Skipping 1 of 3 PR (#117859) as its already been merged
stdout:
Auto-merging torch/_inductor/ir.py
Auto-merging torch/_inductor/lowering.py
CONFLICT (content): Merge conflict in torch/_inductor/lowering.py
error: could not apply 66ba5b8792f... Realize inputs to DynamicScalar before unwrapping
...
RuntimeError: Command `git -C /Users/ivanzaitsev/pytorch2 cherry-pick -x 66ba5b8792fa076c4e512d920651e5b6b7e466f4` returned non-zero exit code 1
```
2. Workflow run:
https://github.com/pytorch/pytorch/actions/runs/7660736172/job/20878651852?pr=118258
<img width="516" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/28fbf0d2-ac2a-4518-b41d-b32b41373747">
<img width="621" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/ddbf8566-a417-43ec-9d0e-f623f4a71313">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118258
Approved by: https://github.com/PaliC, https://github.com/huydhn
Test [ci-verbose-test-logs] (this worked: the test logs print while running, are interleaved, and are really long)
Adds settings for no timeout (the step timeout still applies; this only gets rid of the ~30 min timeout for a shard of a test file) and for not piping logs / extra verbose test logs (good for debugging deadlocks, but results in very long and possibly interleaved logs).
Also allows these to be set via the PR body if the label name is in brackets, e.g. [label name], as in the test above.
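A hypothetical sketch of how bracketed labels could be extracted from the PR body (the actual parsing code may differ):
```
import re
from typing import Set

LABEL_IN_BRACKETS = re.compile(r"\[([a-zA-Z0-9][a-zA-Z0-9 ./_-]*)\]")

def labels_from_pr_body(body: str) -> Set[str]:
    # Collect every [label-name] occurrence from the PR body
    return {match.group(1).strip() for match in LABEL_IN_BRACKETS.finditer(body)}

# e.g. labels_from_pr_body("Test [ci-verbose-test-logs]") == {"ci-verbose-test-logs"}
```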
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117668
Approved by: https://github.com/huydhn
Add dry-run support for labels so we can run trymerge locally with dry run without actually affecting the PR.
Make Dr.CI results easier to read (previously a massive JSON dump, now just the job names + ids, in a nicer format).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118240
Approved by: https://github.com/huydhn
As usual, almost no work on the PyTorch side; all changes are on the builder end, namely:
- 8b67d32929 - depend on `blas * mkl` only on x86 machines
- eb78393f1e - install arm64 conda when running on Apple Silicon
- 0d3aea4ee0 - constrain llvmdev-9 to x86 machines only
- 6c6a33b271 - set correct DEVELOPER_DIR path
TODO:
- We should auto-detect this `DEVELOPER_DIR` via `xcode-select`
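A sketch of what that auto-detection could look like (assuming `xcode-select -p` prints the active developer directory, which it does on macOS):
```
import subprocess

def detect_developer_dir() -> str:
    # `xcode-select -p` prints the currently selected developer directory
    return subprocess.check_output(["xcode-select", "-p"], text=True).strip()
```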
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117801
Approved by: https://github.com/atalman
This handles the case where the base of the stack targets the default branch rather than the base branch. But as the default branch is likely to have advanced since the PR was made, search for the merge base before determining whether `base`..`head` is in sync with the `orig` branch.
Also, rather than hardcoding the default branch name, fetch it from `GitHubPR.default_branch()`.
Test Plan: https://github.com/malfet/deleteme/pull/77
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116873
Approved by: https://github.com/ezyang
By adding `get_ghstack_dependent_prs`, which uses `git branch --contains` to find all PRs containing the stacked branch, selecting the longest one (in terms of distance between origin and default branch) and skipping all open PRs.
Please note that reverts should be applied in the reverse of the order in which the PRs were originally landed.
Use a bit of defensive programming, i.e. revert a single PR if the attempt to fetch dependencies fails for some reason.
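A sketch of that fallback; `get_ghstack_dependent_prs` is the helper named above, while `revert_single_pr` is a hypothetical stand-in for whatever actually posts the revert:
```
from trymerge import GitRepo, GitHubPR, get_ghstack_dependent_prs

def revert_stack(repo: GitRepo, pr: GitHubPR) -> None:
    try:
        # Per the test plan below, the returned list is already in revert order
        # (dependent PRs first, the PR itself last)
        to_revert = get_ghstack_dependent_prs(repo, pr)
    except Exception as err:
        # Defensive fallback: if dependencies can't be determined, revert only this PR
        print(f"Failed to fetch dependent PRs ({err}); reverting a single PR")
        to_revert = [("", pr)]
    for _sha, dependent_pr in to_revert:
        revert_single_pr(repo, dependent_pr)  # hypothetical helper
```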
Test plan:
- Lint
- ```
>>> from trymerge import GitRepo, GitHubPR, get_ghstack_prs, get_ghstack_dependent_prs
>>> pr=GitHubPR("pytorch", "pytorch", 115188)
>>> pr1=GitHubPR("pytorch", "pytorch", 115210)
>>> repo=GitRepo("/Users/nshulga/git/pytorch/pytorch")
>>> get_ghstack_dependent_prs(repo, pr1)
[('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x100f32f40>)]
>>> get_ghstack_dependent_prs(repo, pr)
[('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x10102eaf0>), ('76b1d44d576c20be79295810904c589241ca1bd2', <trymerge.GitHubPR object at 0x10102eb50>)]
>>> rc=get_ghstack_dependent_prs(repo, pr)
>>> rc[0][1].pr_num
115210
>>> rc[1][1].pr_num
115188
```
- see: https://github.com/malfet/deleteme/pull/59#issuecomment-1869904714 and https://github.com/malfet/deleteme/pull/74#issuecomment-1870542702
Fixes https://github.com/pytorch/test-infra/issues/4845
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116447
Approved by: https://github.com/huydhn
ghstack dependencies: #116446
Prep change for allowing stacked reverts
This is a no-op that factors out some helper functions that will be useful later:
- `get_pr_commit_sha` finds a committed sha for a given PR
- `_revlist_to_prs` converts a revlist to GitHubPRs conditionally
filtering some out
- `do_revert_prs` reverts multiple PRs in a batch, but so far is
invoked with only one PR
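Illustrative-only signatures for these helpers (the real definitions in `trymerge.py` likely differ in parameters and return types):
```
from typing import Callable, List, Optional, Tuple

from trymerge import GitRepo, GitHubPR

def get_pr_commit_sha(repo: GitRepo, pr: GitHubPR) -> str:
    """Find the committed sha for a given PR."""
    ...

def _revlist_to_prs(
    repo: GitRepo,
    pr: GitHubPR,
    rev_list: List[str],
    should_skip: Optional[Callable[[GitHubPR], bool]] = None,
) -> List[Tuple[GitHubPR, str]]:
    """Convert a revlist to GitHubPRs, conditionally filtering some out."""
    ...

def do_revert_prs(repo: GitRepo, prs_with_shas: List[Tuple[GitHubPR, str]], dry_run: bool = False) -> None:
    """Revert multiple PRs in a batch (so far invoked with only one PR)."""
    ...
```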
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116446
Approved by: https://github.com/huydhn, https://github.com/seemethere
Not sure if this is a recent API change or what, but `gh_get_labels('malfet', 'deleteme')` used to raise an exception (see https://github.com/malfet/deleteme/actions/runs/7334535266/job/19971328673#step:6:37 )
```
File "/home/runner/work/deleteme/deleteme/.github/scripts/label_utils.py", line 50, in get_last_page_num_from_header
link_info[link_info.rindex(prefix) + len(prefix) : link_info.rindex(suffix)]
AttributeError: 'NoneType' object has no attribute 'rindex'
```
And with this fix it returns the expected list
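The fix presumably guards against a missing pagination header, something along these lines (the `prefix`/`suffix` values and the header access are assumptions):
```
from typing import Any

def get_last_page_num_from_header(header: Any) -> int:
    link_info = header.get("link")
    if link_info is None:
        # No "link" header means the results fit on a single page
        return 1
    prefix = "&page="
    suffix = ">;"
    return int(link_info[link_info.rindex(prefix) + len(prefix) : link_info.rindex(suffix)])
```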
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116444
Approved by: https://github.com/huydhn
Constant-time access of the first value in a collection. This is a constant-time operation instead of converting the collection to a list to get the first item, which is linear. The rule is turned on, which automatically autofixes and enforces this.
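For illustration (not taken from the PR), the pattern the rule prefers:
```
items = {"a", "b", "c"}

# Linear: materializes the whole collection just to read one element
first = list(items)[0]

# Constant time: pulls a single element straight from the iterator
first = next(iter(items))
```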
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
1. This tags docker images using docker pull/tag/push for the current release
2. Sets the RELEASE_VERSION_TAG var and regenerates the workflows using the new docker tag
3. Removes the conda token setting and binary tests release changes; these are already automated
4. Pins unstable and disabled jobs, automated in: https://github.com/pytorch/pytorch/pull/111675
Test:
```
RELEASE_VERSION=2.2 ./scripts/release/apply-release-changes.sh
Tagging pytorch/manylinux-builder:cuda11.8-main to pytorch/manylinux-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cuda12.1-main to pytorch/manylinux-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda11.8-main to pytorch/libtorch-cxx11-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda12.1-main to pytorch/libtorch-cxx11-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.6-main to pytorch/manylinux-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.7-main to pytorch/manylinux-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.6-main to pytorch/libtorch-cxx11-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.7-main to pytorch/libtorch-cxx11-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cpu-main to pytorch/manylinux-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cpu-main to pytorch/libtorch-cxx11-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-main to pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-2.2 , dry_run: enabled
Tagging pytorch/manylinuxaarch64-builder:cpu-aarch64-main to pytorch/manylinuxaarch64-builder:cpu-aarch64-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda11.8-main to pytorch/conda-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda12.1-main to pytorch/conda-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cpu-main to pytorch/conda-builder:cpu-2.2 , dry_run: enabled
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml
```
Result of pinning unstable and disabled jobs:
```
# The link to the published list of disabled jobs
DISABLED_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/disabled-jobs.json?versionid=kKJlAXdrUbk3CilXbKu.6OwNTGQB8a.B"
# and unstable jobs
UNSTABLE_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json?versionid=vzaicOxSsh55iXBXwgGrW6dFeVtPfrhr"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114355
Approved by: https://github.com/malfet
Move the pytest cache downloading into the build step and store it in additional ci files so that it stays consistent during sharding.
Only the build env is taken into account now, instead of also the test config, since we might not have the test config during build time. This makes the cache less specific, but I also think this might be better since tests are likely to fail across the same test config (I also think it might be worth not even looking at the build env, but that's a different topic).
Each cache upload should only include information from the current run. Do not merge current cache with downloaded cache during upload (shouldn't matter anyways since the downloaded cache won't exist at the time)
From what I can tell of the S3 retention policy, pytest cache files will be deleted after 30 days (cc @ZainRizvi to confirm), so we never have to worry about space or pulling old versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113804
Approved by: https://github.com/ZainRizvi
Abort merges invoked with `-r` if there is nothing to rebase
Make `rebase_onto`/`rebase_ghstack_onto` return False if the rebase is a no-op, and abort the merge in that case.
Remove the `-e` option from both trymerge and tryrebase workflows, as one should never report failures on workflow dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113960
Approved by: https://github.com/clee2000
The bot that creates the issue got changed, but the search did not, so it wasn't finding old PRs and was just making new ones.
This PR makes it reuse PRs again instead of making a new one every time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113961
Approved by: https://github.com/huydhn