Commit graph

2808 commits

Author SHA1 Message Date
atalman
244b124bb8 Add linux cpu test for 3.12 (#117853)
This is a continuation of the work in https://github.com/pytorch/pytorch/pull/113987

Co-authored-by: albanD <desmaison.alban@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117853
Approved by: https://github.com/albanD
2024-02-14 20:52:23 +00:00
Omkar Salpekar
ca55468416 Target Determinator Indexer Workflow (#118824)
As described in [this talk](https://www.youtube.com/watch?v=I95KmF6KSIA) and [this repo](https://github.com/osalpekar/llm-target-determinator),  we are experimenting with using CodeLlama-powered information retrieval for target determination.

The idea is that we create embeddings for PyTorch test functions, and store this index in S3. Then when a new PR comes in, we create embedding(s) for that PR, compare them to the index of test embeddings, and run only the most relevant tests.

This PR creates a workflow that does the indexing part (creating embeddings for functions and storing them in S3). All the logic for running the indexer is in [osalpekar/llm-target-determinator](https://github.com/osalpekar/llm-target-determinator). This workflow just checks out the relevant repos, installs the dependencies, runs the torchrun command to trigger indexing, and uploads the artifacts to S3.
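
The retrieval step described above can be pictured with a minimal sketch. It assumes embeddings are plain float vectors and that relevance is measured by cosine similarity; the function names, the toy index, and the choice of similarity measure are hypothetical and are not taken from the llm-target-determinator repo.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_tests(pr_embedding, test_index, top_k):
    """Return the names of the top_k tests most similar to the PR embedding."""
    scored = sorted(
        test_index.items(),
        key=lambda kv: cosine(pr_embedding, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]


# Toy index standing in for the embeddings that would be stored in S3.
index = {
    "test_conv": [1.0, 0.0],
    "test_linear": [0.0, 1.0],
    "test_conv_backward": [0.9, 0.1],
}
selected = rank_tests([1.0, 0.05], index, top_k=2)
```

Only the tests in `selected` would be run for the PR in this simplified picture; the real indexer and retriever may work differently.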
Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118824
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn
2024-02-14 06:21:18 +00:00
Huy Do
179ecab7e7 Do full checkout in lint workflow to rebuild new Docker images (#119858)
From https://github.com/pytorch/pytorch/pull/119575, using `fetch-depth: 1` didn't work for `calculate-docker-image` when rebuilding a new one.  Specifically, doing a full checkout is needed for `git rev-parse HEAD~:.ci/docker` to get the Docker tag.

This shows up as a trunk failure after the recent Docker image update 507db17675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119858
Approved by: https://github.com/PaliC, https://github.com/clee2000, https://github.com/malfet
2024-02-14 02:37:54 +00:00
Ozan Aydin
b51e0246b7 sccache version update (#119554)
Fixes #37928

`sccache` is updated to a newer version (`v0.7.4`) to fix the non-cacheable `multiple input files` calls in `CUDA` builds.

This should make `Cache hits (CUDA)`  work as expected and improve the speed dramatically.

---

Additional information:

- Modified the check structure in `install_sccache.bat` because of the GitHub Actions error `Process completed with exit code 255.`
    - The error occurs when a freshly downloaded `sccache` is called with the `--show-stats` or `--start-server` arguments within the script
    - The script now checks whether the file exists and kills/deletes the executable before the download

- Removed `sccache-cl` since it is no longer needed with newer versions of `sccache`

---

`win-vs2019-cpu-py3 / build` - `16m 27s`

![image](https://github.com/pytorch/pytorch/assets/148207261/b5628e6c-64bb-4293-9d07-480f56df44f1)

`win-vs2019-cuda11.8-py3 / build` - `17m 4s` **(previously ~45 mins - 1h30mins)**

![image](https://github.com/pytorch/pytorch/assets/148207261/e4ab01cb-0f56-41e8-984f-110e643b9c09)

Now `Cache Hits (CUDA)` hits all `304` objects and the `Non-cacheable reasons` error is fixed.

![image](https://github.com/pytorch/pytorch/assets/148207261/c8c25d2e-3fc1-4edb-8982-99c1f490cb54)

---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119554
Approved by: https://github.com/malfet
2024-02-13 23:50:40 +00:00
Jeff Daily
ba1eb0e27f [ROCm] upgrade CI to 6.0 (#119495)
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119495
Approved by: https://github.com/huydhn
2024-02-13 22:39:03 +00:00
Catherine Lee
34638c82a6 [mergebot] No unique behavior for facebook bot re pending jobs (#119735)
If the fb bot says merge without -f, do the normal behavior and wait for pending checks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119735
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn
2024-02-13 20:07:24 +00:00
Huy Do
5acd1f0f7d Add cherry-pick workflow (#119352)
After https://github.com/pytorch/test-infra/pull/4758, we can create a new workflow on PyTorch to receive `try-cherry-pick` dispatch event from the bot, and create the cherry pick PR.

* [x] Cherry pick a PR after it has been landed and create a cherry pick PR to the target release branch.
* [ ] The second part after this is to update the release tracker with the info.  This will be done in a subsequent PR.
* [ ] ghstack is not yet supported
* [ ] Cherry picking a reverted commit is not yet supported (from @kit1980's comment)

### Testing

The script can be used locally:

```
python cherry_pick.py --onto release/2.2 --classification release --github-actor huydhn 118907
The cherry pick PR is at https://github.com/pytorch/pytorch/pull/119351
```

The test cherry pick PR is created at https://github.com/pytorch/pytorch/pull/119351

Unit testing this on CI is tricky, so I test this out on canary instead.

* https://github.com/pytorch/pytorch-canary/pull/193#issuecomment-1933162707 creates the PR at https://github.com/pytorch/pytorch-canary/pull/201
  * One more test on canary with the new token https://github.com/pytorch/pytorch-canary/pull/193#issuecomment-1933229483.  The minimum required permission from what I see is `workflow`
* Cherry picking conflicts could happen and need to be handled manually https://github.com/pytorch/pytorch-canary/pull/194#issuecomment-1933142975
* ~Require a linked issue when cherry picking regressions, critical fixes, or fixing new features https://github.com/pytorch/pytorch-canary/pull/193#issuecomment-1933174520~ Relax this requirement to a suggestion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119352
Approved by: https://github.com/atalman
2024-02-12 23:12:10 +00:00
Catherine Lee
ad217d4266 [ez] Add try catch for deleting old branches (#119696)
I think some chars in branch names affect the api calls, so just assume they're protected
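
A minimal sketch of the idea, assuming the protection lookup is passed in as a callable; `is_protected` and `fake_query` are hypothetical names used for illustration and do not mirror the actual script.

```python
def is_protected(branch, query_protection):
    """Return True if the branch is protected or if the lookup itself fails."""
    try:
        return bool(query_protection(branch))
    except Exception as e:
        # Fail safe: a branch we cannot query is never deleted.
        print(f"Could not query {branch!r} ({e}); assuming it is protected")
        return True


def fake_query(branch):
    # Hypothetical stand-in for an API call that chokes on odd characters.
    if "#" in branch:
        raise ValueError("bad request")
    return branch == "main"


deletable = [
    b for b in ["main", "feature/x", "weird#name"]
    if not is_protected(b, fake_query)
]
```

Treating a failed lookup as "protected" errs on the side of keeping a branch rather than deleting it by mistake.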
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119696
Approved by: https://github.com/huydhn
2024-02-12 21:08:59 +00:00
Catherine Lee
059bf1baa4 Separate clang lint? (#119575)
25 min -> 17 + 13 min, which is still not as fast as I want it to be but I'll take it
Lintrunner provides some parallelism by default, but it's not perfect

Reducing fetch-depth from all to 1 further reduces time by ~2-3 minutes

From non clang's logs:
```
2024-02-09T22:05:39.5297616Z Requirement already satisfied: PyYAML==6.0 in /opt/conda/lib/python3.11/site-packages (6.0)
2024-02-09T22:12:23.6164708Z Collecting black==23.12.1
```
I don't know why this part takes so long, maybe it's just buffering?  Clang version doesn't show this issue

See 5a750c8035
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119575
Approved by: https://github.com/huydhn, https://github.com/malfet
2024-02-12 17:46:31 +00:00
PyTorch UpdateBot
f2778e3874 [vision hash update] update the pinned vision hash (#119511)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119511
Approved by: https://github.com/pytorchbot
2024-02-10 03:22:13 +00:00
PyTorch UpdateBot
42ca82dfb1 [audio hash update] update the pinned audio hash (#119612)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119612
Approved by: https://github.com/pytorchbot
2024-02-10 03:22:06 +00:00
Catherine Lee
3f82e435eb Fix delete branches (#119399)
Due to PR_WINDOW, if the magic string exists in the body but the pr was not updated recently, the query wouldn't find it and would delete the branch.  Instead, query separately for branches with the no-delete-branch label, which I created recently.

Might as well query for branches with open PRs while we're at it so PRs with the stale label won't get their branches deleted either
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119399
Approved by: https://github.com/huydhn
2024-02-09 17:28:00 +00:00
PyTorch MergeBot
c6f39740c7 Revert "Fix delete branches (#119399)"
This reverts commit e1fc7e1ebc.

Reverted https://github.com/pytorch/pytorch/pull/119399 on behalf of https://github.com/clee2000 due to has a bug ([comment](https://github.com/pytorch/pytorch/pull/119399#issuecomment-1936291560))
2024-02-09 17:14:23 +00:00
Catherine Lee
5d6e323549 No TD (test removal) option in CI (#118808)
It currently doesn't do anything, but I will want these env vars later.  Maybe I should start using ghstack

Intention: --enable-td actually gets rid of tests

I am open to better names
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118808
Approved by: https://github.com/huydhn, https://github.com/osalpekar
2024-02-09 16:42:27 +00:00
Catherine Lee
e1fc7e1ebc Fix delete branches (#119399)
Due to PR_WINDOW, if the magic string exists in the body but the pr was not updated recently, the query wouldn't find it and would delete the branch.  Instead, query separately for branches with the no-delete-branch label, which I created recently.

Might as well query for branches with open PRs while we're at it so PRs with the stale label won't get their branches deleted either
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119399
Approved by: https://github.com/huydhn
2024-02-09 16:40:32 +00:00
Nikita Shulga
173256424a Update setuptools to 68.2.2 (#119456)
Follow-up to the earlier version of this change: Anaconda does not have setuptools v65, but it does have v68
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119456
Approved by: https://github.com/Skylion007
2024-02-09 15:38:25 +00:00
Nikita Shulga
2cdf9b7674 [BE] Update requests to 2.31.0 (#119516)
Fixes potential memory leak detected by Dependabot and reported in https://nvd.nist.gov/vuln/detail/CVE-2023-32681

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119516
Approved by: https://github.com/kit1980, https://github.com/seemethere
2024-02-09 05:10:16 +00:00
Nikita Shulga
45c4a0ce9d Update setup tools to 65.5.1 (#119456)
Should resolve some Dependabot alerts by:
- Updating setuptools to 65.5.1
- Updating jinja2 to 3.3.1

TODO:
 - Update jinja2 and sphinx for the docs builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119456
Approved by: https://github.com/Skylion007
2024-02-08 23:34:41 +00:00
Angela Yi
0827510fd3 [export] Remove torch._export.export (#119095)
XLA changes: https://github.com/pytorch/xla/pull/6486

Test Plan: CI

Differential Revision: D53316196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119095
Approved by: https://github.com/ydwu4, https://github.com/zhxchen17, https://github.com/tugsbayasgalan, https://github.com/avikchaudhuri, https://github.com/jerryzh168
2024-02-08 21:22:04 +00:00
Nikita Shulga
d0db80126e [EZ][CI] Fetch full history for MPS jobs (#119401)
Otherwise emitting TD stats will fail with the following warning:
```
Emiting td_test_failure_stats
/Users/ec2-user/runner/_work/pytorch/pytorch/tools/testing/target_determination/heuristics/edited_by_pr.py:37: UserWarning: Can't query changed test files due to Command '['git', 'merge-base', 'origin/main', 'HEAD']' returned non-zero exit status 1.
  warn(f"Can't query changed test files due to {e}")
```

Test plan: Observe that MPS jobs finish without those warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119401
Approved by: https://github.com/atalman, https://github.com/huydhn
2024-02-07 19:29:30 +00:00
PyTorch UpdateBot
53ee47ca32 [vision hash update] update the pinned vision hash (#119337)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119337
Approved by: https://github.com/pytorchbot
2024-02-07 03:43:26 +00:00
Edward Z. Yang
3f0fd36835 Introduce size oblivious guards (#118579)
Fixes https://github.com/pytorch/pytorch/issues/117361

The implementation here slightly diverges from what was proposed in the issue, so I will recap what this PR is doing here. Today, when doing computations involving size-like unbacked SymInts, we assume for all operations that the compile time range of the integer is `[2, inf]`, even though at runtime we also accept zero and one.

This PR removes the carte blanche assumption, and instead does the analysis in a much more limited and controlled fashion: only for guards which we have designated as "size oblivious" are we willing to do the analysis under the assumption that the range of all size-like unbacked SymInts is `[2, inf]`; otherwise, we will faithfully only do analysis with `[0, inf]` (or whatever the user provided) bounds.

The infra pieces of this PR are:

* Remove runtime_var_to_range from torch/fx/experimental/symbolic_shapes.py; modify `_constrain_range_for_size` to refine the range without clamping min to 2, and instead add the symbol to a `size_like` set in the ShapeEnv
* When evaluating an expression, if the expression is requested to be evaluated in a `size_oblivious` way, we attempt to statically compute the value of the expression with the assumption that all symbols in `size_like` are updated to assume that they are `>= 2`.
* Add Python and C++ APIs for guarding on a SymBool in a size-oblivious way. In C++, I also need to add some helpers for performing symbolic comparisons, since the stock comparisons immediately specialize in the "normal" way.
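
The second bullet can be illustrated with a toy model. This is only a sketch under simplifying assumptions: symbols carry a plain `[lo, hi]` interval and the only query is equality against a constant. `Range`, `static_eq`, and `eval_eq` are hypothetical names and are not the ShapeEnv API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: float
    hi: float


def static_eq(r, value):
    """False if `sym == value` is impossible in r, True if r pins it, else None."""
    if value < r.lo or value > r.hi:
        return False
    if r.lo == r.hi == value:
        return True
    return None  # undecidable from the range alone; a guard would be needed


def eval_eq(sym, value, ranges, size_like, size_oblivious=False):
    r = ranges[sym]
    if size_oblivious and sym in size_like:
        # Only for this query, pretend the size-like symbol is >= 2.
        r = Range(max(r.lo, 2), r.hi)
    return static_eq(r, value)


ranges = {"u0": Range(0, math.inf)}
size_like = {"u0"}

normal = eval_eq("u0", 1, ranges, size_like)
oblivious = eval_eq("u0", 1, ranges, size_like, size_oblivious=True)
```

In this toy model the ordinary query cannot decide `u0 == 1` from `[0, inf]`, while the size-oblivious query resolves it to `False` because the symbol is temporarily treated as `>= 2`. The stored range itself is left untouched.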

The rest of the changes of the PR are marking various spots in PyTorch framework code as size oblivious, based on what our current test suite exercises.

As you review the places where we have marked things as size oblivious, it may become clear why I ended up not opting for the "designate a branch as the default branch when it's not statically obvious which way to go": for some of the conditions, this answer is rather non-obvious. I think potentially there is another refinement on top of this PR, which is something like "I don't care if you can't figure it out with ValueRange analysis, go down this path anyway if there are unbacked sizes involved." But even if we add this API, I think we are obligated to attempt the ValueRange analysis first, since it can lead to better outcomes sometimes (e.g., we are able to figure out that something is contiguous no matter what the unbacked size is.)

When is it permissible to mark something as size oblivious? Heuristically, it is OK anywhere in framework code if it gets you past a guard on unbacked SymInt problem. It is somewhat difficult to provide a true semantic answer, however. In particular, these annotations don't have any observational equivalence guarantee; for example, if I have `torch.empty(u0, 1).squeeze()`, we will always produce a `[u0]` size tensor, even though if `u0 == 1` PyTorch will actually produce a `[]` size tensor. The argument that I gave to Lezcano is that we are in fact defining an alternate semantics for a "special" size = 0, 1, for which we have these alternate eager mode semantics. In particular, suppose that we have a constant `special1` which semantically denotes 1, but triggers alternate handling rules. We would define `torch.empty(special1, 1).squeeze()` to always produce a `[special1]` size tensor, making its semantics coincide with unbacked SymInt semantics. In this model, the decision to designate guards as size oblivious is simply a user API question: you put them wherever you need some handling for special1! As we conservatively error out whenever it is not obvious what `special1` semantics should be, it is always valid to expand these semantics to cover more cases (although you can always choose the wrong semantics!)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118579
Approved by: https://github.com/eellison, https://github.com/lezcano
2024-02-06 19:45:32 +00:00
Catherine Lee
9250965f8b [ez] Lower windows timeout limit for trunk, set test step timeout (#119234)
Lower windows timeout to be the same as linux

Step timeout thing for win (linux version + details for why at https://github.com/pytorch/pytorch/pull/93084)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119234
Approved by: https://github.com/huydhn
2024-02-06 01:54:31 +00:00
Catherine Lee
200108c6e6 Delete old branches (#117079)
Example https://github.com/pytorch/pytorch/actions/runs/7562281351/job/20592425611?pr=117079 (The code to delete branches isn't being run, it's just listing the branches it wants to delete)

Internal code: https://fburl.com/code/hdvvbfkj

Threshold for a branch with a PR is 30 days regardless of whether the PR is merged (compared to 3 days if merged and 30 days if closed).  Threshold for a branch without a PR is 1.5 years (same internally).

Threshold of ~400 queries to github so it doesn't hit token usage limits.  Currently this leads to about 350 branches deleted per run.

Only query for the last 90 days of updated PRs to reduce token usage, so if a branch has a PR but it was updated 90+ days ago, it will think it doesn't have a PR and will wait for the 1.5 years branch update check instead, regardless of whether the PR is open or closed.
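
The thresholds above can be summarized in a small sketch. The constant and function names are hypothetical, and 1.5 years is approximated as 547 days; the actual script may compute these differently.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the description above; the names are hypothetical.
PR_BRANCH_MAX_AGE = timedelta(days=30)
NO_PR_BRANCH_MAX_AGE = timedelta(days=int(1.5 * 365))


def should_delete(last_updated, has_known_pr, now):
    """Decide whether a branch is old enough to delete.

    has_known_pr is True only if a PR was found within the query window,
    so a branch whose PR was last updated 90+ days ago counts as "no PR".
    """
    limit = PR_BRANCH_MAX_AGE if has_known_pr else NO_PR_BRANCH_MAX_AGE
    return now - last_updated > limit


now = datetime(2024, 2, 5, tzinfo=timezone.utc)
stale_pr_branch = should_delete(now - timedelta(days=45), True, now)
old_branch_without_pr = should_delete(now - timedelta(days=45), False, now)
```

A branch last updated 45 days ago is deleted if it has a known PR, but kept if no PR was found for it, since the 1.5 year threshold then applies.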

I tested that it could delete my own branch and it worked.

labeled with test-config/crossref because I just want the smallest test config possible to reduce CI usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117079
Approved by: https://github.com/malfet
2024-02-05 20:50:05 +00:00
Huy Do
71655bccbe Fix wrong mobile build Docker image (#119213)
It turns out that the Docker image name hasn't been updated yet and refers to a non-existent name. Maybe we could update `calculate-docker-image` to fail in this case, if there is a way to distinguish a non-existent name failure from a missing tag failure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119213
Approved by: https://github.com/clee2000, https://github.com/kit1980, https://github.com/malfet
2024-02-05 19:48:10 +00:00
Edward Z. Yang
29f99a3365 Update XLA commit pin (#118871)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118871
Approved by: https://github.com/albanD
2024-02-02 19:55:04 +00:00
Nikita Shulga
4b59bfe8e5 [CI] Filter should not fail if pr_body is empty (#118934)
Otherwise it will fail with `TypeError: argument of type 'NoneType' is not iterable` (see https://github.com/pytorch/pytorch/actions/runs/7748725174/job/21131915226 for example)

```
% gh api /repos/pytorch/pytorch/issues/118927|
{
  "url": "https://api.github.com/repos/pytorch/pytorch/issues/118927",
  ...
  "body": null,
  ...
  "state_reason": null
}
```
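
A minimal sketch of the kind of guard that avoids this error, assuming the filter does a substring check on the body; `body_mentions` and the marker string are hypothetical and do not reproduce the actual filter code.

```python
def body_mentions(pr_body, marker):
    """Check for a marker in a PR body that the API may return as None."""
    # `marker in None` raises TypeError, so fall back to an empty string.
    return marker in (pr_body or "")


with_null_body = body_mentions(None, "[some-marker]")
with_text_body = body_mentions("Please run [some-marker]", "[some-marker]")
```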

TODO: Can we add a test for it?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118934
Approved by: https://github.com/clee2000, https://github.com/seemethere, https://github.com/huydhn
2024-02-02 00:49:20 +00:00
PyTorch UpdateBot
adff335095 [vision hash update] update the pinned vision hash (#118825)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118825
Approved by: https://github.com/pytorchbot
2024-02-01 03:14:16 +00:00
Edward Z. Yang
82b0341af3 s/verison/version/ (#118749)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118749
Approved by: https://github.com/malfet, https://github.com/albanD
2024-01-31 19:23:55 +00:00
PyTorch UpdateBot
f7ae454003 [vision hash update] update the pinned vision hash (#118700)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118700
Approved by: https://github.com/pytorchbot
2024-01-31 03:10:52 +00:00
PyTorch UpdateBot
6d7cfb5c3f [audio hash update] update the pinned audio hash (#118699)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118699
Approved by: https://github.com/pytorchbot
2024-01-31 03:10:48 +00:00
suo
68a75d4539 [lint] remove merge_base_with from .lintrunner.toml (#118677)
This setting is problematic in fbcode, where the expected behavior is to match `arc lint`, which has a behavior much like running `lintrunner` without a `--merge-base-with` argument.

Let's try removing this. I also updated the CI message to encourage people to run with `-m origin/main`, which should hopefully cut down on confusion in the absence of defaulting to that behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118677
Approved by: https://github.com/PaliC
2024-01-31 00:53:58 +00:00
Huy Do
48f876143a Fix missing permission in create release workflow (#118681)
Fixes https://github.com/pytorch/pytorch/actions/runs/7715417683/job/21029944543
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118681
Approved by: https://github.com/clee2000, https://github.com/seemethere, https://github.com/atalman, https://github.com/malfet
2024-01-30 22:02:30 +00:00
Ivan Zaitsev
ba1be17733 Remove voznesenskym from the list of autoreviewers (#118680)
Mitigates the failures of "Auto Request Review" workflow:
```
Requesting review to ezyang, albanD, miladm, voznesenskym, antoniojkim, SherlockNoMad
Error: HttpError: Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the pytorch/pytorch repository.
```
https://github.com/pytorch/pytorch/actions/runs/7716852492/job/21034629665?pr=118669
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118680
Approved by: https://github.com/clee2000
2024-01-30 21:35:38 +00:00
PyTorch UpdateBot
135f785d77 [audio hash update] update the pinned audio hash (#118338)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118338
Approved by: https://github.com/pytorchbot
2024-01-30 03:44:00 +00:00
PyTorch UpdateBot
ff0cb38693 [vision hash update] update the pinned vision hash (#118340)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118340
Approved by: https://github.com/pytorchbot
2024-01-30 03:15:16 +00:00
Ivan Zaitsev
e3d7a19f73 [CI] add wait for /orig branch in mergeability check (#118576)
---

Test runs:
* [happy path](https://github.com/pytorch/pytorch/actions/runs/7702614677/job/20991275431?pr=118576) (this PR)
* [waiting for the hardcoded branch name](https://github.com/izaitsevfb/pr-head-test/actions/runs/7702386966/job/20990584514#step:3:33) in a separate repo (step succeeded after the branch was manually pushed)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118576
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-01-29 22:10:50 +00:00
Nikita Shulga
3011a4406f [BE][GHF] Do not hardcode default branch name (#118530)
Instead rely on `GitHubPR.default_branch()` which is the name of the repo's default branch.

Do not pass the branch name when `merge_changes` is called, as it is set to the default branch inside the function

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118530
Approved by: https://github.com/clee2000
2024-01-29 17:18:23 +00:00
Nikita Shulga
7cc7bf9dda [GHF] Add co-authors to PR (#118347)
Mention co-authors in PR body

Modify the `CommitAuthors` query to include the first two commit `authors`, which makes sure that authors from suggested commits are recognized.
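
As a rough illustration of mentioning co-authors, the sketch below builds `Co-authored-by:` trailer lines from a list of commit authors. The function name, the `(login, email)` input shape, and the example addresses are assumptions for illustration, not the mergebot's implementation.

```python
def co_author_trailers(authors, pr_author_login):
    """Build Co-authored-by trailers, skipping the PR author and duplicates."""
    seen = set()
    lines = []
    for login, email in authors:
        if login == pr_author_login or login in seen:
            continue
        seen.add(login)
        lines.append(f"Co-authored-by: {login} <{email}>")
    return lines


# Hypothetical authors collected from the commits of a PR.
commit_authors = [
    ("alice", "alice@example.com"),
    ("bob", "bob@example.com"),
    ("bob", "bob@example.com"),
]
trailers = co_author_trailers(commit_authors, pr_author_login="alice")
```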

Test plan: CI + check `get_authors()` on a few PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118347
Approved by: https://github.com/kit1980
2024-01-27 01:02:49 +00:00
Ivan Zaitsev
d41cfc92e6 [CI] simplify mergeability check workflow (#118415)
Test run:
https://github.com/pytorch/pytorch/actions/runs/7673050632/job/20914851421?pr=118415
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118415
Approved by: https://github.com/PaliC, https://github.com/huydhn
2024-01-26 21:45:24 +00:00
Jean Schmidt
07499074bb Increasing session duration for AWS credentials for _rocm-test.yml (#118412)
The workflow _rocm-test.yml needs longer session duration for AWS role keys

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118412
Approved by: https://github.com/jeffdaily, https://github.com/huydhn
2024-01-26 19:32:24 +00:00
Ivan Zaitsev
b599f5608c Fix mergeability check for ghstack PRs (#118258)
# Changes
* introduce `--check-mergeability` trymerge flag that attempts to merge PR locally, using the same merge logic as the mergebot, but requires just a read-only `GITHUB_TOKEN` and git repo.
* change mergeability workflow to utilize the new --check-mergeability logic

# Alternatives considered

1.
> Rewrite `https://github.com/pytorch/test-infra/actions/workflows/pr-dependencies-check.yml` to correctly support partially merged ghstacks.

That would be a slightly better approach, but ROI is lower, as it requires reimplementing trymerge logic and additional effort to consolidate the codebase (trymerge lives in pytorch repo).

`pr-dependencies-check.yml` still produces human-readable results for partially merged ghstack prs (even if it falsely reports them as non-mergeable).

2.

> Instead of introducing new trymerge flag, use existing flags, including `--dry-run`.

That didn't work, as no combination of existing flags skips the rule checks and ROCKSET lookups.

# Testing

1. Manual testing  `trymerge.py --check-mergeability`  on the regular and ghstack PRs:

```
export GITHUB_TOKEN=
export GIT_REPO_DIR=`pwd`
export GITHUB_REPOSITORY=pytorch/pytorch
export GIT_REMOTE_URL=https://github.com/pytorch/pytorch

# Test 1 (2 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability  117862
Skipping 1 of 2 PR (#117859) as its already been merged

echo $?
0

# Test 2 (3 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability  118125
Skipping 1 of 3 PR (#117859) as its already been merged

echo $?
0

# Test 3 (3 prs, intentional conflicts introduced into `main`):

python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability  118125
Skipping 1 of 3 PR (#117859) as its already been merged
stdout:
Auto-merging torch/_inductor/ir.py
Auto-merging torch/_inductor/lowering.py
CONFLICT (content): Merge conflict in torch/_inductor/lowering.py
error: could not apply 66ba5b8792f... Realize inputs to DynamicScalar before unwrapping
...
RuntimeError: Command `git -C /Users/ivanzaitsev/pytorch2 cherry-pick -x 66ba5b8792fa076c4e512d920651e5b6b7e466f4` returned non-zero exit code 1
```

2.  Workflow run:
https://github.com/pytorch/pytorch/actions/runs/7660736172/job/20878651852?pr=118258

<img width="516" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/28fbf0d2-ac2a-4518-b41d-b32b41373747">
<img width="621" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/ddbf8566-a417-43ec-9d0e-f623f4a71313">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118258
Approved by: https://github.com/PaliC, https://github.com/huydhn
2024-01-26 03:15:56 +00:00
Nikita Shulga
66c3152e36 [CI] Build docker on larger runners (#118167)
Otherwise it takes 1+h to build CUDA12.1 docker
- Limit UCC builds to just sm_52(M60) and sm_86(A10G), which I think has the biggest impact
- Replace hardcoded `-j6` build parallelism with more dynamic `-j$[$(nproc) - 2]`
- Remove redundant check about Ubuntu-14.04
- Added `DOCKER_BUILDKIT` to parallelize the builds

As a result, docker build time drops from 1+h to 35 min
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118167
Approved by: https://github.com/huydhn
2024-01-26 02:28:25 +00:00
Catherine Lee
de9ddd19a5 Various CI settings (#117668)
Test [ci-verbose-test-logs] (this worked: the test logs are printed while running, are interleaved, and are really long)

Settings for no timeout (step timeout still applies, only gets rid of ~30 min timeout for shard of test file) and no piping logs/extra verbose test logs (good for debugging deadlocks but results in very long and possibly interleaved logs).

Also allows these to be set via the PR body if the label name is in brackets, e.g. [label name] or the test above.
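
A minimal sketch of how bracketed label names could be picked out of a PR body, assuming a simple regular expression and a known set of label names. The function name and the second label in `known` are hypothetical; the actual parsing in CI may differ.

```python
import re


def labels_from_body(pr_body, known_labels):
    """Return known labels that appear in square brackets in the PR body."""
    found = set(re.findall(r"\[([^\[\]]+)\]", pr_body or ""))
    return sorted(found & set(known_labels))


# "ci-verbose-test-logs" appears in the text above; the other name is made up.
known = {"ci-verbose-test-logs", "ci-no-test-timeout"}
labels = labels_from_body(
    "Test [ci-verbose-test-logs] and a [link](http://x)", known
)
```

Intersecting with a known set keeps ordinary markdown links such as `[link](...)` from being mistaken for settings.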

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117668
Approved by: https://github.com/huydhn
2024-01-26 00:17:29 +00:00
Catherine Lee
02a411d4a6 [mergebot] Dry run for labels + easier to read Dr CI result (#118240)
Add a dry run for labels so we can run trymerge locally with dryrun without actually affecting the PR

Make Dr.CI results easier to read (previously a massive json dump, now just the job names + ids, in a nicer format)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118240
Approved by: https://github.com/huydhn
2024-01-25 23:06:43 +00:00
Huy Do
eebe7e1d37 Migrate update-viablestrict to test-infra (#118163)
In https://github.com/pytorch/test-infra/pull/4905, so that ExecuTorch can use the same GHA on their CI.

### Testing

https://github.com/pytorch/pytorch/actions/runs/7634906738/job/20799502532#step:2:15480
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118163
Approved by: https://github.com/clee2000
2024-01-25 07:07:34 +00:00
PyTorch UpdateBot
5a83c47d98 [vision hash update] update the pinned vision hash (#117594)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117594
Approved by: https://github.com/pytorchbot
2024-01-25 05:33:01 +00:00
Jack Taylor
e6288820e3 Revert "Update triton ROCm version to 6.0" (#118179)
Reverting [this commit](https://github.com/pytorch/pytorch/pull/117433) due to failures observed in wheel environment e.g:
```
ImportError: /tmp/torchinductor_root/triton/0/ebfa57c0b7b95873c96cad6f9bca148d/hip_utils.so: undefined symbol: hipGetDevicePropertiesR0600
```

Will revert for now, investigate, and aim to re-land this as part of https://github.com/pytorch/pytorch/pull/116270

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118179
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2024-01-24 22:01:27 +00:00
DanilBaibak
a545ebc870 Switched macOS runners type to macos-m1-stable (#117651)
Switched macOS runners type to `macos-m1-stable`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117651
Approved by: https://github.com/huydhn
2024-01-24 11:55:13 +00:00
Bin Bao
c6930aad46 Update Triton pin (#117873)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117873
Approved by: https://github.com/shunting314, https://github.com/malfet
2024-01-23 21:05:30 +00:00