Commit graph

800 commits

Author SHA1 Message Date
Jon Janzen
605dfd8fb4 Switch sync_distributed_folder to use non-reverse order (#131683)
`git` on GHA seems to use the reverse commit ordering that I see locally O_o
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131683
Approved by: https://github.com/seemethere
2024-07-25 20:44:23 +00:00
PaliC
544f950d14 [BE] Improve error message when there are internal changes (#131547)
Fixes https://github.com/pytorch/test-infra/issues/4988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131547
Approved by: https://github.com/xuzhao9, https://github.com/malfet, https://github.com/atalman
2024-07-24 20:38:08 +00:00
Thanh Ha
3eb9fa5d58 Add support for using LF Canary runners (#131188)
The script is updated such that if a canary build is detected and the label_type is LF runner it will run on an LF Canary runner.

Closes pytorch/ci-infra#245.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131188
Approved by: https://github.com/ZainRizvi
2024-07-22 13:26:46 +00:00
Xuehai Pan
747b38c131 [BE][Easy][2/19] enforce style for empty lines in import segments in .ci/ and .github/ (#129753)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129753
Approved by: https://github.com/malfet
ghstack dependencies: #129752
2024-07-16 09:40:00 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
chuanqiw
ca023f77bc [CD] Add pytorch xpu wheel build in nightly (#129560)
Add pytorch xpu wheel build in nightly after the xpu build image enabling PR https://github.com/pytorch/builder/pull/1879 merged

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129560
Approved by: https://github.com/atalman
2024-07-11 15:49:04 +00:00
Jon Janzen
46c52661bc Use a better cherry-pick strategy for stable pytorch w/ distribute changes (#129987)
1. Update the branch name from internal feedback
2. Only cherry-pick in the changes to these folders
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129987
Approved by: https://github.com/seemethere
2024-07-10 20:55:36 +00:00
Catherine Lee
80a421a54d [TD] Pin numpy to 1.26.0 in indexer (#130442)
Temporarily pin 1.26.0 to get the workflow working while I go sort out which dependencies need to be updated

Succeeding run: https://github.com/pytorch/pytorch/actions/runs/9877733366/job/27280052419?pr=130442

Tested by adding my branch to the trust relationship for the policy and removing the environment
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130442
Approved by: https://github.com/atalman, https://github.com/malfet
2024-07-10 20:52:24 +00:00
atalman
a1590e16df Add single Python 3.10, single Cuda 12.1 build with dependencies included (#130349)
Build large wheel for Python 3.10, CUDA 12.1 that will be used in Colab. Build name: ``manywheel-py3_11-cuda12_1-full-build``

We still have all code to support the full build in builder repo, here:
https://github.com/pytorch/builder/blob/main/manywheel/build_cuda.sh#L151

Test:
```
import sys
import torch
sys.version_info
print(torch.__version__)
sys.version_info

2.3.0+cu121
sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130349
Approved by: https://github.com/malfet
2024-07-10 12:57:39 +00:00
Andrey Talman
17ca0d0edf Add linux manywheel python 3.13 binary workflows (#130030)
Test with passing linux manywheel workflows is here: https://github.com/pytorch/pytorch/pull/121979
Builder PR already merged: https://github.com/pytorch/builder/pull/1910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130030
Approved by: https://github.com/albanD
2024-07-08 22:50:15 +00:00
chuanqiw
d496145534 [CD] Add triton xpu wheel build (#129730)
Enable triton xpu wheel build firstly, then add pytorch xpu nightly wheel build

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129730
Approved by: https://github.com/atalman
2024-07-04 17:55:20 +00:00
Aaron Gokaslan
6cb0ad3375 [BE]: Update NCCL submodule to 2.21.5 (#124014)
Update NCCL to the latest version. This release is mostly bugfixes with a few new minor features.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124014
Approved by: https://github.com/eqy, https://github.com/ezyang, https://github.com/nWEIdia, https://github.com/malfet, https://github.com/atalman
2024-07-02 14:39:33 +00:00
Jack Taylor
95a5958db4 [ROCm] Update nightly triton-rocm pin to release branch (#129361)
Update pin to tip of https://github.com/triton-lang/triton/commits/release/3.0.x/ following upstream strategy here https://github.com/pytorch/pytorch/pull/126098

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129361
Approved by: https://github.com/peterbell10
2024-07-02 11:49:52 +00:00
Zain Rizvi
9645eaaaec [BE] Improve logging for runner-determinator (#129679)
This lets us be more flexible about what data we output and throwing exceptions. It's also less likely to break when others make changes (e.g. any print statement would have broken this code before since the printed output was expected to only be a json)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129679
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt, https://github.com/Skylion007
2024-07-01 22:31:35 +00:00
PyTorch MergeBot
3d96217891 Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)"
This reverts commit 9e1f3ecaa7.

Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405))
2024-06-29 00:47:15 +00:00
Xuehai Pan
9e1f3ecaa7 [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes by apply order:

1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.

    `.parent{...}.absolute()` -> `.absolute().parent{...}`

4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)

    `.parent.parent.parent.parent` -> `.parents[3]`

5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~

    ~`.parents[3]` -> `.parents[4 - 1]`~

6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-06-28 00:35:15 +00:00
Zain Rizvi
389492e264 Fix runner determinator bug (#129612)
Currently the runner determinator is buggy and doesn't let anyone's workflows run against the LF runners (it prefixes a "@" to the user names in the issue instead of either stripping it or prefixing it to the incoming names)

This PR fixes the bug so that people opted in to using LF runners can actually use them. It also puts the python code back into the repo.  Even though the code isn't directly invoked, having it there makes testing and linting easier/possible

Also includes lint fixes

Note: if you just review the .yml file you'll see all the relevant diffs

### Testing:
#### Before
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi --github-branch foo
{"label_type": "", "message": "LF Workflows are disabled for ZainRizvi, ZainRizvi. Using meta runners."}
```

#### After
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi --github-branch foo
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi, ZainRizvi. Using LF runners."}
```

Aside: updated test case after rebase:
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi2 --github-branch foo  --github-repo python/pythonss --github-ref-type branch
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi. Using LF runners."}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129612
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt
2024-06-27 17:51:09 +00:00
Catherine Lee
90f82426b9 RS migration - trymerge to upload merge records to s3 (#129503)
Uploads merge records to to ossci-raw-job-status (public) bucket instead of directly to rockset

The runner used by trymerge is a GH runner, so it doesn't have access to s3.  Instead, I save the record as a json and upload the json to s3 in a different step that runs after the aws credentials are configured.

The role is defined [here](https://togithub.com/pytorch-labs/pytorch-gha-infra/pull/421)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129503
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi, https://github.com/malfet
2024-06-26 19:06:52 +00:00
PyTorch MergeBot
895316119d Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)"
This reverts commit 0314c4c101.

Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052))
2024-06-26 19:03:57 +00:00
Jean Schmidt
53fafdd0c3 [BE] Runner determinator: more resilient user matching (#129462)
Small improvements on runner determinator script:

* Don't do splitting of the issue comment, unless necessary;
* Match username against a set over a list;
* Match both triggering_actor and issue owner over only actor (to avoid edge cases, where we get `pytorch-bot[bot]`)
* Add stripping, to remove potential breaking and not visible whitespaces;
* Don't use linux.4xlarge as a runner: it should not depend on meta runners, for reliability;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129462
Approved by: https://github.com/zxiiro, https://github.com/ZainRizvi
2024-06-26 13:47:52 +00:00
Huy Do
cda4d4887d Skip signals from older runs of the same workflows (#129291)
I discovered this bug in trymerge when debugging https://github.com/pytorch/pytorch/pull/129013 in which Dr.CI reported no relevant failures while mergebot complained about some unrelated ROCm failures https://github.com/pytorch/pytorch/pull/129013#issuecomment-2183009217.

It turns out that mergebot took into account stale signals from older runs of the same workflow here.  For example,
* https://github.com/pytorch/pytorch/actions/runs/9604985361 was the first run where it had a ROCm failure
* While https://github.com/pytorch/pytorch/actions/runs/9608926565 was the second attempt and it was all green

Notice that both runs came from the same push to commit [be69191](be69191f2d) with [ciflow/rocm/129013](https://github.com/pytorch/pytorch/tree/ciflow/rocm/129013).  So, we just need to check the signals from the newer run.

Note that Dr.CI handles this part correctly using the logic in https://github.com/pytorch/test-infra/blob/main/torchci/pages/api/drci/drci.ts#L1079-L1088.  So, the fix in this PR is to bring the same logic to trymerge.

### Testing

`pytest -v test_trymerge.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129291
Approved by: https://github.com/ZainRizvi
2024-06-26 03:49:09 +00:00
Xuehai Pan
0314c4c101 [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes by apply order:

1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.

    `.parent{...}.absolute()` -> `.absolute().parent{...}`

4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)

    `.parent.parent.parent.parent` -> `.parents[3]`

5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~

    ~`.parents[3]` -> `.parents[4 - 1]`~

6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-06-25 08:28:38 +00:00
Zain Rizvi
4d04203852 [BE] Runner determinator: Expect usernames to be prefixed with '@' (#129246)
Expect the username in the runner rollover issue (https://github.com/pytorch/test-infra/issues/5132) to be prefixed with a "@".

This will make typos way less likely since github's autocomplete/autoformating will help out

For now, I've updated the issue to have usernames both with and without the @ while this change rolls out

Testing:
Ran the script locally on both this issue and a new test issue and verified they both had the expected output:
```
(venv) (base) ➜  ~/pytorch git:(zainr/improve-get-workflow-type)
python .github/scripts/get_workflow_type.py --github-token github_pat_***  --github-issue 5132 --github-user ZainRizvi --github-branch "zainr/stuff"
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi. Using LF runners."}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129246
Approved by: https://github.com/zxiiro, https://github.com/huydhn
2024-06-25 02:39:33 +00:00
PaliC
b0044e2e18 [Split Build] Support nightly release (#129011)
This PR adds the split build to our binaries workflow. Validation for the workflow is done using the PR above in conjunction with https://github.com/pytorch/builder/pull/1876.

Test Workflow: Check CI in the workflow above
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129011
Approved by: https://github.com/atalman
2024-06-22 05:45:14 +00:00
Jithun Nair
a6ac6447b5 Re-enable py3.12 nightly wheel builds and add triton dependency for ROCm (#128525)
The llnl-hatchet developers have published the py3.12 binaries on [PyPI](https://pypi.org/project/llnl-hatchet/#files). In fact, looking [here](https://download.pytorch.org/whl/nightly/llnl-hatchet), it seems we already have the py3.12 wheels mirrored. This should allow us to re-enable py3.12 binaries for ROCm.

This PR reverts commit 9d849d4312.

It also adds the pytorch-triton-rocm dependency for torch wheels on ROCm since pytorch-triton-rocm py3.12 wheels are available now

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128525
Approved by: https://github.com/malfet
2024-06-19 21:56:54 +00:00
Thanh Ha
4bc90185fb fix: Print statements causing parse error (#128969)
The print statements for the get_workflow_type script is problematic because the shell script calling this script is expecting the output to only be JSON. This PR resolves this by removing all print statements to covert them to a message field in the JSON return output so that the output can continue to expect to be JSON while giving us the debug data we are looking for.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128969
Approved by: https://github.com/tylertitsworth, https://github.com/ZainRizvi
2024-06-19 01:17:08 +00:00
Huy Do
84c86e56bd Update tracker issues after successfully cherry-picking a PR (#128924)
This extends the capacity of the cherry-pick bot to automatically update the tracker issue with the information.  For this to work, the tracker issue needs to be an open one with a `release tracker` label, i.e. https://github.com/pytorch/pytorch/issues/128436.  The version from the release branch, i.e. `release/2.4`, will be match with the title of the tracker issue, i.e. `[v.2.4.0] Release Tracker` or `[v.2.4.1] Release Tracker`

### Testing

`python cherry_pick.py --onto-branch release/2.4 --classification release --fixes "DEBUG DEBUG" --github-actor huydhn 128718`

* On the PR https://github.com/pytorch/pytorch/pull/128718#issuecomment-2174846771
* On the tracker issue https://github.com/pytorch/pytorch/issues/128436#issuecomment-2174846757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128924
Approved by: https://github.com/atalman
2024-06-18 17:48:47 +00:00
Nikita Shulga
b94c52dd29 [GHF] Refuse merge to non-default branch (#128710)
Unless PR is ghstack one

Test plan:
```
% GITHUB_TOKEN=$(gh auth token)  python3 -c "from trymerge import GitHubPR; pr=GitHubPR('pytorch', 'pytorch', 128591); print(pr.base_ref(), pr.default_branch())"
release/2.4 main
```
Fixes: https://github.com/pytorch/test-infra/issues/5339

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128710
Approved by: https://github.com/seemethere, https://github.com/atalman
2024-06-14 18:23:25 +00:00
PyTorch MergeBot
ee140a198f Revert "[Port][Quant][Inductor] Bug fix: mutation nodes not handled correctly for QLinearPointwiseBinaryPT2E (#128591)"
This reverts commit 03e8a4cf45.

Reverted https://github.com/pytorch/pytorch/pull/128591 on behalf of https://github.com/atalman due to Contains release only changes should not be landed ([comment](https://github.com/pytorch/pytorch/pull/128591#issuecomment-2168308233))
2024-06-14 15:51:00 +00:00
Xia, Weiwen
03e8a4cf45 [Port][Quant][Inductor] Bug fix: mutation nodes not handled correctly for QLinearPointwiseBinaryPT2E (#128591)
Port #127592 from main to release/2.4

------
Fixes #127402

- Revert some changes to `ir.MutationOutput` and inductor/test_flex_attention.py
- Add checks of mutation for QLinearPointwiseBinaryPT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127592
Approved by: https://github.com/leslie-fang-intel, https://github.com/Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128591
Approved by: https://github.com/jgong5, https://github.com/Chillee
2024-06-14 09:31:38 +00:00
Zain Rizvi
b05b8d3989 [EZ][ALI Migration] Add logging for workflow type determination (#128619)
To help figure out what went wrong when the wrong label appears to have been set
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128619
Approved by: https://github.com/zxiiro, https://github.com/clee2000
2024-06-13 16:37:07 +00:00
DanilBaibak
6d1b1ddd3e Select Runner Label Dynamically (#127287)
Updated `get_workflow_type.py` logic to dynamically select a prefix for the runner label.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127287
Approved by: https://github.com/ZainRizvi
2024-06-12 18:47:47 +00:00
Aaron Orenstein
3c971d2ef3 Flip default value for mypy disallow_untyped_defs [final] (#127836)
Not requiring all functions to have types allows a lot of 'Any' types to slip in - which poison types and make mypy unable to properly typecheck the code.  I want to flip the default so that new files are required to have fully typed defs and we can have a burndown list of files that fail to require full types.

The preceding stack of PRs (cut up simply to limit the number of file changes per PR "reasonable") adds `# mypy: allow-untyped-defs` to any file which didn't immediately pass mypy with the flag flipped.  Due to changing files and merge conflicts it will probably be necessary to have several passes through before landing this final PR which turns the option on.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127836
Approved by: https://github.com/oulgen, https://github.com/Skylion007
2024-06-12 15:28:42 +00:00
Eddie Yan
de4f8b9946 [BE]: Update cudnn to 9.1.0.70 (#123475)
cuDNN has managed to upload cu11 and cu12 wheels for ~~9.0.0.312~~ 9.1.0.70, so trying this out...

CC @Skylion007 @malfet

Co-authored-by: Wei Wang <weiwan@nvidia.com>
Co-authored-by: atalman <atalman@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123475
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/nWEIdia, https://github.com/atalman
2024-06-06 18:45:22 +00:00
Catherine Lee
936225d7b2 [mergebot] Fix pending unstable jobs being viewed as failed (#128080)
https://github.com/pytorch/pytorch/pull/128038#issuecomment-2150802030

In the above, pending unstable jobs get put into the ok_failed_checks list, and because there are a lot of unstable jobs, it exceeds the threshold and merge fails.

I don't think unstable jobs should be considered in the ok failed checks threshold, only flaky and broken trunk jobs should be considered there.

Change looks big, but main thing is that unstable jobs don't get included in the check for how many flaky failures there are.  The other changes are mostly renames so things are clearer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128080
Approved by: https://github.com/huydhn
2024-06-06 18:22:20 +00:00
Jithun Nair
9d849d4312 Disable py3.12 nightly wheel builds for ROCm (#127968)
Triton commit bump PR https://github.com/pytorch/pytorch/pull/125396 reverted due to missing llnl-hatchet dependency for triton. Workaround is to disable py3.12 binary build jobs for ROCm on PyTorch CI until llnl-hatchet publishes py3.12 wheels on [PyPI](https://pypi.org/project/llnl-hatchet/#files)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127968
Approved by: https://github.com/atalman, https://github.com/pruthvistony
2024-06-06 15:17:35 +00:00
PyTorch MergeBot
9a8ab778d3 Revert "[BE]: Update cudnn to 9.1.0.70 (#123475)"
This reverts commit c490046693.

Reverted https://github.com/pytorch/pytorch/pull/123475 on behalf of https://github.com/huydhn due to CUDA trunk jobs are pretty red after this change, and the forward fix https://github.com/pytorch/pytorch/pull/127984 does not look working ([comment](https://github.com/pytorch/pytorch/pull/123475#issuecomment-2149258430))
2024-06-05 08:59:53 +00:00
Eddie Yan
c490046693 [BE]: Update cudnn to 9.1.0.70 (#123475)
cuDNN has managed to upload cu11 and cu12 wheels for ~~9.0.0.312~~ 9.1.0.70, so trying this out...

CC @Skylion007 @malfet

Co-authored-by: Wei Wang <weiwan@nvidia.com>
Co-authored-by: atalman <atalman@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123475
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/nWEIdia
2024-06-04 16:33:06 +00:00
Huy Do
57baae9c9b Migrating CI/CD jobs to macOS 14 (#127582)
We have half the fleet in MacoS 14 already and it has been running fine so far https://github.com/pytorch/pytorch/issues/127490.  So, I'm preparing the final push to replace the rest of them.  This also switches release build from 13 to 14 (GitHub runners)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127582
Approved by: https://github.com/atalman
2024-05-31 22:30:59 +00:00
Catherine Lee
121c55d8d1 Old branch deletion script to also delete old ciflow tags (#127625)
Change branch deletion script to also delete left over ciflow tags that the bot doesn't get to, as well as the one created by triggering a workflow on HUD

Example run https://github.com/pytorch/pytorch/actions/runs/9322082915/job/25662376463?pr=127625
(didn't actually delete the tag, but lists what tags it would delete)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127625
Approved by: https://github.com/huydhn
2024-05-31 18:54:54 +00:00
Svetlana Karslioglu
4a0d96e496 Add a GH action to autolabel docathon PRs (#127569)
To ease oncall burden for the docathon PR reviewers and ensure all PRs are correctly labeled, adding this GH action that will look for the issue number in the PR and if that issue has a docathon-h1-2024 label, then it would propagate the labels from the issues into the PR. It should not conflict with the existing labelers because we use ``pull_request.add_to_labels`` - credit @kit1980.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127569
Approved by: https://github.com/kit1980
2024-05-31 17:57:07 +00:00
Jon Janzen
781f26240a Add script to copy distributed commits to stable branch (#126918)
This will be used as part of a prototype of a stable pytorch with a fast-moving distributed folder

Tasks: T189915739

Test plan:

I ran the script in a few configurations on my local machine. It worked as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126918
Approved by: https://github.com/seemethere, https://github.com/malfet
2024-05-29 03:33:44 +00:00
Ting Lu
1c2e221e25 CUDA 12.4 ARM wheel integration to CD - nightly build (#126174)
rebasing https://github.com/pytorch/pytorch/pull/124112.
too many conflict files, so starting a new PR.

Test https://github.com/pytorch/builder/pull/1775 (merged) for ARM wheel addition
Test https://github.com/pytorch/builder/pull/1828 (merged) for setting MAX_JOBS

Current issue to follow up:
https://github.com/pytorch/pytorch/issues/126980

Co-authored-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126174
Approved by: https://github.com/nWEIdia, https://github.com/atalman
2024-05-27 05:50:36 +00:00
Xuehai Pan
ba3b05fdf3 [1/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort stdlib (#127122)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127122
Approved by: https://github.com/kit1980
2024-05-25 08:25:50 +00:00
Wei Wang
0902929d58 [CUDA] [CI]: Enable CUDA 12.4 CI (#121956)
Reference PR: https://github.com/pytorch/pytorch/pull/93406

Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121956
Approved by: https://github.com/atalman
2024-05-23 20:37:47 +00:00
Catherine Lee
5ccc634603 [CI] Pin uv==0.1.45 for lintrunner (#126908)
e4623de4cf/1
```

2024-05-22T19:10:48.5974515Z + python3 -m pip install uv
2024-05-22T19:10:48.5975198Z Collecting uv
2024-05-22T19:10:48.5976496Z   Downloading uv-0.1.45-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (32 kB)
2024-05-22T19:10:48.5977828Z Downloading uv-0.1.45-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.8 MB)
2024-05-22T19:10:48.5986243Z [?25l   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/12.8 MB ? eta -:--:--
2024-05-22T19:10:48.5988326Z    ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━ 6.8/12.8 MB 205.8 MB/s eta 0:00:01
2024-05-22T19:10:48.5990300Z    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 12.8/12.8 MB 215.1 MB/s eta 0:00:01
2024-05-22T19:10:48.5991645Z    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 12.8/12.8 MB 215.1 MB/s eta 0:00:01
2024-05-22T19:10:48.5992724Z    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 97.8 MB/s eta 0:00:00
2024-05-22T19:10:48.5993443Z [?25hInstalling collected packages: uv
2024-05-22T19:10:48.5993950Z Successfully installed uv-0.1.45
2024-05-22T19:10:48.5994363Z + CACHE_DIRECTORY=/tmp/.lintbin
2024-05-22T19:10:48.5994772Z + [[ -d /tmp/.lintbin ]]
2024-05-22T19:10:48.5995157Z + cp -r /tmp/.lintbin .
2024-05-22T19:10:48.5995497Z + lintrunner init
2024-05-22T19:10:48.5995839Z + [[ 1 == \1 ]]
```
vs
```

2024-05-22T20:33:53.5563991Z + python3 -m pip install uv
2024-05-22T20:33:53.5564921Z Collecting uv
2024-05-22T20:33:53.5566259Z   Downloading uv-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (32 kB)
2024-05-22T20:33:53.5568142Z Downloading uv-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.9 MB)
2024-05-22T20:33:53.5570253Z [?25l   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/12.9 MB ? eta -:--:--
2024-05-22T20:33:53.5571889Z    ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━ 7.0/12.9 MB 208.8 MB/s eta 0:00:01
2024-05-22T20:33:53.5573716Z    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 12.9/12.9 MB 206.7 MB/s eta 0:00:01
2024-05-22T20:33:53.5575478Z    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 12.9/12.9 MB 206.7 MB/s eta 0:00:01
2024-05-22T20:33:53.5577240Z    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.9/12.9 MB 101.6 MB/s eta 0:00:00
2024-05-22T20:33:53.5578531Z [?25hInstalling collected packages: uv
2024-05-22T20:33:53.5579316Z Successfully installed uv-0.2.1
2024-05-22T20:33:53.5580033Z + CACHE_DIRECTORY=/tmp/.lintbin
2024-05-22T20:33:53.5580640Z + [[ -d /tmp/.lintbin ]]
2024-05-22T20:33:53.5581229Z + cp -r /tmp/.lintbin .
2024-05-22T20:33:53.5581799Z + lintrunner init
2024-05-22T20:33:53.5603302Z Traceback (most recent call last):
2024-05-22T20:33:53.5604857Z   File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 101, in <module>
2024-05-22T20:33:53.5605805Z     main()
2024-05-22T20:33:53.5606687Z   File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 97, in main
2024-05-22T20:33:53.5607762Z     run_cmd_or_die(f"docker exec -t {container_name} /exec")
2024-05-22T20:33:53.5608949Z   File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 38, in run_cmd_or_die
2024-05-22T20:33:53.5610107Z     raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
2024-05-22T20:33:53.5611328Z RuntimeError: Command docker exec -t e551764bdba0c87c2fc392fba9ea265e8821a552915b36010f18299d8035b304 /exec failed with exit code 1
2024-05-22T20:33:53.5626540Z ##[error]Process completed with exit code 1.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126908
Approved by: https://github.com/huydhn
2024-05-22 21:41:21 +00:00
Catherine Lee
ac2c547838 [TD] Upload names of failures to s3 for pytest cache (#126315)
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both caess, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205).

Instead, manually upload/download an extra file that lists the failing test files

Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-21 16:29:31 +00:00
PyTorch MergeBot
8bca0847c2 Revert "[TD] Upload names of failures to s3 for pytest cache (#126315)"
This reverts commit 655038687a.

Reverted https://github.com/pytorch/pytorch/pull/126315 on behalf of https://github.com/clee2000 due to broke inductor ([comment](https://github.com/pytorch/pytorch/pull/126315#issuecomment-2121133045))
2024-05-20 20:15:08 +00:00
Catherine Lee
655038687a [TD] Upload names of failures to s3 for pytest cache (#126315)
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both caess, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205).

Instead, manually upload/download an extra file that lists the failing test files

Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-20 17:36:30 +00:00
Aleksei Nikiforov
da7ced6e8c S390x binaries (#120398)
Allow building nightly, rc and release binaries for s390x.

This PR implements building binaries, but publishing part is currently missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120398
Approved by: https://github.com/huydhn
2024-05-11 02:32:25 +00:00