Jon Janzen
605dfd8fb4
Switch sync_distributed_folder to use non-reverse order ( #131683 )
...
`git` on GHA seems to use the reverse of the commit ordering that I see locally O_o
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131683
Approved by: https://github.com/seemethere
2024-07-25 20:44:23 +00:00
PaliC
544f950d14
[BE] Improve error message when there are internal changes ( #131547 )
...
Fixes https://github.com/pytorch/test-infra/issues/4988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131547
Approved by: https://github.com/xuzhao9 , https://github.com/malfet , https://github.com/atalman
2024-07-24 20:38:08 +00:00
Thanh Ha
3eb9fa5d58
Add support for using LF Canary runners ( #131188 )
...
The script is updated such that if a canary build is detected and the label_type is LF runner it will run on an LF Canary runner.
Closes pytorch/ci-infra#245 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131188
Approved by: https://github.com/ZainRizvi
2024-07-22 13:26:46 +00:00
Xuehai Pan
747b38c131
[BE][Easy][2/19] enforce style for empty lines in import segments in .ci/ and .github/ ( #129753 )
...
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501 . Most changes are auto-generated by linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129753
Approved by: https://github.com/malfet
ghstack dependencies: #129752
2024-07-16 09:40:00 +00:00
Xuehai Pan
973037be6a
[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() ( #130199 )
...
This PR changes the empty collection factory call to Python literals:
- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`
The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:
```bash
$ python3 -m dis - <<EOS
import collections
d1 = {}
d2 = dict()
dict = collections.OrderedDict
d3 = dict()
EOS
```
```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```
The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).
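A minimal sketch that checks both claims from within Python, assuming CPython. Opcode names differ between versions, so the check only looks for `BUILD_MAP` and for any opcode whose name starts with `CALL`. The helper names `opnames` and `shadowed` are illustrative and not part of the PR:

```python
import collections
import dis


def opnames(source):
    """Return the opcode names CPython emits for a source snippet."""
    code = compile(source, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("d = {}")
call_ops = opnames("d = dict()")

# The literal is built directly; the factory call needs a name lookup and a call.
uses_build_map = "BUILD_MAP" in literal_ops
uses_call = any(name.startswith("CALL") for name in call_ops)


def shadowed():
    # Rebinding the name changes what dict() builds, but not what {} builds.
    dict = collections.OrderedDict
    return type(dict()), type({})
```

Calling `shadowed()` returns the `OrderedDict` type for the factory call and the builtin `dict` type for the literal, which is the safety argument made above.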
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
chuanqiw
ca023f77bc
[CD] Add pytorch xpu wheel build in nightly ( #129560 )
...
Add pytorch xpu wheel build in nightly after the xpu build image enabling PR https://github.com/pytorch/builder/pull/1879 merged
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129560
Approved by: https://github.com/atalman
2024-07-11 15:49:04 +00:00
Jon Janzen
46c52661bc
Use a better cherry-pick strategy for stable pytorch w/ distribute changes ( #129987 )
...
1. Update the branch name from internal feedback
2. Only cherry-pick in the changes to these folders
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129987
Approved by: https://github.com/seemethere
2024-07-10 20:55:36 +00:00
Catherine Lee
80a421a54d
[TD] Pin numpy to 1.26.0 in indexer ( #130442 )
...
Temporarily pin 1.26.0 to get the workflow working while I go sort out which dependencies need to be updated
Succeeding run: https://github.com/pytorch/pytorch/actions/runs/9877733366/job/27280052419?pr=130442
Tested by adding my branch to the trust relationship for the policy and removing the environment
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130442
Approved by: https://github.com/atalman , https://github.com/malfet
2024-07-10 20:52:24 +00:00
atalman
a1590e16df
Add single Python 3.10, single Cuda 12.1 build with dependencies included ( #130349 )
...
Build large wheel for Python 3.10, CUDA 12.1 that will be used in Colab. Build name: ``manywheel-py3_11-cuda12_1-full-build``
We still have all code to support the full build in builder repo, here:
https://github.com/pytorch/builder/blob/main/manywheel/build_cuda.sh#L151
Test:
```
import sys
import torch
sys.version_info
print(torch.__version__)
sys.version_info
2.3.0+cu121
sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130349
Approved by: https://github.com/malfet
2024-07-10 12:57:39 +00:00
Andrey Talman
17ca0d0edf
Add linux manywheel python 3.13 binary workflows ( #130030 )
...
Test with passing linux manywheel workflows is here: https://github.com/pytorch/pytorch/pull/121979
Builder PR already merged: https://github.com/pytorch/builder/pull/1910
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130030
Approved by: https://github.com/albanD
2024-07-08 22:50:15 +00:00
chuanqiw
d496145534
[CD] Add triton xpu wheel build ( #129730 )
...
Enable the triton xpu wheel build first, then add the pytorch xpu nightly wheel build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129730
Approved by: https://github.com/atalman
2024-07-04 17:55:20 +00:00
Aaron Gokaslan
6cb0ad3375
[BE]: Update NCCL submodule to 2.21.5 ( #124014 )
...
Update NCCL to the latest version. This release is mostly bugfixes with a few new minor features.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124014
Approved by: https://github.com/eqy , https://github.com/ezyang , https://github.com/nWEIdia , https://github.com/malfet , https://github.com/atalman
2024-07-02 14:39:33 +00:00
Jack Taylor
95a5958db4
[ROCm] Update nightly triton-rocm pin to release branch ( #129361 )
...
Update pin to tip of https://github.com/triton-lang/triton/commits/release/3.0.x/ following upstream strategy here https://github.com/pytorch/pytorch/pull/126098
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129361
Approved by: https://github.com/peterbell10
2024-07-02 11:49:52 +00:00
Zain Rizvi
9645eaaaec
[BE] Improve logging for runner-determinator ( #129679 )
...
This lets us be more flexible about what data we output and about throwing exceptions. It's also less likely to break when others make changes (e.g. any print statement would have broken this code before, since the printed output was expected to contain only JSON)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129679
Approved by: https://github.com/zxiiro , https://github.com/jeanschmidt , https://github.com/Skylion007
2024-07-01 22:31:35 +00:00
PyTorch MergeBot
3d96217891
Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )"
...
This reverts commit 9e1f3ecaa7 .
Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405 ))
2024-06-29 00:47:15 +00:00
Xuehai Pan
9e1f3ecaa7
[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )
...
Changes by apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
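As an illustration of steps 2 and 4, the three spellings below name the same ancestor directory. This is a sketch on a made-up path, using `PurePosixPath` and `posixpath` (the POSIX flavor of `os.path`) so that the result does not depend on the host OS:

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical file location, used only for illustration.
path = PurePosixPath("/repo/tools/linter/adapters/example.py")

# Nested dirname calls operate on strings.
nested = posixpath.dirname(posixpath.dirname(posixpath.dirname(str(path))))

# Chained .parent calls operate on path objects.
chained = path.parent.parent.parent

# N chained .parent calls correspond to .parents[N - 1].
indexed = path.parents[2]
```

All three resolve to `/repo/tools`; only `nested` is a plain string, which is why step 2 wraps the `Path` expression in `str(...)`.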
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby , https://github.com/malfet
2024-06-28 00:35:15 +00:00
Zain Rizvi
389492e264
Fix runner determinator bug ( #129612 )
...
Currently the runner determinator is buggy and doesn't let anyone's workflows run against the LF runners (it prefixes a "@" to the user names in the issue instead of either stripping it or prefixing it to the incoming names)
This PR fixes the bug so that people opted in to using LF runners can actually use them. It also puts the python code back into the repo. Even though the code isn't directly invoked, having it there makes testing and linting easier/possible
Also includes lint fixes
Note: if you just review the .yml file you'll see all the relevant diffs
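The "@" mismatch described above can be sketched as follows. This is a simplified illustration with hypothetical helper names, not the code in `runner_determinator.py`: the opt-in list from the issue already carries "@"-prefixed names while the incoming actor name does not, so one side has to be normalized before comparing.

```python
def is_opted_in_buggy(actor, issue_names):
    # Adds a second "@" to names that already have one, so nothing matches.
    return actor in {"@" + name for name in issue_names}


def is_opted_in_fixed(actor, issue_names):
    # Strip the "@" from the issue side and compare bare user names.
    return actor in {name.lstrip("@") for name in issue_names}


issue_names = ["@ZainRizvi", "@octocat"]  # example entries, not the real list
```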
### Testing:
#### Before
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi --github-branch foo
{"label_type": "", "message": "LF Workflows are disabled for ZainRizvi, ZainRizvi. Using meta runners."}
```
#### After
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi --github-branch foo
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi, ZainRizvi. Using LF runners."}
```
Aside: updated test case after rebase:
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi2 --github-branch foo --github-repo python/pythonss --github-ref-type branch
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi. Using LF runners."}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129612
Approved by: https://github.com/zxiiro , https://github.com/jeanschmidt
2024-06-27 17:51:09 +00:00
Catherine Lee
90f82426b9
RS migration - trymerge to upload merge records to s3 ( #129503 )
...
Uploads merge records to the ossci-raw-job-status (public) bucket instead of directly to rockset
The runner used by trymerge is a GH runner, so it doesn't have access to s3. Instead, I save the record as a json and upload the json to s3 in a different step that runs after the aws credentials are configured.
The role is defined [here](https://togithub.com/pytorch-labs/pytorch-gha-infra/pull/421 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129503
Approved by: https://github.com/huydhn , https://github.com/ZainRizvi , https://github.com/malfet
2024-06-26 19:06:52 +00:00
PyTorch MergeBot
895316119d
Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )"
...
This reverts commit 0314c4c101 .
Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052 ))
2024-06-26 19:03:57 +00:00
Jean Schmidt
53fafdd0c3
[BE] Runner determinator: more resilient user matching ( #129462 )
...
Small improvements on runner determinator script:
* Don't do splitting of the issue comment, unless necessary;
* Match username against a set over a list;
* Match both triggering_actor and issue owner over only actor (to avoid edge cases, where we get `pytorch-bot[bot]`)
* Add stripping, to remove potential breaking and not visible whitespaces;
* Don't use linux.4xlarge as a runner: it should not depend on meta runners, for reliability;
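A minimal sketch of the set-based matching from the list above, with hypothetical function names and an assumed issue format (one "@"-prefixed user name per line). The real script may parse the issue differently, and the sketch does not model the "split only when necessary" item:

```python
def parse_opted_in_users(issue_body):
    """Collect user names from the issue text into a set."""
    users = set()
    for line in issue_body.splitlines():
        # Drop invisible whitespace first, then the leading "@".
        name = line.strip().lstrip("@")
        if name:
            users.add(name)
    return users


def is_enabled(issue_body, triggering_actor, issue_owner):
    users = parse_opted_in_users(issue_body)
    # Either identity may opt in; this covers runs triggered by pytorch-bot[bot].
    return triggering_actor.strip() in users or issue_owner.strip() in users


issue_body = "@ZainRizvi \r\n  @octocat\n\n"
```

Set membership keeps each lookup cheap and makes the comparison exact, so a trailing space or carriage return in the issue text no longer hides a user.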
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129462
Approved by: https://github.com/zxiiro , https://github.com/ZainRizvi
2024-06-26 13:47:52 +00:00
Huy Do
cda4d4887d
Skip signals from older runs of the same workflows ( #129291 )
...
I discovered this bug in trymerge when debugging https://github.com/pytorch/pytorch/pull/129013 in which Dr.CI reported no relevant failures while mergebot complained about some unrelated ROCm failures https://github.com/pytorch/pytorch/pull/129013#issuecomment-2183009217 .
It turns out that mergebot took into account stale signals from older runs of the same workflow here. For example,
* https://github.com/pytorch/pytorch/actions/runs/9604985361 was the first run where it had a ROCm failure
* While https://github.com/pytorch/pytorch/actions/runs/9608926565 was the second attempt and it was all green
Notice that both runs came from the same push to commit [be69191](be69191f2d ) with [ciflow/rocm/129013](https://github.com/pytorch/pytorch/tree/ciflow/rocm/129013 ). So, we just need to check the signals from the newer run.
Note that Dr.CI handles this part correctly using the logic in https://github.com/pytorch/test-infra/blob/main/torchci/pages/api/drci/drci.ts#L1079-L1088 . So, the fix in this PR is to bring the same logic to trymerge.
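The idea of keeping only the newest run per workflow can be sketched like this. The record layout and the function name `latest_run_checks` are hypothetical, and the sketch assumes that a larger run ID means a newer run, as with the two runs linked above. It is not the trymerge or Dr.CI implementation:

```python
def latest_run_checks(checks):
    """Keep only checks that belong to the newest run of each workflow."""
    newest = {}
    for check in checks:
        workflow = check["workflow"]
        newest[workflow] = max(newest.get(workflow, 0), check["run_id"])
    return [c for c in checks if c["run_id"] == newest[c["workflow"]]]


checks = [
    {"workflow": "rocm", "run_id": 9604985361, "job": "test", "conclusion": "failure"},
    {"workflow": "rocm", "run_id": 9608926565, "job": "test", "conclusion": "success"},
]
fresh = latest_run_checks(checks)
```

With the stale failure filtered out, only the green signal from the second attempt remains to be evaluated.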
### Testing
`pytest -v test_trymerge.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129291
Approved by: https://github.com/ZainRizvi
2024-06-26 03:49:09 +00:00
Xuehai Pan
0314c4c101
[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )
...
Changes by apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby , https://github.com/malfet
2024-06-25 08:28:38 +00:00
Zain Rizvi
4d04203852
[BE] Runner determinator: Expect usernames to be prefixed with '@' ( #129246 )
...
Expect the username in the runner rollover issue (https://github.com/pytorch/test-infra/issues/5132 ) to be prefixed with a "@".
This will make typos way less likely since GitHub's autocomplete/autoformatting will help out
For now, I've updated the issue to have usernames both with and without the @ while this change rolls out
Testing:
Ran the script locally on both this issue and a new test issue and verified they both had the expected output:
```
(venv) (base) ➜ ~/pytorch git:(zainr/improve-get-workflow-type)
python .github/scripts/get_workflow_type.py --github-token github_pat_*** --github-issue 5132 --github-user ZainRizvi --github-branch "zainr/stuff"
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi. Using LF runners."}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129246
Approved by: https://github.com/zxiiro , https://github.com/huydhn
2024-06-25 02:39:33 +00:00
PaliC
b0044e2e18
[Split Build] Support nightly release ( #129011 )
...
This PR adds the split build to our binaries workflow. Validation for the workflow is done using the PR above in conjunction with https://github.com/pytorch/builder/pull/1876 .
Test Workflow: Check CI in the workflow above
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129011
Approved by: https://github.com/atalman
2024-06-22 05:45:14 +00:00
Jithun Nair
a6ac6447b5
Re-enable py3.12 nightly wheel builds and add triton dependency for ROCm ( #128525 )
...
The llnl-hatchet developers have published the py3.12 binaries on [PyPI](https://pypi.org/project/llnl-hatchet/#files ). In fact, looking [here](https://download.pytorch.org/whl/nightly/llnl-hatchet ), it seems we already have the py3.12 wheels mirrored. This should allow us to re-enable py3.12 binaries for ROCm.
This PR reverts commit 9d849d4312 .
It also adds the pytorch-triton-rocm dependency for torch wheels on ROCm since pytorch-triton-rocm py3.12 wheels are available now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128525
Approved by: https://github.com/malfet
2024-06-19 21:56:54 +00:00
Thanh Ha
4bc90185fb
fix: Print statements causing parse error ( #128969 )
...
The print statements in the get_workflow_type script are problematic because the shell script calling it expects the output to be JSON only. This PR resolves this by removing all print statements and converting them to a message field in the JSON return output, so that the output remains valid JSON while still giving us the debug data we are looking for.
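A sketch of the pattern, not the actual `get_workflow_type` code: diagnostics go into the `message` field so that the output is exactly one JSON document. The function name and the opt-in rule are placeholders, and the field values mirror the sample outputs shown elsewhere in this log:

```python
import json


def workflow_type_json(user, opted_in_users):
    """Return the result as a single JSON string, with debug text in "message"."""
    if user in opted_in_users:
        result = {
            "label_type": "lf.",
            "message": f"LF Workflows are enabled for {user}. Using LF runners.",
        }
    else:
        result = {
            "label_type": "",
            "message": f"LF Workflows are disabled for {user}. Using meta runners.",
        }
    return json.dumps(result)


output = workflow_type_json("ZainRizvi", {"ZainRizvi"})
parsed = json.loads(output)  # what a caller expecting pure JSON would do
```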
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128969
Approved by: https://github.com/tylertitsworth , https://github.com/ZainRizvi
2024-06-19 01:17:08 +00:00
Huy Do
84c86e56bd
Update tracker issues after successfully cherry-picking a PR ( #128924 )
...
This extends the capability of the cherry-pick bot to automatically update the tracker issue with the information. For this to work, the tracker issue needs to be an open one with a `release tracker` label, e.g. https://github.com/pytorch/pytorch/issues/128436 . The version from the release branch, e.g. `release/2.4`, will be matched against the title of the tracker issue, e.g. `[v.2.4.0] Release Tracker` or `[v.2.4.1] Release Tracker`
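The matching between a release branch and a tracker title could look roughly like this. The regular expressions and the function name are assumptions for illustration, not the code in `cherry_pick.py`:

```python
import re


def tracker_matches_branch(branch, issue_title):
    """True if the tracker title's major.minor version equals the branch's."""
    branch_match = re.fullmatch(r"release/(\d+\.\d+)", branch)
    title_match = re.match(r"\[v\.?(\d+\.\d+)\.\d+\] Release Tracker", issue_title)
    if branch_match is None or title_match is None:
        return False
    return branch_match.group(1) == title_match.group(1)
```

Comparing only the major.minor part lets a single branch such as `release/2.4` match the trackers of successive patch releases.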
### Testing
`python cherry_pick.py --onto-branch release/2.4 --classification release --fixes "DEBUG DEBUG" --github-actor huydhn 128718`
* On the PR https://github.com/pytorch/pytorch/pull/128718#issuecomment-2174846771
* On the tracker issue https://github.com/pytorch/pytorch/issues/128436#issuecomment-2174846757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128924
Approved by: https://github.com/atalman
2024-06-18 17:48:47 +00:00
Nikita Shulga
b94c52dd29
[GHF] Refuse merge to non-default branch ( #128710 )
...
Unless PR is ghstack one
Test plan:
```
% GITHUB_TOKEN=$(gh auth token) python3 -c "from trymerge import GitHubPR; pr=GitHubPR('pytorch', 'pytorch', 128591); print(pr.base_ref(), pr.default_branch())"
release/2.4 main
```
Fixes: https://github.com/pytorch/test-infra/issues/5339
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128710
Approved by: https://github.com/seemethere , https://github.com/atalman
2024-06-14 18:23:25 +00:00
PyTorch MergeBot
ee140a198f
Revert "[Port][Quant][Inductor] Bug fix: mutation nodes not handled correctly for QLinearPointwiseBinaryPT2E ( #128591 )"
...
This reverts commit 03e8a4cf45 .
Reverted https://github.com/pytorch/pytorch/pull/128591 on behalf of https://github.com/atalman due to Contains release only changes should not be landed ([comment](https://github.com/pytorch/pytorch/pull/128591#issuecomment-2168308233 ))
2024-06-14 15:51:00 +00:00
Xia, Weiwen
03e8a4cf45
[Port][Quant][Inductor] Bug fix: mutation nodes not handled correctly for QLinearPointwiseBinaryPT2E ( #128591 )
...
Port #127592 from main to release/2.4
------
Fixes #127402
- Revert some changes to `ir.MutationOutput` and inductor/test_flex_attention.py
- Add checks of mutation for QLinearPointwiseBinaryPT2E
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127592
Approved by: https://github.com/leslie-fang-intel , https://github.com/Chillee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128591
Approved by: https://github.com/jgong5 , https://github.com/Chillee
2024-06-14 09:31:38 +00:00
Zain Rizvi
b05b8d3989
[EZ][ALI Migration] Add logging for workflow type determination ( #128619 )
...
To help figure out what went wrong when the wrong label appears to have been set
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128619
Approved by: https://github.com/zxiiro , https://github.com/clee2000
2024-06-13 16:37:07 +00:00
DanilBaibak
6d1b1ddd3e
Select Runner Label Dynamically ( #127287 )
...
Updated `get_workflow_type.py` logic to dynamically select a prefix for the runner label.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127287
Approved by: https://github.com/ZainRizvi
2024-06-12 18:47:47 +00:00
Aaron Orenstein
3c971d2ef3
Flip default value for mypy disallow_untyped_defs [final] ( #127836 )
...
Not requiring all functions to have types allows a lot of 'Any' types to slip in - which poison types and make mypy unable to properly typecheck the code. I want to flip the default so that new files are required to have fully typed defs and we can have a burndown list of files that fail to require full types.
The preceding stack of PRs (split up simply to keep the number of file changes per PR reasonable) adds `# mypy: allow-untyped-defs` to any file which didn't immediately pass mypy with the flag flipped. Due to changing files and merge conflicts it will probably be necessary to make several passes before landing this final PR, which turns the option on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127836
Approved by: https://github.com/oulgen , https://github.com/Skylion007
2024-06-12 15:28:42 +00:00
Eddie Yan
de4f8b9946
[BE]: Update cudnn to 9.1.0.70 ( #123475 )
...
cuDNN has managed to upload cu11 and cu12 wheels for ~~9.0.0.312~~ 9.1.0.70, so trying this out...
CC @Skylion007 @malfet
Co-authored-by: Wei Wang <weiwan@nvidia.com>
Co-authored-by: atalman <atalman@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123475
Approved by: https://github.com/Skylion007 , https://github.com/malfet , https://github.com/nWEIdia , https://github.com/atalman
2024-06-06 18:45:22 +00:00
Catherine Lee
936225d7b2
[mergebot] Fix pending unstable jobs being viewed as failed ( #128080 )
...
https://github.com/pytorch/pytorch/pull/128038#issuecomment-2150802030
In the above, pending unstable jobs get put into the ok_failed_checks list, and because there are a lot of unstable jobs, it exceeds the threshold and merge fails.
I don't think unstable jobs should be considered in the ok failed checks threshold, only flaky and broken trunk jobs should be considered there.
Change looks big, but main thing is that unstable jobs don't get included in the check for how many flaky failures there are. The other changes are mostly renames so things are clearer
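A rough sketch of the rule, using hypothetical names and a made-up threshold value (the real limit and data structures live in trymerge): only flaky and broken-trunk failures count toward the limit, while unstable jobs are set aside.

```python
OK_FAILED_CHECKS_THRESHOLD = 3  # illustrative value only


def exceeds_threshold(failures):
    """failures maps a job name to "flaky", "broken_trunk", or "unstable"."""
    counted = [
        job for job, kind in failures.items() if kind in ("flaky", "broken_trunk")
    ]
    return len(counted) > OK_FAILED_CHECKS_THRESHOLD


# Many unstable jobs plus two failures that do count.
failures = {f"unstable-job-{i}": "unstable" for i in range(10)}
failures["linux-test"] = "flaky"
failures["macos-build"] = "broken_trunk"
```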
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128080
Approved by: https://github.com/huydhn
2024-06-06 18:22:20 +00:00
Jithun Nair
9d849d4312
Disable py3.12 nightly wheel builds for ROCm ( #127968 )
...
Triton commit bump PR https://github.com/pytorch/pytorch/pull/125396 reverted due to missing llnl-hatchet dependency for triton. Workaround is to disable py3.12 binary build jobs for ROCm on PyTorch CI until llnl-hatchet publishes py3.12 wheels on [PyPI](https://pypi.org/project/llnl-hatchet/#files )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127968
Approved by: https://github.com/atalman , https://github.com/pruthvistony
2024-06-06 15:17:35 +00:00
PyTorch MergeBot
9a8ab778d3
Revert "[BE]: Update cudnn to 9.1.0.70 ( #123475 )"
...
This reverts commit c490046693 .
Reverted https://github.com/pytorch/pytorch/pull/123475 on behalf of https://github.com/huydhn due to CUDA trunk jobs are pretty red after this change, and the forward fix https://github.com/pytorch/pytorch/pull/127984 does not look working ([comment](https://github.com/pytorch/pytorch/pull/123475#issuecomment-2149258430 ))
2024-06-05 08:59:53 +00:00
Eddie Yan
c490046693
[BE]: Update cudnn to 9.1.0.70 ( #123475 )
...
cuDNN has managed to upload cu11 and cu12 wheels for ~~9.0.0.312~~ 9.1.0.70, so trying this out...
CC @Skylion007 @malfet
Co-authored-by: Wei Wang <weiwan@nvidia.com>
Co-authored-by: atalman <atalman@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123475
Approved by: https://github.com/Skylion007 , https://github.com/malfet , https://github.com/nWEIdia
2024-06-04 16:33:06 +00:00
Huy Do
57baae9c9b
Migrating CI/CD jobs to macOS 14 ( #127582 )
...
We have half the fleet on macOS 14 already and it has been running fine so far https://github.com/pytorch/pytorch/issues/127490 . So, I'm preparing the final push to replace the rest of them. This also switches the release build from 13 to 14 (GitHub runners)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127582
Approved by: https://github.com/atalman
2024-05-31 22:30:59 +00:00
Catherine Lee
121c55d8d1
Old branch deletion script to also delete old ciflow tags ( #127625 )
...
Change branch deletion script to also delete left over ciflow tags that the bot doesn't get to, as well as the one created by triggering a workflow on HUD
Example run https://github.com/pytorch/pytorch/actions/runs/9322082915/job/25662376463?pr=127625
(didn't actually delete the tag, but lists what tags it would delete)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127625
Approved by: https://github.com/huydhn
2024-05-31 18:54:54 +00:00
Svetlana Karslioglu
4a0d96e496
Add a GH action to autolabel docathon PRs ( #127569 )
...
To ease the on-call burden for the docathon PR reviewers and ensure all PRs are correctly labeled, this adds a GH action that looks for the issue number in the PR and, if that issue has a docathon-h1-2024 label, propagates the labels from the issue to the PR. It should not conflict with the existing labelers because we use ``pull_request.add_to_labels`` - credit @kit1980.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127569
Approved by: https://github.com/kit1980
2024-05-31 17:57:07 +00:00
Jon Janzen
781f26240a
Add script to copy distributed commits to stable branch ( #126918 )
...
This will be used as part of a prototype of a stable pytorch with a fast-moving distributed folder
Tasks: T189915739
Test plan:
I ran the script in a few configurations on my local machine. It worked as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126918
Approved by: https://github.com/seemethere , https://github.com/malfet
2024-05-29 03:33:44 +00:00
Ting Lu
1c2e221e25
CUDA 12.4 ARM wheel integration to CD - nightly build ( #126174 )
...
rebasing https://github.com/pytorch/pytorch/pull/124112 .
too many conflict files, so starting a new PR.
Test https://github.com/pytorch/builder/pull/1775 (merged) for ARM wheel addition
Test https://github.com/pytorch/builder/pull/1828 (merged) for setting MAX_JOBS
Current issue to follow up:
https://github.com/pytorch/pytorch/issues/126980
Co-authored-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126174
Approved by: https://github.com/nWEIdia , https://github.com/atalman
2024-05-27 05:50:36 +00:00
Xuehai Pan
ba3b05fdf3
[1/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort stdlib ( #127122 )
...
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in the PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127122
Approved by: https://github.com/kit1980
2024-05-25 08:25:50 +00:00
Wei Wang
0902929d58
[CUDA] [CI]: Enable CUDA 12.4 CI ( #121956 )
...
Reference PR: https://github.com/pytorch/pytorch/pull/93406
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121956
Approved by: https://github.com/atalman
2024-05-23 20:37:47 +00:00
Catherine Lee
5ccc634603
[CI] Pin uv==0.1.45 for lintrunner ( #126908 )
...
e4623de4cf/1
```
2024-05-22T19:10:48.5974515Z + python3 -m pip install uv
2024-05-22T19:10:48.5975198Z Collecting uv
2024-05-22T19:10:48.5976496Z Downloading uv-0.1.45-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (32 kB)
2024-05-22T19:10:48.5977828Z Downloading uv-0.1.45-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.8 MB)
2024-05-22T19:10:48.5993443Z Installing collected packages: uv
2024-05-22T19:10:48.5993950Z Successfully installed uv-0.1.45
2024-05-22T19:10:48.5994363Z + CACHE_DIRECTORY=/tmp/.lintbin
2024-05-22T19:10:48.5994772Z + [[ -d /tmp/.lintbin ]]
2024-05-22T19:10:48.5995157Z + cp -r /tmp/.lintbin .
2024-05-22T19:10:48.5995497Z + lintrunner init
2024-05-22T19:10:48.5995839Z + [[ 1 == \1 ]]
```
vs
```
2024-05-22T20:33:53.5563991Z + python3 -m pip install uv
2024-05-22T20:33:53.5564921Z Collecting uv
2024-05-22T20:33:53.5566259Z Downloading uv-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (32 kB)
2024-05-22T20:33:53.5568142Z Downloading uv-0.2.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.9 MB)
2024-05-22T20:33:53.5578531Z Installing collected packages: uv
2024-05-22T20:33:53.5579316Z Successfully installed uv-0.2.1
2024-05-22T20:33:53.5580033Z + CACHE_DIRECTORY=/tmp/.lintbin
2024-05-22T20:33:53.5580640Z + [[ -d /tmp/.lintbin ]]
2024-05-22T20:33:53.5581229Z + cp -r /tmp/.lintbin .
2024-05-22T20:33:53.5581799Z + lintrunner init
2024-05-22T20:33:53.5603302Z Traceback (most recent call last):
2024-05-22T20:33:53.5604857Z File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 101, in <module>
2024-05-22T20:33:53.5605805Z main()
2024-05-22T20:33:53.5606687Z File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 97, in main
2024-05-22T20:33:53.5607762Z run_cmd_or_die(f"docker exec -t {container_name} /exec")
2024-05-22T20:33:53.5608949Z File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 38, in run_cmd_or_die
2024-05-22T20:33:53.5610107Z raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
2024-05-22T20:33:53.5611328Z RuntimeError: Command docker exec -t e551764bdba0c87c2fc392fba9ea265e8821a552915b36010f18299d8035b304 /exec failed with exit code 1
2024-05-22T20:33:53.5626540Z ##[error]Process completed with exit code 1.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126908
Approved by: https://github.com/huydhn
2024-05-22 21:41:21 +00:00
Catherine Lee
ac2c547838
[TD] Upload names of failures to s3 for pytest cache ( #126315 )
...
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both cases, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205 ).
Instead, manually upload/download an extra file that lists the failing test files
Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-21 16:29:31 +00:00
PyTorch MergeBot
8bca0847c2
Revert "[TD] Upload names of failures to s3 for pytest cache ( #126315 )"
...
This reverts commit 655038687a .
Reverted https://github.com/pytorch/pytorch/pull/126315 on behalf of https://github.com/clee2000 due to broke inductor ([comment](https://github.com/pytorch/pytorch/pull/126315#issuecomment-2121133045 ))
2024-05-20 20:15:08 +00:00
Catherine Lee
655038687a
[TD] Upload names of failures to s3 for pytest cache ( #126315 )
...
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both cases, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205 ).
Instead, manually upload/download an extra file that lists the failing test files
Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-20 17:36:30 +00:00
Aleksei Nikiforov
da7ced6e8c
S390x binaries ( #120398 )
...
Allow building nightly, rc and release binaries for s390x.
This PR implements building binaries, but publishing part is currently missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120398
Approved by: https://github.com/huydhn
2024-05-11 02:32:25 +00:00