Commit graph

892 commits

Author SHA1 Message Date
Xuehai Pan
2293fe1024 [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes by apply order:

1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.

    `.parent{...}.absolute()` -> `.absolute().parent{...}`

4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)

    `.parent.parent.parent.parent` -> `.parents[3]`

5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~

    ~`.parents[3]` -> `.parents[4 - 1]`~

6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
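
A minimal sketch of the equivalences behind steps 2 to 4, using a hypothetical file location (the path below is an assumption for illustration, not a file touched by this PR):

```python
import os
from pathlib import Path

# Hypothetical absolute path, used only for illustration.
script = Path("/repo/tools/linter/adapters/example.py")

# Step 2: nested os.path.dirname calls map to chained .parent lookups.
nested = os.path.dirname(os.path.dirname(str(script)))
assert nested == str(script.parent.parent)

# Step 4: N chained .parent lookups are the same as .parents[N - 1].
assert script.parent.parent.parent.parent == script.parents[3]

# Step 3: on a relative path, .parent stops at "." instead of walking up,
# so the path should be made absolute before taking parents.
relative = Path("example.py")
assert relative.parent.parent == Path(".")
assert relative.absolute().parent.parent == Path.cwd().parent
```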

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-12-21 22:08:01 +00:00
atalman
2400db115c Use Manylinux 2.28 for nightly build and cxx11-abi (#143423)
As per: https://dev-discuss.pytorch.org/t/pytorch-linux-wheels-switching-to-new-wheel-build-platform-manylinux-2-28-on-november-12-2024/2581

Linux Builds: CPU, CUDA 11.8, CUDA 12.4 switched to Manylinux 2.28 and `-D_GLIBCXX_USE_CXX11_ABI=1` on the week of Dec 16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143423
Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/seemethere
2024-12-18 02:02:58 +00:00
Nikita Shulga
d83a049232 [EZ] Update lintrunner in CI to 0.12.7 (#143073)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143073
Approved by: https://github.com/wdvr
2024-12-12 15:35:37 +00:00
Tom Ritchford
498a7808ff Fix unused Python variables outside torch/ and test/ (#136359)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359
Approved by: https://github.com/albanD
2024-12-11 17:10:23 +00:00
Huy Do
6f8751dcc9 Fix timeout check workflow lint job (#142476)
Fixes https://github.com/pytorch/pytorch/issues/142485

The workflow check lint job timed out in trunk, i.e. https://github.com/pytorch/pytorch/actions/runs/12261226178/job/34207762939, and here was what happened:

1. https://github.com/pytorch/pytorch/pull/142294 landed yesterday to build ROCm on 3.13, but the PR had a land race with https://github.com/pytorch/pytorch/pull/142282 in the generated workflow file
2. The trunk lint check caught that in https://github.com/pytorch/pytorch/blob/main/.github/scripts/report_git_status.sh#L2
3. However, the script also attempted to print the difference with `git diff .github/workflows`. This command was the one that got stuck, because `git diff` uses a pager by default and waits for a prompt to display the next page ¯\_(ツ)_/¯
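
One common way to keep `git diff` from waiting on a pager in a non-interactive job is to pass `--no-pager` or to set `GIT_PAGER=cat`. The sketch below only illustrates that idea; the helper names are hypothetical and this is not necessarily how the PR changed `report_git_status.sh`:

```python
import os
import subprocess


def build_diff_command(path):
    """Build a `git diff` command line that never starts a pager."""
    return ["git", "--no-pager", "diff", "--", path]


def non_interactive_env(base=None):
    """Copy an environment and force git to use `cat` as its pager."""
    env = dict(os.environ if base is None else base)
    env["GIT_PAGER"] = "cat"
    return env


def run_diff(path):
    """Run the diff and return its output (requires a git checkout)."""
    result = subprocess.run(
        build_diff_command(path),
        env=non_interactive_env(),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```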

It took so long to debug this because a Nova GHA job that times out doesn't print any progress.  I'll create an issue for this.

Bonus:

I also fixed the broken print from the test tool lint job that confuses GitHub (https://github.com/pytorch/pytorch/actions/runs/12261226178) with an annotation failure: `Credentials could not be loaded, please check your action inputs`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142476
Approved by: https://github.com/wdvr
2024-12-10 20:47:22 +00:00
Ting Lu
f26b75b7ac [aarch64] add CUDA 12.6 sbsa nightly binary (#142335)
related to #138440

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142335
Approved by: https://github.com/atalman
2024-12-10 06:19:28 +00:00
Jithun Nair
a1b5067297 Enable py3.13 wheels for ROCm (#142294)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142294
Approved by: https://github.com/huydhn
2024-12-10 01:10:24 +00:00
chuanqiw
b64a537993 [CD] xpu nightly manylinux whl with cxx11-abi (#142210)
Follow https://github.com/pytorch/pytorch/issues/123649
Works for https://github.com/pytorch/pytorch/issues/114850
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142210
Approved by: https://github.com/EikanWang, https://github.com/atalman, https://github.com/malfet
2024-12-06 15:10:47 +00:00
atalman
c6c45467a3 Use cxx11-abi for Linux CUDA 12.6 builds (#142064)
Manylinux 2.28 and cxx11-abi migration. Please see: https://dev-discuss.pytorch.org/t/pytorch-linux-wheels-switching-to-new-wheel-build-platform-manylinux-2-28-on-november-12-2024/2581
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142064
Approved by: https://github.com/kit1980, https://github.com/malfet
2024-12-05 14:51:50 +00:00
Jithun Nair
9dffd12f90 Upgrade ROCm wheels to manylinux2_28 - 2 of 2 (binaries) (#141423)
Depends on https://github.com/pytorch/pytorch/pull/140681 and https://github.com/pytorch/pytorch/pull/141609

Highlights:
* Upgrade binaries to ROCm6.2.4 to use latest docker images
* Remove pre-cxx11 builds for libtorch on ROCm
* Use manylinux2_28 docker images for ROCm
* Set `DESIRED_DEVTOOLSET=cxx-abi` (and hence `_GLIBCXX_USE_CXX11_ABI=1`) for ROCm manylinux2_28 wheels (ROCm RHEL8 packages also have GCC_ABI=1, so it keeps it consistent)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141423
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>
2024-12-04 07:00:25 +00:00
Ting Lu
e5f5283ab2 Fix cuda arch full version for 12.6 (#141976)
follow up for https://github.com/pytorch/pytorch/pull/141433/files
build still showing up as 12.6.2 in the name, see latest https://github.com/pytorch/pytorch/actions/runs/12134985224/job/33833276884.

related to https://github.com/pytorch/pytorch/issues/138440

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141976
Approved by: https://github.com/atalman, https://github.com/nWEIdia, https://github.com/Skylion007
2024-12-03 20:33:01 +00:00
atalman
0f3f801fc2 Add windows CUDA 12.6 nightly builds (#141805)
Windows AMI was published to prod. This PR adds CUDA 12.6 nightly builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141805
Approved by: https://github.com/huydhn, https://github.com/Skylion007
2024-11-30 14:39:47 +00:00
chuanqiw
a23ac6f8bd [CD] Enable pypi dependencies both for XPU linux and Windows whls (#141135)
Enable xpu runtime pypi packages as dependencies of XPU CD wheels both for Linux and Windows.
Fixes https://github.com/pytorch/pytorch/issues/135867
Works for https://github.com/pytorch/pytorch/issues/139722 and https://github.com/pytorch/pytorch/issues/114850
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141135
Approved by: https://github.com/atalman
2024-11-29 21:35:07 +00:00
Aaron Gokaslan
7224cd4471 [BE]: Update 12.6 builds to CUDA 12.6.3 (#141433)
Update CUDA 12.6 to Update 3 and make cusparse-lt 0.6.3? #141365 Was going to leave some comments on #141365, but thought it was just faster to open a PR here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141433
Approved by: https://github.com/atalman
2024-11-28 22:01:47 +00:00
atalman
0f261e8f77 Add Manylinux2014 and Manylinux 2.28 config to triton builds. Run auditwheel on triton binaries (#141704)
This PR combines Manylinux 2_28 and Manylinux 2014 builds of triton under one workflow. This is required in order to support torch cpu, cuda 11.8, cuda 12.4 wheels built with Manylinux 2014 and torch cuda 12.6 wheels built with Manylinux 2_28.

Manylinux 2014 wheels:
``pytorch_triton-3.2.0+git35c6c7c6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl``
Manylinux 2_28 wheels:
``pytorch_triton-3.2.0+git35c6c7c6-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl``
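
The platform tags in the two file names above can be read off with a small parser. This is a sketch that assumes the standard wheel file name layout (`name-version[-build]-python-abi-platform.whl`, with multiple platform tags joined by `.`); the function name is hypothetical:

```python
def wheel_platform_tags(filename):
    """Return the list of platform tags encoded in a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are python tag, ABI tag, platform tag.
    _python_tag, _abi_tag, platform_tag = stem.split("-")[-3:]
    # auditwheel may record several compatible platform tags, joined by dots.
    return platform_tag.split(".")


manylinux2014 = wheel_platform_tags(
    "pytorch_triton-3.2.0+git35c6c7c6-cp39-cp39-"
    "manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
manylinux_2_28 = wheel_platform_tags(
    "pytorch_triton-3.2.0+git35c6c7c6-cp39-cp39-"
    "manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl"
)
```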

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141704
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
2024-11-28 13:40:39 +00:00
atalman
893a4390c9 Use cuda 12.6 wheels with Manylinux 2.28. Use Manylinux2014 for CPU, CUDA11.8, CUDA12.4 (#141565)
For release 2.6 we will be using only CUDA 12.6 binaries on Manylinux 2.28.
Issue: https://github.com/pytorch/pytorch/issues/123649
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141565
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/malfet
2024-11-26 19:36:42 +00:00
Aleksei Nikiforov
0ce0e44237 Add workaround for potential runners issue on s390x (#141239)
More information is at
https://gitlab.com/qemu-project/qemu/-/issues/2600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141239
Approved by: https://github.com/huydhn
2024-11-22 22:17:55 +00:00
Xuehai Pan
2a6eaa2e6f Refactor nightly pull tool to use venv and pip (#141281)
Resolves #141238

- #141238

Example output:

```console
$ python3.12 tools/nightly.py checkout -b my-nightly-branch -p my-env --python python3.10
log file: /Users/PanXuehai/Projects/pytorch/nightly/log/2024-11-22_04h15m45s_63f8b29e-a845-11ef-bbf9-32c784498a7b/nightly.log
Creating virtual environment
Creating venv (Python 3.10.15): /Users/PanXuehai/Projects/pytorch/my-env
Installing packages
Upgrading package(s) (https://download.pytorch.org/whl/nightly/cpu): pip, setuptools, wheel
Installing packages took 5.576 [s]
Creating virtual environment took 9.505 [s]
Downloading packages
Downloading package(s) (https://download.pytorch.org/whl/nightly/cpu): torch
Downloaded 9 file(s) to /var/folders/sq/7sf73d5s2qnb3w6jjsmhsw3h0000gn/T/pip-download-lty5dvz4:
  - mpmath-1.3.0-py3-none-any.whl
  - torch-2.6.0.dev20241121-cp310-none-macosx_11_0_arm64.whl
  - jinja2-3.1.4-py3-none-any.whl
  - sympy-1.13.1-py3-none-any.whl
  - MarkupSafe-3.0.2-cp310-cp310-macosx_11_0_arm64.whl
  - networkx-3.4.2-py3-none-any.whl
  - fsspec-2024.10.0-py3-none-any.whl
  - filelock-3.16.1-py3-none-any.whl
  - typing_extensions-4.12.2-py3-none-any.whl
Downloading packages took 7.628 [s]
Installing dependencies
Installing packages
Installing package(s) (https://download.pytorch.org/whl/nightly/cpu): numpy, cmake, ninja, packaging, ruff, mypy, pytest, hypothesis, ipython, rich, clang-format, clang-tidy, sphinx, mpmath-1.3.0-py3-none-any.whl, jinja2-3.1.4-py3-none-any.whl, sympy-1.13.1-py3-none-any.whl, MarkupSafe-3.0.2-cp310-cp310-macosx_11_0_arm64.whl, networkx-3.4.2-py3-none-any.whl, fsspec-2024.10.0-py3-none-any.whl, filelock-3.16.1-py3-none-any.whl, typing_extensions-4.12.2-py3-none-any.whl
Installing packages took 42.514 [s]
Installing dependencies took 42.515 [s]
Unpacking wheel file
Unpacking wheel file took 3.223 [s]
Checking out nightly PyTorch
Found released git version ac47a2d971
Found nightly release version e0482fdf95
Switched to a new branch 'my-nightly-branch'
Checking out nightly PyTorch took 0.198 [s]
Moving nightly files into repo
Linking /var/folders/sq/7sf73d5s2qnb3w6jjsmhsw3h0000gn/T/wheel-dljxil5i/torch-2.6.0.dev20241121/torch/_C.cpython-310-darwin.so -> /Users/PanXuehai/Projects/pytorch/torch/_C.cpython-310-darwin.so
Linking /var/folders/sq/7sf73d5s2qnb3w6jjsmhsw3h0000gn/T/wheel-dljxil5i/torch-2.6.0.dev20241121/torch/lib/libtorch_python.dylib -> /Users/PanXuehai/Projects/pytorch/torch/lib/libtorch_python.dylib
...
Linking /var/folders/sq/7sf73d5s2qnb3w6jjsmhsw3h0000gn/T/wheel-dljxil5i/torch-2.6.0.dev20241121/torch/include/c10/macros/Macros.h -> /Users/PanXuehai/Projects/pytorch/torch/include/c10/macros/Macros.h
Moving nightly files into repo took 11.426 [s]
Writing pytorch-nightly.pth
Writing pytorch-nightly.pth took 0.036 [s]
-------
PyTorch Development Environment set up!
Please activate to enable this environment:

  $ source /Users/PanXuehai/Projects/pytorch/my-env/bin/activate
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141281
Approved by: https://github.com/seemethere
2024-11-22 20:03:55 +00:00
Bert Maher
57fc070e08 [triton] Update pin for PyTorch 2.6/Triton 3.2 (#139206)
Bump the Triton pin to the release candidate commit for Triton 3.2.

A few changes beyond the pin bump itself are needed:
* Remove the script that adds a git version hash suffix to the Triton wheel, since as of https://github.com/triton-lang/triton/pull/4812 Triton adds that itself
* Add `pybind11` to the Triton build setup, since Triton now depends on it
* Use manylinux-2.28 for the Triton wheel builder, and use clang+lld for building to pick up the right glibc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139206
Approved by: https://github.com/malfet, https://github.com/atalman

Co-authored-by: Andrey Talman <atalman@fb.com>
2024-11-22 18:34:32 +00:00
PyTorch MergeBot
44d5012a80 Revert "[triton] Update pin for PyTorch 2.6/Triton 3.2 (#139206)"
This reverts commit c93e57efac.

Reverted https://github.com/pytorch/pytorch/pull/139206 on behalf of https://github.com/atalman due to Will revert and reland skipping xpu builds ([comment](https://github.com/pytorch/pytorch/pull/139206#issuecomment-2494437857))
2024-11-22 18:01:18 +00:00
Bert Maher
c93e57efac [triton] Update pin for PyTorch 2.6/Triton 3.2 (#139206)
Bump the Triton pin to the release candidate commit for Triton 3.2.

A few changes beyond the pin bump itself are needed:
* Remove the script that adds a git version hash suffix to the Triton wheel, since as of https://github.com/triton-lang/triton/pull/4812 Triton adds that itself
* Add `pybind11` to the Triton build setup, since Triton now depends on it
* Use manylinux-2.28 for the Triton wheel builder, and use clang+lld for building to pick up the right glibc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139206
Approved by: https://github.com/malfet, https://github.com/atalman

Co-authored-by: Andrey Talman <atalman@fb.com>
2024-11-22 14:50:22 +00:00
Aaron Gokaslan
765a347d21 [BE]: Update CUDNN for Linux to 9.5.1.17 for 12.6 only (#137978)
* Significantly faster, better CUDNN Attention especially on Hopper (FA3 implementation?)
* Lots of bugfixes
* Better performance
* More numerically stable / fixed heuristics
* More functionality for SDPA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137978
Approved by: https://github.com/eqy, https://github.com/drisspg, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/malfet
2024-11-20 23:11:39 +00:00
atalman
99a03211cb Deprecate conda nightly builds (#141024)
Removing CD as per https://github.com/pytorch/pytorch/issues/138506

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141024
Approved by: https://github.com/malfet
2024-11-19 16:09:54 +00:00
atalman
cec82c3aed Use Manylinux 2.28 for aarch64 CPU workflows (#140743)
Use https://hub.docker.com/r/pytorch/manylinux2_28_aarch64-builder/tags

Similar to https://github.com/pytorch/pytorch/pull/138732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140743
Approved by: https://github.com/malfet
2024-11-15 01:46:29 +00:00
Zain Rizvi
b69282c98c Enable opting out of experiments even when they're being rolled out (#140433)
Enables opting out of specific experiments in the runner determinator

To opt out:
1. Go to the tracking issue: https://github.com/pytorch/test-infra/issues/5132
2. In the entry by your name, enter the experiment name, prefixed with a `-`.  For example, to opt out of the LF fleet you could enter `@ZainRIzvi,-lf`

This lets you simultaneously be opted into some experiments and opted out of others.

While the `disable-runner-experiments` label offers an option to disable all experiments on a given PR, this one lets you disable a selected set of experiments across all your PRs.
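
A rough sketch of how an entry such as `@ZainRIzvi,-lf` could be split into opt-ins and opt-outs. The comma-separated layout is taken from the example above; the function name and return shape are assumptions, not the actual code in `runner_determinator.py`:

```python
def parse_user_entry(line):
    """Split '@user,exp1,-exp2' into (user, opted_in, opted_out).

    Assumes a non-empty line whose first field is the user name.
    """
    fields = [field.strip() for field in line.split(",") if field.strip()]
    user = fields[0].lstrip("@")
    experiments = fields[1:]
    # A leading '-' marks an experiment the user opts out of.
    opted_out = {name[1:] for name in experiments if name.startswith("-")}
    opted_in = {name for name in experiments if not name.startswith("-")}
    return user, opted_in, opted_out


user, opted_in, opted_out = parse_user_entry("@ZainRIzvi,-lf")
```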

Fixes https://github.com/pytorch/pytorch/issues/138099

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140433
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt
2024-11-14 19:18:24 +00:00
Catherine Lee
ea7d1826a2 [ez] Make merge blocking sevs be based on label instead of string (#140636)
sev issues are now merge blocking if they are labeled merge blocking, instead of simply having the merge blocking string in the body.  This makes it easier to default to non merge blocking when creating a sev

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140636
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-11-14 19:02:27 +00:00
atalman
70acf02116 Use Manylinux2_28 for wheel builds (#138732)
Fixes https://github.com/pytorch/pytorch/issues/123649
Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather than ``centos:7``, which reached EOL on June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13
Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
Debian 10+
Ubuntu 18.10+
Fedora 29+
CentOS/RHEL 8+
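
To check whether a given machine meets the glibc 2.28 floor mentioned above, something like the following could be used. It is a sketch only: `glibc_at_least` is a hypothetical helper, while `platform.libc_ver()` is the standard-library call that reports the C library of the running interpreter:

```python
import platform


def glibc_at_least(version, minimum=(2, 28)):
    """Compare a glibc version string such as '2.31' against a minimum."""
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor >= minimum


# On glibc-based Linux this returns e.g. ('glibc', '2.31');
# on other platforms the fields may be empty strings.
libc_name, libc_version = platform.libc_ver()
if libc_name == "glibc" and libc_version:
    print("manylinux_2_28 compatible:", glibc_at_least(libc_version))
```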

This migration should enable us to migrate to latest CUDNN version, and land this PR: https://github.com/pytorch/pytorch/pull/137978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138732
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
2024-11-14 00:25:47 +00:00
Catherine Lee
08acfcddc4 [ez] Fix check labels error when deleting comment (#140578)
Remake of https://github.com/pytorch/pytorch/pull/140587
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140578
Approved by: https://github.com/huydhn
2024-11-13 23:00:58 +00:00
Catherine Lee
0db21a6b23 Remove most rockset references (#139922)
Remove most references to rockset:
* replace comments and docs with a generic "backend database"
* Delete `upload_to_rockset`, so we no longer need to install the package.
* Do not upload perf stats to rockset as well (we should be completely on DynamoDB now right @huydhn?)

According to VSCode, it went from 41 -> 7 instances of "rockset" in the repo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139922
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-11-12 21:17:43 +00:00
Ting Lu
14bb49fe98 Add CUDA 12.6 Linux Builds to Binaries Matrix (#138899)
Related to #138440

Issue tracker: https://github.com/pytorch/pytorch/issues/138609

Version based on https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138899
Approved by: https://github.com/atalman

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-11-12 19:52:31 +00:00
atalman
51e8a13d00 CD Enable Python 3.13 on windows (#138095)
Adding CD windows. Part of: https://github.com/pytorch/pytorch/issues/130249
Builder PR landed with smoke test: https://github.com/pytorch/builder/pull/2035

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138095
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-11-12 12:28:10 +00:00
Aleksei Nikiforov
63715f6567 S390x update builder image (#132983)
Publish current state of s390x builder image to allow reproducing worker setup.
Also, if this image gets published to docker repository later, it'd be possible to download published image instead of building it into worker image in https://github.com/pytorch/pytorch/blob/main/.github/scripts/s390x-ci/self-hosted-builder/actions-runner.Dockerfile#L66, which should allow improving restart time at the cost of additional runtime overhead.

Compared to first attempt to merge:
- Default docker repository settings are added to all runners. Changes are mirrored in this PR.
- The job is moved into a separate workflow file.
- Limits on s390x are no longer updated from the worker. They should be properly set up on the host, and they cannot be updated from the worker since it runs in a container. Also, the worker container currently doesn't have sudo installed or configured, or any systemd running.
- The GitHub token is now passed once via a named pipe instead of an environment variable. This should increase the security of tokens.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132983
Approved by: https://github.com/huydhn, https://github.com/malfet
2024-11-11 16:14:06 +00:00
Andrea Frittoli
0b650c360a Build magma for windows (#139924)
Copy the magma for windows job and script from pytorch/builder c9aac65e12/.github/workflows/build-magma-windows.yml

The linux version is moved here in https://github.com/pytorch/pytorch/pull/139888

Fixes #140001

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139924
Approved by: https://github.com/atalman
2024-11-09 09:27:59 +00:00
Andrea Frittoli
c1c94cb0be Build magma binary tarballs for various cuda (#139888)
This is a first step towards removing the build's dependency on conda.

Currently we build magma as a conda package in a pytorch conda channel, implemented in a1b372dbda/magma.

This commit adapts the logic from pytorch/builder as follows:
- use pytorch/manylinux-cuda<cuda-version> as base image
- apply patches and invoke the build.sh script directly (no longer through conda build)
- store license and build files along with the built artifact, in an info subfolder
- create a tarball file which resembles the one created by conda, without any conda-specific metadata

A new matrix workflow is added, which runs the build for each supported cuda version, and uploads the binaries to the pytorch s3 bucket.

For the upload, define an upload.sh script, which will be used by the magma windows job as well, to upload to `s3://ossci-*` buckets.

The build runs on PR and push, upload runs in DRY_RUN mode in case of PR.

Fixes #139397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139888
Approved by: https://github.com/atalman, https://github.com/malfet, https://github.com/seemethere
2024-11-08 13:28:27 +00:00
Huy Do
09ba38c4b7 Add an opt-out label to runner determinator on PR (#140054)
My sales pitch:  I need to ssh into the runner from time to time on my PR to debug issues, but it's well-known that LF runners don't support SSH login anymore.  So, the proposed fix here is to introduce a new label called ~no-runner-determinator~ `no-runner-experiments` that can be attached to the PR.  Whenever `.github/scripts/runner_determinator.py` runs on a PR and sees this label, it will not apply any logic and just straight up use an empty prefix.
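
In rough terms, the opt-out behaves like the sketch below. The label name comes from the description above; the function and its parameters are hypothetical and do not mirror the real `runner_determinator.py` code:

```python
OPT_OUT_LABEL = "no-runner-experiments"


def resolve_runner_prefix(pr_labels, determined_prefix):
    """Return an empty prefix when the PR carries the opt-out label."""
    if OPT_OUT_LABEL in pr_labels:
        # Skip all experiment logic for this PR.
        return ""
    return determined_prefix


with_label = resolve_runner_prefix(["no-runner-experiments"], "lf.")
without_label = resolve_runner_prefix([], "lf.")
```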

### Testing

With the label:

```
python3 runner_determinator.py \
    --github-token "MY_TOKEN" \
    --github-issue "5132" \
    --github-branch "install-torchao-torchtune-et" \
    --github-actor "huydhn" \
    --github-issue-owner "huydhn" \
    --github-ref-type "branch" \
    --github-repo "pytorch/pytorch" \
    --eligible-experiments "" \
    --pr-number "139947"

INFO    : Opt-out runner determinator because #139947 has no-runner-determinator label
WARNING : No env var found for GITHUB_OUTPUT, you must be running this code locally. Falling back to the deprecated print method.
::set-output name=label-type::
```

Without the label:

```
python3 runner_determinator.py \
    --github-token "MY_TOKEN" \
    --github-issue "5132" \
    --github-branch "install-torchao-torchtune-et" \
    --github-actor "huydhn" \
    --github-issue-owner "huydhn" \
    --github-ref-type "branch" \
    --github-repo "pytorch/pytorch" \
    --eligible-experiments "" \
    --pr-number "139947"

INFO    : Based on rollout percentage of 95%, enabling experiment lf.
INFO    : Skipping experiment 'awsa100', as it is not a default experiment
WARNING : No env var found for GITHUB_OUTPUT, you must be running this code locally. Falling back to the deprecated print method.
::set-output name=label-type::lf.
```

Running in trunk commit without a PR number will use the regular logic:

```
python3 runner_determinator.py \
    --github-token "MY_TOKEN" \
    --github-issue "5132" \
    --github-branch "install-torchao-torchtune-et" \
    --github-actor "huydhn" \
    --github-issue-owner "huydhn" \
    --github-ref-type "branch" \
    --github-repo "pytorch/pytorch" \
    --eligible-experiments "" \
    --pr-number ""

INFO    : Based on rollout percentage of 95%, enabling experiment lf.
INFO    : Skipping experiment 'awsa100', as it is not a default experiment
WARNING : No env var found for GITHUB_OUTPUT, you must be running this code locally. Falling back to the deprecated print method.
::set-output name=label-type::lf.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140054
Approved by: https://github.com/malfet, https://github.com/ZainRizvi
2024-11-07 22:55:27 +00:00
PyTorch MergeBot
dd6738c1ad Revert "Use Manylinux2_28 for wheel builds (#138732)"
This reverts commit 5860c8ebd1.

Reverted https://github.com/pytorch/pytorch/pull/138732 on behalf of https://github.com/atalman due to Reverting for now will be relanding ([comment](https://github.com/pytorch/pytorch/pull/138732#issuecomment-2460570980))
2024-11-06 19:12:52 +00:00
atalman
5860c8ebd1 Use Manylinux2_28 for wheel builds (#138732)
Fixes https://github.com/pytorch/pytorch/issues/123649
Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather than ``centos:7``, which reached EOL on June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13
Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
Debian 10+
Ubuntu 18.10+
Fedora 29+
CentOS/RHEL 8+

This migration should enable us to migrate to latest CUDNN version, and land this PR: https://github.com/pytorch/pytorch/pull/137978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138732
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-11-05 17:21:24 +00:00
Wei Wang
53f164cae5 [CUDA][CI][cusparselt] Only CUDA 11.8 ships the libcusparseLt.so.0, CUDA 12 would use PYPI libcusparselt (#138547)
since nvidia-cusparselt-cu12 is available and
nvidia-cusparselt-cu11 is not available

Related: #138175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138547
Approved by: https://github.com/atalman
2024-11-05 15:12:41 +00:00
atalman
eaf92b2484 [Python 3.13 CD] Enable Aarch64 py3.13 builds (#138629)
Adding CD aarch64. Part of: https://github.com/pytorch/pytorch/issues/130249

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138629
Approved by: https://github.com/ZainRizvi
2024-11-05 01:16:37 +00:00
Catherine Lee
754b262bdb Move close_nonexistent_disable_issues.py queries to ClickHouse (#139296)
Example run: https://github.com/pytorch/pytorch/actions/runs/11601996563/job/32305991204?pr=139296 (commented out the part that actually closes issues but the queries run)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139296
Approved by: https://github.com/huydhn
2024-10-30 23:09:39 +00:00
Catherine Lee
24c9683355 [mergebot] Add ci-no-td label on revert (#139218)
Just in case?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139218
Approved by: https://github.com/wdvr
2024-10-30 21:36:09 +00:00
Nikita Shulga
889717aabd [CI/CD] Disable split build (#138752)
See https://github.com/pytorch/pytorch/issues/138750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138752
Approved by: https://github.com/kit1980, https://github.com/huydhn
2024-10-23 22:38:30 +00:00
atalman
60081c29ec Use cuda 12.4 pytorch_extra_install_requirements as default (#138458)
Since cuda 12.4 binaries are now the default binaries on pypi, the pytorch_extra_install_requirements need to use 12.4.
This would need to be cherry-picked to release 2.5 branch to avoid injecting these versions into metadata during pypi promotion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138458
Approved by: https://github.com/malfet
2024-10-21 20:16:37 +00:00
Jane Xu
54839781ed Update lint failure msg to encourage lintrunner -a locally (#138232)
This is only a minor patch that I hope will change how I talk to contributors when lint fails, so that I can tell them to read the logs about lintrunner. There have been too many times when I have had to click the "approve all workflows" just for lint to fail again cuz the developer is manually applying every fix and using CI to test. I understand there are times when lintrunner doesn't work, but I'd like most contributors to at least give it a swirl once to start.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138232
Approved by: https://github.com/kit1980, https://github.com/Skylion007
2024-10-17 19:13:55 +00:00
Nikita Shulga
12f4d91e84 Enable Python-3.13 builds on MacOS (#138037)
All logic changes happen in builder repo, namely:
 - a01e87535b
 - bcd0972459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138037
Approved by: https://github.com/huydhn
ghstack dependencies: #138041
2024-10-16 04:24:12 +00:00
Nikita Shulga
dd2ae7d0c9 [BE] Use x in [foo, bar] (#138041)
As shorthand for `x == foo or x == bar`
And `x not in [foo, bar]` as shorthand for `x != foo and x != bar`
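
A small illustration of the two shorthands, with made-up values (the names below are placeholders, not identifiers from the changed files):

```python
foo, bar = "cuda", "rocm"

for x in ["cuda", "rocm", "cpu"]:
    # Membership test is equivalent to the chained equality checks.
    assert (x in [foo, bar]) == (x == foo or x == bar)
    # Negated membership is equivalent to the chained inequality checks.
    assert (x not in [foo, bar]) == (x != foo and x != bar)
```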
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138041
Approved by: https://github.com/huydhn
2024-10-16 01:57:37 +00:00
Wei Wang
e89fe0bd6e Updating cuda binary build to get cusparselt from PYPI (#137653)
Fixes #137374
Update 1: this PR requires Meta to upload the PYPI package to download.pytorch.org.
See: ERROR: Could not find a version that satisfies the requirement nvidia-cusparselt-cu12==0.6.2; platform_system == "Linux" and platform_machine == "x86_64" (from torch) (from versions: none)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137653
Approved by: https://github.com/eqy, https://github.com/Skylion007, https://github.com/atalman
2024-10-12 16:40:37 +00:00
Jean Schmidt
2cb983ab97 [CI] Adds support for selecting experiments for workflows on runner determinator (#137614)
Adds a `default` tag to experiment configurations, which allows some experiments to be excluded from the random draw by default:

```
        experiments:
            lf:
                rollout_perc: 25
            otherExp:
                rollout_perc: 25
                default: false
        ---
```

and includes the configuration to filter what experiments are of interest for a particular workflow (comma separated):

```
  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"
```

The end goal is to enable us to run multiple experiments that are independent from one another. For example, while we still run the LF infra experiment, we want to migrate other runners leveraging the current solution. An immediate use case is the A100 instances, which we want to migrate to AWS.

During the migration period, those new instances will be labeled both `awsa100.linux.gcp.a100` and `linux.aws.a100`. Once the experiment ends, we will remove the first, confusing one.

```
jobs:
  get-build-label-type:
    name: get-build-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...

  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"

  linux-focal-cuda12_1-py3_10-gcc9-inductor-build:
    name: cuda12.1-py3.10-gcc9-sm80
    uses: ./.github/workflows/_linux-build.yml
    needs:
      - get-build-label-type
      - get-test-label-type
    with:
      runner_prefix: "${{ needs.get-build-label-type.outputs.label-type }}"
      ...
      test-matrix: |
        { include: [
          { config: "inductor_huggingface_perf_compare", shard: 1, num_shards: 1, runner: "${{ needs.get-test-label-type.outputs.label-type }}linux.gcp.a100" },
          ...
        ]}
      ...
```

```
experiments:
    lf:
        rollout_perc: 50
    awsa100:
        rollout_perc: 50
        default: false
```
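
The selection rule described above can be sketched roughly as follows. This is an illustration under assumptions: experiments with `default: false` are skipped unless they are named in `check_experiments`, and each remaining experiment is enabled by a random draw against `rollout_perc`. The function name and exact semantics are hypothetical, not the real determinator code:

```python
import random


def pick_experiments(experiments, check_experiments=(), rng=random):
    """Return the names of experiments enabled for this run."""
    enabled = []
    for name, config in experiments.items():
        if check_experiments:
            # A workflow that lists experiments only considers those.
            if name not in check_experiments:
                continue
        elif not config.get("default", True):
            # Non-default experiments are left out of the default draw.
            continue
        # rng.random() is in [0, 1), so 100 always passes and 0 never does.
        if rng.random() * 100 < config.get("rollout_perc", 0):
            enabled.append(name)
    return enabled


settings = {
    "lf": {"rollout_perc": 100},
    "awsa100": {"rollout_perc": 100, "default": False},
}
default_draw = pick_experiments(settings)
a100_draw = pick_experiments(settings, check_experiments=["awsa100"])
```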
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137614
Approved by: https://github.com/malfet
2024-10-11 19:20:02 +00:00
Xuehai Pan
267f82b860 [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132577
Approved by: https://github.com/malfet
2024-10-11 18:30:26 +00:00
Catherine Lee
f54e142c58 Remove references to Rockset in trymerge (#137207)
For the migration to ClickHouse

But also Rockset is not used in trymerge anymore
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137207
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-10-05 12:53:22 +00:00