1980 commits

Author SHA1 Message Date
Jack Taylor
bad69044d8 [ROCm] upgrade ROCm CI builds to py3.10 (#134108)
Upgrade ROCm CI builds to py3.10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134108
Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd, https://github.com/atalman
2024-09-18 17:39:34 +00:00
PyTorch MergeBot
7fe004f7cf Revert "Add CI for Triton CPU backend (#135342)"
This reverts commit 426580a67d.

Reverted https://github.com/pytorch/pytorch/pull/135342 on behalf of https://github.com/jeanschmidt due to Broke internal signals, see D62737208 for more details ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2353623816))
2024-09-16 18:33:33 +00:00
Jon Janzen
13bd1256f9 Delete stable prototype (#135911)
This project ended up going in an entirely different direction, so we can close out all of this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135911
Approved by: https://github.com/izaitsevfb, https://github.com/malfet
2024-09-16 15:32:17 +00:00
Jez Ng
426580a67d Add CI for Triton CPU backend (#135342)
Where possible, I have marked failing tests (which we intend to fix or triage) as `@xfail_if_triton_cpu`. This will help us track progress of the Triton CPU backend over time. Tests that I don't think we need to address, or that are flaky, have been marked as skips.

Successful CI run: https://github.com/pytorch/pytorch/actions/runs/10822238062/job/30028284549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135342
Approved by: https://github.com/jansel
ghstack dependencies: #133408
2024-09-14 21:45:19 +00:00
Huy Do
db5e1b44d2 Fix inductor-micro-benchmark results upload (take 2) (#136052)
I had a brain freeze when I wrote the original fix.  The parameters were in the wrong order.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136052
Approved by: https://github.com/clee2000, https://github.com/kit1980, https://github.com/malfet
2024-09-13 22:05:10 +00:00
Nikita Shulga
a30d5ba16c Fix bug in split-build workflows codegen (#136043)
By just deleting a few rogue lines left over from https://github.com/pytorch/pytorch/pull/135510
If a file in the workflows folder does not have a `.yml` extension, it will not be launched at all, will it?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136043
Approved by: https://github.com/kit1980, https://github.com/atalman
2024-09-13 21:29:06 +00:00
atalman
a3d827a28c Use python 3.11 for Large Wheel build (#136042)
Use Python 3.11 in nightly Large wheel builds. Required for Colab testing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136042
Approved by: https://github.com/kit1980, https://github.com/malfet

Co-authored-by: Sergii Dymchenko <kit1980@gmail.com>
2024-09-13 20:27:11 +00:00
Huy Do
a130ed828a Fix the upload of x86 micro benchmark results (#135780)
The upload stats workflow currently skips this job: https://github.com/pytorch/pytorch/actions/runs/10807251335/job/29977650639. This was missed in https://github.com/pytorch/pytorch/pull/135042. So the workflow is running, but nothing has been uploaded yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135780
Approved by: https://github.com/atalman
2024-09-12 01:16:38 +00:00
Zain Rizvi
09519eb195 Support rolling over a percentage of workflows (#134816)
In order to support adding a rollover percentage, this ended up being a complete rewrite of runner_determinator.py.

Details of the new format are in the comments up top.

On the plus side, this now includes some unit tests.
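
A percentage-based rollover can be sketched roughly as below. This is only an illustration and not the logic in runner_determinator.py: the function names, the hashing scheme, and the fleet labels `"lf"` and `"meta"` are all assumptions made for the example.

```python
import hashlib


def should_roll_over(key: str, percentage: float) -> bool:
    """Return True for roughly `percentage` percent of distinct keys.

    Hypothetical helper: hashing the key (for example a workflow run
    identifier) keeps the decision stable across retries, unlike a
    fresh random draw.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash onto the range [0, 100).
    bucket = int.from_bytes(digest[:4], "big") / 2**32 * 100
    return bucket < percentage


def pick_runner_fleet(key: str, percentage: float) -> str:
    # "lf" and "meta" are placeholder labels for the two runner fleets.
    return "lf" if should_roll_over(key, percentage) else "meta"
```

With a percentage of 0 nothing rolls over, and with 100 everything does, so the value can be dialed up gradually.
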
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134816
Approved by: https://github.com/PaliC, https://github.com/zxiiro
2024-09-11 18:01:26 +00:00
Jithun Nair
82a4df2d5f [CI] [ROCm] Run rocm workflow on every push to main branch (#135644)
Dial the frequency back up from https://github.com/pytorch/pytorch/pull/131637

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135644
Approved by: https://github.com/huydhn
2024-09-11 17:21:05 +00:00
Catherine Lee
4ca65d3323 [CI] Increase sharding for jobs that are timing out (#135582)
Increase sharding for
* slow grad check
* slow cuda tests slow / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test
* avx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135582
Approved by: https://github.com/huydhn, https://github.com/malfet
2024-09-10 19:45:13 +00:00
Thanh Ha
5e0788befb Migrate remaining jobs to use runner determinator (#134867)
At this point all self-hosted runner jobs should be using the runner determinator to switch between LF and Meta runners. This change updates the remaining jobs that have not yet been migrated over.

Issue: https://lf-pytorch.atlassian.net/browse/PC-25

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134867
Approved by: https://github.com/ZainRizvi
2024-09-10 18:14:00 +00:00
atalman
9b764491e3 Use upload-artifact@v4.4.0 for create_release.yml (#135528)
Fixes failure: https://github.com/pytorch/pytorch/actions/runs/10780281005/job/29895846007

Due to broken sync between
```
actions/upload-artifact@v2
and
actions/download-artifact@v4.1.7
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135528
Approved by: https://github.com/kit1980, https://github.com/malfet
2024-09-09 20:48:52 +00:00
Sahan Paliskara
a4e6a0b240 [split build] move periodic split builds into own concurrency group (#135510)
To avoid nightly workflows cancelling each other
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135510
Approved by: https://github.com/clee2000, https://github.com/huydhn, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-09-09 19:35:57 +00:00
Sahan Paliskara
0c661f3e1a [Split Build] Refactor split build binary builds into their own workflows and move split build binary builds to periodic (#134624)
As we need to move split build binary tests from trunk to periodic, this PR refactors those jobs out into their own workflow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134624
Approved by: https://github.com/malfet
2024-09-06 23:57:56 +00:00
atalman
b46a1b9e2d Use Python 3.9 on all libtorch jobs (#135245)
Part of the migration py3.8->3.9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135245
Approved by: https://github.com/izaitsevfb
2024-09-06 02:27:22 +00:00
PyTorch MergeBot
8f66995459 Revert "Support rolling over a percentage of workflows (#134816)"
This reverts commit fc890b55b5.

Reverted https://github.com/pytorch/pytorch/pull/134816 on behalf of https://github.com/malfet due to Causes lint to intermittently fail ([comment](https://github.com/pytorch/pytorch/pull/134816#issuecomment-2332902609))
2024-09-05 23:39:41 +00:00
Edward Z. Yang
3825607144 Add torch._logging.scribe (#135224)
See https://github.com/pytorch/pytorch/pull/135138 for a usage example. Meta only, see https://docs.google.com/document/d/1JpbAQvRhTmuxjnKKjT7qq57dsnV84nxSLpWJo1abJuE/edit#heading=h.9wi46k7np6xw for context

fbscribelogger is a library that allows us to write to scribe, which is Meta's logging infrastructure, when you have an appropriate access token (this token is available for jobs running on main, as well as authorized jobs with the ci-scribe label). The resulting data is accessible via Scuba (a real time in-memory database) and Hive (a more traditional SQL persisted database).

Here's the motivating use case. Suppose there is somewhere in PyTorch's codebase where you'd like to log an event, and then you'd like to find all the situations where this log is called. If PyTorch is rolled out to our internal users, we have some FB-oriented APIs (like torch._utils_internal.signpost_event) with which you can do this. But you have to actually land your PR to main, wait for it to be ingested to fbcode, and then wait for us to actually roll out this version, before you get any data. But what if you want the results within the next few hours? Instead, you can use torch._logging.scribe to directly write to our logging infrastructure *from inside CI jobs.* The most convenient approach is to log unstructured JSON blobs to `open_source_signpost` (added in this PR; you can also add your own dedicated table as described in the GDoc above). After adding logging code to your code, you can push your PR to CI, add the 'ci-scribe' label, and in a few hours view the results in Scuba, e.g., (Meta-only) https://fburl.com/scuba/torch_open_source_signpost/z2mq8o4l. If you want continuous logging on all commits on main, you can land your PR and it will continuously get logging for all CI runs that happen on main.

Eventually, if your dataset is important enough, you can consider collaborating with PyTorch Dev Infra to get the data collected in our public AWS cloud so that OSS users can view it without access to Meta's internal users. But this facility is really good for prototyping / one-off experiments. It's entirely self serve: just add your logging, run your PR CI with ci-scribe, get results, do analysis in Scuba.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135224
Approved by: https://github.com/Skylion007
2024-09-05 22:37:13 +00:00
Zain Rizvi
fc890b55b5 Support rolling over a percentage of workflows (#134816)
In order to support adding a rollover percentage, this ended up being a complete rewrite of runner_determinator.py.

Details of the new format are in the comments up top.

On the plus side, this now includes some unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134816
Approved by: https://github.com/PaliC, https://github.com/zxiiro
2024-09-05 22:21:45 +00:00
PyTorch MergeBot
f63571060c Revert "Use actions/upload-artifact@v4.4.0 for rest of workflows (#135264)"
This reverts commit 9c0b03020b.

Reverted https://github.com/pytorch/pytorch/pull/135264 on behalf of https://github.com/atalman due to broke CI ([comment](https://github.com/pytorch/pytorch/pull/135264#issuecomment-2332674607))
2024-09-05 21:43:05 +00:00
Huy Do
24a223c49d Run inductor micro benchmark on x86 metal runner (#135042)
This enables inductor micro benchmark on CPU (x86):

* Running on AWS metal runner for more accurate benchmark
* I added a new `arch` column, which is either x86_64 or arm64 for CPU, or the GPU name for GPU.  We can use this later to differentiate between setups, e.g. cuda (a100) vs cuda (a10g) or cpu (x86_64) vs cpu (arm64)

The next step would be to run this on cpu arm64, and cuda (a10g).

### Testing
Here are the CSV results from my test run: https://github.com/pytorch/pytorch/actions/runs/10709344180

```
name,metric,target,actual,dtype,device,arch,is_model
mlp_layer_norm_gelu,flops_utilization,0.8,17.36,bfloat16,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),990,170.80,int8,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),1060,204.78,bfloat16,cpu,x86_64,False
Mixtral-8x7B-v0.1,token_per_sec,175,26.68,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,memory_bandwidth(GB/s),1130,171.91,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,compilation_time(s),162,47.36,int8,cpu,x86_64,True
gemv,memory_bandwidth(GB/s),870,236.36,int8,cpu,x86_64,False
gemv,memory_bandwidth(GB/s),990,305.71,bfloat16,cpu,x86_64,False
Llama-2-7b-chat-hf,token_per_sec,94,14.01,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),1253,185.18,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),162,74.99,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,token_per_sec,144,25.09,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),957,165.83,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),172,70.69,int8,cpu,x86_64,True
layer_norm,memory_bandwidth(GB/s),950,172.03,bfloat16,cpu,x86_64,False
```
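
For illustration, the `arch` column could be used to separate setups when reading results like these. This is a minimal sketch that uses a few rows copied from the table above; the helper name `rows_for` is an assumption for the example and is not part of the benchmark tooling.

```python
import csv
import io

# A few rows copied from the results above.
RESULTS_CSV = """\
name,metric,target,actual,dtype,device,arch,is_model
gemv,memory_bandwidth(GB/s),870,236.36,int8,cpu,x86_64,False
gemv,memory_bandwidth(GB/s),990,305.71,bfloat16,cpu,x86_64,False
layer_norm,memory_bandwidth(GB/s),950,172.03,bfloat16,cpu,x86_64,False
"""


def rows_for(csv_text: str, device: str, arch: str) -> list[dict[str, str]]:
    """Return the rows matching one (device, arch) setup (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [r for r in reader if r["device"] == device and r["arch"] == arch]


x86_rows = rows_for(RESULTS_CSV, "cpu", "x86_64")
arm_rows = rows_for(RESULTS_CSV, "cpu", "arm64")  # empty until arm64 runs exist
```
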
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135042
Approved by: https://github.com/yanboliang
2024-09-05 21:31:36 +00:00
atalman
9c0b03020b Use actions/upload-artifact@v4.4.0 for rest of workflows (#135264)
To be consistent with https://github.com/pytorch/pytorch/pull/135263 and rest of workflows. Use v4.4.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135264
Approved by: https://github.com/kit1980, https://github.com/malfet
2024-09-05 21:05:06 +00:00
Jack Taylor
034717a029 [ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2024-09-05 20:36:45 +00:00
atalman
8efe547046 Use actions/upload-artifact@v4.4.0 for triton builds (#135263)
Same as: https://github.com/pytorch/pytorch/pull/135139
Fixes upload failure: https://github.com/pytorch/pytorch/actions/runs/10722567217/job/29748125015
fix regression introduced by https://github.com/pytorch/pytorch/pull/135068

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135263
Approved by: https://github.com/kit1980, https://github.com/huydhn
2024-09-05 20:03:39 +00:00
Edward Z. Yang
2e2fb668fa Upgrade expecttest to 0.2.1 (#135136)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135136
Approved by: https://github.com/albanD, https://github.com/atalman, https://github.com/Skylion007
2024-09-05 16:05:35 +00:00
Stonepia
9d24f945ba [CI] Use larger instance for building triton whl (#135201)
The "Build Triton Wheels" CI jobs failed due to a lack of resources. This PR uses a larger runner to avoid these issues.

The failure message looks like:

```
Process completed with exit code 137.
```

Related running actions:
Failed actions: https://github.com/pytorch/pytorch/actions/runs/10714445036
Success actions: https://github.com/pytorch/pytorch/actions/runs/10716710830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135201
Approved by: https://github.com/chuanqi129, https://github.com/atalman
2024-09-05 14:36:23 +00:00
Nikita Shulga
105ac2418c Fix binary builds artifact download (#135139)
By upgrading upload-artifacts action to v4.4.0

As artifact store layout is different between v3 and v4 actions and artifacts uploaded by v3 can not be downloaded by v4

Should fix `Unable to download artifact(s): Artifact not found for name: libtorch-cpu-shared-with-deps-release`, which could be seen for example [here](https://github.com/pytorch/pytorch/actions/runs/10707740040/job/29690137218#step:7:29)

I.e. fix regression introduced by https://github.com/pytorch/pytorch/pull/135068
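
A mismatch of this kind could be caught with a simple text scan. The sketch below is an assumption-laden illustration, not an existing lint in the repository: the function names are hypothetical, and it only inspects one workflow's text, so it would miss an upload and a download that live in different files.

```python
import re

# Matches e.g. "actions/upload-artifact@v4.4.0" and captures the kind
# (upload or download) and the major version number.
ARTIFACT_ACTION = re.compile(r"actions/(upload|download)-artifact@v?(\d+)")


def artifact_major_versions(workflow_text: str) -> dict[str, set[int]]:
    """Collect the major versions used for upload and download actions."""
    versions: dict[str, set[int]] = {"upload": set(), "download": set()}
    for kind, major in ARTIFACT_ACTION.findall(workflow_text):
        versions[kind].add(int(major))
    return versions


def has_mismatch(workflow_text: str) -> bool:
    """True when both kinds appear and their major versions differ."""
    found = artifact_major_versions(workflow_text)
    if not found["upload"] or not found["download"]:
        return False
    return found["upload"] != found["download"]
```
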

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135139
Approved by: https://github.com/atalman, https://github.com/huydhn
2024-09-05 00:43:34 +00:00
chuanqiw
977a909250 [CI] Build pytorch wheel with Torch XPU Operators on Windows (#133151)
# Description
This pipeline enables the CI build on Windows for PRs labeled with ciflow/xpu. It builds the torch binary with Torch XPU Operators on Windows using Visual Studio BuildTools 2022.

# Changes
1. Install xpu batch file (install_xpu.bat) - Checks whether the build machine has oneAPI in its environment and whether it is the latest version. If not, installs the latest publicly released oneAPI on the machine.
2. GHA callable pipeline (_win-build.yml) - Sets vc_year and use_xpu as parameters to configure the wheel build environment.
3. GHA workflow (xpu.yml) - Adds a new Windows build job and passes parameters to it.
4. Build wheels script (.ci/pytorch/win-test-helpers/build_pytorch.bat) - Prepares the environment for building, e.g. installs the oneAPI bundle.

# Note
1. For building wheels on Intel GPU, you need Visual Studio BuildTools version >= 2022
2. This pipeline requires Visual Studio BuildTools 2022 to build wheels. For now, we specify "windows.4xlarge.nonephemeral" as the build machine label in the yaml file. We will soon request self-hosted runners with Intel GPU and Visual Studio BuildTools 2022 installed.

Work for #114850

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133151
Approved by: https://github.com/chuanqi129, https://github.com/atalman

Co-authored-by: chuanqiw <chuanqi.wang@intel.com>
2024-09-05 00:02:46 +00:00
atalman
60dfe1b35e Fix lint after Bump actions/download-artifact update (#135109)
Fixes lint after auto-generated PR: 367a78495f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135109
Approved by: https://github.com/ezyang, https://github.com/huydhn
2024-09-04 15:26:17 +00:00
chuanqiw
67208f08bd [CD] Enable XPU nightly build on Windows (#134312)
Depends on https://github.com/pytorch/builder/pull/1975 land. Works for https://github.com/pytorch/pytorch/issues/114850
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134312
Approved by: https://github.com/atalman
2024-09-04 14:46:36 +00:00
Thanh Ha
dcf05fcb14 Fix stale job using non-existent ARC runner (#134863)
The ARC CI system has been shut down, so this job is currently using a runner that doesn't exist.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134863
Approved by: https://github.com/ZainRizvi
2024-09-04 12:57:10 +00:00
dependabot[bot]
367a78495f Bump actions/download-artifact from 2 to 4.1.7 in /.github/workflows (#135068)
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2 to 4.1.7.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v2...v4.1.7)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-03 20:33:57 -07:00
Zain Rizvi
f05b716d6d Add validator to ensure runner determinator script is kept in sync (#134800)
We keep two copies of the runner-determinator script:
1. In runner_determinator.py, for ease of testing.  This however is not actually executed during CI
2. Embedded in _runner-determinator.yml.  This is what CI uses.

Why the duplication? Short version: Because of how GitHub CI works, during a given CI run the workflow yml files could actually come from the main branch, while the remaining files get read from the local commit.
This can lead to a newer version of _runner-determinator.yml trying to invoke an older version of runner_determinator.py than it was actually designed for. Chaos ensues.

We mitigate this by embedding the script into the yml file.  But we still keep the script around because it's much easier to run tests against.

This workflow's job is to ensure that if one edits the script in one of those two locations then they remember to update it in the other location as well
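
The sync check described above could look roughly like this. It is a sketch under stated assumptions rather than the actual validator: the `BEGIN`/`END` marker comments and both function names are hypothetical, and the real workflow may delimit the embedded script differently.

```python
import textwrap

# Hypothetical markers; the real workflow may delimit the script differently.
BEGIN = "# BEGIN EMBEDDED SCRIPT"
END = "# END EMBEDDED SCRIPT"


def extract_embedded(yml_text: str) -> str:
    """Return the dedented text between the two marker lines."""
    lines = yml_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip() == BEGIN)
    stop = next(i for i, line in enumerate(lines) if line.strip() == END)
    # Remove the YAML block indentation so both copies compare equal.
    return textwrap.dedent("\n".join(lines[start + 1 : stop])).strip()


def in_sync(script_text: str, yml_text: str) -> bool:
    """True when the standalone script matches the embedded copy."""
    return script_text.strip() == extract_embedded(yml_text)
```

A CI job could call `in_sync` on the two files and fail when it returns False, which is the reminder the commit message describes.
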
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134800
Approved by: https://github.com/zxiiro, https://github.com/PaliC
ghstack dependencies: #134796
2024-09-03 23:29:04 +00:00
Zain Rizvi
469429b959 Refactor runner determinator (#134796)
Some minor refactorings to make the code easier to parse and easier to add unit tests for.  Keeping this as a separate PR for ease of review, since it should have zero functional behavior changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134796
Approved by: https://github.com/zxiiro, https://github.com/PaliC
2024-09-03 23:29:04 +00:00
PyTorch MergeBot
a1ba8e61d1 Revert "[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438)"
This reverts commit 5e8bf29148.

Reverted https://github.com/pytorch/pytorch/pull/133438 on behalf of https://github.com/ZainRizvi due to This still breaks linux binary builds. Added the appropriate labels to ensure tests can pass. See [GH job link](https://github.com/pytorch/pytorch/actions/runs/10626427003/job/29460479554) [HUD commit link](5e8bf29148) ([comment](https://github.com/pytorch/pytorch/pull/133438#issuecomment-2322246198))
2024-08-30 20:00:41 +00:00
Wouter Devriendt
db17a9898d regenerate ci workflows for binary builds with new g4dn runners (#133404)
Fixes #103104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133404
Approved by: https://github.com/ZainRizvi
2024-08-30 19:53:22 +00:00
Jack Taylor
5e8bf29148 [ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2024-08-30 03:38:35 +00:00
atalman
6180574771 Move py 3.8->3.9 pull, trunk, inductor, periodic CI tests (#133624)
Part of Deprecation of python 3.8 and moving to 3.9. Related to: https://github.com/pytorch/pytorch/issues/120718
Except XPU and ROCM jobs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133624
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/ZainRizvi
2024-08-29 19:15:59 +00:00
Ivan Zaitsev
41e36e2b46 Reflect check_labels status as a signal (#134711)
Fixes the workflow when meta-exported diff (co-dev) doesn't have the required labels, but the signal is suppressed due to job failure (e.g. [see this run](https://github.com/pytorch/pytorch/actions/runs/10590994706/job/29347663526?pr=134484)).

With this change the workflow status correctly reflects the status of the check.

# Testing
* [illegal pr_num](https://github.com/pytorch/pytorch/actions/runs/10603163898/job/29386843591)
* [successful run](https://github.com/pytorch/pytorch/actions/runs/10603279052/job/29387230110) (topic label present)
* no labels: [check fails](https://github.com/pytorch/pytorch/actions/runs/10603310368/job/29387333864)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134711
Approved by: https://github.com/clee2000
2024-08-29 03:11:16 +00:00
Bin Bao
e6bf1710ff [Inductor][Refactor] Rename CPU benchmark test configs (#134639)
Summary: benchmarks/dynamo/ci_expected_accuracy/update_expected.py expects a benchmark run config to be named {config}_{benchmark}, and CPU tests should follow the same naming convention.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134639
Approved by: https://github.com/huydhn
2024-08-28 14:49:55 +00:00
atalman
d5aefadb17 [CD] Fix docker builds by installing setuptools (#134595)
Seeing failures like this:
```
#49 844.6 //build_scripts/manylinux1-check.py:6: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
.....
[python 3/3] RUN bash build_scripts/build.sh && rm -r build_scripts:
846.9 ...it did, yay.
846.9 + for PYTHON in '/opt/python/*/bin/python'
846.9 + /opt/python/cpython-3.12.0/bin/python build_scripts/manylinux1-check.py
847.0 Traceback (most recent call last):
847.0   File "//build_scripts/manylinux1-check.py", line 55, in <module>
847.0     if is_manylinux1_compatible():
847.0        ^^^^^^^^^^^^^^^^^^^^^^^^^^
847.0   File "//build_scripts/manylinux1-check.py", line 6, in is_manylinux1_compatible
847.0     from distutils.util import get_platform
847.0 ModuleNotFoundError: No module named 'distutils'
------
```
PR: https://github.com/pytorch/pytorch/pull/134455
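
The traceback above comes from `distutils` being removed from the standard library in Python 3.12. As a hedged illustration only (the commit itself fixes the build by installing setuptools, not by changing the script), a check script could fall back to `sysconfig` when `distutils` is unavailable. The function name `get_platform_name` is an assumption for the example.

```python
import sysconfig


def get_platform_name() -> str:
    """Return the platform string, tolerating a missing `distutils`.

    Hypothetical helper: `distutils` is importable on older Pythons, or
    on 3.12+ when setuptools provides it; otherwise use `sysconfig`.
    """
    try:
        from distutils.util import get_platform
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError.
        return sysconfig.get_platform()
    return get_platform()
```
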

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134595
Approved by: https://github.com/kit1980, https://github.com/seemethere, https://github.com/malfet
2024-08-27 19:31:44 +00:00
Zain Rizvi
f480385277 Remove explicit Amz2023 reference from jobs (#134355)
Changes jobs to go back to using the default AMI.

Note: This is only a cleanup PR. It does NOT introduce any behavior changes in CI

Now that the default variant uses the Amazon 2023 AMI and has been shown to be stable for a week, it's time to remove the explicit amz2023 references and go back to using the default variant.

After a week or two, when this is rolled out to most people, we can remove the variants from scale config as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134355
Approved by: https://github.com/jeanschmidt
2024-08-27 08:51:42 +00:00
atalman
78128cbdd8 [CD] Use ephemeral arm64 runners for nightly and docker builds (#134473)
Follow up after adding linux arm64 ephemeral instances: https://github.com/pytorch/pytorch/pull/134469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134473
Approved by: https://github.com/malfet
2024-08-26 17:47:20 +00:00
atalman
a6fac0e969 Use ephemeral runners for windows nightly builds (#134463)
This is definition of windows.4xlarge:

```
  windows.4xlarge:
    disk_size: 256
    instance_type: c5d.4xlarge
    is_ephemeral: true
    max_available: 420
    os: windows
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134463
Approved by: https://github.com/jeanschmidt
2024-08-26 16:33:19 +00:00
Thanh Ha
bb67ff2ba7 Migrate Windows bin jobs to runner determinator (#134231)
Update Windows binary workflows to use the runner determinator script.

Closes: pytorch/ci-infra#262

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134231
Approved by: https://github.com/ZainRizvi
2024-08-26 14:56:00 +00:00
PyTorch MergeBot
4648848696 Revert "[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438)"
This reverts commit f71c3d265a.

Reverted https://github.com/pytorch/pytorch/pull/133438 on behalf of https://github.com/jeanschmidt due to seems to have introduced breakages in linux binary builds ([comment](https://github.com/pytorch/pytorch/pull/133438#issuecomment-2308787310))
2024-08-25 11:20:30 +00:00
Jack Taylor
f71c3d265a [ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2024-08-24 18:26:49 +00:00
chuanqiw
6245d5b87b [CI] Update XPU ci test python version to 3.9 (#134214)
Works for https://github.com/pytorch/pytorch/issues/114850
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134214
Approved by: https://github.com/EikanWang, https://github.com/malfet
2024-08-24 18:11:36 +00:00
atalman
ff77c67d16 Use ephemeral runners for linux nightly builds (#134367)
Should be landed with https://github.com/pytorch/test-infra/pull/5590
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134367
Approved by: https://github.com/kit1980, https://github.com/malfet, https://github.com/seemethere
2024-08-24 12:49:07 +00:00
Nikita Shulga
09a82f3d24 [EZ][BE] Delete references to non-existing AWS_SCCACHE secrets (#134370)
First of all, none of the binary builds should be using sccache for security and reliability reasons (as a distributed cache can become corrupted/compromised), but even if they do, all authentication to AWS services should be done via OIDC

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134370
Approved by: https://github.com/seemethere, https://github.com/atalman
2024-08-23 22:23:48 +00:00