3626 commits

Author SHA1 Message Date
Michael Lazos
1252c1933d Update to remind users to use torch.compile template (#145960)
Users have been submitting fuzzer issues without meeting the requirements outlined in the torch.compile issue template. This updates the note to remind users to use the torch.compile template for torch.compile bugs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145960
Approved by: https://github.com/eellison
2025-01-30 21:34:40 +00:00
Michael Lazos
d14046b58d Update fuzzer guidance to include rng (#145962)
Add another condition to fuzzer issue guidance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145962
Approved by: https://github.com/eellison
2025-01-30 21:33:57 +00:00
PyTorch MergeBot
967cf85f3a Revert "Update mi300 labels to account for multiple clusters. (#145923)"
This reverts commit 3e135993bd.

Reverted https://github.com/pytorch/pytorch/pull/145923 on behalf of https://github.com/atalman due to reverting back to one cluster ([comment](https://github.com/pytorch/pytorch/pull/145923#issuecomment-2625022826))
2025-01-30 16:45:50 +00:00
Benjamin Glass
933b6d9830 cpp_wrapper: enable in aarch64 and x86 nightly dashboard performance runs (#145791)
Adds `cpp_wrapper` mode to the nightly inductor benchmark runs, as well as optionally for manually triggered runs. This is justified by `aot_inductor` already being in those runs.

Additionally, re-enables `aot_inductor` in the nightly aarch64 runs. It was disabled 5 months ago to deal with a performance instability, which has likely gone away at this point.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145791
Approved by: https://github.com/desertfire
2025-01-30 02:55:45 +00:00
Yang Wang
a9ed7bd78e [utilization] pipeline to create clean db records (#145327)
upload_utilization_script generates DB-ready insert records and uploads them to S3:
- generate two files, metadata and timeseries, in the ossci-utilization buckets
- convert log records to the DB format
- add a unit test job for tools/stats/

Related PRs:
setup composite action for data pipeline: https://github.com/pytorch/pytorch/pull/145310
add permission for composite action to access S3 bucket: https://github.com/pytorch-labs/pytorch-gha-infra/pull/595
add insert logic in s3 replicator: https://github.com/pytorch/test-infra/pull/6217
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145327
Approved by: https://github.com/huydhn

Co-authored-by: Huy Do <huydhn@gmail.com>
2025-01-29 23:48:50 +00:00
bglass@quansight.com
40ccb7a86d cpp_wrapper: Move #includes to per-device header files (#145932)
Summary:
This prepares us for the next PR in the stack, where we introduce pre-compiled per-device header files to save compilation time.

Reland https://github.com/pytorch/pytorch/pull/143909 after merge conflicts.

Co-authored-by: Benjamin Glass <bglass@quansight.com>

Differential Revision: D68656960

Pulled By: benjaminglass1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145932
Approved by: https://github.com/yushangdi, https://github.com/benjaminglass1

Co-authored-by: bglass@quansight.com <bglass@quansight.com>
2025-01-29 21:08:45 +00:00
saienduri
3e135993bd Update mi300 labels to account for multiple clusters. (#145923)
We now have multiple Kubernetes clusters of mi300x resources, and this commit updates labels accordingly to target both clusters evenly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145923
Approved by: https://github.com/jeffdaily
2025-01-29 16:56:43 +00:00
Ting Lu
354fe48db9 Add magma cuda build 12.8 (#145765)
https://github.com/pytorch/pytorch/issues/145570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145765
Approved by: https://github.com/malfet
2025-01-29 08:43:38 +00:00
Ting Lu
f4ca98950e Add CUDA 12.8 libtorch image (#145789)
https://github.com/pytorch/pytorch/issues/145570

Builds the 12.8 libtorch Docker image and deprecates 12.1 in the meantime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145789
Approved by: https://github.com/nWEIdia, https://github.com/atalman
2025-01-29 02:59:37 +00:00
albanD
02dd7a7803 Extend abi-stable nitpick message to all the c stable files (#145862)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145862
Approved by: https://github.com/ezyang
2025-01-28 23:22:23 +00:00
Wei Wang
6bcb545d9c [CI][CUDA][cuSPARSELt] cusparselt 0.6.3 and cu121 related cleanups (#145793)
Make the CI cuSPARSELt installation consistent with the nightly binary.
Remove cu121-related Docker build jobs and inductor runs.
Update test failures relating to cu121.

Retry of https://github.com/pytorch/pytorch/pull/145696
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145793
Approved by: https://github.com/eqy, https://github.com/tinglvv
2025-01-28 21:01:58 +00:00
atalman
5382ab57d7 Move trunk windows builds to CUDA-12.4 (#145844)
Same as https://github.com/pytorch/pytorch/pull/130446

That should catch build regressions that were previously only detectable during the nightly builds for 12.4

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145844
Approved by: https://github.com/janeyx99, https://github.com/malfet
2025-01-28 18:00:51 +00:00
Huy Do
56915b093a Fix environment deployment spam (#145823)
With https://github.com/pytorch-labs/pytorch-gha-infra/pull/598 in place, the environment can now be removed.

Fixes https://github.com/pytorch/pytorch/issues/145704

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145823
Approved by: https://github.com/clee2000
2025-01-28 17:46:31 +00:00
Zain Rizvi
097ccd9c39 Move ROCm MI300 jobs to unstable to make CI green (#145790)
This is a temporary change to reduce intermittent test failures. Jobs can be moved back once those machines get better runner isolation.

This also sneaks in a small fix so that the build steps of all the ROCm jobs run on Linux Foundation runners (the get-label-type dependency). The inductor-rocm-mi300 workflow already had it, but it was missing in the rocm-mi300 workflow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145790
Approved by: https://github.com/yangw-dev
2025-01-28 17:25:15 +00:00
saienduri
7eb51e5464 Ensure GPU isolation for kubernetes pod MI300 runners. (#145829)
Fixes the reason behind moving the tests to unstable initially. (https://github.com/pytorch/pytorch/pull/145790)
We ensure gpu isolation for each pod within kubernetes by propagating the drivers selected for the pod from the Kubernetes layer up to the docker run in pytorch here.
Now we stick with the GPUs assigned to the pod in the first place and there is no overlap between the test runners.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145829
Approved by: https://github.com/jeffdaily
2025-01-28 17:20:46 +00:00
Aaron Orenstein
60f98262f1 PEP585: .github (#145707)
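For context, PEP 585 lets builtin collection types be used directly as generic annotations (Python 3.9+), replacing the `typing.List`/`typing.Dict` aliases. A minimal sketch of that style of change follows; `count_labels` is a hypothetical helper for illustration and is not taken from this PR:

```python
# Before PEP 585 the annotations would have needed typing aliases:
#     from typing import Dict, List
#     def count_labels(labels: List[str]) -> Dict[str, int]: ...


# After: builtin generics, no typing import required.
# Hypothetical helper, not code from the PR.
def count_labels(labels: list[str]) -> dict[str, int]:
    """Count how often each label occurs."""
    counts: dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts
```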
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145707
Approved by: https://github.com/huydhn
2025-01-27 21:21:01 +00:00
Ting Lu
93dd6bc4d8 Add CUDA 12.8 installation and manylinux-cuda12.8 (#145567)
Breaking https://github.com/pytorch/pytorch/pull/145557 into two parts.
Need to have manylinux-cuda12.8 in order to build magma.

Issue: https://github.com/pytorch/pytorch/issues/145570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145567
Approved by: https://github.com/nWEIdia, https://github.com/atalman
2025-01-27 20:49:07 +00:00
Huy Do
2f8ad8f4b9 Run inductor perf benchmark on ROCm (#145763)
This requires https://github.com/pytorch/pytorch/pull/144594.  The test run on PT2 dashboard is at https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2020%20Jan%202025%2019%3A46%3A14%20GMT&stopTime=Mon%2C%2027%20Jan%202025%2019%3A46%3A14%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=rocm&lBranch=144594&lCommit=9f5cb037965aa2990b2e4593610bca92526ebb3b&rBranch=144594&rCommit=9f5cb037965aa2990b2e4593610bca92526ebb3b

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145763
Approved by: https://github.com/jeffdaily
2025-01-27 20:19:03 +00:00
Edward Z. Yang
635b98fa08 Add nitpick warning that aoti_torch/c/shim.h is ABI stable (#145745)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145745
Approved by: https://github.com/albanD
2025-01-27 19:25:37 +00:00
Huy Do
5d01a2874f Increase the number of perf benchmark shards (#145534)
Per the discussion on https://github.com/pytorch/pytorch/issues/140332#issuecomment-2610805551, this adds 2 more shards for HF, 2 more for TorchBench, and 1 more for TIMM.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145534
Approved by: https://github.com/jeanschmidt
2025-01-27 16:20:42 +00:00
amdfaa
e57cdb8402 [ROCm] trunk.yml only runs pre-merge via ciflow/trunk label (#145629)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145629
Approved by: https://github.com/jeffdaily
2025-01-24 18:31:33 +00:00
amdfaa
ce371ab4c6 [ROCm] Create inductor-rocm-mi300 (#145621)
- Adds an mi300 inductor workflow to main.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145621
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-01-24 17:04:17 +00:00
atalman
5d24a9a274 Advance docker release latest version to cuda 12.4 (#145566)
Fixed the latest tag in ghcr.io to point to the CUDA 12.4 Docker image. TODO: add this step to https://github.com/pytorch/builder/blob/main/CUDA_UPGRADE_GUIDE.MD

We will need to check whether this can be automated by introducing a cuda_stable variable or something similar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145566
Approved by: https://github.com/nWEIdia, https://github.com/kit1980, https://github.com/malfet
2025-01-24 15:27:25 +00:00
Yang Wang
6d4f5f7688 [Utilization][Usage Log] Add data model for record (#145114)
Add a data model for consistency and to ease future data model changes.

The data model will be used during the post-test-process pipeline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145114
Approved by: https://github.com/huydhn
2025-01-23 19:04:41 +00:00
amdfaa
c9e12d6a3b [ROCm] Update rocm.yml and add rocm-mi300.yml (#145398)
- Added another workflow to run the mi300 jobs post-merge.
- Updated rocm.yml to use mi200s instead of mi300s.
- Required to get an idea of how PRs are landing on our mi200s and mi300s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145398
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-01-23 00:07:50 +00:00
PyTorch UpdateBot
3917053f63 [audio hash update] update the pinned audio hash (#145328)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145328
Approved by: https://github.com/pytorchbot
2025-01-22 19:39:03 +00:00
Huy Do
266fd35c58 Fix ExecuTorch, XLA, Triton hash updates (#145314)
Fix some stale hash updates (https://github.com/pytorch/pytorch/pulls/pytorchupdatebot) reported by @izaitsevfb.

* XLA and ExecuTorch now wait for all jobs in pull instead of hardcoding job names, which are no longer correct and cause the bot to wait forever.
* The Triton commit hash hasn't been updated automatically since 2023, and people have been updating the pin manually with their own testing from time to time, so I doubt it is a useful thing to keep.

The vision update failures look more complex, though, and I need to take a closer look, so I will handle them in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145314
Approved by: https://github.com/izaitsevfb
2025-01-21 23:24:21 +00:00
Catherine Lee
7dd9d1f243 Update clickhouse-connect to 0.8.14 (#144915)
Corresponds to https://github.com/pytorch/test-infra/pull/6177

I only tested the slow test script but I also did testing on the new version with scripts in https://github.com/pytorch/test-infra/pull/6177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144915
Approved by: https://github.com/huydhn
2025-01-21 21:43:18 +00:00
Huy Do
eb553ae3cf Fix broken gpt_fast micro benchmark after #144315 (#145235)
The benchmark is failing with the following error

```
  File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 333, in <module>
    main(output_file=args.output, only_model=args.only)
  File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 308, in main
    lst = func(device)
  File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 66, in run_mlp_layer_norm_gelu
    us_per_iter = benchmarker.benchmark(compiled_mod, (x,)) * 1000
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/_inductor/runtime/benchmarking.py", line 39, in wrapper
    return fn(self, *args, **kwargs)
TypeError: benchmark() missing 1 required positional argument: 'fn_kwargs'
```

An example error is https://github.com/pytorch/pytorch/actions/runs/12862761823/job/35858912555

I also assign `oncall: pt2` as the owner of this job going forward.
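The traceback shows a call that passes only `fn` and `fn_args` to a method that also requires `fn_kwargs`. A minimal sketch of that failure mode follows. The `Benchmarker` class is a hypothetical stand-in, not the real `torch._inductor` implementation, and passing an empty dict is one plausible way to satisfy the signature, not necessarily what this PR does:

```python
class Benchmarker:
    """Hypothetical stand-in; only the shape of the signature matters here."""

    def benchmark(self, fn, fn_args, fn_kwargs):
        # Call fn once and return a dummy timing in milliseconds.
        fn(*fn_args, **fn_kwargs)
        return 0.001


def double(x):
    return x * 2


benchmarker = Benchmarker()

try:
    # Old-style call: fn_kwargs is missing, so Python raises TypeError.
    benchmarker.benchmark(double, (3,))
    old_call_error = None
except TypeError as exc:
    old_call_error = str(exc)

# Updated call: pass an (empty) kwargs dict explicitly.
us_per_iter = benchmarker.benchmark(double, (3,), {}) * 1000
```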
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145235
Approved by: https://github.com/nmacchioni
2025-01-21 17:42:24 +00:00
atalman
2cffbff7da Add 3.13t Windows and MacOS binary builds (#141806)
Related to: https://github.com/pytorch/pytorch/issues/130249

For conda, this uses the approach described here:
https://conda-forge.org/blog/2024/09/26/python-313/

Create Python 3.13t conda env like so:
```
conda create -n py313 python=3.13 python-freethreading  -c conda-forge
```

For the Windows executable installation, we need to pass an additional parameter to enable 3.13t:
```
Include_freethreaded=1
```
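Once such an environment exists, it can be useful to confirm that the interpreter really is a free-threaded build. The sketch below is not part of this PR. It assumes the `Py_GIL_DISABLED` build variable is exposed through `sysconfig` on the platform in question, and falls back gracefully on interpreters older than 3.13 that lack `sys._is_gil_enabled`:

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """Return True if this interpreter was built with the GIL disabled (e.g. 3.13t)."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_currently_enabled() -> bool:
    """Return whether the GIL is active at runtime; assume True on older interpreters."""
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL enabled now:", gil_currently_enabled())
```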
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141806
Approved by: https://github.com/albanD
2025-01-21 17:16:19 +00:00
Wang, Chuanqi
225a10febe [CI] Add xpu linux build into pull workflow (#145084)
To mitigate the risk of XPU build failures introduced by non-XPU-specific PRs. Refer to #144967 & #143803.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145084
Approved by: https://github.com/huydhn, https://github.com/atalman
2025-01-20 19:31:48 +00:00
Aleksei Nikiforov
53e2408015 Improve cleanup of cancelled jobs on s390x for tests too (#144968)
Follow up to https://github.com/pytorch/pytorch/pull/144149
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144968
Approved by: https://github.com/huydhn
2025-01-20 12:56:07 +00:00
atalman
a215e174a1 [BE] Remove conda from scripts and build files Part 2 (#145015)
Continuation of https://github.com/pytorch/pytorch/pull/144870

Remove conda logic from scripts:

1. Remove conda build from triton build script
2. Remove conda checks from setup.py
3. Remove conda from release scripts
4. Script read_conda_versions.sh is not used (checked via git grep)

Related to: https://github.com/pytorch/pytorch/issues/138506
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145015
Approved by: https://github.com/malfet, https://github.com/Skylion007
2025-01-17 16:26:24 +00:00
Wang, Eikan
dbed747aae Add Intel GPU specific CMake files to merge rules (#135110)
As the title says.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135110
Approved by: https://github.com/atalman
2025-01-17 09:44:13 +00:00
Huy Do
cf28d613f1 Allow ROCm runner to upload benchmark results if found (#144710)
https://github.com/pytorch/pytorch/wiki/How-to-integrate-with-PyTorch-OSS-benchmark-database. This will unblock AMD when they try to run MI300 benchmarks on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144710
Approved by: https://github.com/kit1980
2025-01-16 19:31:45 +00:00
Nikita Shulga
ad15436db6 Fix pt2-bug-report.yml formatting (#144987)
This is a 2nd regression caused by https://github.com/pytorch/pytorch/pull/144574

Test plan: `python3 -c "import yaml; foo=yaml.safe_load(open('pt2-bug-report.yml'));print(foo['body'][0])"`
Before it printed
```
% python3 -c "import yaml; foo=yaml.safe_load(open('pt2-bug-report.yml'));print(foo['body'][0])"
{'type': 'markdown', 'attributes': {'value': ''}}
```
After
```
% python3 -c "import yaml; foo=yaml.safe_load(open('pt2-bug-report.yml'));print(foo['body'][0])"
{'type': 'markdown', 'attributes': {'value': '#### Note: Please write your bug report in English to ensure it can be understood and addressed by the development team.\n'}}
```

Fixes https://github.com/pytorch/pytorch/issues/144970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144987
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2025-01-16 18:58:07 +00:00
atalman
519269a415 [BE] - Remove conda test and upload scripts and env variables from Workflows Part 1 (#144870)
Remove conda test and upload scripts and env variables from Workflows

Related to: https://github.com/pytorch/pytorch/issues/138506
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144870
Approved by: https://github.com/malfet
2025-01-16 17:20:14 +00:00
Yutao Xu
6470b0ea6f Update torch-xpu-ops commit pin (#144739)
Update the torch-xpu-ops commit to 22cc419e4e60f469341712a5a103fa309a7dfd48, which includes:

- Fix building issue https://github.com/intel/torch-xpu-ops/issues/1279
- Aten operator coverage improvement

Note: the new torch-xpu-ops commit doesn't support bundle 0.5.3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144739
Approved by: https://github.com/EikanWang, https://github.com/malfet
2025-01-16 15:12:37 +00:00
Huy Do
05095a45f2 Fix the wrong artifact in remaining workflows (#144812)
I missed them in https://github.com/pytorch/pytorch/pull/144694 as they weren't run often.  But they are still failing nonetheless, i.e. https://github.com/pytorch/pytorch/actions/runs/12762640334/job/35578870178

The issue was from https://github.com/pytorch/pytorch/pull/125401 where it added `use-gha: ${{ inputs.use-gha }}` to linux_test workflow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144812
Approved by: https://github.com/clee2000
2025-01-15 20:36:40 +00:00
Wang, Chuanqi
b4b4e57469 [CD] Enable profiling for XPU Windows nightly wheels (#144316)
PR https://github.com/pytorch/pytorch/pull/144034 added profiling support for the torch XPU Windows binary; this enables it in the PyTorch XPU Windows CD.
Works for https://github.com/pytorch/pytorch/issues/114850

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144316
Approved by: https://github.com/xuhancn, https://github.com/atalman
2025-01-14 19:01:27 +00:00
Nikita Shulga
6053242890 [CD] Enable python3.13t builds for aarch64 (#144698)
But make sure that the right numpy version is picked (2.0.2 does not support 3.13)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144698
Approved by: https://github.com/atalman
ghstack dependencies: #144696, #144697, #144716
2025-01-14 02:29:01 +00:00
Huy Do
b221f88fc1 Leave SCCACHE_S3_KEY_PREFIX empty to share the cache among all build jobs (#144704)
This is a follow-up of https://github.com/pytorch/pytorch/pull/144112#pullrequestreview-2528451214.  After leaving https://github.com/pytorch/pytorch/pull/144112 running for more than a week, all build jobs were fine, but I failed to see any improvement in build time.

So, let's try @malfet's suggestion of removing the prefix altogether to keep it simple. After this lands, I will circle back to see if there are any improvements. Otherwise, it's still a simple BE change, I guess.

Here is the query I'm using to gather build time data for reference:

```
with jobs as (
    select
        id,
        name,
        DATE_DIFF('minute', created_at, completed_at) as duration,
        DATE_TRUNC('week', created_at) as bucket
    from
        workflow_job
    where
        name like '%/ build'
        and html_url like concat('%', {repo: String }, '%')
        and conclusion = 'success'
        and created_at >= (CURRENT_TIMESTAMP() - INTERVAL 6 MONTHS)
),
aggregated_jobs_in_bucket as (
    select
        --groupArray(duration) as durations,
        --quantiles(0.9)(duration),
        avg(duration),
        bucket
    from
        jobs
    group by
        bucket
)
select
    *
from
    aggregated_jobs_in_bucket
order by
    bucket desc
```
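For readers without access to the database, the aggregation the query performs (average build duration per weekly bucket, newest first) can be mirrored in plain Python. This is a sketch under stated assumptions: the rows are assumed to be already filtered to successful `/ build` jobs, weeks are assumed to start on Monday, and the sample data is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_average_minutes(jobs):
    """Average job duration in minutes per week (Monday start), newest week first."""
    buckets = defaultdict(list)
    for job in jobs:
        created = job["created_at"]
        # Whole minutes between creation and completion, like DATE_DIFF('minute', ...).
        duration = (job["completed_at"] - created).total_seconds() // 60
        # Truncate the creation time to the Monday of its week.
        week_start = (created - timedelta(days=created.weekday())).date()
        buckets[week_start].append(duration)
    return sorted(
        ((week, sum(vals) / len(vals)) for week, vals in buckets.items()),
        reverse=True,
    )


# Made-up sample rows, for illustration only.
sample = [
    {"created_at": datetime(2025, 1, 6, 10, 0), "completed_at": datetime(2025, 1, 6, 10, 40)},
    {"created_at": datetime(2025, 1, 8, 9, 0), "completed_at": datetime(2025, 1, 8, 10, 0)},
    {"created_at": datetime(2025, 1, 13, 9, 0), "completed_at": datetime(2025, 1, 13, 9, 30)},
]
result = weekly_average_minutes(sample)
```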
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144704
Approved by: https://github.com/clee2000
2025-01-14 02:19:38 +00:00
atalman
c15d6508bd Binary builds Docker images - remove cuda 12.1 (#144575)
Remove CUDA 12.1 from manylinux, libtorch and almalinux builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144575
Approved by: https://github.com/seemethere, https://github.com/kit1980, https://github.com/malfet, https://github.com/Skylion007
2025-01-13 22:44:59 +00:00
Huy Do
5129d6ef51 Fix inductor periodic smoke test wrong artifact (#144694)
I'm not entirely sure why this failure has started to show up in periodic since Friday https://github.com/pytorch/pytorch/actions/runs/12716967189/job/35463656803.  The artifact was uploaded to S3, but `use-gha: anything-non-empty-to-use-gh` was set and it was working.  Maybe this is related to https://github.com/pytorch/pytorch/issues/144479

I also clean up the GCP/AWS A100 selection logic as the GCP cluster doesn't exist anymore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144694
Approved by: https://github.com/clee2000
2025-01-13 21:42:39 +00:00
Nikita Shulga
d44c3906b8 [EZ] [CD] Add 3.13 to FULL_PYTHON_VERSIONS (#144697)
Separation was necessary for Conda codegen, but now it's gone
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144697
Approved by: https://github.com/atalman, https://github.com/izaitsevfb
ghstack dependencies: #144696
2025-01-13 19:12:12 +00:00
Nikita Shulga
d2f905760d [EZ] [CD] Eliminate stale TODO (#144696)
As 3.13 has been enabled across the board, which one can verify by running `./github/regenerate.sh` and observing that none of the configs have changed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144696
Approved by: https://github.com/izaitsevfb, https://github.com/atalman
2025-01-13 19:12:12 +00:00
Huy Do
e4b2e90e54 Fix broken YAML template after #144574 (#144604)
The YAML syntax is wrong and GitHub complains about it https://github.com/pytorch/pytorch/blob/main/.github/ISSUE_TEMPLATE/pt2-bug-report.yml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144604
Approved by: https://github.com/wdvr
2025-01-11 05:09:06 +00:00
Nikita Shulga
92ddb3d3d3 [MPS] Expose MPSProfiler::start/stopCapture to Python (#144561)
I.e. when the `MTL_CAPTURE_ENABLED` environment variable is set to 1, one should be able to wrap the code with `torch.mps.profiler.metal_capture` to generate a gputrace for shaders invoked inside the context manager.

For example, code below:
```python
import torch
import os

def foo(x):
    return x[:, ::2].sin() + x[:, 1::2].cos()

if __name__ == "__main__":
    os.environ["MTL_CAPTURE_ENABLED"] = "1"
    x = torch.rand(32, 1024, device="mps")

    with torch.mps.profiler.metal_capture("compiled_shader"):
        torch.compile(foo)(x)
```
should capture the execution of a `torch.compile` generated shader

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144561
Approved by: https://github.com/manuelcandales
ghstack dependencies: #144559, #144560
2025-01-11 02:05:36 +00:00
Sahan Paliskara
9ec8ecea71
Update documentation.yml 2025-01-10 15:27:28 -08:00
Sahan Paliskara
1ff8a1c4eb
Update documentation.yml to request english 2025-01-10 15:26:43 -08:00