84115 commits

Author SHA1 Message Date
Jack Zhang
ed309b9156 Re-add stft option to align window for center = false (#146379)
Skips advancing the fc window on https://github.com/pytorch/pytorch/pull/145437, since I just found that there were non-trivial efforts to do so a while ago that were eventually reverted: https://github.com/pytorch/pytorch/pull/73434

Works around the issue by keeping the stft sans center overload

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146379
Approved by: https://github.com/justinchuby, https://github.com/iseeyuan
2025-02-06 14:07:13 +00:00
PyTorch MergeBot
1b79d47635 Revert "[dynamo] check for incompatible configs (#146513)"
This reverts commit aab7925418.

Reverted https://github.com/pytorch/pytorch/pull/146513 on behalf of https://github.com/atalman due to inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect [GH job link](https://github.com/pytorch/pytorch/actions/runs/13174131431/job/36772837627) [HUD commit link](4a545eb85d) ([comment](https://github.com/pytorch/pytorch/pull/146513#issuecomment-2639860568))
2025-02-06 13:42:25 +00:00
Animesh Jain
340cfe4f28 [dynamo][fbcode] Turn on inline_inbuilt_nn_modules (#145407)
As title.

Some internal testing at https://fb.workplace.com/groups/241460628989036/permalink/411650015303429/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145407
Approved by: https://github.com/ezyang, https://github.com/jansel
2025-02-06 13:18:35 +00:00
PyTorch MergeBot
bd7d4fb2b5 Revert "[DTensor][Test] Create a simple unit test for tensordot (#146514)"
This reverts commit 1f8baf09ea.

Reverted https://github.com/pytorch/pytorch/pull/146514 on behalf of https://github.com/albanD due to The lint failures that you ignored are real right? ([comment](https://github.com/pytorch/pytorch/pull/146514#issuecomment-2639554636))
2025-02-06 11:26:43 +00:00
zeshengzong
4a545eb85d Fix torch.nn.functional.one_hot param num_classes optional description (#146470)
The `torch.nn.functional.one_hot` [documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html) describes the `num_classes` parameter as not optional, but users can call the function without passing it.

![image](https://github.com/user-attachments/assets/4e6d4feb-691f-451f-95b5-4ac11bac7bc2)

```python
>>> import torch
>>> a = torch.arange(0, 5) % 3  # [0,1,2,0,1]
>>> torch.nn.functional.one_hot(a)
tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]])

```

`num_classes` has default value -1

93d98aca31/aten/src/ATen/native/native_functions.yaml (L6154-L6157)

## Test Result

![image](https://github.com/user-attachments/assets/2c7203b7-6226-4ebc-84c8-cbf912fc48e2)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146470
Approved by: https://github.com/albanD
2025-02-06 07:48:05 +00:00
Simon Fan
aab7925418 [dynamo] check for incompatible configs (#146513)
internal: https://fb.workplace.com/groups/1075192433118967/permalink/1599802033991335/

Assuming flags don't change during compilation, we shouldn't allow incompatible configs to be set at torch.compile wrap time.

Not in this PR: For flags that need to change during compilation, we'd have to be strict about where they can be used in the compile lifecycle

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146513
Approved by: https://github.com/williamwen42
2025-02-06 07:39:52 +00:00
eqy
5f0901e573 [cuBLAS][cuBLASLt] Unify cuBLASLt workspaces with cuBLAS workspaces (#145130)
As `cuBLAS` workspaces are already per-stream, there shouldn't be kernel execution overlap with `cuBLASLt` kernels.

This PR reuses `cuBLAS` workspaces for `cuBLASLt` for the following benefits:

+ caching (`cuBLAS` workspaces were already cached, so now we get that for `cuBLASLt`)
+ "free" workspace size bump for `cuBLASLt`: `cuBLASLt` workspace sizes were previously smaller than those for `cuBLAS` by default, which potentially hurts performance, and we encountered difficulty in increasing the size due to downstream OOMs, see also #120925
+ fixes broken behavior with the memtracker; https://github.com/pytorch/pytorch/pull/139442 attempted to handle peaky allocation behavior that broke memtracker equivalence tests but it didn't seem to fully work, here the cached/reused `cuBLAS` workspace seems to fix it
+ one environment variable to rule them all: `CUBLAS_WORKSPACE_CONFIG` applies directly to `cuBLASLt` without a confusing `CUBLASLT_WORKSPACE_SIZE` that users would also need to consider
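
To make the last bullet concrete: NVIDIA's cuBLAS documentation describes `CUBLAS_WORKSPACE_CONFIG` as colon-separated `:SIZE:COUNT` pairs with the size in KiB (for example `:4096:8`). The sketch below is a minimal pure-Python reading of that format. The helper name `parse_workspace_config` is hypothetical, and summing every pair is an assumption of this sketch, not a description of how PyTorch parses the variable.

```python
import re


def parse_workspace_config(value: str) -> int:
    """Return the total workspace size in bytes described by `value`.

    Hypothetical helper: sums all `:SIZE_KIB:COUNT` pairs found in the string.
    """
    pairs = re.findall(r":(\d+):(\d+)", value)
    if not pairs:
        raise ValueError(f"unrecognized workspace config: {value!r}")
    return sum(int(size_kib) * 1024 * int(count) for size_kib, count in pairs)


# Eight buffers of 4096 KiB each.
total_bytes = parse_workspace_config(":4096:8")
```

With the change in this commit, a setting like this would apply to `cuBLASLt` as well, so only one variable has to be reasoned about.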

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145130
Approved by: https://github.com/ngimel
2025-02-06 05:57:33 +00:00
Nikita Shulga
36c6e09528 [MPSInductor] Fix min/max for bfloat16 (#146552)
By introducing a full specialization that upcasts everything to float, as bfloat does not have a native min/max

Test by running `test_min_max_reduction`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146552
Approved by: https://github.com/dcci
2025-02-06 05:15:00 +00:00
wz337
1f8baf09ea [DTensor][Test] Create a simple unit test for tensordot (#146514)
The dims and shape of the tensors are from a specific Shampoo use case. We want to create a unit test for it to make sure there are no regressions for this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146514
Approved by: https://github.com/tianyu-l
2025-02-06 05:09:34 +00:00
Michael Diggin
e01a5e9e1e Small improvements to NJT matrix multiplies (#146405)
Fixes #146404

Adds changes to the matmul and matmul_backward operation for nested jagged tensors, to support back propagation when the output is a regular strided tensor.
This required adding support for the nested matmul operation when the nested tensor isn't 'self', i.e. `A@B` where `A` isn't nested but `B` is.

The operation schemas had to be updated to reflect that either input can be a strided tensor instead (and the gradient), so an extra assertion is added in an edge case where neither input is nested.

Unit tests are also added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146405
Approved by: https://github.com/soulitzer, https://github.com/jbschlosser
2025-02-06 04:51:12 +00:00
bobrenjc93
389c5c0842 print out partial fx graph for all data-dependent errors (#146363)
The previous implementation didn't catch the following type of errors

```
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not extract specialized integer from data-dependent expression u2 (unhinted: u2).  (Size-like symbols: none)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146363
Approved by: https://github.com/angelayi, https://github.com/bdhirsh
ghstack dependencies: #146298, #146296
2025-02-06 04:21:34 +00:00
Michael Suo
425804db2b [torch] fix exception types in custom class magic setattr/getattr (#146516)
Summary:
`c10::AttributeError` is not automatically converted to Python AttributeError, it needs some special macros (e.g. `HANDLE_TH_ERRORS`).

Some Python functions like `hasattr` rely on the type of the thrown exception being correct.

We don't need the full generality of those macros, so just do a targeted error type conversion here.
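
The dependence of `hasattr` on the exception type can be seen in plain Python, without any C++ binding. This is only an illustration of the language behavior: the two classes and the `probe` helper are hypothetical and stand in for a custom class whose attribute lookup raises either the wrong or the right exception type.

```python
class RaisesRuntimeError:
    """Attribute lookup fails with the wrong exception type."""

    def __getattr__(self, name):
        raise RuntimeError(f"no attribute {name!r}")


class RaisesAttributeError:
    """Attribute lookup fails with the type `hasattr` expects."""

    def __getattr__(self, name):
        raise AttributeError(f"no attribute {name!r}")


def probe(obj) -> str:
    # hasattr() only swallows AttributeError; anything else propagates.
    try:
        return f"hasattr returned {hasattr(obj, 'missing')}"
    except RuntimeError:
        return "hasattr propagated RuntimeError"


with_attribute_error = probe(RaisesAttributeError())
with_runtime_error = probe(RaisesRuntimeError())
```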

Test Plan: added unit test

Differential Revision: D69197217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146516
Approved by: https://github.com/zdevito
2025-02-06 02:14:11 +00:00
Pian Pawakapan
3a6a203b98 [dynamic shapes][real tensor tracing] propagate unbacked hint when creating mod replacement (#146381)
Fixes data-dependent errors for 2 PT2I models in draft export

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146381
Approved by: https://github.com/angelayi
2025-02-06 01:48:40 +00:00
Pian Pawakapan
c5062cca98 [export] make stack_trace optional in insert_custom_op_guards (#146438)
Summary: Fixes 1 PT2I exportability error

Test Plan: -

Differential Revision: D69132186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146438
Approved by: https://github.com/yiming0416, https://github.com/angelayi
2025-02-06 01:48:26 +00:00
Nikita Shulga
6a985d8b2e Make inductor_utils.requires_gpu accept MPS (#145156)
Not yet ready to set HAS_GPU to true, but can unskip tests that require GPU
(Noticed while running test_mps_basics.py that `test_scalar_cpu_tensor_arg` is getting skipped)

- Replace `GPU_TYPE` with `self.device` in `test_custom_op_fixed_layout_sequential`, `test_inductor_layout_optimization_input_mutations`, `test_mutable_custom_op_fixed_layout2`, otherwise the GPU tests are just running for `_cpu` suffixes.
- Tweak `test_tmp_not_defined_issue3` to work correctly on CPU, by defining `test_device` and `test_device_0`
- UnXFail `test_mutable_custom_op_fixed_layout2_dynamic_shapes` as it should just work on CPU
- Add `skip_if_no_triton` decorator and decorate `test_reduction_config_limit` with it, as it does not need CPU nor GPU, but rather a triton backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145156
Approved by: https://github.com/dcci, https://github.com/Skylion007, https://github.com/jansel
2025-02-06 01:14:36 +00:00
Isalia20
0dc03134d9 [MPS] linalg solve implementation (#146531)
Fixes #98222

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146531
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-02-06 00:57:49 +00:00
Nikita Shulga
495049860b [BE][Metal] Fix signed unsigned comparison warning (#146549)
I wish I knew how to extract Metal warnings during JIT compilation but https://developer.apple.com/documentation/metal/mtldevice/makelibrary(source:options:)?changes=_7&language=objc is a lie, as `error:` stays `nil` unless shader compilation fails. But when it does, the following warnings are emitted:
```
program_source:666:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  for (auto idx = 1; idx < size; ++idx) {
                     ~~~ ^ ~~~~
program_source:677:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  for (auto idx = 1; idx < size; ++idx) {
                     ~~~ ^ ~~~~
program_source:688:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  for (auto idx = 1; idx < size; ++idx) {
                     ~~~ ^ ~~~~
program_source:699:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  for (auto idx = 1; idx < size; ++idx) {
                     ~~~ ^ ~~~~
program_source:710:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  for (auto idx = 1; idx < size; ++idx) {
                     ~~~ ^ ~~~~
program_source:723:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
  for (auto idx = 1; idx < size; ++idx) {

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146549
Approved by: https://github.com/dcci
2025-02-06 00:40:17 +00:00
PyTorch MergeBot
e0cf519ade Revert "[inductor] Refactor op handlers part 2 (#146252)"
This reverts commit 13f0436abd.

Reverted https://github.com/pytorch/pytorch/pull/146252 on behalf of https://github.com/atalman due to Sorry need to revert, failing internally ([comment](https://github.com/pytorch/pytorch/pull/146252#issuecomment-2638305417))
2025-02-06 00:04:04 +00:00
Nikita Shulga
c7087d6b14 [BE][EZ][Metal] Do not pass tensor length as arg (#146522)
As all devices capable of running Metal-2 support nonuniform threadgroup sizes, see https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf for more detail
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146522
Approved by: https://github.com/dcci
ghstack dependencies: #146521
2025-02-06 00:03:41 +00:00
Nikita Shulga
54ef029532 [BE][EZ][Metal] Mark constant inputs as constant (#146521)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146521
Approved by: https://github.com/dcci
2025-02-06 00:03:41 +00:00
PyTorch MergeBot
2001066c61 Revert "[inductor] Refactor op handlers part 3 (#146254)"
This reverts commit 8e9bda8d89.

Reverted https://github.com/pytorch/pytorch/pull/146254 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146254#issuecomment-2638300857))
2025-02-05 23:59:50 +00:00
Simon Fan
72405b0c0f [ca] refactor compile reasons and log to tlparse (#146386)
This PR accumulates compile reasons inside each CacheNode, and logs them to tlparse on each CA compile. This defines a compile as an autograd structure change, and a recompile as a dynamic shape change.

sample tlparse: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpdbo7gt/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

for compiles:
```python
[
  "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]"
]
```

for recompiles:
```python
[
  "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]",
  "!1: Cache miss due to 7 changed tensor shapes (total of 7): sizes[0], sizes[1], sizes[2], sizes[3], sizes[4], sizes[5], sizes[6]"
]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146386
Approved by: https://github.com/jansel
ghstack dependencies: #146229
2025-02-05 23:33:21 +00:00
PyTorch MergeBot
68304dba7a Revert "[inductor] Refactor op handlers part 4 (#146255)"
This reverts commit 7aced455c5.

Reverted https://github.com/pytorch/pytorch/pull/146255 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146255#issuecomment-2638258089))
2025-02-05 23:24:20 +00:00
PyTorch MergeBot
49effa0deb Revert "[inductor] Refactor op handlers part 5 (#146257)"
This reverts commit d3dd3eeb7f.

Reverted https://github.com/pytorch/pytorch/pull/146257 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146257#issuecomment-2638251994))
2025-02-05 23:20:38 +00:00
PyTorch MergeBot
93e1e6e07c Revert "[inductor] Minor compile time optimizations in DefaultHandler (#146282)"
This reverts commit b8a529cca1.

Reverted https://github.com/pytorch/pytorch/pull/146282 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146282#issuecomment-2638239575))
2025-02-05 23:13:08 +00:00
PyTorch MergeBot
7dc5cfe2ad Revert "[inductor] Refactor CaptureIndexing into global scope (#146297)"
This reverts commit 7288950bcd.

Reverted https://github.com/pytorch/pytorch/pull/146297 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146297#issuecomment-2638234829))
2025-02-05 23:10:08 +00:00
PyTorch MergeBot
9555bfce88 Revert "[inductor] Pre-populate cache for simplify_with_ranges return value (#146373)"
This reverts commit 84ba9c6e78.

Reverted https://github.com/pytorch/pytorch/pull/146373 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146373#issuecomment-2638232033))
2025-02-05 23:07:08 +00:00
Yanan Cao (PyTorch)
8af31e30d7 [Codemod][AddExplicitStrictExportArg] caffe2/torch (#146439)
Differential Revision: D69068432

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146439
Approved by: https://github.com/avikchaudhuri
2025-02-05 22:56:54 +00:00
Catherine Lee
97b64f2e5c Fix workflow for closing nonexistent disable issues (#146447)
The workflow could not update issues because it didn't have permissions, and it looked green because it didn't check return codes.
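
The "looked green because it didn't check return codes" failure mode is generic, so here is a small Python illustration of it. It is not the workflow's actual script (which is not shown in this log); the failing command is a stand-in that simply exits with status 3.

```python
import subprocess
import sys

# Stand-in for a command that fails, e.g. an API call without permissions.
FAILING_CMD = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Unchecked: the non-zero exit status is recorded but nothing fails.
unchecked = subprocess.run(FAILING_CMD)
ignored_code = unchecked.returncode

# Checked: check=True turns a non-zero exit status into an exception,
# which would fail the surrounding step instead of leaving it green.
try:
    subprocess.run(FAILING_CMD, check=True)
    step_failed = False
except subprocess.CalledProcessError as err:
    step_failed = True
    failing_code = err.returncode
```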

Tested by running the workflow and seeing that issues did get closed
Fixes https://github.com/pytorch/pytorch/issues/145382
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146447
Approved by: https://github.com/huydhn
2025-02-05 22:29:05 +00:00
Howard Huang
9b6d680131 Remove stage_index_to_group_rank from schedule (#146217)
This PR allows schedules loaded via CSV to automatically set their `stage_index_to_group_rank` and removes the `stage_index_to_group_rank` argument from the `PipelineScheduleMulti` constructor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146217
Approved by: https://github.com/wconstab
ghstack dependencies: #146193
2025-02-05 21:26:45 +00:00
Howard Huang
4ee7d0de86 Add generate_stage_to_rank_mapping utility (#146193)
We use `stage_index_to_group_rank` in the stage to determine which send/recv ops to issue, and in the schedule for IR generation. However, we don't need to expose this as an argument in our schedule class, so this stack of PRs is to remove it.

This PR creates a `stage_index_to_group_rank` utility function and removes the arg for the ZBV schedule. In a following PR I will add code to infer the `stage_index_to_group_rank` for the CSV schedule path and we will be able to remove this argument from our classes entirely.
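
For readers unfamiliar with what such a mapping contains, the following is a rough sketch of how a stage-to-rank table could be derived from the pipeline size alone. The function name `sketch_stage_to_rank`, its parameters, and the two placement styles ("loop" for round-robin, "v" for a V-shaped placement as assumed here for ZBV-style schedules) are illustrative assumptions, not the utility added in this PR.

```python
from typing import Dict


def sketch_stage_to_rank(pp_size: int, num_stages: int, style: str = "loop") -> Dict[int, int]:
    """Hypothetical mapping from stage index to pipeline rank."""
    if num_stages % pp_size != 0:
        raise ValueError("num_stages must be a multiple of pp_size")
    mapping = {}
    for stage in range(num_stages):
        round_idx, pos = divmod(stage, pp_size)
        if style == "loop":
            # Round-robin: stage i lives on rank i % pp_size.
            mapping[stage] = pos
        elif style == "v":
            # V shape: every other round walks the ranks in reverse.
            mapping[stage] = pos if round_idx % 2 == 0 else pp_size - 1 - pos
        else:
            raise ValueError(f"unknown style: {style!r}")
    return mapping


loop_map = sketch_stage_to_rank(pp_size=2, num_stages=4, style="loop")
v_map = sketch_stage_to_rank(pp_size=2, num_stages=4, style="v")
```

Because the table depends only on the pipeline size, the stage count, and the placement style, it can be computed inside the schedule rather than passed in by the caller, which is the direction this stack of PRs describes.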

Related comment from @wconstab https://github.com/pytorch/torchtitan/issues/774#issuecomment-2619793741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146193
Approved by: https://github.com/wconstab
2025-02-05 21:26:45 +00:00
rzou
98b5d455fd [opcheck] Improve error reporting; allow atol/rtol overrides (#146488)
This PR improves opcheck to:
1. directly use torch.testing.assert_close (without a msg override).
   This allows it to print the absolute and relative differences and the
   number of mismatched elements.
2. take in an atol/rtol tolerance (for if someone just wants to use
   opcheck in their testing).
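
As background for the two points above, `torch.testing.assert_close` documents its closeness criterion as `|actual - expected| <= atol + rtol * |expected|`. The pure-Python sketch below applies that criterion to plain lists and collects the kind of statistics mentioned in point 1. The helper `mismatch_report` and its field names are hypothetical and are not part of opcheck or torch.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    # Same criterion that torch.testing.assert_close documents.
    return abs(actual - expected) <= atol + rtol * abs(expected)


def mismatch_report(actual, expected, rtol, atol):
    """Hypothetical summary of how far two equal-length sequences differ."""
    abs_diffs = [abs(a - e) for a, e in zip(actual, expected)]
    rel_diffs = [d / abs(e) for d, e in zip(abs_diffs, expected) if e != 0]
    mismatched = sum(
        1 for a, e in zip(actual, expected) if not is_close(a, e, rtol, atol)
    )
    return {
        "mismatched": mismatched,
        "max_abs_diff": max(abs_diffs, default=0.0),
        "max_rel_diff": max(rel_diffs, default=0.0),
    }


actual = [1.0, 2.0, 3.5]
expected = [1.0, 2.0, 3.0]
strict = mismatch_report(actual, expected, rtol=1e-3, atol=1e-3)
loose = mismatch_report(actual, expected, rtol=0.2, atol=1e-3)
```

Loosening `rtol` from `1e-3` to `0.2` makes the third element pass, which is the kind of override the second point enables.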

Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146488
Approved by: https://github.com/williamwen42
2025-02-05 21:25:06 +00:00
Justin Chu
1f6b566d74 [ONNX] Bump onnx and onnxscript versions in CI (#146097)
Bump onnx onnxscript==0.1 in CI; Skipped onnxruntime 1.19 because it has regression on avgpool.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146097
Approved by: https://github.com/malfet
2025-02-05 21:00:25 +00:00
Katarzyna Fojcik
9da376daa6 Add retain-output argument (#145921)
This PR adds a `retain-output` argument, which enables appending to the output file if it already exists instead of deleting it and creating a new one.
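
A minimal sketch of how an append-instead-of-overwrite flag can be wired up with `argparse`. Only the `retain-output` name comes from the commit message; the `--output` option, the helper functions, and the file contents are assumptions made for illustration, since the script this PR modifies is not shown here.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", required=True)  # hypothetical option
    parser.add_argument(
        "--retain-output",
        action="store_true",
        help="append to an existing output file instead of replacing it",
    )
    return parser


def write_result(args: argparse.Namespace, line: str) -> None:
    # "a" keeps existing contents; "w" starts from an empty file.
    mode = "a" if args.retain_output else "w"
    with open(args.output, mode) as f:
        f.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")
    parser = build_parser()
    write_result(parser.parse_args(["--output", path]), "first")
    write_result(parser.parse_args(["--output", path, "--retain-output"]), "second")
    with open(path) as f:
        lines = f.read().splitlines()
```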

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145921
Approved by: https://github.com/jansel
2025-02-05 19:45:09 +00:00
Raymond Li
dd349207c5 Add check that envvar configs are boolean (#145454)
So we don't get unexpected behavior when higher typed values are passed in
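
One way such a check can look, as a sketch only: the helper `read_bool_env`, the variable name `SKETCH_FLAG`, and the choice to accept exactly `"0"` and `"1"` are assumptions for illustration and are not taken from this PR.

```python
import os


def read_bool_env(name: str, default: bool = False) -> bool:
    """Hypothetical strict reader: only "0" and "1" are accepted."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    if raw == "1":
        return True
    if raw == "0":
        return False
    raise ValueError(f"{name} must be 0 or 1, got {raw!r}")


os.environ["SKETCH_FLAG"] = "1"
enabled = read_bool_env("SKETCH_FLAG")

# A non-boolean value is rejected instead of being silently treated as true.
os.environ["SKETCH_FLAG"] = "2"
try:
    read_bool_env("SKETCH_FLAG")
    rejected = False
except ValueError:
    rejected = True
```
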
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145454
Approved by: https://github.com/c00w, https://github.com/jamesjwu
2025-02-05 19:40:10 +00:00
Anant Gulati
9091096d6c Refactoring Distributed test cases to be device agnostic [1/n] (#145222)
In this series of PRs we intend to refactor the distributed test cases so that they are completely device agnostic.

These changes will include the following approaches:

- Allowing for multiple device types using instantiate_device_type_test
- Replacing calls to cuda stream with torch.get_device_module(device) wherever it applies
- Skipping set up steps required while using MultiProcessTestCase with DistributedTestBase (#138216) wherever applicable
- Replacing explicit calls to distributed backend (NCCL,HCCL,etc) with get_default_backend_for_device (#140536).

This should result in significant improvement in usability for all devices

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145222
Approved by: https://github.com/kwen2501
2025-02-05 18:47:09 +00:00
eqy
6f7fda3f49 Bump nn.functional.conv3d tolerances for test_comprehensive (#135719)
`float16` tolerance was previously set to `1e-5`, which seemed very low
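
Some arithmetic behind that remark: IEEE half precision stores 10 explicit mantissa bits, so the spacing between 1.0 and the next representable value is 2^-10, roughly 9.8e-4. The sketch below checks this with the standard library's half-precision `struct` format. It does not show the new tolerance values, which are not stated in this log.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Spacing between 1.0 and the next representable float16.
FLOAT16_EPS = 2.0 ** -10

# A perturbation of 1e-5 is lost entirely when stored as float16.
perturbed = to_float16(1.0 + 1e-5)

# The smallest step above 1.0 that float16 can actually represent.
next_after_one = to_float16(1.0 + FLOAT16_EPS)
```
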
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135719
Approved by: https://github.com/Chillee, https://github.com/albanD
2025-02-05 18:34:12 +00:00
Tugsbayasgalan Manlaibaatar
d2a2b9f8a7 Fix constants with non-functional operators (#145593)
Previously, in the non-strict path, we always errored when trying to inplace update a constant tensor because those constant tensors are not actually wrapped by functional tensors. This is correct behaviour in torch.compile, because dynamo makes all constant tensors into buffers and AOTDispatcher just lifts them and wraps them in functional tensors. However, in non-strict, there is no such step that registers constants as buffers, so AOTDispatcher panics when it sees these dangling constant tensors when functionalizing.

Due to recent change in the IR, this is no longer an issue in non-strict path because we don't call AOTDispatcher at training IR level, but now it is a problem for both strict and non-strict when we lower to inference. (lowering to inference is very similar to non-strict tracing) As a result, we have at least one external (https://github.com/pytorch/pytorch/issues/141336) and internal issues reported due to this difference.

To fix this, there are two ways:
1. Make functionalization be aware of constant tensors and map them to functional tensors on the fly. This makes functionalization invariant uglier and could potentially open up a gate for more nasty bugs.
2. Special-case this in export. This seems more aligned with what dynamo does today, so I think we should do it this way. The current state could benefit from more refactoring to make run_decompositions more similar to strict export (both of them now handle this constant-registering logic), but that is a bit complicated to do now because the strict export version of this logic is also incomplete (it doesn't take into account the export graph renaming pass, etc.). I will follow up with more refactors after this PR (T213466691) to unblock users faster.

For future reference:

Why are we not "turning constants into non-persistent buffers and never de-registering"? The reason is that some internal models rely on module.to to reliably move params/buffers to the correct device. As a result, buffers are moved while constants are not. In the composability meeting, we agreed that export won't do device agnostic tracing going forward (it will provide a way to specify FakeTensor in CPU that can be configured to be run on GPU), so after that is done, we can always turn constants into non-persistent buffers, which will simplify export's constant handling.

Differential Revision: [D68610739](https://our.internmc.facebook.com/intern/diff/D68610739)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145593
Approved by: https://github.com/avikchaudhuri
2025-02-05 17:44:19 +00:00
Jeff Daily
44248c44eb [ROCm] miopen benchmark behavior now better aligns with cudnn (#145294)
The default benchmark setting is now false. The new miopen behavior means when benchmarking is disabled, for any shape that doesn't have a find hit, then it will do a quick search (same behavior as the prior default), and use that result. Now when benchmark is enabled, it will perform an exhaustive search and update any DBs. miopen immediate mode is still available and is used when deterministic is true and benchmark is false.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145294
Approved by: https://github.com/BrianHarrisonAMD, https://github.com/malfet
2025-02-05 17:19:53 +00:00
PyTorch MergeBot
f27220e32a Revert "Move get accelerator to use build time flags when possible (#146098)"
This reverts commit 157d81c201.

Reverted https://github.com/pytorch/pytorch/pull/146098 on behalf of https://github.com/atalman due to Failing internally, sorry need to revert ([comment](https://github.com/pytorch/pytorch/pull/146098#issuecomment-2637443675))
2025-02-05 16:39:37 +00:00
Jason Ansel
f55c0af37f [inductor] Support non-power-of-2 cooperative RSPLIT (#145689)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145689
Approved by: https://github.com/eellison
2025-02-05 16:36:53 +00:00
maajidkhann
db22e9d5a2 Implement blend operation for float, double, int in VEC ATen backend for SVE (#146479)
- Added support for the SVE vectorized blend operation for float, double, int8_t, int16_t, int32_t and int64_t data types.
- Uses SVE ACLE intrinsics (svcntb, svcntw, svcmpne, svsel) to handle different vector lengths (VL) dynamically.
- Ensured compatibility with SVE128, SVE256, and SVE512 hardware configurations.
- Re-enabled the blend SVE vec tests.
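
For orientation, a blend selects each output lane from one of two input vectors according to a bitmask. The pure-Python sketch below models that on lists, assuming the convention that bit `i` of the mask set means lane `i` comes from `b`, and otherwise from `a`. The helper names are hypothetical; the actual change uses SVE intrinsics in C++, which are not reproduced here.

```python
from typing import List, Sequence


def blend(a: Sequence[float], b: Sequence[float], mask: int) -> List[float]:
    """Lane-wise select: take b[i] when bit i of mask is set, else a[i]."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of lanes")
    return [b[i] if (mask >> i) & 1 else a[i] for i in range(len(a))]


def lanes_for(vector_bits: int, element_bits: int) -> int:
    """Number of lanes a vector register holds for a given element width."""
    return vector_bits // element_bits


a = [0.0, 0.0, 0.0, 0.0]
b = [1.0, 2.0, 3.0, 4.0]
blended = blend(a, b, mask=0b0101)

# Lane counts differ per hardware configuration, which is why the
# vector length has to be handled dynamically.
float_lanes = {bits: lanes_for(bits, 32) for bits in (128, 256, 512)}
```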

**Testing:**
**a) Float DType:**
./vec_test_all_types_SVE256 --gtest_filter=BitwiseFloatsAdditional2/0.Blend    [Test Passed] on Graviton 3 machine (SVE256)
./vec_test_all_types_SVE128 --gtest_filter=BitwiseFloatsAdditional2/0.Blend    [Test Passed] on Graviton 4 machine (SVE128)

**b) Double DType:**
./vec_test_all_types_SVE256 --gtest_filter=BitwiseFloatsAdditional2/1.Blend    [Test Passed] on Graviton 3 machine (SVE256)
./vec_test_all_types_SVE128 --gtest_filter=BitwiseFloatsAdditional2/1.Blend    [Test Passed] on Graviton 4 machine (SVE128)

**c) Int DType:**
python3 test/inductor/test_cpu_repro.py CPUReproTests.test_vec_remainder
[Test Passed] on Graviton 3 machine (SVE256) and on Graviton 4 machine (SVE128)
<img width="661" alt="grv4_test_case_passed" src="https://github.com/user-attachments/assets/5572fcc0-a861-4bd6-bf9e-356219ffe656" />

Fixes https://github.com/pytorch/pytorch/issues/146309

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146479
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-02-05 16:29:13 +00:00
Zhengxu Chen
cd6c0707a8 [aoti] Assign proxy call args by name, and support default values. (#146263)
Fixes the following issue, which occurs when compiling this program:
```
                window = torch.hann_window(N_FFT).to(x.device)
                stft = torch.stft(
                    x, N_FFT, HOP_LENGTH, window=window, return_complex=True
                )
                magnitudes = stft[..., :-1].abs() ** 2
                return magnitudes
```
```
Traceback (most recent call last):
  File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 57, in testPartExecutor
    yield
  File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 623, in run
    self._callTestMethod(testMethod)
  File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/home/zhxchen17/pytorch/torch/testing/_internal/common_utils.py", line 3120, in wrapper
    method(*args, **kwargs)
  File "/home/zhxchen17/pytorch/test/inductor/test_torchinductor.py", line 12356, in new_test
    return value(self)
           ^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor.py", line 4334, in test_stft
    self.check_model(model, example_inputs)
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 185, in check_model
    actual = AOTIRunnerUtil.run(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 137, in run
    optimized = AOTIRunnerUtil.load(device, so_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 119, in load
    return torch._export.aot_load(so_path, device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/torch/_export/__init__.py", line 165, in aot_load
    runner = torch._C._aoti.AOTIModelContainerRunnerCuda(so_path, 1, device)  # type: ignore[assignment, call-arg]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected extern kernel aten::hann_window to have serialized argument type as_scalar_type for argument 1 but got as_device
```
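
The error above is the kind that appears when call arguments are matched to a schema by position while some optional parameters were omitted. The sketch below contrasts positional binding with binding by name plus defaults, using a hypothetical, simplified schema loosely modeled on the error message. It illustrates the idea in the PR title only and is not the AOTInductor serialization code.

```python
_REQUIRED = object()

# Hypothetical schema: (parameter name, default) in declaration order.
SCHEMA = [
    ("window_length", _REQUIRED),
    ("dtype", None),
    ("layout", None),
    ("device", None),
    ("pin_memory", None),
]


def bind_positionally(values):
    # Fragile: omitted optional parameters shift later values into the wrong slot.
    return {name: value for (name, _), value in zip(SCHEMA, values)}


def bind_by_name(named):
    unknown = set(named) - {name for name, _ in SCHEMA}
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    bound = {}
    for name, default in SCHEMA:
        if name in named:
            bound[name] = named[name]
        elif default is _REQUIRED:
            raise TypeError(f"missing required argument {name!r}")
        else:
            bound[name] = default  # fall back to the schema default
    return bound


wrong = bind_positionally([400, "cuda"])
right = bind_by_name({"window_length": 400, "device": "cuda"})
```

In the positional case the device value lands in the `dtype` slot, mirroring the "expected as_scalar_type but got as_device" message; binding by name keeps it in `device` and fills the skipped parameters from their defaults.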

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146263
Approved by: https://github.com/angelayi
2025-02-05 15:43:05 +00:00
rzou
1bb977a2a4 [auto_functionalized] Support Tensor(a!)[]? (#145400)
Summary:
This is just updating some of the checks to allow the Tensor(a!)[]? type
through.

Fixes #144072

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145400
Approved by: https://github.com/laithsakka
2025-02-05 14:52:39 +00:00
PyTorch MergeBot
282d185ec1 Revert "[inductor] use ftz variant of exp (#146216)"
This reverts commit b0b3fe8bcf.

Reverted https://github.com/pytorch/pytorch/pull/146216 on behalf of https://github.com/atalman due to inductor/test_op_completeness.py::TestOpCompleteness::test_triton_overrides [GH job link](https://github.com/pytorch/pytorch/actions/runs/13152430750/job/36702812599) [HUD commit link](b0b3fe8bcf) ([comment](https://github.com/pytorch/pytorch/pull/146216#issuecomment-2636961317))
2025-02-05 14:13:45 +00:00
Davide Italiano
8a2000fd42 [MPS] Implement support for zeta (both eager and inductor). (#146465)
A test was failing in inductor (`test_pointwise_zeta`) -- and I realized the operation was missing also from eager.
Implemented for both, leveraging the kernel. Happy to split in two (one PR for eager, one for inductor) if folks prefer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146465
Approved by: https://github.com/malfet
2025-02-05 13:55:50 +00:00
Nichols A. Romero
fd0cd6a08f [ROCm][TunableOp] Improve identification of fastest solution (#144942)
This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.

Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics (`PYTORCH_TUNABLEOP_VERBOSE=3`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144942
Approved by: https://github.com/jeffdaily
2025-02-05 11:16:49 +00:00
Simon Fan
e20b0c82d1 [ca] no longer require is_traceable annotations for c++ autograd functions (#146229)
This PR removes the CA compile-time error for C++ autograd functions, and supports them by having dynamo graph break on them (instead of allow_in_graph). The CppNode's collects are kept as is for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146229
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-02-05 08:49:17 +00:00
cyy
6293d1446b [2/N] Remove NOLINT suppressions (#146402)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146402
Approved by: https://github.com/soulitzer
2025-02-05 08:38:52 +00:00
bobrenjc93
e5ea7e9cdc add support for capturing provenance of unary operations (#146413)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146413
Approved by: https://github.com/angelayi
ghstack dependencies: #145848
2025-02-05 08:31:38 +00:00