Commit graph

83996 commits

Author SHA1 Message Date
Justin Chu
9756c7d788 [benchmark] Remove ONNX (#146325)
ONNX exporter experiments in benchmark are obsolete and unmaintained. This PR removes them to unblock https://github.com/pytorch/pytorch/pull/146003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146325
Approved by: https://github.com/titaiwangms
2025-02-04 04:02:47 +00:00
Doru Bercea
a79d8f8ba4 [ROCm] Tune 3d tensor sums when not using fastest dimension (#146170)
Tune 3d tensor sums when not using fastest dimension.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146170
Approved by: https://github.com/jeffdaily
2025-02-04 04:02:16 +00:00
David Berard
7997ecf809 [BE] reduce log spew from test_triton_kernels.py (#145895)
One of the tests in this file was setting `self._logging.set_logs(output_code=True)` - which would cause logs to be printed for the rest of the tests in this file.

This PR puts the log-setting in a context manager so that the old behavior is restored afterwards.
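The pattern described above (set a logging option only for the duration of one test, then restore the previous state) can be sketched with the standard library. This is a minimal illustration, not the code from the PR: `temporary_log_level` and the logger name `example.output_code` are hypothetical, and plain `logging` stands in for the `set_logs` call mentioned in the message.

```python
import contextlib
import logging


@contextlib.contextmanager
def temporary_log_level(logger_name, level):
    """Set a logger's level inside the block and restore the old level after."""
    logger = logging.getLogger(logger_name)  # hypothetical logger name
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Runs even if the test body raises, so later tests see the old level.
        logger.setLevel(previous)


log = logging.getLogger("example.output_code")
log.setLevel(logging.WARNING)

with temporary_log_level("example.output_code", logging.DEBUG):
    inside_level = log.level  # verbose only inside the block

after_level = log.level  # restored once the block exits
```

Putting the restore in a `finally` clause is what keeps one test's verbose logging from leaking into the rest of the file.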

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145895
Approved by: https://github.com/nmacchioni
2025-02-04 03:44:23 +00:00
Animesh Jain
5f53889850 [dynamo][builtin-skipfiles-cleanup] Remove inspect (#146116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146116
Approved by: https://github.com/williamwen42, https://github.com/zou3519, https://github.com/jansel
ghstack dependencies: #146322
2025-02-04 03:36:07 +00:00
Ke Wen
762a05b3b3 [DCP] Remove all-gather of state dict keys (#145998)
The original `_all_gather_keys` call was for a safety check, but could be costly as things scale, and it blocks CPU.

Instead, we make it clear in the documentation that the `state_dict` passed to the `load` API should have the same set of keys, otherwise the API may hang.

In addition, we move the check to a utility function: `utils.assert_same_keys`. Users unsure whether their state dicts share the same keys can optionally call this API to check.
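As a rough picture of what such a key check amounts to, the sketch below compares key sets that are assumed to have already been collected from each rank. It is not the implementation of `utils.assert_same_keys`: the function name `assert_same_keys_sketch` and the `keys_per_rank` argument are hypothetical, and no collective communication is performed.

```python
def assert_same_keys_sketch(keys_per_rank):
    """Raise if any rank's key set differs from rank 0's.

    `keys_per_rank` is a hypothetical stand-in for the keys gathered from
    every rank; this sketch performs no communication itself.
    """
    reference = set(keys_per_rank[0])
    for rank, keys in enumerate(keys_per_rank[1:], start=1):
        difference = reference.symmetric_difference(keys)
        if difference:
            raise AssertionError(
                f"rank {rank} differs from rank 0 on keys: {sorted(difference)}"
            )


# Same keys in a different order: passes silently.
assert_same_keys_sketch([["weight", "bias"], ["bias", "weight"]])

# A missing key on rank 1 is reported.
mismatch_detected = False
try:
    assert_same_keys_sketch([["weight", "bias"], ["weight"]])
except AssertionError:
    mismatch_detected = True
```

Comparing sets rather than lists ignores key ordering, which matches the message's wording of "same set of keys".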

Resolves #145965 (as a workaround).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145998
Approved by: https://github.com/mhorowitz, https://github.com/fegin
2025-02-04 03:16:13 +00:00
PyTorch MergeBot
7f796eb8b7 Revert "[inductor] Add typing to common.KernelArgs (#145916)"
This reverts commit 68cf36d5ab.

Reverted https://github.com/pytorch/pytorch/pull/145916 on behalf of https://github.com/atalman due to Failing internally, please see associated diff ([comment](https://github.com/pytorch/pytorch/pull/145916#issuecomment-2632715678))
2025-02-04 03:07:12 +00:00
PyTorch MergeBot
d3c7e4bb9c Revert "[inductor] Add typing to common.CSE (#145993)"
This reverts commit 8c657ae4be.

Reverted https://github.com/pytorch/pytorch/pull/145993 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/145993#issuecomment-2632712384))
2025-02-04 03:04:01 +00:00
PyTorch MergeBot
ecbc725fad Revert "[inductor] Finish typing common.py (#146225)"
This reverts commit 3a67c0e48d.

Reverted https://github.com/pytorch/pytorch/pull/146225 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/146225#issuecomment-2632709707))
2025-02-04 03:01:36 +00:00
PyTorch MergeBot
0061eb5b70 Revert "[inductor] Refactor CSEProxy into global scope (#146226)"
This reverts commit 18380ab877.

Reverted https://github.com/pytorch/pytorch/pull/146226 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/146226#issuecomment-2632707618))
2025-02-04 02:58:50 +00:00
cyy
f397c72697 Remove NOLINTNEXTLINE (#146238)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146238
Approved by: https://github.com/albanD
2025-02-04 02:45:32 +00:00
Nikita Shulga
5451c9b7c9 [MPSInductor] Add support for any reduction (#146370)
- Add `_new_accvar` function that creates a threadgroup variable
- As threadgroup variables cannot be initialized in place, add explicit initialization for the reduction var

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146370
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: #146369
2025-02-04 02:45:03 +00:00
Nikita Shulga
71179772cd [MPSInductor] Prep change for reduction support (#146369)
Add `group_pos` parameter and set `group_size` when invoking reduction kernels
Separate loads and stores and insert a threadgroup barrier if the reduction is in place

Should be a no-op right now

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146369
Approved by: https://github.com/dcci, https://github.com/jansel
2025-02-04 02:38:07 +00:00
Henry Tsang
3dcbd04d1d [cutlass backend] Add instantiation level for generating configs (#146230)
Passing through instantiation level to generate more configs.

I do see some C++ compilation errors, but running is fine. Using 2222 generates 1k+ configs.

Differential Revision: D68989194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146230
Approved by: https://github.com/Chillee, https://github.com/mlazos
2025-02-04 02:36:04 +00:00
bobrenjc93
0e49f35e3d Integrate sympy expression provenance logging with structured logs (#145848)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145848
Approved by: https://github.com/angelayi
2025-02-04 01:21:37 +00:00
Aaron Orenstein
4168982dad PEP585: .github release triggers (#145708)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145708
Approved by: https://github.com/malfet
2025-02-04 01:02:46 +00:00
Davide Italiano
cf6c5b8fa8 [mps/inductor] Adjust more tests that expect float64 as input. (#146366)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146366
Approved by: https://github.com/malfet
2025-02-04 00:48:02 +00:00
PyTorch MergeBot
2f40f789da Revert "[inductor] Refactor op handlers part 1 (#146235)"
This reverts commit 204be4e0a2.

Reverted https://github.com/pytorch/pytorch/pull/146235 on behalf of https://github.com/atalman due to Breaks lint, sorry: Definition of polygamma in base class MetalOverrides is incompatible with definition in base class OpsHandler. Please rebase fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/146235#issuecomment-2632444514))
2025-02-04 00:00:08 +00:00
Stas Bekman
3aeccf2a28 DeepSpeed github repo move sync (#146320)
DeepSpeed has moved to a new repo on github https://github.com/deepspeedai/DeepSpeed

This PR updates this repo to use the new URL.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146320
Approved by: https://github.com/awgu
2025-02-03 23:20:49 +00:00
Jason Ansel
204be4e0a2 [inductor] Refactor op handlers part 1 (#146235)
This enforces the invariant that every backend implements the same set of ops and removes a layer of indirection for BasicMathOps.

Interestingly this is a small compile time win:
```
...
WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 30151159301 is -6.13% lower than expected 32120000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 44447549162 -1.69% is within expected 45210000000 ±2.50%

WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26743557195 is -2.25% lower than expected 27360000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
PASS: benchmark ('basic_modules_ListOfLinears_eager', 'compile_time_instruction_count') pass, actual result 945129734 +0.93% is within expected 936400000 ±1.50%

WIN: benchmark ('basic_modules_ListOfLinears_inductor', 'compile_time_instruction_count') failed, actual result 18984384503 is -3.19% lower than expected 19610000000 ±1.50% please update the expected results.

please update all results that changed significantly, and not only the failed ones
WIN: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed, actual result 17258025389 is -1.94% lower than expected 17600000000 ±1.50% please update the expected results.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146235
Approved by: https://github.com/shunting314
ghstack dependencies: #146225, #146226
2025-02-03 23:15:13 +00:00
Jason Ansel
18380ab877 [inductor] Refactor CSEProxy into global scope (#146226)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146226
Approved by: https://github.com/shunting314
ghstack dependencies: #146225
2025-02-03 23:15:13 +00:00
Natalia Gimelshein
0bc036a9e9 use copy2d in h2d/d2h copy when possible (#146256)
A rewrite of #138964
In addition to rewriting the conditions for using copy2d, this PR fixes a few other problems with #138964:
1) gpu-gpu copies when peer access is disabled shouldn't rely on copy2d
2) copy2d should record even for the host pinned memory, like the regular copy does
3) copy2d shouldn't pretend that it's synchronizing (for the purposes of cuda sanitizer tracer) when it's non-blocking

In this PR copy2d behaves in exactly the same way as copy does with respect to those additional syncs, except it calls a different underlying cuda call.

Tests for multiple cases going through copy2d and avoiding copy2d pattern due to unsatisfied conditions are added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146256
Approved by: https://github.com/eqy, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-02-03 23:07:54 +00:00
Henry Tsang
35af193408 [easy] Add type annotation for autotune_num_choices_displayed (#146323)
Test Plan: ci

Differential Revision: D69064447

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146323
Approved by: https://github.com/ColinPeppler
2025-02-03 23:04:21 +00:00
Davide Italiano
0463cb6ca5 [mps/inductor] Add support for digamma(). (#146292)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146292
Approved by: https://github.com/malfet, https://github.com/jansel
2025-02-03 22:48:13 +00:00
titaiwangms
178531c95e [ONNX] torch.onnx.export(dynamo=True) changes optimization to default (#146187)
Fixes #145897
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146187
Approved by: https://github.com/justinchuby
2025-02-03 22:44:54 +00:00
bobrenjc93
d69c181d77 log out partial fx graph when guard on data dependent during non-strict tracing (#146298)
As discussed with @avikchaudhuri and @bdhirsh last week, this can be quite useful when debugging.

The following code produces a data dependent error

```
import torch
from torch import nn

# UserError: Could not guard on data-dependent expression Eq(507 - u0, 0) (unhinted: Eq(507 - u0, 0)).  (Size-like symbols: u0)
class Repro(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, cache, update, pos):
        _, _, max_seq_len, _ = cache.shape
        _, _, seqlen, _ = update.shape

        pos_item = pos[0].item() # u0
        torch._check(pos_item + seqlen <= max_seq_len) # u0 + 502 <= 507
        torch._check(pos_item >= 0)
        before = cache.narrow(2, 0, pos_item)

        # FAIL
        # Laith: why can't we make unbacked expressions size-like?
        after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen))

        # PASS
        end = torch.tensor(max_seq_len - pos_item - seqlen).item()
        after = cache.narrow(2, (pos_item + seqlen), end)

        return torch.cat([before, update, after], dim=2)

repro = Repro()

bsz = 1
n_heads = 4
max_seq_len = 512
head_dim = 64
seqlen = 5
pos_item = 1

cache = torch.zeros(bsz, n_heads, max_seq_len, head_dim)
update = torch.ones(bsz, n_heads, seqlen, head_dim)
pos = torch.tensor([pos_item])
example_inputs = (cache, update, pos)

torch.export.export(repro, example_inputs, strict=False)
```

This is what it now prints out

```
class GraphModule(torch.nn.Module):
    def forward(self, arg0_1: "f32[1, 4, 512, 64][131072, 32768, 64, 1]cpu", arg1_1: "f32[1, 4, 5, 64][1280, 320, 64, 1]cpu", arg2_1: "i64[1][1]cpu"):
         # File: /data/users/bobren/a/pytorch/r1.py:14 in forward, code: pos_item = pos[0].item() # u0
        select: "i64[][]cpu" = torch.ops.aten.select.int(arg2_1, 0, 0);  arg2_1 = None
        item: "Sym(u0)" = torch.ops.aten.item.default(select);  select = None

         # File: /data/users/bobren/a/pytorch/r1.py:15 in forward, code: torch._check(pos_item + seqlen <= max_seq_len) # u0 + 502 <= 507
        add: "Sym(u0 + 5)" = item + 5
        le: "Sym(u0 + 5 <= 512)" = add <= 512;  add = le = None

         # File: /data/users/bobren/a/pytorch/r1.py:16 in forward, code: torch._check(pos_item >= 0)
        ge: "Sym(u0 >= 0)" = item >= 0;  ge = None

         # File: /data/users/bobren/a/pytorch/r1.py:17 in forward, code: before = cache.narrow(2, 0, pos_item)
        narrow: "f32[1, 4, u0, 64][131072, 32768, 64, 1]cpu" = torch.ops.aten.narrow.default(arg0_1, 2, 0, item);  narrow = None

         # File: /data/users/bobren/a/pytorch/r1.py:21 in forward, code: after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen))
        add_1: "Sym(u0 + 5)" = item + 5
        sub: "Sym(512 - u0)" = 512 - item;  item = None
        sub_1: "Sym(507 - u0)" = sub - 5;  sub = None
        narrow_1 = torch.ops.aten.narrow.default(arg0_1, 2, add_1, sub_1);  arg0_1 = add_1 = sub_1 = narrow_1 = None

Traceback (most recent call last):
  File "/data/users/bobren/a/pytorch/r1.py", line 45, in <module>
    torch.export.export(repro, example_inputs, strict=False)
  File "/data/users/bobren/a/pytorch/torch/export/__init__.py", line 368, in export
    return _export(
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1044, in wrapper
    raise e
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1017, in wrapper
    ep = fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/export/exported_program.py", line 117, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 2079, in _export
    return _export_for_training(
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1044, in wrapper
    raise e
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1017, in wrapper
    ep = fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/export/exported_program.py", line 117, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1944, in _export_for_training
    export_artifact = export_func(  # type: ignore[operator]
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1879, in _non_strict_export
    aten_export_artifact = _to_aten_func(  # type: ignore[operator]
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1665, in _export_to_aten_ir_make_fx
    gm, graph_signature = transform(_make_fx_helper)(
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1809, in _aot_export_non_strict
    gm, sig = aot_export(wrapped_mod, args, kwargs=kwargs, **flags)
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1585, in _make_fx_helper
    gm = make_fx(
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 2194, in wrapped
    return make_fx_tracer.trace(f, *args)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 2132, in trace
    return self._trace_inner(f, *args)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 2103, in _trace_inner
    t = dispatch_trace(
  File "/data/users/bobren/a/pytorch/torch/_compile.py", line 51, in inner
    return disable_fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_dynamo/eval_frame.py", line 749, in _fn
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1136, in dispatch_trace
    graph = tracer.trace(root, concrete_args)  # type: ignore[arg-type]
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1692, in trace
    res = super().trace(root, concrete_args)
  File "/data/users/bobren/a/pytorch/torch/fx/_symbolic_trace.py", line 834, in trace
    (self.create_arg(fn(*args)),),
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1191, in wrapped
    out = f(*tensors)  # type:ignore[call-arg]
  File "<string>", line 1, in <lambda>
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1488, in wrapped_fn
    return tuple(flat_fn(*args))
  File "/data/users/bobren/a/pytorch/torch/_functorch/_aot_autograd/utils.py", line 184, in flat_fn
    tree_out = fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 879, in functional_call
    out = mod(*args[params_len:], **kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/_symbolic_trace.py", line 811, in module_call_wrapper
    return self.call_module(mod, forward, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1762, in call_module
    return Tracer.call_module(self, m, forward, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/_symbolic_trace.py", line 529, in call_module
    ret_val = forward(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/_symbolic_trace.py", line 804, in forward
    return _orig_module_call(mod, *args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/export/_trace.py", line 1793, in forward
    tree_out = mod(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/_symbolic_trace.py", line 811, in module_call_wrapper
    return self.call_module(mod, forward, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1762, in call_module
    return Tracer.call_module(self, m, forward, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/_symbolic_trace.py", line 529, in call_module
    ret_val = forward(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/_symbolic_trace.py", line 804, in forward
    return _orig_module_call(mod, *args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/nn/modules/module.py", line 1749, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/nn/modules/module.py", line 1760, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/r1.py", line 21, in forward
    after = cache.narrow(2, (pos_item + seqlen), (max_seq_len - pos_item - seqlen))
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1239, in __torch_function__
    return func(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1286, in __torch_function__
    return func(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_export/non_strict_utils.py", line 654, in __torch_function__
    return func(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_ops.py", line 866, in handler
    return torch._library.utils.handle_dispatch_mode(
  File "/data/users/bobren/a/pytorch/torch/_library/utils.py", line 296, in handle_dispatch_mode
    return curr_mode.__torch_dispatch__(op_overload, overload_types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/utils/_stats.py", line 27, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 1341, in __torch_dispatch__
    return proxy_call(self, func, self.pre_dispatch, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/proxy_tensor.py", line 910, in proxy_call
    out = func(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_ops.py", line 749, in __call__
    return self._op(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/utils/_stats.py", line 27, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1267, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1808, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1369, in _cached_dispatch_impl
    output = self._dispatch_impl(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 2282, in _dispatch_impl
    decomposition_table[func](*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_decomp/decompositions.py", line 759, in slice_forward
    return self.as_strided(sizes, strides, storage_offset)
  File "/data/users/bobren/a/pytorch/torch/utils/_stats.py", line 27, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1267, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1808, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1370, in _cached_dispatch_impl
    entry = self._make_cache_entry(state, key, func, args, kwargs, output)
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1640, in _make_cache_entry
    output_info = self._get_output_info_for_cache_entry(
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1583, in _get_output_info_for_cache_entry
    synth_output = self._output_from_cache_entry(
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1738, in _output_from_cache_entry
    return self._get_output_tensor_from_cache_entry(
  File "/data/users/bobren/a/pytorch/torch/_subclasses/fake_tensor.py", line 1709, in _get_output_tensor_from_cache_entry
    empty.set_(storage, storage_offset, shape, stride)
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/sym_node.py", line 564, in guard_size_oblivious
    r = self.shape_env.evaluate_expr(
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/recording.py", line 263, in wrapper
    return retlog(fn(*args, **kwargs))
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6468, in evaluate_expr
    return self._evaluate_expr(
  File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 6658, in _evaluate_expr
    raise self._make_data_dependent_error(
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Ne(507 - u0, 1) (unhinted: Ne(507 - u0, 1)).  (Size-like symbols: u0)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146298
Approved by: https://github.com/bdhirsh
2025-02-03 22:16:03 +00:00
Animesh Jain
0da07a6d1d [dynamo][skip-function] Add missing unimplemented line (#146322)
This is a missing line from the merged PR in the stack below. Let's try to get this in quickly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146322
Approved by: https://github.com/StrongerXi, https://github.com/jansel, https://github.com/mlazos
2025-02-03 22:11:55 +00:00
PyTorch MergeBot
00dc5b10f6 Revert "[Environment Variable][7/N] Use thread-safe getenv functions (#140211)"
This reverts commit 2fd1b6b361.

Reverted https://github.com/pytorch/pytorch/pull/140211 on behalf of https://github.com/atalman due to Breaks executorch tests ([comment](https://github.com/pytorch/pytorch/pull/140211#issuecomment-2632202864))
2025-02-03 22:04:28 +00:00
Yanbo Liang
15e12d5ec3 [Trace PyDispatcher] Support temporarily_pop_interpreter_stack ctx manager (#146271)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146271
Approved by: https://github.com/zou3519
ghstack dependencies: #146270
2025-02-03 21:47:54 +00:00
Yanbo Liang
bd8d7b1b74 [Dynamo][Trace PyDispatcher] Remove disable from HigherOrderOperator.__call__ (#146270)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146270
Approved by: https://github.com/zou3519
2025-02-03 21:47:54 +00:00
Yang Wang
fd73ae2068 [Utilization] Convert timestamp to str for datetime64 (#145985)
Convert all float timestamps to int timestamps in the data pipeline for the db type datetime64.
Floats do not work when inserting into ClickHouse using jsonExtract.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145985
Approved by: https://github.com/huydhn
2025-02-03 21:05:18 +00:00
Simon Fan
1d4adf4e1f [dynamo] log recompile reason to dynamo_compile (#146117)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146117
Approved by: https://github.com/bobrenjc93
2025-02-03 21:04:04 +00:00
Zhengxu Chen
11f69808c6 [aoti] Assign proxy call args by name, and support default values. (#146263)
Fixing the following issue when compiling the following program:
```
                window = torch.hann_window(N_FFT).to(x.device)
                stft = torch.stft(
                    x, N_FFT, HOP_LENGTH, window=window, return_complex=True
                )
                magnitudes = stft[..., :-1].abs() ** 2
                return magnitudes
```
```
Traceback (most recent call last):
  File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 57, in testPartExecutor
    yield
  File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 623, in run
    self._callTestMethod(testMethod)
  File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/home/zhxchen17/pytorch/torch/testing/_internal/common_utils.py", line 3120, in wrapper
    method(*args, **kwargs)
  File "/home/zhxchen17/pytorch/test/inductor/test_torchinductor.py", line 12356, in new_test
    return value(self)
           ^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor.py", line 4334, in test_stft
    self.check_model(model, example_inputs)
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 185, in check_model
    actual = AOTIRunnerUtil.run(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 137, in run
    optimized = AOTIRunnerUtil.load(device, so_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 119, in load
    return torch._export.aot_load(so_path, device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhxchen17/pytorch/torch/_export/__init__.py", line 165, in aot_load
    runner = torch._C._aoti.AOTIModelContainerRunnerCuda(so_path, 1, device)  # type: ignore[assignment, call-arg]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected extern kernel aten::hann_window to have serialized argument type as_scalar_type for argument 1 but got as_device
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146263
Approved by: https://github.com/angelayi
2025-02-03 20:15:59 +00:00
Henry Tsang
e67ce67498 [cutlass backend] update try_import_cutlass to accommodate pip install (#145891)
The goal of this PR is to provide 3 ways for people to try out CUTLASS backend:
1. fbcode / internal
2. pip install torch (nightly) and pip install nvidia-cutlass
3. build from source

I will go into more detailed combos between building from source and downloading via pip for torch and cutlass.

repro:
```
import torch
import torch.nn as nn

import torch._inductor.config as config

config.force_disable_caches = True
config.max_autotune = True
config.max_autotune_gemm_backends = "CUTLASS"
# the following is only needed if you use a custom cutlass library
# config.cuda.cutlass_dir = "/data/users/henrylhtsang/cutlass"

class TestModule(nn.Module):
    def forward(self, A, B):
        return A @ B

model = TestModule().cuda()
M, K, N = 2048, 2048, 2048
A = torch.randn(M, K).cuda().half()
B = torch.randn(K, N).cuda().half()

C = torch.compile(model, fullgraph=True)(A, B)
```

## pre-requisite
This assumes you have the right CUDA toolkit; 12.4 is recommended. Make sure PATH, LD_LIBRARY_PATH and CUDA_NVCC_EXECUTABLE are set correctly.

## combo 1: pip install torch + pip install nvidia-cutlass
Check https://pytorch.org/get-started/locally/ for **nightly** install command.
```
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
pip install nvidia-cutlass
```
Then try running the script above. It should work.

## combo 2: build torch from source + pip install nvidia-cutlass
This is going to be pretty straightforward. Just keep in mind that even though pytorch/third_party/cutlass exists, the one that will be used is the pip package, so be mindful of version differences.

## combo 3: build torch from source + use pytorch/third_party/cutlass
This is how most pytorch devs would do it. Just make sure you don't have a cutlass pip package installed, i.e., make sure `import cutlass_library` would fail on its own.
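One way to check that condition without actually importing anything is to ask the import system whether the module can be found. This is a small sketch, not part of the PR; the helper name `pip_package_present` is hypothetical, and `importlib.util.find_spec` is the standard-library call doing the work.

```python
import importlib.util


def pip_package_present(module_name):
    """Return True if `module_name` is importable in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# For combo 3 this is expected to report False, i.e. no pip-installed copy.
cutlass_from_pip = pip_package_present("cutlass_library")
print(f"cutlass_library importable on its own: {cutlass_from_pip}")
```

If this reports `True`, a pip-installed copy is visible to Python and, per combo 2, it is the pip package that will be used.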

## combo 4: any torch version + cutlass library from somewhere else
This is probably the only case where you need to pass in cutlass_dir. Just set cutlass_dir to the cutlass repo library. The expectation is that cutlass_dir is the directory that contains include, tool, and python/cutlass_library.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145891
Approved by: https://github.com/Chillee, https://github.com/ColinPeppler
2025-02-03 20:05:41 +00:00
Isalia20
f237172768 Fix not inlining functions used in metal files (#146316)
Fixes issue when building PyTorch with Xcode installed after https://github.com/pytorch/pytorch/pull/146231
```
FAILED: caffe2/aten/src/ATen/kernels_basic.metallib /Users/Irakli_Salia/Desktop/pytorch/build/caffe2/aten/src/ATen/kernels_basic.metallib
cd /Users/Irakli_Salia/Desktop/pytorch/build/caffe2/aten/src/ATen && xcrun metallib -o kernels_basic.metallib BinaryKernel_30.air Bucketization_30.air CrossKernel_30.air FusedOptimizerOps_30.air Gamma_30.air HistogramKernel_30.air Im2Col_30.air Indexing_30.air LinearAlgebra_30.air Quantized_30.air RMSNorm_30.air RenormKernel_30.air Repeat_30.air SpecialOps_30.air TriangularOps_30.air UnaryKernel_30.air UnfoldBackward_30.air UpSample_30.air
LLVM ERROR: multiple symbols ('_ZN3c105metal4zetaEff')!
[3835/5420] Building CXX object c10/test/CMakeFiles/c10_small_vector_test.dir/util/small_vector_test.cpp.o
ninja: build stopped: subcommand failed.
```

Action item for @malfet: Add a linter that ensures that `c10/metal/` headers do not have any functions there, only templates
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146316
Approved by: https://github.com/malfet, https://github.com/atalman
2025-02-03 19:33:52 +00:00
Yidi Wu
674e0b668a Add non-strict export while_loop test back (#146195)
This is fixed by https://github.com/pytorch/pytorch/pull/145762

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146195
Approved by: https://github.com/zou3519
ghstack dependencies: #146194
2025-02-03 19:28:22 +00:00
Yidi Wu
1138d0c4f6 [hop] enable while_loop return torch.ones with unbacked symbol expression. (#146194)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146194
Approved by: https://github.com/zou3519
2025-02-03 19:28:22 +00:00
Animesh Jain
57b1fc35f6 [dynamo] Disable compiling on elementwise_type_promotion_wrapper (#146219)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146219
Approved by: https://github.com/zou3519
ghstack dependencies: #146075, #146283
2025-02-03 18:02:48 +00:00
PyTorch MergeBot
64fc9ff09c Revert "[ONNX] Create deprecation warning on dynamo_export (#146003)"
This reverts commit e6c39d37e9.

Reverted https://github.com/pytorch/pytorch/pull/146003 on behalf of https://github.com/atalman due to Broke internally ([comment](https://github.com/pytorch/pytorch/pull/146003#issuecomment-2631599314))
2025-02-03 17:17:14 +00:00
Tugsbayasgalan Manlaibaatar
041e08f9dc Add buffers to parameterization rule (#145991)
Differential Revision: [D68959513](https://our.internmc.facebook.com/intern/diff/D68959513)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145991
Approved by: https://github.com/bdhirsh
2025-02-03 16:49:03 +00:00
PyTorch MergeBot
c0979d72b5 Revert "[hop][inductor] track the dependency on unbacked symbols correctly with constant_args for hops (#143456)"
This reverts commit 68a3635484.

Reverted https://github.com/pytorch/pytorch/pull/143456 on behalf of https://github.com/atalman due to New tests are failing internally ([comment](https://github.com/pytorch/pytorch/pull/143456#issuecomment-2631475900))
2025-02-03 16:25:58 +00:00
Harmen Stoppels
01554c7b5a fix incorrect literal strings / accidental tuples (#146037)
* `expr,` is short for `(expr,)`
* literal strings over multiple lines need to escape the newline `\` or use `(...)`.
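
A short illustration of both pitfalls, as a sketch rather than code taken from the PR. All function names and strings here are made up for the example.

```python
def accidental_tuple():
    # The trailing comma makes the right-hand side a 1-tuple, not an int.
    retries = 3,
    return retries


def truncated_message():
    # The second literal is a separate expression statement: it is evaluated
    # and discarded, so only the first half is assigned.
    msg = "expected a tensor, "
    "but got a list"
    return msg


def message_with_parens():
    # Inside parentheses, adjacent literals are concatenated across lines.
    msg = (
        "expected a tensor, "
        "but got a list"
    )
    return msg


def message_with_backslash():
    # A backslash escapes the newline, so both literals form one expression.
    msg = "expected a tensor, " \
        "but got a list"
    return msg
```

Both mistakes are valid Python, so they tend to surface only as a wrong value at runtime instead of a syntax error.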

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146037
Approved by: https://github.com/Skylion007
2025-02-03 15:08:11 +00:00
PyTorch UpdateBot
550441a87b Update slow tests (#146301)
This PR is auto-generated weekly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/weekly.yml).
Update the list of slow tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146301
Approved by: https://github.com/pytorchbot
2025-02-03 11:37:16 +00:00
Isuru Fernando
08b14936ae Disable has_relational_guards check for dict_tag optimization for now (#146232)
has_relational_guards evaluates to true almost always, and leads to a
slowdown in guards runtime

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146232
Approved by: https://github.com/anijain2305
2025-02-03 07:56:06 +00:00
Isalia20
e3643e1e0e [MPS] Add linalg det and fix lu factor for non contiguous tensors (#146279)
Requested in #77764

This PR adds support for linalg.det on MPS and fixes lu factor for non-contiguous tensors. The current implementation crashed on any non-contiguous tensor with this error:
```
-[AGXG13XFamilyCommandBuffer blitCommandEncoderCommon:]:833: failed assertion `A command encoder is already encoding to this command buffer'
zsh: abort      python det.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146279
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-02-03 06:06:43 +00:00
Zhengxu Chen
1580f47bf4 [export][ez] Fix generated header file. (#146208)
Summary: as title.

Test Plan: CI

Differential Revision: D68978788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146208
Approved by: https://github.com/yiming0416
2025-02-03 06:01:05 +00:00
cyy
7b512095ef Enable some tests on MacOS (#146268)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146268
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-02-03 05:04:24 +00:00
Animesh Jain
fa48757180 [dynamo] misc fixes for inspect (#146283)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146283
Approved by: https://github.com/jansel
ghstack dependencies: #146075
2025-02-03 04:26:10 +00:00
cyy
6ac8bc0cd2 Remove unused import in tests (#146266)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146266
Approved by: https://github.com/Skylion007
2025-02-03 03:40:18 +00:00
Davide Italiano
d80eef7c6d [inductor] Guard a member variable with a define. (#146278)
It's unused otherwise, and when running MPS tests, I get a bunch of warnings of this kind:

/Users/davidino/pytorch/pytorch/torch/include/torch/csrc/inductor/aoti_runtime/model_container.h:412:10: warning: private field 'blob_size_' is not used [-Wunused-private-field]
  412 |   size_t blob_size_;
      |          ^
1 warning generated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146278
Approved by: https://github.com/Skylion007, https://github.com/jansel
2025-02-03 02:20:08 +00:00
Animesh Jain
c0ec2e0a0d [dynamo][functions] Improve getattr on functions (#146075)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146075
Approved by: https://github.com/jansel
2025-02-03 02:01:57 +00:00