Commit graph

76633 commits

Author SHA1 Message Date
Jiashen Cao
09fcd792eb [Fix]: ScriptObject lifting issue (#130952)
#### Issue
ScriptObject was previously treated as a normal attribute by the converter. This PR lifts it to a constant and converts it directly to a GetAttr fx node. ScriptObject can also trigger `CallMethod`, and this PR adds support for that as well.

#### Test Plan
Add test case for ScriptObject.
`pytest test/export/test_converter.py -s -k test_convert_script_object`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130952
Approved by: https://github.com/angelayi
2024-08-04 16:52:45 +00:00
PyTorch MergeBot
5dac4d2c78 Revert "[easy] fix f-string messages in torch/_ops.py (#132531)"
This reverts commit 908d2a153b.

Reverted https://github.com/pytorch/pytorch/pull/132531 on behalf of https://github.com/davidberard98 due to still breaks tests ([comment](https://github.com/pytorch/pytorch/pull/132531#issuecomment-2267584289))
2024-08-04 15:41:56 +00:00
cyy
105ba7b58c [5/N] Fix clang-tidy warnings in aten/src/ATen (#132565)
Follows #132001

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132565
Approved by: https://github.com/Skylion007
2024-08-04 14:39:16 +00:00
David Berard
908d2a153b [easy] fix f-string messages in torch/_ops.py (#132531)
I encountered these when making this change:

```
diff --git a/test/functorch/test_ac.py b/test/functorch/test_ac.py
index 3a2e07fa147..a4d003399e7 100644
--- a/test/functorch/test_ac.py
+++ b/test/functorch/test_ac.py
@@ -259,15 +259,8 @@ class MemoryBudgetTest(TestCase):

         expected = call()
         for budget in range(0, 11):
-            memory_budget = budget / 10
-            torch._dynamo.reset()
-            with config.patch(activation_memory_budget=memory_budget):
-                if memory_budget is not None:
-                    f_compile = torch.compile(
-                        call, backend="aot_eager_decomp_partition"
-                    )
-
-                self.assertEqual(expected, f_compile())
+            get_mem_and_flops(call, memory_budget=budget / 10)
+

     def test_prioritize_cheaper_matmul(self):
         def f(xs, ws):
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132531
Approved by: https://github.com/Skylion007
ghstack dependencies: #132356, #132466
2024-08-04 14:30:42 +00:00
Xu Han
87d46d70d7 [inductor] export kernel for gemm template. (#132580)
Changes:
1. Move `get_export_declaration` to `cpp_utils.py` as basic function.
2. Export kernel for gemm template.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132580
Approved by: https://github.com/ezyang
2024-08-04 11:17:19 +00:00
Xuehai Pan
d2dc173664 Remove lint dependency ufmt (#132573)
`ufmt` is a combination of `black` + `usort`.

This PR removes `ufmt` and runs `black` and `usort` separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132573
Approved by: https://github.com/ezyang
ghstack dependencies: #129769, #132572
2024-08-04 10:24:09 +00:00
Xuehai Pan
f7aeb394b6 [BE][Easy] Remove empty ISORT_SKIPLIST (#132572)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132572
Approved by: https://github.com/ezyang, https://github.com/justinchuby
ghstack dependencies: #129769
2024-08-04 10:24:09 +00:00
Xuehai Pan
f3fce597e9 [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769
Approved by: https://github.com/ezyang
2024-08-04 10:24:09 +00:00
Dan Zimmerman
2714adce20 [caffe2] Fix compiling ATen-hip in non-opt mode (#132581)
Summary:
It looks like https://github.com/pytorch/pytorch/pull/131894 accidentally broke non-opt HIP builds: `is_flash_attention_available` doesn't get inlined in non-opt mode, so all of `can_use_flash_attention` is compiled into the final object file. This includes a reference to `aotriton::v2::flash::check_gpu`, which we haven't set up yet for HIP builds.

Test Plan:
CI

Differential Revision: D60720707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132581
Approved by: https://github.com/jianyuh, https://github.com/xw285cornell
2024-08-04 07:51:18 +00:00
cyy
522fa03e91 [Submodule] Bump ONNX to v1.16.2 (#132566)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132566
Approved by: https://github.com/justinchuby
2024-08-04 07:01:54 +00:00
Wei Feng
2a8e94347f [TP] verify numeric parity on Transformers for multiple iterations (#132543)
Before setting up the float8 numeric parity test, I have to set up a regular TP numeric parity test, preferably testing 10 iterations.

This PR sets a baseline for TP numerics; I can verify fp8 on top of it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132543
Approved by: https://github.com/tianyu-l
ghstack dependencies: #132350
2024-08-04 06:43:27 +00:00
Gabriel Ferns
8ff310392e add __torch_function__ handler to get_device cpp (#132567)
From the issue:
```
import torch

class CustomParameter(torch.nn.Parameter):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
         return func.__name__

x = CustomParameter(torch.rand(2))

print(x.square()) # 'square'
print(torch.square(x)) # 'square'
print(x.get_device()) # 'get_device'
print(torch.get_device(x)) # -1
```
after fix:
```
$ python repro.py
square
square
get_device
get_device
```

Fixes: https://github.com/pytorch/pytorch/issues/131944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132567
Approved by: https://github.com/ezyang
2024-08-04 04:26:30 +00:00
Xu Han
7f8a384a8f [inductor] add msvc_cl compiler check (#132571)
Add an `msvc_cl` compiler check.
Local test:
<img width="880" alt="image" src="https://github.com/user-attachments/assets/fe4da5e0-dd52-4dbc-831e-c32479e27a29">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132571
Approved by: https://github.com/ezyang
2024-08-04 03:48:25 +00:00
Feng Yuan
81b8d3586f Update torch-xpu-ops pin (ATen XPU implementation) (#132390)
Regular update.
1. 69 new ATen operators and variants are added. See https://github.com/intel/torch-xpu-ops/blob/main/yaml/xpu_functions.yaml.
2. Align with PyTorch in-tree to use safe data pointer access APIs.
3. Enable FP64 conversion emulation for some platforms.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132390
Approved by: https://github.com/EikanWang
2024-08-04 02:22:46 +00:00
CaoE
6ec4af6865 [Inductor][CPP] Add vectorization support for double (#131886)
Before:
```
extern "C"  void kernel(const double* in_ptr0, double* out_ptr0)
{
     #pragma omp parallel num_threads(112)
     {
         int tid = omp_get_thread_num();
         {
             #pragma omp for
             for(long x0=static_cast<long>(0L); x0<static_cast<long>(1024L); x0+=static_cast<long>(1L))
             {
                 auto tmp0 = in_ptr0[static_cast<long>(x0)];
                 auto tmp1 = decltype(tmp0)(tmp0 * tmp0);
                 out_ptr0[static_cast<long>(x0)] = tmp1;
             }
         }
     }
 }
```

After:
```
extern "C"  void kernel(const double* in_ptr0, double* out_ptr0)
{
    #pragma omp parallel num_threads(112)
    {
        int tid = omp_get_thread_num();
        {
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(1024L); x0+=static_cast<long>(16L))
            {
                auto tmp0 = at::vec::VectorizedN<double,2>::loadu(in_ptr0 + static_cast<long>(x0), 16);
                auto tmp1 = tmp0 * tmp0;
                tmp1.store(out_ptr0 + static_cast<long>(x0), 16);
            }
        }
    }
}
```
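
A minimal repro sketch (assumed, not taken from the PR) of Python that would exercise this path: a pointwise square over a 1024-element float64 CPU tensor compiled with Inductor, which should now emit a vectorized kernel like the "After" one above.

```python
import torch

def square(x):
    return x * x

x = torch.randn(1024, dtype=torch.float64)   # double (fp64) input on CPU
compiled = torch.compile(square)             # Inductor generates a C++ kernel like the one above
out = compiled(x)
torch.testing.assert_close(out, square(x))
```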

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131886
Approved by: https://github.com/jgong5, https://github.com/peterbell10
2024-08-04 02:13:21 +00:00
PyTorch MergeBot
d984105748 Revert "[export] Convert autocast to HOO (#131914)"
This reverts commit b28c01d90d.

Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/ezyang due to Failing lint, but was covered up by master failure on lint ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2267248773))
2024-08-04 02:10:35 +00:00
Adnan Akhundov
6c65fd0394 [inductor] Add type hints to functions in mkldnn_fusion.py (#131820)
Summary: ATT

Test Plan: lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131820
Approved by: https://github.com/eellison
2024-08-03 22:11:47 +00:00
cyy
bc46f205c4 [15/N] Fix clang-tidy warnings in jit (#132564)
Follows  #132477

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132564
Approved by: https://github.com/Skylion007
2024-08-03 19:33:24 +00:00
PyTorch MergeBot
00097f3458 Revert "C++ network flow implementation in c10 (#132188)"
This reverts commit dccce77935.

Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/ZainRizvi due to Sorry but this appears to be failing internal tests. Please see D60702564 to investigate ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2267098420))
2024-08-03 18:44:28 +00:00
Xu Han
e3387c6712 [inductor] use uint64_t replace long to add Windows support. (#132491)
The `long` type differs between Windows and Linux.
This PR uses `int64_t` instead of `long` on Windows; the `LL` suffix is used to initialize `int64_t` values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132491
Approved by: https://github.com/malfet
2024-08-03 18:38:30 +00:00
Yanbo Liang
bbce517221 [Inductor][FlexAttention] TestFlexAttention -> TestFlexDecoding (#132547)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132547
Approved by: https://github.com/Chillee
ghstack dependencies: #132015
2024-08-03 17:26:44 +00:00
PyTorch MergeBot
21d02f8b4b Revert "[easy] fix f-string messages in torch/_ops.py (#132531)"
This reverts commit 25903f3932.

Reverted https://github.com/pytorch/pytorch/pull/132531 on behalf of https://github.com/davidberard98 due to broke lint and tests due to conflict with 132377 ([comment](https://github.com/pytorch/pytorch/pull/132531#issuecomment-2266743391))
2024-08-03 14:49:07 +00:00
Pian Pawakapan
a896fb1b36 check unsupported sympy functions for runtime asserts (#132457)
Some sympy Functions aren't supported by sympy_interp(); we can't turn them into FX nodes, so currently the runtime asserts CSE pass avoids CSE'ing any expression containing a sympy Function. https://github.com/pytorch/pytorch/pull/132325 started tracking unsupported functions, so we switch the check to that to be more precise. We also check for and skip unsupported functions when adding asserts - previously we only did the check for CSE, not when adding new expressions.
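
As a hedged illustration (the helper and the placeholder function below are assumptions, not the PR's actual code), the check amounts to scanning an expression for sympy Functions that sympy_interp() cannot lower:

```python
import sympy

# Hypothetical stand-in for a function sympy_interp() cannot lower; the real
# set of unsupported functions is the one tracked by #132325.
class OpaqueFn(sympy.Function):
    pass

UNSUPPORTED_FUNCS = {OpaqueFn}

def contains_unsupported(expr: sympy.Expr) -> bool:
    # Skip any expression mentioning a function we cannot turn into FX nodes.
    return any(type(f) in UNSUPPORTED_FUNCS for f in expr.atoms(sympy.Function))

s = sympy.Symbol("s", integer=True)
print(contains_unsupported(OpaqueFn(s) + 1))  # True  -> neither CSE'd nor asserted on
print(contains_unsupported(2 * s + 1))        # False -> safe to emit as a runtime assert
```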

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132457
Approved by: https://github.com/avikchaudhuri
2024-08-03 10:17:25 +00:00
Xuehai Pan
0e7e61f7ce Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)
This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-08-03 09:43:38 +00:00
Jiashen Cao
159d508f03 [Fix]: prim::If with multiple outputs and input return directly (#131779)
#### Issue
The test was not working for prim::Loop with multiple outputs. This PR additionally fixes an issue where an input is returned directly, which is not supported by HigherOrderOp.

#### Test Plan
`pytest test/export/test_converter.py -s -k test_convert_if_multiple_out`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131779
Approved by: https://github.com/angelayi, https://github.com/SherlockNoMad
2024-08-03 08:07:21 +00:00
Xu Han
36ec0fdf10 [inductor] check compiler exist on Windows. (#132533)
In the current Windows environment, if the MSVC env has not been activated, no clear error pointing at the compiler is raised:
<img width="904" alt="image" src="https://github.com/user-attachments/assets/725ea608-d181-40b1-8930-42fe2b32643a">

With this PR, we can help users see that the issue comes from the compiler.
<img width="1034" alt="image" src="https://github.com/user-attachments/assets/8515a796-e3e9-4909-a68f-8a14d4864951">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132533
Approved by: https://github.com/jansel
2024-08-03 07:47:11 +00:00
Adnan Akhundov
8ad9f89ccc [inductor] Reland: Add flag to ignore unsupported @triton.autotune args in user-written kernel compilation (#132562)
Summary:
This is a reland attempt of [#131431](https://github.com/pytorch/pytorch/pull/131431); in its original form, the PR caused issues internally.

We currently don't support some of the `triton.autotune` arguments when compiling user-written Triton kernels with PT2. In this PR, we add a flag to circumvent that, to unblock internal compilation in some cases. The flag is documented with an explanation of why it is generally not a good idea to set it.
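
For illustration, a hedged sketch of the kind of user-written kernel this concerns; the specific autotuner arguments shown (`warmup`, `rep`) are assumed examples of unsupported ones, and the new config flag itself is not named here:

```python
import triton
import triton.language as tl

@triton.autotune(
    configs=[triton.Config({"BLOCK": 64}), triton.Config({"BLOCK": 128})],
    key=["n"],
    warmup=10,  # assumed example of an argument the PT2 path does not honor
    rep=20,     # likewise an assumption; see the PR for the real list
)
@triton.jit
def scale_kernel(x_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask) * 2.0, mask=mask)
```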

Test Plan:
```
python test/inductor/test_triton_kernels.py -k test_triton_kernel_autotune_with_unsupported_args
...
----------------------------------------------------------------------
Ran 3 tests in 3.636s

OK
```

Differential Revision: D60701839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132562
Approved by: https://github.com/chenyang78
2024-08-03 06:31:28 +00:00
Animesh Jain
06581c277a [dynamo][stable-diffusion] Support dict(obj) on constrained subclasses of dict and OrderedDict (#132558)
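A minimal sketch (assumed repro, not taken from the PR) of the pattern this enables: calling `dict(obj)` on an `OrderedDict` subclass inside a compiled region.

```python
import collections
import torch

class MyConfig(collections.OrderedDict):
    pass

@torch.compile(backend="eager", fullgraph=True)
def fn(x, cfg):
    plain = dict(cfg)          # previously unsupported for dict/OrderedDict subclasses
    return x * plain["scale"]

print(fn(torch.ones(3), MyConfig(scale=2.0)))
```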
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132558
Approved by: https://github.com/jansel
2024-08-03 06:31:00 +00:00
Shangdi Yu
b28c01d90d [export] Convert autocast to HOO (#131914)
Summary:
Suggested in https://github.com/pytorch/pytorch/issues/128394.

If there's an autocast context manager, the predispatch (strict) graph can look something like:

```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1);  rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = None
        return (mm_1,)
```

But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher-order operator and making a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.
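
For context, a hedged sketch (assumed repro, not from the PR) of the kind of user code that produces such a graph; the export call is left commented out since it needs a CUDA device:

```python
import torch

class M(torch.nn.Module):
    def forward(self, a, b):
        with torch.autocast("cuda", torch.bfloat16):
            return a @ b

# ep = torch.export.export(
#     M(), (torch.rand(8, 8, device="cuda"), torch.rand(8, 8, device="cuda"))
# )
# print(ep.graph)  # the autocast region now shows up as a higher-order-op submodule
```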

Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.

Test Plan:
CI

```
parsh --build-flags fbcode//mode/dev-nosan  fbcode//caffe2/test:test_export
run_tests("test_predispatch_autocast")
```

Reviewed By: angelayi

Differential Revision: D60206382

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914
Approved by: https://github.com/angelayi
2024-08-03 05:48:57 +00:00
Avik Chaudhuri
ed4493de0e dim name is identifier (#132557)
Summary: Dim names appear in suggested fixes, so they should be valid Python identifiers.
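
A small illustration of the constraint (the rejection shown in the comment is an assumption based on the summary):

```python
from torch.export import Dim

batch = Dim("batch", min=1, max=128)   # fine: "batch" is a valid Python identifier
# Dim("batch size")                    # a name like this cannot appear in a suggested fix
```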

Test Plan: none

Differential Revision: D60696854

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132557
Approved by: https://github.com/pianpwk
2024-08-03 05:28:50 +00:00
Edward Z. Yang
1f5dfe00da Subtracer should always be real to inherit fake/real tensors from parent config (#132488)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132488
Approved by: https://github.com/zou3519
2024-08-03 04:55:42 +00:00
Justin Chu
6966d44eda [ONNX] Rename _internal/exporter to _exporter_legacy (#132429)
The next PR will be creating an `exporter` directory to house logic from `torch-onnx`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132429
Approved by: https://github.com/titaiwangms
2024-08-03 04:23:05 +00:00
David Berard
5973aec671 [fx] python_code(verbose=True): show size/strides for all tensors (#132192)
python_code(verbose=True) (or print_readable()) generates a string with the code representing the fx graph, with extra annotations indicating the size or stride of each tensor. Currently, it only shows sizes/strides for FakeTensors provided in metadata. For subclass tensors like NestedTensor, the outer class (provided in the node metadata) will be a non-FakeTensor and the inner tensors will be fake. This PR expands the conditional to show sizes/strides for all tensors, not just FakeTensors.

Testing: I ran the test script below with `TORCH_LOGS=+dynamo` and found in the logs the graph shown below - we see that the input nested tensor has sizes and strides associated with it. Also, I stacked a diff on top of this one that forces the readable graph to be generated whenever PT2 is in use in tests, which should hopefully surface any issues; https://github.com/pytorch/pytorch/pull/132195 shows no significant failures except for preexisting ones.

test script:
```python
import torch

def fn(x):
    return x.cos()

nt = torch.nested.nested_tensor_from_jagged(
    torch.randn(10, 10),
    torch.tensor([0, 1, 3, 6, 10]),
)

torch.compile(fn)(nt)
```

logs excerpt:
```
[0/0] [__graph_code] TRACED GRAPH
[0/0] [__graph_code]  ===== __compiled_fn_1 =====
[0/0] [__graph_code]  /data/users/dberard/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.M

[0/0] [__graph_code]     def forward(self, L_x_: "f32[4, zf1, 10][10*zf1, 10, 1]cpu", zf1: "Sym(zf1)"):
[0/0] [__graph_code]         l_x_ = L_x_
[0/0] [__graph_code]
[0/0] [__graph_code]          # File: /data/users/dberard/scripts/nt_print_graph.py:4 in fn, code: return x.c

[0/0] [__graph_code]         cos: "f32[4, zf1, 10][10*zf1, 10, 1]cpu" = l_x_.cos();  l_x_ = None
[0/0] [__graph_code]         return (cos,)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132192
Approved by: https://github.com/Chillee
2024-08-03 02:54:32 +00:00
Ivan Zaitsev
0b571b1058 [codemod][pyre] Add missing Pyre mode headers (#132548)
Reviewed By: connernilsen

Differential Revision: D59849027

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132548
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi
2024-08-03 02:32:53 +00:00
Yanbo Liang
373e9be457 [Inductor][FlexAttention] Add kwarg to top level for users to specify kernel params (#132015)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132015
Approved by: https://github.com/Chillee
2024-08-03 02:27:02 +00:00
David Berard
25903f3932 [easy] fix f-string messages in torch/_ops.py (#132531)
I encountered these when making this change:

```
diff --git a/test/functorch/test_ac.py b/test/functorch/test_ac.py
index 3a2e07fa147..a4d003399e7 100644
--- a/test/functorch/test_ac.py
+++ b/test/functorch/test_ac.py
@@ -259,15 +259,8 @@ class MemoryBudgetTest(TestCase):

         expected = call()
         for budget in range(0, 11):
-            memory_budget = budget / 10
-            torch._dynamo.reset()
-            with config.patch(activation_memory_budget=memory_budget):
-                if memory_budget is not None:
-                    f_compile = torch.compile(
-                        call, backend="aot_eager_decomp_partition"
-                    )
-
-                self.assertEqual(expected, f_compile())
+            get_mem_and_flops(call, memory_budget=budget / 10)
+

     def test_prioritize_cheaper_matmul(self):
         def f(xs, ws):
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132531
Approved by: https://github.com/Skylion007
ghstack dependencies: #132356, #132466
2024-08-03 02:23:44 +00:00
Animesh Jain
419b76c4ac [dynamo] Reland 132308, 132314, 132318, 132334 - Make builtin nn modules attributes static (#132539)
Relanding 4 PRs ending at https://github.com/pytorch/pytorch/pull/132334

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132539
Approved by: https://github.com/Skylion007, https://github.com/yanboliang, https://github.com/mlazos
2024-08-03 02:08:22 +00:00
Ivan Zaitsev
841cadd555 Fix discrepancies from 129973 (#132545)
#129973 ([D59132793](https://www.internalfb.com/diff/D59132793)) was exported missing changes in `test/cpp/jit/CMakeLists.txt`; this PR remediates that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132545
Approved by: https://github.com/kit1980
2024-08-03 01:57:49 +00:00
Eli Uriegas
243a763e1b ci: Remove split-build CUDA testing from pull.yml (#132537)
This is already represented in trunk.yml so it seems a bit redundant to include this level of testing in pull.yml.

I've been observing a large spike in our usage of `g3.4xlarge`, which seems to correspond to these builds in particular, so I'm removing them from `pull.yml` since they are already covered in `trunk.yml`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132537
Approved by: https://github.com/ZainRizvi, https://github.com/malfet
2024-08-03 01:24:17 +00:00
Shangdi Yu
a503136583 [export] Detect whether case_name is registered in exportdb (#132420)
Summary:
- Moves logging functionality into the `torch/_export/db/logging.py` file.
- Adds a check in `_dynamo/eval_frame.py` for optional inputs and errors out with `UnsupportedError`.
- Changes the case name of `torch_sym_int` to `unsupported_operator`.
- Checks whether the case name is registered in exportdb; if so, we give a link to the case in exportdb.
- TODO: add test

Test Plan:
CI

Running the example in https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input gives the following error logging:

```
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] Parameter y is optional with a default value of tensor([[-0.1633,  1.2414, -0.1071],
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086]         [-0.1936, -0.9425, -0.0824]])
E0730 10:53:33.688000 4155538 torch/export/_trace.py:1043] See optional_input in exportdb for unsupported case.                 https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input
......
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/389acaeb40d57230/tutorials/pytorch/nntest/__torchtest__/torchtest#link-tree/torch/_dynamo/eval_frame.py", line 1091, in produce_matching
    raise Unsupported(
torch._dynamo.exc.Unsupported: Tracing through optional input is not supported yet
```

It also logs an `export.error.classified` event in Scuba.

Reviewed By: zhxchen17

Differential Revision: D60427208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132420
Approved by: https://github.com/zhxchen17
2024-08-03 01:08:48 +00:00
Joel Schlosser
64720f3b89 Introduce checks to validate public API tests (#131390)
This PR introduces a new sanity check for the public API tests in `.ci/pytorch/test.sh`.
* Validates two public API tests:
    1. Ensures `test_correct_module_names` fails when a new file OR an existing file adds an invalid public API function (e.g. one whose `__module__` is unset).
    2. Ensures `test_modules_can_be_imported` fails when a module underneath `torch/` cannot be imported.
* Runs this in CI just before the pre-existing FC / BC checks.

I've verified that re-introducing the bug that #131386 fixed causes the new check to fail:
![public_api_failure](https://github.com/user-attachments/assets/376ddef3-d14a-41f6-93e2-f935deb6555a)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131390
Approved by: https://github.com/albanD
2024-08-03 00:29:00 +00:00
cyy
fcef6cc6d1 [13/N] Fix clang-tidy warnings in jit (#132477)
Follows  #132209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132477
Approved by: https://github.com/Skylion007
2024-08-03 00:13:18 +00:00
Shivam Raikundalia
705ac311aa Fix Distributed EventList usage (#132448)
Summary: Summarized here: https://github.com/pytorch/pytorch/issues/132227

Test Plan: Use suggestion in issue, should see test passing again

Differential Revision: D60614690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132448
Approved by: https://github.com/aaronenyeshi
2024-08-02 23:55:31 +00:00
Sherlock Huang
e3513fb2af [ts_converter]handle python list append, list add, aten.to.dtype+mutation_op pattern (#132529)
Summary:
#### Description
Add support for aten::append with a python function that returns a new list with the appended element. We then update the `fx_node` in the `name_to_node` mapping.
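
A hedged sketch of the pure-function modeling described above (the helper name is assumed): `aten::append` mutates its list in place, so the converter can instead emit a call to a function that returns a new list and rebind the list's name in `name_to_node` to the resulting node.

```python
def list_append(lst, el):
    # functional counterpart of the in-place `lst.append(el)`
    return [*lst, el]
```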

aten::append contributed by Jiashen Cao <jiashenc@meta.com>

Fix conversion for csr_ranker_test

```
    model_name: csr_ranker_test_4.ptl
    has_ts_model: True
    has_sample_inputs: True
    ops_maybe_missing_meta: set()
    script_objects: set()
    ts_can_run: True
    ts_run_exception: None
    can_convert: True
    convert_exception: None
    ep_result_correct: True
    ep_run_exception: None
    can_package: True
    package_exception: None
    sigmoid_can_run: False
    sigmoid_run_exception: RuntimeError('not for symbolics')
    sigmoid_result_correct: None
```

Test Plan:
test_aten_add_t
test_aten_append_t
test_aten_to_dtype_with_mutating_storage

buck2 run mode/opt sigmoid/inference/ts_migration:main -- --mode test_one --model_name csr_ranker_test

Differential Revision: D60635893

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132529
Approved by: https://github.com/jiashenC
2024-08-02 23:32:37 +00:00
David Berard
85f19ce14a Support meta["val"] that is a dict, for triton kernels and for the partitioner (#132466)
Internally there's a model that's using memory_budget with the partitioner and using custom Triton kernels. The partitioner fails when encountering the Triton ops because they don't have `meta["val"]`. This PR adds `meta["val"]` to these fx graph nodes and then adds handling for `meta["val"]` being a dict in the partitioner.
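
A hedged sketch (not the PR's actual code) of the dict handling on the partitioner side:

```python
from typing import Any, Dict, List, Union

def vals_from_meta(meta_val: Union[Any, Dict[str, Any]]) -> List[Any]:
    # Triton HOP nodes can carry a dict in meta["val"]; walk the dict values
    # instead of assuming a single (Fake)Tensor.
    if isinstance(meta_val, dict):
        return list(meta_val.values())
    return [meta_val]
```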

Differential Revision: [D60627813](https://our.internmc.facebook.com/intern/diff/D60627813)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132466
Approved by: https://github.com/zou3519
ghstack dependencies: #132356
2024-08-02 23:24:29 +00:00
Shivam Raikundalia
bcac71517c [Profiler] Test Logging for Empty Traces (#132444)
Summary: Tests D60311331. Please see that diff for explanation

Test Plan: This diff is adding a test itself

Reviewed By: aaronenyeshi

Differential Revision: D60311555

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132444
Approved by: https://github.com/aaronenyeshi
2024-08-02 22:04:15 +00:00
David Berard
1962f9475f [NJT][flop counter] attention: if offsets are fake, use max seqlen (#132356)
The flop counter is used by the partitioner, in which case the tensors passed in can be fake.

The flop computations for nested attention use the offsets to determine the actual amount of compute that will be done. But when the offsets are fake, we end up with unbacked symints (from `(offsets[1:] - offsets[:-1]).tolist()`). If we find that the offsets are fake or functional tensors, we use the max sequence length instead.
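
A hedged sketch of the fallback (the helper and its signature are assumptions; the functional-tensor check is omitted):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor

def nested_seq_lengths(offsets: torch.Tensor, max_seqlen: int, batch: int):
    # With fake offsets the per-sequence lengths would be unbacked symints,
    # so over-estimate every sequence with the max sequence length.
    if isinstance(offsets, FakeTensor):
        return [max_seqlen] * batch
    return (offsets[1:] - offsets[:-1]).tolist()
```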

Repro: https://gist.github.com/davidberard98/903fb3e586edb6d1d466786e1a610eba

Differential Revision: [D60597463](https://our.internmc.facebook.com/intern/diff/D60597463)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132356
Approved by: https://github.com/soulitzer
2024-08-02 20:42:29 +00:00
Will Constable
37c3d503b7 [pipelining] Make test_schedule quiet (#132369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132369
Approved by: https://github.com/H-Huang
ghstack dependencies: #129810, #130378
2024-08-02 20:38:17 +00:00
Will Constable
7c1cca9fda [pipelining] Add schedule send/recv pass (#130378)
Inserts send/recv ops where needed in a compute-only pipeline schedule.

Any F or B action will require a recv op for its input and a send op
for its output, except at the ends of the pipeline.

To avoid hangs caused by mixed-up orderings of sends/recvs across ranks,
we pick one compute action at a time and insert both its send op (on
that rank's schedule), and the matching recv op for the recipient stage
(on the schedule for the rank for that stage).
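
Hedged pseudocode for this pairing rule (the action/schedule representation is an assumption, not the PR's real data structures):

```python
def insert_send_recv(schedules, compute_actions, stage_to_rank, consumers_of):
    # Walk compute actions one at a time, emitting the SEND on the producing
    # rank together with the matching RECV on the consuming rank, so every
    # rank sees the comms in a consistent global order (avoiding hangs).
    # Batching and same-rank edges are ignored, matching the TODOs below.
    for act in compute_actions:
        src_rank = stage_to_rank[act.stage]
        for dst_stage in consumers_of(act):        # empty at the ends of the pipeline
            dst_rank = stage_to_rank[dst_stage]
            schedules[src_rank].append(("SEND", act.stage, dst_stage))
            schedules[dst_rank].append(("RECV", act.stage, dst_stage))
    return schedules
```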

TODO
Currently ignores a couple of edge cases
- ignores batching (which is an optimization)
- ignores cases where a stage sends to another stage on the same rank,
  and should skip the send/recv and directly access memory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130378
Approved by: https://github.com/H-Huang
ghstack dependencies: #129810
2024-08-02 20:38:17 +00:00
Will Constable
625f494619 [Pipelining] Add schedule unshard/reshard pass (#129810)
Adds fsdp unshard/reshard ops to a compute-only schedule.

Operates on one pp-rank's schedule at a time, since there is no
cross-pp-rank coordination needed for FSDP.  (Unshard/Reshard is across
DP ranks within a PP group).

Uses a heuristic based on examining the next N stages to run compute
operations on this rank, evicting (resharding) and fetching (unsharding)
ahead of time to give unshard operations a chance to overlap with
compute and PP comms.
- this heuristic has not been validated and may not be optimal
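
Hedged pseudocode for that lookahead heuristic (the window size, action names, and schedule representation are all assumptions):

```python
def add_unshard_reshard(compute_schedule, lookahead_n=2):
    # Keep a stage unsharded while it appears among the next `lookahead_n`
    # compute actions on this rank; reshard it once it leaves the window.
    out, unsharded = [], set()
    for i, action in enumerate(compute_schedule):
        window = {a.stage for a in compute_schedule[i : i + lookahead_n]}
        for stage in window - unsharded:
            out.append(("UNSHARD", stage))         # no-op at runtime if FSDP isn't used
            unsharded.add(stage)
        out.append(action)
        for stage in unsharded - window:
            out.append(("RESHARD", stage))         # evict stages not needed soon
            unsharded.discard(stage)
    return out
```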

Makes the assumption that it's fine to add the UNSHARD/RESHARD actions
to the schedule regardless of whether FSDP will actually be used.
- this way, users do not have to tell us at PP schedule creation time if
  they plan to use FSDP or DDP
- it is trivial to implement UNSHARD/RESHARD as no-ops inside the
  runtime, if FSDP is not detected on the stage module

TODO
- also add FSDP's reduce-scatter? or is it sufficient to leave this
  handled by PipelineStage at 'last backward' time
- validate 'next N stages' heuristic and expose an API if needed
- add an e2e test

Co-authored-by: Howard Huang <howardhuang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129810
Approved by: https://github.com/kwen2501, https://github.com/H-Huang
2024-08-02 20:38:17 +00:00