#### Issue
ScriptObject was treated as normal attribute by the converter previously. This PR lifts it to be a constant and convert it directly to a GetAttr fx node. ScriptObject would also trigger `CallMethod` and this PR adds that support as well.
#### Test Plan
Add test case for ScriptObject.
`pytest test/export/test_converter.py -s -k test_convert_script_object`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130952
Approved by: https://github.com/angelayi
Before setting up float8 numeric parity test, I have to set up regular TP numeric parity test, preferrably testing 10 iterations
this PR sets a baseline of TP numerics. I can verify fp8 on top of it
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132543
Approved by: https://github.com/tianyu-l
ghstack dependencies: #132350
Some sympy Functions aren't supported by sympy_interp(); we can't turn them into FX nodes, so currently the runtime asserts CSE pass avoids CSE'ing on any expression containing a sympy Function. https://github.com/pytorch/pytorch/pull/132325 started tracking unsupported functions, so we switch the check to that to be more precise. We also check for and skip unsupported functions when adding asserts - previously we only did the check for CSE, and not adding new expressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132457
Approved by: https://github.com/avikchaudhuri
Summary:
This is a reland attempt of [#131431](https://github.com/pytorch/pytorch/pull/131431), as, in its original form, the PR has caused issues internally.
We currently don't support some of the `triton.autotune` arguments when compiling user-written Triton kernels with PT2. In this PR, we're adding a flag to circumvent it. This is to unblock internal compilation in some cases. The flag is supplied with the docs mentioning why it is not a good idea to set it.
Test Plan:
```
python test/inductor/test_triton_kernels.py -k test_triton_kernel_
autotune_with_unsupported_args
...
----------------------------------------------------------------------
Ran 3 tests in 3.636s
OK
```
Differential Revision: D60701839
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132562
Approved by: https://github.com/chenyang78
Summary:
Suggested in https://github.com/pytorch/pytorch/issues/128394.
If there's an autocast context manager, the predispatch (strict) graph can look something like:
```
class <lambda>(torch.nn.Module):
def forward(self, x: "f32[1]"):
...
_enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None
_exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None
return (mm_1,)
```
But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.
Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.
Test Plan:
CI
```
parsh --build-flags fbcode//mode/dev-nosan fbcode//caffe2/test:test_export
run_tests("test_predispatch_autocast")
```
Reviewed By: angelayi
Differential Revision: D60206382
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914
Approved by: https://github.com/angelayi
python_code(verbose=True) (or print_readable()) generates a string with the code representing the fx graph, with extra annotations indicating the size or stride of the tensor. Currently, it'll only shows sizes/strides for FakeTensors provided in metadata. For subclass tensors like NestedTensor, the outer class (provided in the node metadata) will be a non-FakeTensor and the inner tensors will be fake. This PR expands the conditional to show sizes/strides for all tensors, not just FakeTensors.
Testing: I ran this test script (below), ran it with `TORCH_LOGS=+dynamo` and found in the logs the graph shown below - we see that the input nested tensor has sizes and strides associated with it. Also, I stacked a diff on top of this one that forces the readable graph to be generated whenever PT2 is in use in tests, which should hopefully find any issues; https://github.com/pytorch/pytorch/pull/132195 shows no significant failures except for preexisting failures.
test script:
```python
import torch
def fn(x):
return x.cos()
nt = torch.nested.nested_tensor_from_jagged(
torch.randn(10, 10),
torch.tensor([0, 1, 3, 6, 10]),
)
torch.compile(fn)(nt)
```
logs excerpt:
```
[0/0] [__graph_code] TRACED GRAPH
[0/0] [__graph_code] ===== __compiled_fn_1 =====
[0/0] [__graph_code] /data/users/dberard/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.M
[0/0] [__graph_code] def forward(self, L_x_: "f32[4, zf1, 10][10*zf1, 10, 1]cpu", zf1: "Sym(zf1)"):
[0/0] [__graph_code] l_x_ = L_x_
[0/0] [__graph_code]
[0/0] [__graph_code] # File: /data/users/dberard/scripts/nt_print_graph.py:4 in fn, code: return x.c
[0/0] [__graph_code] cos: "f32[4, zf1, 10][10*zf1, 10, 1]cpu" = l_x_.cos(); l_x_ = None
[0/0] [__graph_code] return (cos,)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132192
Approved by: https://github.com/Chillee
This is already represented in trunk.yml so it seems a bit redundant to include this level of testing in pull.yml.
I've been observing a large spike in our usage of `g3.4xlarge` which seems to correspond to these builds in particular so removing these from `pull.yml` since they are already covered in `trunk.yml`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132537
Approved by: https://github.com/ZainRizvi, https://github.com/malfet
Summary:
- moves logging functionalities into `torch/_export/db/logging.py` file.
- add a check in `_dynamo/eval_frame.py` to check for optional input and error out with `UnsupportedError`
- change the case name of `torch_sym_int` to `unsupported_operator`
- Check if the case name is registered in exportdb, if so, we give a link to the case in exportdb.
- TODO: add test
Test Plan:
CI
Running the example in https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input gives the following error logging:
```
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] Parameter y is optional with a default value of tensor([[-0.1633, 1.2414, -0.1071],
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] [-0.1936, -0.9425, -0.0824]])
E0730 10:53:33.688000 4155538 torch/export/_trace.py:1043] See optional_input in exportdb for unsupported case. https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input
......
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/389acaeb40d57230/tutorials/pytorch/nntest/__torchtest__/torchtest#link-tree/torch/_dynamo/eval_frame.py", line 1091, in produce_matching
raise Unsupported(
torch._dynamo.exc.Unsupported: Tracing through optional input is not supported yet
```
It also logs a `export.error.classified` event in Scuba.
Reviewed By: zhxchen17
Differential Revision: D60427208
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132420
Approved by: https://github.com/zhxchen17
This PR introduces a new sanity check for the public API tests in `.ci/pytorch/test.sh`.
* Validates two public API tests:
1. Ensures `test_correct_module_names` fails when a new file OR an existing file adds an invalid public API function (e.g. one whose `__module__` is unset).
2. Ensures `test_modules_can_be_imported` fails when a module underneath `torch/` cannot be imported.
* Runs this in CI as part just before the pre-existing FC / BC checks.
I've verified that re-introducing the bug that #131386 fixed causes the new check to fail:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131390
Approved by: https://github.com/albanD
Summary:
#### Description
Add support for aten::append with a python function that returns a new list with the appended element. We then update the `fx_node` in the `name_to_node` mapping.
aten::append contributed by Jiashen Cao <jiashenc@meta.com>
Fix conversion for csr_ranker_test
```
model_name: csr_ranker_test_4.ptl
has_ts_model: True
has_sample_inputs: True
ops_maybe_missing_meta: set()
script_objects: set()
ts_can_run: True
ts_run_exception: None
can_convert: True
convert_exception: None
ep_result_correct: True
ep_run_exception: None
can_package: True
package_exception: None
sigmoid_can_run: False
sigmoid_run_exception: RuntimeError('not for symbolics')
sigmoid_result_correct: None
```
Test Plan:
test_aten_add_t
test_aten_append_t
test_aten_to_dtype_with_mutating_storage
buck2 run mode/opt sigmoid/inference/ts_migration:main -- --mode test_one --model_name csr_ranker_test
Differential Revision: D60635893
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132529
Approved by: https://github.com/jiashenC
Internally there's a model that's using memory_budget with the partitioner, and using custom triton kernels. The partitioner fails when encountering the triton ops because they don't have `meta["val"]`. This PR adds `meta["val"]` to these fx graph nodes and then adds handling for `meta["val"]` being a dict in the partitioner.
Differential Revision: [D60627813](https://our.internmc.facebook.com/intern/diff/D60627813)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132466
Approved by: https://github.com/zou3519
ghstack dependencies: #132356
Inserts send/recv ops where needed in a compute-only pipeline schedule.
Any F or B action will require a recv op for its input and a send op
for its output, except for at the ends of the pipeline.
To avoid hangs caused by mixed-up orderings of sends/recvs across ranks,
we pick one compute action at a time and insert both its send op (on
that rank's schedule), and the matching recv op for the recipient stage
(on the schedule for the rank for that stage).
TODO
Currently ignores a couple of edge cases
- ignores batching (which is an optimization)
- ignores cases where a stage sends to anotehr stage on the same rank,
and should skip the send/recv and directly access memory
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130378
Approved by: https://github.com/H-Huang
ghstack dependencies: #129810
Adds fsdp unshard/reshard ops to a compute-only schedule.
Operates on one pp-rank's schedule at a time, since there is no
cross-pp-rank coordination needed for FSDP. (Unshard/Reshard is across
DP ranks within a PP group).
Uses a heuristic based on examining the next N stages to run compute
operations on this rank, evicting (resharding) and fetching (unsharding)
ahead of time to give unshard operations a chance to overlap with
compute and PP comms.
- this heuristic has not been validated and may not be optimal
Makes the assumption that it's fine to add the UNSHARD/RESHARD actions
to the schedule regardless of if FSDP will actually be used.
- this way, users do not have to tell us at PP schedule creation time if
they plan to use FSDP or DDP
- it is trivial to implement UNSHARD/RESHARD as no-ops inside the
runtime, if FSDP is not detected on the stage module
TODO
- also add FSDP's reduce-scatter? or is it sufficient to leave this
handled by PipelineStage at 'last backward' time
- validate 'next N stages' heuristic and expose an API if needed
- add an e2e test
Co-authored-by: Howard Huang <howardhuang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129810
Approved by: https://github.com/kwen2501, https://github.com/H-Huang