pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
Jiashen Cao	09fcd792eb	[Fix]: ScriptObject lifting issue (#130952) #### Issue ScriptObject was previously treated as a normal attribute by the converter. This PR lifts it to a constant and converts it directly to a GetAttr fx node. ScriptObjects can also trigger `CallMethod`, and this PR adds support for that as well. #### Test Plan Add a test case for ScriptObject. `pytest test/export/test_converter.py -s -k test_convert_script_object` Pull Request resolved: https://github.com/pytorch/pytorch/pull/130952 Approved by: https://github.com/angelayi	2024-08-04 16:52:45 +00:00
PyTorch MergeBot	5dac4d2c78	Revert "[easy] fix f-string messages in torch/_ops.py (#132531)" This reverts commit `908d2a153b`. Reverted https://github.com/pytorch/pytorch/pull/132531 on behalf of https://github.com/davidberard98 due to still breaking tests ([comment](https://github.com/pytorch/pytorch/pull/132531#issuecomment-2267584289))	2024-08-04 15:41:56 +00:00
cyy	105ba7b58c	[5/N] Fix clang-tidy warnings in aten/src/ATen (#132565) Follows #132001 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132565 Approved by: https://github.com/Skylion007	2024-08-04 14:39:16 +00:00
David Berard	908d2a153b	[easy] fix f-string messages in torch/_ops.py (#132531) I encountered these when making this change: ``` diff --git a/test/functorch/test_ac.py b/test/functorch/test_ac.py index 3a2e07fa147..a4d003399e7 100644 --- a/test/functorch/test_ac.py +++ b/test/functorch/test_ac.py @@ -259,15 +259,8 @@ class MemoryBudgetTest(TestCase): expected = call() for budget in range(0, 11): - memory_budget = budget / 10 - torch._dynamo.reset() - with config.patch(activation_memory_budget=memory_budget): - if memory_budget is not None: - f_compile = torch.compile( - call, backend="aot_eager_decomp_partition" - ) - - self.assertEqual(expected, f_compile()) + get_mem_and_flops(call, memory_budget=budget / 10) + def test_prioritize_cheaper_matmul(self): def f(xs, ws): ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/132531 Approved by: https://github.com/Skylion007 ghstack dependencies: #132356, #132466	2024-08-04 14:30:42 +00:00
Xu Han	87d46d70d7	[inductor] export kernel for gemm template. (#132580) Changes: 1. Move `get_export_declaration` to `cpp_utils.py` as a basic function. 2. Export kernel for gemm template. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132580 Approved by: https://github.com/ezyang	2024-08-04 11:17:19 +00:00
Xuehai Pan	d2dc173664	Remove lint dependency `ufmt` (#132573) `ufmt` combines `black` and `usort`. This PR removes `ufmt` and runs `black` and `usort` separately. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132573 Approved by: https://github.com/ezyang ghstack dependencies: #129769, #132572	2024-08-04 10:24:09 +00:00
Xuehai Pan	f7aeb394b6	[BE][Easy] Remove empty `ISORT_SKIPLIST` (#132572) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132572 Approved by: https://github.com/ezyang, https://github.com/justinchuby ghstack dependencies: #129769	2024-08-04 10:24:09 +00:00
Xuehai Pan	f3fce597e9	[BE][Easy][17/19] enforce style for empty lines in import segments in `torch/[a-c]/` and `torch/[e-n]/` (#129769) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769 Approved by: https://github.com/ezyang	2024-08-04 10:24:09 +00:00
Dan Zimmerman	2714adce20	[caffe2] Fix compiling ATen-hip in non-opt mode (#132581) Summary: It looks like https://github.com/pytorch/pytorch/pull/131894 accidentally broke non-opt HIP builds: `is_flash_attention_available` doesn't get inlined in non-opt mode, so all of `can_use_flash_attention` is compiled into the final object file. This includes a reference to `aotriton::v2::flash::check_gpu`, which we haven't set up yet for HIP builds. Test Plan: CI Differential Revision: D60720707 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132581 Approved by: https://github.com/jianyuh, https://github.com/xw285cornell	2024-08-04 07:51:18 +00:00
cyy	522fa03e91	[Submodule] Bump ONNX to v1.16.2 (#132566) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132566 Approved by: https://github.com/justinchuby	2024-08-04 07:01:54 +00:00
Wei Feng	2a8e94347f	[TP] verify numeric parity on Transformers for multiple iterations (#132543) Before setting up a float8 numeric parity test, I have to set up a regular TP numeric parity test, preferably testing 10 iterations. This PR sets a baseline for TP numerics; I can verify fp8 on top of it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132543 Approved by: https://github.com/tianyu-l ghstack dependencies: #132350	2024-08-04 06:43:27 +00:00
Gabriel Ferns	8ff310392e	add __torch_function__ handler to get_device cpp (#132567) From the issue: ``` import torch class CustomParameter(torch.nn.Parameter): @classmethod def __torch_function__(cls, func, types, args=(), kwargs=None): return func.__name__ x = CustomParameter(torch.rand(2)) print(x.square()) # 'square' print(torch.square(x)) # 'square' print(x.get_device()) # 'get_device' print(torch.get_device(x)) # -1 ``` After the fix: ``` $ python repro.py square square get_device get_device ``` Fixes: https://github.com/pytorch/pytorch/issues/131944 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132567 Approved by: https://github.com/ezyang	2024-08-04 04:26:30 +00:00
Xu Han	7f8a384a8f	[inductor] add msvc_cl compiler check (#132571) Adds an `msvc_cl` compiler check. Local test: (screenshot: https://github.com/user-attachments/assets/fe4da5e0-dd52-4dbc-831e-c32479e27a29) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132571 Approved by: https://github.com/ezyang	2024-08-04 03:48:25 +00:00
Feng Yuan	81b8d3586f	Update torch-xpu-ops pin (ATen XPU implementation) (#132390) Regular update. 1. 69 new ATen operators and variants are added. See https://github.com/intel/torch-xpu-ops/blob/main/yaml/xpu_functions.yaml. 2. Align with in-tree PyTorch by using safe data pointer access APIs. 3. Enable FP64 conversion emulation for some platforms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132390 Approved by: https://github.com/EikanWang	2024-08-04 02:22:46 +00:00
CaoE	6ec4af6865	[Inductor][CPP] Add vectorization support for double (#131886) Before: ``` extern "C" void kernel(const double* in_ptr0, double* out_ptr0) { #pragma omp parallel num_threads(112) { int tid = omp_get_thread_num(); { #pragma omp for for(long x0=static_cast<long>(0L); x0<static_cast<long>(1024L); x0+=static_cast<long>(1L)) { auto tmp0 = in_ptr0[static_cast<long>(x0)]; auto tmp1 = decltype(tmp0)(tmp0 * tmp0); out_ptr0[static_cast<long>(x0)] = tmp1; } } } } ``` After: ``` extern "C" void kernel(const double* in_ptr0, double* out_ptr0) { #pragma omp parallel num_threads(112) { int tid = omp_get_thread_num(); { #pragma omp for for(long x0=static_cast<long>(0L); x0<static_cast<long>(1024L); x0+=static_cast<long>(16L)) { auto tmp0 = at::vec::VectorizedN<double,2>::loadu(in_ptr0 + static_cast<long>(x0), 16); auto tmp1 = tmp0 * tmp0; tmp1.store(out_ptr0 + static_cast<long>(x0), 16); } } } } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131886 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-08-04 02:13:21 +00:00
PyTorch MergeBot	d984105748	Revert "[export] Convert autocast to HOO (#131914)" This reverts commit `b28c01d90d`. Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/ezyang due to failing lint, which was covered up by a lint failure on master ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2267248773))	2024-08-04 02:10:35 +00:00
Adnan Akhundov	6c65fd0394	[inductor] Add type hints to functions in mkldnn_fusion.py (#131820) Summary: ATT Test Plan: lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/131820 Approved by: https://github.com/eellison	2024-08-03 22:11:47 +00:00
cyy	bc46f205c4	[15/N] Fix clang-tidy warnings in jit (#132564) Follows #132477 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132564 Approved by: https://github.com/Skylion007	2024-08-03 19:33:24 +00:00
PyTorch MergeBot	00097f3458	Revert "C++ network flow implementation in c10 (#132188)" This reverts commit `dccce77935`. Reverted https://github.com/pytorch/pytorch/pull/132188 on behalf of https://github.com/ZainRizvi due to Sorry, but this appears to be failing internal tests. Please see D60702564 to investigate ([comment](https://github.com/pytorch/pytorch/pull/132188#issuecomment-2267098420))	2024-08-03 18:44:28 +00:00
Xu Han	e3387c6712	[inductor] use uint64_t in place of long to add Windows support. (#132491) The size of the `long` type differs between Windows and Linux. This PR uses `int64_t` instead of `long` on Windows, and the `LL` suffix is used to initialize `int64_t` values. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132491 Approved by: https://github.com/malfet	2024-08-03 18:38:30 +00:00
Yanbo Liang	bbce517221	[Inductor][FlexAttention] TestFlexAttention -> TestFlexDecoding (#132547) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132547 Approved by: https://github.com/Chillee ghstack dependencies: #132015	2024-08-03 17:26:44 +00:00
PyTorch MergeBot	21d02f8b4b	Revert "[easy] fix f-string messages in torch/_ops.py (#132531)" This reverts commit `25903f3932`. Reverted https://github.com/pytorch/pytorch/pull/132531 on behalf of https://github.com/davidberard98 due to breaking lint and tests because of a conflict with 132377 ([comment](https://github.com/pytorch/pytorch/pull/132531#issuecomment-2266743391))	2024-08-03 14:49:07 +00:00
Pian Pawakapan	a896fb1b36	check unsupported sympy functions for runtime asserts (#132457) Some sympy Functions aren't supported by sympy_interp(), so we can't turn them into FX nodes. For that reason, the runtime asserts CSE pass currently avoids CSE'ing any expression that contains a sympy Function. https://github.com/pytorch/pytorch/pull/132325 started tracking unsupported functions, so we switch the check to use that tracking, which is more precise. We also check for and skip unsupported functions when adding asserts. Previously, we only did the check for CSE and not when adding new expressions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132457 Approved by: https://github.com/avikchaudhuri	2024-08-03 10:17:25 +00:00
Xuehai Pan	0e7e61f7ce	Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690) This PR is split from PR #126898. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-08-03 09:43:38 +00:00
Jiashen Cao	159d508f03	[Fix]: prim::If with multiple outputs and input return directly (#131779) #### Issue The test does not work for prim::Loop with multiple outputs. This PR also fixes an issue where an input is returned directly, which HigherOrderOp does not support. #### Test Plan `pytest test/export/test_converter.py -s -k test_convert_if_multiple_out` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131779 Approved by: https://github.com/angelayi, https://github.com/SherlockNoMad	2024-08-03 08:07:21 +00:00
Xu Han	36ec0fdf10	[inductor] check compiler exists on Windows. (#132533) In the current Windows environment, if the MSVC environment is not activated, no clear error pointing to the compiler is raised: (screenshot: https://github.com/user-attachments/assets/725ea608-d181-40b1-8930-42fe2b32643a) With this PR, users are told that the issue comes from the compiler: (screenshot: https://github.com/user-attachments/assets/8515a796-e3e9-4909-a68f-8a14d4864951) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132533 Approved by: https://github.com/jansel	2024-08-03 07:47:11 +00:00
Adnan Akhundov	8ad9f89ccc	[inductor] Reland: Add flag to ignore unsupported @triton.autotune args in user-written kernel compilation (#132562) Summary: This is a reland attempt of [#131431](https://github.com/pytorch/pytorch/pull/131431), because the PR caused issues internally in its original form. We currently don't support some of the `triton.autotune` arguments when compiling user-written Triton kernels with PT2. This PR adds a flag to circumvent that limitation and unblock internal compilation in some cases. The flag's documentation explains why setting it is not a good idea. Test Plan: ``` python test/inductor/test_triton_kernels.py -k test_triton_kernel_autotune_with_unsupported_args ... ---------------------------------------------------------------------- Ran 3 tests in 3.636s OK ``` Differential Revision: D60701839 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132562 Approved by: https://github.com/chenyang78	2024-08-03 06:31:28 +00:00
Animesh Jain	06581c277a	[dynamo][stable-diffusion] Support dict(obj) on constrained subclasses of dict and OrderedDict (#132558) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132558 Approved by: https://github.com/jansel	2024-08-03 06:31:00 +00:00
Shangdi Yu	b28c01d90d	[export] Convert autocast to HOO (#131914) Summary: Suggested in https://github.com/pytorch/pytorch/issues/128394. If there's an autocast context manager, the predispatch (strict) graph can look something like: ``` class <lambda>(torch.nn.Module): def forward(self, x: "f32[1]"): ... _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None) mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None return (mm_1,) ``` However, the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and creating a submodule for the blocks between `_enter_autocast` and `_exit_autocast`. Some potential follow-up improvements: 1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py` 2) Check the current autocast status (any enabled? dtype?) and skip creating a submodule if the autocast args match the current autocast status. Test Plan: CI ``` parsh --build-flags fbcode//mode/dev-nosan fbcode//caffe2/test:test_export run_tests("test_predispatch_autocast") ``` Reviewed By: angelayi Differential Revision: D60206382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914 Approved by: https://github.com/angelayi	2024-08-03 05:48:57 +00:00
Avik Chaudhuri	ed4493de0e	dim name is identifier (#132557) Summary: Dim names appear in suggested fixes, so they should be valid Python identifiers. Test Plan: none Differential Revision: D60696854 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132557 Approved by: https://github.com/pianpwk	2024-08-03 05:28:50 +00:00
Edward Z. Yang	1f5dfe00da	Subtracer should always be real to inherit fake/real tensors from parent config (#132488) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/132488 Approved by: https://github.com/zou3519	2024-08-03 04:55:42 +00:00
Justin Chu	6966d44eda	[ONNX] Rename _internal/exporter to _exporter_legacy (#132429) The next PR will create an `exporter` directory to house logic from `torch-onnx`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132429 Approved by: https://github.com/titaiwangms	2024-08-03 04:23:05 +00:00
David Berard	5973aec671	[fx] python_code(verbose=True): show size/strides for all tensors (#132192) python_code(verbose=True) (or print_readable()) generates a string of code representing the fx graph, with extra annotations that give the sizes and strides of tensors. Currently, it only shows sizes/strides for FakeTensors provided in metadata. For subclass tensors like NestedTensor, the outer class (provided in the node metadata) will be a non-FakeTensor and the inner tensors will be fake. This PR expands the conditional to show sizes/strides for all tensors, not just FakeTensors. Testing: I ran the test script below with `TORCH_LOGS=+dynamo` and found the graph shown below in the logs. The input nested tensor has sizes and strides associated with it. I also stacked a diff on top of this one that forces the readable graph to be generated whenever PT2 is used in tests, which should help surface any issues. https://github.com/pytorch/pytorch/pull/132195 shows no significant failures beyond preexisting ones. test script: ```python import torch def fn(x): return x.cos() nt = torch.nested.nested_tensor_from_jagged( torch.randn(10, 10), torch.tensor([0, 1, 3, 6, 10]), ) torch.compile(fn)(nt) ``` logs excerpt: ``` [0/0] [__graph_code] TRACED GRAPH [0/0] [__graph_code] ===== __compiled_fn_1 ===== [0/0] [__graph_code] /data/users/dberard/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.M [0/0] [__graph_code] def forward(self, L_x_: "f32[4, zf1, 10][10zf1, 10, 1]cpu", zf1: "Sym(zf1)"): [0/0] [__graph_code] l_x_ = L_x_ [0/0] [__graph_code] [0/0] [__graph_code] # File: /data/users/dberard/scripts/nt_print_graph.py:4 in fn, code: return x.c [0/0] [__graph_code] cos: "f32[4, zf1, 10][10zf1, 10, 1]cpu" = l_x_.cos(); l_x_ = None [0/0] [__graph_code] return (cos,) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/132192 Approved by: https://github.com/Chillee	2024-08-03 02:54:32 +00:00
Ivan Zaitsev	0b571b1058	[codemod][pyre] Add missing Pyre mode headers (#132548) Reviewed By: connernilsen Differential Revision: D59849027 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132548 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi	2024-08-03 02:32:53 +00:00
Yanbo Liang	373e9be457	[Inductor][FlexAttention] Add kwarg to top level for users to specify kernel params (#132015) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132015 Approved by: https://github.com/Chillee	2024-08-03 02:27:02 +00:00
David Berard	25903f3932	[easy] fix f-string messages in torch/_ops.py (#132531) I encountered these when making this change: ``` diff --git a/test/functorch/test_ac.py b/test/functorch/test_ac.py index 3a2e07fa147..a4d003399e7 100644 --- a/test/functorch/test_ac.py +++ b/test/functorch/test_ac.py @@ -259,15 +259,8 @@ class MemoryBudgetTest(TestCase): expected = call() for budget in range(0, 11): - memory_budget = budget / 10 - torch._dynamo.reset() - with config.patch(activation_memory_budget=memory_budget): - if memory_budget is not None: - f_compile = torch.compile( - call, backend="aot_eager_decomp_partition" - ) - - self.assertEqual(expected, f_compile()) + get_mem_and_flops(call, memory_budget=budget / 10) + def test_prioritize_cheaper_matmul(self): def f(xs, ws): ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/132531 Approved by: https://github.com/Skylion007 ghstack dependencies: #132356, #132466	2024-08-03 02:23:44 +00:00
Animesh Jain	419b76c4ac	[dynamo] Reland 132308, 132314, 132318, 132334 - Make builtin nn modules attributes static (#132539) Relanding 4 PRs ending at https://github.com/pytorch/pytorch/pull/132334 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132539 Approved by: https://github.com/Skylion007, https://github.com/yanboliang, https://github.com/mlazos	2024-08-03 02:08:22 +00:00
Ivan Zaitsev	841cadd555	Fix discrepancies from 129973 (#132545) #129973 ([D59132793](https://www.internalfb.com/diff/D59132793)) was exported without its changes to `test/cpp/jit/CMakeLists.txt`, and this PR remediates that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132545 Approved by: https://github.com/kit1980	2024-08-03 01:57:49 +00:00
Eli Uriegas	243a763e1b	ci: Remove split-build CUDA testing from pull.yml (#132537) This testing is already represented in trunk.yml, so including it in pull.yml seems redundant. I've been observing a large spike in our usage of `g3.4xlarge`, which seems to correspond to these builds in particular. This PR therefore removes them from `pull.yml`, since they are already covered in `trunk.yml`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132537 Approved by: https://github.com/ZainRizvi, https://github.com/malfet	2024-08-03 01:24:17 +00:00
Shangdi Yu	a503136583	[export] Detect whether case_name is registered in exportdb (#132420) Summary: - Moves logging functionality into the `torch/_export/db/logging.py` file. - Adds a check in `_dynamo/eval_frame.py` that detects optional inputs and errors out with `UnsupportedError`. - Changes the case name of `torch_sym_int` to `unsupported_operator`. - Checks whether the case name is registered in exportdb and, if so, gives a link to the case in exportdb. - TODO: add test Test Plan: CI Running the example in https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input gives the following error logging: ``` E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] Parameter y is optional with a default value of tensor([[-0.1633, 1.2414, -0.1071], E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] [-0.1936, -0.9425, -0.0824]]) E0730 10:53:33.688000 4155538 torch/export/_trace.py:1043] See optional_input in exportdb for unsupported case. https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input ...... File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/389acaeb40d57230/tutorials/pytorch/nntest/__torchtest__/torchtest#link-tree/torch/_dynamo/eval_frame.py", line 1091, in produce_matching raise Unsupported( torch._dynamo.exc.Unsupported: Tracing through optional input is not supported yet ``` It also logs an `export.error.classified` event in Scuba. Reviewed By: zhxchen17 Differential Revision: D60427208 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132420 Approved by: https://github.com/zhxchen17	2024-08-03 01:08:48 +00:00
Joel Schlosser	64720f3b89	Introduce checks to validate public API tests (#131390) This PR introduces a new sanity check for the public API tests in `.ci/pytorch/test.sh`. * Validates two public API tests: 1. Ensures `test_correct_module_names` fails when a new file OR an existing file adds an invalid public API function (e.g. one whose `__module__` is unset). 2. Ensures `test_modules_can_be_imported` fails when a module underneath `torch/` cannot be imported. * Runs this check in CI just before the pre-existing FC / BC checks. I've verified that re-introducing the bug that #131386 fixed causes the new check to fail: ![public_api_failure](https://github.com/user-attachments/assets/376ddef3-d14a-41f6-93e2-f935deb6555a) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131390 Approved by: https://github.com/albanD	2024-08-03 00:29:00 +00:00
cyy	fcef6cc6d1	[13/N] Fix clang-tidy warnings in jit (#132477) Follows #132209 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132477 Approved by: https://github.com/Skylion007	2024-08-03 00:13:18 +00:00
Shivam Raikundalia	705ac311aa	Fix Distributed EventList usage (#132448) Summary: Summarized here: https://github.com/pytorch/pytorch/issues/132227 Test Plan: Use the suggestion in the issue; the test should pass again. Differential Revision: D60614690 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132448 Approved by: https://github.com/aaronenyeshi	2024-08-02 23:55:31 +00:00
Sherlock Huang	e3513fb2af	[ts_converter] handle python list append, list add, aten.to.dtype+mutation_op pattern (#132529) Summary: #### Description Add support for aten::append with a python function that returns a new list with the appended element. We then update the `fx_node` in the `name_to_node` mapping. aten::append contributed by Jiashen Cao <jiashenc@meta.com> Fix conversion for csr_ranker_test ``` model_name: csr_ranker_test_4.ptl has_ts_model: True has_sample_inputs: True ops_maybe_missing_meta: set() script_objects: set() ts_can_run: True ts_run_exception: None can_convert: True convert_exception: None ep_result_correct: True ep_run_exception: None can_package: True package_exception: None sigmoid_can_run: False sigmoid_run_exception: RuntimeError('not for symbolics') sigmoid_result_correct: None ``` Test Plan: test_aten_add_t test_aten_append_t test_aten_to_dtype_with_mutating_storage buck2 run mode/opt sigmoid/inference/ts_migration:main -- --mode test_one --model_name csr_ranker_test Differential Revision: D60635893 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132529 Approved by: https://github.com/jiashenC	2024-08-02 23:32:37 +00:00
David Berard	85f19ce14a	Support meta["val"] that is a dict, for triton kernels and for the partitioner (#132466) Internally, there's a model that uses memory_budget with the partitioner together with custom triton kernels. The partitioner fails when it encounters the triton ops because they don't have `meta["val"]`. This PR adds `meta["val"]` to these fx graph nodes and then adds handling for `meta["val"]` being a dict in the partitioner. Differential Revision: [D60627813](https://our.internmc.facebook.com/intern/diff/D60627813) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132466 Approved by: https://github.com/zou3519 ghstack dependencies: #132356	2024-08-02 23:24:29 +00:00
Shivam Raikundalia	bcac71517c	[Profiler] Test Logging for Empty Traces (#132444) Summary: Tests D60311331. Please see that diff for an explanation. Test Plan: This diff adds a test itself. Reviewed By: aaronenyeshi Differential Revision: D60311555 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132444 Approved by: https://github.com/aaronenyeshi	2024-08-02 22:04:15 +00:00
David Berard	1962f9475f	[NJT][flop counter] attention: if offsets are fake, use max seqlen (#132356) The flop counter is used by the partitioner, in which case the tensors passed in can be fake. The flop computations for nested attention use the offsets to determine the actual amount of compute that will be done. When the offsets are fake, however, we end up with unbacked symints (from `(offsets[1:] - offsets[:-1]).tolist()`). If the offsets are fake or functional tensors, we use the max sequence length instead. Repro: https://gist.github.com/davidberard98/903fb3e586edb6d1d466786e1a610eba Differential Revision: [D60597463](https://our.internmc.facebook.com/intern/diff/D60597463) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132356 Approved by: https://github.com/soulitzer	2024-08-02 20:42:29 +00:00
Will Constable	37c3d503b7	[pipelining] Make test_schedule quiet (#132369) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132369 Approved by: https://github.com/H-Huang ghstack dependencies: #129810, #130378	2024-08-02 20:38:17 +00:00
Will Constable	7c1cca9fda	[pipelining] Add schedule send/recv pass (#130378) Inserts send/recv ops where needed in a compute-only pipeline schedule. Any F or B action requires a recv op for its input and a send op for its output, except at the ends of the pipeline. To avoid hangs caused by mixed-up orderings of sends/recvs across ranks, we pick one compute action at a time and insert both its send op (on that rank's schedule) and the matching recv op for the recipient stage (on the schedule of the rank that owns that stage). TODO: Currently ignores a couple of edge cases - ignores batching (which is an optimization) - ignores cases where a stage sends to another stage on the same rank, which should skip the send/recv and directly access memory Pull Request resolved: https://github.com/pytorch/pytorch/pull/130378 Approved by: https://github.com/H-Huang ghstack dependencies: #129810	2024-08-02 20:38:17 +00:00
Will Constable	625f494619	[Pipelining] Add schedule unshard/reshard pass (#129810) Adds FSDP unshard/reshard ops to a compute-only schedule. Operates on one pp-rank's schedule at a time, since FSDP needs no cross-pp-rank coordination. (Unshard/reshard happens across DP ranks within a PP group.) Uses a heuristic that examines the next N stages this rank will run compute operations on, evicting (resharding) and fetching (unsharding) ahead of time so that unshard operations have a chance to overlap with compute and PP comms. - this heuristic has not been validated and may not be optimal Assumes that it's fine to add the UNSHARD/RESHARD actions to the schedule regardless of whether FSDP will actually be used. - this way, users do not have to tell us at PP schedule creation time whether they plan to use FSDP or DDP - it is trivial to implement UNSHARD/RESHARD as no-ops inside the runtime if FSDP is not detected on the stage module TODO - also add FSDP's reduce-scatter? Or is it sufficient to leave this handled by PipelineStage at 'last backward' time? - validate the 'next N stages' heuristic and expose an API if needed - add an e2e test Co-authored-by: Howard Huang <howardhuang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129810 Approved by: https://github.com/kwen2501, https://github.com/H-Huang	2024-08-02 20:38:17 +00:00
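Commit e3387c6712 (#132491) replaces `long` with a fixed-width type because the width of `long` depends on the platform's data model. 64-bit Windows uses LLP64, where `long` is 32 bits, and 64-bit Linux uses LP64, where `long` is 64 bits. The illustrative sketch below is not part of the PR. It uses Python's standard `ctypes` module to report these widths on the machine that runs it, and the helper name `c_type_widths` is hypothetical.

```python
import ctypes


def c_type_widths() -> dict:
    """Illustrative helper (not from the PR): bit widths of C integer types on this platform."""
    return {
        "long": ctypes.sizeof(ctypes.c_long) * 8,  # 32 on 64-bit Windows (LLP64), 64 on 64-bit Linux (LP64)
        "long long": ctypes.sizeof(ctypes.c_longlong) * 8,  # at least 64 everywhere
        "int64_t": ctypes.sizeof(ctypes.c_int64) * 8,  # exactly 64 by definition
    }


if __name__ == "__main__":
    for name, bits in c_type_widths().items():
        print(f"{name}: {bits} bits")
```

On 64-bit Windows this should report 32 bits for `long`, and on 64-bit Linux it should report 64 bits. `int64_t` is 64 bits on both. Similarly, an integer literal with the `LL` suffix has type `long long`, which the C and C++ standards guarantee is at least 64 bits wide.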
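Commit ed4493de0e (#132557) requires dim names to be valid Python identifiers because they appear in suggested fixes. The hypothetical helper `to_dim_identifier` below is a rough illustration only, not the PR's implementation. It uses the standard `re` and `keyword` modules to sanitize an arbitrary string into an ASCII identifier.

```python
import keyword
import re


def to_dim_identifier(name: str) -> str:
    """Hypothetical helper: turn an arbitrary string into a valid Python identifier."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    candidate = re.sub(r"\W", "_", name, flags=re.ASCII)
    # Identifiers cannot be empty or start with a digit.
    if not candidate or candidate[0].isdigit():
        candidate = "_" + candidate
    # Reserved words such as `class` or `None` cannot be used as names.
    if keyword.iskeyword(candidate):
        candidate += "_"
    return candidate


if __name__ == "__main__":
    for raw in ["batch", "batch-size", "0dim", "class"]:
        print(f"{raw!r} -> {to_dim_identifier(raw)!r}")
```

The `re.ASCII` flag limits `\W` to ASCII word characters. Without it, some Unicode characters that count as "word" characters would pass through even though they are not valid in identifiers.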
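Commit a503136583 (#132420) adds a check that errors out when the traced callable has an optional input, which produces log lines such as `Parameter y is optional with a default value of ...`. A minimal sketch of that kind of check, built on the standard `inspect` module, follows. The exception class `UnsupportedOptionalInput` and the function `check_no_optional_inputs` are hypothetical stand-ins, not the Dynamo or exportdb code.

```python
import inspect
from typing import Callable


class UnsupportedOptionalInput(Exception):
    """Hypothetical stand-in for the error raised on optional inputs."""


def check_no_optional_inputs(fn: Callable) -> None:
    """Hypothetical check: raise if any parameter of `fn` declares a default value."""
    for name, param in inspect.signature(fn).parameters.items():
        if param.default is not inspect.Parameter.empty:
            raise UnsupportedOptionalInput(
                f"Parameter {name} is optional with a default value of {param.default!r}"
            )


def forward_required(x, y):
    return x + y


def forward_optional(x, y=3):
    return x + y


if __name__ == "__main__":
    check_no_optional_inputs(forward_required)  # passes silently
    try:
        check_no_optional_inputs(forward_optional)
    except UnsupportedOptionalInput as err:
        print(err)
```

Running it prints the message for `forward_optional` only. `forward_required` has no defaults, so the check passes silently.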
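Commit 1962f9475f (#132356) falls back to the maximum sequence length when the nested-tensor offsets are fake. The pure-Python sketch below shows why that substitute is safe. With real offsets, forward attention FLOPs are summed over the actual per-sequence lengths. Replacing every length with `max_seqlen` gives an upper bound. The function names are hypothetical, and `None` stands in for "offsets are fake". The cost model is also an assumption, not the exact formula in `torch.utils.flop_counter`: it counts 2 FLOPs per multiply-add for both `Q @ K^T` and `attn @ V`, with equal query and value head dimensions.

```python
from typing import Optional, Sequence


def attention_fwd_flops(seq_len: int, num_heads: int, head_dim: int) -> int:
    """Assumed cost model: Q @ K^T plus attn @ V, each 2 * s * s * d FLOPs per head."""
    return 2 * (2 * seq_len * seq_len * head_dim) * num_heads


def nested_attention_flops(
    offsets: Optional[Sequence[int]],
    batch_size: int,
    max_seqlen: int,
    num_heads: int,
    head_dim: int,
) -> int:
    """Hypothetical helper: sum attention FLOPs over a jagged batch."""
    if offsets is not None:
        # Real offsets: per-sequence lengths are differences of consecutive offsets.
        lengths = [end - start for start, end in zip(offsets[:-1], offsets[1:])]
    else:
        # Offsets unavailable (e.g. fake tensors): assume every sequence is max_seqlen long.
        lengths = [max_seqlen] * batch_size
    return sum(attention_fwd_flops(s, num_heads, head_dim) for s in lengths)


if __name__ == "__main__":
    offsets = [0, 1, 3, 6, 10]  # four sequences of lengths 1, 2, 3, 4
    exact = nested_attention_flops(offsets, 4, 4, num_heads=2, head_dim=8)
    bound = nested_attention_flops(None, 4, 4, num_heads=2, head_dim=8)
    print(exact, bound)
```

With these assumed dimensions, the script prints `1920 4096`. The fallback overestimates because it treats all four sequences as length 4. That overestimate is the conservative direction for a partitioner's cost estimate.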
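The ordering argument in 7c1cca9fda (#130378) can be sketched in a few lines of pure Python. The sketch assumes that the compute actions of all ranks are available as one global, dependency-respecting order; this is an assumption of the sketch, not something the commit states. Under that assumption, each step appends the action's SEND to the sender's schedule and the matching RECV to the recipient's schedule at the same time, so every pair of ranks sees its sends and receives in the same relative order. The action encoding and the name `add_send_recv` are hypothetical. Like the commit's TODO, the sketch ignores batching and same-rank stage pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical action encoding: (kind, stage, microbatch), where kind is
# "F", "B", "SEND_F", "RECV_F", "SEND_B" or "RECV_B".
Action = Tuple[str, int, int]


def add_send_recv(
    compute_order: List[Action], stage_to_rank: Dict[int, int], num_stages: int
) -> Dict[int, List[Action]]:
    """Hypothetical pass: build per-rank schedules with paired send/recv ops."""
    schedules: Dict[int, List[Action]] = defaultdict(list)
    for kind, stage, mb in compute_order:
        rank = stage_to_rank[stage]
        schedules[rank].append((kind, stage, mb))
        if kind == "F" and stage < num_stages - 1:
            peer = stage + 1  # activations flow forward to the next stage
        elif kind == "B" and stage > 0:
            peer = stage - 1  # gradients flow backward to the previous stage
        else:
            continue  # first/last stage: nothing to send for this action
        # Insert the send and its matching recv together so orderings cannot diverge.
        schedules[rank].append((f"SEND_{kind}", stage, mb))
        schedules[stage_to_rank[peer]].append((f"RECV_{kind}", peer, mb))
    return dict(schedules)


if __name__ == "__main__":
    order = [("F", 0, 0), ("F", 1, 0), ("B", 1, 0), ("B", 0, 0)]
    for rank, actions in sorted(add_send_recv(order, {0: 0, 1: 1}, 2).items()):
        print(rank, actions)
```

Take the two-stage, one-microbatch example. Rank 0 ends up with `F, SEND_F, RECV_B, B`, and rank 1 ends up with `RECV_F, F, B, SEND_B`. Each RECV lands before the compute action that consumes it because the consumer appears later in the assumed global order.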
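Commit 625f494619 (#129810) describes a look-ahead heuristic for FSDP. It examines the next N stages this rank will compute on, unshards them ahead of time, and reshards stages that fall out of that window. The pure-Python sketch below is one hypothetical reading of that description, operating on a single rank's compute-only schedule. The function name, the `lookahead` parameter, and the `(kind, stage, microbatch)` encoding are assumptions, and the commit itself notes that the heuristic has not been validated.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical action encoding: (kind, stage, microbatch); UNSHARD/RESHARD carry no microbatch.
Action = Tuple[str, int, Optional[int]]


def add_unshard_reshard(rank_schedule: List[Action], lookahead: int) -> List[Action]:
    """Hypothetical pass: wrap one rank's compute actions with UNSHARD/RESHARD ops."""
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")
    out: List[Action] = []
    unsharded: Set[int] = set()
    for i, action in enumerate(rank_schedule):
        # The next `lookahead` distinct stages this rank will compute on, in order of use.
        window: List[int] = []
        for _, stage, _ in rank_schedule[i:]:
            if stage not in window:
                window.append(stage)
                if len(window) == lookahead:
                    break
        # Evict (reshard) stages that have left the window.
        for stage in sorted(unsharded - set(window)):
            out.append(("RESHARD", stage, None))
            unsharded.discard(stage)
        # Fetch (unshard) upcoming stages early so unsharding can overlap with compute.
        for stage in window:
            if stage not in unsharded:
                out.append(("UNSHARD", stage, None))
                unsharded.add(stage)
        out.append(action)
    # Reshard whatever is still unsharded once the schedule ends.
    for stage in sorted(unsharded):
        out.append(("RESHARD", stage, None))
    return out
```

As an example, consider a rank that hosts stages 0 and 2 and runs `F0, F2, B2, B0` for one microbatch. With `lookahead=1`, each stage is unsharded just before it is needed and resharded as soon as the other stage takes over. With `lookahead=2`, both stages stay unsharded through the middle of the schedule, which trades memory for more opportunity to overlap.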


76633 commits