pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
drisspg	42547f8d48	Add support for blackwell codegen (#141724 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141724 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/eqy	2024-12-03 20:34:43 +00:00
Mu-Chu Lee	8b0fcad0fd	[AOTInductor] Add update_constant_buffer pybind support (#140755 ) Summary: We add update_constant_buffer python support for testing purpose. Test Plan: Included in commit Differential Revision: D65968613 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140755 Approved by: https://github.com/22quinn	2024-12-03 20:34:25 +00:00
Ting Lu	e5f5283ab2	Fix cuda arch full version for 12.6 (#141976 ) follow up for https://github.com/pytorch/pytorch/pull/141433/files build still showing up as 12.6.2 in the name, see latest https://github.com/pytorch/pytorch/actions/runs/12134985224/job/33833276884. related to https://github.com/pytorch/pytorch/issues/138440 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141976 Approved by: https://github.com/atalman, https://github.com/nWEIdia, https://github.com/Skylion007	2024-12-03 20:33:01 +00:00
Fabian Keller	f472b3aee1	improve typings around torch.export (#141829 ) This is another follow-up to https://github.com/pytorch/pytorch/pull/115074 / https://github.com/pytorch/pytorch/pull/141240 following the strategy discussed there (https://github.com/pytorch/pytorch/pull/115074#issuecomment-2480992230). This PR improves the type annotations around `torch._export`. Even though the PR introduces a few runtime type asserts, the runtime behavior should stay equivalent, because the failed assertions should have been immediate crashes anyway. CC @Skylion007 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/141829 Approved by: https://github.com/ezyang	2024-12-03 19:57:21 +00:00
Bob Ren	43c5f59190	flip capture_autograd_function to default to true and warn if false (#141972 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141972 Approved by: https://github.com/zou3519 ghstack dependencies: #141932	2024-12-03 19:50:14 +00:00
angelayi	96a35716d1	[aoti] Improve OSSProxyExecutor error messages (#141501 ) For debugging issues like https://fb.workplace.com/groups/1028545332188949/permalink/1092584242451724/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/141501 Approved by: https://github.com/henrylhtsang	2024-12-03 19:32:49 +00:00
Colin L. Rice	6b620423a3	dynamo_timed: Add a log_waitcounter option. (#141402 ) This logs a waitcounter of the name pytorch.dynamo_timed.{key}. Primarily sending this now to make sure everyone likes the API, then I'll add tests, and migrate one dynamo_timed to use it. (likely starting with https://github.com/pytorch/pytorch/pull/141379). Testing is a bit harder, since we don't normally have any way to read _WaitCounter state AFAICT. I want to poke around and see if I can figure out a way to read the state, otherwise I'll just mock it to at least make sure it's mostly working. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141402 Approved by: https://github.com/jamesjwu, https://github.com/masnesral	2024-12-03 19:24:29 +00:00
drisspg	d35358b271	[FlexAttention] Remove failing num_warps=8 in bwds (#141653 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141653 Approved by: https://github.com/BoyuanFeng	2024-12-03 19:22:52 +00:00
dan_the_3rd	9125e9119c	Fix memory leak in `ModuleTracker` (#141960 ) Thanks @drisspg and @albanD for finding the fix TEST PLAN ``` import gc import torch import torch.nn as nn from torch.utils.module_tracker import ModuleTracker class MyModel(nn.Module): def forward(self, x): return x * x print(f"torch=={torch.__version__}") m = MyModel() m.cuda() m.to(torch.bfloat16) mt = ModuleTracker() for i in range(1000): if i % 100 == 0: gc.collect() print("memory_allocated:", torch.cuda.memory_allocated()) x = torch.randn([128, 256], device="cuda", dtype=torch.bfloat16, requires_grad=True) with mt: m(x) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141960 Approved by: https://github.com/albanD	2024-12-03 18:36:15 +00:00
iupaikov-amd	7bb2228ffd	Test cpp_wrapper_hipify string comparison (#141353 ) Updating the test to match this code that takes device warpsize into account: `cf1d95a965/torch/_inductor/codegen/cuda/device_op_overrides.py (L120)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141353 Approved by: https://github.com/desertfire	2024-12-03 18:25:32 +00:00
Chien-Chin Huang	8b5c26287d	Initialize lr as a tensor if it is originally a tensor (#141620 ) Fix https://github.com/pytorch/pytorch/issues/139575 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141620 Approved by: https://github.com/kwen2501	2024-12-03 18:10:23 +00:00
Uttam Thakore	314e08eb52	[fr_trace][bugfix] Log missing ranks when provided (#141924 ) Summary: For missing ranks issues, `build_collectives` doesn't log any errors (`5c2584a14c/tools/flight_recorder/components/builder.py (L293C23-L306C24)`), which means that when `EntryState.to_collective` is called [here](`5c2584a14c/tools/flight_recorder/components/builder.py (L400C21-L405C22)`), errors will be empty and `to_collective` will enter the first if statement. But that codepath doesn't log `missing_ranks`, meaning it will be absent from the `Collective` returned. This diff fixes that oversight. Test Plan: eyes Sandcastle run Differential Revision: D66679224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141924 Approved by: https://github.com/c-p-i-o	2024-12-03 17:54:43 +00:00
Andrew Gu	5c59f4a55a	Remove old FSDP1 `fully_shard` (#141875 ) FSDP1's `fully_shard` frontend was an exploration at the end of 2022 H2 as part of the `torch/distributed/_composable` APIs to avoid `nn.Module` wrappers. It calls into the same backend code as FSDP1's `FullyShardedDataParallel`. The API did not gain traction internally, so we instead reused the name `fully_shard` for FSDP2, which similarly is not an `nn.Module` wrapper and follows similar design principles as FSDP1's `fully_shard`. To the best of our knowledge, we have removed all instances of FSDP1's `fully_shard` internally, and we put the deprecation warning in open source in 2.4 saying it will be removed after 2.5 (which is now): `4959784dac/torch/distributed/_composable/fully_shard.py (L40-L48)` We are skipping the PR sanity check because this PR is only removing code, not adding new code, and should not require this sanity check. Differential Revision: [D66664988](https://our.internmc.facebook.com/intern/diff/D66664988) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141875 Approved by: https://github.com/weifengpy	2024-12-03 17:00:47 +00:00
rzou	ed4831b93c	Improve torch.library.opcheck and register_autograd docs (#141883 ) Fixes https://github.com/pytorch/pytorch/issues/141618 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141883 Approved by: https://github.com/albanD ghstack dependencies: #141894, #141880	2024-12-03 16:28:56 +00:00
rzou	827c322290	Make torch.library.triton_op public (#141880 ) We've been using it privately for half a year and everything's been good. This PR: 1. Makes torch.library.triton_op public 2. Renames capture_triton -> wrap_triton. We got feedback that no one knew what "capture triton" does. 3. Makes torch.library.wrap_triton public. triton_op is used to construct a Python custom operator that may call 1+ triton kernels. Each of those triton kernels must be annotated with wrap_triton. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/141880 Approved by: https://github.com/albanD ghstack dependencies: #141894	2024-12-03 16:28:56 +00:00
rzou	ac600fdce6	Type exposed_in decorator (#141894 ) Test Plan: - lintrunner Pull Request resolved: https://github.com/pytorch/pytorch/pull/141894 Approved by: https://github.com/albanD	2024-12-03 16:28:17 +00:00
Nikita Shulga	7a806a839d	[FP8] Expand MaskedSelect to float8 (#141928 ) Needed for printing those. Though I wonder if better solution would be to change those ops to use element size rather than actual type (to extend them automatically to unsigned integral types as well) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141928 Approved by: https://github.com/ezyang, https://github.com/jgong5	2024-12-03 15:14:26 +00:00
Xuehai Pan	78543e6002	[dynamo][pytree][1/N] make CXX pytree traceable: `tree_iter` / `tree_leaves` (#137397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137397 Approved by: https://github.com/jansel	2024-12-03 11:17:39 +00:00
Valentine233	9990b47ea3	[inductor][pattern matcher] revise mkldnn pattern matcher UT (#141334 ) Fixes #139970, #139812. Revise mkldnn pattern matcher UTs, to check the relevant specific matched patterns instead of the total matched number. 1) Add the missing specific counters in pattern matchers, e.g. `mkldnn_unary_fusion_matcher_nodes`/`mkldnn_conv_weight_pack_matcher_count`. 2) In UTs, change the general `matcher_count`/`matcher_nodes` checks to the specific ones, e.g. `mkldnn_unary_fusion_matcher_nodes`/`mkldnn_conv_weight_pack_matcher_count`. 3) In UTs, remove the option of `matcher_count`/`matcher_nodes` params in _test_common and make `matcher_check_fn` a necessary param. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141334 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jansel	2024-12-03 09:26:43 +00:00
Ryan Guo	ff73e2e679	[dynamo] Validate `mutation_type` and `source` in `VariableTracker.__init__` (#141717 ) As title, this also uncovered a few invalid use cases; the cases that cause error are fixed in separate patches prior to this patch, and the rest are fixed in this patch. This patch also moves a few `.source` mutation to variable construction, to increase the coverage of the validation. Fixes #133027. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141717 Approved by: https://github.com/jansel ghstack dependencies: #141713, #141714, #141715, #141902, #141716	2024-12-03 09:18:06 +00:00
Ryan Guo	0efd184685	[dynamo] Fix side effects for range iterator that escapes the graph (#141716 ) `wrap_range_iterator` mistakenly used `ValueMutationNew`, when it should've used `ValueMutationExisting`, because this code path always has a source. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141716 Approved by: https://github.com/jansel ghstack dependencies: #141713, #141714, #141715, #141902	2024-12-03 09:18:06 +00:00
Ryan Guo	7c3c8a662e	[dynamo] Add `RANGE_ITERATOR_MATCH` to properly guard on range iterators (#141902 ) A subsequent patch attempts to fix a side-effect issue for range iterators, which in turn exposed an existing issue on guards for range iterators -- the following test started failing: ``` PYTORCH_TEST_WITH_DYNAMO=1 python test/test_tensor_creation_ops.py TestTensorCreationCPU.test_hstack_column_stack_cpu_int16 ``` This patch adds a `RANGE_ITERATOR_MATCH` guard to make sure that we properly guard on range iterators, and adds a regression test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141902 Approved by: https://github.com/jansel ghstack dependencies: #141713, #141714, #141715	2024-12-03 09:18:06 +00:00
Ryan Guo	ff3f4a164c	[dynamo] Fix aliasing issue for `dict.copy` that escapes the graph (#141715 ) Dynamo accidentally passed the original `ConstDictVariable.source` to the result of `dict.copy(...)`, which caused aliasing issue when the result escapes the graph (e.g., is a return value). This patch fixes that and adds a regression test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141715 Approved by: https://github.com/jansel ghstack dependencies: #141713, #141714	2024-12-03 09:18:06 +00:00
Ryan Guo	9eb0520d75	[dynamo] Fix side-effect handling for pre-existing `collections.deque` (#141714 ) Previously we never replayed side effects to `DequeVariable` with a source; the bug was already in the `test_deque_input` test, but went unnoticed because we didn't check the deque objects. This patch adds limited but practical support for this (see comments in `side_effects.py` for why limited), and updates the deque tests to check for this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141714 Approved by: https://github.com/jansel ghstack dependencies: #141713	2024-12-03 09:18:06 +00:00
Ryan Guo	f2ce2d435b	[dynamo] Add test for returning a nested recursive function and update documentation (#141713 ) Addresses https://github.com/pytorch/pytorch/pull/137905#discussion_r1806923085. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141713 Approved by: https://github.com/jansel	2024-12-03 09:18:06 +00:00
Mwiza Kunda	f8a64c324e	Broadcast constants on vectorised stores in `CppTile2DKernel` (#140262 ) Currently constants are not broadcasted on vectorised stores in `CppTile2DKernel`. This leads to errors like the following: ```shell error:: request for member 'store' in 'tmp1', which is of non-class type 'signed char' 61 \| tmp1.store(tmp2 + static_cast<int64_t>(8L*x0_inner), static_cast<int64_t>(8)); \| ^~~~~ ``` This PR adds the required broadcasting. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140262 Approved by: https://github.com/jgong5	2024-12-03 09:15:17 +00:00
Bob Ren	e1e3bbc2e1	Set capture_autograd_function=False by default (#141932 ) https://github.com/pytorch/pytorch/pull/136959 cleaned up the flag and added a warning. @Chillee pointed out that we should really default this flag to false otherwise we subject all users that go down this code path to log spew. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141932 Approved by: https://github.com/jansel	2024-12-03 07:59:03 +00:00
Nikita Shulga	e499b46465	Speed up half tensors printing (#141927 ) This PR removes copycast of reduced precision types to float before printing, that was added in https://github.com/pytorch/pytorch/pull/14418 to probably unblock printing when many operations, like `is_nan` and `max` were not supported on CPUs (Reusing old test plan) Before the PR: ```python In [1]: import torch; a = torch.rand(1, 1700, 34, 50, dtype=torch.float16) In [2]: %timeit str(a) 621 μs ± 5.06 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) ``` after the PR ```python In [1]: import torch; a = torch.rand(1, 1700, 34, 50, dtype=torch.float16) In [2]: %timeit str(a) 449 μs ± 2.34 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) ``` Also, this allows one printing 15Gb Metal tensors on 32GB Mac machine: ``` % python3 -c "import torch;print(torch.empty(72250,72250, device='mps', dtype=torch.float16))" tensor([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]], device='mps:0', dtype=torch.float16) ``` Before this change it failed with non-descriptive ``` % python3 -c "import torch;print(torch.empty(72250,72250, device='mps', dtype=torch.float16))" Traceback (most recent call last): File "<string>", line 1, in <module> import torch;print(torch.empty(72250,72250, device='mps', dtype=torch.float16)) ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/malfet/git/pytorch/pytorch/torch/_tensor.py", line 568, in __repr__ return torch._tensor_str._str(self, tensor_contents=tensor_contents) ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/malfet/git/pytorch/pytorch/torch/_tensor_str.py", line 708, in _str return _str_intern(self, tensor_contents=tensor_contents) File "/Users/malfet/git/pytorch/pytorch/torch/_tensor_str.py", line 625, in _str_intern tensor_str = _tensor_str(self, indent) File "/Users/malfet/git/pytorch/pytorch/torch/_tensor_str.py", line 339, in _tensor_str self = self.float() RuntimeError: Invalid buffer size: 19.45 GB ``` Convert fp8 dtypes to float16, as float range is an overkill Pull Request resolved: https://github.com/pytorch/pytorch/pull/141927 Approved by: https://github.com/ezyang	2024-12-03 07:01:49 +00:00
Xiaozhu Meng	d035db3d86	[AMD] [submodule] aten.bmm CK-backend prototype (#140758 ) Summary: Early prototype of adding CK backend for aten.bmm. Currently, it is very limited in that: 1. BF16 only 2. A single CK instance 3. NT layout only 4. Alpha=1, Beta=0 only Reviewed By: xw285cornell, zjing14 Differential Revision: D65954695 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140758 Approved by: https://github.com/bradleyhd	2024-12-03 06:54:51 +00:00
Edward Z. Yang	6afcec0c58	Assert is GraphModule in compile_fx_aot (#141575 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141575 Approved by: https://github.com/Skylion007, https://github.com/desertfire	2024-12-03 05:39:44 +00:00
PyTorch MergeBot	ce86119503	Revert "Set remote cache version and backend type once in compilation metrics (#141707 )" This reverts commit `d633cf1f55`. Reverted https://github.com/pytorch/pytorch/pull/141707 on behalf of https://github.com/malfet due to It breaks tests by referencing FbRemoteFxGraphCache, but CI was green ([comment](https://github.com/pytorch/pytorch/pull/141707#issuecomment-2513555185))	2024-12-03 05:01:02 +00:00
PyTorch MergeBot	2999dbfd21	Revert "[REFACTOR] Inline FxGraphCache.post_compile into sole call site (#141877 )" This reverts commit `3ab4a28eaa`. Reverted https://github.com/pytorch/pytorch/pull/141877 on behalf of https://github.com/huydhn due to Job are failing en masse after this lands, so it looks like a land race ([comment](https://github.com/pytorch/pytorch/pull/141877#issuecomment-2513552752))	2024-12-03 04:57:58 +00:00
Nikita Shulga	38bbe37187	Enable CI on SM89 (#140305 ) Using EC2 G6 instance, based on NVIDIA L4, added to scale config in https://github.com/pytorch/test-infra/pull/5376 To enable more balanced sharding, had to push `148ae19935` Added `@xfailIfSM89` to the following tests: - test_fp8_pattern_2 - test_original_aten_preserved_split_addmm - test_sparse_semi_structured_scaled_mm - test_sparse_semi_structured_scaled_mm_fp8 - test_sparse_fp8fp8_mm Increased tolerance to 2e-4 for `RNNTest.BidirectionalMultilayerGRU_CPU_vs_CUDA` Skipped the following inductor tests (that either flakily OOM or time out): - test_reduction_fn_std_float64 - test_reduction_fn_var_mean_float64 - test_multi_output_unbacked_custom_op Pull Request resolved: https://github.com/pytorch/pytorch/pull/140305 Approved by: https://github.com/wdvr, https://github.com/ZainRizvi	2024-12-03 04:49:46 +00:00
chilli	af88326250	Ensure that BlockMask length must always exactly match the sequence length in flex_attention (#141625 ) Fixes https://github.com/pytorch/pytorch/issues/141435 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141625 Approved by: https://github.com/drisspg ghstack dependencies: #138788	2024-12-03 04:45:05 +00:00
Yidi Wu	9cfc9e636d	[while_loop] change to guard_equals for checking output and carry (#141734 ) Inputs with the same shape can be represented with different symbols, e.g. ```python def body_fn(a, b): return b.sin(), a.sin() ``` , where a = torch.randn(3, 4), b = torch.randn(3, 4). There could be 4 symbols allocated for a and b. So instead of requiring that the symbols of their shapes and strides be identical, we just use guard_equals to enforce the constraint. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141734 Approved by: https://github.com/zou3519, https://github.com/eellison	2024-12-03 04:00:21 +00:00
Thomas Bohnstingl	871b93bc59	[associative_scan] Fixing shape checks (#141698 ) This PR fixes the shape checks that are done in the associative_scan operation. Previously, all shapes of the input leaves were required to be the same. With this PR, only the shapes of the output of the combine_fn and the input leaves need to match; the input leaves no longer need to match each other. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141698 Approved by: https://github.com/ydwu4	2024-12-03 03:49:11 +00:00
Edward Z. Yang	3ab4a28eaa	[REFACTOR] Inline FxGraphCache.post_compile into sole call site (#141877 ) I am going to break apart the arguments passed to the constituents to only pass exactly what is needed, so easy access to the insides is helpful here. This also moves two helper functions to output_code.py as well. Also set _boxed_call at constructor. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141877 Approved by: https://github.com/jamesjwu, https://github.com/jansel Co-authored-by: James Wu <jjwu@meta.com>	2024-12-03 03:48:23 +00:00
Mikayla Gawarecki	ecbb8a8800	Mention version of flip in weights_only error message (#141304 ) Fixes https://github.com/pytorch/pytorch/issues/141139 How the 3 versions of the error message now look ### Version 1 Old error message: ``` _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. (1) Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. WeightsUnpickler error: Unsupported global: GLOBAL __main__._rebuild_class_that_uses_build_instruction was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_rebuild_class_that_uses_build_instruction])` or the `torch.serialization.safe_globals([_rebuild_class_that_uses_build_instruction])` context manager to allowlist this global if you trust this class/function. Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. ``` New error message: ``` _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. WeightsUnpickler error: Unsupported global: GLOBAL __main__._rebuild_class_that_uses_build_instruction was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_rebuild_class_that_uses_build_instruction])` or the `torch.serialization.safe_globals([_rebuild_class_that_uses_build_instruction])` context manager to allowlist this global if you trust this class/function. Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. ``` ### Version 2 Old error message: ``` _pickle.UnpicklingError: Weights only load failed. ``torch.nested`` and ``torch._dynamo`` must be imported to load nested jagged tensors (NJTs) ``` New error message: ``` _pickle.UnpicklingError: Weights only load failed. ``torch.nested`` and ``torch._dynamo`` must be imported to load nested jagged tensors (NJTs) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. ``` ### Version 3 Old error message ``` _pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. Trying to load unsupported GLOBAL posix.execv whose module posix is blocked. Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. ``` New error message ``` _pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. Trying to load unsupported GLOBAL posix.execv whose module posix is blocked. Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141304 Approved by: https://github.com/zou3519	2024-12-03 03:26:27 +00:00
Michal Gallus	4cbb3b4bd2	[ROCm] Enable finding HIP and ROCm libraries on Windows (#137279 ) This PR introduces support for finding HIP-SDK Libraries on Windows. Since reading the code changes using the diff view is a bit cumbersome due to the introduced if branch, let me explain what was changed: - The Linux-specific steps to find HIP packages have been moved into an `if(UNIX)` block - Windows steps follow in the `else()` clause The separation was needed because of several factors: - HIP SDK for Windows typically names its components using `hip` in their names (for example: `hip_version.h` instead of `rocm_version.h`, `HIP_VERSION_DEV_MAJOR` instead of `ROCM_VERSION_DEV_MAJOR`, etc.), - The libraries included in HIP SDK are only a subset of what is available in Linux ROCm (missing hsa-rt, rccl, roctx) - MIOpen isn't a part of HIP SDK, but can be built separately and as of now requires an additional path to be defined using an env var. - Windows can only find the hip package in version greater than 1.0 and its libraries if the lowercase `find_package(hip ...)` is invoked first. This is because the lowercase `hip` name will cause the mechanism to find hip's packages using [config mode](https://cmake.org/cmake/help/latest/command/find_package.html#search-modes) which is the only one supported on Windows, assuming we also want to [include its libraries](https://rocm.docs.amd.com/en/latest/conceptual/cmake-packages.html#consuming-the-hip-api-in-c-code). The upper-case module-mode-searched `find_package(HIP)` is used later for inclusion of macros such as `hip_add_library` and related macros. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137279 Approved by: https://github.com/jeffdaily	2024-12-03 03:26:01 +00:00
eellison	33573488d0	Make Dtypepropagation singleton (#141882 ) Should fix compile time regression, it was doing fairly expensive meta programming in init and being instantiated multiple times. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141882 Approved by: https://github.com/ezyang ghstack dependencies: #139945, #140057, #141495	2024-12-03 03:15:16 +00:00
Benjamin Glass	f911361de1	Correctly specify size of sparse_csr tensors in maskedtensor binary ops (#134335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134335 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2024-12-03 02:55:57 +00:00
Aaron Gokaslan	08db735629	[BE]: Update mypy to 1.13.0 (#140808 ) Update mypy to 1.13.0 . Should hopefully reduce linting time. Has support for orjson cache serialization which should improve mypy cache perf if orjson is installed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808 Approved by: https://github.com/ezyang, https://github.com/malfet	2024-12-03 02:50:10 +00:00
Guilherme Leobas	34127fc688	Only reconstruct dict if needed (#141606 ) Fixes #141452 This is a follow-up of PR #134876, which optimized dict reconstruct to codegen only if any value changed. In this PR we cover the general case and do not codegen any instruction if the dictionary remains the same. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141606 Approved by: https://github.com/zou3519	2024-12-03 02:22:34 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	a6bea3d86d	Fix DCE in training IR to reflect correct record function op (#141899 ) Summary: The exit function is actually exit._recordFunction, not exit.default Test Plan: CI Differential Revision: D66665359 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141899 Approved by: https://github.com/ydwu4	2024-12-03 01:59:37 +00:00
James Wu	d633cf1f55	Set remote cache version and backend type once in compilation metrics (#141707 ) This is causing FbFxGraphRemoteCache.init to no longer be idempotent, i.e. only safe to call once per compile. AOTAutogradCache initializes a new remote cache for the forward and the backward. Technically, we could make AOTAutogradCache smart and globally thread through a single FbFxGraphRemoteCache everywhere. But there's no reason to do so, as this class is just the handle to access the cache. Plus, it's very brittle for FbFxGraphRemoteCache to not be safe to call multiple times. (Same problem, different fix of D66502138) Differential Revision: [D66508492](https://our.internmc.facebook.com/intern/diff/D66508492/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141707 Approved by: https://github.com/ezyang	2024-12-03 01:49:11 +00:00
Yu, Guangye	77748ed8ec	fix c10::Event UT failure on XPU backend (#141800 ) # Motivation Fix this UT failure introduced by https://github.com/pytorch/pytorch/pull/140865. An unrelated failure had been masking this UT failure. It started to show up once https://github.com/pytorch/pytorch/pull/141546 landed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141800 Approved by: https://github.com/EikanWang	2024-12-03 01:34:42 +00:00
PyTorch MergeBot	09ce760fef	Revert "Add missing data types at torch export serialization (#138561 )" This reverts commit `1ef1b3b391`. Reverted https://github.com/pytorch/pytorch/pull/138561 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/138561#issuecomment-2513343401))	2024-12-03 01:32:50 +00:00
Benjamin Glass	4959784dac	Add API query for available per-process CUDA memory (#140620 ) Certain `cpp_wrapper`-enabled tests were OOM-ing in the CI pipeline, with error messages suggesting that sufficient memory was accessible. This ultimately resulted from an internal memory limitation that was not queryable in the API. This PR adds querying for that limit. Additionally, the failing tests had incorrect memory availability checks, and are updated with measured memory requirements. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140620 Approved by: https://github.com/malfet, https://github.com/eqy ghstack dependencies: #141367	2024-12-03 00:24:03 +00:00
Chris Sidebottom	5c33c9202f	Skip test_cpu_repo.py::CPUReproTests::test_auto_zvec_vsx_simd on AArch64 (#141155 ) The skipping logic clearly states it shouldn't be running on this architecture. The test then fails due to `VecNEON` returning `128` from `bit_width()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141155 Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/malfet	2024-12-03 00:19:06 +00:00
atalman	c17ba69ba5	[submodule] Revert "Adds support for accelerated sorting with x86-simd-sort (#127936 )" (#141901 ) Looks like the original PR caused: https://github.com/pytorch/pytorch/issues/140590 Please see comment: https://github.com/pytorch/pytorch/issues/140590#issuecomment-2508704480 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141901 Approved by: https://github.com/andrewor14, https://github.com/malfet	2024-12-03 00:16:35 +00:00

81673 commits