pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
Jack Zhang	ed309b9156	Re-add stft option to align window for center = false (#146379 ) Skips advancing the fc window on https://github.com/pytorch/pytorch/pull/145437, since I just found that there were non-trivial efforts to do so a while ago that eventually was reverted: https://github.com/pytorch/pytorch/pull/73434 Works around the issue by keeping the stft sans center overload Pull Request resolved: https://github.com/pytorch/pytorch/pull/146379 Approved by: https://github.com/justinchuby, https://github.com/iseeyuan	2025-02-06 14:07:13 +00:00
PyTorch MergeBot	1b79d47635	Revert "[dynamo] check for incompatible configs (#146513 )" This reverts commit `aab7925418`. Reverted https://github.com/pytorch/pytorch/pull/146513 on behalf of https://github.com/atalman due to inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect [GH job link](https://github.com/pytorch/pytorch/actions/runs/13174131431/job/36772837627) [HUD commit link](`4a545eb85d`) ([comment](https://github.com/pytorch/pytorch/pull/146513#issuecomment-2639860568))	2025-02-06 13:42:25 +00:00
Animesh Jain	340cfe4f28	[dynamo][fbcode] Turn on inline_inbuilt_nn_modules (#145407 ) As title. Some internal testing at https://fb.workplace.com/groups/241460628989036/permalink/411650015303429/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/145407 Approved by: https://github.com/ezyang, https://github.com/jansel	2025-02-06 13:18:35 +00:00
PyTorch MergeBot	bd7d4fb2b5	Revert "[DTensor][Test] Create a simple unit test for tensordot (#146514 )" This reverts commit `1f8baf09ea`. Reverted https://github.com/pytorch/pytorch/pull/146514 on behalf of https://github.com/albanD due to The lint failures that you ignored are real right? ([comment](https://github.com/pytorch/pytorch/pull/146514#issuecomment-2639554636))	2025-02-06 11:26:43 +00:00
zeshengzong	4a545eb85d	Fix torch.nn.functional.one_hot param num_classes optional description (#146470 ) The `torch.nn.functional.one_hot` [documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html) describes the `num_classes` parameter as not optional, but users can call the function without passing it. ![image](https://github.com/user-attachments/assets/4e6d4feb-691f-451f-95b5-4ac11bac7bc2) ```python >>> import torch >>> a = torch.arange(0, 5) % 3 # [0,1,2,0,1] >>> torch.nn.functional.one_hot(a) tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]) ``` `num_classes` has a default value of -1: `93d98aca31/aten/src/ATen/native/native_functions.yaml (L6154-L6157)` ## Test Result ![image](https://github.com/user-attachments/assets/2c7203b7-6226-4ebc-84c8-cbf912fc48e2) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146470 Approved by: https://github.com/albanD	2025-02-06 07:48:05 +00:00
Simon Fan	aab7925418	[dynamo] check for incompatible configs (#146513 ) internal: https://fb.workplace.com/groups/1075192433118967/permalink/1599802033991335/ Assuming flags don't change during compilation, we shouldn't allow incompatible configs to be set at torch.compile wrap time. Not in this PR: For flags that need to change during compilation, we'd have to be strict about where they can be used in the compile lifecycle Pull Request resolved: https://github.com/pytorch/pytorch/pull/146513 Approved by: https://github.com/williamwen42	2025-02-06 07:39:52 +00:00
eqy	5f0901e573	[cuBLAS][cuBLASLt] Unify `cuBLASLt` workspaces with `cuBLAS` workspaces (#145130 ) As `cuBLAS` workspaces are already per-stream, there shouldn't be kernel execution overlap with `cuBLASLt` kernels. This PR reuses `cuBLAS` workspaces for `cuBLASLt` for the following benefits: + caching (`cuBLAS` workspaces were already cached, so now we get that for `cuBLASLt`) + "free" workspace size bump for `cuBLASLt`: `cuBLASLt` workspace sizes were previously smaller than those for `cuBLAS` by default, which potentially hurts performance, and we encountered difficulty in increasing the size due to downstream OOMs, see also #120925 + fixes broken behavior with the memtracker; https://github.com/pytorch/pytorch/pull/139442 attempted to handle peaky allocation behavior that broke memtracker equivalence tests but it didn't seem to fully work, here the cached/reused `cuBLAS` workspace seems to fix it + one environment variable to rule them all: `CUBLAS_WORKSPACE_CONFIG` applies directly to `cuBLASLt` without a confusing `CUBLASLT_WORKSPACE_SIZE` that users would also need to consider Pull Request resolved: https://github.com/pytorch/pytorch/pull/145130 Approved by: https://github.com/ngimel	2025-02-06 05:57:33 +00:00
Nikita Shulga	36c6e09528	[MPSInductor] Fix min/max for bfloat16 (#146552 ) By introducing a full specialization that upcasts everything to float, as bfloat does not have a native min/max. Test by running `test_min_max_reduction` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146552 Approved by: https://github.com/dcci	2025-02-06 05:15:00 +00:00
wz337	1f8baf09ea	[DTensor][Test] Create a simple unit test for tensordot (#146514 ) Fixes #ISSUE_NUMBER The dims and shape of the tensors are from a specific Shampoo use case. We want to create a unit test for it to make sure there are no regressions for this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146514 Approved by: https://github.com/tianyu-l	2025-02-06 05:09:34 +00:00
Michael Diggin	e01a5e9e1e	Small improvements to NJT matrix multiplies (#146405 ) Fixes #146404 Adds changes to the matmul and matmul_backward operation for nested jagged tensors, to support back propagation when the output is a regular strided tensor. This required adding support for the nested matmul operation to work when the nested tensor wasn't 'self', i.e `A@B` where `A` isn't nested but `B` is. The operation schemas had to be updated to reflect that either input can be a strided tensor instead (and the gradient), so an extra assertion is added in an edge case where neither input is nested. Unit tests are also added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146405 Approved by: https://github.com/soulitzer, https://github.com/jbschlosser	2025-02-06 04:51:12 +00:00
bobrenjc93	389c5c0842	print out partial fx graph for all data-dependent errors (#146363 ) The previous implementation didn't catch the following type of errors ``` torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not extract specialized integer from data-dependent expression u2 (unhinted: u2). (Size-like symbols: none) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146363 Approved by: https://github.com/angelayi, https://github.com/bdhirsh ghstack dependencies: #146298, #146296	2025-02-06 04:21:34 +00:00
Michael Suo	425804db2b	[torch] fix exception types in custom class magic setattr/getattr (#146516 ) Summary: `c10::AttributeError` is not automatically converted to Python AttributeError; it needs some special macros (e.g. `HANDLE_TH_ERRORS`). Some Python functions like `hasattr` rely on the type of the thrown exception to be correct. We don't need the full generality of those macros, so just do a targeted error type conversion here. Test Plan: added unit test Differential Revision: D69197217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146516 Approved by: https://github.com/zdevito	2025-02-06 02:14:11 +00:00
Pian Pawakapan	3a6a203b98	[dynamic shapes][real tensor tracing] propagate unbacked hint when creating mod replacement (#146381 ) Fixes data-dependent errors for 2 PT2I models in draft export Pull Request resolved: https://github.com/pytorch/pytorch/pull/146381 Approved by: https://github.com/angelayi	2025-02-06 01:48:40 +00:00
Pian Pawakapan	c5062cca98	[export] make stack_trace optional in insert_custom_op_guards (#146438 ) Summary: Fixes 1 PT2I exportability error Test Plan: - Differential Revision: D69132186 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146438 Approved by: https://github.com/yiming0416, https://github.com/angelayi	2025-02-06 01:48:26 +00:00
Nikita Shulga	6a985d8b2e	Make `inductor_utils.requires_gpu` accept MPS (#145156 ) Not yet ready to set HAS_GPU to true, but can unskip tests that require GPU (Noticed while running test_mps_basics.py that `test_scalar_cpu_tensor_arg` is getting skipped) - Replace `GPU_TYPE` with `self.device` in `test_custom_op_fixed_layout_sequential`, `test_inductor_layout_optimization_input_mutations`, `test_mutable_custom_op_fixed_layout2`, otherwise the GPU tests are just running for _cpu suffixes. - Tweak `test_tmp_not_defined_issue3` to work correctly on CPU, by defining `test_device` and `test_device_0` - UnXFail `test_mutable_custom_op_fixed_layout2_dynamic_shapes` as it should just work on CPU - Add `skip_if_no_triton` decorator and decorate `test_reduction_config_limit` with it, as it does not need CPU nor GPU, but rather a triton backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145156 Approved by: https://github.com/dcci, https://github.com/Skylion007, https://github.com/jansel	2025-02-06 01:14:36 +00:00
Isalia20	0dc03134d9	[MPS] linalg solve implementation (#146531 ) Fixes #98222 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146531 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-02-06 00:57:49 +00:00
Nikita Shulga	495049860b	[BE][Metal] Fix signed unsigned comparison warning (#146549 ) I wish I knew how to extract Metal warnings during JIT compilation but https://developer.apple.com/documentation/metal/mtldevice/makelibrary(source:options:)?changes=_7&language=objc is a lie as `error:` stays `nil` unless shader compilation fails. But when it does following warnings are thrown ``` program_source:666:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare] for (auto idx = 1; idx < size; ++idx) { ~~~ ^ ~~~~ program_source:677:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare] for (auto idx = 1; idx < size; ++idx) { ~~~ ^ ~~~~ program_source:688:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare] for (auto idx = 1; idx < size; ++idx) { ~~~ ^ ~~~~ program_source:699:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare] for (auto idx = 1; idx < size; ++idx) { ~~~ ^ ~~~~ program_source:710:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare] for (auto idx = 1; idx < size; ++idx) { ~~~ ^ ~~~~ program_source:723:26: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare] for (auto idx = 1; idx < size; ++idx) { ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146549 Approved by: https://github.com/dcci	2025-02-06 00:40:17 +00:00
PyTorch MergeBot	e0cf519ade	Revert "[inductor] Refactor op handlers part 2 (#146252 )" This reverts commit `13f0436abd`. Reverted https://github.com/pytorch/pytorch/pull/146252 on behalf of https://github.com/atalman due to Sorry need to revert, failing internally ([comment](https://github.com/pytorch/pytorch/pull/146252#issuecomment-2638305417))	2025-02-06 00:04:04 +00:00
Nikita Shulga	c7087d6b14	[BE][EZ][Metal] Do not pass tensor length as arg (#146522 ) As all devices capable of running Metal-2 support nonuniform threadgroup sizes, see https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf for more detail Pull Request resolved: https://github.com/pytorch/pytorch/pull/146522 Approved by: https://github.com/dcci ghstack dependencies: #146521	2025-02-06 00:03:41 +00:00
Nikita Shulga	54ef029532	[BE][EZ][Metal] Mark constant inputs as constant (#146521 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146521 Approved by: https://github.com/dcci	2025-02-06 00:03:41 +00:00
PyTorch MergeBot	2001066c61	Revert "[inductor] Refactor op handlers part 3 (#146254 )" This reverts commit `8e9bda8d89`. Reverted https://github.com/pytorch/pytorch/pull/146254 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146254#issuecomment-2638300857))	2025-02-05 23:59:50 +00:00
Simon Fan	72405b0c0f	[ca] refactor compile reasons and log to tlparse (#146386 ) This PR accumulates compile reasons inside each CacheNode, and logs them to tlparse on each CA compile. This defines a compile as an autograd structure change, and a recompile as a dynamic shape change. sample tlparse: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpdbo7gt/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 for compiles: ```python [ "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]" ] ``` for recompiles: ```python [ "!0: Cache miss due to new autograd node: torch::autograd::GraphRoot (NodeCall 0) with key size 39, previous key sizes=[]", "!1: Cache miss due to 7 changed tensor shapes (total of 7): sizes[0], sizes[1], sizes[2], sizes[3], sizes[4], sizes[5], sizes[6]" ] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146386 Approved by: https://github.com/jansel ghstack dependencies: #146229	2025-02-05 23:33:21 +00:00
PyTorch MergeBot	68304dba7a	Revert "[inductor] Refactor op handlers part 4 (#146255 )" This reverts commit `7aced455c5`. Reverted https://github.com/pytorch/pytorch/pull/146255 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146255#issuecomment-2638258089))	2025-02-05 23:24:20 +00:00
PyTorch MergeBot	49effa0deb	Revert "[inductor] Refactor op handlers part 5 (#146257 )" This reverts commit `d3dd3eeb7f`. Reverted https://github.com/pytorch/pytorch/pull/146257 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146257#issuecomment-2638251994))	2025-02-05 23:20:38 +00:00
PyTorch MergeBot	93e1e6e07c	Revert "[inductor] Minor compile time optimizations in DefaultHandler (#146282 )" This reverts commit `b8a529cca1`. Reverted https://github.com/pytorch/pytorch/pull/146282 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146282#issuecomment-2638239575))	2025-02-05 23:13:08 +00:00
PyTorch MergeBot	7dc5cfe2ad	Revert "[inductor] Refactor CaptureIndexing into global scope (#146297 )" This reverts commit `7288950bcd`. Reverted https://github.com/pytorch/pytorch/pull/146297 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146297#issuecomment-2638234829))	2025-02-05 23:10:08 +00:00
PyTorch MergeBot	9555bfce88	Revert "[inductor] Pre-populate cache for simplify_with_ranges return value (#146373 )" This reverts commit `84ba9c6e78`. Reverted https://github.com/pytorch/pytorch/pull/146373 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146373#issuecomment-2638232033))	2025-02-05 23:07:08 +00:00
Yanan Cao (PyTorch)	8af31e30d7	[Codemod][AddExplicitStrictExportArg] caffe2/torch (#146439 ) Differential Revision: D69068432 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146439 Approved by: https://github.com/avikchaudhuri	2025-02-05 22:56:54 +00:00
Catherine Lee	97b64f2e5c	Fix workflow for closing nonexistent disable issues (#146447 ) The workflow could not update issues because it didn't have permissions, and it looked green because it didn't check return codes. Tested by running the workflow and seeing that issues did get closed Fixes https://github.com/pytorch/pytorch/issues/145382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146447 Approved by: https://github.com/huydhn	2025-02-05 22:29:05 +00:00
Howard Huang	9b6d680131	Remove stage_index_to_group_rank from schedule (#146217 ) This PR allows schedules loaded via CSV to automatically set their `stage_index_to_group_rank ` and removes the `stage_index_to_group_rank ` argument from the `PipelineScheduleMulti` constructor Pull Request resolved: https://github.com/pytorch/pytorch/pull/146217 Approved by: https://github.com/wconstab ghstack dependencies: #146193	2025-02-05 21:26:45 +00:00
Howard Huang	4ee7d0de86	Add generate_stage_to_rank_mapping utility (#146193 ) We use `stage_index_to_group_rank` in the stage to determine what send/recv ops and in the schedule for IR generation. However, we don't need to expose this as an argument in our schedule class, so this stack of PRs is to remove it. This PR creates a `stage_index_to_group_rank` utility function and removes the arg for the ZBVschedule. In a following PR I will add code to infer the `stage_index_to_group_rank` for the CSV schedule path and we will be able to remove this argument from our classes entirely. Related comment from @wconstab https://github.com/pytorch/torchtitan/issues/774#issuecomment-2619793741 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146193 Approved by: https://github.com/wconstab	2025-02-05 21:26:45 +00:00
rzou	98b5d455fd	[opcheck] Improve error reporting; allow atol/rtol overrides (#146488 ) This PR improves opcheck to: 1. directly use torch.testing.assert_close (without a msg override). This allows it to print the absolute and relative differences and the number of mismatched elements. 2. take in an atol/rtol tolerance (for if someone just wants to use opcheck in their testing). Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/146488 Approved by: https://github.com/williamwen42	2025-02-05 21:25:06 +00:00
Justin Chu	1f6b566d74	[ONNX] Bump onnx and onnxscript versions in CI (#146097 ) Bump onnx onnxscript==0.1 in CI; Skipped onnxruntime 1.19 because it has regression on avgpool. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146097 Approved by: https://github.com/malfet	2025-02-05 21:00:25 +00:00
Katarzyna Fojcik	9da376daa6	Add retain-output argument (#145921 ) This PR adds a retain-output argument, which enables appending to the output file if it already exists instead of deleting it and creating a new one. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145921 Approved by: https://github.com/jansel	2025-02-05 19:45:09 +00:00
Raymond Li	dd349207c5	Add check that envvar configs are boolean (#145454 ) So we don't get unexpected behavior when higher typed values are passed in Pull Request resolved: https://github.com/pytorch/pytorch/pull/145454 Approved by: https://github.com/c00w, https://github.com/jamesjwu	2025-02-05 19:40:10 +00:00
Anant Gulati	9091096d6c	Refactoring Distributed test cases to be device agnostic [1/n] (#145222 ) In this series of PRs we intend to refactor distributed test cases to make them completely device agnostic. These changes will include the following approaches: - Allowing for multiple device types using instantiate_device_type_test - Replacing calls to cuda stream with torch.get_device_module(device) wherever it applies - Skipping set up steps required while using MultiProcessTestCase with DistributedTestBase (#138216) wherever applicable - Replacing explicit calls to distributed backend (NCCL,HCCL,etc) with get_default_backend_for_device (#140536). This should result in significant improvement in usability for all devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145222 Approved by: https://github.com/kwen2501	2025-02-05 18:47:09 +00:00
eqy	6f7fda3f49	Bump `nn.functional.conv3d` tolerances for `test_comprehensive` (#135719 ) `float16` tolerance was previously set to `1e-5` which seemed very low Pull Request resolved: https://github.com/pytorch/pytorch/pull/135719 Approved by: https://github.com/Chillee, https://github.com/albanD	2025-02-05 18:34:12 +00:00
Tugsbayasgalan Manlaibaatar	d2a2b9f8a7	Fix constants with non-functional operators (#145593 ) Previously, in the non-strict path, we always errored when trying to in-place update a constant tensor because those constant tensors are not actually wrapped by functional tensors. This is correct behaviour in torch.compile, because dynamo makes all constant tensors into buffers and AOTDispatcher just lifts them and wraps them in functional tensors. However, in non-strict, there is no such step that registers constants as buffers, so AOTDispatcher panics when it sees these dangling constant tensors when functionalizing. Due to a recent change in the IR, this is no longer an issue in the non-strict path because we don't call AOTDispatcher at the training IR level, but now it is a problem for both strict and non-strict when we lower to inference. (Lowering to inference is very similar to non-strict tracing.) As a result, we have at least one external (https://github.com/pytorch/pytorch/issues/141336) and internal issues reported due to this difference. To fix this, there are two ways: 1. Make functionalization aware of constant tensors and map them to functional tensors on the fly. This makes the functionalization invariant uglier and could potentially open up a gate for more nasty bugs. 2. Special-handle this in export. This seems more aligned with what dynamo does today, so I think we should do it this way. I think the current state could benefit from more refactors to make run_decompositions more similar to strict export (because both of them now handle this constant-registering logic), but it is a bit complicated to do now because the strict export version of this logic is also not complete (it doesn't take into account the export graph renaming pass, etc.). I will follow up with more refactors after this PR (T213466691) to unblock users faster. For future reference: Why are we not doing "turning constants into non-persistent buffers and never de-register"? The reason is that some internal models rely on module.to to reliably move params/buffers to the correct device. As a result, buffers are moved while constants are not. In the composability meeting, we agreed that export won't do device agnostic tracing going forward (it will provide a way to specify FakeTensor in CPU that can be configured to be run on GPU), so after that is done, we can always turn constants into non-persistent buffers, which will simplify export's constant handling. Differential Revision: [D68610739](https://our.internmc.facebook.com/intern/diff/D68610739) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145593 Approved by: https://github.com/avikchaudhuri	2025-02-05 17:44:19 +00:00
Jeff Daily	44248c44eb	[ROCm] miopen benchmark behavior now better aligns with cudnn (#145294 ) The default benchmark setting is now false. The new miopen behavior means when benchmarking is disabled, for any shape that doesn't have a find hit, then it will do a quick search (same behavior as the prior default), and use that result. Now when benchmark is enabled, it will perform an exhaustive search and update any DBs. miopen immediate mode is still available and is used when deterministic is true and benchmark is false. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145294 Approved by: https://github.com/BrianHarrisonAMD, https://github.com/malfet	2025-02-05 17:19:53 +00:00
PyTorch MergeBot	f27220e32a	Revert "Move get accelerator to use build time flags when possible (#146098 )" This reverts commit `157d81c201`. Reverted https://github.com/pytorch/pytorch/pull/146098 on behalf of https://github.com/atalman due to Failing internally, sorry need to revert ([comment](https://github.com/pytorch/pytorch/pull/146098#issuecomment-2637443675))	2025-02-05 16:39:37 +00:00
Jason Ansel	f55c0af37f	[inductor] Support non-power-of-2 cooperative RSPLIT (#145689 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145689 Approved by: https://github.com/eellison	2025-02-05 16:36:53 +00:00
maajidkhann	db22e9d5a2	Implement blend operation for float, double, int in VEC ATen backend for SVE (#146479 ) - Added support for SVE vectorized blend operation for float, double, int8_t, int16_t, int32_t and int64_t data types. - Utilizes SVE ACLE intrinsic (svcntb, svcntw, svcmpne, svsel) to handle different vector lengths (VL) dynamically. - Ensured compatibility with SVE128, SVE256, and SVE512 hardware configurations. - Enabled back blend SVE vec tests Testing: a) Float DType: ./vec_test_all_types_SVE256 --gtest_filter=BitwiseFloatsAdditional2/0.Blend [Test Passed] on Graviton 3 machine (SVE256) ./vec_test_all_types_SVE128 --gtest_filter=BitwiseFloatsAdditional2/0.Blend [Test Passed] on Graviton 4 machine (SVE128) b) Double DType: ./vec_test_all_types_SVE256 --gtest_filter=BitwiseFloatsAdditional2/1.Blend [Test Passed] on Graviton 3 machine (SVE256) ./vec_test_all_types_SVE128 --gtest_filter=BitwiseFloatsAdditional2/1.Blend [Test Passed] on Graviton 4 machine (SVE128) c)Int DType: python3 test/inductor/test_cpu_repro.py CPUReproTests.test_vec_remainder [Test Passed] on Graviton 3 machine (SVE256) and on Graviton 4 machine (SVE128) <img width="661" alt="grv4_test_case_passed" src="https://github.com/user-attachments/assets/5572fcc0-a861-4bd6-bf9e-356219ffe656" /> Fixes https://github.com/pytorch/pytorch/issues/146309 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146479 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-02-05 16:29:13 +00:00
Zhengxu Chen	cd6c0707a8	[aoti] Assign proxy call args by name, and support default values. (#146263 ) Fixing the following issue when compiling the following program: ``` window = torch.hann_window(N_FFT).to(x.device) stft = torch.stft( x, N_FFT, HOP_LENGTH, window=window, return_complex=True ) magnitudes = stft[..., :-1].abs() ** 2 return magnitudes ``` ``` Traceback (most recent call last): File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 57, in testPartExecutor yield File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 623, in run self._callTestMethod(testMethod) File "/home/zhxchen17/miniconda3/envs/dev/lib/python3.11/unittest/case.py", line 579, in _callTestMethod if method() is not None: ^^^^^^^^ File "/home/zhxchen17/pytorch/torch/testing/_internal/common_utils.py", line 3120, in wrapper method(args, *kwargs) File "/home/zhxchen17/pytorch/test/inductor/test_torchinductor.py", line 12356, in new_test return value(self) ^^^^^^^^^^^ File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor.py", line 4334, in test_stft self.check_model(model, example_inputs) File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 185, in check_model actual = AOTIRunnerUtil.run( ^^^^^^^^^^^^^^^^^^^ File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 137, in run optimized = AOTIRunnerUtil.load(device, so_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhxchen17/pytorch/test/inductor/test_aot_inductor_utils.py", line 119, in load return torch._export.aot_load(so_path, device) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/zhxchen17/pytorch/torch/_export/__init__.py", line 165, in aot_load runner = torch._C._aoti.AOTIModelContainerRunnerCuda(so_path, 1, device) # type: ignore[assignment, call-arg] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RuntimeError: Expected extern kernel aten::hann_window to have serialized argument type as_scalar_type 
for argument 1 but got as_device ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146263 Approved by: https://github.com/angelayi	2025-02-05 15:43:05 +00:00
rzou	1bb977a2a4	[auto_functionalized] Support `Tensor(a!)[]?` (#145400 ) Summary: This is just updating some of the checks to allow the Tensor(a!)[]? type through. Fixes #144072 Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/145400 Approved by: https://github.com/laithsakka	2025-02-05 14:52:39 +00:00
PyTorch MergeBot	282d185ec1	Revert "[inductor] use ftz variant of exp (#146216 )" This reverts commit `b0b3fe8bcf`. Reverted https://github.com/pytorch/pytorch/pull/146216 on behalf of https://github.com/atalman due to inductor/test_op_completeness.py::TestOpCompleteness::test_triton_overrides [GH job link](https://github.com/pytorch/pytorch/actions/runs/13152430750/job/36702812599) [HUD commit link](`b0b3fe8bcf`) ([comment](https://github.com/pytorch/pytorch/pull/146216#issuecomment-2636961317))	2025-02-05 14:13:45 +00:00
Davide Italiano	8a2000fd42	[MPS] Implement support for zeta (both eager and inductor). (#146465 ) A test was failing in inductor (`test_pointwise_zeta`) -- and I realized the operation was missing also from eager. Implemented for both, leveraging the kernel. Happy to split in two (one PR for eager, one for inductor) if folks prefer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146465 Approved by: https://github.com/malfet	2025-02-05 13:55:50 +00:00
Nichols A. Romero	fd0cd6a08f	[ROCm][TunableOp] Improve identification of fastest solution (#144942 ) This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300. Changes include: - An improved timer, StreamTimerNoSync - More aggressive skipping of slow solutions - Additional statistics that can be used for diagnostics PYTORCH_TUNABLEOP_VERBOSE=3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144942 Approved by: https://github.com/jeffdaily	2025-02-05 11:16:49 +00:00
Simon Fan	e20b0c82d1	[ca] no longer require is_traceable annotations for c++ autograd functions (#146229 ) This PR removes the CA compile-time error for C++ autograd functions, and supports them by having dynamo graph break on them (instead of allow_in_graph). The CppNode's collects are kept as is for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146229 Approved by: https://github.com/jansel, https://github.com/zou3519	2025-02-05 08:49:17 +00:00
cyy	6293d1446b	[2/N] Remove NOLINT suppressions (#146402 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146402 Approved by: https://github.com/soulitzer	2025-02-05 08:38:52 +00:00
bobrenjc93	e5ea7e9cdc	add support for capturing provenance of unary operations (#146413 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146413 Approved by: https://github.com/angelayi ghstack dependencies: #145848	2025-02-05 08:31:38 +00:00


84115 commits