johnnynunez
35f5668f7e
[NVIDIA] RTX50 Blackwell Support codegen ( #145270 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145270
Approved by: https://github.com/ezyang
2025-01-21 21:10:05 +00:00
PyTorch MergeBot
895659cb41
Revert "Fix RMSNorm epsilon value type for BF16 or FP16 ( #142848 )"
This reverts commit 07e23653cd .
Reverted https://github.com/pytorch/pytorch/pull/142848 on behalf of https://github.com/izaitsevfb due to breaking internal tests, see D68355212 ([comment](https://github.com/pytorch/pytorch/pull/142848#issuecomment-2605734067 ))
2025-01-21 21:04:45 +00:00
Aaron Orenstein
bac62341eb
PEP585 update - torch/_inductor ( #145198 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145198
Approved by: https://github.com/bobrenjc93
2025-01-21 21:04:33 +00:00
Aaron Orenstein
2f9d378f7b
PEP585 update - torch/utils ( #145201 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145201
Approved by: https://github.com/bobrenjc93
2025-01-21 21:04:10 +00:00
Edward Z. Yang
693d8c7e94
Output of nonzero is transposed, fix fake tensor ( #144695 )
Needs this companion executorch PR: https://github.com/pytorch/executorch/pull/7657
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144695
Approved by: https://github.com/bobrenjc93 , https://github.com/albanD
2025-01-21 20:50:09 +00:00
Edward Z. Yang
323fb4dad0
Unconditionally exclude upper bound in all size oblivious tests ( #144867 )
I was thinking about https://github.com/pytorch/pytorch/pull/144471 some more and I thought, "Hmm, why not just always exclude the constant upper bound." So here it is.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144867
Approved by: https://github.com/bobrenjc93
2025-01-21 20:44:09 +00:00
Wei Wang
df67ac4c86
[CI][CUDA][Distributed][FSDP] Remove hardcoded world size of 2 ( #145195 )
Remove the hardcoded world size, as these unit tests would fail if run on a single GPU: `skip_if_lt_x_gpu(2)` seems to treat the world size as 2 even on platforms with 1 GPU.
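A minimal sketch of the pattern the fix moves toward (hypothetical test helper; the actual change is in the FSDP unit tests):
```python
import torch

# Hypothetical sketch: size the test's process group from the available
# GPUs rather than hardcoding 2, so single-GPU platforms are skipped
# correctly by skip_if_lt_x_gpu(2).
def world_size() -> int:
    return torch.cuda.device_count()  # was: return 2
```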
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145195
Approved by: https://github.com/Skylion007 , https://github.com/atalman
2025-01-21 20:25:52 +00:00
Jason Ansel
505ade7471
[inductor] Simplify mode options, only apply CompilerBisector changes once ( #145232 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145232
Approved by: https://github.com/yanboliang
2025-01-21 19:25:46 +00:00
RanTao123
85811631d7
[Intel CPU] Fix issue #143489 . ( #145062 )
Fixes https://github.com/pytorch/pytorch/issues/143489 .
Computing `kernel_height * kernel_width` can cause a floating point exception, so we divide by each factor one at a time.
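A standalone sketch of the pattern (hypothetical helper; the real fix lives in the C++ pooling kernel, where the intermediate product can overflow):
```python
def pool_average(acc: int, kernel_height: int, kernel_width: int) -> int:
    # In the C++ kernel, kernel_height * kernel_width can overflow its
    # integer type and trap with a floating point exception on division.
    # Dividing by each factor separately never forms the product; for
    # positive divisors, acc // kh // kw == acc // (kh * kw).
    return acc // kernel_height // kernel_width
```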
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145062
Approved by: https://github.com/soulitzer
2025-01-21 18:38:33 +00:00
Joel Schlosser
128f3627b1
Implement backward for NJT matmul ( #144587 )
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.
This PR implements missing backward support for NJT matmul. Notably, for dense tensors, matmul dispatches to bmm. However, due to historical reasons related to NST, NJT handles matmul directly, and thus can't rely on the CompositeImplicit impl of matmul to get the derivative formula.
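For reference, a sketch of the standard matmul derivative that the NJT backward has to spell out by hand (the formula is textbook-standard; the NJT-specific plumbing is in the PR):
```python
import torch

def matmul_backward(grad_out, a, b):
    # d(A @ B)/dA pulled back through grad: grad @ B^T; d(A @ B)/dB: A^T @ grad
    grad_a = grad_out @ b.transpose(-2, -1)
    grad_b = a.transpose(-2, -1) @ grad_out
    return grad_a, grad_b
```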
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144587
Approved by: https://github.com/soulitzer
ghstack dependencies: #144586
2025-01-21 18:27:50 +00:00
Joel Schlosser
af204135d8
Fix NJT fill.Scalar for contiguous inputs ( #144586 )
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.
This PR implements the missing `fill.Scalar` support, which works fine for contiguous inputs, but there is still some AOTAutograd debugging required to handle non-contiguous transposed NJTs.
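A hedged usage sketch of what now works for contiguous jagged-layout NJTs (illustrative shapes):
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)], layout=torch.jagged
)
nt.fill_(0.5)  # scalar fill on a contiguous NJT, supported after this PR
```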
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144586
Approved by: https://github.com/soulitzer
2025-01-21 18:22:08 +00:00
Edward Z. Yang
efa88e04e1
Don't overspecialize float when propagating cache guards to ShapeEnv ( #145078 )
Fixes https://github.com/pytorch/pytorch/issues/142507
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145078
Approved by: https://github.com/Skylion007
2025-01-21 18:05:43 +00:00
Edward Z. Yang
b3e90c8c33
Add support for torch function on dtype arguments ( #145085 )
Along the lines of https://github.com/pytorch/pytorch/issues/119194, although it doesn't actually address the FCD case.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145085
Approved by: https://github.com/vmoens , https://github.com/Skylion007
2025-01-21 17:44:47 +00:00
Huy Do
eb553ae3cf
Fix broken gpt_fast micro benchmark after #144315 ( #145235 )
The benchmark is failing with the following error
```
File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 333, in <module>
main(output_file=args.output, only_model=args.only)
File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 308, in main
lst = func(device)
File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 66, in run_mlp_layer_norm_gelu
us_per_iter = benchmarker.benchmark(compiled_mod, (x,)) * 1000
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/_inductor/runtime/benchmarking.py", line 39, in wrapper
return fn(self, *args, **kwargs)
TypeError: benchmark() missing 1 required positional argument: 'fn_kwargs'
```
An example error is https://github.com/pytorch/pytorch/actions/runs/12862761823/job/35858912555
I also assign `oncall: pt2` as the owner of this job going forward.
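A plausible call-site fix, assuming the updated `benchmark()` takes `(fn, fn_args, fn_kwargs)` as the error message suggests (sketch, not the exact patch):
```python
# benchmark.py (sketch): pass an explicit, possibly empty, fn_kwargs dict
us_per_iter = benchmarker.benchmark(compiled_mod, (x,), {}) * 1000
```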
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145235
Approved by: https://github.com/nmacchioni
2025-01-21 17:42:24 +00:00
atalman
2cffbff7da
Add 3.13t Windows and MacOS binary builds ( #141806 )
Related to: https://github.com/pytorch/pytorch/issues/130249
For conda, this uses the approach described here:
https://conda-forge.org/blog/2024/09/26/python-313/
Create a Python 3.13t conda env like so:
```
conda create -n py313 python=3.13 python-freethreading -c conda-forge
```
For the Windows executable installation, we need to pass an additional parameter to enable 3.13t:
```
Include_freethreaded=1
```
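A quick hedged check that the resulting interpreter really is free-threaded (`sys._is_gil_enabled()` is a CPython 3.13 helper):
```python
import sys

print(sys.version)            # should mention "free-threading build"
print(sys._is_gil_enabled())  # False when running with the GIL disabled
```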
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141806
Approved by: https://github.com/albanD
2025-01-21 17:16:19 +00:00
Aaron Orenstein
0afd335174
PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu ( #145175 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-21 16:57:27 +00:00
Shunting Zhang
803017f3cb
[inductor] fix MA on poor gpu ( #145133 )
Found this bug when debugging an MA (max-autotune) issue in CI that could not be reproduced on a devgpu.
On GPUs with fewer than 68 SMs (like the NVIDIA L4 used in CI), running torch.compile in max-autotune mode may result in the following confusing error https://gist.github.com/shunting314/370f42f547e3367a3773237942725a86 complaining about layout:
```
torch._inductor.exc.InductorError: LoweringException: AssertionError: convert FlexibleLayout to FixedLayout first
```
The reason is that even if we don't pick a Triton template, Inductor still returns a MultiTemplateBuffer for the tuned addmm. MultiTemplateBuffer.get_reads, called from Reduction.num_splits, may index into a FlexibleLayout, which results in the aforementioned error.
The issue does not appear on devgpu because we freeze the layout of the addmm inputs when rendering Triton templates.
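A hedged repro sketch (not the exact CI test): a tuned addmm feeding a reduction, compiled in max-autotune mode, is the shape of graph that hit the assertion on small GPUs:
```python
import torch

@torch.compile(mode="max-autotune")
def f(x, w, b):
    # addmm becomes a MultiTemplateBuffer during tuning; the downstream
    # reduction's num_splits then reads its (still Flexible) layout
    return torch.addmm(b, x, w).sum(dim=-1)
```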
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145133
Approved by: https://github.com/jansel
2025-01-21 09:31:34 +00:00
Aaron Orenstein
b5655d9821
PEP585 update - .ci android aten ( #145177 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145177
Approved by: https://github.com/Skylion007
2025-01-21 06:31:26 +00:00
Aaron Orenstein
00ffeca1b1
PEP585 update - torch/distributed ( #145164 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-21 04:23:29 +00:00
PyTorch MergeBot
c6986ca2e1
Revert "[dcp] Add ZStandard transformer ( #143360 )"
This reverts commit 7b56b039af .
Reverted https://github.com/pytorch/pytorch/pull/143360 on behalf of https://github.com/atalman due to Broke 3.13t builds please test with ciflow/binaries label attached ([comment](https://github.com/pytorch/pytorch/pull/143360#issuecomment-2603433066 ))
2025-01-21 01:10:16 +00:00
PyTorch MergeBot
5fd881a5b6
Revert "PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu ( #145175 )"
This reverts commit 54a00af2c6 .
Reverted https://github.com/pytorch/pytorch/pull/145175 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some trunk tests ([comment](https://github.com/pytorch/pytorch/pull/145175#issuecomment-2603418267 ))
2025-01-21 00:49:55 +00:00
Aaron Orenstein
dea7ad3371
PEP585 update - torch/testing ( #145200 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145200
Approved by: https://github.com/bobrenjc93
2025-01-20 22:42:42 +00:00
Aaron Orenstein
805c4b597a
PEP585 update - torch/_higher_order_ops torch/_subclasses torch/backends torch/compiler torch/cuda torch/masked torch/mtia torch/nested ( #145202 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145202
Approved by: https://github.com/bobrenjc93
2025-01-20 22:37:26 +00:00
Aaron Orenstein
54a00af2c6
PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu ( #145175 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-20 22:32:59 +00:00
Aaron Orenstein
bd97ce0b45
PEP585 update - torch/ao ( #145199 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145199
Approved by: https://github.com/bobrenjc93
2025-01-20 22:32:35 +00:00
Aaron Gokaslan
cf05f6a134
[BE]: Improve typing for torch/fx/_pytree.py and torch/utils/_pytree.py ( #145173 )
Improve type inference in _pytree.py utility functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145173
Approved by: https://github.com/bobrenjc93
2025-01-20 22:18:19 +00:00
Wang, Chuanqi
225a10febe
[CI] Add xpu linux build into pull workflow ( #145084 )
To mitigate the risk of XPU build failures introduced by non-XPU-specific PRs. Refer to #144967 & #143803.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145084
Approved by: https://github.com/huydhn , https://github.com/atalman
2025-01-20 19:31:48 +00:00
Zhengxu Chen
d0100050dd
[aoti] Deduplicate "V.aot_compilation" and "V.graph.aot_mode" flags. [2/n] ( #145091 )
Summary: Follow-up to D68122536, removing the configurable aot_mode for inner_compile.
Test Plan: CI
Reviewed By: desertfire
Differential Revision: D68158512
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145091
Approved by: https://github.com/ydwu4
2025-01-20 19:09:10 +00:00
Aaron Orenstein
0b2a3687b9
PEP585 update - torch/fx ( #145166 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145166
Approved by: https://github.com/bobrenjc93
2025-01-20 18:11:54 +00:00
PyTorch MergeBot
6374332d33
Revert "PEP585 update - torch/distributed ( #145164 )"
This reverts commit 6cb186e279 .
Reverted https://github.com/pytorch/pytorch/pull/145164 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing an inductor test ([comment](https://github.com/pytorch/pytorch/pull/145164#issuecomment-2602875679 ))
2025-01-20 16:46:46 +00:00
Dmitry Nikolaev
57b2b64acf
Fix always true scaled_mm test ( #143912 )
It looks like `out_fp8` should use matmul without scales and `out_fp8_s` should use it with scales.
Scales were optional arguments before PR https://github.com/pytorch/pytorch/pull/128683; after that, test_float8_scale started comparing two identical results and lost its meaning.
The reason for making scales required: https://github.com/pytorch/pytorch/pull/128683#issuecomment-2169146402
This PR uses scale=1.0 to compare the result with the scaled matmul.
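A hedged sketch of the corrected comparison (tensor names and shapes are illustrative; `torch._scaled_mm` is a private API whose exact signature may differ across versions):
```python
import torch

# illustrative fp8 operands; _scaled_mm expects mat2 in column-major layout
a_fp8 = torch.randn(16, 32, device="cuda").to(torch.float8_e4m3fn)
b_fp8 = torch.randn(64, 32, device="cuda").to(torch.float8_e4m3fn).t()
one = torch.tensor(1.0, device="cuda")
scale_a = torch.tensor(0.5, device="cuda")
scale_b = torch.tensor(2.0, device="cuda")

# scales of 1.0 reduce the scaled matmul to a plain one (the reference)
out_ref = torch._scaled_mm(a_fp8, b_fp8, scale_a=one, scale_b=one)
# the genuinely scaled result the test now compares against
out_scaled = torch._scaled_mm(a_fp8, b_fp8, scale_a=scale_a, scale_b=scale_b)
```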
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143912
Approved by: https://github.com/drisspg , https://github.com/malfet , https://github.com/pruthvistony
2025-01-20 16:17:46 +00:00
Aleksei Nikiforov
53e2408015
Improve cleanup of cancelled jobs on s390x for tests too ( #144968 )
Follow up to https://github.com/pytorch/pytorch/pull/144149
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144968
Approved by: https://github.com/huydhn
2025-01-20 12:56:07 +00:00
Sun, Jiayi
92b9da1fc2
fix torch.atan for torch.complex datatypes on CPU ( #144749 )
Fix https://github.com/pytorch/pytorch/issues/141487 .
This issue is caused by the lack of special handling for cases where the real or imaginary part is 0/Inf/NaN in the vectorized implementation of `atan`. For correctness, I temporarily fall back the implementation of `atan` to the scalar implementation.
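A hedged illustration of the input class this family of fixes targets (this one plus the sibling `div`, `reciprocal`, and `exp` changes in this series): complex values with 0/Inf/NaN components, where the vectorized kernels diverged from the scalar reference.
```python
import torch

# 0, Inf and NaN components are exactly the cases the vectorized
# complex kernels mishandled relative to the scalar reference
x = torch.tensor(
    [complex(0.0, 0.0), complex(float("inf"), 1.0), complex(float("nan"), 0.0)],
    dtype=torch.complex64,
)
print(torch.atan(x))  # now matches the scalar implementation
```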
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144749
Approved by: https://github.com/mingfeima , https://github.com/Skylion007
2025-01-20 08:45:03 +00:00
Sun, Jiayi
ed669a9db7
fix torch.div for torch.complex datatypes on CPU ( #140375 )
Fix https://github.com/pytorch/pytorch/issues/135428 .
Fix https://github.com/pytorch/pytorch/issues/106845 .
These two issues are caused by the lack of special handling for cases where the real or imaginary part is 0/Inf/NaN in the vectorized implementation of `div`. For correctness, I temporarily fall back the implementation of `div` to the scalar implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140375
Approved by: https://github.com/mingfeima
2025-01-20 08:34:29 +00:00
Sun, Jiayi
c922ccb7c4
fix sigmoid for torch.complex datatypes on CPU ( #140391 )
Fix https://github.com/pytorch/pytorch/issues/135777 .
This issue is caused by the lack of special handling for cases where the real or imaginary part is 0/Inf/NaN in the vectorized implementation of `reciprocal`. For correctness, I temporarily fall back the implementation of `reciprocal` to the scalar implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140391
Approved by: https://github.com/mingfeima , https://github.com/Skylion007
ghstack dependencies: #140358
2025-01-20 08:23:58 +00:00
Sun, Jiayi
507bf65c6a
fix torch.exp for torch.complex datatypes on CPU ( #140358 )
Fix https://github.com/pytorch/pytorch/issues/48010 , https://github.com/pytorch/pytorch/issues/136063 .
These two issues are caused by the lack of special handling for cases where the real or imaginary part is 0/Inf/NaN in the vectorized implementation of `exp`. For correctness, I temporarily fall back the implementation of `exp` to the scalar implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140358
Approved by: https://github.com/mingfeima , https://github.com/Skylion007
2025-01-20 08:03:17 +00:00
ankurneog
972d4a154d
Add facility to run dynamo UTs for non-cuda devices ( #140929 )
This is in line with the changes introduced in https://github.com/pytorch/pytorch/pull/130714; additional files are included to support non-CUDA devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140929
Approved by: https://github.com/kwen2501 , https://github.com/EikanWang , https://github.com/guangyey
2025-01-20 05:56:38 +00:00
Aaron Orenstein
2b809e58ad
PEP585 update - torch/onnx ( #145174 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145174
Approved by: https://github.com/justinchuby
2025-01-20 05:48:52 +00:00
Animesh Jain
19584b28fd
[dynamo][dicts] Consolidate dict(..) construction ( #144342 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144342
Approved by: https://github.com/StrongerXi
2025-01-20 04:42:06 +00:00
Nikita Shulga
980c75fe6e
[MPSInductor] Add TrueDiv and Round[Int|Decimal] ( #145160 )
That fixes `test_builtins_round_float_ndigits_neg` and `test_builtins_round`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145160
Approved by: https://github.com/jansel , https://github.com/dcci
2025-01-20 04:29:42 +00:00
Aaron Orenstein
6cb186e279
PEP585 update - torch/distributed ( #145164 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-20 00:19:01 +00:00
Aaron Orenstein
b6c5562c1f
PEP585 update - torch/export ( #145165 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145165
Approved by: https://github.com/bobrenjc93
2025-01-19 20:56:55 +00:00
Aaron Orenstein
316808e4e9
PEP585 update - torch/distributed/elastic torch/distributed/checkpoint ( #145163 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145163
Approved by: https://github.com/Skylion007
2025-01-19 20:55:59 +00:00
Aaron Orenstein
c64e657632
PEP585 update - torch/distributed/fsdp ( #145162 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145162
Approved by: https://github.com/bobrenjc93
2025-01-19 20:04:05 +00:00
Nikita Shulga
371a361db9
Enable bfloat16 testing on MacOS14+ ( #145159 )
As Metal-3.1 supports this dtype
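A hedged sanity check on Apple silicon (assumes macOS 14+ so Metal 3.1 is available):
```python
import torch

if torch.backends.mps.is_available():
    x = torch.ones(4, dtype=torch.bfloat16, device="mps")
    print((x * 2).dtype)  # torch.bfloat16
```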
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145159
Approved by: https://github.com/Skylion007 , https://github.com/jansel
ghstack dependencies: #145157
2025-01-19 19:35:31 +00:00
Aaron Orenstein
97d4d3c40a
PEP585 update - torch/_export ( #145138 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145138
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #145154
2025-01-19 18:48:35 +00:00
Aaron Orenstein
cd8d0fa20c
Tweak schema_check to handle annotated builtin types ( #145154 )
As of Python 3.9, annotated lists can be written as `list[T]`, and `List[T]` has been deprecated. However, schema_check was converting `list[T]` to simply `list`. This change teaches it to handle `list[T]` the same as `List[T]`.
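A minimal sketch of the normalization described (hypothetical helper, not the schema_check code itself):
```python
import typing

def normalize(ann):
    # list[int] and typing.List[int] both report (list, (int,))
    origin = typing.get_origin(ann) or ann
    return origin, typing.get_args(ann)

assert normalize(list[int]) == normalize(typing.List[int]) == (list, (int,))
```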
A couple small drive-by changes I noticed as well:
- Path concatenation should use `os.path.join`, not `+`
- Spelling in error message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145154
Approved by: https://github.com/bobrenjc93
2025-01-19 18:48:35 +00:00
Aaron Orenstein
9e0437a04a
PEP585 update - torch/ao/quantization ( #145140 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145140
Approved by: https://github.com/bobrenjc93
2025-01-19 10:20:00 +00:00
Aaron Orenstein
78bff1e8c1
PEP585 update - torch/_functorch ( #145139 )
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145139
Approved by: https://github.com/bobrenjc93
2025-01-19 07:06:10 +00:00
cassanof
10e4d3aebb
[DCP] Fix fsspec fsync bug on .finish() ( #144753 )
Fixes #144752
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144753
Approved by: https://github.com/Skylion007 , https://github.com/saumishr
2025-01-19 03:21:00 +00:00