Commit graph

2230 commits

Author SHA1 Message Date
clr
f746bb6311 config: Don't spam warnings about reference type configs (#145800)
Summary:
https://github.com/pytorch/pytorch/issues/145755

The is_dynamic check for reference types was subtly broken, causing log spam
after it was accessed

Added an explicit type for is_default for reference types to make sure this
behaviour is correct
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145800
Approved by: https://github.com/eellison
2025-01-30 18:57:16 +00:00
clr
6b41f310c2 config: Support str env variables (#145980)
Summary:
This allows us to use environment variables to set string values. We've added
tests for the specific functionality implemented here. Note that we already
accidentally started setting up configs to use this, so we're just adding the
feature.

Additionally, we're not fully validating the underlying type when we set the
value (and in general, it's more difficult than we would like to do this). Let
me know if people feel strongly, and we can add a PR to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145980
Approved by: https://github.com/yushangdi, https://github.com/oulgen
2025-01-30 00:13:02 +00:00
Colin Peppler
521588519d re-use FloorDiv for RShift (#145898)
I encountered this C++ compilation error.
```
  579 |     int64_t var_6 = (static_cast<int64_t>(std::floor((1.0/2.0)*u0)) | static_cast<int64_t>(std::floor((1.0/4.0)*static_cast<int64_t>(std::floor((1.0/2.0)*u0))))) | std::floor((1.0/16.0)*(static_cast<int64_t>(std::floor((1.0/2.0)*u0)) | static_cast<int64_t>(std::floor((1.0/4.0)*static_cast<int64_t>(std::floor((1.0/2.0)*u0))))));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                     |                                                                                                         |
      |                                                                     int64_t {aka long int}                                                                                    double
```

Then, I figured out where this std::floor came from with the help of Bob's guard provenance tool. It comes from RShift which is used in `triton.next_power_of_2`.

---
Before, we used `std::floor`
```
int64_t var_6 = (
   static_cast<int64_t>(std::floor((1.0/2.0)*u0)) |
   static_cast<int64_t>(std::floor((1.0/4.0)*static_cast<int64_t>(std::floor((1.0/2.0)*u0)))))
   | std::floor((1.0/16.0)*(static_cast<int64_t>(std::floor((1.0/2.0)*u0))             # no cast to int here.
   | static_cast<int64_t>(std::floor((1.0/4.0)*static_cast<int64_t>(std::floor((1.0/2.0)*u0))))));
```

Now, we use `c10::div_floor_integer` instead
```
int64_t var_6 = (
   (c10::div_floor_integer(static_cast<int64_t>(u0), static_cast<int64_t>(2L))) |
   (c10::div_floor_integer(static_cast<int64_t>(u0), static_cast<int64_t>(8L)))) |
   (c10::div_floor_integer(static_cast<int64_t>((c10::div_floor_integer(static_cast<int64_t>(u0), static_cast<int64_t>(2L)))
   | (c10::div_floor_integer(static_cast<int64_t>(u0), static_cast<int64_t>(8L)))), static_cast<int64_t>(16L)));
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145898
Approved by: https://github.com/desertfire, https://github.com/bobrenjc93
ghstack dependencies: #145802
2025-01-29 22:50:22 +00:00
Aaron Orenstein
7178b827d7 PEP585: Missed conversions (#145342)
Differential Revision: [D68785969](https://our.internmc.facebook.com/intern/diff/D68785969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145342
Approved by: https://github.com/bobrenjc93
2025-01-29 05:24:36 +00:00
Jane Xu
515e55e692 Set -DPy_LIMITED_API flag for py_limited_api=True extensions (#145764)
This could be BC breaking, because there was a period of time when we use py_limited_api=True but don't enforce the flag, and now that we will start enforcing the flag, people's custom extensions may fail to build.

This is strictly still better behavior, as it is sketchy to claim CPython agnosticism without the flag, but calling this out as potential people yelling at us. Ways to mitigate this risk + reasons this may not be too big a deal:
- People haven't known about py_limited_api for extensions much due to lack of docs from python so usage is low right now
- My current tutorial is in store to make new users of py_limited_api pass this flag, so it'd be a noop for them.

Test plan:
* Locally i'm confident as I tried rebuilding ao with this change and it reliably failed (cuz importing torch/extension.h is a nono)
* Unit test wise, the normal python_agnostic one I added should work

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145764
Approved by: https://github.com/ezyang, https://github.com/zou3519, https://github.com/albanD
2025-01-28 20:11:05 +00:00
Aaron Gokaslan
8e46d0f595 [BE]: Update typing of OrderedSet ancestor (#145783)
Now that we are on python 3.9 minimum version we can properly use Generics in the superclass
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145783
Approved by: https://github.com/eellison
2025-01-28 04:43:49 +00:00
PyTorch MergeBot
9010649292 Revert "Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)"
This reverts commit db3685a35c.

Reverted https://github.com/pytorch/pytorch/pull/143880 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but either this PR or the base PR breaks distributed tests ([comment](https://github.com/pytorch/pytorch/pull/143880#issuecomment-2617743403))
2025-01-28 03:07:17 +00:00
Mikayla Gawarecki
db3685a35c Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies  on the previous PR in this stack, where storage order was changed to non lexicographical. A `.format_version` entry was added to the zipfile and `calculate_storage_offsets` will only work on checkpoints with `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, offsets of each storage record (other than the 0th storage will be calculated instead of relying on `miniz` APIs to determine this).

The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)

## Testing strategy

The agreed upon testing strategy was as follows:
- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-27 23:57:30 +00:00
Randolf Scholz
835e770bad Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994)
Fixes #144976

Using appoach ① `IO[bytes]`, but could also try with a protocol.

## Notes:

- moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike`
- Use `FileLike` annotation where it makes sense
- made sure those functions also support `os.PathLike`
- Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate.
- Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`)
- needed to make `torch.serialization._opener` generic to avoid LSP violations.
- skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2025-01-27 18:08:07 +00:00
H. Vetinari
e6c1e6e20e simplify torch.utils.cpp_extension.include_paths; use it in cpp_builder (#145480)
While working on conda-forge integration, I needed to look at the way the include paths are calculated, and noticed an avoidable duplication between `torch/utils/cpp_extension.py` and `torch/_inductor/cpp_builder.py`. The latter already imports the former anyway, so simply reuse the same function.

Furthermore, remove long-obsolete include-paths. AFAICT, the `/TH` headers have not existed since pytorch 1.11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145480
Approved by: https://github.com/ezyang
2025-01-27 07:19:42 +00:00
PyTorch MergeBot
09ae69a364 Revert "Fix type annotation of Linear.bias (#142326)"
This reverts commit 81e370fc6b.

Reverted https://github.com/pytorch/pytorch/pull/142326 on behalf of https://github.com/malfet due to This introduced a graph break and regressed inductor tests, see 73622fc5fa/1 ([comment](https://github.com/pytorch/pytorch/pull/142326#issuecomment-2614196349))
2025-01-26 03:41:00 +00:00
Fabian Keller
81e370fc6b Fix type annotation of Linear.bias (#142326)
Currently the `bias` attribute of `torch.nn.Linear` (and `Bilinear`) is typed incorrectly, because it relies on the implicit `Module.__getattr__` which types it as `Tensor | Module`. This has two issues:

- It hides the fact that `bias` is optional, and can be `None`, which in turn can hide actual bugs on user side.
- It blurs the type due to having `Module` in the union, which can require unnecessary `isistance(linear.bias, Tensor)` on user side.

This PR types the `bias` attribute explicitly to fix these issues.

CC @ezyang @Skylion007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142326
Approved by: https://github.com/ezyang
2025-01-24 22:43:52 +00:00
Oguz Ulgen
d3989ca636 Add multi env variable support to configs (#145288)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145288
Approved by: https://github.com/c00w
2025-01-24 10:04:24 +00:00
Johnny
732c4998f3 [NVIDIA] Full Family Blackwell Support codegen (#145436)
More references:
https://github.com/NVIDIA/nccl

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145436
Approved by: https://github.com/ezyang, https://github.com/drisspg
2025-01-24 04:36:00 +00:00
PyTorch MergeBot
714f64329b Revert "Add multi env variable support to configs (#145288)"
This reverts commit a8b7cb6a2d.

Reverted https://github.com/pytorch/pytorch/pull/145288 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lint from a landrace with some recent PEP585 changes ([comment](https://github.com/pytorch/pytorch/pull/145288#issuecomment-2611278428))
2025-01-24 00:20:00 +00:00
Oguz Ulgen
a8b7cb6a2d Add multi env variable support to configs (#145288)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145288
Approved by: https://github.com/c00w
2025-01-23 23:00:23 +00:00
Irem Yuksel
66bf7da446 Enable sleef for Win Arm64 (#144876)
Sleef module was disabled for Windows Arm64 on b021486405
This PR enables it again since the issue is no longer valid.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144876
Approved by: https://github.com/albanD, https://github.com/malfet

Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
2025-01-23 19:22:58 +00:00
Aaron Orenstein
629840e038 Backout PEP585 use of Iterable (#145438)
Summary:
Importing Iterable from collections.abc here causes an internal product to fail
MRO discovery causing a collision between Iterable and Generic.

This fixes the failure on D68461304

Differential Revision: D68531443

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145438
Approved by: https://github.com/izaitsevfb
2025-01-23 11:45:37 +00:00
Johnny
a57133e3c7 [NVIDIA] Jetson Thor Blackwell Support codegen (#145395)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145395
Approved by: https://github.com/eqy, https://github.com/malfet
2025-01-22 20:13:19 +00:00
Isuru Fernando
4b77ff9784 Fix PythonMod printing for C++ (#143385)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143385
Approved by: https://github.com/leslie-fang-intel, https://github.com/anijain2305
2025-01-22 14:58:35 +00:00
johnnynunez
35f5668f7e [NVIDIA] RTX50 Blackwell Support codegen (#145270)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145270
Approved by: https://github.com/ezyang
2025-01-21 21:10:05 +00:00
Aaron Orenstein
2f9d378f7b PEP585 update - torch/utils (#145201)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145201
Approved by: https://github.com/bobrenjc93
2025-01-21 21:04:10 +00:00
Edward Z. Yang
efa88e04e1 Don't overspecialize float when propagating cache guards to ShapeEnv (#145078)
Fixes https://github.com/pytorch/pytorch/issues/142507

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145078
Approved by: https://github.com/Skylion007
2025-01-21 18:05:43 +00:00
Aaron Gokaslan
cf05f6a134 [BE]: Improve typing for torch/fx/_pytree.py and torch/utils/_pytree.py (#145173)
Improve type inference in _pytree.py utility functions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145173
Approved by: https://github.com/bobrenjc93
2025-01-20 22:18:19 +00:00
Nikita Shulga
dc9b77cc55 [MPS] Support includes in metal objects (#145087)
Useful for code reuse for Metal shader build both for eager mode and MPSInductor, but it requires one to implement `_cpp_embed_headers` tool that, as name suggests, would preprocess and embeds the for shader to be used in dynamic compilation.
Test using:
 -  `TestMetalLibrary.test_metal_include`
 - Moving `i0`/`i1` implementation to `c10/util/metal_special_math.h` and call it from `SpecialOps.metal` shader, which now looks much more compact:
 ```metal
template <typename T, typename Tout = T>
void kernel
i0(constant T* input,
   device Tout* output,
   uint index [[thread_position_in_grid]]) {
  output[index] = c10::i0(static_cast<Tout>(input[index]));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145087
Approved by: https://github.com/dcci
ghstack dependencies: #145023
2025-01-18 05:35:22 +00:00
Luca Wehrstedt
a0d2c09115 Add flop formula for _scaled_mm (#144973)
This will make it work correctly with the partitioner's AutoAC
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144973
Approved by: https://github.com/jeffdaily
2025-01-17 09:38:30 +00:00
Zhengxu Chen
53256edff9 [export] Support module inputs for non strict mode. (#143925)
Summary:
Add experimental support for torch.nn.Module as input types.

Before this change, we don't support module inputs but recently we saw some interesting use cases like gpt-fast https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L68 where we directly pass in a module input for different variants of the same models.

Since we don't really care about non-param or non-buffer states in non strict mode, we don't care about those either and pretend they are like plain constants during tracing. We treat any module input like a nested container of tensor, and each time we will automatically register a pytree handler for these module types to flatten its state dict into a group of tensors. We will just inline any module method call during tracing like we did for `self` module in export_for_training. This will make input modules' behavior very similar to the training module in typical case, except that we don't record the inputs as parameter or buffers but rather just plain user inputs.

Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_module_input

Differential Revision: D67680827

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143925
Approved by: https://github.com/tugsbayasgalan
2025-01-16 17:30:36 +00:00
PyTorch MergeBot
6559374494 Revert "Add flop formula for _scaled_mm (#144872)"
This reverts commit f31452268b.

Reverted https://github.com/pytorch/pytorch/pull/144872 on behalf of https://github.com/lw due to Breaks ROCm jobs on main ([comment](https://github.com/pytorch/pytorch/pull/144872#issuecomment-2595994134))
2025-01-16 15:16:18 +00:00
Luca Wehrstedt
f31452268b Add flop formula for _scaled_mm (#144872)
This will make it work correctly with the partitioner's AutoAC
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144872
Approved by: https://github.com/vkuzo
2025-01-16 13:57:54 +00:00
wizzniu
c07dc64017 Update pin memory related APIs to not pass 'device' argument (#131858)
Based on https://github.com/pytorch/pytorch/pull/126376, this PR tries to update all PT callers (e.g., `Tensor.is_pinned()`, `Tensor.pin_memory()`) to not pass `device` argument.
As for `storage/untyped_storage.is_pinned()/pin_memory()`, we keep the `device` argument but passing `device` is discouraged. And if not given, the default `device` is still 'cuda' for BC.
Additionally, based on device-agnostic pin_memory, `pin_memory_device` argument of `torch.utils.data.DataLoader` is discouraged  now. For BC, explictly passing this argument is still effective. If not given, the default `device` will be the current accelerator.

Fixes #124908
Relates https://github.com/pytorch/pytorch/pull/126376

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131858
Approved by: https://github.com/albanD

Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-01-15 17:23:35 +00:00
PyTorch MergeBot
d21738f24a Revert "Fix torch.normal ignores default_device (#144070)"
This reverts commit 184549b2d7.

Reverted https://github.com/pytorch/pytorch/pull/144070 on behalf of https://github.com/ezyang due to broken a specific use case ([comment](https://github.com/pytorch/pytorch/pull/144070#issuecomment-2590681953))
2025-01-14 17:41:58 +00:00
Shangdi Yu
5c727d5679 [minifier] Fix config generator for callables (#144518)
Summary:
When config contains callables, the current configs generated cannot be run:

```
torch._dynamo.config.reorderable_logging_functions = {<built-in function print>, <function warning at 0x7f774c595630>, <function log at 0x7f774c595870>, <function error at 0x7f774c595510>, <function info at 0x7f774c595750>, <built-in function warn>, <function exception at 0x7f774c5955a0>, <function debug at 0x7f774c5957e0>, <function critical at 0x7f774c5953f0>}
```

We fix the config to generate the right string, so the config is runnable, like below

```
import logging
import warnings
torch._dynamo.config.reorderable_logging_functions = { warnings.warn, logging.warn, print }
```

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:utils -- -r test_codegen_config
```

Differential Revision: D67998703

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144518
Approved by: https://github.com/desertfire
2025-01-14 17:18:13 +00:00
PyTorch MergeBot
0aa34e9591 Revert "Collect packages with importlib in collect_env (#144616)"
This reverts commit 3541d2a2aa.

Reverted https://github.com/pytorch/pytorch/pull/144616 on behalf of https://github.com/malfet due to Somehow this change causes test_bottleneck_cuda to fail ([comment](https://github.com/pytorch/pytorch/pull/144616#issuecomment-2586095595))
2025-01-13 03:11:04 +00:00
Sv. Lockal
3541d2a2aa Collect packages with importlib in collect_env (#144616)
If pytorch is installed systemwide (via os package manager) or by alternative package manager like `uv`, pip is not available, causing error in `collect_env`.
However it is still possible to collect exactly the same list using `importlib` API, which is always available.

Fixes #144615

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144616
Approved by: https://github.com/malfet
2025-01-12 23:21:08 +00:00
Gabriel Ferns
1376116ab1 Config fuzzer (#139736)
This tool makes it easy to search through config state-space with a minimal reproduction or test. It presents a similar interface to the config bisector by taking a test_function that should either raise on Exception or return False upon failure.

It has two entry points: `fuzz_n_tuple`, which tries every combination of n configs, and `bisect`, which randomly flips configs and tries to find the minimal reproduction upon failure. `bisect` is a much more efficient way to search the space, but `fuzz_n_tuple` can give you peace of mind that a new config will compose with every other config.

It's been used to find three bugs so far in the inductor config:
https://github.com/pytorch/pytorch/issues/140220 https://github.com/pytorch/pytorch/issues/140219
https://github.com/pytorch/pytorch/issues/143524

This PR also adds a bunch of missing types to the inductor config to get them to play nice with the fuzzer, so it can be a good forcing function for adding types to config.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139736
Approved by: https://github.com/eellison
2025-01-12 22:59:02 +00:00
zeshengzong
184549b2d7 Fix torch.normal ignores default_device (#144070)
Fixes #122886

1. Enable `torch.normal` working with `DeviceContext` to get default device which set via `set_default_device`.
2. Add hint in `set_default_device` doc, suggest use `torch.Tensor.to` method move to desired device explicitly.

**Test Result**
1. **Doc Preview**
![image](https://github.com/user-attachments/assets/eb69c334-be2b-4dc5-bdce-567da21e1635)

2. **Local Test**
```python
>>> import torch
>>> torch.normal(0.,1., (10,10)).device
device(type='cpu')
>>> torch.set_default_device('cuda')
>>> torch.normal(0.,1., (10,10)).device
device(type='cuda', index=0)
```

```bash
pytest test/test_tensor_creation_ops.py
```

![image](https://github.com/user-attachments/assets/8b466b55-f162-4b83-8b20-71de2c1d0914)

```bash
lintrunner
```
![image](https://github.com/user-attachments/assets/5b269c50-da57-47ed-8500-4edf2c2295e4)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144070
Approved by: https://github.com/ezyang
2025-01-10 08:19:55 +00:00
Eddie Yan
28b1960d49 [CUDA] parse arch-conditional compute-capability when building extensions (#144446)
don't choke on arch-conditional compute capabilities e.g., `sm_90a`: #144037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144446
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2025-01-09 22:05:18 +00:00
bobrenjc93
90e81a157a Migrate from Tuple -> tuple in torch/utils/data (#144255)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144255
Approved by: https://github.com/andrewkho
2025-01-08 04:09:45 +00:00
Shangdi Yu
72e8f34715 [AoTI Minifier] UX Improvement (#143330)
Summary:
- When a user specify `TORCHINDUCTOR_MAX_AUTOTUNE=1` env variable, we add `config.max_autotune=True` to the generated minifier_launcher
- We should do this to other inductor configs as well in a followup Diff

Currently in dynamo and aoti minifier, if a config is overwritten by an env variable, the config will not show up in the config list in the minifier_launcher.py file. As a result, when running the minifier_launcher, they need to re-apply the same env variable.
 This is:
1) not convenient for the users
2) if they copy-paste the minifier_launcher.py to us without including the env variable, we could be confused and not able to reproduce the error.

Underlying implementation change:

- Add `env_default` parameter to `codegen_config()`. If set, configs overriden by the env are not considered default.

Test Plan:
```
 buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:utils -- -r test_codegen_config
```

Differential Revision: D67299312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143330
Approved by: https://github.com/jansel, https://github.com/eellison
2025-01-07 20:04:19 +00:00
Henry Hu
f013cfee38 [TreeSpec] Support enum in defaultdict (#144235)
Summary: Followup from D66269157, add support for enum in defaultdict.

Test Plan: Added unit test

Differential Revision: D67832100

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144235
Approved by: https://github.com/henrylhtsang, https://github.com/houseroad
2025-01-07 00:10:46 +00:00
Isuru Fernando
301b9c8a90 Fix PythonMod printing (#144078)
Fixes #144075
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144078
Approved by: https://github.com/anijain2305
2025-01-06 22:52:34 +00:00
Xiaodong Wang
3d3a07963f [reland][attempt2][AMD] Turn on TF32 for aten::mm (#144145)
Summary:
https://github.com/pytorch/pytorch/pull/143549 was reverted due to some
internal/oss tooling issue. Relanding.

hipblaslt supports TF32, so adding the support.
Original PR https://github.com/pytorch/pytorch/pull/139869

Test Plan: CI

Differential Revision: D67785496

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144145
Approved by: https://github.com/jianyuh
2025-01-06 00:37:01 +00:00
Michal Gallus
93633d0e80 [ROCm][Windows] Fix export macros (#144098)
For correct import and export of functions when the dynamic linkage is used for HIP libraries on windows, the appropriate export/import macros need to be put in place. This Pull Request utilizes existing CUDA import/export macros by converting them to corresponding HIP macros during the hipification process.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144098
Approved by: https://github.com/jeffdaily
2025-01-04 17:12:46 +00:00
Aaron Orenstein
45ef3309e3 [BE] typing for decorators (#144161)
Summary:
Untyped decorators strip annotations from the decorated items.

- _compile
- _inductor/fx_passes/post_grad
- _inductor/lowering
- _library/custom_ops
- _meta_registrations
- _ops
- _refs/nn/functional
- ao/quantization/quantizer/xnnpack_quantizer_utils
- distributed/_composable/contract
- fx/experimental/graph_gradual_typechecker
- fx/experimental/migrate_gradual_types/constraint_generator
- optim/optimizer
- signal/windows/windows
- testing/_internal/common_device_type
- torch/_inductor/decomposition
- utils/flop_counter

Test Plan: unit tests

Differential Revision: D62302684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144161
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-01-04 16:40:09 +00:00
PyTorch MergeBot
99f2491af9 Revert "Use absolute path path.resolve() -> path.absolute() (#129409)"
This reverts commit 45411d1fc9.

Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))
2025-01-04 14:17:20 +00:00
Xuehai Pan
45411d1fc9 Use absolute path path.resolve() -> path.absolute() (#129409)
Changes:

1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()`
2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409
Approved by: https://github.com/albanD
2025-01-03 20:03:40 +00:00
Michael Diggin
55dc61dd52 Dataloader distribute tasks to workers when in_order is False (#142324)
Fixes #105203 and is a follow up PR to #141833

When `in_order` is True (the default), tasks are given out to workers in a round robin fashion. When `in_order` is False this is no longer needed, as we give up guarantees of reproducibility, and instead tasks should be given to workers that are able to perform work.
In this PR I've added tracking of the number of outstanding tasks for each worker (updated when tasks are added to their queue, and when data is returned to the main thread). When finding the next queue to add a task to, if `in_order` is False it will only add the task to the workers queue if it has fewer than `_prefetch_factor` tasks outstanding.
The current default behaviour is left as is.

Tests are also updated to assert on the worker IDs for each sample of data returned.
I've run the following to confirm they aren't flaky
```bash
for i in {1..20}; do python test/test_dataloader.py TestOutOfOrderDataLoader; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142324
Approved by: https://github.com/andrewkho
2025-01-03 12:57:04 +00:00
bobrenjc93
0d6db839a7 remove allow-untyped-defs from utils/data/datapipes/iter/streamreader.py (#144088)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144088
Approved by: https://github.com/aorenste
2025-01-03 01:21:44 +00:00
bobrenjc93
bdfb40ed29 remove allow-untyped-defs from utils/_import_utils.py (#144089)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144089
Approved by: https://github.com/aorenste
2025-01-03 01:21:13 +00:00
Kasperi Apell
a7915c56f6 Propagate callable parameter types using ParamSpec (#142306) (#143797)
The codebase has a few locations where callable parameter type information is lost when the unpackings *args and **kwargs are typed as Any. Refactor these instances to retain type information using typing_extensions.ParamSpec.

Also, in these functions, enforce return type with TypeVar.

Addresses #142306

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143797
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
2024-12-29 23:03:14 +00:00