Commit graph

83707 commits

Author SHA1 Message Date
Gabriel Ferns
edf266e9bb inductor.config.descriptive_names = False is not actually supported (#145523)
Summary:
This config is not supported (it throws an error when set), and doesn't really make sense imo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145523
Approved by: https://github.com/eellison
2025-01-28 20:22:23 +00:00
Jane Xu
515e55e692 Set -DPy_LIMITED_API flag for py_limited_api=True extensions (#145764)
This could be BC breaking, because there was a period of time when we used py_limited_api=True but didn't enforce the flag; now that we will start enforcing it, people's custom extensions may fail to build.

This is still strictly better behavior, as it is sketchy to claim CPython agnosticism without the flag, but I'm calling this out because people may yell at us. Ways to mitigate this risk + reasons this may not be too big a deal:
- People haven't known much about py_limited_api for extensions due to lack of docs from Python, so usage is low right now
- My upcoming tutorial will have new users of py_limited_api pass this flag, so it'd be a no-op for them.

Test plan:
* Locally I'm confident, as I tried rebuilding ao with this change and it reliably failed (cuz importing torch/extension.h is a no-no)
* Unit test wise, the normal python_agnostic one I added should work

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145764
Approved by: https://github.com/ezyang, https://github.com/zou3519, https://github.com/albanD
2025-01-28 20:11:05 +00:00
Nikita Shulga
8d91bfd965 [BE] Include CheckFunctionExists in FindBLAS.cmake (#145849)
It's used in the script, so it must be included
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145849
Approved by: https://github.com/Skylion007
2025-01-28 19:47:05 +00:00
Ryan Guo
eaff13275e [dynamo] Properly branch on an unspecialized NN module (#145786)
User-defined NN modules might have their own `__len__` or `__bool__`
methods which Dynamo needs to trace through, so that side effects and/or
reads to buffered writes are properly handled.

This patch removes the special `UnspecializedNNModuleVariable` branch in
Dynamo's branch handling, and lets these cases fall into the
`UserDefinedObjectVariable` branch, which handles the aforementioned
cases correctly.
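
As a plain-Python illustration of why these branches need to be traced (this is not Dynamo code, and `CountingContainer` and `branch_on` are hypothetical names): an `if obj:` check calls `__bool__` if it is defined and otherwise falls back to `__len__`, and either method may have side effects.

```python
class CountingContainer:
    """Hypothetical stand-in for a user-defined module with custom truthiness."""

    def __init__(self, items):
        self.items = list(items)
        self.len_calls = 0  # side effect we want to observe

    def __len__(self):
        # Python falls back to __len__ for truthiness when __bool__ is absent.
        self.len_calls += 1
        return len(self.items)


def branch_on(obj):
    # The `if` below invokes obj.__len__(), which mutates obj.len_calls.
    if obj:
        return "non-empty"
    return "empty"


empty = CountingContainer([])
full = CountingContainer([1, 2])
result_empty = branch_on(empty)
result_full = branch_on(full)
```

A tracer that treated the object as always truthy would take the wrong branch for `empty` and would also miss the write to `len_calls`.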

Fixes #145284.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145786
Approved by: https://github.com/williamwen42
2025-01-28 19:45:17 +00:00
James Wu
d9ffa5da65 Log info for AOTAutogradCache bypasses instead of warning (#145768)
Fixes #145767

FxGraphCache also logs to info instead of warning, so let's do that

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145768
Approved by: https://github.com/eellison, https://github.com/bdhirsh
2025-01-28 19:25:36 +00:00
Camyll Harajli
6c09954a9e Windows builds with VS2022 (#145319)
[Fixes #ISSUE_NUMBER](https://github.com/pytorch/pytorch/issues/128835)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145319
Approved by: https://github.com/huydhn
2025-01-28 19:07:24 +00:00
Pian Pawakapan
cbc4094298 [draft_export] add LOC for data-dep error logging (#145443)
Summary:
Maybe this is too much info, but it's difficult to go through old draft export reports where the stack trace is out of sync with the current codebase. Data-dependent errors now look like:
```
2. Data dependent error.
    When exporting, we were unable to evaluate the value of `u306`.
    This occurred at the following stacktrace:
    File /data/users/pianpwk/fbsource/buck-out/v2/gen/fbcode/78204cab86e8a0fb/sigmoid/inference/ts_migration/__pt2i_readiness_main__/pt2i_readiness_main#link-tree/caffe2/torch/fb/training_toolkit/common/proxy_module_thrift/embedding_bag_proxy.py, lineno 109, in _forward_impl:
         `if offsets[-1] > len(input):`
    As a result, it was specialized to evaluate to `261`, and asserts were inserted into the graph.
    Please add `torch._check(...)` to the original code to assert this data-dependent assumption.
    Please refer to https://docs.google.com/document/d/1kZ_BbB3JnoLbUZleDT6635dHs88ZVYId8jT-yTFgf3A/edit#heading=h.boi2xurpqa0o for more details.
```

This would be even more helpful for reports on torch-packaged models, but that requires some more work on PT2I-specific stack trace processing

Test Plan: .

Differential Revision: D68534017

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145443
Approved by: https://github.com/angelayi
2025-01-28 18:55:16 +00:00
Xinya Zhang
c32bafeb0b [ROCm] Bump AOTriton to 0.8.2b (#145508)
We received reports that AOTriton kernels mishandle the bias pointer, which causes NaNs during fine-tuning of the llama3.2-11b vision model. This PR fixes the problem.

Note: this AOTriton 0.8.1b adds head dimension 512 support and thus the binary size increases, but this support is considered experimental and will not be enabled right now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145508
Approved by: https://github.com/jeffdaily
2025-01-28 18:34:25 +00:00
eellison
621604ce46 Maintain multiple configs (#145103)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

Previously, we would finalize the config of a triton template after its first fusion. This PR maintains multiple configs, in case we epilogue fuse, then prologue fuse, and prologue fusion has a new, better config.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145103
Approved by: https://github.com/jansel, https://github.com/shunting314
ghstack dependencies: #143408
2025-01-28 18:32:14 +00:00
Ryan Guo
eaec97ab1f [dynamo] Properly prune dead input cell object (#145781)
This patch models input cell object as "newly created" rather than
"pre-existing" python object (see added documentation for why this
actually captures the semantics more accurately).

This enables the `SideEffects.prune_dead_object_new` algorithm to prune
away writes to input cell objects which are no longer relevant; this
didn't happen prior to this patch because we modelled them as
pre-existing objects, which forces us to codegen their attribute
mutations.

Fixes #145564.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145781
Approved by: https://github.com/williamwen42, https://github.com/jansel
2025-01-28 18:28:13 +00:00
eellison
8e258e2ecd Parallelize epilogue/prologue benchmarking (#143408)
When we attempt prologue or epilogue fusion with a TritonTemplate, we benchmark it at compile time in order to determine profitability. This avoids slowdowns/register spilling, and allows us to pick fusion when a base triton template is slower than cublas but faster when considering an epilogue. However, that fused benchmarking does not do the same async compilation as we do for the base TritonTemplate. The Base TritonTemplate is async compiled during lowering, then later waited on and benchmarked.

This PR extends a similar process to benchmarking fused TritonTemplates in the scheduler. We keep a list of pending fusions which have async compilations. And we resolve any pending fusions a node is in prior to attempting to fuse it with any other node.

Initially, I saw some slowdowns with this because we kick off async compilations of identical fusions in parallel. To address this I added source code caching at the `async_compile` level (we also already cache benchmark runs, but that would not happen in parallel).
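
A minimal sketch of the source-code caching idea, assuming a thread pool and a dict keyed on the source string. The names `async_compile`, `_compile`, and `_cache` here are illustrative only and are not the actual Inductor implementation:

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)
_cache: dict[str, Future] = {}
_cache_lock = threading.Lock()
_count_lock = threading.Lock()
compile_count = 0


def _compile(source: str) -> str:
    # Placeholder for an expensive compilation step.
    global compile_count
    with _count_lock:
        compile_count += 1
    return f"compiled:{source}"


def async_compile(source: str) -> Future:
    # Identical sources share one in-flight Future instead of
    # kicking off duplicate compilations in parallel.
    with _cache_lock:
        fut = _cache.get(source)
        if fut is None:
            fut = _pool.submit(_compile, source)
            _cache[source] = fut
        return fut


futures = [async_compile(s) for s in ["kernel_a", "kernel_a", "kernel_b", "kernel_a"]]
results = [f.result() for f in futures]
_pool.shutdown(wait=True)
```

Four requests are issued, but only two compilations run, because the three `kernel_a` requests resolve to the same future.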

Compilation speedups:

<img width="717" alt="image" src="https://github.com/user-attachments/assets/8e8f7d6c-7824-4210-83f9-a2a0f6db5ac9" />

This also should let us be a bit more aggressive with either configs, or benchmarking other fusions which are hard to determine profitability of.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143408
Approved by: https://github.com/jansel, https://github.com/shunting314
2025-01-28 18:18:24 +00:00
Nikita Shulga
3fd4691908 [MPS] Add op_math_t (#145808)
Similar to `at::opmath_t` to be used for reduction (and int mms)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145808
Approved by: https://github.com/dcci
2025-01-28 18:03:52 +00:00
atalman
5382ab57d7 Move trunk windows builds to CUDA-12.4 (#145844)
Same as : https://github.com/pytorch/pytorch/pull/130446

That should catch build regressions that were previously only detectable during the nightly builds for 12.4

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145844
Approved by: https://github.com/janeyx99, https://github.com/malfet
2025-01-28 18:00:51 +00:00
Huy Do
56915b093a Fix environment deployment spam (#145823)
With https://github.com/pytorch-labs/pytorch-gha-infra/pull/598 in place, the environment can now be removed.

Fixes https://github.com/pytorch/pytorch/issues/145704

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145823
Approved by: https://github.com/clee2000
2025-01-28 17:46:31 +00:00
PyTorch MergeBot
cfbb27462e Revert "[inductor][BE] Enable test_cpu_cpp_wrapper in fbcode (#145373)"
This reverts commit b8087747f5.

Reverted https://github.com/pytorch/pytorch/pull/145373 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/145373#issuecomment-2619674197))
2025-01-28 17:46:11 +00:00
PyTorch MergeBot
dbef2a9bc9 Revert "Remove lexicographical sorting of storage keys in torch.save (#143879)"
This reverts commit 7db0afabaa.

Reverted https://github.com/pytorch/pytorch/pull/143879 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. See D68746524 for details ([comment](https://github.com/pytorch/pytorch/pull/143879#issuecomment-2619661492))
2025-01-28 17:40:16 +00:00
Zain Rizvi
097ccd9c39 Move ROCm MI300 jobs to unstable to make CI green (#145790)
This is a temporary change to reduce intermittent test failures. Jobs can be moved back once those machines get better runner isolation.

This also sneaks in a small fix so that the build step of all the ROCm jobs runs on Linux Foundation runners (the get-label-type dependency). The inductor-rocm-mi300 workflow already had it, but it was missing in the rocm-mi300 workflow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145790
Approved by: https://github.com/yangw-dev
2025-01-28 17:25:15 +00:00
saienduri
7eb51e5464 Ensure GPU isolation for kubernetes pod MI300 runners. (#145829)
Fixes the reason behind moving the tests to unstable initially. (https://github.com/pytorch/pytorch/pull/145790)
We ensure gpu isolation for each pod within kubernetes by propagating the drivers selected for the pod from the Kubernetes layer up to the docker run in pytorch here.
Now we stick with the GPUs assigned to the pod in the first place and there is no overlap between the test runners.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145829
Approved by: https://github.com/jeffdaily
2025-01-28 17:20:46 +00:00
cyy
c751541e79 Fix cppcoreguidelines-init-variables ignorance (#141795)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141795
Approved by: https://github.com/albanD
2025-01-28 17:11:37 +00:00
Mu-Chu Lee
ac87388e61 [AOTInductor] Refactor CPU and GPU to remove ifdef macros (#145639)
Summary: Remove #ifdef USE_CUDA macros through some refactoring

Test Plan: Refactor code, existing tests.

Differential Revision: D68636743

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145639
Approved by: https://github.com/desertfire
2025-01-28 16:46:00 +00:00
Dmitry Nikolaev
6967ef1b07 [ROCm] fix test_cublas_workspace_explicit_allocation for gfx12 (#145227)
gfx12 passes the condition `torch.cuda.get_device_capability() >= (9, 4)` and uses `default_workspace_size=128MB`, but that size is required only for MI300
Fix condition to use `("gfx94" in gcn_arch)` instead of `torch.cuda.get_device_properties()` to detect MI300.
Now `default_workspace_size=32MB` is used for gfx12 and the test passes
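
A rough sketch of the difference between the two checks, in plain Python. The helper name `default_workspace_size_mb`, the example architecture strings, and the `(12, 0)` capability tuple are assumptions for illustration and are not taken from the test itself:

```python
def default_workspace_size_mb(gcn_arch: str) -> int:
    # Hypothetical helper: match on the architecture name rather than
    # on a (major, minor) capability tuple.
    return 128 if "gfx94" in gcn_arch else 32


# Tuple comparison is lexicographic, so an assumed gfx12 capability
# of (12, 0) also satisfies the old ">= (9, 4)" condition.
old_check_matches_gfx12 = (12, 0) >= (9, 4)

mi300_size = default_workspace_size_mb("gfx942:sramecc+:xnack-")
gfx12_size = default_workspace_size_mb("gfx1201")
```

The substring check is narrower than the capability comparison, which is why newer architectures no longer pick up the MI300-only workspace size.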

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145227
Approved by: https://github.com/jeffdaily, https://github.com/eqy
2025-01-28 16:19:27 +00:00
Animesh Jain
80a0412b76 [dynamo][builtin-skipfiles-cleanup] Remove posixpath (#145828)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145828
Approved by: https://github.com/zou3519
ghstack dependencies: #145744, #145753, #145826
2025-01-28 16:14:34 +00:00
Animesh Jain
6824a4a75d [dynamo][builtin-skipfiles-cleanup] Remove re (#145826)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145826
Approved by: https://github.com/zou3519
ghstack dependencies: #145744, #145753
2025-01-28 16:14:34 +00:00
Animesh Jain
4307e6c008 [dynamo][builtin-skipfile-cleanup] Remove signal (#145753)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145753
Approved by: https://github.com/zou3519
ghstack dependencies: #145744
2025-01-28 16:14:23 +00:00
eellison
3a56089217 fix unbacked + view incorrectness (#145548)
fix for https://github.com/pytorch/pytorch/issues/143498

We were incorrectly using contiguous strides for a non-contiguous tensor. There are two separate causes:

1. https://github.com/pytorch/pytorch/pull/110520 made it so we turn Views contiguous with unbacked symints because
`dynamic_reshape_indexer below will fail due to the size_hint's inability to process unbacked SymInts`. Seems like we should fix. Regardless - it will make the input contiguous if input is unbacked to workaround this.

2. We weren't actually making it contiguous! I filed an issue for this here: https://github.com/pytorch/pytorch/issues/145561.

This is still worth landing as a fix, even though we should fix those issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145548
Approved by: https://github.com/desertfire
2025-01-28 16:03:45 +00:00
cyyever
97b3b73f3e [Environment Variable][7/N] Use thread-safe getenv functions (#140211)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140211
Approved by: https://github.com/ezyang, https://github.com/eqy
2025-01-28 15:21:12 +00:00
Zhenbin Lin
a08f7f3266 OpenReg: fix issue of pin_memory (#145046)
Fix issue of `pin_memory` when rewrapping a storage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145046
Approved by: https://github.com/albanD
2025-01-28 09:41:04 +00:00
Chirag Pandya
bdf6dfa17d [chore][ez] change alloc buffer size from 4000 to 4096 (#145759)
Summary:
Allocations typically happen as a power of 2 anyway.
Change the default alloc size to 4096 to eke out a bit more perf.

Test:
unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145759
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
ghstack dependencies: #145756, #145757
2025-01-28 09:14:07 +00:00
Animesh Jain
5c5306e8bc [dynamo][builtin-skiplist-cleanup] Remove weakref (#145744)
WeakKeyDictionary already works very nicely with the UserDefinedObject Variable Tracker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145744
Approved by: https://github.com/jansel
2025-01-28 07:55:12 +00:00
Avik Chaudhuri
45f64e770a relax assertion to warning for unbacked binding names (#145777)
Summary:
Quick fix following up on https://github.com/pytorch/pytorch/pull/144894 to unblock internal tests.

Will keep investigating a more principled fix.

Test Plan: Failures in T213563826 now pass

Differential Revision: D68731710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145777
Approved by: https://github.com/angelayi
2025-01-28 07:52:40 +00:00
Michael Graczyk
0a8a0ef767 [inductor] Fix crash running wrapper_benchmark with no device (#145644)
Fixes #145434

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145644
Approved by: https://github.com/shunting314
2025-01-28 07:31:36 +00:00
eellison
a699034eec Record inputs at time of tracing, constrain to them for triton fn (#145448)
Record input fake tensors at time of tracing and store them in the node meta. Inductor passes have the possibility of changing strides, so it is safer to record the strides of the inputs at tracing. See, https://github.com/pytorch/pytorch/issues/137979 for more context.

We can also extend this to custom ops, and user-visible outputs. If this ends up being compilation time sensitive we can just record strides (and maybe storage offset, per @zou3519) instead of the complete fake tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145448
Approved by: https://github.com/zou3519
2025-01-28 07:07:14 +00:00
Nikita Shulga
0f5a68344a [BE][Inductor] Simplify custom_op tests (#145814)
Not sure what the motivation was behind repeating the same function over and over again for different backends
Change `test_custom_op_[123]` from accepting separate (but identical) implementations for CPU, CUDA and XPU, to take just `fn` and `fn_meta` args

Test that it is also extendable to MPS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145814
Approved by: https://github.com/jansel
2025-01-28 05:58:51 +00:00
cyyever
23eb0a3201 Improve typing in torch/types.py (#145237)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145237
Approved by: https://github.com/XuehaiPan, https://github.com/albanD

Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2025-01-28 05:29:12 +00:00
Aaron Gokaslan
8e46d0f595 [BE]: Update typing of OrderedSet ancestor (#145783)
Now that Python 3.9 is the minimum version, we can properly use generics in the superclass
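
One way this could look, as a simplified sketch: on Python 3.9+ the `collections.abc` classes can be subscripted directly in a class definition. The dict-backed `OrderedSet` below is illustrative only and is not the class from the PyTorch codebase:

```python
from collections.abc import Iterator, MutableSet
from typing import TypeVar

T = TypeVar("T")


class OrderedSet(MutableSet[T]):
    """Minimal insertion-ordered set backed by a dict (illustrative only)."""

    def __init__(self, items=()):
        # dict preserves insertion order and drops duplicates.
        self._data: dict[T, None] = dict.fromkeys(items)

    def __contains__(self, item: object) -> bool:
        return item in self._data

    def __iter__(self) -> Iterator[T]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def add(self, item: T) -> None:
        self._data[item] = None

    def discard(self, item: T) -> None:
        self._data.pop(item, None)


s = OrderedSet([3, 1, 3, 2])
s.add(1)      # already present, order unchanged
s.discard(3)
ordered = list(s)
```
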
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145783
Approved by: https://github.com/eellison
2025-01-28 04:43:49 +00:00
cyy
67fcc7cf02 [3/N] Remove unnecessary once flag usage (#145672)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145672
Approved by: https://github.com/albanD
2025-01-28 04:28:18 +00:00
Burak Turk
01a4d86b31 add pt2 callbacks for backward pass and prevent duplicate callbacks (#145732)
Summary: This change adds callbacks for lazy backwards compilation while preventing duplicate callbacks from being fired.

Differential Revision: D68577593

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145732
Approved by: https://github.com/mlazos
2025-01-28 03:50:02 +00:00
Pian Pawakapan
1a26cdd5cb [cond] remove warning for unsupported tuple returns (#145766)
I guess this is supported now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145766
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2025-01-28 03:13:36 +00:00
PyTorch MergeBot
9010649292 Revert "Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)"
This reverts commit db3685a35c.

Reverted https://github.com/pytorch/pytorch/pull/143880 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but either this PR or the base PR breaks distributed tests ([comment](https://github.com/pytorch/pytorch/pull/143880#issuecomment-2617743403))
2025-01-28 03:07:17 +00:00
Chirag Pandya
78f02bf07c [bug] handle case when remote peer closes connection (#145757)
Summary:
In the case where remote peer closes the connection, nread returns 0. In
this case, we still want to free up the allocated buffer.
Also, reorder the if so that the likely success case (nread > 0) is at
the top of the function with an early return.
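
The control flow described above can be sketched in Python as follows. The `Connection` class, its fields, and the decision to release the buffer on the success path are assumptions for illustration; the actual change is in native code:

```python
class Connection:
    """Hypothetical model of a read callback that owns a receive buffer."""

    def __init__(self):
        self.received = bytearray()
        self.closed = False
        self.buffers_released = 0

    def _release(self, buf: bytes) -> None:
        # Stand-in for freeing the allocated read buffer.
        self.buffers_released += 1

    def on_read(self, nread: int, buf: bytes) -> None:
        if nread > 0:
            # Likely case first: consume the data and return early.
            self.received += buf[:nread]
            self._release(buf)
            return
        # nread == 0 (peer closed the connection) or nread < 0 (error):
        # the buffer must still be released.
        self._release(buf)
        self.closed = True


conn = Connection()
conn.on_read(3, b"abc\x00")
conn.on_read(0, b"")
```
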

Test Plan:
unit tests

Differential Revision: [D68733192](https://our.internmc.facebook.com/intern/diff/D68733192)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145757
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
ghstack dependencies: #145756
2025-01-28 03:06:38 +00:00
Pian Pawakapan
4be831ba2d [draft_export] fix dense-in-memory check for inferring fakes (#145653)
Test Plan: fixes check for dense tensors with size-1 dimensions

Differential Revision: D68644028

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145653
Approved by: https://github.com/zou3519
2025-01-28 02:52:14 +00:00
James Wu
7c1fc0a047 Log cache state for AOTAutograd in title of file (#145715)
Differential Revision: [D68692755](https://our.internmc.facebook.com/intern/diff/D68692755/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145715
Approved by: https://github.com/bobrenjc93
2025-01-28 02:14:18 +00:00
Jason Ansel
78a94c9114 [inductor] Remove type ignores from scheduler.py (#145712)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145712
Approved by: https://github.com/yanboliang, https://github.com/Skylion007
ghstack dependencies: #145692
2025-01-28 01:44:32 +00:00
Jason Ansel
2df2f9d895 [inductor] Change type of get_backend_features to OrderedSet (#145692)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145692
Approved by: https://github.com/yanboliang
2025-01-28 01:44:32 +00:00
Yifu Wang
db33d23aa8 [SymmetricMemory] fix an issue where rendezvous is performed with wrong device context when torch.cuda.set_device() is not called (#144886)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144886
Approved by: https://github.com/awgu
2025-01-28 01:43:37 +00:00
William Wen
e3d3f2b22e [dynamo] save/restore system random state more carefully (#145750)
Reattempt of https://github.com/pytorch/pytorch/pull/145435 since the state of the linked internal diff appears to be messed up.

Note: I have verified that the previously failing internal tests now pass internally.

Differential Revision: [D68723334](https://our.internmc.facebook.com/intern/diff/D68723334)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145750
Approved by: https://github.com/StrongerXi
2025-01-28 01:34:13 +00:00
Gabriel Ferns
f16ce3c7e9 Refactor fuzzer and add support for Dynamo (#145565)
## Summary:
Dynamo now works with config fuzzer.

For BE week, we also found and fixed 5 different bugs (in inductor):
- https://github.com/pytorch/pytorch/pull/145426
- https://github.com/pytorch/pytorch/pull/145523
- https://github.com/pytorch/pytorch/pull/145527
- https://github.com/pytorch/pytorch/pull/145532
- https://github.com/pytorch/pytorch/pull/145538

## Test Plan:
New Dynamo Unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145565
Approved by: https://github.com/masnesral
2025-01-28 00:44:27 +00:00
Syed Tousif Ahmed
6eb74fbec6 Updates NCCL user buffer registration test for NCCL 2.24.3 (#145285)
NCCL 2.24.3 changed the content of the debug output for NVLS registration. We use this debug output in our test suite to check if NVLS was successfully registered or not. Hence we need to specialize for the NCCL version in the test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145285
Approved by: https://github.com/kwen2501
2025-01-28 00:24:53 +00:00
Ryan Guo
5a4d959cdb [dynamo] Properly model torch profiler context objects (#145537)
Prior to this patch, Dynamo conveniently modelled torch profiler context
objects (e.g., `torch.profiler.profile`) as `NullContextVariable`
because `torch.compile` ignores the effect of these profiler contexts.

However, the semantics of these profiler contexts diverge from
`contextlib.nullcontext` in the `__enter__` function, where the former
returns `self` and the latter returns `None`. This causes a subtle error
as observed in #125021.

This patch adds back a `ProfilerContextVariable`, which addresses the
aforementioned semantic discrepancy.
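
The `__enter__` difference can be reproduced in plain Python without torch. `SelfReturningContext` below is a hypothetical stand-in for a profiler-style context manager, not the real `torch.profiler.profile`:

```python
import contextlib


class SelfReturningContext:
    """Hypothetical stand-in for a context manager whose __enter__ returns self."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions


# nullcontext() binds its enter_result, which defaults to None.
with contextlib.nullcontext() as a:
    null_target = a

# A self-returning context binds the context object itself.
ctx = SelfReturningContext()
with ctx as b:
    self_target = b
```

Code that writes `with profile() as prof:` and then uses `prof` inside the block would see `None` if the context were modelled as a null context.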

Fixes #125021.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145537
Approved by: https://github.com/zou3519, https://github.com/williamwen42
2025-01-28 00:03:36 +00:00
Mikayla Gawarecki
db3685a35c Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where storage order was changed to non lexicographical. A `.format_version` entry was added to the zipfile and `calculate_storage_offsets` will only work on checkpoints with `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, offsets of each storage record (other than the 0th storage) will be calculated instead of relying on `miniz` APIs to determine them.

The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)

## Testing strategy

The agreed upon testing strategy was as follows:
- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-27 23:57:30 +00:00