pytorch/torch
Commit db3685a35c by Mikayla Gawarecki: Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. The option relies on the previous PR in this stack, which changed the checkpoint storage order to be non-lexicographical and added a `.format_version` entry to the zipfile; `calculate_storage_offsets` only works on checkpoints that contain `.format_version`.

When this option is turned on, `torch.load(mmap=True)` calculates the offset of each storage record (other than the 0th) instead of relying on `miniz` APIs to determine it.
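
A minimal usage sketch (assuming the option is toggled by plain attribute assignment, as its dotted name suggests; `checkpoint.pt` is a placeholder path):

```python
import torch
# Config module added by this PR; the import path mirrors the dotted
# option name above.
from torch.utils.serialization import config

# Opt in to calculating storage offsets; only effective for checkpoints
# that contain the `.format_version` record.
config.load.calculate_storage_offsets = True

# Offsets of storages after the 0th are now computed rather than looked
# up through miniz, reducing random reads during the mmap'd load.
state_dict = torch.load("checkpoint.pt", mmap=True)
```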

The existing APIs issue multiple random reads (first reading the end-of-central-directory record, then the zipfile header for the record) to determine the offset at which a record's storage data starts. These reads can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

See 6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605) for the existing lookup.
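
For context, that lookup boils down to fixed arithmetic over the ZIP local file header: the header is 30 fixed bytes followed by variable-length filename and extra fields (PyTorch pads the extra field so that storage data lands on an aligned offset), and the record's payload starts immediately after. A Python sketch of that layout, illustrative only and not the actual C++ implementation:

```python
import struct

LOCAL_FILE_HEADER_SIZE = 30  # fixed portion of a ZIP local file header

def record_data_offset(buf: bytes, header_offset: int) -> int:
    """Offset of a record's payload, given the offset of its local header.

    The filename length and extra-field length are the two uint16 values
    at bytes 26-29 of the header; the payload follows both variable fields.
    """
    name_len, extra_len = struct.unpack_from("<HH", buf, header_offset + 26)
    return header_offset + LOCAL_FILE_HEADER_SIZE + name_len + extra_len
```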

## Testing strategy

The agreed-upon testing strategy was as follows:
- Add debug code, gated by an environment flag `TORCH_SERIALIZATION_DEBUG`, that runs the offset-calculation logic and verifies it against `getRecordOffset` for each storage (when `mmap=False`); a sketch of the check follows this list.
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset-calculation logic is implicitly tested.
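
A sketch of that debug check (the reader API and helper names here are hypothetical stand-ins, not the actual code in `serialization.py`):

```python
import os

def debug_check_offset(record_name, calculated_offset, zip_reader):
    # Gate on the environment flag so the extra work only runs in CI or
    # when a developer opts in.
    if os.environ.get("TORCH_SERIALIZATION_DEBUG"):
        # `zip_reader.get_record_offset` stands in for the miniz-backed
        # lookup being verified against (hypothetical API name).
        expected = zip_reader.get_record_offset(record_name)
        assert calculated_offset == expected, (
            f"offset mismatch for {record_name}: "
            f"{calculated_offset} != {expected}"
        )
```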

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
Committed: 2025-01-27 23:57:30 +00:00
| Name | Last commit | Last updated |
| --- | --- | --- |
| `_awaits` | | |
| `_C` | Revert "[CUDA][cuBLAS] Add fp16 accumulate option to cuBLAS/cuBLASLt (#144441)" | 2025-01-27 19:38:26 +00:00 |
| `_C_flatbuffer` | | |
| `_custom_op` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_decomp` | PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) | 2025-01-18 20:47:12 +00:00 |
| `_dispatch` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_dynamo` | inductor_config_logging: Don't drop keys (#144700) | 2025-01-27 23:47:25 +00:00 |
| `_export` | serde unbacked bindings (#144894) | 2025-01-25 02:34:27 +00:00 |
| `_functorch` | functional compiled autograd (#144707) | 2025-01-27 05:20:56 +00:00 |
| `_higher_order_ops` | [BE][Ez]: FURB148 - remove useless enumerate calls (#145619) | 2025-01-24 23:37:15 +00:00 |
| `_inductor` | Revert "pickler for GraphModule (#141659)" | 2025-01-27 22:39:30 +00:00 |
| `_lazy` | PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) | 2025-01-18 20:47:12 +00:00 |
| `_library` | [Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588) | 2025-01-27 19:22:43 +00:00 |
| `_logging` | [BE][export] add "+export" logging to de/serialization (#145283) | 2025-01-23 19:47:48 +00:00 |
| `_numpy` | PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) | 2025-01-18 20:47:12 +00:00 |
| `_prims` | PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) | 2025-01-18 20:47:12 +00:00 |
| `_prims_common` | Output of nonzero is transposed, fix fake tensor (#144695) | 2025-01-26 01:07:22 +00:00 |
| `_refs` | [Inductor][CPP] fix torch logit decomposition (#145576) | 2025-01-27 19:37:51 +00:00 |
| `_strobelight` | PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) | 2025-01-18 20:47:12 +00:00 |
| `_subclasses` | Revert "pickler for GraphModule (#141659)" | 2025-01-27 22:39:30 +00:00 |
| `_vendor` | | |
| `accelerator` | | |
| `amp` | [autocast][pytorch] Support autocast for MTIA (#145627) | 2025-01-25 03:24:59 +00:00 |
| `ao` | Revert "Fix type annotation of Linear.bias (#142326)" | 2025-01-26 03:41:00 +00:00 |
| `autograd` | functional compiled autograd (#144707) | 2025-01-27 05:20:56 +00:00 |
| `backends` | Revert "[CUDA][cuBLAS] Add fp16 accumulate option to cuBLAS/cuBLASLt (#144441)" | 2025-01-27 19:38:26 +00:00 |
| `compiler` | [Doc] Add period at the end of the sentence (#145384) | 2025-01-22 19:56:31 +00:00 |
| `contrib` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `cpu` | | |
| `csrc` | Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) | 2025-01-27 23:57:30 +00:00 |
| `cuda` | PEP585 update - torch/_higher_order_ops torch/_subclasses torch/backends torch/compiler torch/cuda torch/masked torch/mtia torch/nested (#145202) | 2025-01-20 22:37:26 +00:00 |
| `distributed` | [dcp] Add ZStandard transformer (#143360) | 2025-01-25 00:14:07 +00:00 |
| `distributions` | torch.distributions: replace numbers.Number with torch.types.Number. (#145086) | 2025-01-27 20:24:55 +00:00 |
| `export` | Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994) | 2025-01-27 18:08:07 +00:00 |
| `fft` | | |
| `func` | | |
| `futures` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `fx` | Revert "pickler for GraphModule (#141659)" | 2025-01-27 22:39:30 +00:00 |
| `jit` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `legacy` | | |
| `lib` | | |
| `linalg` | | |
| `masked` | PEP585 update - torch/_higher_order_ops torch/_subclasses torch/backends torch/compiler torch/cuda torch/masked torch/mtia torch/nested (#145202) | 2025-01-20 22:37:26 +00:00 |
| `monitor` | | |
| `mps` | | |
| `mtia` | [S481486] Move MTIA dynamic library loading from __init__.py to a separate module (#145322) | 2025-01-22 23:39:43 +00:00 |
| `multiprocessing` | | |
| `nested` | Support remaining *_like factory functions for NJT (#144889) | 2025-01-27 21:33:51 +00:00 |
| `nn` | Revert "Fix type annotation of Linear.bias (#142326)" | 2025-01-26 03:41:00 +00:00 |
| `onnx` | Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994) | 2025-01-27 18:08:07 +00:00 |
| `optim` | PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) | 2025-01-21 16:57:27 +00:00 |
| `package` | Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994) | 2025-01-27 18:08:07 +00:00 |
| `profiler` | PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) | 2025-01-21 16:57:27 +00:00 |
| `quantization` | | |
| `signal` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `sparse` | PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) | 2025-01-21 16:57:27 +00:00 |
| `special` | | |
| `testing` | [BE] Remove test_ops from FIXME_inductor_dont_reset_dynamo (#145307) | 2025-01-27 18:12:39 +00:00 |
| `utils` | Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) | 2025-01-27 23:57:30 +00:00 |
| `xpu` | PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) | 2025-01-21 16:57:27 +00:00 |
| `__config__.py` | | |
| `__future__.py` | | |
| `__init__.py` | [CUDA] Change slim-wheel libraries load order (#145638) | 2025-01-24 22:00:56 +00:00 |
| `_appdirs.py` | | |
| `_classes.py` | | |
| `_compile.py` | | |
| `_custom_ops.py` | | |
| `_deploy.py` | | |
| `_environment.py` | | |
| `_guards.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_jit_internal.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_linalg_utils.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_lobpcg.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_lowrank.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_meta_registrations.py` | Remove FFT from stride incorrect ops (#145080) | 2025-01-27 04:26:04 +00:00 |
| `_namedtensor_internals.py` | | |
| `_ops.py` | [fx] move DCE rand check to import time (#145118) | 2025-01-22 02:23:02 +00:00 |
| `_python_dispatcher.py` | | |
| `_size_docs.py` | | |
| `_sources.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_storage_docs.py` | | |
| `_streambase.py` | | |
| `_tensor.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_tensor_docs.py` | | |
| `_tensor_str.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_thread_safe_fork.py` | | |
| `_torch_docs.py` | Revert "Add generator parameter to rand*_like functions (#136780)" | 2025-01-24 19:00:21 +00:00 |
| `_utils.py` | [utils] add try_import method for importing optional modules (#145528) | 2025-01-25 00:14:07 +00:00 |
| `_utils_internal.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_VF.py` | | |
| `_vmap_internals.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `_weights_only_unpickler.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `abi-check.cpp` | | |
| `CMakeLists.txt` | | |
| `custom_class.h` | | |
| `custom_class_detail.h` | | |
| `extension.h` | | |
| `functional.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `hub.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `library.h` | | |
| `library.py` | [Custom Ops] Fix f-strings in custom ops error message (#145673) | 2025-01-27 19:22:43 +00:00 |
| `overrides.py` | Revert "Add generator parameter to rand*_like functions (#136780)" | 2025-01-24 19:00:21 +00:00 |
| `py.typed` | | |
| `quasirandom.py` | | |
| `random.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `README.txt` | | |
| `return_types.py` | | |
| `script.h` | | |
| `serialization.py` | Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) | 2025-01-27 23:57:30 +00:00 |
| `storage.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `torch_version.py` | PEP585 update - mostly toplevels (#145178) | 2025-01-22 02:21:14 +00:00 |
| `types.py` | torch.distributions: replace numbers.Number with torch.types.Number. (#145086) | 2025-01-27 20:24:55 +00:00 |
| `version.py.tpl` | | |

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.