pytorch/torch
Mikayla Gawarecki 001e355a56 Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where the storage order was changed to be non-lexicographical and a `.format_version` entry was added to the zipfile; `calculate_storage_offsets` will only work on checkpoints that have `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, the offset of each storage record (other than the 0th) will be calculated instead of relying on `miniz` APIs to determine it.

The existing APIs issue multiple random reads (reading the end-of-central-directory record, then reading the zipfile header for the record) to determine the offset where a record's storage starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)
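
As a usage sketch (the checkpoint path here is hypothetical, and this assumes the config module is importable as shown):

```python
import torch
import torch.utils.serialization  # exposes the serialization config module

# Opt in; this only takes effect for checkpoints that carry a
# ".format_version" entry (i.e. ones written after the previous PR).
torch.utils.serialization.config.load.calculate_storage_offsets = True

# With mmap=True, offsets for data/1, data/2, ... are now computed
# arithmetically instead of via random reads through the miniz APIs.
state_dict = torch.load("checkpoint.pt", mmap=True, weights_only=True)
```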

## How does this work

The checkpoint format is as follows:

```
archive_name/
|_ data.pkl
|_ .format_version
|_ byteorder
|_ data/
  |_ 0
  |_ 1
  |_ 2
  |_ ...
|_
```

Each `data/i` record represents a storage, where storages are written in the order that the Pickler encounters them.
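
For illustration, the record layout of a saved checkpoint can be inspected with Python's `zipfile` module (the filename below is hypothetical):

```python
import zipfile

import torch

torch.save({"w": torch.zeros(3), "b": torch.ones(2)}, "ckpt.pt")
with zipfile.ZipFile("ckpt.pt") as zf:
    print(zf.namelist())
# Expect something like:
# ['ckpt/data.pkl', 'ckpt/byteorder', 'ckpt/data/0', 'ckpt/data/1', ...]
# with one data/i record per storage, in pickling order.
```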

For each storage, our `persistent_load` logic saves the following metadata to the pickle file: `dtype`, `numel`, `key`, and `location`, where `numel` is the number of bytes in the storage.

Note that we always use the `miniz` writer in zip64 mode per [here](7796e308d0/caffe2/serialize/inline_container.cc (L701)). A zipfile record written by `miniz` looks as follows:

```
 ---------------- ----------------- ------------------ ---------------- --------- --------------------------------
| 30 byte header | n byte filename | zip64_extra_data | m byte padding | storage | 16 or 24 byte local dir footer |
 ---------------- ----------------- ------------------ ---------------- --------- --------------------------------
```

- The header size (30 bytes) is given by [`MZ_ZIP_LOCAL_DIR_HEADER_SIZE`](https://github.com/pytorch/pytorch/blob/main/third_party/miniz-3.0.2/miniz.c#L3290).
- The filename will be `"{archive_name}/{filepath}"`.
- `zip64_extra_data` is determined by [`mz_zip_writer_create_zip64_extra_data`](7796e308d0/third_party/miniz-3.0.2/miniz.c (L6202)). Note that [we only create zip64_extra_data if storage_size >= 0xFFFFFFFF or the offset of the start of the header >= 0xFFFFFFFF](7796e308d0/third_party/miniz-3.0.2/miniz.c (L6519-L6524)).
- `m` is determined by [`getPadding`](7796e308d0/caffe2/serialize/inline_container.cc (L254)), which accounts for the filename and `zip64_extra_data` to choose `m` such that the start of `storage` is aligned to 64 bytes. The `m` bytes will always start with `"F B padding_size"` as the first 4 bytes.
- The local dir footer size is determined by [this snippet](7796e308d0/third_party/miniz-3.0.2/miniz.c (L6610-L6632)): if the buffer size is 0, the footer is skipped entirely; if `zip64_extra_data` was created, the footer is 24 bytes, otherwise 16 (see the sketch below).
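
Putting the pieces above together, here is a rough sketch of the per-record arithmetic. All names are hypothetical, and the zip64 extra-data size and the minimum-padding rule are assumptions inferred from the layout above:

```python
MZ_ZIP_LOCAL_DIR_HEADER_SIZE = 30  # fixed local header size in miniz
ALIGNMENT = 64                     # storages are aligned to 64 bytes


def storage_start(header_offset: int, filename: str, zip64_extra_size: int) -> int:
    """Sketch: where the storage payload begins, given the offset at which
    the record's 30-byte local header starts."""
    unpadded = (
        header_offset
        + MZ_ZIP_LOCAL_DIR_HEADER_SIZE
        + len(filename)
        + zip64_extra_size  # 0 if no zip64 extra data was written
    )
    # m bytes of padding align the storage to 64 bytes; the padding field
    # always starts with the 4-byte "F B padding_size" prefix, so assume
    # at least 4 bytes of padding are always present.
    m = -unpadded % ALIGNMENT
    if m < 4:
        m += ALIGNMENT
    return unpadded + m
```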

When `torch.utils.serialization.config.load.calculate_storage_offsets` is set, we do the following (a schematic follows this list):
- We keep track of where the "cursor" is in the file using `current_offset`; after each `persistent_load` call, it sits at the offset where the header for the next record starts.
- For the 0th storage, "data/0", we use the regular `get_record_offset` to determine the start of the storage.
- For any other storage (the storages will be in the order encountered by the unpickler: 0, 1, 2, 3, ...), we use `get_record_offset_no_read`, which re-uses the `getPadding` logic to determine the offset of the storage.
- Note that `load_tensor` will only ever be called again with the same key if the storage's `._data_ptr()` is 0 [[pointer1](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1917-L1918)][[pointer2](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1936-L1937)], so we cache the offsets for this edge case.
- After each storage, if the storage size is non-zero, we account for the local dir footer based on the logic described above.
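
A schematic of the cursor bookkeeping (hypothetical helper; `storage_start` refers to the sketch above):

```python
def advance_cursor(cursor: int, storage_nbytes: int, used_zip64: bool) -> int:
    """Sketch: move from the start of a storage payload past the storage
    and its local dir footer, landing where the next record's local
    header begins."""
    cursor += storage_nbytes
    if storage_nbytes > 0:
        # The footer is skipped entirely for empty buffers; it is
        # 24 bytes when zip64 extra data was written, else 16.
        cursor += 24 if used_zip64 else 16
    return cursor
```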

## Testing strategy

The agreed-upon testing strategy was as follows:
- Add debug code, gated by an environment flag `TORCH_SERIALIZATION_DEBUG`, that runs this offset-calculation logic and verifies it against `getRecordOffset` for each storage (when `mmap=False`).
- This flag is set throughout CI, which means that the offset-calculation logic is implicitly tested every time `torch.load` is called (see the example below).
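
For a quick local run, something like the following should trigger the same cross-check (assuming the flag is read when `torch.load` executes):

```python
import os

os.environ["TORCH_SERIALIZATION_DEBUG"] = "1"  # set before loading

import torch

torch.save(torch.randn(4), "ckpt.pt")
# With the flag set, the calculated offsets are verified against
# getRecordOffset for each storage while the checkpoint is loaded.
torch.load("ckpt.pt", weights_only=True)
```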

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-31 17:09:20 +00:00
_awaits
_C [CUDA][cuBLAS] Add fp16 accumulate option to cuBLAS/cuBLASLt (#144441) 2025-01-30 22:33:50 +00:00
_C_flatbuffer
_custom_op PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_decomp PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) 2025-01-18 20:47:12 +00:00
_dispatch PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_dynamo pickler for GraphModule (#141659) 2025-01-31 05:34:28 +00:00
_export [export] nested terms in nn_module_stack deserialization (#145901) 2025-01-31 10:00:13 +00:00
_functorch Turn on mypy for _dynamo/variables/builtin.py (#145552) 2025-01-30 22:21:32 +00:00
_higher_order_ops Introduce aoti_call_delegate HOP (#145630) 2025-01-31 04:57:36 +00:00
_inductor pickler for GraphModule (#141659) 2025-01-31 05:34:28 +00:00
_lazy PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) 2025-01-18 20:47:12 +00:00
_library PEP585: Missed conversions (#145342) 2025-01-29 05:24:36 +00:00
_logging [BE][export] add "+export" logging to de/serialization (#145283) 2025-01-23 19:47:48 +00:00
_numpy PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) 2025-01-18 20:47:12 +00:00
_prims PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) 2025-01-18 20:47:12 +00:00
_prims_common Output of nonzero is transposed, fix fake tensor (#144695) 2025-01-26 01:07:22 +00:00
_refs [Inductor][CPP] fix torch logit decomposition (#145576) 2025-01-27 19:37:51 +00:00
_strobelight PEP585 update - torch/_C torch/_decomp torch/_lazy torch/_library torch/_numpy torch/_prims torch/_refs torch/_strobelight (#145102) 2025-01-18 20:47:12 +00:00
_subclasses fix a small typo in comments (#145323) 2025-01-31 06:45:44 +00:00
_vendor
accelerator
amp [autocast][pytorch] Support autocast for MTIA (#145627) 2025-01-25 03:24:59 +00:00
ao Revert "Fix type annotation of Linear.bias (#142326)" 2025-01-26 03:41:00 +00:00
autograd functional compiled autograd (#144707) 2025-01-27 05:20:56 +00:00
backends [CUDA][cuBLAS] Add fp16 accumulate option to cuBLAS/cuBLASLt (#144441) 2025-01-30 22:33:50 +00:00
compiler [Doc] Add period at the end of the sentence (#145384) 2025-01-22 19:56:31 +00:00
contrib PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
cpu
csrc Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) 2025-01-31 17:09:20 +00:00
cuda [inductor triton] Disable incorrect TF32 usage on CUDA capability < 8 (#145684) 2025-01-28 22:01:08 +00:00
distributed Turn on mypy for _dynamo/variables/builtin.py (#145552) 2025-01-30 22:21:32 +00:00
distributions torch.distributions: replace numbers.Number with torch.types.Number. (#145086) 2025-01-27 20:24:55 +00:00
export [export] Add tlparse to draft-export (#145810) 2025-01-29 19:26:00 +00:00
fft
func
futures PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
fx pickler for GraphModule (#141659) 2025-01-31 05:34:28 +00:00
jit PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
legacy
lib
linalg
masked PEP585 update - torch/_higher_order_ops torch/_subclasses torch/backends torch/compiler torch/cuda torch/masked torch/mtia torch/nested (#145202) 2025-01-20 22:37:26 +00:00
monitor
mps [MPS] Support includes in metal objects (#145087) 2025-01-18 05:35:22 +00:00
mtia [S481486] Move MTIA dynamic library loading from __init__.py to a separate module (#145322) 2025-01-22 23:39:43 +00:00
multiprocessing
nested Support remaining *_like factory functions for NJT (#144889) 2025-01-27 21:33:51 +00:00
nn Fix a number of flexattention issues (cse, cudagraph, etc.) (#145059) 2025-01-29 20:27:39 +00:00
onnx Revert "Advance past fc window for stft center (#145437)" 2025-01-30 23:14:16 +00:00
optim PEP585: Missed conversions (#145342) 2025-01-29 05:24:36 +00:00
package Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994) 2025-01-27 18:08:07 +00:00
profiler PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) 2025-01-21 16:57:27 +00:00
quantization
signal PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
sparse PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) 2025-01-21 16:57:27 +00:00
special
testing Introduce aoti_call_delegate HOP (#145630) 2025-01-31 04:57:36 +00:00
utils Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) 2025-01-31 17:09:20 +00:00
xpu PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175) 2025-01-21 16:57:27 +00:00
__config__.py
__future__.py
__init__.py [CUDA] Change slim-wheel libraries load order (#145638) 2025-01-24 22:00:56 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_jit_internal.py PEP585: Missed conversions (#145342) 2025-01-29 05:24:36 +00:00
_linalg_utils.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_lobpcg.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_lowrank.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_meta_registrations.py nonzero_static with symint size (#146006) 2025-01-30 23:42:42 +00:00
_namedtensor_internals.py
_ops.py Introduce aoti_call_delegate HOP (#145630) 2025-01-31 04:57:36 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_storage_docs.py
_streambase.py
_tensor.py [pytorch] raise exception when calling dim order on sparse tensor (#145888) 2025-01-29 06:15:44 +00:00
_tensor_docs.py
_tensor_str.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_thread_safe_fork.py
_torch_docs.py Add overloads to diagonal docs (#144214) 2025-01-31 15:53:59 +00:00
_utils.py [utils] add try_import method for importing optional modules (#145528) 2025-01-25 00:14:07 +00:00
_utils_internal.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_VF.py
_vmap_internals.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
_weights_only_unpickler.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
abi-check.cpp
CMakeLists.txt
custom_class.h
custom_class_detail.h
extension.h
functional.py Revert "Advance past fc window for stft center (#145437)" 2025-01-30 23:14:16 +00:00
hub.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
library.h
library.py [Custom Ops] Fix f-strings in custom ops error message (#145673) 2025-01-27 19:22:43 +00:00
overrides.py Revert "Add generator parameter to rand*_like functions (#136780)" 2025-01-24 19:00:21 +00:00
py.typed
quasirandom.py
random.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
README.txt
return_types.py
script.h
serialization.py Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) 2025-01-31 17:09:20 +00:00
storage.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
torch_version.py PEP585 update - mostly toplevels (#145178) 2025-01-22 02:21:14 +00:00
types.py Improve typing in torch/types.py (#145237) 2025-01-28 05:29:12 +00:00
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.