pytorch/docs/source
Mikayla Gawarecki db3685a35c Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where the storage order was changed to be non-lexicographical. A `.format_version` entry was added to the zipfile, and `calculate_storage_offsets` will only work on checkpoints that contain `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, the offset of each storage record (other than the 0th) will be calculated instead of relying on `miniz` APIs to determine it.
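Enabling the option is a one-line config change. A minimal usage sketch, assuming the importable module path matches the dotted config name given above, and using a hypothetical `checkpoint.pt` saved with the new `.format_version`:

```python
import torch
from torch.utils.serialization import config

# Compute storage offsets instead of issuing per-record zipfile header reads.
config.load.calculate_storage_offsets = True

# Offsets for every storage after the 0th are now calculated, not read.
state_dict = torch.load("checkpoint.pt", mmap=True, weights_only=True)
```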

The existing APIs issue multiple random reads (reading the end-of-central-directory record, then reading the zipfile header for the record) to determine the offset at which the record's data starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)
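For intuition, here is a minimal pure-Python sketch of that per-record read, using the stdlib `zipfile` and `struct` modules rather than PyTorch's actual miniz-based reader. Because a record's local file header has its own variable-length name and extra fields, finding where the record's data begins normally requires seeking to and parsing that header — one random read per storage:

```python
import io
import struct
import zipfile

# Build a small in-memory zip standing in for a checkpoint container.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("data/0", b"a" * 16)
    zf.writestr("data/1", b"b" * 32)
raw = buf.getvalue()

# 30-byte fixed part of a zip local file header (PKZIP appnote layout).
LOCAL_HEADER_FMT = "<IHHHHHIIIHH"

def data_offset(raw: bytes, info: zipfile.ZipInfo) -> int:
    """Per-record random read, analogous to what getRecordOffset does:
    parse the local file header's variable-length fields to locate the
    start of the record's data."""
    fields = struct.unpack(
        LOCAL_HEADER_FMT, raw[info.header_offset : info.header_offset + 30]
    )
    assert fields[0] == 0x04034B50  # local file header signature "PK\x03\x04"
    name_len, extra_len = fields[9], fields[10]
    return info.header_offset + 30 + name_len + extra_len

with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    for info in zf.infolist():
        print(info.filename, "data starts at", data_offset(raw, info))
```

With a known header layout and storage order, these offsets can be computed arithmetically instead, which is exactly the read traffic `calculate_storage_offsets` avoids.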

## Testing strategy

The agreed-upon testing strategy was as follows:
- Add debug code, gated by the environment flag `TORCH_SERIALIZATION_DEBUG`, that runs the offset calculation logic and verifies it against `getRecordOffset` for each storage (when `mmap=False`).
- This flag is set throughout CI, so every time `torch.load` is called, the offset calculation logic is implicitly tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-27 23:57:30 +00:00
| Name | Last commit message | Last commit date |
|---|---|---|
| _static | Update OSS nested tensor docs to focus on NJT (#145402) | 2025-01-25 04:08:19 +00:00 |
| _templates | Add an option for classic search (#142018) | 2024-12-06 01:24:52 +00:00 |
| community | Update maintainers for inductor and x86 CPU (#136839) | 2024-10-11 07:24:07 +00:00 |
| elastic | DOC: add docstring to construct_and_record_rdzv_event() (#128189) | 2024-06-10 22:17:33 +00:00 |
| notes | Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) | 2025-01-27 23:57:30 +00:00 |
| rpc | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| scripts | [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) | 2025-01-08 21:44:43 +00:00 |
| accelerator.rst | [BE][accelerator] formalize API name {current,set}_device_{idx => index} (#140542) | 2024-12-12 10:53:48 +00:00 |
| amp.rst | Update document for autocast on CPU (#135299) | 2024-09-13 09:11:47 +00:00 |
| autograd.rst | | |
| backends.rst | Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) | 2025-01-23 18:50:59 +00:00 |
| benchmark_utils.rst | | |
| bottleneck.rst | | |
| checkpoint.rst | [checkpoint] Clean up selective activation checkpoint and make public (#125795) | 2024-06-18 18:18:50 +00:00 |
| complex_numbers.rst | | |
| cond.rst | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| conf.py | Revert "Add flop formula for _scaled_mm (#144872)" | 2025-01-16 15:16:18 +00:00 |
| config_mod.rst | | |
| cpp_extension.rst | | |
| cpp_index.rst | | |
| cpu.rst | | |
| cuda._sanitizer.rst | | |
| cuda.rst | Add get_stream_from_external API for CUDA backend (#143799) | 2024-12-31 11:15:59 +00:00 |
| cuda.tunable.rst | [ROCm] Fix TunableOp UTs: Rotating Buffer (#143172) | 2024-12-14 06:18:11 +00:00 |
| cuda_environment_variables.rst | | |
| cudnn_persistent_rnn.rst | | |
| cudnn_rnn_determinism.rst | | |
| data.rst | | |
| ddp_comm_hooks.rst | | |
| debugging_environment_variables.rst | | |
| deploy.rst | | |
| deterministic.rst | | |
| distributed.algorithms.join.rst | | |
| distributed.checkpoint.rst | [DCP] Cross-link DCP doc to tutorials (#139776) | 2024-11-07 02:19:49 +00:00 |
| distributed.elastic.rst | | |
| distributed.fsdp.fully_shard.rst | [FSDP2] Move to public torch.distributed.fsdp (#141868) | 2024-12-07 01:24:28 +00:00 |
| distributed.optim.rst | | |
| distributed.pipelining.rst | [pipelining] Update tutorials and documentation (#143045) | 2024-12-12 18:42:17 +00:00 |
| distributed.rst | [C10D] Update docs for wait() (#143305) | 2024-12-17 00:41:11 +00:00 |
| distributed.tensor.parallel.rst | Update link in distributed.tensor.parallel.rst (#136103) | 2024-09-15 19:36:29 +00:00 |
| distributed.tensor.rst | [dtensor] expose the `__create_chunk_list__` in the doc (#144100) | 2025-01-03 20:06:23 +00:00 |
| distributions.rst | | |
| dlpack.rst | | |
| docutils.conf | | |
| export.ir_spec.rst | [export] Update docs (#142011) | 2024-12-05 03:44:46 +00:00 |
| export.programming_model.rst | fix formatting in programming model doc (#143587) | 2024-12-20 07:09:19 +00:00 |
| export.rst | torch export programming model (#143546) | 2024-12-19 16:56:13 +00:00 |
| fft.rst | | |
| fsdp.rst | | |
| func.api.rst | | |
| func.batch_norm.rst | | |
| func.migrating.rst | | |
| func.rst | | |
| func.ux_limitations.rst | | |
| func.whirlwind_tour.rst | | |
| future_mod.rst | | |
| futures.rst | | |
| fx.experimental.rst | Add truediv support in export serializer (#136364) | 2024-12-05 17:33:33 +00:00 |
| fx.rst | Consolidate SymDispatchMode into ProxyTensorMode (#132674) | 2024-08-08 12:02:54 +00:00 |
| hub.rst | | |
| index.rst | Add Torchao docs link to Pytorch libraries (#145412) | 2025-01-24 17:11:20 +00:00 |
| jit.rst | | |
| jit_builtin_functions.rst | | |
| jit_language_reference.rst | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| jit_language_reference_v2.rst | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| jit_python_reference.rst | | |
| jit_unsupported.rst | | |
| jit_utils.rst | | |
| library.rst | [Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588) | 2025-01-27 19:22:43 +00:00 |
| linalg.rst | | |
| logging.rst | | |
| masked.rst | Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262) | 2024-09-06 19:06:23 +00:00 |
| math-quantizer-equation.png | | |
| meta.rst | | |
| miscellaneous_environment_variables.rst | Add environment variable to force no weights_only load (#138225) | 2024-10-21 23:26:15 +00:00 |
| mobile_optimizer.rst | Add ExecuTorch warning to mobile_optimizer (#134697) | 2024-09-04 17:47:14 +00:00 |
| model_zoo.rst | | |
| module_tracker.rst | | |
| monitor.rst | | |
| mps.rst | [MPS] Expose MPSProfiler::start/stopCapture to Python (#144561) | 2025-01-11 02:05:36 +00:00 |
| mps_environment_variables.rst | [MPS] Add mps profiler env vars to docs (#129552) | 2024-07-04 06:44:48 +00:00 |
| mtia.memory.rst | Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" | 2024-12-21 04:04:16 +00:00 |
| mtia.rst | Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" | 2024-12-21 04:04:16 +00:00 |
| multiprocessing.rst | | |
| name_inference.rst | | |
| named_tensor.rst | | |
| nested.rst | Update OSS nested tensor docs to focus on NJT (#145402) | 2025-01-25 04:08:19 +00:00 |
| nn.attention.bias.rst | | |
| nn.attention.experimental.rst | [Flex Attention] Paged Attention (#137164) | 2024-10-29 17:05:22 +00:00 |
| nn.attention.flex_attention.rst | FlexAttention support for NJT (#136792) | 2024-10-28 20:01:27 +00:00 |
| nn.attention.rst | [Flex Attention] Paged Attention (#137164) | 2024-10-29 17:05:22 +00:00 |
| nn.functional.rst | | |
| nn.init.rst | | |
| nn.rst | Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ (#139662) | 2024-11-07 23:13:23 +00:00 |
| onnx.rst | [ONNX] Improves documentation of ONNX exporter (#135372) | 2024-09-09 15:09:01 +00:00 |
| onnx_dynamo.rst | [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) | 2025-01-08 21:44:43 +00:00 |
| onnx_dynamo_memory_usage.rst | Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139) | 2025-01-03 20:41:36 +00:00 |
| onnx_dynamo_onnxruntime_backend.rst | | |
| onnx_torchscript.rst | [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) | 2025-01-08 21:44:43 +00:00 |
| onnx_torchscript_supported_aten_ops.rst | | |
| optim.rst | Ensure SWA boundary conditions w.r.t. definition (#133773) | 2024-10-31 18:24:08 +00:00 |
| package.rst | | |
| profiler.rst | | |
| quantization-accuracy-debugging.rst | | |
| quantization-backend-configuration.rst | | |
| quantization-support.rst | Add support for prototype affine quantization in pt2e flow (#141421) | 2024-12-24 04:22:18 +00:00 |
| quantization.rst | [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505) | 2024-12-13 22:26:22 +00:00 |
| random.rst | | |
| rpc.rst | | |
| signal.rst | | |
| size.rst | | |
| sparse.rst | SparseCsrCUDA: cuDSS backend for linalg.solve (#129856) | 2024-08-22 07:57:30 +00:00 |
| special.rst | | |
| storage.rst | Doc: Rewrite the storage.rst file to emphasize untyped storages (#140145) | 2024-11-13 17:40:16 +00:00 |
| tensor_attributes.rst | [Docs] Remove duplicate declaration of double_tensor (#140927) | 2024-11-18 21:22:30 +00:00 |
| tensor_view.rst | | |
| tensorboard.rst | | |
| tensors.rst | add xpu to torch.tensors (#127280) | 2024-06-11 18:13:01 +00:00 |
| testing.rst | | |
| threading_environment_variables.rst | | |
| torch.ao.ns._numeric_suite.rst | | |
| torch.ao.ns._numeric_suite_fx.rst | | |
| torch.compiler.config.rst | Profile guided optimization for automatic_dynamic (#139001) | 2024-11-03 06:29:57 +00:00 |
| torch.compiler.rst | Profile guided optimization for automatic_dynamic (#139001) | 2024-11-03 06:29:57 +00:00 |
| torch.compiler_aot_inductor.rst | [AOTI][doc] Update tutorial (#143390) | 2024-12-17 18:35:40 +00:00 |
| torch.compiler_aot_inductor_minifier.rst | Aoti minifier flatten (#141156) | 2024-12-06 07:12:45 +00:00 |
| torch.compiler_api.rst | [export] add is_exporting flag (#142425) | 2024-12-18 21:36:28 +00:00 |
| torch.compiler_best_practices_for_backends.rst | | |
| torch.compiler_cudagraph_trees.rst | [CUDAGraph][Docs] add cuda to torch.randn (#144793) | 2025-01-15 18:02:10 +00:00 |
| torch.compiler_custom_backends.rst | [pt2, docs] Add new PT2 troubleshooting doc (#138620) | 2024-11-09 01:17:39 +00:00 |
| torch.compiler_dynamic_shapes.rst | | |
| torch.compiler_dynamo_deepdive.rst | fix typo in torch.compiler_dynamo_deepdive.rst (#140871) | 2024-11-19 14:42:36 +00:00 |
| torch.compiler_dynamo_overview.rst | | |
| torch.compiler_fake_tensor.rst | [doc] improve code in fake tensor doc (#140329) | 2024-11-13 05:14:56 +00:00 |
| torch.compiler_faq.rst | Rename cache limit to recompile limit in configs (#143709) | 2024-12-22 10:03:57 +00:00 |
| torch.compiler_fine_grain_apis.rst | [export] add is_exporting flag (#142425) | 2024-12-18 21:36:28 +00:00 |
| torch.compiler_get_started.rst | [Inductor] Update AttrsDescriptor instantiation for Triton changes (#137458) | 2024-10-14 20:20:29 +00:00 |
| torch.compiler_inductor_profiling.rst | | |
| torch.compiler_ir.rst | | |
| torch.compiler_nn_module.rst | | |
| torch.compiler_performance_dashboard.rst | | |
| torch.compiler_profiling_torch_compile.rst | [EZ] Fix spelling typo (#136157) | 2024-09-16 19:30:30 +00:00 |
| torch.compiler_transformations.rst | | |
| torch.compiler_troubleshooting.rst | Rename cache limit to recompile limit in configs (#143709) | 2024-12-22 10:03:57 +00:00 |
| torch.compiler_troubleshooting_old.rst | Rename cache limit to recompile limit in configs (#143709) | 2024-12-22 10:03:57 +00:00 |
| torch.overrides.rst | | |
| torch.rst | Transform unbacked int expressions into a fresh unbacked int. (#141917) | 2024-12-05 16:53:44 +00:00 |
| torch_cuda_memory.rst | | |
| torch_environment_variables.rst | [Docs][MPS] Add mps environment variable table (#129008) | 2024-06-20 03:30:35 +00:00 |
| torch_nccl_environment_variables.rst | [c10d][doc] Add docs for ENV variables TORCH_NCCL_ASYNC_ERROR_HANDLING TORCH_NCCL_TRACE_CPP_STACK and TORCH_NCCL_COORD_CHECK_MILSEC (#132920) | 2024-08-09 21:08:20 +00:00 |
| type_info.rst | | |
| utils.rst | | |
| xpu.rst | Add get_stream_from_external API for XPU backend (#141123) | 2024-12-31 11:15:52 +00:00 |