pytorch/docs/source
Mikayla Gawarecki db3685a35c Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where the storage order was changed to be non-lexicographical. A `.format_version` entry was added to the zipfile, and `calculate_storage_offsets` will only work on checkpoints that contain `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, the offset of each storage record (other than the 0th) will be calculated instead of relying on `miniz` APIs to determine it.
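Enabling the option is a one-line config change. A minimal usage sketch, assuming the importable module path matches the dotted config name given above, and using a hypothetical `checkpoint.pt` saved with the new `.format_version`:

```python
import torch
from torch.utils.serialization import config

# Compute storage offsets instead of issuing per-record zipfile header reads.
config.load.calculate_storage_offsets = True

# Offsets for every storage after the 0th are now calculated, not read.
state_dict = torch.load("checkpoint.pt", mmap=True, weights_only=True)
```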

The existing APIs issue multiple random reads (reading the end-of-central-directory record, then reading the zipfile header for the record) to determine the offset at which the record's data starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)
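For intuition, here is a minimal pure-Python sketch of that per-record read, using the stdlib `zipfile` and `struct` modules rather than PyTorch's actual miniz-based reader. Because a record's local file header has its own variable-length name and extra fields, finding where the record's data begins normally requires seeking to and parsing that header — one random read per storage:

```python
import io
import struct
import zipfile

# Build a small in-memory zip standing in for a checkpoint container.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("data/0", b"a" * 16)
    zf.writestr("data/1", b"b" * 32)
raw = buf.getvalue()

# 30-byte fixed part of a zip local file header (PKZIP appnote layout).
LOCAL_HEADER_FMT = "<IHHHHHIIIHH"

def data_offset(raw: bytes, info: zipfile.ZipInfo) -> int:
    """Per-record random read, analogous to what getRecordOffset does:
    parse the local file header's variable-length fields to locate the
    start of the record's data."""
    fields = struct.unpack(
        LOCAL_HEADER_FMT, raw[info.header_offset : info.header_offset + 30]
    )
    assert fields[0] == 0x04034B50  # local file header signature "PK\x03\x04"
    name_len, extra_len = fields[9], fields[10]
    return info.header_offset + 30 + name_len + extra_len

with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    for info in zf.infolist():
        print(info.filename, "data starts at", data_offset(raw, info))
```

With a known header layout and storage order, these offsets can be computed arithmetically instead, which is exactly the read traffic `calculate_storage_offsets` avoids.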

## Testing strategy

The agreed-upon testing strategy was as follows:
- Add debug code, gated by the environment flag `TORCH_SERIALIZATION_DEBUG`, that runs the offset calculation logic and verifies it against `getRecordOffset` for each storage (when `mmap=False`).
- This flag is set throughout CI, so every time `torch.load` is called, the offset calculation logic is implicitly tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-27 23:57:30 +00:00
| Name | Last commit message | Last commit date |
|---|---|---|
| _static | Update OSS nested tensor docs to focus on NJT (#145402) | 2025-01-25 04:08:19 +00:00 |
| _templates | Add an option for classic search (#142018) | 2024-12-06 01:24:52 +00:00 |
| community | Update maintainers for inductor and x86 CPU (#136839) | 2024-10-11 07:24:07 +00:00 |
| elastic | DOC: add docstring to construct_and_record_rdzv_event() (#128189) | 2024-06-10 22:17:33 +00:00 |
| notes | Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880) | 2025-01-27 23:57:30 +00:00 |
| rpc | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| scripts | [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) | 2025-01-08 21:44:43 +00:00 |
| accelerator.rst | [BE][accelerator] formalize API name {current,set}_device_{idx => index} (#140542) | 2024-12-12 10:53:48 +00:00 |
| amp.rst | Update document for autocast on CPU (#135299) | 2024-09-13 09:11:47 +00:00 |
| autograd.rst | | |
| backends.rst | Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) | 2025-01-23 18:50:59 +00:00 |
| benchmark_utils.rst | | |
| bottleneck.rst | | |
| checkpoint.rst | [checkpoint] Clean up selective activation checkpoint and make public (#125795) | 2024-06-18 18:18:50 +00:00 |
| complex_numbers.rst | | |
| cond.rst | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| conf.py | Revert "Add flop formula for _scaled_mm (#144872)" | 2025-01-16 15:16:18 +00:00 |
| config_mod.rst | | |
| cpp_extension.rst | | |
| cpp_index.rst | | |
| cpu.rst | | |
| cuda._sanitizer.rst | | |
| cuda.rst | Add get_stream_from_external API for CUDA backend (#143799) | 2024-12-31 11:15:59 +00:00 |
| cuda.tunable.rst | [ROCm] Fix TunableOp UTs: Rotating Buffer (#143172) | 2024-12-14 06:18:11 +00:00 |
| cuda_environment_variables.rst | | |
| cudnn_persistent_rnn.rst | | |
| cudnn_rnn_determinism.rst | | |
| data.rst | | |
| ddp_comm_hooks.rst | | |
| debugging_environment_variables.rst | | |
| deploy.rst | | |
| deterministic.rst | | |
| distributed.algorithms.join.rst | | |
| distributed.checkpoint.rst | [DCP] Cross-link DCP doc to tutorials (#139776) | 2024-11-07 02:19:49 +00:00 |
| distributed.elastic.rst | | |
| distributed.fsdp.fully_shard.rst | [FSDP2] Move to public torch.distributed.fsdp (#141868) | 2024-12-07 01:24:28 +00:00 |
| distributed.optim.rst | | |
| distributed.pipelining.rst | [pipelining] Update tutorials and documentation (#143045) | 2024-12-12 18:42:17 +00:00 |
| distributed.rst | [C10D] Update docs for wait() (#143305) | 2024-12-17 00:41:11 +00:00 |
| distributed.tensor.parallel.rst | Update link in distributed.tensor.parallel.rst (#136103) | 2024-09-15 19:36:29 +00:00 |
| distributed.tensor.rst | [dtensor] expose the `__create_chunk_list__` in the doc (#144100) | 2025-01-03 20:06:23 +00:00 |
| distributions.rst | | |
| dlpack.rst | | |
| docutils.conf | | |
| export.ir_spec.rst | [export] Update docs (#142011) | 2024-12-05 03:44:46 +00:00 |
| export.programming_model.rst | fix formatting in programming model doc (#143587) | 2024-12-20 07:09:19 +00:00 |
| export.rst | torch export programming model (#143546) | 2024-12-19 16:56:13 +00:00 |
| fft.rst | | |
| fsdp.rst | | |
| func.api.rst | | |
| func.batch_norm.rst | | |
| func.migrating.rst | | |
| func.rst | | |
| func.ux_limitations.rst | | |
| func.whirlwind_tour.rst | | |
| future_mod.rst | | |
| futures.rst | | |
| fx.experimental.rst | Add truediv support in export serializer (#136364) | 2024-12-05 17:33:33 +00:00 |
| fx.rst | Consolidate SymDispatchMode into ProxyTensorMode (#132674) | 2024-08-08 12:02:54 +00:00 |
| hub.rst | | |
| index.rst | Add Torchao docs link to Pytorch libraries (#145412) | 2025-01-24 17:11:20 +00:00 |
| jit.rst | | |
| jit_builtin_functions.rst | | |
| jit_language_reference.rst | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| jit_language_reference_v2.rst | [Doc] fix some typos (found by codespell and typos) (#132544) | 2024-08-05 17:21:56 +00:00 |
| jit_python_reference.rst | | |
| jit_unsupported.rst | | |
| jit_utils.rst | | |
| library.rst | [Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588) | 2025-01-27 19:22:43 +00:00 |
| linalg.rst | | |
| logging.rst | | |
| masked.rst | Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262) | 2024-09-06 19:06:23 +00:00 |
| math-quantizer-equation.png | | |
| meta.rst | | |
| miscellaneous_environment_variables.rst | Add environment variable to force no weights_only load (#138225) | 2024-10-21 23:26:15 +00:00 |
| mobile_optimizer.rst | Add ExecuTorch warning to mobile_optimizer (#134697) | 2024-09-04 17:47:14 +00:00 |
| model_zoo.rst | | |
| module_tracker.rst | | |
| monitor.rst | | |
| mps.rst | [MPS] Expose MPSProfiler::start/stopCapture to Python (#144561) | 2025-01-11 02:05:36 +00:00 |
| mps_environment_variables.rst | [MPS] Add mps profiler env vars to docs (#129552) | 2024-07-04 06:44:48 +00:00 |
| mtia.memory.rst | Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" | 2024-12-21 04:04:16 +00:00 |
| mtia.rst | Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)" | 2024-12-21 04:04:16 +00:00 |
| multiprocessing.rst | | |
| name_inference.rst | | |
| named_tensor.rst | | |
| nested.rst | Update OSS nested tensor docs to focus on NJT (#145402) | 2025-01-25 04:08:19 +00:00 |
| nn.attention.bias.rst | | |
| nn.attention.experimental.rst | [Flex Attention] Paged Attention (#137164) | 2024-10-29 17:05:22 +00:00 |
| nn.attention.flex_attention.rst | FlexAttention support for NJT (#136792) | 2024-10-28 20:01:27 +00:00 |
| nn.attention.rst | [Flex Attention] Paged Attention (#137164) | 2024-10-29 17:05:22 +00:00 |
| nn.functional.rst | | |
| nn.init.rst | | |
| nn.rst | Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ (#139662) | 2024-11-07 23:13:23 +00:00 |
| onnx.rst | [ONNX] Improves documentation of ONNX exporter (#135372) | 2024-09-09 15:09:01 +00:00 |
| onnx_dynamo.rst | [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) | 2025-01-08 21:44:43 +00:00 |
| onnx_dynamo_memory_usage.rst | Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139) | 2025-01-03 20:41:36 +00:00 |
| onnx_dynamo_onnxruntime_backend.rst | | |
| onnx_torchscript.rst | [ONNX] Update images and APIs to onnx_dynamo.rst (#144358) | 2025-01-08 21:44:43 +00:00 |
| onnx_torchscript_supported_aten_ops.rst | | |
| optim.rst | Ensure SWA boundary conditions w.r.t. definition (#133773) | 2024-10-31 18:24:08 +00:00 |
| package.rst | | |
| profiler.rst | | |
| quantization-accuracy-debugging.rst | | |
| quantization-backend-configuration.rst | | |
| quantization-support.rst | Add support for prototype affine quantization in pt2e flow (#141421) | 2024-12-24 04:22:18 +00:00 |
| quantization.rst | [BC-Breaking]Remove capture_pre_autograd_graph references in quantization (#139505) | 2024-12-13 22:26:22 +00:00 |
| random.rst | | |
| rpc.rst | | |
| signal.rst | | |
| size.rst | | |
| sparse.rst | SparseCsrCUDA: cuDSS backend for linalg.solve (#129856) | 2024-08-22 07:57:30 +00:00 |
| special.rst | | |
| storage.rst | Doc: Rewrite the storage.rst file to emphasize untyped storages (#140145) | 2024-11-13 17:40:16 +00:00 |
| tensor_attributes.rst | [Docs] Remove duplicate declaration of double_tensor (#140927) | 2024-11-18 21:22:30 +00:00 |
| tensor_view.rst | | |
| tensorboard.rst | | |
| tensors.rst | add xpu to torch.tensors (#127280) | 2024-06-11 18:13:01 +00:00 |
| testing.rst | | |
| threading_environment_variables.rst | | |
| torch.ao.ns._numeric_suite.rst | | |
| torch.ao.ns._numeric_suite_fx.rst | | |
| torch.compiler.config.rst | Profile guided optimization for automatic_dynamic (#139001) | 2024-11-03 06:29:57 +00:00 |
| torch.compiler.rst | Profile guided optimization for automatic_dynamic (#139001) | 2024-11-03 06:29:57 +00:00 |
| torch.compiler_aot_inductor.rst | [AOTI][doc] Update tutorial (#143390) | 2024-12-17 18:35:40 +00:00 |
| torch.compiler_aot_inductor_minifier.rst | Aoti minifier flatten (#141156) | 2024-12-06 07:12:45 +00:00 |
| torch.compiler_api.rst | [export] add is_exporting flag (#142425) | 2024-12-18 21:36:28 +00:00 |
| torch.compiler_best_practices_for_backends.rst | | |
| torch.compiler_cudagraph_trees.rst | [CUDAGraph][Docs] add cuda to torch.randn (#144793) | 2025-01-15 18:02:10 +00:00 |
| torch.compiler_custom_backends.rst | [pt2, docs] Add new PT2 troubleshooting doc (#138620) | 2024-11-09 01:17:39 +00:00 |
| torch.compiler_dynamic_shapes.rst | | |
| torch.compiler_dynamo_deepdive.rst | fix typo in torch.compiler_dynamo_deepdive.rst (#140871) | 2024-11-19 14:42:36 +00:00 |
| torch.compiler_dynamo_overview.rst | | |
| torch.compiler_fake_tensor.rst | [doc] improve code in fake tensor doc (#140329) | 2024-11-13 05:14:56 +00:00 |
| torch.compiler_faq.rst | Rename cache limit to recompile limit in configs (#143709) | 2024-12-22 10:03:57 +00:00 |
| torch.compiler_fine_grain_apis.rst | [export] add is_exporting flag (#142425) | 2024-12-18 21:36:28 +00:00 |
| torch.compiler_get_started.rst | [Inductor] Update AttrsDescriptor instantiation for Triton changes (#137458) | 2024-10-14 20:20:29 +00:00 |
| torch.compiler_inductor_profiling.rst | | |
| torch.compiler_ir.rst | | |
| torch.compiler_nn_module.rst | | |
| torch.compiler_performance_dashboard.rst | | |
| torch.compiler_profiling_torch_compile.rst | [EZ] Fix spelling typo (#136157) | 2024-09-16 19:30:30 +00:00 |
| torch.compiler_transformations.rst | | |
| torch.compiler_troubleshooting.rst | Rename cache limit to recompile limit in configs (#143709) | 2024-12-22 10:03:57 +00:00 |
| torch.compiler_troubleshooting_old.rst | Rename cache limit to recompile limit in configs (#143709) | 2024-12-22 10:03:57 +00:00 |
| torch.overrides.rst | | |
| torch.rst | Transform unbacked int expressions into a fresh unbacked int. (#141917) | 2024-12-05 16:53:44 +00:00 |
| torch_cuda_memory.rst | | |
| torch_environment_variables.rst | [Docs][MPS] Add mps environment variable table (#129008) | 2024-06-20 03:30:35 +00:00 |
| torch_nccl_environment_variables.rst | [c10d][doc] Add docs for ENV variables TORCH_NCCL_ASYNC_ERROR_HANDLING TORCH_NCCL_TRACE_CPP_STACK and TORCH_NCCL_COORD_CHECK_MILSEC (#132920) | 2024-08-09 21:08:20 +00:00 |
| type_info.rst | | |
| utils.rst | | |
| xpu.rst | Add get_stream_from_external API for XPU backend (#141123) | 2024-12-31 11:15:52 +00:00 |