pytorch/torch
Edward Z. Yang 90bed32b98 Introduce torch.sym_sum (#136429)
Partially addresses https://github.com/pytorch/pytorch/issues/128150

When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation.  Not only is this ugly,
it is also quadratic: folding N values through binary adds builds N-1
intermediate sums, and each sympy.Add construction is O(N) in its
number of arguments.  Instead, maintain the summation as a single FX
node so we can do the entire addition all in one go.
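For illustration, a minimal sketch of the new pattern (total_size and
its inputs are hypothetical; torch.sym_sum is the API this PR adds):

```
import torch

def total_size(tensors):
    # Under torch.compile with dynamic shapes, each t.shape[0] may be a
    # SymInt backed by a sympy expression rather than a plain int.
    sizes = [t.shape[0] for t in tensors]

    # Old pattern: sum(sizes) folds through binary adds, building
    # len(sizes) - 1 intermediate expressions -- quadratic overall.
    # New pattern: one n-ary sum, kept as a single node in the graph.
    return torch.sym_sum(sizes)
```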

update_hint_regression benchmark, before and after:

```
update_hint_regression,compile_time_instruction_count,2648328980
update_hint_regression,compile_time_instruction_count,2563748678
```
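
That is roughly a 3.2% reduction in compile-time instruction count on
this benchmark.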

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136429
Approved by: https://github.com/isuruf
2024-10-08 18:12:57 +00:00
_awaits
_C Revert "Disallow FakeTensor.data_ptr access in eager mode (#137221)" 2024-10-07 21:46:13 +00:00
_C_flatbuffer
_custom_op
_decomp Preserve custom ops via run_decomps (#136882) 2024-10-01 17:38:00 +00:00
_dispatch
_dynamo Introduce torch.sym_sum (#136429) 2024-10-08 18:12:57 +00:00
_export Add original forward names to schema so that prettify pass works (#136887) 2024-10-08 04:21:02 +00:00
_functorch fix silly mapping issue with torch.Size (#137465) 2024-10-08 16:53:15 +00:00
_higher_order_ops Revert "[FlexAttention] Support training bias for eager (#136910)" 2024-10-08 17:29:02 +00:00
_inductor Introduce torch.sym_sum (#136429) 2024-10-08 18:12:57 +00:00
_lazy
_library Proper handling of arguments passed by in kwargs inside zip_schema (#137311) 2024-10-04 21:50:31 +00:00
_logging Don't actually import module when checking if its valid (#136548) 2024-09-25 20:47:32 +00:00
_numpy
_prims Fix AOT Graph capture not propagating non_blocking copy parameter to … (#136513) 2024-10-01 00:32:47 +00:00
_prims_common Fix six broken tests in test_ops.py (#136653) 2024-09-30 20:32:55 +00:00
_refs Fix typo in _normalize ref (#137079) 2024-10-02 19:06:48 +00:00
_strobelight [Pytorch] Cleanup Strobelight URL and shorten for readability (#136102) 2024-09-16 18:10:33 +00:00
_subclasses Revert "Disallow FakeTensor.data_ptr access in eager mode (#137221)" 2024-10-07 21:46:13 +00:00
_vendor
amp [MPS] Add support for autocast in MPS (#99272) 2024-09-05 23:23:17 +00:00
ao Change to export_for_training in quantize_pt2e tests (#137233) 2024-10-04 18:33:02 +00:00
autograd Param fixes in docstring (#136097) 2024-09-21 18:56:34 +00:00
backends [sparse][semi-structured] Add float8 dtype support to 24 sparsity (#136397) 2024-09-27 21:37:34 +00:00
compiler
contrib
cpu Revise CPU vectorization ISA support API (#135075) 2024-09-05 12:14:56 +00:00
csrc [Profiler] Clear Out Dangling AppendOnlyLists (#137450) 2024-10-08 17:48:59 +00:00
cuda raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114) 2024-10-04 15:36:29 +00:00
distributed [FSDP2] Required mesh_dim_names for HSDP (#137436) 2024-10-08 16:31:18 +00:00
distributions [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00
export Add original forward names to schema so that prettify pass works (#136887) 2024-10-08 04:21:02 +00:00
fft
func
futures
fx Introduce torch.sym_sum (#136429) 2024-10-08 18:12:57 +00:00
jit
legacy
lib
linalg docs: clarify alias usage for x parameter in vector_norm function (#136921) 2024-09-30 02:50:06 +00:00
masked [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00
monitor
mps
mtia [MTIA] Support torch.cuda.get_device_capability equivalent API on MTIA (#135889) 2024-09-17 17:42:56 +00:00
multiprocessing multiprocessing.spawn: allow a grace period when shutdown (#131278) 2024-10-07 12:37:34 +00:00
nested Fix to() on non-contiguous NJTs (#137124) 2024-10-08 15:11:05 +00:00
nn Revert "[Dynamo] Move flex attention torch function mode to traceable HOP file (#137120)" 2024-10-08 17:26:19 +00:00
onnx [ONNX] Insert contiguous node between transpose and view before calling run_decompositions (#137340) 2024-10-08 16:45:59 +00:00
optim Add missing input "eps" to adam docs (#135191) 2024-09-25 20:17:23 +00:00
package [3.13] fix 3.13 pickle error in torch/package (#136049) 2024-09-14 14:28:09 +00:00
profiler [Profiler] Torch Profiler distributed info is not JSON serializable (#135548) 2024-09-13 02:22:33 +00:00
quantization
signal
sparse [sparse][semi-structured] Add float8 dtype support to 24 sparsity (#136397) 2024-09-27 21:37:34 +00:00
special
testing Fix to() on non-contiguous NJTs (#137124) 2024-10-08 15:11:05 +00:00
utils Introduce torch.sym_sum (#136429) 2024-10-08 18:12:57 +00:00
xpu Use torch.Stream&torch.Event for Dynamo capature (#134850) 2024-10-02 14:15:33 +00:00
__config__.py
__future__.py
__init__.py Introduce torch.sym_sum (#136429) 2024-10-08 18:12:57 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py Improve is_fbcode functionality (#136871) 2024-09-27 21:19:01 +00:00
_guards.py Turn on type-checking in torch.fx.experimental.symbolic_shapes (#136972) 2024-10-01 13:22:10 +00:00
_jit_internal.py
_linalg_utils.py
_lobpcg.py
_lowrank.py
_meta_registrations.py Introduce torch.sym_sum (#136429) 2024-10-08 18:12:57 +00:00
_namedtensor_internals.py
_ops.py Add type annotations for higher order ops/flex_attention (#137065) 2024-10-02 04:39:25 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py Use torch.Stream&torch.Event for Dynamo capature (#134850) 2024-10-02 14:15:33 +00:00
_tensor.py Fix wrapper subclass serialization with custom sizes / strides (#137030) 2024-10-02 18:55:03 +00:00
_tensor_docs.py Revert "Add deterministic path for CUDA cumsum (#136224)" 2024-09-27 12:54:47 +00:00
_tensor_str.py
_thread_safe_fork.py [inductor] parallel compile: add import of thread_safe_fork for internal (#137155) 2024-10-03 17:37:21 +00:00
_torch_docs.py [Doc] Clarify that NaNs are not equal to each other (#137386) 2024-10-05 06:19:12 +00:00
_utils.py Add torch.serialization.skip_data context manager (#134504) 2024-09-05 16:53:39 +00:00
_utils_internal.py Log compile ids to pt2_remote_cache and pt2_compile_events (#137431) 2024-10-08 18:04:48 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py
abi-check.cpp
CMakeLists.txt
custom_class.h
custom_class_detail.h
extension.h
functional.py Revert "Add deterministic path for CUDA cumsum (#136224)" 2024-09-27 12:54:47 +00:00
hub.py torch.hub: add get_dir/set_dir type hints (#134906) 2024-09-12 03:53:29 +00:00
library.h
library.py noop on torch.library APIs under torch::deploy (multipy) (#136645) 2024-09-26 02:34:34 +00:00
overrides.py Introduce torch.sym_sum (#136429) 2024-10-08 18:12:57 +00:00
py.typed
quasirandom.py
random.py
README.txt
return_types.py
script.h
serialization.py [3.13] fix 3.13 pickle error in serialization.py (#136034) 2024-09-14 00:02:40 +00:00
storage.py Fix serialization for torch.uint16, torch.uint32, torch.uint64 (#137184) 2024-10-03 14:56:11 +00:00
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some .hpp headers, which are proper C++ headers rather than
C headers.  Although these headers are installed like public headers, they
are really *internal implementation detail* headers, whose contents should
largely not be used by external clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.