pytorch/torch
drisspg 0f9eea1329 [FlexAttention] Fix multiple calls to flex bug (#140761)
# Summary
Fixes a long-standing bug we've had in the backward pass for flex attention. See https://github.com/pytorch/pytorch/issues/135161 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140761
Approved by: https://github.com/Chillee, https://github.com/zou3519
2024-11-16 04:57:04 +00:00
_awaits
_C [dynamo] Identify pre-existing captured cells by cell id rather than content id (#140436) 2024-11-15 17:17:30 +00:00
_C_flatbuffer
_custom_op
_decomp Fix split decomp returning self (#140065) 2024-11-13 01:58:02 +00:00
_dispatch
_dynamo Ensure index for state guard construction is a source (#140515) 2024-11-15 22:02:50 +00:00
_export [reland] [aoti] Selectively package AOTI generated files (#140675) 2024-11-15 23:48:34 +00:00
_functorch [RFC] Implement caching for user defined triton kernels (#140326) 2024-11-16 02:37:16 +00:00
_higher_order_ops [FlexAttention] Fix multiple calls to flex bug (#140761) 2024-11-16 04:57:04 +00:00
_inductor [RFC] Implement caching for user defined triton kernels (#140326) 2024-11-16 02:37:16 +00:00
_lazy
_library Optimize mutable torch.library.custom_op overhead (#139513) 2024-11-05 18:30:53 +00:00
_logging [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849) 2024-11-14 19:11:20 +00:00
_numpy
_prims Add dim to logging to help debug (#140445) 2024-11-16 01:33:29 +00:00
_prims_common check fake/real mismatches during real tensor prop (#137747) 2024-11-04 23:39:48 +00:00
_refs
_strobelight
_subclasses type annotations for meta_utils (#140203) 2024-11-13 20:07:47 +00:00
_vendor
accelerator
amp
ao Fix for split gates enabled quantizable LSTM subclass (#140818) 2024-11-15 20:15:52 +00:00
autograd
backends
compiler Profile guided optimization for automatic_dynamic (#139001) 2024-11-03 06:29:57 +00:00
contrib
cpu [Inductor][CPP] Add oneDNN BRGEMM config for Half cpp gemm template (#136255) 2024-11-05 05:33:29 +00:00
csrc [c10d][fr] wait counter for dump function (#140823) 2024-11-16 02:22:08 +00:00
cuda Revert "create a new torch.cuda.memory_usage_in_bytes api (#140719)" 2024-11-15 20:05:32 +00:00
distributed [FSDP2] privateuse1 support fsdp2. (#139539) 2024-11-15 06:34:35 +00:00
distributions Clarify meaning of rate parameter in Gamma distribution (#134847) 2024-11-09 00:22:13 +00:00
export Fix _out_spec (#140608) 2024-11-14 20:09:30 +00:00
fft
func
futures
fx [IG] Avoid generation of empty merge cpu submodule by splitter v2 (#140794) 2024-11-16 01:49:03 +00:00
jit
legacy
lib
linalg
masked
monitor
mps
mtia
multiprocessing
nested Misc. non-contig NJT fixes (#140160) 2024-11-09 01:18:26 +00:00
nn remove typo in UninitializedParameter docstring (#140197) 2024-11-15 23:26:23 +00:00
onnx [ONNX] Improve the conversion from dynamic axes to shapes (#140488) 2024-11-15 04:26:45 +00:00
optim Support tensor betas in Adam and AdamW (#134171) 2024-11-15 21:55:55 +00:00
package
profiler [fx graph cache] Support freezing with FX graph caching (#136505) 2024-11-01 18:29:29 +00:00
quantization
signal
sparse
special
testing [RFC] Implement caching for user defined triton kernels (#140326) 2024-11-16 02:37:16 +00:00
utils [torchgen] Improve schema parsing with regex for numeric ranges (#140210) 2024-11-14 23:28:27 +00:00
xpu
__config__.py
__future__.py
__init__.py Revert "[dynamo] add SymNode bitwise and/or (#138777)" 2024-11-14 21:52:40 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py dynamo: guard on FSDP module parameters (#138819) 2024-11-13 20:46:46 +00:00
_jit_internal.py
_linalg_utils.py
_lobpcg.py
_lowrank.py
_meta_registrations.py triangular_solve: fix meta function output argument dtype check. (#140286) 2024-11-14 15:25:14 +00:00
_namedtensor_internals.py
_ops.py
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor.py type annotations for meta_utils (#140203) 2024-11-13 20:07:47 +00:00
_tensor_docs.py
_tensor_str.py
_thread_safe_fork.py
_torch_docs.py Fix type description of torch.chunk (#140089) 2024-11-08 15:21:13 +00:00
_utils.py Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)" 2024-11-05 23:10:38 +00:00
_utils_internal.py [pytorch] Add logger for pt2 compile chromium events to hive (#139941) 2024-11-14 18:27:38 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py Revert "Allow NJT by default for weights_only torch.load (#140304)" 2024-11-13 15:24:00 +00:00
abi-check.cpp
CMakeLists.txt Add torch.version.xpu (#139466) 2024-11-09 13:31:21 +00:00
custom_class.h
custom_class_detail.h
extension.h
functional.py
hub.py
library.h
library.py no-op torch.library.custom_op APIs on torch.deploy (#139509) 2024-11-04 18:01:08 +00:00
overrides.py
py.typed
quasirandom.py
random.py
README.txt
return_types.py
script.h
serialization.py Fix get_unsafe_globals_in_checkpoint to account for user allowed globals per docstring (#140738) 2024-11-15 22:47:35 +00:00
storage.py
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some `.hpp` headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.