pytorch/torch
Iris Zhang (PyTorch) 43f4e71daa Making _MeshEnv subclassing thread local (#124555)
With `_mesh_resources` being a global variable, when thread-based process-group testing is used (i.e. `spawn_threads_and_init_comms()`), the last rank writing a given key would overwrite the earlier ones. This isn't an issue in the regular process-based runtime, where each key is logically unique.

Example failure: https://github.com/pytorch/pytorch/actions/runs/8779134353/job/24087295785
```
RuntimeError: Could not resolve the process group registered under the name 8
```
or an `assert ... is not None` failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124555
Approved by: https://github.com/xunnanxu, https://github.com/wanchaol
2024-04-26 02:45:42 +00:00
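The fix makes `_MeshEnv` subclass `threading.local`, so each thread gets its own copy of the instance state. A minimal sketch of that pattern (the `child_to_parent_mapping` attribute and the toy worker are hypothetical, for illustration only):

```python
import threading

class _MeshEnv(threading.local):
    # Subclassing threading.local means __init__ re-runs per thread on
    # first access, so every thread sees its own fresh mapping instead of
    # one shared global dict.
    def __init__(self):
        self.child_to_parent_mapping = {}

_mesh_resources = _MeshEnv()

def worker(rank, results):
    # With a plain global dict, all "ranks" (threads) would race on the
    # same key; with thread-local state, each rank keeps its own entry.
    _mesh_resources.child_to_parent_mapping["mesh"] = rank
    results[rank] = _mesh_resources.child_to_parent_mapping["mesh"]

results = {}
threads = [threading.Thread(target=worker, args=(r, results)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(dict(sorted(results.items())))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

Had `_MeshEnv` stayed a plain class, every thread would have written to the same `child_to_parent_mapping`, producing the overwrite described above.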
_awaits
_C Revert "torch.mtia module for MTIA device backend (#123612)" 2024-04-25 16:06:46 +00:00
_C_flatbuffer
_custom_op Move schema inference to torch._library (#124199) 2024-04-19 17:56:30 +00:00
_decomp Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
_dispatch
_dynamo Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)" 2024-04-26 02:35:14 +00:00
_export [export] Serialize empty list based on argument type (#123748) 2024-04-25 23:03:27 +00:00
_functorch Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)" 2024-04-26 02:35:14 +00:00
_higher_order_ops Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)" 2024-04-26 02:35:14 +00:00
_inductor Add support for capturing tensors with score_mod (#124444) 2024-04-26 01:02:28 +00:00
_lazy
_library [custom_op] setup_context fills in default values (#124852) 2024-04-25 04:22:01 +00:00
_logging Refactor all top level usages of record_shapeenv_event to ShapeEnv class (#123735) 2024-04-25 14:02:48 +00:00
_numpy Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
_prims [BE]: Update ruff to 0.4.1 (#124549) 2024-04-21 14:06:23 +00:00
_prims_common Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)" 2024-04-26 02:35:14 +00:00
_refs guard_size_oblivious in unbind (#124959) 2024-04-25 23:45:14 +00:00
_subclasses Typo fix: s/nonzero/unique/ (#124935) 2024-04-25 17:22:50 +00:00
_vendor
amp refactor autocast python APIs (#124479) 2024-04-25 14:33:33 +00:00
ao Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
autograd [Profiler][PrivateUse1] Profiler support PrivateUse1 key (#124818) 2024-04-24 18:52:08 +00:00
backends preferred blas library; cublaslt gemm implementation (#122106) 2024-04-22 15:38:22 +00:00
compiler
contrib
cpu
csrc Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)" 2024-04-26 02:35:14 +00:00
cuda [BE]: TRY002 - Ban raising vanilla exceptions (#124570) 2024-04-21 22:26:40 +00:00
distributed Making _MeshEnv subclassing thread local (#124555) 2024-04-26 02:45:42 +00:00
distributions [BE]: Update ruff to 0.4.1 (#124549) 2024-04-21 14:06:23 +00:00
export [export] Fix state dict reparametrization in non-strict. (#124847) 2024-04-25 22:44:16 +00:00
fft
func
futures
fx Revert "remove empty partition (#124920)" 2024-04-26 02:03:01 +00:00
jit [BE]: TRY002 - Ban raising vanilla exceptions (#124570) 2024-04-21 22:26:40 +00:00
legacy
lib
linalg Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
masked Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
monitor
mps Conform torch.mps to device module interface (#124676) 2024-04-23 18:38:48 +00:00
multiprocessing
nested Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
nn [export] Fix state dict reparametrization in non-strict. (#124847) 2024-04-25 22:44:16 +00:00
onnx Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
optim add fused_sgd_kernel support for CPU device (#123629) 2024-04-23 08:28:19 +00:00
package Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
profiler Revert "OSS: Capture triton kernel in ET (#124775)" 2024-04-25 11:24:39 +00:00
quantization
signal Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
sparse Fix a bug in retrieving approximate bsr_dense_addmm kernel meta data (#124371) 2024-04-24 13:59:18 +00:00
special Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
testing Revert "Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)" 2024-04-26 02:35:14 +00:00
utils refactor autocast python APIs (#124479) 2024-04-25 14:33:33 +00:00
xpu [BE]: TRY002 - Ban raising vanilla exceptions (#124570) 2024-04-21 22:26:40 +00:00
__config__.py
__future__.py
__init__.py Revert "torch.mtia module for MTIA device backend (#123612)" 2024-04-25 16:06:46 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261) 2024-04-17 19:29:34 +00:00
_deploy.py
_guards.py Restore CompileContext as well in backwards (#124626) 2024-04-23 14:39:52 +00:00
_jit_internal.py [BE]: TRY002 - Ban raising vanilla exceptions (#124570) 2024-04-21 22:26:40 +00:00
_linalg_utils.py
_lobpcg.py
_lowrank.py
_meta_registrations.py Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
_namedtensor_internals.py
_ops.py Fix mypy issues in fake_tensor.py (#124428) 2024-04-25 14:07:53 +00:00
_python_dispatcher.py
_size_docs.py Added a docstring for torch.Size.numel. (#124186) 2024-04-19 09:23:02 +00:00
_sources.py
_storage_docs.py
_streambase.py [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261) 2024-04-17 19:29:34 +00:00
_tensor.py Add testing and fix weights_only load for quantized types and nn.Parameters with python attrs (#124330) 2024-04-23 04:13:26 +00:00
_tensor_docs.py
_tensor_str.py
_torch_docs.py Fix global flake8 issues (#124771) 2024-04-25 14:25:00 +00:00
_utils.py Revert "torch.mtia module for MTIA device backend (#123612)" 2024-04-25 16:06:46 +00:00
_utils_internal.py [ROCm] Triton upstream AMD backend integration (#121801) 2024-04-25 20:44:27 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py Add testing and fix weights_only load for quantized types and nn.Parameters with python attrs (#124330) 2024-04-23 04:13:26 +00:00
abi-check.cpp
CMakeLists.txt [rfc] opentelemetry in pytorch (#122999) 2024-04-21 15:20:21 +00:00
custom_class.h
custom_class_detail.h
extension.h
functional.py
hub.py [BE]: TRY002 - Ban raising vanilla exceptions (#124570) 2024-04-21 22:26:40 +00:00
library.h Verify types in custom op schemas (#124520) 2024-04-25 01:56:58 +00:00
library.py Delete erroneous print (#124972) 2024-04-26 00:07:54 +00:00
overrides.py Revert "torch.mtia module for MTIA device backend (#123612)" 2024-04-25 16:06:46 +00:00
py.typed
quasirandom.py
random.py
README.txt
return_types.py
script.h
serialization.py
storage.py Add testing and fix weights_only load for quantized types and nn.Parameters with python attrs (#124330) 2024-04-23 04:13:26 +00:00
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.