pytorch/c10/cuda
c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.

**Important notes for developers.** If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special, because on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:

```cpp
// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {

void my_func();

}} // namespace c10::cuda
```

this will get transpiled into:

```cpp
// c10/hip/HIPFoo.h
namespace c10 { namespace hip {

void my_func();

}} // namespace c10::hip
```

Thus, if you add new functionality to c10, you must also update `C10_MAPPINGS` in `torch/utils/hipify/cuda_to_hip_mappings.py` so that occurrences of `cuda::my_func` are transpiled to `hip::my_func`. (At the moment, we do NOT have a catch-all `cuda::` to `hip::` namespace conversion, as not all `cuda` namespaces are converted to `hip::`, even though c10's are.)

Transpilation inside this folder is controlled by `CAFFE2_SPECIFIC_MAPPINGS` (oddly enough). `C10_MAPPINGS` apply to ALL source files.

If you add a new directory to this folder, you MUST update both `c10/cuda/CMakeLists.txt` and `c10/hip/CMakeLists.txt`.