c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.

Important notes for developers. If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special, because on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:

// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {

void my_func();

}}

this will get transpiled into:

// c10/hip/HIPFoo.h
namespace c10 { namespace hip {

void my_func();

}}

Thus, if you add new functionality to c10, you must also update C10_MAPPINGS in torch/utils/hipify/cuda_to_hip_mappings.py to transpile occurrences of cuda::my_func to hip::my_func. (At the moment, we do NOT have a catch-all cuda:: to hip:: namespace conversion, because not all cuda:: namespaces are converted to hip::, even though c10's are.)
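As a rough illustration of why each new symbol needs its own entry, here is a minimal Python sketch of a mapping table and a naive textual substitution. Everything in it is an assumption made for illustration: `C10_MAPPINGS_SKETCH`, `apply_mappings`, and the string stand-in for `API_C10` are hypothetical, and the entry shape is only modeled on what the mappings file appears to use. The real hipify logic is more involved, so check torch/utils/hipify/cuda_to_hip_mappings.py for the actual format before adding an entry.

```python
from collections import OrderedDict

# Hypothetical stand-in for the tag used in the real mappings file.
API_C10 = "API_C10"

# Hypothetical entries for the CUDAFoo.h example: one for the header path,
# one for the function. Assumed shape: source -> (target, tag).
C10_MAPPINGS_SKETCH = OrderedDict([
    ("c10/cuda/CUDAFoo.h", ("c10/hip/HIPFoo.h", API_C10)),
    ("cuda::my_func", ("hip::my_func", API_C10)),
])


def apply_mappings(source: str, mappings) -> str:
    """Naive textual substitution; NOT the real hipify implementation."""
    for cuda_name, (hip_name, _tag) in mappings.items():
        source = source.replace(cuda_name, hip_name)
    return source


before = (
    "#include <c10/cuda/CUDAFoo.h>\n"
    "void g() { c10::cuda::my_func(); c10::cuda::other(); }\n"
)
after = apply_mappings(before, C10_MAPPINGS_SKETCH)
print(after)
```

In this sketch, `c10::cuda::other()` is left untouched because it has no entry of its own, which mirrors the absence of a catch-all cuda:: to hip:: conversion.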

Transpilation inside this folder is controlled by CAFFE2_SPECIFIC_MAPPINGS (oddly enough). C10_MAPPINGS apply to ALL source files.

If you add a new directory to this folder, you MUST update both c10/cuda/CMakeLists.txt and c10/hip/CMakeLists.txt.