pytorch/c10
Frank Lin e9dcda5cba Graph-Safe RNG State Exchange for Tensor Parallelism (#114068)
See #113541

This PR adds support for registering multiple RNG states and controlling them by index, so that RNG state exchange stays cudagraph-safe. It includes both C++ and Python API changes.
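
As a rough illustration of the index-based idea only, here is a toy sketch in plain Python. It does not use the PyTorch API from this PR and does not involve CUDA graphs: the `RNGStateRegistry` class, its method names, and the "shared"/"local" stream naming are all hypothetical, and the standard-library `random.Random` stands in for a device generator.

```python
import random


class RNGStateRegistry:
    """Toy registry (hypothetical): several RNG states addressed by integer index."""

    def __init__(self):
        self._states = []             # one saved generator state per registered index
        self._rng = random.Random()   # the single generator that actually draws numbers
        self._active = None           # index of the state currently loaded into _rng

    def register(self, seed):
        """Register a new state derived from `seed` and return its index."""
        self._states.append(random.Random(seed).getstate())
        return len(self._states) - 1

    def activate(self, index):
        """Save the active state back into its slot, then load state `index`."""
        if self._active is not None:
            self._states[self._active] = self._rng.getstate()
        self._rng.setstate(self._states[index])
        self._active = index

    def random(self):
        """Draw one float from whichever state is currently active."""
        if self._active is None:
            raise RuntimeError("no RNG state has been activated")
        return self._rng.random()


registry = RNGStateRegistry()
shared = registry.register(seed=0)   # e.g. a stream every rank agrees on
local = registry.register(seed=1)    # e.g. a stream private to one rank

registry.activate(shared)
a = registry.random()                # first draw from the shared stream
registry.activate(local)
b = registry.random()                # first draw from the local stream
registry.activate(shared)
c = registry.random()                # shared stream resumes where it left off
```

Switching by index rather than copying state around means each stream keeps its own position: the second draw from `shared` continues that stream instead of restarting it or being disturbed by the draw from `local`.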

cc @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068
Approved by: https://github.com/ezyang
2024-03-21 01:57:08 +00:00
benchmark
core Graph-Safe RNG State Exchange for Tensor Parallelism (#114068) 2024-03-21 01:57:08 +00:00
cuda Refactor gpu trace to be device-agnostic (#121794) 2024-03-21 01:52:58 +00:00
hip [ROCm] remove HCC references (#111975) 2023-10-26 02:39:10 +00:00
macros Remove C10_FALLTHROUGH (#120157) 2024-02-21 06:18:58 +00:00
mobile Fix broken lint after #116876 (#122253) 2024-03-20 04:09:00 +00:00
test Make SparseCsr a functionality dispatch key (#120703) 2024-03-01 13:28:46 +00:00
util [Fix] Fixed behaviour for the conversion of complex tensors to bool (#121803) 2024-03-14 13:35:15 +00:00
xpu Support gpu trace on XPU (#121795) 2024-03-21 01:56:42 +00:00
BUCK.oss [1/4] Intel GPU Runtime Upstreaming for Device (#116019) 2024-01-12 07:36:25 +00:00
BUILD.bazel
build.bzl
CMakeLists.txt Remove redundant CMake NUMA code (#119650) 2024-02-12 21:53:44 +00:00
ovrsource_defs.bzl [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660) 2023-12-13 15:44:53 +00:00