pytorch/c10
Frank Lin 249e65b92d Graph-Safe RNG State Exchange for Tensor Parallelism (#114068)
See #113541

This PR lets callers register and control multiple RNG states by index so that RNG operations remain safe under CUDA graph capture. It includes both C++ and Python API changes to support this.
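
As a rough illustration of what "multiple RNG states controlled by index" means, here is a minimal, conceptual sketch in plain Python. It does not use PyTorch and is not the API added by this PR: the class `RNGStateRegistry` and its `register` and `draw` methods are hypothetical names, and the standard-library `random.Random` stands in for a device generator.

```python
import random


class RNGStateRegistry:
    """Hypothetical registry that keeps several RNG states, addressed by index."""

    def __init__(self):
        self._states = []            # one saved generator state per index
        self._rng = random.Random()  # single shared generator that swaps states

    def register(self, seed):
        """Store a fresh state derived from `seed` and return its index."""
        self._states.append(random.Random(seed).getstate())
        return len(self._states) - 1

    def draw(self, index):
        """Swap in the state at `index`, draw one number, and save the advanced state."""
        self._rng.setstate(self._states[index])
        value = self._rng.random()
        self._states[index] = self._rng.getstate()
        return value


registry = RNGStateRegistry()
a = registry.register(0)
b = registry.register(0)

# Interleaved draws do not disturb each other: each index advances its own state.
x1 = registry.draw(a)
y1 = registry.draw(b)
x2 = registry.draw(a)
```

In this sketch, each index owns an independent stream, so two indices registered with the same seed produce the same sequence regardless of how the draws are interleaved. The graph-safety aspect of the actual change (state that stays valid across CUDA graph capture and replay) is not modeled here.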

cc @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068
Approved by: https://github.com/ezyang, https://github.com/eqy, https://github.com/xuzhao9
2024-03-27 01:14:38 +00:00
benchmark
core Graph-Safe RNG State Exchange for Tensor Parallelism (#114068) 2024-03-27 01:14:38 +00:00
cuda Remove unused variables (#122496) 2024-03-22 18:04:09 +00:00
hip
macros Remove C10_FALLTHROUGH (#120157) 2024-02-21 06:18:58 +00:00
mobile Fix broken lint after #116876 (#122253) 2024-03-20 04:09:00 +00:00
test Make SparseCsr a functionality dispatch key (#120703) 2024-03-01 13:28:46 +00:00
util [Fix] Fixed behaviour for the conversion of complex tensors to bool (#121803) 2024-03-14 13:35:15 +00:00
xpu Revert "Support gpu trace on XPU (#121795)" 2024-03-21 20:33:16 +00:00
BUCK.oss
BUILD.bazel
build.bzl
CMakeLists.txt Remove redundant CMake NUMA code (#119650) 2024-02-12 21:53:44 +00:00
ovrsource_defs.bzl