pytorch/torch/cuda
Nichols A. Romero a99332eb25 [ROCM] Support Multi-GPU offline tuning in TunableOp (#139673)
This PR enhances offline tuning to support multiple GPUs.

High-level description of algorithm:
- Duplicate GEMMs are first eliminated
- The remaining GEMMs are distributed across multiple GPUs for tuning
- Results are gathered into a single file with `_full` in the filename

Also adds support for GemmAndBias and ScaledGemm.
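The three steps above (dedupe, distribute, gather) can be sketched in plain Python. This is an illustrative outline of the algorithm only; the function names `dedupe_gemms`, `distribute`, and `gather_full` are hypothetical and are not the actual `torch.cuda.tunable` API:

```python
def dedupe_gemms(lines):
    """Step 1: eliminate duplicate GEMM entries, preserving first-seen order."""
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out


def distribute(gemms, num_gpus):
    """Step 2: round-robin the unique GEMMs across the available GPUs.

    (The actual scheduling policy is an assumption; the PR only states that
    GEMMs are distributed to multiple GPUs for tuning.)
    """
    shards = [[] for _ in range(num_gpus)]
    for i, gemm in enumerate(gemms):
        shards[i % num_gpus].append(gemm)
    return shards


def gather_full(per_gpu_results, base_filename):
    """Step 3: merge per-GPU results into one file with `_full` in its name."""
    root, dot, ext = base_filename.rpartition(".")
    full_name = f"{root}_full{dot}{ext}" if dot else f"{base_filename}_full"
    merged = [result for shard in per_gpu_results for result in shard]
    return full_name, merged


# Example: two duplicate GEMMs collapse to one, work splits across 2 GPUs.
gemms = dedupe_gemms(["GemmTunableOp_A", "GemmTunableOp_B", "GemmTunableOp_A"])
shards = distribute(gemms, num_gpus=2)
name, merged = gather_full([["result_A"], ["result_B"]], "tunableop_results.csv")
```

Here `name` becomes `tunableop_results_full.csv`, matching the `_full` naming convention described above.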

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139673
Approved by: https://github.com/jeffdaily, https://github.com/hongxiayang
2024-11-26 19:07:41 +00:00
amp
__init__.py [ROCm] AMDSMI memory usage unification (#139900) 2024-11-21 21:11:39 +00:00
_gpu_trace.py
_memory_viz.py
_sanitizer.py
_utils.py
comm.py
error.py
gds.py
graphs.py
jiterator.py
memory.py fix: Add type annotation to _record_memory_history (#140545) 2024-11-14 17:44:46 +00:00
nccl.py
nvtx.py
profiler.py
random.py [BE]: Apply PERF401 autofixes from ruff (#140980) 2024-11-20 17:52:07 +00:00
sparse.py
streams.py
tunable.py [ROCM] Support Multi-GPU offline tuning in TunableOp (#139673) 2024-11-26 19:07:41 +00:00