mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
As `cuBLAS` workspaces are already per-stream, there shouldn't be kernel execution overlap with `cuBLASLt` kernels. This PR reuses `cuBLAS` workspaces for `cuBLASLt` for the following benefits: + caching (`cuBLAS` workspaces were already cached, so now we get that for `cuBLASLt`) + "free" workspace size bump for `cuBLASLt` `cuBLASLt` workspace sizes were previously smaller than those for `cuBLAS` by default which potentially hurts performance, and we encountered difficulty in increasing the size due to downstream OOMs , see also #120925 + fixes behavior broken behavior with the memtracker; https://github.com/pytorch/pytorch/pull/139442 attempted to handle peaky allocation behavior that broke memtracker equivalence tests but it didn't seem to fully work, here the cached/reused `cuBLAS` workspace seems to fix it + one environment variable to rule them all: `CUBLAS_WORKSPACE_CONFIG` applies directly to `cuBLASLt` without a confusing `CUBLASLT_WORKSPACE_SIZE` that users would also need to consider Pull Request resolved: https://github.com/pytorch/pytorch/pull/145130 Approved by: https://github.com/ngimel |
||
|---|---|---|
| .. | ||
| distributed | ||
| dynamo | ||
| fastrnns | ||
| framework_overhead_benchmark | ||
| functional_autograd_benchmark | ||
| fuser | ||
| gpt_fast | ||
| inference | ||
| instruction_counts | ||
| nested | ||
| operator_benchmark | ||
| overrides_benchmark | ||
| profiler_benchmark | ||
| record_function_benchmark | ||
| serialization | ||
| sparse | ||
| static_runtime | ||
| tensorexpr | ||
| transformer | ||
| compare-fastrnn-results.py | ||
| compare.sh | ||
| README.md | ||
| upload_scribe.py | ||
PyTorch Benchmarks
This folder contains scripts that produce reproducible timings of various PyTorch features.
It also provides mechanisms to compare PyTorch with other frameworks.
Setup environment
Make sure you're on a machine with CUDA, torchvision, and pytorch installed. Install in the following order:
# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch
# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop
# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
Benchmark List
Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: