pytorch/benchmarks
eqy 5f0901e573 [cuBLAS][cuBLASLt] Unify cuBLASLt workspaces with cuBLAS workspaces (#145130)
As `cuBLAS` workspaces are already per-stream, there shouldn't be kernel execution overlap with `cuBLASLt` kernels.

This PR reuses `cuBLAS` workspaces for `cuBLASLt` for the following benefits:

+ caching (`cuBLAS` workspaces were already cached, so now we get that for `cuBLASLt`)
+ "free" workspace size bump for `cuBLASLt` `cuBLASLt` workspace sizes were previously smaller than those for `cuBLAS` by default which potentially hurts performance, and we encountered difficulty in increasing the size due to downstream OOMs , see also #120925
+ fixes behavior broken behavior with the memtracker; https://github.com/pytorch/pytorch/pull/139442 attempted to handle peaky allocation behavior that broke memtracker equivalence tests but it didn't seem to fully work, here the cached/reused `cuBLAS` workspace seems to fix it
+ one environment variable to rule them all: `CUBLAS_WORKSPACE_CONFIG` applies directly to `cuBLASLt` without a confusing `CUBLASLT_WORKSPACE_SIZE` that users would also need to consider

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145130
Approved by: https://github.com/ngimel
2025-02-06 05:57:33 +00:00
..
distributed Revert "Use absolute path path.resolve() -> path.absolute() (#129409)" 2025-01-04 14:17:20 +00:00
dynamo [cuBLAS][cuBLASLt] Unify cuBLASLt workspaces with cuBLAS workspaces (#145130) 2025-02-06 05:57:33 +00:00
fastrnns PEP585 update - benchmarks tools torchgen (#145101) 2025-01-18 05:05:07 +00:00
framework_overhead_benchmark Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
functional_autograd_benchmark PEP585 update - benchmarks tools torchgen (#145101) 2025-01-18 05:05:07 +00:00
fuser Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
gpt_fast Fix broken gpt_fast micro benchmark after #144315 (#145235) 2025-01-21 17:42:24 +00:00
inference
instruction_counts PEP585 update - benchmarks tools torchgen (#145101) 2025-01-18 05:05:07 +00:00
nested Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
operator_benchmark Additional operators in operator benchmark (#145625) 2025-01-26 19:20:02 +00:00
overrides_benchmark
profiler_benchmark Apply TorchFix TOR203 fixes (#143691) 2024-12-23 18:21:03 +00:00
record_function_benchmark
serialization Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
sparse Fix unused Python variables outside torch/ and test/ (#136359) 2024-12-11 17:10:23 +00:00
static_runtime Re-enable some C++ warnings (#142332) 2024-12-12 04:02:12 +00:00
tensorexpr [BE][CI] bump ruff to 0.8.4 (#143753) 2024-12-24 12:24:10 +00:00
transformer PEP585 update - benchmarks tools torchgen (#145101) 2025-01-18 05:05:07 +00:00
compare-fastrnn-results.py
compare.sh
README.md
upload_scribe.py

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Setup environment

Make sure you're on a machine with CUDA, torchvision, and pytorch installed. Install in the following order:

# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"

Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: