PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
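At a high level, most timing scripts here follow the same recipe: warm up, run the workload repeatedly, and report a robust statistic rather than a single measurement. Below is a minimal stdlib-only sketch of that recipe; the `benchmark` helper and the placeholder workload are illustrative assumptions, not code from this repository (the real scripts time PyTorch operators, often via `torch.utils.benchmark`):

```python
import statistics
import time


def benchmark(fn, warmup=5, repeats=20):
    """Time `fn`: run warmup iterations first, then report the median of repeats."""
    for _ in range(warmup):
        fn()  # warmup amortizes one-time costs (allocator, caches, JIT)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # Median is less sensitive to scheduler noise than the mean.
    return statistics.median(samples)


# Placeholder workload standing in for a PyTorch operation.
median_sec = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median: {median_sec * 1e6:.1f} us")
```

Reporting the median over many repeats (after warmup) is what makes the numbers reproducible enough to compare across commits or frameworks.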

Setup environment

Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:

# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"

Benchmark List

Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: