# Fast RNN benchmarks
Benchmarks for TorchScript models
For most stable results, do the following (example commands are sketched after this list):

- Set the CPU governor to performance mode (as opposed to energy-saving)
- Turn off turbo for all CPUs (assuming Intel CPUs)
- Shield CPUs via `cset shield` when running benchmarks
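As a concrete starting point, the commands below implement those three steps. This is a minimal sketch, not a tested recipe: it assumes a Linux machine with an Intel CPU using the `intel_pstate` driver, the `cpupower` and `cset` (cpuset) utilities installed, and it uses cores 2-5 as an arbitrary example range.

```sh
# Set the CPU frequency governor to performance on all cores
sudo cpupower frequency-set -g performance

# Disable turbo boost (this sysfs path is specific to the Intel intel_pstate driver)
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Shield cores 2-5 (example range) from other user and kernel threads
sudo cset shield --cpu 2-5 --kthread=on

# Run a benchmark inside the shield, then tear the shield down when done
sudo cset shield --exec -- python -m fastrnns.bench
sudo cset shield --reset
```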
Some of these scripts accept command-line args, but most of them do not yet; they will probably be added in the future. In the meantime, the default sizes are pretty reasonable.
## Test fastrnns (fwd + bwd) correctness
Test the fastrnns benchmarking scripts with the following:

`python -m fastrnns.test`

or run the test for a specific model only:

`python -m fastrnns.test --rnns jit`
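If you are unsure which names `--rnns` accepts, the scripts take standard command-line flags, so (assuming they are built on argparse, as the flag syntax suggests) you can list the supported options with:

```sh
python -m fastrnns.test --help
```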
## Run benchmarks
`python -m fastrnns.bench`

should give a good comparison, or you can specify the type of model to run:

`python -m fastrnns.bench --rnns cudnn aten jit --group rnns`
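To collect numbers with the stability setup described above, you can run the same command inside the CPU shield. This assumes the `cset shield` configuration sketched earlier in this README:

```sh
# Run the benchmark on the shielded cores only
sudo cset shield --exec -- python -m fastrnns.bench --rnns cudnn aten jit --group rnns
```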
## Run model profiling (calls nvprof)
`python -m fastrnns.profile`

should generate an nvprof file for each model. You can also generate nvprof files for specific models separately:

`python -m fastrnns.profile --rnns aten jit`
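Once captured, the profiles can be inspected with NVIDIA's tooling. A minimal sketch, assuming a run produced a file named `aten.nvprof` (a hypothetical name; check where the script actually writes its output) and that `nvprof`/`nvvp` from the CUDA toolkit are on your `PATH`:

```sh
# Print a kernel-level summary of a captured profile (filename is hypothetical)
nvprof --import-profile aten.nvprof

# Or open the same file in the NVIDIA Visual Profiler GUI
nvvp aten.nvprof
```

Note that newer CUDA toolkits have deprecated nvprof in favor of Nsight Systems and Nsight Compute.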
## Caveats
Use Linux for the most accurate timing. A lot of these tests only run on CUDA.
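Since many tests require CUDA, a quick sanity check before running is to ask PyTorch whether a CUDA device is visible; `torch.cuda.is_available()` and `torch.version.cuda` are stable public APIs:

```sh
# Prints e.g. "True 12.1" on a working CUDA build, "False None" on a CPU-only build
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```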