mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
Attempt number 2 at https://github.com/pytorch/pytorch/issues/108950. Improves debugging for guard failures/recompilations by: - only running guard fail reason generation during recompilation, instead of when a guard fails during dynamo cache lookup (so generating guard failure reasons is not on the critical path) - ~~always reporting all guard failures~~ Reports the first-failing guard failure for each cache entry. We don't expect a performance hit since the guard fail reasons are only generated at recompile time rather than runtime. Perf benchmark to check this (https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Fri,%2027%20Oct%202023%2017:42:43%20GMT&stopTime=Fri,%2003%20Nov%202023%2017:42:43%20GMT&granularity=hour&mode=training&dtype=amp&lBranch=gh/williamwen42/62/head&lCommit=f4724f5ffc6d17ceae513a42fc18627be7b85482&rBranch=main&rCommit=29f3d392bf230072e3bffae37b078e770cae1956). We may also need to verify this on benchmarks where guard fails are common. Sample script: ```python import torch def generate_data(b): return ( torch.randn(b, 3, 32, 32).to(torch.float32).cuda(), torch.randint(1000, (b,)).cuda(), ) from torchvision.models import resnet18 def init_model(): return resnet18().to(torch.float32).cuda() model = init_model() model_opt = torch.compile(model, dynamic=False) for b in range(16, 32): data = generate_data(b) model_opt(data[0]) ``` Sample logs: ```bash (/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ python playground5.py /data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance. warnings.warn( [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8) [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284) [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24 [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles". [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html. (/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ TORCH_LOGS="recompiles" python playground5.py /data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance. warnings.warn( [2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 17 [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 18 [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 18 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 19 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 19 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 19 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 20 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 20 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 20 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 20 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 21 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 21, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 22 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 22, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 21, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 23 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 23, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 22, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 21, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8) [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284) [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles". [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html. [2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 25 [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 26 [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 26 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 27 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 27 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 27 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 28 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 28 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 28 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 28 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 28, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 29 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 29, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 28, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 30 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 30, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 29, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 28, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 31 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110325 Approved by: https://github.com/ezyang, https://github.com/jon-chuang |
||
|---|---|---|
| .. | ||
| cpp | ||
| distributed | ||
| dynamo | ||
| fastrnns | ||
| framework_overhead_benchmark | ||
| functional_autograd_benchmark | ||
| fuser | ||
| inference | ||
| instruction_counts | ||
| nested | ||
| operator_benchmark | ||
| overrides_benchmark | ||
| profiler_benchmark | ||
| record_function_benchmark | ||
| serialization | ||
| sparse | ||
| static_runtime | ||
| tensorexpr | ||
| transformer | ||
| compare-fastrnn-results.py | ||
| compare.sh | ||
| README.md | ||
| upload_scribe.py | ||
PyTorch Benchmarks
This folder contains scripts that produce reproducible timings of various PyTorch features.
It also provides mechanisms to compare PyTorch with other frameworks.
Setup environment
Make sure you're on a machine with CUDA, torchvision, and pytorch installed. Install in the following order:
# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch
# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop
# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
Benchmark List
Please refer to each subfolder to discover each benchmark suite. Links are provided where descriptions exist: