mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00


Sheng Qin c9222b7471 Implement clip_ranges operator for PyTorch	2020-10-28 09:46:37 -07:00

Test Plan:

unit test for correctness

```
buck test caffe2/torch/fb/sparsenn:test -- test_clip_ranges
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 18.9 sec
Building: finished in 15.0 sec (100%) 9442/9442 jobs, 1 updated
  Total time: 35.6 sec
More details at https://www.internalfb.com/intern/buck/build/66fb17de-859e-4d01-89bf-5c5de2950693
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 80f5e0c2-7db2-48a4-b148-25dd34651682
Trace available for this run at /tmp/tpx-20201026-123217.050766/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (14.912)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (14.098)
Summary
  Pass: 1
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
```

new benchmark perf test

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.765

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.248

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.634

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.408

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 165.168
```

Compare with the old implementation, there are around 300us gain

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```

Reviewed By: MarcioPorto
Differential Revision: D24546110
fbshipit-source-id: e6c9b38e911f177f97961ede5bf375107f240363
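
The "around 300us gain" quoted in the commit message above can be checked directly from the ten forward execution times it reports. The following is a small arithmetic sketch, not part of the commit; the variable names are illustrative, and the two lists are copied from the new and old benchmark runs in the same configuration order (LENGTH 6 through 10).

```python
from statistics import mean

# Forward execution times in microseconds, copied from the commit message.
# Order: LENGTH6, LENGTH7, LENGTH8, LENGTH9, LENGTH10.
new_us = [155.765, 156.248, 156.634, 155.408, 165.168]
old_us = [443.012, 446.480, 444.064, 445.511, 450.468]

# Absolute gain for each configuration (old minus new).
per_config_gain = [old - new for old, new in zip(old_us, new_us)]

# Average gain and overall speedup across the five configurations.
mean_gain_us = mean(per_config_gain)
speedup = mean(old_us) / mean(new_us)

print(f"mean gain: {mean_gain_us:.1f} us, speedup: {speedup:.2f}x")
```

On these numbers the average gain is roughly 288 us per forward call, or about a 2.8x speedup, which matches the "around 300us" figure in the message.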
cpp/tensorexpr	[pytorch][te] Add compilation time benchmark (#46124)	2020-10-09 23:11:37 -07:00
distributed/ddp
fastrnns	Benchmarks: tweak PE config settings. (#45349)	2020-09-26 23:13:29 -07:00
framework_overhead_benchmark	Remove py2 compatible future imports (#44735)	2020-09-16 12:55:57 -07:00
functional_autograd_benchmark	Reland of benchmark code (#43428)	2020-08-24 13:27:26 -07:00
operator_benchmark	Implement clip_ranges operator for PyTorch	2020-10-28 09:46:37 -07:00
overrides_benchmark	Add __torch_function__ for methods (#37091)	2020-08-05 20:44:13 -07:00
profiler_benchmark	Source code level attribution in profiler (#43898)	2020-09-30 00:57:35 -07:00
record_function_benchmark	Fix D23995953 import.	2020-09-29 19:30:23 -07:00
serialization
static_runtime	[StaticRuntime] Threading model (#46219)	2020-10-20 14:37:30 -07:00
tensorexpr	[NVFuser]Benchmark minor update (#46778)	2020-10-26 12:22:36 -07:00
compare-fastrnn-results.py	Benchmarks: add scripts for FastRNNs results comparison. (#44134)	2020-09-03 13:44:42 -07:00
compare.sh	Benchmarks: add scripts for FastRNNs results comparison. (#44134)	2020-09-03 13:44:42 -07:00
README.md
upload_scribe.py	Benchmarks: make fuser and executor configurable from command line. (#44291)	2020-09-09 11:59:35 -07:00

README.md

PyTorch Benchmarks

NOTE: This folder is currently a work in progress.

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.
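
As a rough illustration of what "reproducible timings" usually involves, the sketch below runs a few warm-up calls, repeats the measurement several times, and reports the median. It uses only the Python standard library and is not the harness used by the scripts in this folder; `time_callable` and its parameters are hypothetical names chosen for the example.

```python
import statistics
import timeit


def time_callable(fn, warmup=3, repeats=5, number=100):
    """Return the median time per call of `fn`, in microseconds."""
    # Warm-up calls so one-time costs (caches, lazy initialization)
    # do not leak into the measurement.
    for _ in range(warmup):
        fn()
    # Each entry is the total wall time for `number` consecutive calls.
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    per_call_us = [total / number * 1e6 for total in totals]
    # The median is less sensitive to outliers than the mean.
    return statistics.median(per_call_us)


if __name__ == "__main__":
    median_us = time_callable(lambda: sum(range(1000)))
    print(f"median time per call: {median_us:.3f} us")
```

For GPU workloads a timer like this would also need to synchronize the device before reading the clock, which this CPU-only sketch does not attempt.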

Setup environment

Make sure you're on a machine with CUDA available, then install torchvision and PyTorch in the following order:

# Install torchvision. It comes with the PyTorch stable release binary.
conda install pytorch torchvision -c pytorch

# Install the latest PyTorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the installed PyTorch version
python -c "import torch; print(torch.__version__)"
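
The version check above can be extended into a slightly fuller sanity check of the environment. The following is a minimal sketch, assuming only that `torch` and `torchvision` are the packages of interest; `check_environment` is a hypothetical helper, not something shipped in this folder. It reports which packages are importable and, when `torch` is present, its version and whether CUDA is visible.

```python
import importlib.util


def check_environment(packages=("torch", "torchvision")):
    """Report which packages are importable, plus torch/CUDA details."""
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    if report.get("torch"):
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

After a source build, `torch_version` should show the development version rather than the release binary's version, which is a quick way to confirm the source install took precedence.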

Benchmark List

Please refer to each subfolder for details on its benchmark suite.

Fast RNNs benchmarks