pytorch/benchmarks
Sheng Qin c9222b7471 Implement clip_ranges operator for PyTorch
Test Plan:
Unit test for correctness:
```
buck test caffe2/torch/fb/sparsenn:test -- test_clip_ranges
Parsing buck files: finished in 1.6 sec
Creating action graph: finished in 18.9 sec
Building: finished in 15.0 sec (100%) 9442/9442 jobs, 1 updated
  Total time: 35.6 sec
More details at https://www.internalfb.com/intern/buck/build/66fb17de-859e-4d01-89bf-5c5de2950693
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 80f5e0c2-7db2-48a4-b148-25dd34651682
Trace available for this run at /tmp/tpx-20201026-123217.050766/trace.log
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (14.912)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (14.098)
Summary
  Pass: 1
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4503599665041422
```

New benchmark perf test:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.765

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.248

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 156.634

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 155.408

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 165.168
```

Compared with the old implementation, the new operator is **around 290 us faster** per forward call (roughly 445 us down to 156 us). Old implementation results:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH6_M1_N2_MAX_LENGTH1_dtypetorch.int32_cpu
# Input: LENGTH: 6, M: 1, N: 2, MAX_LENGTH: 1, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 443.012

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH7_M1_N2_MAX_LENGTH2_dtypetorch.int32_cpu
# Input: LENGTH: 7, M: 1, N: 2, MAX_LENGTH: 2, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 446.480

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH8_M1_N2_MAX_LENGTH3_dtypetorch.int32_cpu
# Input: LENGTH: 8, M: 1, N: 2, MAX_LENGTH: 3, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 444.064

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH9_M1_N2_MAX_LENGTH4_dtypetorch.int32_cpu
# Input: LENGTH: 9, M: 1, N: 2, MAX_LENGTH: 4, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 445.511

# Benchmarking PyTorch: clip_ranges
# Mode: JIT
# Name: clip_ranges_LENGTH10_M1_N2_MAX_LENGTH5_dtypetorch.int32_cpu
# Input: LENGTH: 10, M: 1, N: 2, MAX_LENGTH: 5, dtype: torch.int32, device: cpu
Forward Execution Time (us) : 450.468
```
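
The gain can be checked directly from the two runs above. The snippet below is only a sketch of that arithmetic: the timings are copied from the benchmark output, while the names `new_us`, `old_us`, and `summarize` are illustrative and not part of the benchmark suite.

```python
# Forward execution times in microseconds, copied from the two runs above.
# Order: MAX_LENGTH 1 through 5 (LENGTH 6 through 10).
new_us = [155.765, 156.248, 156.634, 155.408, 165.168]
old_us = [443.012, 446.480, 444.064, 445.511, 450.468]


def summarize(old, new):
    """Return per-config absolute gains (us) and speedup ratios (old / new)."""
    gains = [o - n for o, n in zip(old, new)]
    speedups = [o / n for o, n in zip(old, new)]
    return gains, speedups


gains, speedups = summarize(old_us, new_us)
mean_gain = sum(gains) / len(gains)

for max_length, (gain, speedup) in enumerate(zip(gains, speedups), start=1):
    print(f"MAX_LENGTH={max_length}: gain {gain:.3f} us, speedup {speedup:.2f}x")
print(f"mean gain: {mean_gain:.3f} us")
```

Every configuration improves by between 285 and 291 us, so the gain is effectively independent of the input sizes that were benchmarked.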

Reviewed By: MarcioPorto

Differential Revision: D24546110

fbshipit-source-id: e6c9b38e911f177f97961ede5bf375107f240363
2020-10-28 09:46:37 -07:00
cpp/tensorexpr [pytorch][te] Add compilation time benchmark (#46124) 2020-10-09 23:11:37 -07:00
distributed/ddp
fastrnns Benchmarks: tweak PE config settings. (#45349) 2020-09-26 23:13:29 -07:00
framework_overhead_benchmark Remove py2 compatible future imports (#44735) 2020-09-16 12:55:57 -07:00
functional_autograd_benchmark Reland of benchmark code (#43428) 2020-08-24 13:27:26 -07:00
operator_benchmark Implement clip_ranges operator for PyTorch 2020-10-28 09:46:37 -07:00
overrides_benchmark Add __torch_function__ for methods (#37091) 2020-08-05 20:44:13 -07:00
profiler_benchmark Source code level attribution in profiler (#43898) 2020-09-30 00:57:35 -07:00
record_function_benchmark Fix D23995953 import. 2020-09-29 19:30:23 -07:00
serialization
static_runtime [StaticRuntime] Threading model (#46219) 2020-10-20 14:37:30 -07:00
tensorexpr [NVFuser]Benchmark minor update (#46778) 2020-10-26 12:22:36 -07:00
compare-fastrnn-results.py Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
compare.sh Benchmarks: add scripts for FastRNNs results comparison. (#44134) 2020-09-03 13:44:42 -07:00
README.md
upload_scribe.py Benchmarks: make fuser and executor configurable from command line. (#44291) 2020-09-09 11:59:35 -07:00

PyTorch Benchmarks

NOTE: This folder is currently a work in progress.

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Setup environment

Make sure you're on a machine with CUDA, torchvision, and pytorch installed. Install in the following order:

# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
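
A slightly fuller check than the one-liner above can confirm that both packages are importable and that CUDA is visible. This is a minimal sketch, not part of the benchmark scripts: the helper name `check_environment` is hypothetical, and it only relies on `importlib.util.find_spec`, `torch.__version__`, and `torch.cuda.is_available()`.

```python
import importlib.util


def check_environment(packages=("torch", "torchvision")):
    """Report which packages are importable and, if torch is, its CUDA status.

    Hypothetical helper for illustration; not shipped with the benchmarks.
    """
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None

    if report.get("torch"):
        import torch  # imported lazily so the check also runs without torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch_version` still shows the release binary's version after building from source, the development install did not take precedence.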

Benchmark List

Each subfolder contains one benchmark suite; refer to the subfolder for details on running it.