pytorch/tools/codegen/api
Scott Wolchok 4495b49ffa [PyTorch] Pass TensorOptions by value (#51165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51165

`TensorOptions` has no non-trivial copy, move, or destructor
operations and is small enough to fit in a register, so we should
pass it by value.
ghstack-source-id: 120697498
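
The reasoning above can be sketched as a small decision rule. This is a minimal illustration, not the actual logic in `tools/codegen/api`: `CppTypeInfo`, `param_decl`, the 16-byte threshold, the sizes, and the assumption that the alternative is a const reference are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: two 8-byte registers on a typical 64-bit ABI.
MAX_BY_VALUE_BYTES = 16


@dataclass(frozen=True)
class CppTypeInfo:
    """Hypothetical description of a C++ type; not part of tools.codegen."""
    name: str
    size_bytes: int
    trivially_copyable: bool  # trivial copy, move, and destructor


def param_decl(t: CppTypeInfo) -> str:
    """Spell a parameter of type `t` by value or by const reference."""
    if t.trivially_copyable and t.size_bytes <= MAX_BY_VALUE_BYTES:
        return t.name  # cheap to copy: pass by value
    return f"const {t.name}&"  # otherwise avoid the copy


# Sizes are placeholder assumptions, not measured values.
small = CppTypeInfo("TensorOptions", size_bytes=8, trivially_copyable=True)
large = CppTypeInfo("std::string", size_bytes=32, trivially_copyable=False)

print(param_decl(small))
print(param_decl(large))
```

Under these assumptions the small, trivially copyable type is spelled by value and the larger, non-trivial one keeps the const reference.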

Test Plan:
Measured timing for the empty framework overhead benchmark before and after this change:

Before:
```
I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645
I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485
I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485
I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359
I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537

          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

After:
```
I0126 16:23:20.010062 2244624 bench.cpp:139] Mean 0.266814
I0126 16:23:20.010092 2244624 bench.cpp:140] Median 0.265759
I0126 16:23:20.010099 2244624 bench.cpp:141] Min 0.260291
I0126 16:23:20.010107 2244624 bench.cpp:142] stddev 0.00548279
I0126 16:23:20.010118 2244624 bench.cpp:143] stddev / mean 0.0205491

          2,983.75 msec task-clock                #    0.995 CPUs utilized            ( +-  0.36% )
               243      context-switches          #    0.082 K/sec                    ( +-  1.26% )
                 1      cpu-migrations            #    0.000 K/sec
            11,422      page-faults               #    0.004 M/sec                    ( +-  0.18% )
     5,928,639,486      cycles                    #    1.987 GHz                      ( +-  0.36% )  (50.02%)
    16,105,928,210      instructions              #    2.72  insn per cycle           ( +-  0.05% )  (50.02%)
     3,150,273,453      branches                  # 1055.809 M/sec                    ( +-  0.03% )  (50.05%)
         3,713,617      branch-misses             #    0.12% of all branches          ( +-  0.83% )  (50.07%)
```

It looked close to neutral, so I used `perf stat` to confirm it's about a 1% instruction count win.

For deciding whether this stack is worth it, I went back and ran `perf stat` on the baseline diff before I started touching the dispatcher:

```
          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

If I've done the arithmetic correctly, we have a 0.39% instruction count win (from 16,169,242,938 to 16,105,928,210 instructions).
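
To make the arithmetic easy to check, here is a short sketch that recomputes the relative change from the `perf stat` counters quoted above. The helper name `relative_reduction` is hypothetical; the counter values are copied from the baseline and "After" outputs.

```python
def relative_reduction(before: int, after: int) -> float:
    """Fraction by which `after` is smaller than `before` (negative if larger)."""
    return (before - after) / before


# Instruction counts from the baseline and "After" perf stat runs.
before_instructions = 16_169_242_938
after_instructions = 16_105_928_210

# Cycle counts from the same two runs, for comparison.
before_cycles = 5_898_481_882
after_cycles = 5_928_639_486

instruction_win = relative_reduction(before_instructions, after_instructions)
cycle_win = relative_reduction(before_cycles, after_cycles)

print(f"instructions: {instruction_win:.2%}")  # instructions: 0.39%
print(f"cycles: {cycle_win:.2%}")              # cycles: -0.51%
```

The instruction count drops by about 0.39%, while the cycle count moves slightly the other way. The cycle change is small next to the benchmark's own run-to-run spread (stddev / mean of roughly 1-2%).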

Reviewed By: ezyang

Differential Revision: D25983863

fbshipit-source-id: 87d1451a01ead25738ea6b80db270d344bc583b2
2021-02-01 12:40:08 -08:00
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | | |
| `autograd.py` | [pytorch][codegen] migrate gen_variable_type to new data model (#49735) | 2021-01-05 14:12:39 -08:00 |
| `cpp.py` | [PyTorch] Pass TensorOptions by value (#51165) | 2021-02-01 12:40:08 -08:00 |
| `dispatcher.py` | Remove codegen logic to support non-c10-full ops (#49164) | 2021-01-06 14:17:36 -08:00 |
| `meta.py` | Introduce tools.codegen.api.translate (#49122) | 2020-12-16 16:18:40 -08:00 |
| `native.py` | [pytorch] fix ConstRefCType usage in codegen/api/native.py (#50742) | 2021-01-20 15:01:37 -08:00 |
| `python.py` | Remove codegen logic to support non-c10-full ops (#49164) | 2021-01-06 14:17:36 -08:00 |
| `translate.py` | Introduce tools.codegen.api.translate (#49122) | 2020-12-16 16:18:40 -08:00 |
| `types.py` | Add at::cpu namespace of functions for structured kernels (#49505) | 2021-01-22 13:11:59 -08:00 |