pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00


Scott Wolchok 4495b49ffa [PyTorch] Pass TensorOptions by value (#51165)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51165

`TensorOptions` does not have a non-trivial copy, move, or destroy operation and is small enough to fit in a register, so it seems like we should pass it by value.

ghstack-source-id: 120697498

Test Plan:
Measured timing for empty framework overhead benchmark before & after this change:

Before:
```
I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645
I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485
I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485
I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359
I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537

2,968.37 msec task-clock # 0.997 CPUs utilized ( +- 0.03% )
250 context-switches # 0.084 K/sec ( +- 2.21% )
1 cpu-migrations # 0.000 K/sec
11,403 page-faults # 0.004 M/sec ( +- 0.28% )
5,898,481,882 cycles # 1.987 GHz ( +- 0.03% ) (50.05%)
16,169,242,938 instructions # 2.74 insn per cycle ( +- 0.03% ) (50.06%)
3,076,546,626 branches # 1036.443 M/sec ( +- 0.05% ) (50.05%)
2,531,859 branch-misses # 0.08% of all branches ( +- 0.89% ) (50.03%)
```

After:
```
I0126 16:23:20.010062 2244624 bench.cpp:139] Mean 0.266814
I0126 16:23:20.010092 2244624 bench.cpp:140] Median 0.265759
I0126 16:23:20.010099 2244624 bench.cpp:141] Min 0.260291
I0126 16:23:20.010107 2244624 bench.cpp:142] stddev 0.00548279
I0126 16:23:20.010118 2244624 bench.cpp:143] stddev / mean 0.0205491

2,983.75 msec task-clock # 0.995 CPUs utilized ( +- 0.36% )
243 context-switches # 0.082 K/sec ( +- 1.26% )
1 cpu-migrations # 0.000 K/sec
11,422 page-faults # 0.004 M/sec ( +- 0.18% )
5,928,639,486 cycles # 1.987 GHz ( +- 0.36% ) (50.02%)
16,105,928,210 instructions # 2.72 insn per cycle ( +- 0.05% ) (50.02%)
3,150,273,453 branches # 1055.809 M/sec ( +- 0.03% ) (50.05%)
3,713,617 branch-misses # 0.12% of all branches ( +- 0.83% ) (50.07%)
```

It looked close to neutral, so I used `perf stat` to confirm it's about a 1% instruction count win.

For deciding whether this stack is worth it, I went back and ran `perf stat` on the baseline diff before I started touching the dispatcher:

```
2,968.37 msec task-clock # 0.997 CPUs utilized ( +- 0.03% )
250 context-switches # 0.084 K/sec ( +- 2.21% )
1 cpu-migrations # 0.000 K/sec
11,403 page-faults # 0.004 M/sec ( +- 0.28% )
5,898,481,882 cycles # 1.987 GHz ( +- 0.03% ) (50.05%)
16,169,242,938 instructions # 2.74 insn per cycle ( +- 0.03% ) (50.06%)
3,076,546,626 branches # 1036.443 M/sec ( +- 0.05% ) (50.05%)
2,531,859 branch-misses # 0.08% of all branches ( +- 0.89% ) (50.03%)
```

If I've done the arithmetic correctly, we have a 0.39% instruction count win.

Reviewed By: ezyang

Differential Revision: D25983863

fbshipit-source-id: 87d1451a01ead25738ea6b80db270d344bc583b2

2021-02-01 12:40:08 -08:00
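A minimal Python sketch of the two ideas in this commit, written under stated assumptions. First, a code generator (such as the `cpp.py` module in this directory) has to decide, per C++ type, whether a parameter is spelled by value or as `const T &`; the rule below passes small, trivially copyable types by value. The `PASS_BY_VALUE` set and the `cpp_param_type` helper are hypothetical illustrations, not the actual tables or functions in `tools/codegen/api/cpp.py`. Second, `percent_reduction` reproduces the relative instruction-count arithmetic using the `instructions` counters from the `perf stat` output in the commit message.

```python
# Hypothetical set of C++ types a code generator might pass by value because
# they are small and trivially copyable (no non-trivial copy, move, or
# destructor). Illustrative only; NOT the real table in tools/codegen/api/cpp.py.
PASS_BY_VALUE = {"TensorOptions", "int64_t", "double", "bool"}


def cpp_param_type(cpp_type: str) -> str:
    """Return the C++ parameter spelling for a type name (hypothetical rule).

    Register-sized, trivially copyable types are passed by value, so the callee
    reads them directly instead of loading through a reference (a pointer).
    Everything else is passed by const reference to avoid a copy.
    """
    if cpp_type in PASS_BY_VALUE:
        return cpp_type
    return f"const {cpp_type} &"


def percent_reduction(before: int, after: int) -> float:
    """Relative decrease from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    print(cpp_param_type("TensorOptions"))  # TensorOptions
    print(cpp_param_type("Tensor"))         # const Tensor &
    # "instructions" counters from the baseline and After perf stat runs.
    print(f"{percent_reduction(16_169_242_938, 16_105_928_210):.2f}%")  # 0.39%
```

With these counters the difference is 63,314,728 instructions out of 16,169,242,938, which rounds to the 0.39% figure quoted in the commit message. Passing by value only pays off when the type really is trivially copyable and register-sized; for larger or non-trivial types (for example, a refcounted tensor handle) `const T &` remains the cheaper choice.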
__init__.py
autograd.py	[pytorch][codegen] migrate gen_variable_type to new data model (#49735)	2021-01-05 14:12:39 -08:00
cpp.py	[PyTorch] Pass TensorOptions by value (#51165)	2021-02-01 12:40:08 -08:00
dispatcher.py	Remove codegen logic to support non-c10-full ops (#49164)	2021-01-06 14:17:36 -08:00
meta.py	Introduce tools.codegen.api.translate (#49122)	2020-12-16 16:18:40 -08:00
native.py	[pytorch] fix ConstRefCType usage in codegen/api/native.py (#50742)	2021-01-20 15:01:37 -08:00
python.py	Remove codegen logic to support non-c10-full ops (#49164)	2021-01-06 14:17:36 -08:00
translate.py	Introduce tools.codegen.api.translate (#49122)	2020-12-16 16:18:40 -08:00
types.py	Add at::cpu namespace of functions for structured kernels (#49505)	2021-01-22 13:11:59 -08:00