pytorch/c10
Horace He 416357648c Optimize alias analysis (#20899)
Summary:
# Overall Improvements
1. Switched from `unordered_set` to a sparse bitset.
1. Prevented some excessive memory allocations (thanks to resistor).
1. Took advantage of the sparse bitset operations.
1. Switched from `unordered_map` to `flat_hash_map` in some places.

# Benchmarks (somewhat approximate, best of a couple runs)
1. InceptionNet (load + one forward pass): 19.8 -> 13.3
1. GoogleNet (load + one forward pass): 10.0 -> 7.24
1. DenseNet (load only): 7.3 -> 5.3

The sparse bitset is taken from LLVM's `SparseBitVector` (https://llvm.org/doxygen/SparseBitVector_8h_source.html). I had to modify it to use compiler builtins such as `__builtin_popcountl` instead of pulling in its other transitive clang dependencies.

## Some notes on our graph topologies
In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes, we do 2k `mayAlias` queries, and the sum of the sizes of the sets across all nodes is 500k (i.e., every node, on average, reaches 2.5 leaves).

PS: Holy crap, MacBooks throttle an insane amount with the default fan settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20899

Differential Revision: D15564612

Pulled By: Chillee

fbshipit-source-id: 2a293a21a9be25f942ca888c8f225cab32bbfcd0
2019-05-30 15:37:50 -07:00
core Native ATen/Parallel backend (#20087) 2019-05-28 01:40:54 -07:00
cuda Make CUDACachingAllocator::recordStream() a no-op on null ptrs (#20658) 2019-05-20 07:13:51 -07:00
hip Revert "remove use of tmp_install" (#15847) 2019-01-08 16:30:19 -08:00
macros Lightweight at-most-once logging for API usage (#20745) 2019-05-23 23:17:59 -07:00
test Explicitly define supported types (#19516) 2019-04-22 16:31:28 -07:00
util Optimize alias analysis (#20899) 2019-05-30 15:37:50 -07:00
CMakeLists.txt Move schema inference to c10 (#18090) 2019-03-21 14:57:30 -07:00