pytorch/c10
Horace He 416357648c Optimize alias analysis (#20899)
Summary:
# Overall Improvements
1. Switched from `unordered_set` to a sparse bitset.
1. Prevented some excessive memory allocations (thanks to resistor).
1. Took advantage of the sparse bitset operations.
1. Switched from `unordered_map` to `flat_hash_map` in some places.

# Benchmarks (somewhat approximate, best of a couple runs)
1. InceptionNet (load + one forward pass): 19.8 -> 13.3
1. GoogleNet (load + one forward pass): 10.0 -> 7.24
1. DenseNet (load only): 7.3 -> 5.3

The sparse bitset is taken from LLVM's `SparseBitVector` (https://llvm.org/doxygen/SparseBitVector_8h_source.html). I had to modify it to use compiler builtins such as `__builtin_popcountl` instead of pulling in its other transitive clang dependencies.

## Some notes on our graph topologies
In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes, we do 2k `mayAlias` queries, and the sum of the sizes of the sets across all nodes is 500k (i.e., every node, on average, reaches 2.5 leaves).

PS: Holy crap, MacBooks throttle an insane amount with the default fan settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20899

Differential Revision: D15564612

Pulled By: Chillee

fbshipit-source-id: 2a293a21a9be25f942ca888c8f225cab32bbfcd0
2019-05-30 15:37:50 -07:00
core Native ATen/Parallel backend (#20087) 2019-05-28 01:40:54 -07:00
cuda Make CUDACachingAllocator::recordStream() a no-op on null ptrs (#20658) 2019-05-20 07:13:51 -07:00
hip Revert "remove use of tmp_install" (#15847) 2019-01-08 16:30:19 -08:00
macros Lightweight at-most-once logging for API usage (#20745) 2019-05-23 23:17:59 -07:00
test Explicitly define supported types (#19516) 2019-04-22 16:31:28 -07:00
util Optimize alias analysis (#20899) 2019-05-30 15:37:50 -07:00
CMakeLists.txt Move schema inference to c10 (#18090) 2019-03-21 14:57:30 -07:00