pytorch/caffe2
Aapo Kyrola 631971e459 threaded RNN executor for CPU, multi-stream executor CUDA
Summary:
Special executor for RNNs which can exploit parallelism over timesteps. For CPU we use multi-threading, achieving roughly a 3x improvement on 4-layer LSTMs.
With CUDA, perf improvements are more modest, but the structure allows for optimizing it further. For CUDA, we use multiple streams and events if there is parallelism
over timesteps. In my experiments, using more than 2 streams was not beneficial, though.
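
To illustrate where parallelism over timesteps can come from in a stacked LSTM, here is a minimal sketch. It assumes the usual dependency structure, where the cell at (layer, timestep) needs only the outputs of (layer - 1, timestep) and (layer, timestep - 1). The function name `wavefronts` is hypothetical and is not part of Caffe2; the actual executor may schedule work differently.

```python
def wavefronts(num_layers, num_timesteps):
    """Group (layer, timestep) cells into waves of mutually independent cells.

    Assumption: cell (l, t) depends only on (l - 1, t) and (l, t - 1), so all
    cells with the same l + t can run concurrently once the previous wave is done.
    """
    waves = []
    for k in range(num_layers + num_timesteps - 1):
        wave = [
            (layer, k - layer)
            for layer in range(num_layers)
            if 0 <= k - layer < num_timesteps
        ]
        waves.append(wave)
    return waves


if __name__ == "__main__":
    for step, wave in enumerate(wavefronts(num_layers=4, num_timesteps=6)):
        print(step, wave)
```

Under this assumption, a 4-layer network over T timesteps needs 4 + T - 1 sequential waves instead of 4 * T sequential cells, and at most 4 cells are ready at once, which bounds the speedup that threads or streams can extract.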

Flag --caffe2_rnn_executor can be used to switch the executor off.

Reviewed By: salexspb

Differential Revision: D5749304

fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
2017-09-06 12:26:30 -07:00
..
binaries Update the speed benchmark code 2017-09-01 23:16:39 -07:00
contrib Enable mpscnn only for 10.2 and above 2017-09-06 11:02:25 -07:00
core threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
cuda_rtc comment out unused parameters 2017-07-21 15:14:43 -07:00
db Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
distributed Fix build 2017-08-07 15:34:49 -07:00
experiments Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
image ImageInputOp_more_data_augmentation 2017-08-19 14:15:58 -07:00
mkl Fix more MKL build issues 2017-08-25 14:01:01 -07:00
mpi Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
operators threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
perfkernels Code generator for and high-performance embedding look-up kernels, supporting 2017-08-30 16:22:11 -07:00
proto Update proto definition 2017-08-22 19:01:18 -07:00
python threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
queue Ability to dequeue and concat multiple records in a single QueueDequeue op 2017-08-31 10:48:59 -07:00
sgd Replaced std::copysign(x) with (x > 0 ? 1 : -1) 2017-09-01 11:52:44 -07:00
test
transforms Common Subexpression Elimination 2017-08-18 16:31:48 -07:00
utils threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
video comment out unused parameters 2017-07-21 15:14:43 -07:00
CMakeLists.txt cmake: generate macros.h with configure_file() 2017-08-22 14:22:36 -07:00