pytorch/caffe2/python
Aapo Kyrola 453c60ce28 Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This diff adds dependency-aware concurrent/parallel execution of operators in stepnets. For CPU, we use multi-threaded execution. For CUDA, we use multiple streams and CUDA events for parallelism and dependency tracking.
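
As an illustration of the frontier/diagonal idea only (not the actual RNNExecutor code), here is a minimal Python sketch. It assumes a hypothetical `deps` mapping from `(timestep, op_index)` nodes to the nodes they must wait for, and it runs each set of ready nodes on a thread pool. The names `run_in_frontiers` and `run_op` are made up for this example, and waiting for a whole frontier before starting the next is a simplification.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_frontiers(deps, run_op, num_threads=4):
    """Run every node in `deps`, one frontier (set of ready nodes) at a time.

    deps:   dict mapping node -> set of nodes that must finish first.
    run_op: callable invoked once per node (hypothetical stand-in for
            running one operator of one timestep).
    Returns the list of frontiers in execution order.
    """
    done = set()
    frontiers = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        while len(done) < len(deps):
            # A node is ready when all of its dependencies have finished.
            frontier = sorted(
                n for n, d in deps.items() if n not in done and d <= done
            )
            if not frontier:
                raise ValueError("dependency cycle or unknown dependency")
            # Nodes of one frontier run concurrently; list() waits for all.
            list(pool.map(run_op, frontier))
            done.update(frontier)
            frontiers.append(frontier)
    return frontiers


# Toy example: 2 ops per timestep, 3 timesteps. Op 0 waits for op 0 of the
# previous timestep; op 1 waits for op 0 of the same timestep and op 1 of
# the previous one (like a two-layer recurrent stack).
deps = {
    (0, 0): set(),
    (0, 1): {(0, 0)},
    (1, 0): {(0, 0)},
    (1, 1): {(1, 0), (0, 1)},
    (2, 0): {(1, 0)},
    (2, 1): {(2, 0), (1, 1)},
}
executed = []
frontiers = run_in_frontiers(deps, executed.append)
for frontier in frontiers:
    print(frontier)
```

In this toy graph the frontiers fall along anti-diagonals of the (timestep, op) grid, e.g. `(0, 1)` and `(1, 0)` run together, which is one way to read "diagonal execution".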

Much of the diff is about computing the dependency graph, which was quite tricky because we also need to avoid write races between operators running in multiple timesteps in parallel. Also, recurrent blobs "change name" when passing from one timestep to the next ("_prev"), so that needs to be handled as well.
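
A rough Python sketch of this kind of dependency computation follows. It is not the code in this diff: the function name `build_dependencies`, the `recurrent_alias` mapping (e.g. `"h_prev"` to `"h"`), and the `shared_blobs` set are all assumptions made for illustration. It tracks the last writer and the readers of each blob, so that both read-after-write and write-after-read/write hazards across timesteps become edges.

```python
from collections import defaultdict


def build_dependencies(ops, num_timesteps, recurrent_alias, shared_blobs=()):
    """Return {(t, op_index): set of nodes that must run first}.

    ops:             list of (inputs, outputs) blob-name lists, in net order.
    recurrent_alias: hypothetical map from a "_prev" input name to the blob
                     that holds that state in the previous timestep.
    shared_blobs:    blobs assumed to use the same storage in every timestep.
    """
    deps = defaultdict(set)
    last_writer = {}
    readers_since_write = defaultdict(list)

    def storage(t, blob):
        if blob in shared_blobs:
            return ("shared", blob)
        if blob in recurrent_alias:
            # "h_prev" at timestep t is "h" of timestep t - 1
            # (t - 1 == -1 stands for the initial state).
            return (t - 1, recurrent_alias[blob])
        return (t, blob)

    for t in range(num_timesteps):
        for i, (inputs, outputs) in enumerate(ops):
            node = (t, i)
            deps[node]  # make sure every node has an entry
            for blob in inputs:
                key = storage(t, blob)
                if key in last_writer:
                    deps[node].add(last_writer[key])  # read after write
                readers_since_write[key].append(node)
            for blob in outputs:
                key = storage(t, blob)
                if key in last_writer:
                    deps[node].add(last_writer[key])  # write after write
                deps[node].update(readers_since_write[key])  # write after read
                last_writer[key] = node
                readers_since_write[key] = []
            deps[node].discard(node)  # in-place ops do not wait on themselves
    return dict(deps)


# Toy step net: op 0 computes "gates" from "x" and "h_prev"; op 1 computes "h".
ops = [(["x", "h_prev"], ["gates"]), (["gates"], ["h"])]
alias = {"h_prev": "h"}
private = build_dependencies(ops, 2, alias)
shared = build_dependencies(ops, 2, alias, shared_blobs={"gates"})
print(private[(1, 0)])
print(shared[(1, 0)])
```

With per-timestep storage, op 0 of timestep 1 only waits for the op that produced `h` in timestep 0. If `gates` is assumed to be shared across timesteps, it must also wait for the previous writer and reader of `gates`, which is the write-race case.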

This diff also restores the link-ops that I unlanded earlier.

The performance gain of this diff is very good for CPU (same perf as with static_dag, even better on forward-only). On CUDA, the gains are modest, at least with the sizes I was testing with.

Reviewed By: salexspb

Differential Revision: D5001637

fbshipit-source-id: 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8
2017-08-15 23:55:15 -07:00
..
docs Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
examples Important typo in resnet50_trainer 2017-08-15 19:03:15 -07:00
helpers Add TensorCore support 2017-08-10 20:16:48 -07:00
layers Make reservoir sampling thread safe 2017-08-10 15:27:21 -07:00
mint
mkl Support grouped convolutions in MKL 2017-07-25 14:19:02 -07:00
modeling allow param_info to set optimizer 2017-07-12 08:49:48 -07:00
models rectify args btw. train and translate 2017-08-10 15:27:18 -07:00
operator_test Threaded dependency-aware RNNExecutor (frontier/diagonal execution). 2017-08-15 23:55:15 -07:00
predictor Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
rnn Revert D5589309: modify _LSTM into _RNN to adapt GRU 2017-08-10 16:42:41 -07:00
_import_c_extension.py
allcompare_test.py Adding AllCompare-like function to data_parallel_model 2017-07-13 13:03:57 -07:00
attention.py Reduce memory usage for dot attention 2017-08-14 12:35:50 -07:00
binarysize.py binary size util 2017-07-14 17:49:24 -07:00
brew.py Adding tanh to brew 2017-07-11 18:17:52 -07:00
brew_test.py Allow passing unsymmetric 2d kernels to brew.conv. 2017-08-10 15:27:16 -07:00
caffe_translator.py Read pretrained weights using binary mode in caffe_translator.py 2017-07-08 10:17:57 -07:00
caffe_translator_test.py
checkpoint.py Temporarily disables the checkpoints for the readers. 2017-08-15 19:36:11 -07:00
checkpoint_test.py Fixes the flaky upload test 2017-08-11 18:58:24 -07:00
CMakeLists.txt
cnn.py
context.py
context_test.py
control.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
control_test.py
convnet_benchmarks.py brew API in convnet benchmark 2017-07-05 10:34:48 -07:00
convnet_benchmarks_test.py
core.py Add support for specifying engine preferences 2017-08-09 00:47:18 -07:00
core_gradients_test.py warn about orphan StopGradient output 2017-07-20 21:41:41 -07:00
core_test.py quick fix inplace blob bug 2017-07-23 02:18:16 -07:00
crf.py Deprecate CNNModelHelper in python/crf.py 2017-06-14 08:49:27 -07:00
data_parallel_model.py DataParallelModel: take param_init_net into account in _InferBlobDevice 2017-08-15 12:06:46 -07:00
data_parallel_model_test.py disable travis test for dpm test 2017-08-15 19:17:41 -07:00
data_workers.py Caffe2: Refactor the core logic from data_workers.py into parallel_workers.py 2017-08-07 10:14:08 -07:00
data_workers_test.py Caffe2: Refactor the core logic from data_workers.py into parallel_workers.py 2017-08-07 10:14:08 -07:00
dataio.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
dataio_test.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
dataset.py Option to enforce batch size 2017-08-01 22:29:55 -07:00
db_test.py
device_checker.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
dyndep.py
embedding_generation_benchmark.py Benchmark for embedding generation 2017-08-15 14:22:41 -07:00
empty.so
experiment_util.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
extension_loader.py
gradient_check_test.py Cos, Sin, and Abs operators 2017-07-03 22:18:32 -07:00
gradient_checker.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
gru_cell.py Revert D5589309: modify _LSTM into _RNN to adapt GRU 2017-08-10 16:42:41 -07:00
hsm_util.py
hypothesis_test.py Implement gradients for Col2Im and Im2Col operators 2017-08-07 15:51:30 -07:00
hypothesis_test_util.py Add CUDA implementation of BooleanUnmask and fixed some bugs in the test 2017-08-01 16:51:40 -07:00
layer_model_helper.py Adding parameter sharing API to Dper2 2017-08-03 00:33:18 -07:00
layer_model_instantiator.py saving/loading CPU/GPU nets 2017-07-23 02:18:15 -07:00
layer_parameter_sharing_test.py Adding parameter sharing API to Dper2 2017-08-03 00:33:18 -07:00
layer_test_util.py Add a method to run a train net multiple times in layer_test_util.py 2017-07-28 19:56:05 -07:00
layers_test.py Add conv layer and layer tests 2017-08-08 10:57:43 -07:00
load_save_test.py
lstm_benchmark.py Threaded dependency-aware RNNExecutor (frontier/diagonal execution). 2017-08-15 23:55:15 -07:00
memonger.py check _grad suffix 2017-08-14 19:47:59 -07:00
memonger_test.py fix for duplicate input case 2017-07-13 01:51:30 -07:00
mkl_test_util.py Implement a filler op test 2017-07-25 14:18:57 -07:00
model_device_test.py Deprecate CNNModelHelper in caffe2/python/model_device_test.py 2017-06-22 15:37:17 -07:00
model_helper.py ExtractPredictorNet should strip gpu_id prefix from step_net 2017-07-27 16:06:47 -07:00
mpi_python.cc
muji.py
muji_test.py
net_builder.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
net_builder_test.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
net_drawer.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
net_printer.py net_printer.to_string() accepts NetDef 2017-08-01 10:17:29 -07:00
net_printer_test.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
optimizer.py Populate learning rate blob name into data_parallel_model and fix resnet50_trainer example. 2017-07-21 16:24:10 -07:00
optimizer_context.py allow param_info to set optimizer 2017-07-12 08:49:48 -07:00
optimizer_test.py Always use assertAlmostEqual for floats when crossing python and C boundaries 2017-08-06 14:51:11 -07:00
optimizer_test_util.py
parallel_workers.py Caffe2 [easy]: Better exception logging in parallel_workers/data_workers 2017-08-10 15:27:19 -07:00
parallel_workers_test.py Caffe2: Refactor the core logic from data_workers.py into parallel_workers.py 2017-08-07 10:14:08 -07:00
parallelize_gpu_bmuf_distributed_test.py Added Nesterov 2017-08-11 13:52:43 -07:00
pipeline.py Enable runtime cloning of tasks. 2017-06-21 03:18:20 -07:00
predictor_constants.py
pybind_state.cc Add support for specifying engine preferences 2017-08-09 00:47:18 -07:00
pybind_state.h fast simple-net memonger for C++ 2017-07-06 15:17:07 -07:00
pybind_state_gpu.cc
pybind_state_mkl.cc MKL code move 2017-07-26 20:21:55 -07:00
python_op_test.py Fix some typos 2017-06-28 13:50:48 -07:00
queue_util.py
record_queue.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
recurrent.py Threaded dependency-aware RNNExecutor (frontier/diagonal execution). 2017-08-15 23:55:15 -07:00
rnn_cell.py Revert D5589309: modify _LSTM into _RNN to adapt GRU 2017-08-10 16:42:41 -07:00
schema.py Return empty Struct when get_field has empty input 2017-08-01 19:49:47 -07:00
schema_test.py Return empty Struct when get_field has empty input 2017-08-01 19:49:47 -07:00
scope.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
scope_test.py
session.py Allow tasks/execution_steps to be cloned at runtime 2017-06-20 22:32:07 -07:00
session_test.py
sparse_to_dense_mask_test.py
task.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
test_util.py
text_file_reader.py
timeout_guard.py Dict fixes/improvements and unittest targets for Python 3 in caffe2 core 2017-06-29 17:05:41 -07:00
toy_regression_test.py
tt_core.py Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
tt_core_test.py
utils.py include numpy's other 32bit int type 2017-08-01 13:53:11 -07:00
visualize.py Python 3 compatible integer division 2017-07-06 11:47:12 -07:00
workspace.py Transforms in Python 2017-08-01 16:51:38 -07:00
workspace_test.py Transforms in Python 2017-08-01 16:51:38 -07:00