pytorch/caffe2/operators
Aapo Kyrola 631971e459 threaded RNN executor for CPU, multi-stream executor CUDA
Summary:
Special executor for RNNs that can exploit parallelism over timesteps. On CPU we use multi-threading, achieving roughly a 3x improvement on 4-layer LSTMs.
With CUDA, the perf improvements are more modest, but the structure allows for optimizing it further. For CUDA, we use multiple streams and events if there is parallelism
over timesteps. In my experiments, though, using more than 2 streams was not beneficial.
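
To illustrate where parallelism over timesteps can come from in a stacked RNN, here is a minimal Python sketch. It is not taken from this commit: the names `toy_cell`, `run_sequential` and `run_wavefront` are hypothetical, the cell is a stand-in for a real LSTM cell, and the diagonal "wavefront" schedule is only one common way to order the work, assumed here for illustration. The idea is that cell (layer l, timestep t) needs only the outputs of (l-1, t) and (l, t-1), so all cells with the same l + t are independent of each other and can be handed to different threads.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_cell(below, prev):
    # Stand-in for an LSTM cell (assumption): any pure function of the
    # input from the layer below and the previous timestep's state works.
    return 0.5 * below + 0.5 * prev + 1.0


def run_sequential(inputs, num_layers):
    # Baseline: visit every (layer, timestep) cell one after another.
    T = len(inputs)
    h = [[0.0] * T for _ in range(num_layers)]
    for t in range(T):
        for l in range(num_layers):
            below = inputs[t] if l == 0 else h[l - 1][t]
            prev = h[l][t - 1] if t > 0 else 0.0
            h[l][t] = toy_cell(below, prev)
    return h


def run_wavefront(inputs, num_layers, num_threads=4):
    # Same computation, but cells on one anti-diagonal (l + t == d) are
    # submitted to a thread pool together, since none depends on another.
    T = len(inputs)
    h = [[0.0] * T for _ in range(num_layers)]

    def compute(cell):
        l, t = cell
        below = inputs[t] if l == 0 else h[l - 1][t]
        prev = h[l][t - 1] if t > 0 else 0.0
        h[l][t] = toy_cell(below, prev)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for d in range(num_layers + T - 1):
            diagonal = [(l, d - l) for l in range(num_layers) if 0 <= d - l < T]
            # list() waits for the whole diagonal before starting the next one.
            list(pool.map(compute, diagonal))
    return h
```

With 4 layers and T timesteps, the wavefront version needs 4 + T - 1 dependent steps instead of 4 * T, which is where the room for a speedup comes from. In pure Python the threads will not actually run faster because of the interpreter lock; the sketch only shows the ordering, and both functions return identical results.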

The flag --caffe2_rnn_executor can be used to switch the executor off.

Reviewed By: salexspb

Differential Revision: D5749304

fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
2017-09-06 12:26:30 -07:00
..
abs_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
abs_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
accumulate_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
accumulate_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
accumulate_op.h
accuracy_op.cc change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
accuracy_op.cu change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
accuracy_op.h
apmeter_op.cc Implement APMeter op 2017-06-07 15:03:04 -07:00
apmeter_op.h Implement APMeter op 2017-06-07 15:03:04 -07:00
atomic_ops.cc Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
batch_box_cox_op.cc add box cox transform op 2017-04-27 22:06:43 -07:00
batch_box_cox_op.h add box cox transform op 2017-04-27 22:06:43 -07:00
batch_gather_ops.cc GPU version of BatchGatherOp 2017-08-17 18:31:10 -07:00
batch_gather_ops.cu GPU version of BatchGatherOp 2017-08-17 18:31:10 -07:00
batch_gather_ops.h BatchGatherOp 2017-07-27 10:17:42 -07:00
batch_matmul_op.cc shape inference for batchmatmul 2017-08-28 18:31:55 -07:00
batch_matmul_op.cu
batch_matmul_op.h
boolean_mask_ops.cc Added window mode for caffe2 sequence operator 2017-08-16 21:34:29 -07:00
boolean_mask_ops.cu Added window mode for caffe2 sequence operator 2017-08-16 21:34:29 -07:00
boolean_mask_ops.h Added window mode for caffe2 sequence operator 2017-08-16 21:34:29 -07:00
boolean_unmask_ops.cc Add CUDA implementation of BooleanUnmask and fixed some bugs in the test 2017-08-01 16:51:40 -07:00
boolean_unmask_ops.cu Add CUDA implementation of BooleanUnmask and fixed some bugs in the test 2017-08-01 16:51:40 -07:00
boolean_unmask_ops.h Add CUDA implementation of BooleanUnmask and fixed some bugs in the test 2017-08-01 16:51:40 -07:00
boolean_unmask_ops_test.cc Fix a bug in BooleanUnmaskOp 2017-06-20 08:34:09 -07:00
cast_op.cc Use cast::GetCastDataType to handle "from_type" and "to" arguments 2017-08-23 10:18:01 -07:00
cast_op.cu Support fp16 output from ImageInputOp 2017-04-28 14:50:47 -07:00
cast_op.h Nuke arg_helper() in OperatorBase 2017-07-19 13:52:39 -07:00
channel_shuffle_op.cc Opensourcing channel shuffle 2017-08-25 16:46:31 -07:00
channel_shuffle_op.h Opensourcing channel shuffle 2017-08-25 16:46:31 -07:00
channel_shuffle_op_gpu.cu Opensourcing channel shuffle 2017-08-25 16:46:31 -07:00
clip_op.cc change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
clip_op.cu change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
clip_op.h fix clip_op bug 2017-06-23 22:31:54 -07:00
CMakeLists.txt
communicator_op.cc DestroyCommonWorld op 2017-08-25 14:01:01 -07:00
communicator_op_gpu.cc
concat_split_op.cc Set default values for concat_split_op 2017-09-05 17:02:22 -07:00
concat_split_op.h Set default values for concat_split_op 2017-09-05 17:02:22 -07:00
concat_split_op_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
conditional_op.cc Use char-ngram embedding for out-of-vocabulary words 2017-09-01 19:16:49 -07:00
conditional_op.h Use char-ngram embedding for out-of-vocabulary words 2017-09-01 19:16:49 -07:00
conv_gradient_op.cc Adding 1d-2d-3d Schemas for Conv and Pool 2017-08-17 09:45:54 -07:00
conv_op.cc Accidental addition of a file 2017-08-31 20:17:12 -07:00
conv_op.h Per-workspace mutex for shared im2col buffer 2017-06-29 10:19:37 -07:00
conv_op_cache_cudnn.cc bindings 2017-07-21 19:03:43 -07:00
conv_op_cache_cudnn.h bindings 2017-07-21 19:03:43 -07:00
conv_op_cache_cudnn_test.cc
conv_op_cudnn.cc Adding 1d-2d-3d Schemas for Conv and Pool 2017-08-17 09:45:54 -07:00
conv_op_eigen.cc Adding 1d-2d-3d Schemas for Conv and Pool 2017-08-17 09:45:54 -07:00
conv_op_gpu.cc Adding 1d-2d-3d Schemas for Conv and Pool 2017-08-17 09:45:54 -07:00
conv_op_impl.h Per-workspace mutex for shared im2col buffer 2017-06-29 10:19:37 -07:00
conv_op_shared.cc Per-workspace mutex for shared im2col buffer 2017-06-29 10:19:37 -07:00
conv_op_shared.h Per-workspace mutex for shared im2col buffer 2017-06-29 10:19:37 -07:00
conv_op_shared_gpu.cc Per-workspace mutex for shared im2col buffer 2017-06-29 10:19:37 -07:00
conv_pool_op_base.h add conv flops inference 2017-08-31 14:18:21 -07:00
conv_transpose_gradient_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
conv_transpose_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
conv_transpose_op.h
conv_transpose_op_cudnn.cc Support new arguments in ConvTranspose 2017-08-31 11:17:32 -07:00
conv_transpose_op_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
conv_transpose_op_impl.h Support new arguments in ConvTranspose 2017-08-31 11:17:32 -07:00
conv_transpose_op_mobile.cc
conv_transpose_op_mobile.h Support new arguments in ConvTranspose 2017-08-31 11:17:32 -07:00
conv_transpose_op_mobile_impl.h Support new arguments in ConvTranspose 2017-08-31 11:17:32 -07:00
conv_transpose_op_mobile_test.cc Sync of codebases 2017-08-06 11:27:06 -07:00
conv_transpose_unpool_op_base.h Support new arguments in ConvTranspose 2017-08-31 11:17:32 -07:00
cos_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
cos_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
cosine_embedding_criterion_op.cc
cosine_embedding_criterion_op.cu
cosine_embedding_criterion_op.h
counter_ops.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
counter_ops.h
counter_ops_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
cross_entropy_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
cross_entropy_op.cu Fixed cuda loss op 2017-08-30 17:02:23 -07:00
cross_entropy_op.h
dataset_ops.cc Option to enforce batch size 2017-08-01 22:29:55 -07:00
dataset_ops.h Sampling random negative based on sparse features 2017-08-10 15:27:18 -07:00
distance_op.cc Cannot divide on 0 2017-08-17 17:50:36 -07:00
distance_op.cu Cannot divide on 0 2017-08-17 17:50:36 -07:00
distance_op.h CosineSimilarity GPU 2017-07-25 13:34:01 -07:00
do_op.cc Control flow operators 2017-08-28 20:04:43 -07:00
do_op.h Control flow operators 2017-08-28 20:04:43 -07:00
dropout_op.cc shape inference for ReduceFront/Back/Sum/Mean, Gather and Dropout 2017-08-25 11:31:17 -07:00
dropout_op.cu change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
dropout_op.h change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
dropout_op_cudnn.cc protect cudnnSetDropoutDescriptor with mutex 2017-08-31 14:56:07 -07:00
elementwise_add_op.cc
elementwise_div_op.cc Add caffe2 operators to mobile: Log, StumpFunc, Div, Sub 2017-05-03 15:10:34 -07:00
elementwise_linear_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
elementwise_linear_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
elementwise_linear_op.h Optimize memory usage for MI-LSTM 2017-05-10 16:53:43 -07:00
elementwise_logical_ops.cc
elementwise_logical_ops.h Add boolean type in input2 and input3 for caffe2: Where operator 2017-08-03 13:17:06 -07:00
elementwise_mul_op.cc
elementwise_op.cc Add caffe2 operators to mobile: Log, StumpFunc, Div, Sub 2017-05-03 15:10:34 -07:00
elementwise_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
elementwise_op.h change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
elementwise_op_gpu_test.cc
elementwise_op_schema.cc Nuke arg_helper() in OperatorBase 2017-07-19 13:52:39 -07:00
elementwise_op_test.cc
elementwise_op_test.h
elementwise_sub_op.cc Add caffe2 operators to mobile: Log, StumpFunc, Div, Sub 2017-05-03 15:10:34 -07:00
elementwise_sum_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
elu_op.cc Vectorize ELU op on CPU 2017-08-10 21:52:49 -07:00
elu_op.cu ELU CUDA implementation 2017-06-21 11:47:13 -07:00
elu_op.h
exp_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
exp_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
extend_tensor_op.cc
feed_blob_op.cc
feed_blob_op.h
filler_op.cc Caffe2: diagonal fill op 2017-08-16 13:05:11 -07:00
filler_op.cu Caffe2: diagonal fill op 2017-08-16 13:05:11 -07:00
filler_op.h Caffe2: diagonal fill op 2017-08-16 13:05:11 -07:00
find_duplicate_elements_op.cc
find_duplicate_elements_op.h
find_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
find_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
find_op.h
free_op.cc
free_op.h
free_op_gpu.cc
fully_connected_op.cc fix FC shape inference 2017-08-28 16:08:07 -07:00
fully_connected_op.h Make FC op work with empty batch in cuda 2017-08-24 18:52:04 -07:00
fully_connected_op_gpu.cc Add TensorCore support 2017-08-10 20:16:48 -07:00
fully_connected_op_gpu_test.cc
fully_connected_op_test.cc
given_tensor_fill_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
given_tensor_fill_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
given_tensor_fill_op.h
gru_unit_op.cc Implement CUDA version of GRU operator 2017-08-08 10:57:40 -07:00
gru_unit_op.h Implement CUDA version of GRU operator 2017-08-08 10:57:40 -07:00
gru_unit_op_gpu.cu Implement CUDA version of GRU operator 2017-08-08 10:57:40 -07:00
h_softmax_op.cc
h_softmax_op.h
half_float_ops.cc Improve float16 support 2017-08-23 16:33:07 -07:00
half_float_ops.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
half_float_ops.h Improve float16 support 2017-08-23 16:33:07 -07:00
if_op.cc Control flow operators 2017-08-28 20:04:43 -07:00
if_op.h Control flow operators 2017-08-28 20:04:43 -07:00
im2col_op.cc Implement gradients for Col2Im and Im2Col operators 2017-08-07 15:51:30 -07:00
im2col_op.h
im2col_op_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
index_hash_ops.cc IndexHash 2017-07-07 23:06:11 -07:00
index_hash_ops.h IndexHash 2017-07-07 23:06:11 -07:00
index_ops.cc
instance_norm_gradient_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
instance_norm_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
instance_norm_op.cu
instance_norm_op.h
last_n_window_collector.cc Generalize LastNWindowCollector 2017-05-04 16:05:15 -07:00
layer_norm_op.cc IMplement layer normalization backward CPU 2017-08-17 11:17:46 -07:00
layer_norm_op.cu Implement layer norm gradient GPU 2017-08-17 11:17:46 -07:00
layer_norm_op.h Implement layer norm gradient GPU 2017-08-17 11:17:46 -07:00
leaky_relu_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
leaky_relu_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
leaky_relu_op.h fbcode nnpack ops for Relu and LeakyRelu 2017-06-19 12:36:32 -07:00
lengths_reducer_ops.cc Fix SparseLengthSum undeclared schema 2017-07-25 18:19:10 -07:00
lengths_reducer_ops.h Scaffolding for perfkernels dispatch of embedding lookup 2017-07-30 12:34:23 -07:00
lengths_reducer_rowwise_8bit_ops.cc Rowwise quantization 2017-09-06 10:19:38 -07:00
lengths_reducer_rowwise_8bit_ops.h Rowwise quantization 2017-09-06 10:19:38 -07:00
lengths_tile_op.cc
lengths_tile_op.h
lengths_top_k_op.cc fix Windows build breaks by LengthsTopKOp 2017-08-08 18:06:24 -07:00
lengths_top_k_op.h implement LengthsTopK operator 2017-08-07 18:19:29 -07:00
load_save_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
load_save_op.h Added functionality that allows users to store huge blobs 2017-08-02 16:08:09 -07:00
load_save_op_gpu.cc Do CaffeCudaSetDevice and CaffeCudaGetDevice 2017-08-25 18:20:14 -07:00
local_response_normalization_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
local_response_normalization_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
local_response_normalization_op.h Change default argument for LRN 2017-08-30 10:51:19 -07:00
local_response_normalization_op_cudnn.cc New cudnn ops 2017-05-08 16:33:21 -07:00
log_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
log_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
logit_op.cc
loss_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
loss_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
loss_op.h
lp_pool_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
lp_pool_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
lpnorm_op.cc adding operator lp_norm to support calculating l1 norm and l2 norm 2017-08-02 15:09:08 -07:00
lpnorm_op.h adding operator lp_norm to support calculating l1 norm and l2 norm 2017-08-02 15:09:08 -07:00
lstm_unit_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
lstm_unit_op.h comment out unused parameters 2017-07-21 15:14:43 -07:00
lstm_unit_op_gpu.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
map_ops.cc CreateMapOp 2017-08-09 13:32:19 -07:00
map_ops.h CreateMapOp 2017-08-09 13:32:19 -07:00
margin_ranking_criterion_op.cc
margin_ranking_criterion_op.cu
margin_ranking_criterion_op.h
math_ops.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
math_ops.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
math_ops.h
matmul_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
matmul_op.h
matmul_op_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
max_pool_with_index.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
max_pool_with_index.h New cudnn ops 2017-05-08 16:33:21 -07:00
mem_query_op.cu
merge_id_lists_op.cc Operator to Merge ID_LIST features 2017-08-17 01:16:00 -07:00
merge_id_lists_op.h Operator to Merge ID_LIST features 2017-08-17 01:16:00 -07:00
multi_class_accuracy_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
multi_class_accuracy_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
multi_class_accuracy_op.h
negative_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
negative_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
no_default_engine_op.h Rename def() to debug_def() 2017-07-17 23:50:01 -07:00
normalize_op.cc add axis argument to NormalizeOp and NormalizeGradientOp 2017-09-05 11:17:32 -07:00
normalize_op.cu add axis argument to NormalizeOp and NormalizeGradientOp 2017-09-05 11:17:32 -07:00
normalize_op.h add axis argument to NormalizeOp and NormalizeGradientOp 2017-09-05 11:17:32 -07:00
one_hot_ops.cc Caffe2: Write CUDA version of OneHot operator 2017-08-08 18:17:39 -07:00
one_hot_ops.cu Caffe2: Write CUDA version of OneHot operator 2017-08-08 18:17:39 -07:00
one_hot_ops.h Caffe2: Write CUDA version of OneHot operator 2017-08-08 18:17:39 -07:00
operator_fallback_gpu.h Rename def() to debug_def() 2017-07-17 23:50:01 -07:00
operator_fallback_gpu_test.cc
order_switch_ops.cc add dimension check to NHWC2NCHW shape inference 2017-07-27 09:54:44 -07:00
order_switch_ops.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
order_switch_ops.h
pack_rnn_sequence_op.cc added PackRNNSequence and UnpackRNNSequence operators 2017-06-30 09:53:31 -07:00
pack_rnn_sequence_op.h added PackRNNSequence and UnpackRNNSequence operators 2017-06-30 09:53:31 -07:00
pack_segments.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
pack_segments.h
pack_segments_op_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
pad_op.cc
pad_op.h
pad_op_gpu.cu
partition_ops.cc
partition_ops.h
perplexity_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
perplexity_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
perplexity_op.h
piecewise_linear_transform_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
piecewise_linear_transform_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
piecewise_linear_transform_op.h Added PiecewiseLinearTransform CUDA Op 2017-07-07 15:20:00 -07:00
pool_gradient_op.cc Adding 1d-2d-3d Schemas for Conv and Pool 2017-08-17 09:45:54 -07:00
pool_op.cc Adding 1d-2d-3d Schemas for Conv and Pool 2017-08-17 09:45:54 -07:00
pool_op.cu Adding 1d-2d-3d Schemas for Conv and Pool 2017-08-17 09:45:54 -07:00
pool_op.h
pool_op_cudnn.cu Added fast path for CUDNN global max pooling 2017-08-23 16:33:06 -07:00
prefetch_op.h Make Context::FinishDeviceComputation throw instead of FATAL 2017-07-31 00:05:10 -07:00
prelu_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
prelu_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
prelu_op.h
prepend_dim_op.cc PrependDimOp 2017-08-24 18:52:05 -07:00
prepend_dim_op.h PrependDimOp 2017-08-24 18:52:05 -07:00
prepend_dim_op_gpu.cc PrependDimOp 2017-08-24 18:52:05 -07:00
rank_loss_op.cc improve pair_wise_loss operator to support multiple sessions 2017-07-28 15:12:47 -07:00
rank_loss_op.h improve pair_wise_loss operator to support multiple sessions 2017-07-28 15:12:47 -07:00
recurrent_network_blob_fetcher_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
recurrent_network_blob_fetcher_op.h RNN Workspace Blob Extraction 2017-07-17 10:24:18 -07:00
recurrent_network_blob_fetcher_op_gpu.cc Fixed error when compiling with clang 2017-07-17 12:52:39 -07:00
recurrent_network_executor.cc threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_network_executor.h threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_network_executor_gpu.cc threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_network_executor_gpu.h threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_network_executor_incl.h threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_network_op.cc threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_network_op.h threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_network_op_gpu.cc threaded RNN executor for CPU, multi-stream executor CUDA 2017-09-06 12:26:30 -07:00
recurrent_op_cudnn.cc Fix build 2017-08-07 15:34:49 -07:00
recurrent_op_cudnn.h
reducer_functors.h shape inference for ReduceFront/Back/Sum/Mean, Gather and Dropout 2017-08-25 11:31:17 -07:00
reduction_ops.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
reduction_ops.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
reduction_ops.h Use the same schema of switching to device reduce sum for SumSqrElements 2017-07-05 10:52:17 -07:00
relu_op.cc change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
relu_op.cu change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
relu_op.h
relu_op_cudnn.cc
relu_op_fp16.cu change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
remove_data_blocks_op.cc
remove_data_blocks_op.h
replace_nan_op.cc
replace_nan_op.h
reservoir_sampling.cc Make reservoir sampling thread safe 2017-08-10 15:27:21 -07:00
reshape_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
reshape_op.h
reshape_op_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
reshape_op_gpu_test.cc
resize_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
resize_op.cu fix #983 by remove unsupported archs 2017-07-31 18:38:59 -07:00
resize_op.h added gradients for ResizeNearest (CPU + CUDA) and ref 2017-07-07 14:19:42 -07:00
reverse_packed_segs_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
reverse_packed_segs_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
reverse_packed_segs_op.h
roi_pool_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
roi_pool_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
roi_pool_op.h
rowmul_op.cc
rowmul_op.h Add row-wise broadcasting to "Where" operator 2017-04-27 12:31:54 -07:00
scale_op.cc
scale_op.h
scale_op_gpu.cc
segment_reduction_op.cc CUDA SparseLengthsWeightedSum 2017-08-22 15:42:02 -07:00
segment_reduction_op.h TensorInference function for LengthsSum and such 2017-08-31 09:32:48 -07:00
segment_reduction_op_gpu.cu CUDA SparseLengthsWeightedSum 2017-08-22 15:42:02 -07:00
sequence_ops.cc
sequence_ops.h
shape_op.cc move ShapeOp out from utility_ops 2017-08-23 16:33:06 -07:00
shape_op.h move ShapeOp out from utility_ops 2017-08-23 16:33:06 -07:00
shape_op_gpu.cc move ShapeOp out from utility_ops 2017-08-23 16:33:06 -07:00
sigmoid_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
sigmoid_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
sin_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
sin_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
sinusoid_position_encoding_op.cc Optimizations for Caffe2 SinusoidPositionEncodingOp 2017-08-22 00:04:06 -07:00
sinusoid_position_encoding_op.h Allow caffe2 to detect if cuda lib has been linked, and also fix oss build error. 2017-08-23 18:41:15 -07:00
slice_op.cc Return TensorInferenceFunction for SliceOp 2017-08-31 14:03:47 -07:00
slice_op.cu Move SliceOp outisde of utility_ops.h 2017-08-30 18:03:58 -07:00
slice_op.h Move SliceOp outisde of utility_ops.h 2017-08-30 18:03:58 -07:00
softmax_op.cc Update SoftmaxOp documentation: input not necessarily 2-D 2017-08-01 10:38:12 -07:00
softmax_op.h
softmax_op_cudnn.cc
softmax_ops.cu change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
softmax_shared.cc
softmax_shared.h
softmax_with_loss_op.cc change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
softmax_with_loss_op.h Use cub::DeviceReduce for faster math::Sum CUDA version 2017-06-30 15:04:06 -07:00
softplus_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
softplus_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
softplus_op.h softplus op 2017-05-08 10:40:25 -07:00
softsign_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
softsign_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
space_batch_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
space_batch_op.h comment out unused parameters 2017-07-21 15:14:43 -07:00
space_batch_op_gpu.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
sparse_to_dense_mask_op.cc add gradient for SparseToDenseMask operator 2017-08-01 13:05:03 -07:00
sparse_to_dense_mask_op.h Add more enforces to SparseToDenseMask operator. 2017-09-02 02:16:24 -07:00
sparse_to_dense_op.cc EnsureDense/SparseToDense for CUDA 2017-09-01 09:33:05 -07:00
sparse_to_dense_op.cu EnsureDense/SparseToDense for CUDA 2017-09-01 09:33:05 -07:00
sparse_to_dense_op.h EnsureDense/SparseToDense for CUDA 2017-09-01 09:33:05 -07:00
spatial_batch_norm_gradient_op.cc change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
spatial_batch_norm_op.cc fix arxiv link to batch-norm paper 2017-08-30 07:51:13 -07:00
spatial_batch_norm_op.h
spatial_batch_norm_op_cudnn.cc Add TensorCore support 2017-08-10 20:16:48 -07:00
spatial_softmax_with_loss_op.cc change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs 2017-07-28 11:35:19 -07:00
spatial_softmax_with_loss_op.h Use cub::DeviceReduce for faster math::Sum CUDA version 2017-06-30 15:04:06 -07:00
square_root_divide_op.cc float support for square root divide 2017-07-27 17:40:40 -07:00
square_root_divide_op.h float support for square root divide 2017-07-27 17:40:40 -07:00
stats_ops.cc Tuning number of parameter servers based on performance estimation job 2017-08-30 18:03:59 -07:00
stop_gradient.cc
stop_gradient.h
stop_gradient_gpu.cc
string_ops.cc Improve StringJoin operator 2017-08-01 19:03:43 -07:00
string_ops.h Improve StringJoin operator 2017-08-01 19:03:43 -07:00
string_ops_test.cc Improve StringJoin operator 2017-08-01 19:03:43 -07:00
summarize_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
summarize_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
summarize_op.h
tanh_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
tanh_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
tensor_protos_db_input.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
tensor_protos_db_input.h
tensor_protos_db_input_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
text_file_reader.cc
text_file_reader_utils.cc
text_file_reader_utils.h Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
text_file_reader_utils_test.cc
tile_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
tile_op.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
tile_op.h Fix a few typos and grammars in comment 2017-06-14 18:22:39 -07:00
top_k.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
top_k.cu Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
top_k.h Implement TopKOp for GPU 2017-06-17 08:47:38 -07:00
top_k_heap_selection.cuh CUDA 9 support 2017-08-06 11:50:17 -07:00
top_k_radix_selection.cuh CUDA 9 support 2017-08-06 11:50:17 -07:00
transpose_op.cc HPTT 2017-08-29 21:06:40 -07:00
transpose_op.cu Relax dimension constraint in CUDA to 6 for Transpose 2017-08-30 17:02:21 -07:00
transpose_op.h fix perf bug in TransposeOp for CUDA 2017-06-27 15:27:28 -07:00
transpose_op_cudnn.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
tt_linear_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
tt_linear_op.h
utility_ops.cc Remove redundant tensor inference function 2017-08-31 09:17:43 -07:00
utility_ops.cu EnsureDense/SparseToDense for CUDA 2017-09-01 09:33:05 -07:00
utility_ops.h Move SliceOp outisde of utility_ops.h 2017-08-30 18:03:58 -07:00
utility_ops_gpu.cc move ShapeOp out from utility_ops 2017-08-23 16:33:06 -07:00
utility_ops_gpu_test.cc
utility_ops_test.cc
while_op.cc Control flow operators 2017-08-28 20:04:43 -07:00
while_op.h Control flow operators 2017-08-28 20:04:43 -07:00
workspace_ops.cc
zero_gradient_op.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00
zero_gradient_op.h ZeroGradient op 2017-06-08 16:02:38 -07:00
zero_gradient_op_gpu.cc Add linter for enforcing caffe operator documentation 2017-07-24 15:27:47 -07:00