pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Aapo Kyrola 631971e459 threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00

Summary: Special executor for RNNs that can exploit parallelism over timesteps. On CPU it uses multi-threading, achieving roughly a 3x improvement on 4-layer LSTMs. With CUDA, the performance improvements are more modest, but the structure allows for optimizing it further. On CUDA it uses multiple streams and events when there is parallelism over timesteps. In my experiments, it was not beneficial to use more than 2 streams, though. The flag --caffe2_rnn_executor can be used to switch the executor off.

Reviewed By: salexspb
Differential Revision: D5749304
fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
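
The commit summary above mentions exploiting parallelism over timesteps in multi-layer RNNs. The following is a minimal sketch of one common way such parallelism can arise, not the code from this commit: in a stacked RNN, the cell at (layer, timestep) depends only on (layer - 1, timestep) and (layer, timestep - 1), so all cells on the same anti-diagonal are independent and can run concurrently. The names `wavefront_schedule`, `run_stacked_rnn`, and `cell_fn` are hypothetical, and the toy cell stands in for a real LSTM unit.

```python
from concurrent.futures import ThreadPoolExecutor


def wavefront_schedule(num_layers, num_timesteps):
    """Group (layer, timestep) cells into waves of mutually independent cells.

    Cells with the same layer + timestep sum have no dependencies on each other.
    """
    waves = []
    for k in range(num_layers + num_timesteps - 1):
        wave = [(l, k - l) for l in range(num_layers) if 0 <= k - l < num_timesteps]
        waves.append(wave)
    return waves


def run_stacked_rnn(inputs, num_layers, cell_fn, num_threads=4):
    """Run a stacked RNN wave by wave, executing each wave on a thread pool.

    cell_fn(layer, x, h_prev) is a hypothetical stand-in for a recurrent cell.
    The initial hidden state of every layer is assumed to be 0.0.
    """
    num_timesteps = len(inputs)
    out = {}  # (layer, timestep) -> cell output

    def run_cell(cell):
        layer, t = cell
        x = inputs[t] if layer == 0 else out[(layer - 1, t)]
        h_prev = out.get((layer, t - 1), 0.0)
        return cell_fn(layer, x, h_prev)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for wave in wavefront_schedule(num_layers, num_timesteps):
            # All cells in a wave only read results from earlier waves.
            results = list(pool.map(run_cell, wave))
            for cell, result in zip(wave, results):
                out[cell] = result

    return [out[(num_layers - 1, t)] for t in range(num_timesteps)]


if __name__ == "__main__":
    # Toy cell: adds the input to the previous hidden state.
    print(run_stacked_rnn([1, 2, 3], num_layers=2, cell_fn=lambda l, x, h: x + h))
```

In this sketch the number of cells that can run at once is at most min(num_layers, num_timesteps). Python threads illustrate the scheduling only; pure-Python cells will not speed up because of the global interpreter lock.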
..
abs_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
abs_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
accumulate_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
accumulate_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
accumulate_op.h
accuracy_op.cc	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
accuracy_op.cu	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
accuracy_op.h
apmeter_op.cc	Implement APMeter op	2017-06-07 15:03:04 -07:00
apmeter_op.h	Implement APMeter op	2017-06-07 15:03:04 -07:00
atomic_ops.cc	Fix a few typos and grammars in comment	2017-06-14 18:22:39 -07:00
batch_box_cox_op.cc	add box cox transform op	2017-04-27 22:06:43 -07:00
batch_box_cox_op.h	add box cox transform op	2017-04-27 22:06:43 -07:00
batch_gather_ops.cc	GPU version of BatchGatherOp	2017-08-17 18:31:10 -07:00
batch_gather_ops.cu	GPU version of BatchGatherOp	2017-08-17 18:31:10 -07:00
batch_gather_ops.h	BatchGatherOp	2017-07-27 10:17:42 -07:00
batch_matmul_op.cc	shape inference for batchmatmul	2017-08-28 18:31:55 -07:00
batch_matmul_op.cu
batch_matmul_op.h
boolean_mask_ops.cc	Added window mode for caffe2 sequence operator	2017-08-16 21:34:29 -07:00
boolean_mask_ops.cu	Added window mode for caffe2 sequence operator	2017-08-16 21:34:29 -07:00
boolean_mask_ops.h	Added window mode for caffe2 sequence operator	2017-08-16 21:34:29 -07:00
boolean_unmask_ops.cc	Add CUDA implementation of BooleanUnmask and fixed some bugs in the test	2017-08-01 16:51:40 -07:00
boolean_unmask_ops.cu	Add CUDA implementation of BooleanUnmask and fixed some bugs in the test	2017-08-01 16:51:40 -07:00
boolean_unmask_ops.h	Add CUDA implementation of BooleanUnmask and fixed some bugs in the test	2017-08-01 16:51:40 -07:00
boolean_unmask_ops_test.cc	Fix a bug in BooleanUnmaskOp	2017-06-20 08:34:09 -07:00
cast_op.cc	Use cast::GetCastDataType to handle "from_type" and "to" arguments	2017-08-23 10:18:01 -07:00
cast_op.cu	Support fp16 output from ImageInputOp	2017-04-28 14:50:47 -07:00
cast_op.h	Nuke arg_helper() in OperatorBase	2017-07-19 13:52:39 -07:00
channel_shuffle_op.cc	Opensourcing channel shuffle	2017-08-25 16:46:31 -07:00
channel_shuffle_op.h	Opensourcing channel shuffle	2017-08-25 16:46:31 -07:00
channel_shuffle_op_gpu.cu	Opensourcing channel shuffle	2017-08-25 16:46:31 -07:00
clip_op.cc	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
clip_op.cu	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
clip_op.h	fix clip_op bug	2017-06-23 22:31:54 -07:00
CMakeLists.txt
communicator_op.cc	DestroyCommonWorld op	2017-08-25 14:01:01 -07:00
communicator_op_gpu.cc
concat_split_op.cc	Set default values for concat_split_op	2017-09-05 17:02:22 -07:00
concat_split_op.h	Set default values for concat_split_op	2017-09-05 17:02:22 -07:00
concat_split_op_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
conditional_op.cc	Use char-ngram embedding for out-of-vocabulary words	2017-09-01 19:16:49 -07:00
conditional_op.h	Use char-ngram embedding for out-of-vocabulary words	2017-09-01 19:16:49 -07:00
conv_gradient_op.cc	Adding 1d-2d-3d Schemas for Conv and Pool	2017-08-17 09:45:54 -07:00
conv_op.cc	Accidental addition of a file	2017-08-31 20:17:12 -07:00
conv_op.h	Per-workspace mutex for shared im2col buffer	2017-06-29 10:19:37 -07:00
conv_op_cache_cudnn.cc	bindings	2017-07-21 19:03:43 -07:00
conv_op_cache_cudnn.h	bindings	2017-07-21 19:03:43 -07:00
conv_op_cache_cudnn_test.cc
conv_op_cudnn.cc	Adding 1d-2d-3d Schemas for Conv and Pool	2017-08-17 09:45:54 -07:00
conv_op_eigen.cc	Adding 1d-2d-3d Schemas for Conv and Pool	2017-08-17 09:45:54 -07:00
conv_op_gpu.cc	Adding 1d-2d-3d Schemas for Conv and Pool	2017-08-17 09:45:54 -07:00
conv_op_impl.h	Per-workspace mutex for shared im2col buffer	2017-06-29 10:19:37 -07:00
conv_op_shared.cc	Per-workspace mutex for shared im2col buffer	2017-06-29 10:19:37 -07:00
conv_op_shared.h	Per-workspace mutex for shared im2col buffer	2017-06-29 10:19:37 -07:00
conv_op_shared_gpu.cc	Per-workspace mutex for shared im2col buffer	2017-06-29 10:19:37 -07:00
conv_pool_op_base.h	add conv flops inference	2017-08-31 14:18:21 -07:00
conv_transpose_gradient_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
conv_transpose_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
conv_transpose_op.h
conv_transpose_op_cudnn.cc	Support new arguments in ConvTranspose	2017-08-31 11:17:32 -07:00
conv_transpose_op_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
conv_transpose_op_impl.h	Support new arguments in ConvTranspose	2017-08-31 11:17:32 -07:00
conv_transpose_op_mobile.cc
conv_transpose_op_mobile.h	Support new arguments in ConvTranspose	2017-08-31 11:17:32 -07:00
conv_transpose_op_mobile_impl.h	Support new arguments in ConvTranspose	2017-08-31 11:17:32 -07:00
conv_transpose_op_mobile_test.cc	Sync of codebases	2017-08-06 11:27:06 -07:00
conv_transpose_unpool_op_base.h	Support new arguments in ConvTranspose	2017-08-31 11:17:32 -07:00
cos_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
cos_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
cosine_embedding_criterion_op.cc
cosine_embedding_criterion_op.cu
cosine_embedding_criterion_op.h
counter_ops.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
counter_ops.h
counter_ops_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
cross_entropy_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
cross_entropy_op.cu	Fixed cuda loss op	2017-08-30 17:02:23 -07:00
cross_entropy_op.h
dataset_ops.cc	Option to enforce batch size	2017-08-01 22:29:55 -07:00
dataset_ops.h	Sampling random negative based on sparse features	2017-08-10 15:27:18 -07:00
distance_op.cc	Cannot divide on 0	2017-08-17 17:50:36 -07:00
distance_op.cu	Cannot divide on 0	2017-08-17 17:50:36 -07:00
distance_op.h	CosineSimilarity GPU	2017-07-25 13:34:01 -07:00
do_op.cc	Control flow operators	2017-08-28 20:04:43 -07:00
do_op.h	Control flow operators	2017-08-28 20:04:43 -07:00
dropout_op.cc	shape inference for ReduceFront/Back/Sum/Mean, Gather and Dropout	2017-08-25 11:31:17 -07:00
dropout_op.cu	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
dropout_op.h	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
dropout_op_cudnn.cc	protect cudnnSetDropoutDescriptor with mutex	2017-08-31 14:56:07 -07:00
elementwise_add_op.cc
elementwise_div_op.cc	Add caffe2 operators to mobile: Log, StumpFunc, Div, Sub	2017-05-03 15:10:34 -07:00
elementwise_linear_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
elementwise_linear_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
elementwise_linear_op.h	Optimize memory usage for MI-LSTM	2017-05-10 16:53:43 -07:00
elementwise_logical_ops.cc
elementwise_logical_ops.h	Add boolean type in input2 and input3 for caffe2: Where operator	2017-08-03 13:17:06 -07:00
elementwise_mul_op.cc
elementwise_op.cc	Add caffe2 operators to mobile: Log, StumpFunc, Div, Sub	2017-05-03 15:10:34 -07:00
elementwise_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
elementwise_op.h	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
elementwise_op_gpu_test.cc
elementwise_op_schema.cc	Nuke arg_helper() in OperatorBase	2017-07-19 13:52:39 -07:00
elementwise_op_test.cc
elementwise_op_test.h
elementwise_sub_op.cc	Add caffe2 operators to mobile: Log, StumpFunc, Div, Sub	2017-05-03 15:10:34 -07:00
elementwise_sum_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
elu_op.cc	Vectorize ELU op on CPU	2017-08-10 21:52:49 -07:00
elu_op.cu	ELU CUDA implementation	2017-06-21 11:47:13 -07:00
elu_op.h
exp_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
exp_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
extend_tensor_op.cc
feed_blob_op.cc
feed_blob_op.h
filler_op.cc	Caffe2: diagonal fill op	2017-08-16 13:05:11 -07:00
filler_op.cu	Caffe2: diagonal fill op	2017-08-16 13:05:11 -07:00
filler_op.h	Caffe2: diagonal fill op	2017-08-16 13:05:11 -07:00
find_duplicate_elements_op.cc
find_duplicate_elements_op.h
find_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
find_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
find_op.h
free_op.cc
free_op.h
free_op_gpu.cc
fully_connected_op.cc	fix FC shape inference	2017-08-28 16:08:07 -07:00
fully_connected_op.h	Make FC op work with empty batch in cuda	2017-08-24 18:52:04 -07:00
fully_connected_op_gpu.cc	Add TensorCore support	2017-08-10 20:16:48 -07:00
fully_connected_op_gpu_test.cc
fully_connected_op_test.cc
given_tensor_fill_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
given_tensor_fill_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
given_tensor_fill_op.h
gru_unit_op.cc	Implement CUDA version of GRU operator	2017-08-08 10:57:40 -07:00
gru_unit_op.h	Implement CUDA version of GRU operator	2017-08-08 10:57:40 -07:00
gru_unit_op_gpu.cu	Implement CUDA version of GRU operator	2017-08-08 10:57:40 -07:00
h_softmax_op.cc
h_softmax_op.h
half_float_ops.cc	Improve float16 support	2017-08-23 16:33:07 -07:00
half_float_ops.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
half_float_ops.h	Improve float16 support	2017-08-23 16:33:07 -07:00
if_op.cc	Control flow operators	2017-08-28 20:04:43 -07:00
if_op.h	Control flow operators	2017-08-28 20:04:43 -07:00
im2col_op.cc	Implement gradients for Col2Im and Im2Col operators	2017-08-07 15:51:30 -07:00
im2col_op.h
im2col_op_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
index_hash_ops.cc	IndexHash	2017-07-07 23:06:11 -07:00
index_hash_ops.h	IndexHash	2017-07-07 23:06:11 -07:00
index_ops.cc
instance_norm_gradient_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
instance_norm_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
instance_norm_op.cu
instance_norm_op.h
last_n_window_collector.cc	Generalize LastNWindowCollector	2017-05-04 16:05:15 -07:00
layer_norm_op.cc	Implement layer normalization backward CPU	2017-08-17 11:17:46 -07:00
layer_norm_op.cu	Implement layer norm gradient GPU	2017-08-17 11:17:46 -07:00
layer_norm_op.h	Implement layer norm gradient GPU	2017-08-17 11:17:46 -07:00
leaky_relu_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
leaky_relu_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
leaky_relu_op.h	fbcode nnpack ops for Relu and LeakyRelu	2017-06-19 12:36:32 -07:00
lengths_reducer_ops.cc	Fix SparseLengthSum undeclared schema	2017-07-25 18:19:10 -07:00
lengths_reducer_ops.h	Scaffolding for perfkernels dispatch of embedding lookup	2017-07-30 12:34:23 -07:00
lengths_reducer_rowwise_8bit_ops.cc	Rowwise quantization	2017-09-06 10:19:38 -07:00
lengths_reducer_rowwise_8bit_ops.h	Rowwise quantization	2017-09-06 10:19:38 -07:00
lengths_tile_op.cc
lengths_tile_op.h
lengths_top_k_op.cc	fix Windows build breaks by LengthsTopKOp	2017-08-08 18:06:24 -07:00
lengths_top_k_op.h	implement LengthsTopK operator	2017-08-07 18:19:29 -07:00
load_save_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
load_save_op.h	Added functionality that allows users to store huge blobs	2017-08-02 16:08:09 -07:00
load_save_op_gpu.cc	Do CaffeCudaSetDevice and CaffeCudaGetDevice	2017-08-25 18:20:14 -07:00
local_response_normalization_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
local_response_normalization_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
local_response_normalization_op.h	Change default argument for LRN	2017-08-30 10:51:19 -07:00
local_response_normalization_op_cudnn.cc	New cudnn ops	2017-05-08 16:33:21 -07:00
log_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
log_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
logit_op.cc
loss_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
loss_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
loss_op.h
lp_pool_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
lp_pool_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
lpnorm_op.cc	adding operator lp_norm to support calculating l1 norm and l2 norm	2017-08-02 15:09:08 -07:00
lpnorm_op.h	adding operator lp_norm to support calculating l1 norm and l2 norm	2017-08-02 15:09:08 -07:00
lstm_unit_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
lstm_unit_op.h	comment out unused parameters	2017-07-21 15:14:43 -07:00
lstm_unit_op_gpu.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
map_ops.cc	CreateMapOp	2017-08-09 13:32:19 -07:00
map_ops.h	CreateMapOp	2017-08-09 13:32:19 -07:00
margin_ranking_criterion_op.cc
margin_ranking_criterion_op.cu
margin_ranking_criterion_op.h
math_ops.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
math_ops.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
math_ops.h
matmul_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
matmul_op.h
matmul_op_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
max_pool_with_index.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
max_pool_with_index.h	New cudnn ops	2017-05-08 16:33:21 -07:00
mem_query_op.cu
merge_id_lists_op.cc	Operator to Merge ID_LIST features	2017-08-17 01:16:00 -07:00
merge_id_lists_op.h	Operator to Merge ID_LIST features	2017-08-17 01:16:00 -07:00
multi_class_accuracy_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
multi_class_accuracy_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
multi_class_accuracy_op.h
negative_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
negative_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
no_default_engine_op.h	Rename def() to debug_def()	2017-07-17 23:50:01 -07:00
normalize_op.cc	add axis argument to NormalizeOp and NormalizeGradientOp	2017-09-05 11:17:32 -07:00
normalize_op.cu	add axis argument to NormalizeOp and NormalizeGradientOp	2017-09-05 11:17:32 -07:00
normalize_op.h	add axis argument to NormalizeOp and NormalizeGradientOp	2017-09-05 11:17:32 -07:00
one_hot_ops.cc	Caffe2: Write CUDA version of OneHot operator	2017-08-08 18:17:39 -07:00
one_hot_ops.cu	Caffe2: Write CUDA version of OneHot operator	2017-08-08 18:17:39 -07:00
one_hot_ops.h	Caffe2: Write CUDA version of OneHot operator	2017-08-08 18:17:39 -07:00
operator_fallback_gpu.h	Rename def() to debug_def()	2017-07-17 23:50:01 -07:00
operator_fallback_gpu_test.cc
order_switch_ops.cc	add dimension check to NHWC2NCHW shape inference	2017-07-27 09:54:44 -07:00
order_switch_ops.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
order_switch_ops.h
pack_rnn_sequence_op.cc	added PackRNNSequence and UnpackRNNSequence operators	2017-06-30 09:53:31 -07:00
pack_rnn_sequence_op.h	added PackRNNSequence and UnpackRNNSequence operators	2017-06-30 09:53:31 -07:00
pack_segments.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
pack_segments.h
pack_segments_op_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
pad_op.cc
pad_op.h
pad_op_gpu.cu
partition_ops.cc
partition_ops.h
perplexity_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
perplexity_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
perplexity_op.h
piecewise_linear_transform_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
piecewise_linear_transform_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
piecewise_linear_transform_op.h	Added PiecewiseLinearTransform CUDA Op	2017-07-07 15:20:00 -07:00
pool_gradient_op.cc	Adding 1d-2d-3d Schemas for Conv and Pool	2017-08-17 09:45:54 -07:00
pool_op.cc	Adding 1d-2d-3d Schemas for Conv and Pool	2017-08-17 09:45:54 -07:00
pool_op.cu	Adding 1d-2d-3d Schemas for Conv and Pool	2017-08-17 09:45:54 -07:00
pool_op.h
pool_op_cudnn.cu	Added fast path for CUDNN global max pooling	2017-08-23 16:33:06 -07:00
prefetch_op.h	Make Context::FinishDeviceComputation throw instead of FATAL	2017-07-31 00:05:10 -07:00
prelu_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
prelu_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
prelu_op.h
prepend_dim_op.cc	PrependDimOp	2017-08-24 18:52:05 -07:00
prepend_dim_op.h	PrependDimOp	2017-08-24 18:52:05 -07:00
prepend_dim_op_gpu.cc	PrependDimOp	2017-08-24 18:52:05 -07:00
rank_loss_op.cc	improve pair_wise_loss operator to support multiple sessions	2017-07-28 15:12:47 -07:00
rank_loss_op.h	improve pair_wise_loss operator to support multiple sessions	2017-07-28 15:12:47 -07:00
recurrent_network_blob_fetcher_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
recurrent_network_blob_fetcher_op.h	RNN Workspace Blob Extraction	2017-07-17 10:24:18 -07:00
recurrent_network_blob_fetcher_op_gpu.cc	Fixed error when compiling with clang	2017-07-17 12:52:39 -07:00
recurrent_network_executor.cc	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_network_executor.h	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_network_executor_gpu.cc	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_network_executor_gpu.h	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_network_executor_incl.h	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_network_op.cc	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_network_op.h	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_network_op_gpu.cc	threaded RNN executor for CPU, multi-stream executor CUDA	2017-09-06 12:26:30 -07:00
recurrent_op_cudnn.cc	Fix build	2017-08-07 15:34:49 -07:00
recurrent_op_cudnn.h
reducer_functors.h	shape inference for ReduceFront/Back/Sum/Mean, Gather and Dropout	2017-08-25 11:31:17 -07:00
reduction_ops.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
reduction_ops.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
reduction_ops.h	Use the same schema of switching to device reduce sum for SumSqrElements	2017-07-05 10:52:17 -07:00
relu_op.cc	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
relu_op.cu	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
relu_op.h
relu_op_cudnn.cc
relu_op_fp16.cu	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
remove_data_blocks_op.cc
remove_data_blocks_op.h
replace_nan_op.cc
replace_nan_op.h
reservoir_sampling.cc	Make reservoir sampling thread safe	2017-08-10 15:27:21 -07:00
reshape_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
reshape_op.h
reshape_op_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
reshape_op_gpu_test.cc
resize_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
resize_op.cu	fix #983 by remove unsupported archs	2017-07-31 18:38:59 -07:00
resize_op.h	added gradients for ResizeNearest (CPU + CUDA) and ref	2017-07-07 14:19:42 -07:00
reverse_packed_segs_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
reverse_packed_segs_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
reverse_packed_segs_op.h
roi_pool_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
roi_pool_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
roi_pool_op.h
rowmul_op.cc
rowmul_op.h	Add row-wise broadcasting to "Where" operator	2017-04-27 12:31:54 -07:00
scale_op.cc
scale_op.h
scale_op_gpu.cc
segment_reduction_op.cc	CUDA SparseLengthsWeightedSum	2017-08-22 15:42:02 -07:00
segment_reduction_op.h	TensorInference function for LengthsSum and such	2017-08-31 09:32:48 -07:00
segment_reduction_op_gpu.cu	CUDA SparseLengthsWeightedSum	2017-08-22 15:42:02 -07:00
sequence_ops.cc
sequence_ops.h
shape_op.cc	move ShapeOp out from utility_ops	2017-08-23 16:33:06 -07:00
shape_op.h	move ShapeOp out from utility_ops	2017-08-23 16:33:06 -07:00
shape_op_gpu.cc	move ShapeOp out from utility_ops	2017-08-23 16:33:06 -07:00
sigmoid_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
sigmoid_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
sin_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
sin_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
sinusoid_position_encoding_op.cc	Optimizations for Caffe2 SinusoidPositionEncodingOp	2017-08-22 00:04:06 -07:00
sinusoid_position_encoding_op.h	Allow caffe2 to detect if cuda lib has been linked, and also fix oss build error.	2017-08-23 18:41:15 -07:00
slice_op.cc	Return TensorInferenceFunction for SliceOp	2017-08-31 14:03:47 -07:00
slice_op.cu	Move SliceOp outside of utility_ops.h	2017-08-30 18:03:58 -07:00
slice_op.h	Move SliceOp outside of utility_ops.h	2017-08-30 18:03:58 -07:00
softmax_op.cc	Update SoftmaxOp documentation: input not necessarily 2-D	2017-08-01 10:38:12 -07:00
softmax_op.h
softmax_op_cudnn.cc
softmax_ops.cu	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
softmax_shared.cc
softmax_shared.h
softmax_with_loss_op.cc	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
softmax_with_loss_op.h	Use cub::DeviceReduce for faster math::Sum CUDA version	2017-06-30 15:04:06 -07:00
softplus_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
softplus_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
softplus_op.h	softplus op	2017-05-08 10:40:25 -07:00
softsign_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
softsign_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
space_batch_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
space_batch_op.h	comment out unused parameters	2017-07-21 15:14:43 -07:00
space_batch_op_gpu.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
sparse_to_dense_mask_op.cc	add gradient for SparseToDenseMask operator	2017-08-01 13:05:03 -07:00
sparse_to_dense_mask_op.h	Add more enforces to SparseToDenseMask operator.	2017-09-02 02:16:24 -07:00
sparse_to_dense_op.cc	EnsureDense/SparseToDense for CUDA	2017-09-01 09:33:05 -07:00
sparse_to_dense_op.cu	EnsureDense/SparseToDense for CUDA	2017-09-01 09:33:05 -07:00
sparse_to_dense_op.h	EnsureDense/SparseToDense for CUDA	2017-09-01 09:33:05 -07:00
spatial_batch_norm_gradient_op.cc	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
spatial_batch_norm_op.cc	fix arxiv link to batch-norm paper	2017-08-30 07:51:13 -07:00
spatial_batch_norm_op.h
spatial_batch_norm_op_cudnn.cc	Add TensorCore support	2017-08-10 20:16:48 -07:00
spatial_softmax_with_loss_op.cc	change bunch of inexpensive DCHECKS to CAFFE_ENFORCEs	2017-07-28 11:35:19 -07:00
spatial_softmax_with_loss_op.h	Use cub::DeviceReduce for faster math::Sum CUDA version	2017-06-30 15:04:06 -07:00
square_root_divide_op.cc	float support for square root divide	2017-07-27 17:40:40 -07:00
square_root_divide_op.h	float support for square root divide	2017-07-27 17:40:40 -07:00
stats_ops.cc	Tuning number of parameter servers based on performance estimation job	2017-08-30 18:03:59 -07:00
stop_gradient.cc
stop_gradient.h
stop_gradient_gpu.cc
string_ops.cc	Improve StringJoin operator	2017-08-01 19:03:43 -07:00
string_ops.h	Improve StringJoin operator	2017-08-01 19:03:43 -07:00
string_ops_test.cc	Improve StringJoin operator	2017-08-01 19:03:43 -07:00
summarize_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
summarize_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
summarize_op.h
tanh_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
tanh_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
tensor_protos_db_input.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
tensor_protos_db_input.h
tensor_protos_db_input_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
text_file_reader.cc
text_file_reader_utils.cc
text_file_reader_utils.h	Fix a few typos and grammars in comment	2017-06-14 18:22:39 -07:00
text_file_reader_utils_test.cc
tile_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
tile_op.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
tile_op.h	Fix a few typos and grammars in comment	2017-06-14 18:22:39 -07:00
top_k.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
top_k.cu	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
top_k.h	Implement TopKOp for GPU	2017-06-17 08:47:38 -07:00
top_k_heap_selection.cuh	CUDA 9 support	2017-08-06 11:50:17 -07:00
top_k_radix_selection.cuh	CUDA 9 support	2017-08-06 11:50:17 -07:00
transpose_op.cc	HPTT	2017-08-29 21:06:40 -07:00
transpose_op.cu	Relax dimension constraint in CUDA to 6 for Transpose	2017-08-30 17:02:21 -07:00
transpose_op.h	fix perf bug in TransposeOp for CUDA	2017-06-27 15:27:28 -07:00
transpose_op_cudnn.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
tt_linear_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
tt_linear_op.h
utility_ops.cc	Remove redundant tensor inference function	2017-08-31 09:17:43 -07:00
utility_ops.cu	EnsureDense/SparseToDense for CUDA	2017-09-01 09:33:05 -07:00
utility_ops.h	Move SliceOp outside of utility_ops.h	2017-08-30 18:03:58 -07:00
utility_ops_gpu.cc	move ShapeOp out from utility_ops	2017-08-23 16:33:06 -07:00
utility_ops_gpu_test.cc
utility_ops_test.cc
while_op.cc	Control flow operators	2017-08-28 20:04:43 -07:00
while_op.h	Control flow operators	2017-08-28 20:04:43 -07:00
workspace_ops.cc
zero_gradient_op.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00
zero_gradient_op.h	ZeroGradient op	2017-06-08 16:02:38 -07:00
zero_gradient_op_gpu.cc	Add linter for enforcing caffe operator documentation	2017-07-24 15:27:47 -07:00