pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00


Aapo Kyrola 1ed746df45 BatchMatMulOp: use cuBLAS batched strided gemm for CUDA	2017-03-28 11:54:09 -07:00

Summary: Instead of doing gemms in a for-loop (which is not parallelized), it is much better to do the batched matmuls using CUDA 8's new batched-strided version of gemm. With the MT team's test, we get a 5-10% improvement in overall walltime, so it is a significant improvement:

----
Without batched gemm:
I0328 10:46:48.118605 58068 prof_dag_net.cc:136] 424.757 ms/iter ( 283.878 ms/iter) RecurrentNetwork
I0328 10:46:48.118609 58068 prof_dag_net.cc:136] 352.603 ms/iter ( 265.85 ms/iter) RecurrentNetworkGradient

With batched gemm:
I0328 10:53:48.169996 85617 prof_dag_net.cc:136] 407.438 ms/iter ( 269.564 ms/iter) RecurrentNetwork
I0328 10:53:48.169999 85617 prof_dag_net.cc:136] 322.393 ms/iter ( 287.625 ms/iter) RecurrentNetworkGradient

Reviewed By: jamesr66a
Differential Revision: D4788272
fbshipit-source-id: 210e8b94c1e036b6ef0f039ce000d455258651f4
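The commit message above contrasts a per-matrix gemm loop with a single strided batched call. The following is a minimal pure-Python sketch of that idea, not the Caffe2 or cuBLAS code: the function names (`gemm`, `matmul_loop`, `strided_batched_gemm`) are hypothetical, and the buffers are assumed to be row-major flat lists, whereas cuBLAS itself uses column-major storage. It only illustrates the memory layout, in which matrix i of each operand starts at offset i * stride in one packed buffer.

```python
def gemm(a, b, m, k, n):
    """Multiply one row-major m x k matrix by one k x n matrix (flat lists)."""
    return [
        sum(a[i * k + p] * b[p * n + j] for p in range(k))
        for i in range(m)
        for j in range(n)
    ]


def matmul_loop(a_list, b_list, m, k, n):
    """Baseline: one separate gemm call per matrix pair."""
    return [gemm(a, b, m, k, n) for a, b in zip(a_list, b_list)]


def strided_batched_gemm(a, b, batch, m, k, n):
    """Single call over packed buffers; matrix i starts at offset i * stride."""
    stride_a, stride_b, stride_c = m * k, k * n, m * n
    c = [0] * (batch * stride_c)
    # This loop is sequential here; a GPU library can process the
    # independent matrix products in parallel.
    for i in range(batch):
        a_i = a[i * stride_a:(i + 1) * stride_a]
        b_i = b[i * stride_b:(i + 1) * stride_b]
        c[i * stride_c:(i + 1) * stride_c] = gemm(a_i, b_i, m, k, n)
    return c


# Two 2x2 products, first as separate matrices, then packed back to back.
a_list = [[1, 2, 3, 4], [5, 6, 7, 8]]
b_list = [[1, 0, 0, 1], [2, 0, 0, 2]]
looped = matmul_loop(a_list, b_list, 2, 2, 2)
packed = strided_batched_gemm(sum(a_list, []), sum(b_list, []), 2, 2, 2, 2)

# Both approaches compute the same products; only the call structure differs.
assert packed == sum(looped, [])
```

Both paths produce identical results; the benefit reported in the commit comes from issuing one library call over the whole batch instead of one call per matrix.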
activation_ops_test.py	Caffe2: CUDA implementation for LeakyReluOp	2017-03-28 08:48:25 -07:00
atomic_ops_test.py
checkpoint_test.py
conv_test.py	Conv-ND NCHW CPU/CUDA implementation	2017-03-20 14:01:07 -07:00
conv_transpose_test.py
copy_ops_test.py	Reset workspace after each test in copy_ops_test	2017-03-24 12:20:34 -07:00
cosine_embedding_criterion_op_test.py
counter_ops_test.py	AtomicCounter to return previous value on Reset.	2017-02-02 14:59:30 -08:00
crf_test.py	CRF layer in caffe2	2017-03-23 22:02:02 -07:00
cross_entropy_ops_test.py	delete redundant comment lines.	2017-02-24 11:04:36 -08:00
dataset_ops_test.py	NextScopedBlob with well-defined behavior and respect namescope	2017-02-16 17:16:36 -08:00
duplicate_operands_test.py
elementwise_op_broadcast_test.py
elementwise_ops_test.py	Sqr op and gradient	2017-03-07 03:03:07 -08:00
emptysample_ops_test.py
extend_tensor_op_test.py
fc_operator_test.py	Test for FC operator + fix for docs	2017-01-27 10:44:24 -08:00
filler_ops_test.py	add exception for empty shape param	2017-03-10 00:33:59 -08:00
gather_ops_test.py
gather_ranges_op_test.py
given_tensor_fill_op_test.py	support fill bool tensors in GivenTensorFill	2017-03-02 20:18:59 -08:00
group_conv_test.py	Make all convolution operators allow optional bias term	2016-12-21 15:14:24 -08:00
hsm_test.py	Generate huffman tree	2017-01-19 16:14:23 -08:00
index_ops_test.py	Change the schema of IndexLoad & IndexFreeze so that state change is captured by the framework	2017-02-14 10:05:12 -08:00
instance_norm_test.py	instance norm test fix	2017-02-25 14:31:42 -08:00
margin_ranking_criterion_op_test.py
matmul_op_test.py	BatchMatMulOp: use cuBLAS batched strided gemm for CUDA	2017-03-28 11:54:09 -07:00
mkl_conv_op_test.py	MKL convolution operator	2017-01-23 09:59:30 -08:00
mkl_packed_fc_op_test.py	MKL convolution operator	2017-01-23 09:59:30 -08:00
mkl_speed_test.py	MKL convolution operator	2017-01-23 09:59:30 -08:00
momentum_sgd_test.py	SparseMomentumSGDUpdateOp	2017-03-28 07:47:46 -07:00
mpi_test.py
one_hot_ops_test.py
pack_ops_test.py	Registering GPU version of PackSegments using GPUFallbackOp	2017-03-24 16:01:53 -07:00
partition_ops_test.py
piecewise_linear_transform_test.py	PiecewiseLinearTransformOp transform binary predictions specially	2017-02-15 16:00:44 -08:00
pooling_test.py	Unit test for big batch size avg pooling	2017-01-18 19:29:20 -08:00
pow_op_test.py	CUDA version of elementwise power + rename to Pow + gradient	2017-03-07 10:20:40 -08:00
python_op_test.py
rank_loss_operator_test.py	Normalize rank loss gradient to avoid convergence issues when the number of pairs is really large	2016-12-21 17:29:24 -08:00
record_queue_test.py
recurrent_network_test.py	RNN: avoid copy for gradients of inputs to the rnn cell and save more memory!	2017-03-28 10:02:25 -07:00
reduce_ops_test.py	ReduceBack{Sum|Mean}Op CPU & GPU implementation	2017-03-13 16:19:58 -07:00
relu_op_test.py
reshape_ops_test.py	Allow test discovery in caffe2/python/	2017-03-14 18:16:41 -07:00
resize_op_test.py	Add ResizeNearest operator	2017-03-16 18:49:01 -07:00
segment_ops_test.py	Allow test discovery in caffe2/python/	2017-03-14 18:16:41 -07:00
sequence_ops_test.py	add gpu support for caffe2-seq2seq	2017-03-17 05:19:14 -07:00
shape_inference_test.py	Bugfix: type not being set when inferring types+shapes	2017-03-15 18:48:40 -07:00
softmax_ops_test.py	add soft label functionality to softmax with loss op	2017-02-10 09:01:53 -08:00
sparse_gradient_checker_test.py
sparse_ops_test.py
spatial_bn_op_test.py
square_root_divide_op_test.py
stats_ops_test.py	Performance counters	2017-02-21 16:31:24 -08:00
string_ops_test.py
text_file_reader_test.py
tile_op_test.py	Caffe2: Tile operator	2017-02-28 23:17:26 -08:00
top_k_test.py	Implement TopK op in caffe2	2017-03-16 17:32:20 -07:00
unique_uniform_fill_op_test.py	UniqueUniformFillOp	2017-02-15 16:00:44 -08:00
utility_ops_test.py	Add gradient operator for SumElements	2017-03-07 20:03:07 -08:00