pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Kittipat Virochsiri	d8b9e787c2	DuplicateOp Summary: This is like LengthsToSegmentIds + Gather w/o the immediate segment IDs blob. I only realized that after I wrote the whole thing. That combination is not obvious, so just check this in? Reviewed By: xianjiec Differential Revision: D4847591 fbshipit-source-id: a1c480f16b317763866af13c83b3aaaeb6a60751	2017-04-08 00:01:59 -07:00
Yiming Wu	dc5a34200f	SumReduceLikeOp CPU/GPU implementation Summary: 1. CPU/GPU implementation of SumReduceLikeOp. [SRLOp](matrix A, matrix B) -> C where C is of the same shape as B, its value would be the reduce sum of corresponding A element. 2. Make SumReduceLikeOp (part of) the gradient of Add/Mul/Sub and provide unittests ===Update for Translation Team=== 3. Passed Tests: $ buck test caffe2/caffe2/python/operator_test:recurrent_network_test $ buck test fblearner/flow/tests/langtech/translation/neural_mt:seq2seq_model_caffe2 $ buck test fblearner/flow/tests/langtech/translation/neural_mt:seq2seq_ensemble_beam_model_caffe2 Reviewed By: Yangqing Differential Revision: D4711302 fbshipit-source-id: 0865abde871b3046b367599731593dae03f0775a	2017-04-07 15:19:24 -07:00
Kittipat Virochsiri	8482cf9823	TensorVectorSizeOp Summary: Put the size of the input tensor vector into the output blob Reviewed By: xianjiec Differential Revision: D4849556 fbshipit-source-id: 0929319e1705b027874d41a90a9159b335d93545	2017-04-07 14:46:19 -07:00
Aapo Kyrola	23183b9642	memory-saving only_loss argument for SoftmaxWithLoss Summary: When only_loss=True is enabled, the softmax output buffer is shared with the gradient buffer (which is of same size). Added tests for this. Only for GPU version for now. Reviewed By: salexspb Differential Revision: D4843991 fbshipit-source-id: 834d2a1b357d784e4d64efe484f893442201ad6a	2017-04-06 13:04:31 -07:00
Jerry Pan	76abd9a8ac	Caffe2: consolidate AveragedLoss with SumElementsOp Summary: Caffe2: consolidate AveragedLoss with SumElementsOp Differential Revision: D4781561 fbshipit-source-id: 6734adb9dd81d4cad1819a5f8fe736de2477cb72	2017-04-06 10:35:01 -07:00
Aapo Kyrola	cf201ebac8	support axis for cudnn softmax Summary: Added the support of axis for cudnn version of softmax + added cudnn tests to the softmax_ops_test Reviewed By: urikz Differential Revision: D4835409 fbshipit-source-id: 9150b969237e38daebff961fee3c36759f834ac4	2017-04-05 14:06:03 -07:00
James Reed	320b598ff1	Add NanCheckOp, an operator that checks for NaNs and inf's on both the forward and backward pass. Summary: NanCheck is an in-place operator for GPU that checks the input for any NaN or inf values. The operator fails and prints diagnostic information (input tensor dims and values) if it detects these erroneous values. This should help us to narrow down our numerical instability issues in the NMT models, and it might help others as well. Differential Revision: D4818141 fbshipit-source-id: e5aa9762089c58ce160270446007c7a91a7a85e5	2017-04-05 13:07:59 -07:00
Aapo Kyrola	ecd3bda44e	Fix Softmax for CUDA Summary: Following jamesr66a's brilliant observation, this diff fixes the non-CUDNN versions of Softmax. The op did not take into account that blocks can run in parallel, and thus could overwrite each others values, particularly the "row max" that is important for numerical stability So in this diff: 1) SoftmaxOp now shares all the code with SoftmaxWithLoss, that had better implementation + Strengthen the test case and renaming of file. Reviewed By: jamesr66a Differential Revision: D4832929 fbshipit-source-id: 4a1bfa2106ceb65ec75f5b868323ee1e7a3457fb	2017-04-05 10:07:54 -07:00
Yury Zemlyanskiy	5f263c6175	RecurrentNetwork and variable length links Summary: Two new features for RecurrentNetwork: 1. Ability to specify longer (for a few steps) initial state 2. Ability to link more than one step of external blob to internal one. Some motivation for these changes is provided in the unit test Reviewed By: salexspb Differential Revision: D4816230 fbshipit-source-id: 5ae6fed53b3b08a6ce4547ff1d0cb773dab42af0	2017-04-04 19:46:53 -07:00
Jon Morton	0e5b2fd016	Support cropping with negative pad sizes in PadImage Summary: The PadImage op supports cropping along the H/W dimensions by using negative pads; but currently passing negative values for pad attributes throws an error in ConvPoolOpBase, which PadImage inherits from. Modify ConvPoolOpBase to accept negative pad values for non-conv, non-pool ops. Also add a python operator test for cropping Reviewed By: ajtulloch Differential Revision: D4817118 fbshipit-source-id: 5ea5203e8072cc34fe14938e534b157d0ad55f6b	2017-04-03 23:47:54 -07:00
Aapo Kyrola	e13e9c1302	cuDNN version of TransposeOp Summary: Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axis. Significant speedup over current GPU version . + moves the transpose test under utility_ops, because hypothesis_test is too big Reviewed By: jamesr66a Differential Revision: D4810993 fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f	2017-04-03 13:33:10 -07:00
Pooya Davoodi	a2593ea0c2	Add GatherOp for GPU, and update its tests. Summary: This is an initial (read: unoptimized) implementation of GatherOp on GPU. Closes https://github.com/caffe2/caffe2/pull/209 Differential Revision: D4809676 Pulled By: Yangqing fbshipit-source-id: bc36fa02e9964370ca845e9cc13344e5f3dbf176	2017-03-31 13:20:09 -07:00
Aapo Kyrola	8421bf7c60	Faster softmaxWithLoss rowMaxKernel Summary: We did not parallelize over D, which can be very large, especially in RNN models. This speeds up significantly, with my quick test in lstm_benchmark and nvprof, the time of RowMaxKernel dropped from 1.2s total to 0.28s total. + addded softmaxwithloss to the lstm_benchmark Reviewed By: jamesr66a Differential Revision: D4800629 fbshipit-source-id: 3400ea1064b1eb2793bc403df2c1b68801d545e5	2017-03-30 15:49:46 -07:00
Luke Yeager	d76a814c93	Fixes for ops without a CUDA backend Summary: All of these tests fail with some variant of `Cannot create operator of type 'X' on the device 'CUDA'` (see commit messages). Closes https://github.com/caffe2/caffe2/pull/227 Differential Revision: D4797060 Pulled By: Yangqing fbshipit-source-id: 5feaa8e949098bfc1254d4c7449a2744e552f925	2017-03-29 14:36:09 -07:00
Aapo Kyrola	1ed746df45	BatchMatMulOp: use cuBLAS batched strided gemm for CUDA Summary: Instead of doing gemms in a for-loop (which is not parallelized), it is much better to do the batched matmuls using CUDA 8's new batched-striped version of gemm. With the MT team's test, we get 5-10% improvement in overall walltime, so it is significant improvement: ---- Without batched gemm: I0328 10:46:48.118605 58068 prof_dag_net.cc:136] 424.757 ms/iter ( 283.878 ms/iter) RecurrentNetwork I0328 10:46:48.118609 58068 prof_dag_net.cc:136] 352.603 ms/iter ( 265.85 ms/iter) RecurrentNetworkGradient With batched gemm: I0328 10:53:48.169996 85617 prof_dag_net.cc:136] 407.438 ms/iter ( 269.564 ms/iter) RecurrentNetwork I0328 10:53:48.169999 85617 prof_dag_net.cc:136] 322.393 ms/iter ( 287.625 ms/iter) RecurrentNetworkGradient Reviewed By: jamesr66a Differential Revision: D4788272 fbshipit-source-id: 210e8b94c1e036b6ef0f039ce000d455258651f4	2017-03-28 11:54:09 -07:00
Alexander Sidorov	242bff8480	RNN: avoid copy for gradients of inputs to the rnn cell and save more memory! Summary: This is pretty tricky to explain, but we can just use backward_links. This way the whole cell would use a blob from the states_grad tensor instead of having its own blob. This also should save on memory a bit Differential Revision: D4770798 fbshipit-source-id: 673f85b2c2fdf42c47feeaa24d1e2bf086f012f9	2017-03-28 10:02:25 -07:00
Jerry Pan	78f0b35949	Caffe2: CUDA implementation for LeakyReluOp Summary: Caffe2: CUDA implementation for LeakyReluOp Reviewed By: asaadaldien Differential Revision: D4782336 fbshipit-source-id: 402eace695307b62740c918660d9e521217e928a	2017-03-28 08:48:25 -07:00
James Cross	b41449b680	SparseMomentumSGDUpdateOp Summary: Creates SparseMomentumSGDUpdate, a sparse version of MomentumSGDUpdate, to make that optimization method (via in-place updating operator) compatible with GradientSlices. Differential Revision: D4784973 fbshipit-source-id: e6330f471a4d5f53589a6ac245e38f256ca7f354	2017-03-28 07:47:46 -07:00
Deepak Gopinath	6aee34b666	Registering GPU version of PackSegments using GPUFallbackOp Summary: Creating PackSegments and UnpackSegments GPU operators using GPUFallbackOp for now. The op does mainly copying of blobs and this is a reasonable solution until we have a CUDA op. Reviewed By: pietern Differential Revision: D4761589 fbshipit-source-id: dd483b9e34ecb6b53925405e5b4c24859c549606	2017-03-24 16:01:53 -07:00
Luke Yeager	0ade0578b1	Reset workspace after each test in copy_ops_test Summary: This was a nasty one to track down. This was the error message: ``` E0323 14:47:46.138900 2870 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered F0323 14:47:46.139143 2870 operator.h:176] Computation on device returned error in operator input: "x_gpu_2" output: "loss" name: "" type: "AveragedLoss" device_option { device_type: 1 cuda_gpu_id: 1 } ``` Closes https://github.com/caffe2/caffe2/pull/220 Differential Revision: D4771086 Pulled By: Yangqing fbshipit-source-id: f2d0f39f1647c84d97d9745f8a0305a389bfbc41	2017-03-24 12:20:34 -07:00
Ahmed Aly	99bfd36a04	CRF layer in caffe2 Summary: This is implementation of a CRF layer in caffe2 according to this paper: https://arxiv.org/abs/1603.01360 Currently this implementation works only for batch_size = 1 Reference implementations: - Tensorflow: `63a21e0540/tensorflow/contrib/crf/python/ops/crf.py` - Theano: https://github.com/glample/tagger/blob/master/model.py#L286 Differential Revision: D4644004 fbshipit-source-id: bf0801fd8562d11dca3fefe371c3d85e1dd69ccc	2017-03-23 22:02:02 -07:00
Alexander Sidorov	d7b2aebf2c	Support for Sum in cell net as first operator Summary: This didn't work for a reason specified in comments. Also some cleanup in the unit tests, now inference uses a custom workspace to run cell net on Reviewed By: urikz Differential Revision: D4742670 fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816	2017-03-21 18:32:18 -07:00
Ahmed Taei	e41d35909a	Conv-ND NCHW CUP/CUDA implementation Summary: Migrate caffe1 ConvNd implementation to caffe2. Reviewed By: Yangqing Differential Revision: D4659868 fbshipit-source-id: 14b178af3faa2c0b12e5a9f7aa76c1d8945419ea	2017-03-20 14:01:07 -07:00
James Reed	33f41c06c0	Remove more instances of batch_size Summary: D4734505 part 2. Remove more instances of the batch_size parameter Reviewed By: urikz Differential Revision: D4736906 fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf	2017-03-19 22:31:30 -07:00
James Reed	17da5856ed	Remove batch_size parameter from attention and LSTMWithAttention interfaces Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter Reviewed By: urikz Differential Revision: D4734505 fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07	2017-03-19 18:16:28 -07:00
Yury Zemlyanskiy	d1424c3265	Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601 Differential Revision: D4702086 fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b	2017-03-17 17:36:47 -07:00
Alexander Sidorov	f97d7949d0	Remove legacy LSTM, cleanup tests Summary: we don't use this one any more except a few tests Reviewed By: urikz Differential Revision: D4731401 fbshipit-source-id: c5c28b7594e3251f501fc28455dfc9bd2093a836	2017-03-17 16:33:53 -07:00
James Cross	79c3a3af54	add gpu support for caffe2-seq2seq Summary: Adding synchronous optimization on GPUs to the translation training pipeline, via data_parallel_model.Parallelize_GPU, which needs to be updated so there is some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU. Reviewed By: urikz Differential Revision: D4631914 fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441	2017-03-17 05:19:14 -07:00
Jon Morton	1513b1de6b	Add ResizeNearest operator Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works. Reviewed By: ajtulloch Differential Revision: D4724244 fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059	2017-03-16 18:49:01 -07:00
James Reed	cc2e915461	Implement TopK op in caffe2 Reviewed By: salexspb, urikz Differential Revision: D4718439 fbshipit-source-id: e6866eb7bb586f2716662cd4b65961bdd9914525	2017-03-16 17:32:20 -07:00
James Reed	10d95bd0f0	Remove batch_size parameter from attention and LSTMWithAttention interfaces Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter Reviewed By: urikz Differential Revision: D4702086 fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601	2017-03-16 11:47:52 -07:00
Luke Yeager	7773a2d643	Bugfix: type not being set when inferring types+shapes Summary: /cc akyrola I basically just copied all the `ShapeCall` stuff as `TypeCall`. Is there a better way? Closes https://github.com/caffe2/caffe2/pull/187 Differential Revision: D4699312 Pulled By: Yangqing fbshipit-source-id: 92f736ffe4127b00b5821acb1eb359771975fdd7	2017-03-15 18:48:40 -07:00
Luke Yeager	014d1fe5c4	Allow test discovery in caffe2/python/ Summary: These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery. With this patch, you can use any of these methods to discover and run tests under `caffe2/python`: ``` python -m unittest discover -p 'test.py' caffe2/python/ python -m nose caffe2/python/ python -m pytest caffe2/python/ ``` Future work: * Get all of the tests to pass * Some seem to be testing operations which don't have GPU implementations * I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0` * Some tests are flaky * Allow test discovery throughout the whole project (e.g. the `experiments/` dir) Closes https://github.com/caffe2/caffe2/pull/199 Reviewed By: pietern Differential Revision: D4704504 Pulled By: Yangqing fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b	2017-03-14 18:16:41 -07:00
Ahmed Taei	a745981c94	ReduceBack{Sum\|Mean}Op CPU & GPU implementation Summary: Implement ReduceBackSum & ReduceBackMean with gradients for CPU & GPU contexts. The reduction happens among the last dimenstions for example if input is a M x N matrix ReduceBackSum will results a vector of dim M x 1 contains the rowwise sums. Differential Revision: D4689768 fbshipit-source-id: 5b0482d4341867ecf23526dc6c4d544420e7d8f7	2017-03-13 16:19:58 -07:00
Kairan Sun	ee2bc06926	Add Shape Inference for Reshape Operator Summary: Add shape inference for reshape. Because it cannot do shape inference for reshaped tensor with runtime tensor data, set `out[0].set_unknown_shape(true)` if no `shape` argument is used. Differential Revision: D4671125 fbshipit-source-id: 685a9198f9b08e3336014c792f20051b381d8619	2017-03-13 14:31:27 -07:00
Aapo Kyrola	adb3f0ec22	add exception for empty shape param Summary: Following krp's suggestion, check if the shape parameter is empty. Reviewed By: dzhulgakov Differential Revision: D4686698 fbshipit-source-id: 3f9fb1e3215dd2a4a726442531201eeb18224bc6	2017-03-10 00:33:59 -08:00
Karthik Prasad	965a7daf9b	Implement MILSTM in caffe2 Summary: Created a new function with specifics related to MI LSTM implementation in caffe2 See https://arxiv.org/pdf/1606.06630.pdf for details. See D4478877 for the implementation of the same in tensorflow Reviewed By: jhcross Differential Revision: D4669882 fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639	2017-03-09 16:32:47 -08:00
James Cross	c5621ded31	Allow use of ReversePackedSegs operator in CUDA context Summary: ReversePackedSegs operator for CUDA. Input "lengths" (static integers) required to be in CPU memory. Differential Revision: D4661281 fbshipit-source-id: c800c316c34015ba8e732dcbcaa8c4edaffdfeab	2017-03-09 15:03:55 -08:00
James Reed	8de1db9eb6	Implement recurrent attention in C2 Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions as well as train and eval. I want to get this out and get eyes on it sooner rather than later Differential Revision: D4647837 fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068	2017-03-08 11:21:28 -08:00
James Cross	8de2027d9b	Add gradient operator for SumElements Summary: Add gradient support for Caffe2 operator SumElements (for use in Translation RNN training pipeline). Differential Revision: D4669036 fbshipit-source-id: 502760a2a624b20b3241e83a2f208f450b6ff36f	2017-03-07 20:03:07 -08:00
Aapo Kyrola	d8588d8007	CUDA version of elementwise power + rename to Pow + gradient Summary: Renamed ElementwisePower to Pow for better discoverability. Added CUDA version and Gradient + tests. Reviewed By: kennyhorror Differential Revision: D4665550 fbshipit-source-id: dd33d8ad3917d71504e363ab397af50d38a63b1f	2017-03-07 10:20:40 -08:00
Aapo Kyrola	695ea6c7a1	SumElementsOp Summary: Add a simple op to sum the elements, with optional averaging. This is basically copy from AverageLossOp that we should alias to this. And maybe develop this towards a generic norm op. Reviewed By: jhcross Differential Revision: D4664591 fbshipit-source-id: 0e0c0efe9e415e2ad2feecfa42b03db2c83bee70	2017-03-07 05:23:53 -08:00
Aapo Kyrola	8fab453863	Sqr op and gradient Summary: Due to popular demand, added an op to compute element-wise square + gradient for it (just for the fun of it). Reviewed By: Yangqing Differential Revision: D4664797 fbshipit-source-id: 0a29c7c249fdc72f51412bebd6ae352a7801cf05	2017-03-07 03:03:07 -08:00
Kairan Sun	9f588aa8a2	Add Inference for Flatten Summary: Implementing shape inference for Flatten operator and adding unit tests. Differential Revision: D4664073 fbshipit-source-id: c54a269fc7633908fe4197682d27076ef97d9c22	2017-03-07 01:21:40 -08:00
Aapo Kyrola	2333ccadfb	MaxOp for CUDA Summary: Simple elementwise Max implementation for CUDA. Given N inputs, it will do N-1 pairwise maxes. I am not sure if it would be much better to iterate through all the inputs in the kernel, since this has better locality. We can also optimize later. Reviewed By: Yangqing Differential Revision: D4659953 fbshipit-source-id: 3a23b7fb3dbdf1d43bf3134ece03af4a791844dd	2017-03-06 16:46:53 -08:00
Pooya Davoodi	c61a7ca777	Make counts datatype int. Used as index. Summary: To avoid Numpy warning: using a non-integer number instead of an integer will result in an error in the future Closes https://github.com/caffe2/caffe2/pull/64 Differential Revision: D4658348 Pulled By: Yangqing fbshipit-source-id: 3a1b33cbb27849bc167b08147d078e8d487567f4	2017-03-06 10:46:36 -08:00
Aapo Kyrola	8caa7cec8d	CUDA version of Log Summary: As in the title. Simple registration issue. Reviewed By: Yangqing, jhcross Differential Revision: D4655691 fbshipit-source-id: 661e4d5f1226ec05e099c84f4454aa07c6be4449	2017-03-04 00:32:03 -08:00
Huazhong Ning	6c9105447c	support fill bool tensors in GivenTensorFill Summary: the existing code uses vector<T> to store the given tensor and then copy to output. If T=bool, vector<bool> stores the data as bits and then copy does not work. we use TensorCPU to store it instead. Also add unittest. Reviewed By: kennyhorror Differential Revision: D4622325 fbshipit-source-id: 95c27b5d1cfbc836d2419d01cacde5a3172f4d7e	2017-03-02 20:18:59 -08:00
Aapo Kyrola	ec56737190	fix shape inference for spatial softmax with loss Summary: The shape inferenec did not check for spatial mode. Reviewed By: andrewwdye Differential Revision: D4638218 fbshipit-source-id: f15419738587013dea39e04a3da086890938c4e2	2017-03-01 19:32:32 -08:00
Aapo Kyrola	02937903cc	add inference for gradient ops + a couple of missing shape inference functions + fix to scalars Summary: A bit too much stuff in one diff, so sorry: 1. Add inference for gradient types by using the fact that x_grad is gradient of x and must be of same shape. This is kind of awkward to use string matching, but in addition I rely on the operator being actually a gradient op. 2. dzhulgakov was write, scalar shape is () and not (1). Sorry, my claim easlier was #fakenews. 3. Added inference functions for MakeTwoClass, MomentumSGDUpdate and Cross entropy ops. Reviewed By: dzhulgakov Differential Revision: D4569758 fbshipit-source-id: 0db13f33819777fdddefe21d4b1ebf906fcaf98c	2017-02-28 23:33:32 -08:00

1 2 3

129 commits