Summary:
RFC. This is a naive implementation of a Rebatching Queue for the MultiTask effort. Full disclaimer: I'm very new to Caffe/machine learning and I'm doing dodgy science here (under Dmytro's supervision), so please be extra tough on this review so I can learn best practices :)
Differential Revision: D4871970
fbshipit-source-id: 924820ef0fce45b5e2bdabeec9885cbafa23a880
Summary:
Implement NormalizeOp for GPU using CUDA, and rewrite the gradient to be a function of the output
so it is more efficient, especially for the CUDA implementation.
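For intuition, a minimal numpy sketch of L2 normalization with the gradient written in terms of the output (function names are illustrative, not the actual Caffe2/CUDA code):

```python
import numpy as np

def normalize(x, axis=-1):
    # Forward: L2-normalize x along `axis`; also return the norm,
    # which the backward pass still needs.
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / norm, norm

def normalize_grad(gy, y, norm, axis=-1):
    # Gradient expressed through the output y:
    #   gx = (gy - y * <y, gy>) / ||x||
    dot = np.sum(gy * y, axis=axis, keepdims=True)
    return (gy - y * dot) / norm

np.random.seed(0)
x = np.random.randn(4, 8)
gy = np.random.randn(4, 8)
y, norm = normalize(x)
gx = normalize_grad(gy, y, norm)

# Spot-check one coordinate against a central finite difference.
eps = 1e-6
xp = x.copy(); xp[0, 0] += eps
xm = x.copy(); xm[0, 0] -= eps
num = (np.sum(normalize(xp)[0] * gy) - np.sum(normalize(xm)[0] * gy)) / (2 * eps)
assert abs(num - gx[0, 0]) < 1e-4
```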
Reviewed By: akyrola
Differential Revision: D4971300
fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167
Summary: Only a CPU impl is available at the moment. Wrote simple CUDA kernels.
Reviewed By: akyrola
Differential Revision: D4577736
fbshipit-source-id: c2540aa9d332fcdeac46cc7f89aab164d107d7a8
Summary: Implement CPU and GPU gradient for Leaky ReLU op.
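A numpy sketch of the Leaky ReLU forward and gradient semantics (illustrative only, not the actual op code):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Forward: identity for non-negative inputs, alpha * x otherwise.
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_grad(gy, x, alpha=0.01):
    # Backward: gradient is 1 where x >= 0 and alpha elsewhere.
    return gy * np.where(x >= 0, 1.0, alpha)

x = np.array([-2.0, 0.0, 3.0])
y = leaky_relu(x, alpha=0.1)
gx = leaky_relu_grad(np.ones_like(x), x, alpha=0.1)
```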
Differential Revision: D4943905
fbshipit-source-id: 541f13cd5f274a18b69ecf1362722b1bc0105ad9
Summary:
Instance norm failed grad check in some cases that needed a smaller step size. Decreased the step size, but also increased the threshold slightly.
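The kind of check involved, sketched as a central finite difference on a toy function (the function, step size, and threshold here are illustrative, not the actual Caffe2 gradient checker):

```python
import numpy as np

def numeric_grad(f, x, i, step=1e-4):
    # Central-difference estimate of df/dx_i. Too large a step loses
    # accuracy on curved functions; too small amplifies rounding error,
    # which is why step size and pass/fail threshold must be tuned together.
    xp = x.copy(); xp[i] += step
    xm = x.copy(); xm[i] -= step
    return (f(xp) - f(xm)) / (2 * step)

f = lambda v: np.sum(v ** 3)
x = np.array([0.5, -0.2, 1.5])
analytic = 3 * x ** 2
for i in range(len(x)):
    assert abs(numeric_grad(f, x, i) - analytic[i]) < 1e-3  # threshold
```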
Related diff: D4627379
Reviewed By: kennyhorror
Differential Revision: D4941827
fbshipit-source-id: d6f565340da92af40bfee90627960a3356c69412
Summary:
This is a naive layering approach until we have a better one. It could be C++-based and support diagonal execution. Not integrating into the main LSTM API yet as this might be revised a bit. I would like to land this so we can compare against the current implementation in the benchmark, and also use it as an example of how LSTMs could be combined (as some folks are doing similar things with some variations).
Later we can make LSTM() support the API of layered_LSTM(), and also change it under the hood so that it stacks cells into a bigger cell instead. This way, if we make the RNN op use a kind of DAG net, the RNN op can provide more parallelism in stacked cells.
Reviewed By: urikz
Differential Revision: D4936015
fbshipit-source-id: b1e25f12d985dda582f0c67d9a02508027e5497f
Summary:
This is useful when the data has standalone sequences which are
not connected to each other by any meaningful context.
Reviewed By: yqwangustc
Differential Revision: D4835164
fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
Summary:
ScaleGradient is a helper operator that does no actual numerical computation;
in the gradient computation phase it scales the gradient that is
backpropagated through it.
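A tiny sketch of the semantics (illustrative class, not the Caffe2 operator itself):

```python
import numpy as np

class ScaleGradient:
    """Identity in the forward pass; multiplies the incoming
    gradient by `scale` in the backward pass."""
    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        # No numerical computation: pass the input through unchanged.
        return x

    def backward(self, gy):
        # Scale the gradient flowing back through this op.
        return gy * self.scale

op = ScaleGradient(0.1)
x = np.ones(3)
y = op.forward(x)
gx = op.backward(np.ones(3))
```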
Differential Revision: D4920719
fbshipit-source-id: 0e1e0888f79594be874fdbdda5ccef7389064c50
Summary:
LengthsTile fans rows out from 1 to multiple; the gradient op is simply the reverse,
adding the fanned-out rows of gradients back together into one.
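A numpy sketch of the forward op and its gradient (names are illustrative):

```python
import numpy as np

def lengths_tile(data, lengths):
    # Forward: row i of `data` is repeated lengths[i] times.
    return np.repeat(data, lengths, axis=0)

def lengths_tile_grad(gy, lengths):
    # Backward: sum each fanned-out group of gradient rows back into one.
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    return np.stack([gy[offsets[i]:offsets[i + 1]].sum(axis=0)
                     for i in range(len(lengths))])

data = np.array([[1.0, 2.0], [3.0, 4.0]])
lengths = np.array([2, 3])
tiled = lengths_tile(data, lengths)              # 5 rows
gx = lengths_tile_grad(np.ones((5, 2)), lengths)
# Each input row receives lengths[i] copies of the upstream gradient.
```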
Reviewed By: kittipatv
Differential Revision: D4943375
fbshipit-source-id: deae9984e849974a0d484a10b94efdb1d30941cc
Summary:
Added optional support for sharing activation blobs as well. Making this change revealed a non-optimal implementation in the blob sharing: we need to prefer reusing free blobs that are already shared by many other blobs; otherwise memory usage can increase as the pool of 'free blobs' grows.
Also, my first version only passed "free blobs" (i.e. blobs in the recycling pool) down the first branch when operators forked. Now we pass the blobs that were not used by the first branch down the second branch, and so on.
Also added support for blob size information in the heuristic. This uses the shape inference mechanism.
I also had to make some small tweaks:
- use the Sum() operator as a way to match shapes of blobs that otherwise had unknown shapes. This is related to the Sum() operator that is added to combine multiple incoming gradient inputs (with _autosplit gradients).
- a couple of random shape inference fixes
This reduces the Resnet-50 memory usage on a 64 batch from 9.45 GB to 8.5 GB.
For a 32 batch, the memory usage is 4330 MiB, down from 4800 MB, compared to Torch's 6856 MiB (thanks prigoyal for checking this for me).
This is unfortunately quite a lot to review...
Reviewed By: asaadaldien
Differential Revision: D4393909
fbshipit-source-id: 9c7c94125f96512bea80463ebcb63c215ef95ff9
Summary:
Add a pointwise `IsMemberOf` operator to Caffe2.
The original idea was to call it `In`, but I think that is not as clear.
I used `UnaryElementwiseWithArgsOp` at some point, but it made the code a bit more difficult to read without bringing any feature.
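The pointwise semantics, sketched in numpy (illustrative helper, not the operator code):

```python
import numpy as np

def is_member_of(x, values):
    # Pointwise membership test: out[i] is True iff x[i] is in `values`.
    return np.isin(x, list(values))

x = np.array([0, 2, 5, 7])
out = is_member_of(x, {2, 7, 9})
```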
Reviewed By: ender-wieczorek
Differential Revision: D4912655
fbshipit-source-id: 716b66bb51468dd59db5f76f23d78cda85961b58
Summary:
Two new operators to pack and unpack a dataset. This is so that we can
re-use other operators that do not understand the schema format. The immediate
use case is to use it with a partition operator.
Packing works by splitting the input into separate tensors, putting them in a
vector, and wrapping that in a shared_ptr (as opposed to a unique_ptr, so we can
copy).
Unpack takes the packed input and concatenates it back into the original.
I also had a hard time understanding the iteration, so I created a TreeWalker
that hides the complexity of operating on all the arrays and provides short,
single-purpose functions that, at least for me, are easier to understand.
Reviewed By: dzhulgakov
Differential Revision: D4918002
fbshipit-source-id: ecbf9196ed25e886a94383961176b8c84dde2d2f
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If this is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward-only mode and, instead of creating a workspace for each step, cycles through a single private workspace.
Note: we could avoid doing a lot of work in the recurrent.py:recurrent_network call when the backward step is not needed, but doing that nicely requires more refactoring than I wanted to do now. Thus, we still create the backward step nets etc., but just don't pass them to the op.
This can be used to create more efficient inference models. You can also sanitize existing inference nets by removing the backward_step_net argument to get the benefits.
Reviewed By: salexspb
Differential Revision: D4916482
fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
Summary:
Rename ModelHelperBase to ModelHelper.
This is the result of running:
find . -type f -exec sed -i 's/ModelHelperBase/ModelHelper/g' {} +
We had 19 results when running fbgs ModelHelperBase; there are 20 instances here because I added one test in model_helpers_test.py.
Reviewed By: salexspb
Differential Revision: D4928337
fbshipit-source-id: bc4c12b60b90c167e717de50ea9fe17521e142e3
Summary:
This is getting too messy again, so cleaning it up even more. One thing I added here: not calling random to generate the input sequence. Ideally we would do this for all other inputs; this was reported to be an issue when hypothesis finds bad examples, as it can make the test run very long.
I also tuned the ranges a bit so the test finishes faster. On my devgpu, the whole test took 600 seconds before and now takes 39 seconds.
One more important thing: we want to test all combinations of the things that are in the for loop, while the things provided by hypothesis are just random tensor inputs.
Differential Revision: D4902956
fbshipit-source-id: ceb02d6761406b3192101d3b255abe90b2866770
Summary:
CUDA version of PRelu and its gradient. The forward pass is straightforward; the backward pass requires a reduction over the weights.
tsaizhenling, please patch this and test.
Differential Revision: D4931630
fbshipit-source-id: 1238e7d536e41480713865ced91aaef88f4feef5
Summary:
A simple FindOp for CPU and GPU which searches for a list of unordered needles in an unordered index. The CPU version might be faster if we first sorted the index / needles, but we can get back to that later.
The CUDA op is also somewhat brute-force, but quite parallel. Since the index and the queries are smallish, at least in the use case currently in mind (the Machine Translation team's word candidate search), I think this is a sufficient start.
Note that this is much simpler than the Index class of ops, which allow modifying the index, etc. Since CUDA ops are more complex to implement for the full Index functionality, I decided to make a separate op with this very simple functionality.
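A sketch of the lookup semantics (hash-map based, so neither side needs sorting; the `missing_value` convention here is an assumption for illustration):

```python
import numpy as np

def find(index, needles, missing_value=-1):
    # For each needle, return its position in `index`,
    # or `missing_value` if it is absent. A hash map gives
    # O(N + M) without sorting either side.
    pos = {v: i for i, v in enumerate(index)}
    return np.array([pos.get(n, missing_value) for n in needles])

index = np.array([30, 10, 20])
out = find(index, np.array([10, 40, 30]))
```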
Differential Revision: D4910131
fbshipit-source-id: 6df35c9e3c71d5392a500d5b98fd708ab0c8e587
Summary: Work in progress for improving the performance of the TransposeOp on CPU. This is used extensively for inference in several neural MT systems, so optimizing this function is worthwhile and will reduce request latency.
Differential Revision: D4913075
fbshipit-source-id: fa2742829291d91f3eba00fdfe7d6c0dae83e206
Summary: This is needed for the completeness of random negative sampling. When the pool size is 0, we want to generate an empty indices tensor.
Reviewed By: xianjiec
Differential Revision: D4906866
fbshipit-source-id: 75d66a92d15d60bb37bcd1075d324f28069c4fa0
Summary:
Due to the massive dependencies I did not update the version number: under
the same big version number (2017) the API is compatible, so there is no need
to rebuild all the dependencies.
This will unblock the Caffe2 Intel pull request on MKLDNN.
Differential Revision: D4906463
fbshipit-source-id: 0f74436ac3a05605e35b8b649c3e8b5c1c69b500
Summary: unit test using hypothesis for unmask operator
Reviewed By: ender-wieczorek
Differential Revision: D4904075
fbshipit-source-id: 874d3756ec703ab2cc82f24f7160b4254bf791f1
Summary: This will be used to generate random indices input to `Gather`
Reviewed By: xianjiec
Differential Revision: D4904591
fbshipit-source-id: 8d858631e3d640be2cec12f1566cbf195e6aad4b
Summary:
Two new operators to pack and unpack a dataset. This is so that we can
re-use other operators that do not understand the schema format. The immediate
use case is to use it with a partition operator.
Packing works by splitting the input into separate tensors, putting them in a
vector, and wrapping that in a shared_ptr (as opposed to a unique_ptr, so we can
copy).
Unpack takes the packed input and concatenates it back into the original.
I also had a hard time understanding the iteration, so I created a TreeWalker
that hides the complexity of operating on all the arrays and provides short,
single-purpose functions that, at least for me, are easier to understand.
Reviewed By: dzhulgakov
Differential Revision: D4870606
fbshipit-source-id: dc29428de5c96cc3039af2885d9e4b026d9f482d
Summary: This is a nice way to re-use RNN layers for both training and inference.
Reviewed By: salexspb
Differential Revision: D4825894
fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a
Summary:
This is from a discussion with dzhulgakov: as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.
ajtulloch since we are doing Predictors in mobile, this should be safe right?
azzolini - I assume this would be safe, but would love to get your approval.
akyrola - would this hurt xray?
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary:
Added the possibility to provide 'tiles' and 'axis' as inputs,
as opposed to arguments, for the Tile operator. If provided, the input
values override the argument values.
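A sketch of the override behavior (the argument/input names mirror the description above; the function itself is illustrative):

```python
import numpy as np

def tile(data, tiles=1, axis=0, tiles_input=None, axis_input=None):
    # If the optional runtime inputs are given, they override the
    # `tiles` / `axis` arguments.
    if tiles_input is not None:
        tiles = int(tiles_input)
    if axis_input is not None:
        axis = int(axis_input)
    # Repeat the whole tensor `tiles` times along `axis`.
    return np.concatenate([data] * tiles, axis=axis)

x = np.array([[1, 2], [3, 4]])
by_args = tile(x, tiles=2, axis=0)                       # shape (4, 2)
by_input = tile(x, tiles=3, axis=1, tiles_input=np.array(2))  # input wins: (2, 4)
```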
Differential Revision: D4794432
fbshipit-source-id: a7e38f4f925a4cedf530924bd426c3bb08b5aad8
Summary:
Implement a new op, ElementwiseLinear.
Given an input X of size (N x D), a of size D, and b of size D,
the op computes Y of size (N x D) where Y_{nd} = X_{nd} * a_d + b_d.
Typically this op is followed by the SigmoidCrossEntropyWithLogits op for multi-label classification problems.
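The formula above maps directly onto numpy broadcasting (a one-line sketch, not the op code):

```python
import numpy as np

def elementwise_linear(X, a, b):
    # Y[n, d] = X[n, d] * a[d] + b[d]; a and b broadcast across rows.
    return X * a + b

np.random.seed(1)
N, D = 3, 4
X = np.random.randn(N, D)
a = np.random.randn(D)
b = np.random.randn(D)
Y = elementwise_linear(X, a, b)
```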
Differential Revision: D4892220
fbshipit-source-id: 77bffc5fbe03d48b3d83ab785f7c24a71c952aec
Summary:
This allows us to do in-place ReLU and also corrects the previous
inconsistency between the cudnn impl and the non-cudnn impl.
This implementation butchers the cudnn interface, in the sense that we pass
in the output instead of the input for the gradient pass. We do have a
gradient checker to guard against this situation, so we should be safe.
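Why passing the output works: for ReLU the mask x > 0 equals y > 0, so the gradient can be computed from the output alone, which is exactly what makes the in-place version possible. A numpy sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def relu_grad_from_output(gy, y):
    # Since y = max(x, 0), the set where x > 0 is the set where y > 0,
    # so the input x is not needed for the backward pass.
    return gy * (y > 0)

x = np.array([-1.0, 0.5, 2.0])
y = relu(x)
gx = relu_grad_from_output(np.ones(3), y)
```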
Reviewed By: asaadaldien
Differential Revision: D4889426
fbshipit-source-id: 081f8fe06de78413b5786086bfd5ae6c8128cd6e
Summary: Add an option to bias the forget gate one way or the other by adding a float value before the sigmoid is applied.
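The effect of such a bias, sketched on a hypothetical forget-gate pre-activation (a positive bias shifts the gate toward remembering, a negative one toward forgetting):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation of the forget gate (e.g. Wx + Uh + b).
pre = np.array([-0.5, 0.0, 0.5])
gate_default = sigmoid(pre)
gate_biased = sigmoid(pre + 1.0)  # forget bias = 1.0, applied before the sigmoid
```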
Differential Revision: D4880712
fbshipit-source-id: 1306a97c29fb31630838b2f96597a46e952d940a
Summary:
CopyGPUToCPU and CopyCPUToGPU need to handle gradients that arrive sparse. Added a unit test and fixed the gradient makers to create copies for both values and indices.
This becomes less important once the GPU sparse parameter update ops land, but it is good to fix nevertheless.
Reviewed By: dzhulgakov
Differential Revision: D4882327
fbshipit-source-id: aafd2df46b3e1bcb30b52b1edf40fad8271f1f88
Summary:
These GPU paths are probably even buggier than the CPU paths for sparse gradients with duplicate indices. Both paths cause multiple momentum updates in a single iteration, but only the GPU path is non-deterministic. Depending on how we decide to address the issues on the CPU path, pooyadavoodi has a good idea for how to match dense behavior with the sparse GPU ops.
Closes https://github.com/caffe2/caffe2/pull/254
Reviewed By: bwasti
Differential Revision: D4871680
Pulled By: dzhulgakov
fbshipit-source-id: 220be57a0f699a22ea85ed4f7022d92d362d06b3
Summary: making the name a bit clearer
Reviewed By: xianjiec
Differential Revision: D4866940
fbshipit-source-id: 3e0f7067a9d3ba89cb038d85c1991e541f1e439c
Summary:
A length-aware gather operator. This will be used for random negative sampling; see the task for details.
It should be equivalent to:
LengthsToRange + Gather + Reshape + GatherRanges
That's pretty complicated.
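A numpy sketch of what "length-aware gather" means here: gather whole length-delimited groups by group index and concatenate them (the helper name is illustrative):

```python
import numpy as np

def lengths_gather(items, lengths, indices):
    # `lengths` partitions `items` into groups; gather the groups
    # selected by `indices` and concatenate them.
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    return np.concatenate([items[offsets[i]:offsets[i + 1]] for i in indices])

items = np.arange(6)           # groups: [0 1] [2] [3 4 5]
lengths = np.array([2, 1, 3])
out = lengths_gather(items, lengths, np.array([2, 0]))
```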
Differential Revision: D4846023
fbshipit-source-id: 8d9b7ff3eddc75a7ab147cd1c2a12f377652df93
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so that they don't need to be stored in the step workspaces. This is done by modifying the backward step to automatically include all operators needed to produce the outputs that are to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace; making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.
For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage may vary). For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.
For basic LSTM, the memory saving is wonderful: each forward workspace only holds 4 bytes (for the timestep).
I also modified the neural_mt LSTM cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.
Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.
Reviewed By: urikz
Differential Revision: D4853890
fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
Summary:
Add necessary ops for feature processing:
* logit op
* replace-NaN op
* batch one-hot op
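Minimal numpy sketches of the three ops' semantics (function names and the clipping epsilon are illustrative, not the Caffe2 schemas):

```python
import numpy as np

def logit(p, eps=1e-6):
    # Log-odds; clipping keeps the result finite at p = 0 and p = 1.
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def replace_nan(x, value=0.0):
    # Replace NaN entries with a fixed value.
    return np.where(np.isnan(x), value, x)

def batch_one_hot(indices, num_classes):
    # One-hot encode a batch of integer labels.
    out = np.zeros((len(indices), num_classes))
    out[np.arange(len(indices)), indices] = 1.0
    return out
```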
Reviewed By: kittipatv
Differential Revision: D4840869
fbshipit-source-id: 197123ea5608d54f0b5ac7899973a077a6a86775