Summary:
We need a warm-up stage because otherwise the first iteration
spends too much time doing all the allocations.
Reviewed By: akyrola
Differential Revision: D4986201
fbshipit-source-id: f60a75520988ff3f1540bb157cdc69634f307db4
Summary:
Layer to allow a model to follow different paths for each instantiation context and join them later. Together with the tagging system cleanup (which is a separate issue), this should reduce the need to write a layer just to differentiate between contexts.
Re: tagging system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.
Reviewed By: kennyhorror
Differential Revision: D4964949
fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
Summary: Make NCCL optional in data_parallel_model due to continuing reliability (deadlock) issues.
Reviewed By: pietern
Differential Revision: D4988950
fbshipit-source-id: 8a2192f01b5f3c0e847137cd37aefc69e553a56f
Summary:
RFC. This is a naive implementation of a Rebatching Queue for the MultiTask
effort. Full disclaimer: I'm very new to Caffe/machine learning and I'm doing
dodgy science here (under Dmytro's supervision), so please be extra tough on
this review so I can learn best practices :)
Differential Revision: D4871970
fbshipit-source-id: 924820ef0fce45b5e2bdabeec9885cbafa23a880
Summary: I ran into this earlier and the debug messages were not helpful enough.
Reviewed By: kennyhorror
Differential Revision: D4985754
fbshipit-source-id: b3d12b5e2cfa1b54fca9126768c84c902664ef28
Summary:
When appending net A to net B, an external input of net A should not be added as
an external input of net B if net B is outputting that blob.
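A minimal sketch of the intended behavior (hypothetical net/blob names; assumes the standard core.Net API):
```
from caffe2.python import core

# Net B produces blob "x".
net_b = core.Net("net_b")
net_b.ConstantFill([], "x", shape=[1], value=1.0)

# Net A declares "x" as an external input and consumes it.
net_a = core.Net("net_a")
net_a.AddExternalInput("x")
net_a.Copy("x", "y")

# After appending, "x" should not show up among net B's external inputs,
# since net B already outputs it.
net_b.AppendNet(net_a)
assert "x" not in [str(b) for b in net_b.Proto().external_input]
```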
Reviewed By: dzhulgakov
Differential Revision: D4975921
fbshipit-source-id: a5c0ada7b96d851e57d345244d322dd93c7be8e4
Summary: In certain situations, like in D4907916 where we insert an additional step in the middle of a model, it's necessary to keep the blob names constant across model helpers so that the communication schema doesn't break.
Reviewed By: kennyhorror
Differential Revision: D4981527
fbshipit-source-id: 6b8d6d240279dd48f801cfacbaa1d320ba54d694
Summary: Integration of the CRF layer in DeepText word models + implementing the Viterbi decode operator in C++ instead of Python so that the CRF models can be deployed in production.
Differential Revision: D4912196
fbshipit-source-id: 64f499a1bd47e811e7a96dde839904dcd05cacb3
Summary: Calling `set()` or `set_value()` on Scalar is dangerous as something might be holding a reference to it. This is especially true with `LayerModel`, where instantiation is delayed. The code may still run, but it will produce unexpected results, i.e., values may be written to the wrong blob.
Reviewed By: kennyhorror
Differential Revision: D4955366
fbshipit-source-id: f5e8694a9a411ee319ca9f39a0fed632d180b8a5
Summary:
This is preamble for the "diagonal executor". Instead of creating a Net for each timestep, we have a single executor for the RecurrentNetworkOp that manages ops per timestep.
This will be used if net_type='rnn', so one can still use the old way by using a net type of 'simple' or 'dag' (there is an effective kill switch if there are issues with this).
Did this only for the forward model. The gradient op will follow later, but it is basically similar, just in reverse order.
Reviewed By: salexspb
Differential Revision: D4979933
fbshipit-source-id: bda77918ec518cb6b29d7021ee036d59eb2dd303
Summary:
It turned out that we cannot run PackedFC on a machine that does not have AVX2 right now, as there is a known issue with MKL 2017.0.098 that produces wrong results on non-AVX2 machines.
I just moved this test out of here because this is not the purpose of this test.
Reviewed By: salexspb
Differential Revision: D4974021
fbshipit-source-id: c5b82a41021defc9946a8219f59b28abb13d3beb
Summary: Previously, the code below would go out of bounds.
Reviewed By: xianjiec
Differential Revision: D4968037
fbshipit-source-id: 3760e2cddc919c45d85ac644ac3fabf72dbaf666
Summary:
Implement NormalizeOp for GPU using CUDA, and rewrite the gradient to be a function of the output
so it's more efficient, especially for the CUDA implementation.
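For reference, a NumPy sketch of the rewritten gradient (row-wise L2 normalization; the axis choice is illustrative):
```
import numpy as np

def normalize_forward(x):
    # y = x / ||x||, row-wise
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / norm, norm

def normalize_gradient(dy, y, norm):
    # Gradient expressed in terms of the output y instead of the input x:
    # dx = (dy - y * sum(y * dy)) / ||x||
    return (dy - y * np.sum(y * dy, axis=1, keepdims=True)) / norm
```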
Reviewed By: akyrola
Differential Revision: D4971300
fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167
Summary: As in the title + added scuba logging of the results.
Reviewed By: andrewwdye
Differential Revision: D4974261
fbshipit-source-id: 3e05b97133be95ffe37c8bcafd8a5a6bf3e7da93
Summary: Only a CPU impl was available until now. Wrote simple CUDA kernels.
Reviewed By: akyrola
Differential Revision: D4577736
fbshipit-source-id: c2540aa9d332fcdeac46cc7f89aab164d107d7a8
Summary: Both SquaredL2Distance and SquaredL2DistanceGradient had bad CUDA implementations. Use proper reductions and batched kernels.
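As a reference for the batched kernels, a NumPy sketch of the expected semantics (assumes the 0.5 * ||X - Y||^2 per-row convention):
```
import numpy as np

def squared_l2_distance(X, Y):
    # One scalar distance per row.
    return 0.5 * np.sum((X - Y) ** 2, axis=1)

def squared_l2_distance_gradient(dDist, X, Y):
    # dDist has one entry per row; broadcast it over the feature dimension.
    dX = dDist[:, None] * (X - Y)
    return dX, -dX
```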
Reviewed By: asaadaldien
Differential Revision: D4968527
fbshipit-source-id: f7cf82072d38bc127c757c5751863a9439aca8b5
Summary: Implement CPU and GPU gradient for Leaky ReLU op.
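For reference, a NumPy sketch of the gradient being implemented (the alpha value is illustrative):
```
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_gradient(dy, x, alpha=0.01):
    # Pass the gradient through where x > 0, scale it by alpha elsewhere.
    return np.where(x > 0, dy, alpha * dy)
```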
Differential Revision: D4943905
fbshipit-source-id: 541f13cd5f274a18b69ecf1362722b1bc0105ad9
Summary:
Instance norm failed grad check in some cases and needed a smaller step size. Decreased the step size, but also increased the threshold slightly.
Related diff: D4627379
Reviewed By: kennyhorror
Differential Revision: D4941827
fbshipit-source-id: d6f565340da92af40bfee90627960a3356c69412
Summary:
This is a naive layering approach until we have a better
one. It could be C++ based and support diagonal execution. Not integrating into the main LSTM API yet as this might be revised a bit. Would like to land this so we can compare against the current implementation in the benchmark and also use it as an example of how LSTMs can be combined (as some folks are doing similar things with some variations).
Later we can make LSTM() support the API of layered_LSTM() and also change it under the hood so it stacks cells into a bigger cell instead. This way, if we make the RNN op use a kind of DAG net, the RNN op can provide more parallelism in stacked cells.
Reviewed By: urikz
Differential Revision: D4936015
fbshipit-source-id: b1e25f12d985dda582f0c67d9a02508027e5497f
Summary:
This is useful when data has standalone sequences which are
not connected to each other by any meaningful context
Reviewed By: yqwangustc
Differential Revision: D4835164
fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
Summary:
A new argument `blob_name_overrides` is added to specify the
destination names of loaded blobs (so that they can have different names than
in the saved file/db).
This will be used for parameter initialization from a pretrained model
in Dper 2. When loading a blob, we need to avoid name collisions by assigning the
loaded blob a new (temporary) name.
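A hedged sketch of the intended usage (the exact call shape, db path, and db type below are assumptions for illustration):
```
from caffe2.python import core, workspace

# Load blobs stored in the db as "w" and "b", but bind them to temporary
# names in the workspace so they don't collide with the live model's blobs.
load_op = core.CreateOperator(
    "Load",
    [], ["w", "b"],                          # names as stored in the db
    db="/path/to/pretrained_model", db_type="minidb",
    blob_name_overrides=["tmp/w", "tmp/b"],  # destination names
)
workspace.RunOperatorOnce(load_op)
```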
Reviewed By: xianjiec
Differential Revision: D4952485
fbshipit-source-id: 4ce79bf40223314bb94981c22cbe537ae3f3d27c
Summary:
Free scratch blobs at data workers' exit. Also add a utility function that you can use to reset gradient blobs easily:
```
from caffe2.python import utils
grad_blobs = [b for b in workspace.Blobs() if b.endswith("_grad") or b.endswith("_shared")]
utils.ResetBlobs(grad_blobs)
```
Reviewed By: rpenggithub
Differential Revision: D4955531
fbshipit-source-id: d33b2bb2b5247dd2c4cff51c82b1257c871a4179
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which is unnecessary. This diff removes them from the eval net.
Differential Revision: D4934589
fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
Summary: Now you can call coordinator.stop_coordinator("train") to stop the train model's data input and release its memory.
Reviewed By: rpenggithub
Differential Revision: D4955014
fbshipit-source-id: c1bc3ec67337b94aff8ea9b306c3b4158eeef42c
Summary:
The _param_init_net does not exist. All the other places reference
param_init_net instead. So far no one has encountered any problem
because all the passed params are BlobReferences. This diff makes
this assumption explicit.
Reviewed By: azzolini
Differential Revision: D4922930
fbshipit-source-id: e6dbd7a29ea640b7e62fcfec7ced3cc7d149f872
Summary:
ScaleGradient is a helper operator that does no actual numerical computation
in the forward pass; in the gradient computation phase it scales the gradient
being computed through it.
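A minimal usage sketch (the `scale` argument name is an assumption):
```
from caffe2.python import core

net = core.Net("example")
# Forward: "y" is just "x" passed through unchanged.
# Backward: the gradient flowing into "y" is multiplied by 0.5 before it
# continues toward "x".
net.ScaleGradient("x", "y", scale=0.5)
```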
Differential Revision: D4920719
fbshipit-source-id: 0e1e0888f79594be874fdbdda5ccef7389064c50
Summary:
Issue is that AliasOp doesn't work well with swaps that we do for
param.grad and param.accGrad. Tensors become the same if there is no
reallocation of the gradient tensor inside the backward cell net's
local workspace.
bug explanation from akyrola:
```
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad: tensor A
on each timestep back to 0, we Alias
gpu_0/decoder/weighted_encoder_outputs_grad,
so then also
gpu_0/decoder/weighted_encoder_outputs_grad: tensor A
It's acc is:
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor B
Now after timesteps, we swap (line 626) with _acc to get
gpu_0/decoder/weighted_encoder_outputs_grad: tensor B
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A
OPTION A -- batch size is same as before or smaller:
Then on next iteration, we do again the Alias to
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad, so now
gpu_0/decoder/weighted_encoder_outputs_grad: tensor A
and also
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A
swapping them does nothing and they are the same
OPTION B -- batch size increases
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad is reallocated,
becomes tensor C
gpu_0/decoder/weighted_encoder_outputs_grad becomes tensor C with
Alias
gpu_0/decoder/weighted_encoder_outputs_grad_acc: is tensor A
```
Reviewed By: urikz
Differential Revision: D4946730
Tags: rnn, caffe2
fbshipit-source-id: b52d63cb238b81d2ad40e05e70deb32a81336f47
Summary: A layer that takes raw ids as inputs and outputs the indices which can be used as labels. The mapping will be stored with the model.
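Conceptually (a plain-Python sketch, not the layer's actual implementation), the mapping behaves like:
```
# Map arbitrary raw ids to dense indices; the mapping is persisted with the
# model so the same id always maps to the same index later on.
id_to_index = {}

def map_ids(raw_ids):
    indices = []
    for raw_id in raw_ids:
        if raw_id not in id_to_index:
            id_to_index[raw_id] = len(id_to_index)
        indices.append(id_to_index[raw_id])
    return indices

print(map_ids([1001, 42, 1001, 7]))  # e.g. [0, 1, 0, 2]
```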
Reviewed By: kittipatv
Differential Revision: D4902556
fbshipit-source-id: 647db47b0362142cdba997effa2ef7a5294c84ee
Summary:
Adding add_weight_decay and image_input to the brew module & removing `getWeights` and `getBias` from CNNModelHelper.
With fbgs `useWeights`, the results show that no one but add_weight_decay is using this function. I checked with the Oculus people; their getWeights is a different function.
kennyhorror, please check whether this is going to affect you :)
Reviewed By: salexspb
Differential Revision: D4945392
fbshipit-source-id: 4ef350fd81dd40a91847e9f3ebc5421eb564df32
Summary: Printing resnet training loss and accuracy for each batch so that people will have a better idea of what is going on.
Reviewed By: pietern
Differential Revision: D4945390
fbshipit-source-id: 0fcd60f4735e81641355aba6e6cbf0e57e886e38
Summary:
lengthTile goes from 1 row to multiple rows; the gradient op is simply the reverse,
adding the fanned-out rows of gradients back together into 1.
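A NumPy sketch of the forward/backward pair (assumes all lengths are positive):
```
import numpy as np

def lengths_tile(data, lengths):
    # Forward: repeat row i of `data` lengths[i] times.
    return np.repeat(data, lengths, axis=0)

def lengths_tile_gradient(d_output, lengths):
    # Backward: sum each fanned-out group of gradient rows back into one row.
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    return np.add.reduceat(d_output, offsets, axis=0)
```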
Reviewed By: kittipatv
Differential Revision: D4943375
fbshipit-source-id: deae9984e849974a0d484a10b94efdb1d30941cc
Summary:
Added optional support for using activation blobs for sharing as well. Doing this change revealed a non-optimal implementation in the blob sharing: we need to prefer reusing free blobs that are already shared by many other blobs. Otherwise the memory usage can increase when the pool of 'free blobs' grows.
Also, my first version only passed "free blobs" (i.e., blobs in the recycling pool) down the first branch when operators forked. But now we pass those blobs that were not used by the first branch down the second branch, and so on.
Also added support for blob size information in the heuristic. This uses the shape inference mechanism.
I had to also do some small tweaks:
- use the Sum() operator as a way to match shapes of blobs that had otherwise unknown shapes. This is related to the Sum() operator that is added to combine multiple incoming gradient inputs (with _autosplit gradients).
- a couple of random shape inference fixes
This reduces the Resnet-50 memory usage on a 64 batch from 9.45 GB to 8.5 GB.
For a 32 batch, the memory usage is 4330 MiB, down from 4800 MB, compared to Torch's 6856 MiB (thanks prigoyal for checking this for me).
This is unfortunately quite a bunch to review...
Reviewed By: asaadaldien
Differential Revision: D4393909
fbshipit-source-id: 9c7c94125f96512bea80463ebcb63c215ef95ff9
Summary:
This diff contains the following changes:
- implementing __repr__ on Field types; this makes it a little easier to see what's broken in the unit tests
- preserving the shape of ndarray input to schema; previously, empty and scalar arrays lost their shape, while others kept theirs
- type-checking ndarray input; this ensures basic integrity of the schema
Reviewed By: xianjiec
Differential Revision: D4913030
fbshipit-source-id: bd0f6b8722d95bfe800edf98ba05029c5b99d2af
Summary:
This PR is based on commit "977c6b3" as this version allows MKL to use all the cores available.
All MKL-related files are added here after incorporating review comments; major changes include:
1. usage of Clang-format (linter) with --style=Google
2. usage of macros for checking input and filter dimensions in the MKL operators
3. merged Max and Average pooling functions
4. created a new folder for MKL-related Python scripts in the Python folder and moved them there
5. there is no mkl_alexnet_test.py, as that was redundant; convnet_benchmark.py does the same thing
Closes https://github.com/caffe2/caffe2/pull/270
Differential Revision: D4905219
Pulled By: Yangqing
fbshipit-source-id: e5f5b189714a835b93b9ebda24c52e09572dfca7
Summary:
If an exception is thrown inside the namescope, it won't be reset to
its previous value. This diff changes this behavior to the expected one.
Reviewed By: kittipatv
Differential Revision: D4928621
fbshipit-source-id: 1d3579f2093ca60901b0d37ae3f2108deb2333ea
Summary: Instead of requiring gradient updates to happen on GPU, this change allows the usage where loss computation happens on GPU while all grad updates happen on CPU.
Reviewed By: jhcross
Differential Revision: D4943996
fbshipit-source-id: 1f2144c4277dfdb865877e0d0216ca1ac7dd7309
Summary:
Add a pointwise `IsMemberOf` operator to Caffe2.
The original idea was `In`, but I think that is not as clear.
I used `UnaryElementwiseWithArgsOp` at some point, but it made the code a bit more difficult to read without adding any functionality.
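In NumPy terms (a reference sketch, not the operator's implementation), the semantics are:
```
import numpy as np

values = [2, 5, 9]                    # the membership set, passed as an op argument
x = np.array([1, 2, 3, 5, 9, 10])
print(np.isin(x, values))             # [False  True False  True  True False]
```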
Reviewed By: ender-wieczorek
Differential Revision: D4912655
fbshipit-source-id: 716b66bb51468dd59db5f76f23d78cda85961b58