Commit graph

419 commits

Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Andrey Malevich
7cc92b1260 Add eval net for layer_model_helper
Summary:
This diff adds eval nets to the layer model helper. It should be useful for
cases where the train/eval nets need some extra input (usually some supervision)
for train/eval, for example various sampled layers.

Differential Revision: D4769453

fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
2017-03-29 04:03:40 -07:00
Fei Sun
95657ea1e8 Protobuf is binary string. Use bytes instead.
Summary: Prepare for the Protobuf change.
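
As a minimal illustration of why this matters (assuming Python 3 semantics), protobuf serialization yields raw bytes, so the surrounding code should handle bytes rather than text strings:

```
from caffe2.proto import caffe2_pb2

net_def = caffe2_pb2.NetDef()
serialized = net_def.SerializeToString()  # returns bytes, not str
assert isinstance(serialized, bytes)
```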

Reviewed By: dzhulgakov

Differential Revision: D4784884

fbshipit-source-id: 86219eecefaf7637e70339437c9274c526ebd6fe
2017-03-28 19:03:23 -07:00
Aapo Kyrola
fd2835887b only resize stepWorkspaces when sequence length increases
Summary:
We should resize the workspace vector only when the sequence length increases. Otherwise we end up constantly destroying and recreating workspaces whenever the sequence length varies.

Modified the lstm_benchmark test to randomize the sequence length.

This provides a big perf improvement to the machine translation pipeline. Look at the recurrent network op runtimes.

WITH:
I0328 12:17:54.073976 492094 prof_dag_net.cc:156]    136.271 ms/iter (   120.987 ms/iter) RecurrentNetwork
I0328 12:17:54.073982 492094 prof_dag_net.cc:156]    190.074 ms/iter (   156.828 ms/iter) RecurrentNetworkGradient

WITHOUT:
I0328 12:25:17.658206 518884 prof_dag_net.cc:156]    375.369 ms/iter (   249.268 ms/iter) RecurrentNetwork
I0328 12:25:17.658211 518884 prof_dag_net.cc:156]    278.892 ms/iter (    227.29 ms/iter) RecurrentNetworkGradient

With the LSTM benchmark, we get about a 2x speedup.
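
A minimal sketch of the grow-only policy (illustrative Python; the real logic lives in the C++ RecurrentNetwork op, and the names here are assumptions):

```
class StepWorkspace:
    """Stand-in for a per-timestep RecurrentNetwork workspace."""

def ensure_step_workspaces(step_workspaces, seq_len):
    # Grow the vector when a longer sequence arrives, but never shrink it;
    # shrinking would destroy workspaces that a later long sequence would
    # just have to recreate.
    while len(step_workspaces) < seq_len:
        step_workspaces.append(StepWorkspace())
    return step_workspaces
```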

Reviewed By: jamesr66a

Differential Revision: D4789354

fbshipit-source-id: ad72f61974e35b0474abcacdc466ae9c6b4eb0ff
2017-03-28 14:08:00 -07:00
Bor-Yiing Su
a03d956b56 Fixes the flaky test. Although we create nets in three different nodes,
Reviewed By: azzolini

Differential Revision: D4788418

fbshipit-source-id: bdf90c5674b5dbb8b3bda21cf85ea33fedb36fa6
2017-03-28 13:48:07 -07:00
Ahmed Taei
f2b8150a1a Fix PadImage same padding argument.
Summary: PadImage has no kernel parameters, which resulted in the pads_ parameters not being set (left at 0). I added a test case too.

Differential Revision: D4785230

fbshipit-source-id: fd475e7c41208e07fa7a363def9a45c6f82cddfe
2017-03-28 13:21:36 -07:00
Alexander Sidorov
939daa3d99 gradient checker for nets
Summary: This is useful for testing RNN cells.

Reviewed By: dzhulgakov

Differential Revision: D4720641

fbshipit-source-id: baa7df43357ed8af72ede64be3e0a642a40472df
2017-03-28 13:03:14 -07:00
Aapo Kyrola
1ed746df45 BatchMatMulOp: use cuBLAS batched strided gemm for CUDA
Summary:
Instead of doing gemms in a for-loop (which is not parallelized), it is much better to do the batched matmuls using CUDA 8's new batched-strided version of gemm.

With the MT team's test, we get a 5-10% improvement in overall walltime, so it is a significant improvement:

----

Without batched gemm:

I0328 10:46:48.118605 58068 prof_dag_net.cc:136]    424.757 ms/iter (   283.878 ms/iter) RecurrentNetwork
I0328 10:46:48.118609 58068 prof_dag_net.cc:136]    352.603 ms/iter (    265.85 ms/iter) RecurrentNetworkGradient

With batched gemm:
I0328 10:53:48.169996 85617 prof_dag_net.cc:136]    407.438 ms/iter (   269.564 ms/iter) RecurrentNetwork
I0328 10:53:48.169999 85617 prof_dag_net.cc:136]    322.393 ms/iter (   287.625 ms/iter) RecurrentNetworkGradient
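
A numpy analogue of the change, for intuition only (the real kernel is cuBLAS's strided batched gemm):

```
import numpy as np

batch, m, k, n = 4, 2, 3, 5
a = np.random.randn(batch, m, k).astype(np.float32)
b = np.random.randn(batch, k, n).astype(np.float32)

# Old path: one gemm per batch element, launched serially.
looped = np.stack([a[i] @ b[i] for i in range(batch)])

# New path: a single batched call computes all products at once.
batched = np.matmul(a, b)

assert np.allclose(looped, batched, atol=1e-5)
```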

Reviewed By: jamesr66a

Differential Revision: D4788272

fbshipit-source-id: 210e8b94c1e036b6ef0f039ce000d455258651f4
2017-03-28 11:54:09 -07:00
Alexander Sidorov
242bff8480 RNN: avoid copy for gradients of inputs to the rnn cell and save more memory!
Summary:
This is pretty tricky to explain, but we can just use
backward_links. This way the whole cell uses a blob from the
states_grad tensor instead of having its own blob. This should
also save a bit of memory.

Differential Revision: D4770798

fbshipit-source-id: 673f85b2c2fdf42c47feeaa24d1e2bf086f012f9
2017-03-28 10:02:25 -07:00
Jerry Pan
327d3cb2b5 Caffe2: add init method and metric logging to data loader
Summary: Caffe2: add init method and metric logging to data loader

Differential Revision: D4685665

fbshipit-source-id: c4e0a09ab6a90c26c329f731f261cba8af1d6bbd
2017-03-28 08:48:27 -07:00
Jerry Pan
78f0b35949 Caffe2: CUDA implementation for LeakyReluOp
Summary: Caffe2: CUDA implementation for LeakyReluOp

Reviewed By: asaadaldien

Differential Revision: D4782336

fbshipit-source-id: 402eace695307b62740c918660d9e521217e928a
2017-03-28 08:48:25 -07:00
James Cross
b41449b680 SparseMomentumSGDUpdateOp
Summary: Creates SparseMomentumSGDUpdate, a sparse version of MomentumSGDUpdate, to make that optimization method (via in-place updating operator) compatible with GradientSlices.

Differential Revision: D4784973

fbshipit-source-id: e6330f471a4d5f53589a6ac245e38f256ca7f354
2017-03-28 07:47:46 -07:00
Kittipat Virochsiri
da36212259 SamplingTrain layer
Summary:
The `SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces the sparse output of the wrapped layer. The output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced.

This is like the SampledFC function in the model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both training and prediction contexts.

I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This could probably be accomplished in some other way, but I think this is not too bad.
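
A rough sketch of the context-dependent behavior described above (all names here are illustrative assumptions, not the actual layer API):

```
class SamplingTrain:
    def __init__(self, wrapped_layer, context):
        self.wrapped = wrapped_layer  # subclasses SamplingTrainableMixin
        self.context = context        # 'training' or 'prediction'

    def output(self, sampled_indices):
        if self.context == 'training':
            # Sparse output over the sampled indices; pairing it with the
            # indices yields something Map-schema-like.
            return sampled_indices, self.wrapped.sampled_output(sampled_indices)
        # Prediction context: the full output of the wrapped layer.
        return self.wrapped.full_output()
```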

Reviewed By: xianjiec

Differential Revision: D4689887

fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
2017-03-27 23:31:55 -07:00
Minsuk (Brian) Kahng
ebeb36f6ee Refactoring, t-sne, additional features
Summary:
t-SNE projection of instance activations
Minor refactorings

Reviewed By: Mortimerp9

Differential Revision: D4752784

fbshipit-source-id: f5cdb74616ab8e00f9ec362c0b94bcf7988e680e
2017-03-27 20:33:20 -07:00
Yury Zemlyanskiy
0c47d345df Multi-gpu training for OSS seq2seq
Summary:
Use data_parallel_model for seq2seq multi-gpu training. The main reason for complexity here is that GatherOp hasn't yet been implemented on GPU.

This diff also adds a better clipping procedure: clip by global norm rather than by absolute value.
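
For reference, clipping by global norm rescales all gradients jointly rather than clamping each element (a minimal numpy sketch, not the diff's actual code):

```
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # global_norm = sqrt(sum_i ||g_i||^2); if it exceeds clip_norm,
    # scale every gradient by the same factor so directions are preserved.
    global_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads]
```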

Differential Revision: D4778691

fbshipit-source-id: bff184dae02ecc227413fef51f48a4726e5d3825
2017-03-27 17:32:39 -07:00
Fei Sun
3ddcff659d Move AddPlan, AddNet, AddBlobs to predictor_py_utils.py
Summary: Cleanup

Reviewed By: salexspb

Differential Revision: D4775061

fbshipit-source-id: b58405729227a6e3fd867d9d5ba959feaa99e5a6
2017-03-27 11:03:22 -07:00
Jerry Pan
ee28b6ce22 Caffe2: instrument Everstore loader
Summary: Caffe2: instrument Everstore loader and log to Scuba

Differential Revision: D4669060

fbshipit-source-id: 603256e4ba62a32d9aeadc409f83ef9b1f6a7358
2017-03-27 10:02:11 -07:00
Bor-Yiing Su
7fa4acab9b Loads only the model blobs from the checkpoints.
Summary:
To evaluate from checkpoints, we need to load a model from the checkpoints.
However, the checkpoints store far more blobs than the model needs. This
function enables the model builder to load only the blobs associated with the
model into the workspace. After that, the model builder can evaluate the model
from the populated workspace.
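
The idea, sketched in Python (names are assumptions; the real implementation works against the checkpoint and workspace APIs):

```
def load_model_blobs(checkpoint_blobs, model_blob_names, feed_blob):
    # The checkpoint holds many extra blobs (optimizer state, counters, ...);
    # feed only the blobs the model itself declares.
    wanted = set(model_blob_names)
    for name, value in checkpoint_blobs.items():
        if name in wanted:
            feed_blob(name, value)
```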

Reviewed By: azzolini

Differential Revision: D4751414

fbshipit-source-id: a7a420228d681fc2dcfd8573cf69a97b1abc2ef3
2017-03-27 10:02:11 -07:00
Kittipat Virochsiri
6163676ebe Skip optimizer when param doesn't have gradient and optimizer is not set
Summary: Currently, we cannot have layer constants because layer params are required to have a gradient and an optimizer. Global constants don't cut it here because each can only be added once; therefore, a layer that adds any global constant can only be used once.
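
The decision boils down to something like this (illustrative sketch, not the real code):

```
def should_apply_optimizer(param):
    # Layer constants have neither a gradient nor an explicit optimizer;
    # skip them instead of failing.
    if param.gradient is None and param.optimizer is None:
        return False
    return True
```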

Differential Revision: D4773212

fbshipit-source-id: 5b60d31f3c1602afb04b61f6d30b8e3e06ed2de3
2017-03-24 22:18:34 -07:00
Kevin Waugh
eea0ea7712 Struct nested field name lookup supports List
Summary:
D4690225 added support for nested field name lookup in nested
`schema.Struct`s, but it would throw a KeyError when trying to access a nested
`List`'s field. Writing the lookup recursively avoids the need to enumerate
all complex field types in the lookup.
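
A toy version of the recursive lookup (simplified stand-ins for schema.Struct/schema.List, not the real classes; caffe2's schema joins nested names with ':'):

```
class Node:
    def __init__(self, **children):
        self.children = children  # works for Struct- and List-like nodes

def nested_lookup(node, dotted_name):
    head, _, rest = dotted_name.partition(':')
    child = node.children[head]
    return nested_lookup(child, rest) if rest else child

root = Node(user=Node(ids=Node()))
assert nested_lookup(root, 'user:ids') is root.children['user'].children['ids']
```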

Differential Revision: D4719755

fbshipit-source-id: 37c87a32d730f0f45f72fb20894da3e32f820999
2017-03-24 18:17:19 -07:00
Deepak Gopinath
6aee34b666 Registering GPU version of PackSegments using GPUFallbackOp
Summary: Creating PackSegments and UnpackSegments GPU operators using GPUFallbackOp for now. The op mainly copies blobs, so this is a reasonable solution until we have a dedicated CUDA op.

Reviewed By: pietern

Differential Revision: D4761589

fbshipit-source-id: dd483b9e34ecb6b53925405e5b4c24859c549606
2017-03-24 16:01:53 -07:00
Xiaolong Wang
8ce34d6c87 Add Calibration
Summary: Add calibration to sparse_nn

Differential Revision: D4735564

fbshipit-source-id: 6baa637cbffcbbd50134a256d622ef8c962fca3b
2017-03-24 14:32:23 -07:00
Alisson Gusatti Azzolini
b711c7d039 More perf stats for BlobsQueue
Summary: Allows drilling down on data throughput, both overall and per field.

Reviewed By: dzhulgakov

Differential Revision: D4622168

fbshipit-source-id: 1462bb2fac05824fda0c02f4f5f0b8713893e650
2017-03-24 14:03:28 -07:00
Fei Sun
29c1102806 Extract net and blobs assignment to separate functions
Summary:
Use AddNet and AddBlobs to add net and blobs to meta_net_def.
This is a codemod and does not change the functionality.
It is in preparation for the protobuf change.
Depends on: D4770648

Reviewed By: salexspb

Differential Revision: D4771110

fbshipit-source-id: 00cecb2105f2c332bd50c3c51b9a10e1004fa90f
2017-03-24 13:17:24 -07:00
Luke Yeager
0ade0578b1 Reset workspace after each test in copy_ops_test
Summary:
This was a nasty one to track down. This was the error message:
```
E0323 14:47:46.138900  2870 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered
F0323 14:47:46.139143  2870 operator.h:176] Computation on device returned error in operator
input: "x_gpu_2" output: "loss" name: "" type: "AveragedLoss" device_option { device_type: 1 cuda_gpu_id: 1 }
```
Closes https://github.com/caffe2/caffe2/pull/220
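
The fix pattern, roughly (workspace.ResetWorkspace is the real API; the test class shown is just a sketch):

```
import unittest
from caffe2.python import workspace

class CopyOpsTest(unittest.TestCase):
    def tearDown(self):
        # Clear all blobs and nets between tests so device state from one
        # test (e.g. a blob pinned to GPU 1) cannot leak into the next.
        workspace.ResetWorkspace()
```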

Differential Revision: D4771086

Pulled By: Yangqing

fbshipit-source-id: f2d0f39f1647c84d97d9745f8a0305a389bfbc41
2017-03-24 12:20:34 -07:00
Fei Sun
ad8b92b9e8 Extract plans assignment to AddPlan function
Summary:
Codemod to use a separate function, in preparation for the protobuf change later on.
It does not change the functionality.

Reviewed By: salexspb

Differential Revision: D4770648

fbshipit-source-id: d8090f45d31ffa5ca1dca47297fb7c196f34d8a6
2017-03-24 12:02:49 -07:00
Yury Zemlyanskiy
97a6400f03 Don't do copy for param_grad in backward_step_net
Summary: We accumulate the values of this blob (param_grad) in another special internal blob anyway.

Differential Revision: D4768643

fbshipit-source-id: a9d08b7eafd25f278a8db722f9cdb1d0064b852a
2017-03-24 02:22:33 -07:00
Ahmed Aly
99bfd36a04 CRF layer in caffe2
Summary:
This is an implementation of a CRF layer in caffe2, following this paper: https://arxiv.org/abs/1603.01360
Currently this implementation works only for batch_size = 1.

Reference implementations:

- Tensorflow:
 63a21e0540/tensorflow/contrib/crf/python/ops/crf.py

- Theano:
https://github.com/glample/tagger/blob/master/model.py#L286

Differential Revision: D4644004

fbshipit-source-id: bf0801fd8562d11dca3fefe371c3d85e1dd69ccc
2017-03-23 22:02:02 -07:00
Bram Wasti
396ebb0546 exec_net --> predict_net
Summary: Change the naming convention back for maintainability.

Reviewed By: Yangqing

Differential Revision: D4741875

fbshipit-source-id: 044051e772383e81812ae7064a921e97d63615dc
2017-03-23 16:31:49 -07:00
Deepak Gopinath
422c65ca35 Removing unnecessary Copy after fixing gradients for external parameters
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net

Reviewed By: salexspb

Differential Revision: D4752259

fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
2017-03-23 15:04:22 -07:00
Huazhong Ning
8168e8ac25 allows to specify output names for functional layers
Summary:
Currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.

Reviewed By: kennyhorror

Differential Revision: D4744949

fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
2017-03-23 13:18:58 -07:00
Ahmed Taei
3b7cb50d1c Add ConvNd to model helper
Summary:
Add a ConvNd interface for Nd convolution and keep Conv for 2d convolution.
I added _BaseConv to share code between ConvNd and Conv.
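
A hedged sketch of how the new helper might be called (the exact signature is an assumption, mirroring the existing 2d Conv helper with a list-valued kernel):

```
from caffe2.python import cnn

model = cnn.CNNModelHelper(name="conv3d_example")
# 3d convolution: the kernel is given per spatial dimension
# instead of as a single int (assumed list form).
conv1 = model.ConvNd(
    "data", "conv1",
    3,            # input channels
    16,           # output channels
    [3, 3, 3],    # kernel size per dimension
)
```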

Reviewed By: Yangqing

Differential Revision: D4660822

fbshipit-source-id: 8339421351ce9a36ce5a165f7fa455cfcc61733d
2017-03-22 15:47:48 -07:00
Yangqing Jia
0276c992b7 translator fix
Summary:
This completes the fix that viswanathgs started in an earlier diff, which did
not cover the full Caffe convention. It should now have proper guards for all
the behavior that Caffe implies, either supporting it or throwing an explicit
exception.

Reviewed By: viswanathgs

Differential Revision: D4751751

fbshipit-source-id: 474e921c33840cff333a631b7b19f881b39ebccd
2017-03-22 15:09:13 -07:00
Yury Zemlyanskiy
ea66516d5e Output attention weights from apply_xxx_attention methods
Summary: OSS diff. We need it later for beam decoding.

Differential Revision: D4747785

fbshipit-source-id: ce2d53ee2434216ace3c4ddbd40a9b68e9db7ec5
2017-03-21 19:01:58 -07:00
Alexander Sidorov
d7b2aebf2c Support for Sum in cell net as first operator
Summary: This didn't work for a reason specified in the comments. Also some cleanup in the unit tests; inference now uses a custom workspace to run the cell net on.

Reviewed By: urikz

Differential Revision: D4742670

fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816
2017-03-21 18:32:18 -07:00
Yangqing Jia
aa4d07d3c4 bugfix for Windows, esp. VS 2017
Summary:
aaronmarkham this solves your Windows build issue. Basically:

(1) VS 2017 does not have CUDA support yet, and we will be waiting on NVidia to add it.

(2) VS 2015 and 2017 need different cmake generator strings.

This PR shows how to determine those and also updates appveyor to run contbuild guards for the following 3 settings:
- VS2015 without cuda
- VS2017 without cuda
- VS2015 with cuda
Closes https://github.com/caffe2/caffe2/pull/210

Differential Revision: D4745007

Pulled By: Yangqing

fbshipit-source-id: 50952552843abd0eb6f4145d9f132daeee3a6794
2017-03-21 05:17:59 -07:00
Yury Zemlyanskiy
93ff338ca7 Beam decoder for NMT in Caffe2
Summary: yolo5

Differential Revision: D4685076

fbshipit-source-id: b5534e441bb453f90e5210294f2dfff6b5c3b5b1
2017-03-20 22:03:59 -07:00
Kevin Waugh
d13f98de4e implemented DistillLRLoss
Summary: Created `BatchDistillLRLoss` layer and added support for it in DPer2.

Differential Revision: D4718333

fbshipit-source-id: b873954ea704daafed94ac65fef47a20d56858e2
2017-03-20 16:01:29 -07:00
Ahmed Taei
e41d35909a Conv-ND NCHW CPU/CUDA implementation
Summary: Migrate caffe1 ConvNd implementation to caffe2.

Reviewed By: Yangqing

Differential Revision: D4659868

fbshipit-source-id: 14b178af3faa2c0b12e5a9f7aa76c1d8945419ea
2017-03-20 14:01:07 -07:00
James Reed
33f41c06c0 Remove more instances of batch_size
Summary: D4734505 part 2. Remove more instances of the batch_size parameter

Reviewed By: urikz

Differential Revision: D4736906

fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
2017-03-19 22:31:30 -07:00
James Reed
17da5856ed Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
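
The pattern, roughly: derive shapes from blobs already in the graph instead of a Python-side batch_size (blob names are illustrative; Shape and the two-input form of Reshape are real caffe2 operators):

```
from caffe2.python import core

net = core.Net("shape_driven_reshape")
# Read the target shape at runtime from a reference blob in the graph.
new_shape = net.Shape("reference_blob", "reference_shape")
reshaped, old_shape = net.Reshape(
    ["data", new_shape],
    ["data_reshaped", "data_old_shape"],
)
```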

Reviewed By: urikz

Differential Revision: D4734505

fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
2017-03-19 18:16:28 -07:00
Yury Zemlyanskiy
d1424c3265 Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601

Differential Revision: D4702086

fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b
2017-03-17 17:36:47 -07:00
Alexander Sidorov
f97d7949d0 Remove legacy LSTM, cleanup tests
Summary: We don't use this one any more except in a few tests.

Reviewed By: urikz

Differential Revision: D4731401

fbshipit-source-id: c5c28b7594e3251f501fc28455dfc9bd2093a836
2017-03-17 16:33:53 -07:00
Kittipat Virochsiri
4829bdb1ea BatchSoftmaxLoss layer
Summary: Similar to BatchLRLoss layer

Reviewed By: xianjiec

Differential Revision: D4689609

fbshipit-source-id: 89fa4b9d4145ce77cb2aaa7a5c0c1a24f901d88f
2017-03-17 10:19:06 -07:00
Kittipat Virochsiri
cea16ff7cd BatchSigmoidCrossEntropyLoss
Summary: To support the feed interest team.

Reviewed By: kdub0

Differential Revision: D4719213

fbshipit-source-id: 8deb3544377fb06593399b101de66f3f845f93b5
2017-03-17 09:35:51 -07:00
James Cross
79c3a3af54 add gpu support for caffe2-seq2seq
Summary: Adding synchronous optimization on GPUs to the translation training pipeline via data_parallel_model.Parallelize_GPU, which still needs to be updated so there is some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.

Reviewed By: urikz

Differential Revision: D4631914

fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
2017-03-17 05:19:14 -07:00
Jon Morton
1513b1de6b Add ResizeNearest operator
Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works.

Reviewed By: ajtulloch

Differential Revision: D4724244

fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059
2017-03-16 18:49:01 -07:00
Huazhong Ning
ad4ae4528f migrate mtml to dper2
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers

Reviewed By: kittipatv

Differential Revision: D4680215

fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
2017-03-16 17:48:05 -07:00
James Reed
cc2e915461 Implement TopK op in caffe2
Reviewed By: salexspb, urikz

Differential Revision: D4718439

fbshipit-source-id: e6866eb7bb586f2716662cd4b65961bdd9914525
2017-03-16 17:32:20 -07:00
Kevin Waugh
2c8bf2525b added BatchL2Loss layer
Summary: A layer that takes a (label, prediction) pair and outputs the L2 loss.
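
In other words, something like this numpy sketch (the real layer's exact reduction and scaling are assumptions here):

```
import numpy as np

def batch_l2_loss(label, prediction):
    # Mean over the batch of the squared error between prediction and label.
    return np.mean(np.sum((prediction - label) ** 2, axis=-1))
```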

Reviewed By: kittipatv

Differential Revision: D4702111

fbshipit-source-id: 09f2ede44d1b548e61096de741f1b2aa0b66bbcb
2017-03-16 17:32:20 -07:00