Summary: Instead of printing the exception with print(), use traceback.print_exc(). This way you get the full stack trace.
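For illustration, a minimal sketch of the change (the failing call is hypothetical):

```
import traceback

try:
    run_training_step()  # hypothetical call that may raise
except Exception:
    # print_exc() writes the full stack trace to stderr;
    # print(e) would only show the exception message.
    traceback.print_exc()
```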
Reviewed By: jay-mahadeokar
Differential Revision: D5604642
fbshipit-source-id: f8cb67e554305cd2fbed384a4a2040fa2b16e7c0
Summary: Make the command-line arguments pertaining to model architecture the same between train.py and translate.py. Also use the s() scoping function for all intermediate blobs in attention.py (this is for compatibility with multi-headed attention).
Differential Revision: D5594312
fbshipit-source-id: cadf51d854b5a9174ec913f32c655be2abf111e5
Summary: To control the absolute scale/magnitude of the output of this op, added a tuning parameter: amplitude.
Reviewed By: jamesr66a
Differential Revision: D5596574
fbshipit-source-id: 3b7e316de55cce6fd686da70aa5658ec3e99b070
Summary: GRU differs from LSTM in that it has only hidden states, no cell states. So reusing the _LSTM code is problematic: we need to delete the part that creates the cell state and change the many places that hard-code 4 states (hidden_all, hidden, cell_all, cell) to 2 (hidden_all, hidden). Otherwise GRU breaks during the backward pass, when the optimizer tries to apply gradients to each of the parameters, because the cell state is never used and so there are no gradients for the corresponding parameters (i.e., cell_state_w, cell_state_b).
Differential Revision: D5589309
fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
Summary:
Model downloader was broken after the move on S3 to the vanity URL, download.caffe2.ai. Using this as the URL base hits a redirect and results in the script throwing a 403 error. Rather than upgrading to urllib2 or putting in a bunch of code to handle a redirect on urllib, we can just use the non-vanity base URL.
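A minimal sketch of the workaround; the S3 base URL and model path here are illustrative, not the script's exact values:

```
# Plain urllib (as used by the downloader) does not follow the vanity
# domain's redirect, so fetch from the non-vanity base directly.
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

S3_BASE = "https://s3.amazonaws.com/download.caffe2.ai"  # assumed bucket URL
urlretrieve(S3_BASE + "/models/squeezenet/predict_net.pb", "predict_net.pb")
```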
Closes https://github.com/caffe2/caffe2/pull/1020
Reviewed By: Yangqing
Differential Revision: D5568686
Pulled By: aaronmarkham
fbshipit-source-id: d88a6b3e1b7955835fc03b036dc54dec48316e7f
Summary: as promised, a separate diff for dpm changes I made in experimental code
Reviewed By: pietern
Differential Revision: D5551304
fbshipit-source-id: 9013aeab6c388b1c415ffb2e36fb8dd6b8cf90b0
Summary: This diff implements a CUDA version of the OneHot operator.
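A hedged usage sketch (the blob names are illustrative; the device option routes the op to the new CUDA kernel):

```
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

device = core.DeviceOption(caffe2_pb2.CUDA, 0)
workspace.FeedBlob("indices", np.array([0, 2, 1], dtype=np.int64), device)
workspace.FeedBlob("size", np.array([3], dtype=np.int64), device)
workspace.RunOperatorOnce(core.CreateOperator(
    "OneHot", ["indices", "size"], ["one_hot"], device_option=device))
print(workspace.FetchBlob("one_hot"))  # 3x3 one-hot matrix
```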
Reviewed By: bddppq
Differential Revision: D5578543
fbshipit-source-id: 55b70e8ec6ee34b647b9140fecbba31b6968f403
Summary: Add a CUDA version of the GRU operator
Reviewed By: jamesr66a
Differential Revision: D5571043
fbshipit-source-id: 332aa64fc8a9116cc33382f2b2907080e58c13b3
Summary:
Fix multilayer inference in Caffe2 example seq2seq code. (Rely on LSTMWithAttentionDecoder.apply rather than fixed state indices to determine stepwise decoder output.)
Also assorted updates to bring the code in line with changes elsewhere in the codebase, and added unit tests which ensure that the training and inference networks generate the same loss, which should make these problems much easier to identify in the future.
Reviewed By: jamesr66a
Differential Revision: D5579803
fbshipit-source-id: 6e0f27340d981990ab8d0da58e63793222e7be87
Summary:
It was reverted previously because the gradient op lacked a schema. Added it back and resent.
Differences between this diff and the previously reverted diff:
1. Added a schema for the gradient operator.
2. Changed line 95 of kmax_pooling_op.h from CAFFE_ENFORCE to CAFFE_ENFORCE_GE.
Reviewed By: xianjiec
Differential Revision: D5568867
fbshipit-source-id: 39813b389a5da803967a561249793afdfce00c58
Summary:
In Python 3.x, dictionary values are a view rather than a list and can't be
concatenated to a list; this diff fixes that.
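A minimal repro of the Python 3 behavior:

```
d = {"a": 1, "b": 2}
lst = [0]

# Python 2: d.values() is a list, so `lst + d.values()` works.
# Python 3: dict.values() is a view object and raises TypeError here,
# so wrap it in list() to stay compatible with both.
combined = lst + list(d.values())
```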
Reviewed By: andrewwdye
Differential Revision: D5576724
fbshipit-source-id: c60441857ceceb9c4a71122d2db5e9abad6d3fc2
Summary:
The L1Distance operator used to return a single value denoting the L1 distance of the entire input, instead of a vector with one distance per input row.
This fixes that.
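A sketch of the fixed behavior (values are illustrative):

```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("X", np.array([[1., 2.], [3., 4.]], dtype=np.float32))
workspace.FeedBlob("Y", np.array([[1., 0.], [0., 4.]], dtype=np.float32))
workspace.RunOperatorOnce(
    core.CreateOperator("L1Distance", ["X", "Y"], ["dist"]))
# One L1 distance per row, e.g. [2., 3.], instead of a single
# scalar for the whole batch.
print(workspace.FetchBlob("dist"))
```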
Reviewed By: Yangqing
Differential Revision: D5570385
fbshipit-source-id: fbab0e0c9262ccbdb3af27262b8baacdeb2d0fc9
Summary: New hybrid randomized sparse NN, which allows layers of a sparse NN model to be randomized, semi-random, or learnable
Reviewed By: chocjy
Differential Revision: D5416489
fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
Summary:
To train an image model, we can also use a label embedding vector as supervision, as opposed to using SoftmaxLoss/SigmoidCrossEntropyLoss.
In such cases, the label is a dense vector. This diff enables such use cases.
Reviewed By: panshen1
Differential Revision: D5556203
fbshipit-source-id: 52c61495e02fab457dc2d43e3345d7dbd5580ab7
Summary:
data_workers.py provides a really nice, easy way to run background threads for data input. Unfortunately, it's restrictive: the output of the fetcher function has to be a numpy array.
I pulled that nice core thread management out into parallel_workers and updated the classes in data_workers to extend those classes. The main change was refactoring most of the queue-handling logic out into QueueManager.
This way parallel_workers can be used to manage background threads without having to use the queue for output.
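A rough sketch of the intended usage; the exact init_workers signature and coordinator API are assumptions based on the description above:

```
from caffe2.python import parallel_workers

def worker_fun(worker_id):
    # Arbitrary background work; unlike the data_workers fetcher,
    # there is no requirement to emit numpy arrays to a queue.
    pass

# Assumed API: init_workers returns a coordinator with start()/stop().
coordinator = parallel_workers.init_workers(worker_fun)
coordinator.start()
# ... training runs ...
coordinator.stop()
```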
Reviewed By: akyrola
Differential Revision: D5538626
fbshipit-source-id: f382cc43f800ff90840582a378dc9b86ac05b613
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025 (Luong et al., 2015).
This saves the computation of weighted encoder outputs in `rnn_cell.py`.
When the encoder and decoder dimensions differ, we apply an FC, which corresponds to the `general` scoring function below Figure 2 of the paper.
Refactored unit tests.
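For reference, the two scoring functions from the paper, with decoder state h_t and encoder output h̄_s:

```
score(h_t, \bar{h}_s) = h_t^\top \bar{h}_s        % dot
score(h_t, \bar{h}_s) = h_t^\top W_a \bar{h}_s    % general (dims differ)
```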
Reviewed By: jhcross
Differential Revision: D5486976
fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
Summary:
Caffe2: add a DB that's wrapped around a BlobsQueue, as an adapter for data from non-DB interfaces.
This is useful for bridging the gap between DB interface data processing ops (TensorProtosDBInput, ImageInputOp etc.) and data that's coming from arbitrary Python or the pretty intricate Hive reader.
Reviewed By: akyrola
Differential Revision: D5554560
fbshipit-source-id: 01bb0056410f9ade205367d5fefc721f91f5b629
Summary:
The current implementation for s=0 doesn't support the backward pass.
Switching to the Pow op instead as a temporary solution.
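For illustration, a minimal sketch of the Pow op (the exponent value is illustrative):

```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("X", np.array([1., 2., 3.], dtype=np.float32))
# Pow has a working gradient, unlike the s=0 code path it replaces.
workspace.RunOperatorOnce(
    core.CreateOperator("Pow", ["X"], ["Y"], exponent=2.0))
print(workspace.FetchBlob("Y"))  # [1., 4., 9.]
```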
Reviewed By: jackielxu
Differential Revision: D5551742
fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
Summary:
This brings it up to par with how the RedisStoreHandler
works. The store handler configuration does not have to change and
only the run ID parameter changes across runs.
This was inconsistent and came up in https://github.com/caffe2/caffe2/issues/984.
Reviewed By: Yangqing
Differential Revision: D5539299
fbshipit-source-id: 3b5f31c6549b46c24bbd70ebc0bec150eac8b76c
Summary:
This diff makes SparseLengthsSum(Gradient) async. It goes through this logic:
1. Add INDICES to the gradient op's inputs so that we can make it async without device-host copies.
2. Register the new 3-input op as the gradient for the CPU/GPU versions of SLS.
3. In order not to break old nets (they are mostly on CPU), I still register the old 2-input op, so the op schema will not complain when it encounters old nets that have the SLSGradient op in them.
wickedfoo Sorry, this diff might bring you extra work migrating your optimization effort to this new async gradient op. But we think it is worth it. :(
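For reference, the forward op's input layout (a minimal CPU sketch; the new INDICES input on the gradient side is wired up by the gradient registration):

```
import numpy as np
from caffe2.python import core, workspace

# DATA: 5 embedding rows; INDICES: rows to gather;
# LENGTHS: segment sizes (two segments of 2 rows each).
workspace.FeedBlob("data", np.random.rand(5, 4).astype(np.float32))
workspace.FeedBlob("indices", np.array([0, 1, 2, 4], dtype=np.int64))
workspace.FeedBlob("lengths", np.array([2, 2], dtype=np.int32))
workspace.RunOperatorOnce(core.CreateOperator(
    "SparseLengthsSum", ["data", "indices", "lengths"], ["out"]))
print(workspace.FetchBlob("out").shape)  # (2, 4): one pooled row per segment
```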
Reviewed By: dzhulgakov
Differential Revision: D5423188
fbshipit-source-id: 62494a6c52a507c4a4688d5a9e1a2bc720d5370d
Summary: Added a caffe2 operator to calculate the sinusoidal position encoding for word embeddings, as described on page 6 of https://arxiv.org/abs/1706.03762.
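A NumPy reference sketch of the formula from the paper (not the operator's exact interface; assumes an even embedding dimension):

```
import numpy as np

def sinusoid_position_encoding(seq_len, dim):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/dim))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/dim))
    positions = np.arange(seq_len, dtype=np.float64)[:, np.newaxis]
    divisors = np.power(10000.0, np.arange(0, dim, 2, dtype=np.float64) / dim)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(positions / divisors)
    pe[:, 1::2] = np.cos(positions / divisors)
    return pe
```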
Reviewed By: jamesr66a
Differential Revision: D5533024
fbshipit-source-id: 1afb35cd7f9d8c71f2635b853e56b2c840f0bc1f
Summary:
To achieve this, I modified the blob-naming scheme defined in a layer.
Before, it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I change it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.
I also overrode the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.
There are some details, such as making the initializer more structured,
that I still need to finalize.
Reviewed By: kennyhorror
Differential Revision: D5435132
fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
Summary: Implement the LpNorm operator, which calculates the Lp norm of a tensor for regularization (p = 1 or 2). Currently there is only the L1Distance operator, which calculates the L1 distance between two same-shape tensors. We want an op that takes only one input and outputs the L1 loss, and we will do the same for the L2 loss. We also plan to implement an l_{p,q} loss, but have not decided which p and q to take.
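A hedged usage sketch (the exact output semantics, e.g. whether p=2 returns the squared norm, follow the op schema):

```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("x", np.array([-1.0, 2.0], dtype=np.float32))
workspace.RunOperatorOnce(
    core.CreateOperator("LpNorm", ["x"], ["norm"], p=1))
print(workspace.FetchBlob("norm"))  # L1 norm: |-1| + |2| = 3
```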
Reviewed By: xianjiec
Differential Revision: D5460051
fbshipit-source-id: d67a38fbc94afa52de26d4a53e4d2b7df3c50b6a
Summary:
KaimingHe debugged a slow model and found out that global average pooling was hideously slow, even with CUDNN. Turns out the CUDNN pooling op (especially the backward pass) is not optimized for global pooling.
This adds a fast path for global average pooling with NCHW. It is about 30x faster than CUDNN with 56 x 56 pooling; compared to the equivalent ReduceBackSum, it is about 3x faster.
I will bootcamp the max pooling.
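Roughly, the configuration that hits the new fast path (a sketch; global_pooling is the standard pooling-op argument):

```
from caffe2.python import core

# Global average pooling in NCHW: the kernel spans the whole H x W
# plane, which is the case this diff accelerates.
op = core.CreateOperator(
    "AveragePool", ["X"], ["Y"],
    order="NCHW",
    global_pooling=1,
)
```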
Reviewed By: asaadaldien
Differential Revision: D5533059
fbshipit-source-id: 2d590693d737fa92184603663031d96f6145f304
Summary: This will throw away a few examples; it is desirable to keep the batch size constant for fully synchronous data-parallel training.
Reviewed By: dzhulgakov
Differential Revision: D5531788
fbshipit-source-id: e19385401155e731cfc5b25e8e9ea7c16c19d478
Summary:
Currently, `from_column_list` throws errors if the input col_names=[]. To
solve this, we fix the get_field function so that it creates an empty
Struct when an empty col_names is given.
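A minimal sketch of the fixed behavior:

```
from caffe2.python import schema

# Previously this raised; with the fix it yields an empty Struct.
record = schema.from_column_list([])
assert isinstance(record, schema.Struct)
```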
Reviewed By: kittipatv
Differential Revision: D5543865
fbshipit-source-id: f6dfa25326e355f8ec24e5542761851a276beeb9
Summary: Allow the use of apply_transform() in the Python API
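A hedged sketch; the transform name here is an assumed registered transform, not one this diff adds:

```
from caffe2.python import core, workspace

net = core.Net("example")
net.Relu(["x"], ["y"])

# Apply a registered graph transform by name to the net's proto.
# "ConvToNNPack" is used as an assumed transform name.
transformed_proto = workspace.ApplyTransform("ConvToNNPack", net.Proto())
```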
Reviewed By: bwasti
Differential Revision: D5530483
fbshipit-source-id: 61a6d36fe125c89629fdeea040a717c453d84417
Summary: This allows users to add an arbitrary number of additional outputs to ImageInputOp. These are populated by reading additional TensorProto values from the TensorProtos supplied by the DBReader and converting them into Tensors. As with labels, only ints and floats are supported, and multiple values are supported.
Reviewed By: panshen1
Differential Revision: D5502019
fbshipit-source-id: 5a8b61b3a8549272a112e8e02cd613d8f9a271ba
Summary: Caffe2: allow nets that don't use all inputs in net.ClonePartial
Differential Revision: D5535564
fbshipit-source-id: 0ec8fb3ade4d7d6cd4a702c9c265d9c77f27a627
Summary: Add tensor inference function for squeeze, refactor a bit
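A sketch of what the inference function enables (blob shapes are illustrative; the InferShapesAndTypes usage is an assumption):

```
from caffe2.python import core, workspace

net = core.Net("squeeze_example")
net.Squeeze(["X"], ["Y"], dims=[1])
# With a tensor inference function registered for Squeeze, shape
# inference can propagate through it: (4, 1, 8) -> (4, 8).
shapes, types = workspace.InferShapesAndTypes([net], {"X": [4, 1, 8]})
print(shapes["Y"])
```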
Reviewed By: asaadaldien
Differential Revision: D5518880
fbshipit-source-id: 5b8cb9154f5f777d4be3612a96d7ed76a9068c0c
Summary:
The feed team uses distributed training and wants to also use transfer learning.
Currently, transfer learning is implemented by overwriting the layer parameter
initializer, so the PS builder can't correctly infer the parameter shape.
To fix this, add a 'shape' field to `layer_parameter` and set the shape if we
overwrite its initializer.
We also enforce a check of the parameter shape between the original initializer
and the loaded blob. (This adds extra cost.)
Differential Revision: D5520541
fbshipit-source-id: 80547dbd328b3f6cbfcea0b2daaf4004703dfe81
Summary: Several refinements to seq2seq example code, including support for multilayer LSTM.
Reviewed By: jamesr66a
Differential Revision: D5460372
fbshipit-source-id: d2eabf6aa9a5b5df7bbc341fd99c4e7d8322e717
Summary: Memonger did not properly track the number of times a blob output has to be produced before an operator can be visited. I actually remember fixing this before, but well. This bug manifested in Priya's model (thanks prigoyal), and benz's model verifier nicely caught the wrong output.
Reviewed By: asaadaldien
Differential Revision: D5524912
fbshipit-source-id: 10f4d7056b84aba0274a918af508ea043e6026f9
Summary: This method runs a train net multiple times and therefore enables testing layers with iteration-dependent behavior.
Differential Revision: D5493750
fbshipit-source-id: a7fb967a66f799aaf82acfadc4ecf66e0744da20
Summary: One of my workflows was stuck because the everstore/hive data input was experiencing networking issues (No route to host, etc.). But it was hard to know this was happening because the errors were logged to stdout. Anyway, added simple logging to warn if the data workers' enqueue thread has not received new data for over 10 secs.
Reviewed By: panshen1
Differential Revision: D5522816
fbshipit-source-id: a036c4afdfbbafea130a4251c1ca02c138d19a83