Commit graph

1038 commits

Author SHA1 Message Date
Yangqing Jia
cc2c4d07d6 Always use assertAlmostEqual for floats when crossing python and C boundaries
Summary:
This fixes a Travis numerical issue.
Closes https://github.com/caffe2/caffe2/pull/1024
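
As an illustration of the failure mode (hypothetical values, not the actual test): a float that crosses the Python/C boundary as float32 rarely compares exactly equal to its float64 source, so assertEqual is flaky by construction while assertAlmostEqual is not. A minimal sketch:

```python
import unittest

import numpy as np


class FloatBoundaryTest(unittest.TestCase):
    def test_almost_equal(self):
        x = 0.1                    # Python float (float64)
        y = float(np.float32(x))   # same value after a float32 round-trip
        # self.assertEqual(x, y) would fail: float32 cannot represent 0.1
        # exactly, so y == 0.10000000149011612.
        self.assertAlmostEqual(x, y, places=5)


if __name__ == "__main__":
    unittest.main()
```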

Differential Revision: D5571340

Pulled By: Yangqing

fbshipit-source-id: 097e6f91da68cc3eacf21fe109f342e0dddea189
2017-08-06 14:51:11 -07:00
Juan Miguel Pino
4d8a8c2e1e Implement dot attention
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025.
This saves the computation of weighted encoder outputs in `rnn_cell.py`.
When the encoder and decoder dimensions are different, we apply an FC layer, which corresponds to the "general" case below Figure 2 in the paper.
Refactored unit tests.
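
A numpy sketch of the two scoring functions from the paper (the name `W` for the FC weight is illustrative; this is not the Caffe2 implementation):

```python
import numpy as np

def dot_attention(decoder_state, encoder_outputs, W=None):
    # decoder_state: (d_dec,); encoder_outputs: (T, d_enc).
    # "general" score h_t^T W h_s when dims differ, else "dot" score h_t^T h_s.
    keys = encoder_outputs @ W.T if W is not None else encoder_outputs
    scores = keys @ decoder_state               # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over source positions
    context = weights @ encoder_outputs         # weighted sum of encoder outputs
    return context, weights
```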

Reviewed By: jhcross

Differential Revision: D5486976

fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
2017-08-06 11:50:16 -07:00
Jerry Pan
fac241bcbc Caffe2: add a DB that's wrapped around a BlobsQueue as an adapter for data from non-DB interface
Summary:
Caffe2: add a DB that's wrapped around a BlobsQueue as an adapter for data from non-DB interface.

This is useful for bridging the gap between DB interface data processing ops (TensorProtosDBInput, ImageInputOp etc.) and data that's coming from arbitrary Python or the pretty intricate Hive reader.

Reviewed By: akyrola

Differential Revision: D5554560

fbshipit-source-id: 01bb0056410f9ade205367d5fefc721f91f5b629
2017-08-06 11:50:14 -07:00
Szymon Piechowicz
12f25c8106 Revert D5545533: [pairatt] implement kMaxPooling operator
Summary:
This reverts commit 8378caaac528a71c154067168787ed493bfb0d37

bypass-lint

Differential Revision: D5545533

fbshipit-source-id: a8d9db807f5b22461b21b7589886cf54861e3757
2017-08-04 01:33:29 -07:00
Jiyan Yang
4b80ff89e2 Use softsign op for s=0 in arc-cosine feature map
Summary:
The current implementation for s=0 doesn't support backward pass.
Switching to the softsign op instead as a temporary solution.
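
For reference, a minimal numpy sketch of softsign and why it helps here:

```python
import numpy as np

def softsign(x):
    # softsign(x) = x / (1 + |x|): bounded in (-1, 1), with gradient
    # 1 / (1 + |x|)^2 defined everywhere -- hence a usable backward pass.
    return x / (1.0 + np.abs(x))
```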

Reviewed By: jackielxu

Differential Revision: D5551742

fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
2017-08-03 23:35:11 -07:00
Pieter Noordhuis
d177846dbf Add prefix argument to FileStoreHandler
Summary:
This brings it up to par with how the RedisStoreHandler
works. The store handler configuration does not have to change and
only the run ID parameter changes across runs.

This was inconsistent and came up in https://github.com/caffe2/caffe2/issues/984.

Reviewed By: Yangqing

Differential Revision: D5539299

fbshipit-source-id: 3b5f31c6549b46c24bbd70ebc0bec150eac8b76c
2017-08-03 10:37:26 -07:00
Yiming Wu
8e1ecb1cfd async sparse length sum op
Summary:
This diff makes SparseLengthsSum(Gradient) async. It works as follows:

1. Adding INDICES to the gradient op's inputs so that we can make it async without device-host copies.
2. Registering the new 3-input op as the gradient for the CPU/GPU versions of SLS.
3. In order not to break old nets (they are mostly on CPU), I still register the old 2-input op, so the op schema will not complain when it encounters old nets that have the SLSGradient op in them. (A reference sketch of the op's semantics follows below.)
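
A reference numpy sketch of the op's semantics, to make clear what the extra INDICES input lets the gradient reuse (illustrative, not the Caffe2 kernel):

```python
import numpy as np

def sparse_lengths_sum(data, indices, lengths):
    # Gather rows of `data` via `indices`, then sum them in consecutive
    # segments whose sizes are given by `lengths`.
    out, start = [], 0
    for n in lengths:
        out.append(data[indices[start:start + n]].sum(axis=0))
        start += n
    return np.stack(out)

data = np.arange(12, dtype=np.float32).reshape(4, 3)
# Segments: rows {0, 2} and row {3} -> [[6, 8, 10], [9, 10, 11]]
print(sparse_lengths_sum(data, np.array([0, 2, 3]), np.array([2, 1])))
```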

wickedfoo: sorry, this diff might bring you extra work migrating your optimization effort to this new async gradient op, but we think it is worth it. :(

Reviewed By: dzhulgakov

Differential Revision: D5423188

fbshipit-source-id: 62494a6c52a507c4a4688d5a9e1a2bc720d5370d
2017-08-03 03:04:15 -07:00
Christopher Hay
a4e6ca6956 Added Sinusoidal Position Encoding Op
Summary: Added a Caffe2 operator to calculate the sinusoidal position encoding for word embeddings, as described on page 6 of https://arxiv.org/abs/1706.03762.
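
A numpy sketch of the encoding from the paper (assumes an even d_model; not the operator code itself):

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]        # even dimensions
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe
```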

Reviewed By: jamesr66a

Differential Revision: D5533024

fbshipit-source-id: 1afb35cd7f9d8c71f2635b853e56b2c840f0bc1f
2017-08-03 01:46:46 -07:00
Chonglin Sun
4a8545e3c6 implement kMaxPooling operator
Summary: Used by the attention model.

Differential Revision: D5545533

fbshipit-source-id: 8378caaac528a71c154067168787ed493bfb0d37
2017-08-03 00:48:34 -07:00
Wenlin Chen
adc5510ecb dynamic embedding
Summary: refactor get_categorical_limit

Reviewed By: xianjiec

Differential Revision: D5459389

fbshipit-source-id: 14a7e07394db52fb090c6923e341c34576fcb6d6
2017-08-03 00:33:18 -07:00
Jiyan Yang
a8695178aa Adding parameter sharing API to Dper2
Summary:
To achieve this, I modified the blob name scheme defined in a layer.
Before, it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I changed it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.

I also overrode the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.

There are some details, such as making the initializer more structured,
that I still need to finalize.

Reviewed By: kennyhorror

Differential Revision: D5435132

fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
2017-08-03 00:33:18 -07:00
Honghao Wei
cb1dd21280 adding operator lp_norm to support calculating l1 norm and l2 norm
Summary: Implement the LpNorm operator, which calculates the Lp norm of a tensor for regularization (p=1 or 2). Currently, there is only the L1Distance operator, which calculates the L1 distance of two same-shape tensors. We want an op that takes only one input and outputs the L1 loss, and we do the same for the L2 loss. We also plan to implement the l_{p,q} loss, but have not decided which p and q to take.
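
The quantities involved, as a numpy sketch (hedging on the operator's exact convention, i.e. whether p=2 returns the squared norm or its square root):

```python
import numpy as np

def lp_norm(x, p=2):
    # p=1: sum_i |x_i|    p=2: sum_i x_i^2 (squared L2 norm)
    x = x.ravel()
    return np.abs(x).sum() if p == 1 else (x * x).sum()
```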

Reviewed By: xianjiec

Differential Revision: D5460051

fbshipit-source-id: d67a38fbc94afa52de26d4a53e4d2b7df3c50b6a
2017-08-02 15:09:08 -07:00
Simon Layton
ded2a5899e Option to set BN scale and bias initial values
Summary:
Necessary to reproduce the setup from the 1-hour ImageNet paper.
Closes https://github.com/caffe2/caffe2/pull/995

Differential Revision: D5547666

Pulled By: akyrola

fbshipit-source-id: cbd4396888b02f32c67e1fe7e53636329de64f1b
2017-08-02 11:38:57 -07:00
Aapo Kyrola
ab42a95b6f fast path for CUDNN global average pooling
Summary:
KaimingHe debugged a slow model and found that global average pooling was hideously slow, even with CUDNN. It turns out the CUDNN pooling op (especially the backward pass) is not optimized for global pooling.

This adds a fast path for global average pooling with NCHW. It is about 30x faster than CUDNN with 56 x 56 pooling; compared to the equivalent ReduceBackSum, it is about 3x faster.

I will bootcamp the max pooling.
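
For reference, a numpy sketch of what the fast path computes (global average pooling over NCHW is just a per-channel mean, and its backward pass just broadcasts dY / (H*W) back over each plane):

```python
import numpy as np

X = np.random.rand(8, 64, 56, 56).astype(np.float32)  # N, C, H, W
Y = X.mean(axis=(2, 3))                                # (N, C)
# Equivalent to average pooling with kernel == spatial size (56 x 56 here).
```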

Reviewed By: asaadaldien

Differential Revision: D5533059

fbshipit-source-id: 2d590693d737fa92184603663031d96f6145f304
2017-08-02 11:10:10 -07:00
Alisson Gusatti Azzolini
0fc2bf26b4 Option to enforce batch size
Summary: This will throw away a few examples. It is desirable to keep the batch size constant for fully-synchronous data parallelism.
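
An illustrative sketch of the behavior (not the reader code): the trailing partial batch is dropped so every batch has exactly batch_size examples.

```python
def fixed_size_batches(examples, batch_size):
    # Keep only full batches; the remainder is thrown away.
    return [examples[i:i + batch_size]
            for i in range(0, len(examples) - batch_size + 1, batch_size)]

assert fixed_size_batches(list(range(10)), 4) == [[0, 1, 2, 3], [4, 5, 6, 7]]
```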

Reviewed By: dzhulgakov

Differential Revision: D5531788

fbshipit-source-id: e19385401155e731cfc5b25e8e9ea7c16c19d478
2017-08-01 22:29:55 -07:00
Yan Shang
c662480ea6 Return empty Struct when get_field has empty input
Summary:
Currently, `from_column_list` throws an error if the input col_names=[]. To
solve this, we fix the get_field function so that it creates an empty Struct
when an empty col_names is given.
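
A sketch of the intended behavior after the fix, assuming the standard caffe2.python.schema API:

```python
from caffe2.python import schema

# An empty column list now yields an empty Struct instead of raising,
# so callers can treat "no columns" uniformly.
empty = schema.from_column_list([])
assert len(empty.field_names()) == 0
```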

Reviewed By: kittipatv

Differential Revision: D5543865

fbshipit-source-id: f6dfa25326e355f8ec24e5542761851a276beeb9
2017-08-01 19:49:47 -07:00
Junjie Bai
0c7ee02c37 Add CUDA implementation of BooleanUnmask and fix some bugs in the test
Reviewed By: akyrola

Differential Revision: D5405606

fbshipit-source-id: fd755ee2ec3d742597f7f5500f54caa396db4da4
2017-08-01 16:51:40 -07:00
Ben Zhang
6314c1fc15 Transforms in Python
Summary: Allow the use of apply_transform() in the python API

Reviewed By: bwasti

Differential Revision: D5530483

fbshipit-source-id: 61a6d36fe125c89629fdeea040a717c453d84417
2017-08-01 16:51:38 -07:00
Thomas Dudziak
676bedd298 Fixes for Python 3 in caffe2/caffe2/fb/data
Summary: As title

Reviewed By: MisterTea

Differential Revision: D5532387

fbshipit-source-id: 0a51ca40b93cc2eb5371f0b86f2800354cd1939c
2017-08-01 15:22:55 -07:00
Kevin Wilfong
60cb55461e Caffe2: Support additional outputs in ImageInputOp
Summary: This allows users to add an arbitrary number of additional outputs to ImageInputOp. These are populated by reading additional TensorProto values from the TensorProtos supplied by the DBReader and converting them into Tensors. As with labels, only ints and floats are supported, and multiple values are supported.

Reviewed By: panshen1

Differential Revision: D5502019

fbshipit-source-id: 5a8b61b3a8549272a112e8e02cd613d8f9a271ba
2017-08-01 14:36:05 -07:00
Bram Wasti
3a99698734 include numpy's other 32bit int type
Summary: forgot one :)

Reviewed By: akyrola

Differential Revision: D5534905

fbshipit-source-id: a0e58ca3922ec80f526f7586931ff3da8e9bcffc
2017-08-01 13:53:11 -07:00
Tao Wu
5d304a3b49 add gradient for SparseToDenseMask operator
Summary: add gradient for SparseToDenseMask operator

Reviewed By: kittipatv

Differential Revision: D5320792

fbshipit-source-id: 8ee7f1c87e8270ad6077ed197ce9512524069b59
2017-08-01 13:05:03 -07:00
Alisson Gusatti Azzolini
1968e03486 net_printer.to_string() accepts NetDef
Summary: Title.

Reviewed By: kennyhorror

Differential Revision: D5531925

fbshipit-source-id: 8f8961e6ab14d49720f74ec01c197ba9cc3e33ce
2017-08-01 10:17:29 -07:00
Szymon Piechowicz
3324db447f Caffe2: allow nets that don't use all input in net.ClonePartial
Summary: Caffe2: allow nets that don't use all input in net.ClonePartial

Differential Revision: D5535564

fbshipit-source-id: 0ec8fb3ade4d7d6cd4a702c9c265d9c77f27a627
2017-08-01 10:05:46 -07:00
Aapo Kyrola
e38015756a shape inference for Squeeze
Summary: Add tensor inference function for squeeze, refactor a bit
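
A sketch of the inference rule (shape-only, no op execution; an illustrative helper, not the Caffe2 code):

```python
def squeeze_shape(shape, dims):
    # Output shape is the input shape with the requested size-1 dims removed.
    assert all(shape[d] == 1 for d in dims), "can only squeeze size-1 dims"
    return [s for d, s in enumerate(shape) if d not in set(dims)]

assert squeeze_shape([2, 1, 3, 1], dims=[1, 3]) == [2, 3]
```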

Reviewed By: asaadaldien

Differential Revision: D5518880

fbshipit-source-id: 5b8cb9154f5f777d4be3612a96d7ed76a9068c0c
2017-07-31 16:04:24 -07:00
Xiaolong Wang
82adbde878 pass layer_parameter shape to ps builder if it cannot be inferred from the initializer
Summary:
The Feed team uses distributed training and wants to also use transfer learning.

Currently, transfer learning is implemented by overwriting the layer parameter
initializer. Therefore, the PS builder can't correctly infer the parameter shape.

To fix this, add a 'shape' field to `layer_parameter` and set the shape if we
overwrite its initializer.

We also enforce a check of the parameter shape between the original initializer
and the loaded blob. (This adds extra cost.)

Differential Revision: D5520541

fbshipit-source-id: 80547dbd328b3f6cbfcea0b2daaf4004703dfe81
2017-07-31 16:04:23 -07:00
James Cross
8c65b5ab34 multilayer seq2seq
Summary: Several refinements to seq2seq example code, including support for multilayer LSTM.

Reviewed By: jamesr66a

Differential Revision: D5460372

fbshipit-source-id: d2eabf6aa9a5b5df7bbc341fd99c4e7d8322e717
2017-07-31 12:27:51 -07:00
Aapo Kyrola
8079abbaf1 fix traversal order
Summary: Memonger did not properly track the number of times a blob output has to be produced before an operator can be visited. Actually, I remember fixing this before, but well. This bug manifested in Priya's model (thanks prigoyal), and benz's model verifier nicely caught the wrong output.

Reviewed By: asaadaldien

Differential Revision: D5524912

fbshipit-source-id: 10f4d7056b84aba0274a918af508ea043e6026f9
2017-07-30 21:47:48 -07:00
Mingda Li
e3c45206ec Add a method to run a train net multiple times in layer_test_util.py
Summary: This method runs a train net multiple times, and therefore enables testing layers with iteration-dependent behavior.

Differential Revision: D5493750

fbshipit-source-id: a7fb967a66f799aaf82acfadc4ecf66e0744da20
2017-07-28 19:56:05 -07:00
Aapo Kyrola
84b9d267dc add warnings about slow data input
Summary: One of my workflows was stuck because everstore/hive data input was experiencing networking issues (no route to host, etc.). But it is hard to know this is happening because the errors were only logged to stdout. Anyway, added simple logging to warn if the data workers' enqueue thread is not getting new data for over 10 secs.
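
An illustrative sketch of the idea (hypothetical names, not the actual implementation): a watchdog thread that warns when the enqueue thread has gone quiet.

```python
import logging
import threading
import time

def start_enqueue_watchdog(last_put, interval=10.0):
    # `last_put` is a single-element list the enqueue thread updates with
    # time.time() after every successful put.
    def loop():
        while True:
            time.sleep(interval)
            if time.time() - last_put[0] > interval:
                logging.warning("No new data enqueued for over %.0f secs",
                                interval)
    threading.Thread(target=loop, daemon=True).start()
```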

Reviewed By: panshen1

Differential Revision: D5522816

fbshipit-source-id: a036c4afdfbbafea130a4251c1ca02c138d19a83
2017-07-28 18:21:42 -07:00
Tao Wu
6530db49bc improve pair_wise_loss operator to support multiple sessions
Summary: The diff extends the rank_loss operator to support computing the loss for multiple sessions (batches).

Reviewed By: kittipatv

Differential Revision: D5515465

fbshipit-source-id: 55a01cd5ad21eaeae82875ad136c392fed0dbb26
2017-07-28 15:12:47 -07:00
Dmytro Dzhulgakov
f2090debb0 Optimized SparseLengthsSum
Summary:
Optimized SparseLengthsSum (fp32) for now:
1) Specialized reducer
2) Created a fast routine with prefetches, loop unrolling, block specialization and register tiling
3) Added more variety of block sizes to segment_ops_test.py

Reviewed By: Yangqing

Differential Revision: D5392472

fbshipit-source-id: 8ed9baf1b12ec05bd391cabb390024e6bc60a6f6
2017-07-28 10:10:25 -07:00
Bangsheng Tang
a41cbdec0e float support for square root divide
Summary: to support an operation needed by D5507205

Reviewed By: xianjiec

Differential Revision: D5512522

fbshipit-source-id: a9b3a668c28eff71d1e106dbbb572184df4a7638
2017-07-27 17:40:40 -07:00
Viswanath Sivakumar
0676dfef2b ExtractPredictorNet should strip gpu_id prefix from step_net
Summary:
The renames were only being applied to the main net; if the step_net has an
external input that is part of the renames, running the model would fail with
a 'blob not found in workspace' error.

Differential Revision: D5511953

fbshipit-source-id: ba262a094c3263978dfe173f2cab00301131b57f
2017-07-27 16:06:47 -07:00
Jacqueline Xu
13569c9aa0 Fixing semi-random layer model for multi-layer models
Summary:
Updated the semi-random layer model to support multi-layer models built from semi-random layers.

Notable changes:
- Inputs and outputs for the semi-random layer are now a Struct with "full" and "random" components
- A flag was added to choose whether to initialize the output schema in Arc Cosine or not (in case output schema initialization will happen in the Semi-Random layer)

Reviewed By: chocjy

Differential Revision: D5496034

fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
2017-07-27 15:25:19 -07:00
Aapo Kyrola
26645154bb warn about using test/val model with init_params=True + fixed some cases
Summary: It is a common mistake to create a test/validation model with init_params=True. When its param_init_net is run, it will overwrite the training model's params, and with DPM, those won't be synchronized to all GPUs. I don't want to make this an assertion yet, since it might break people's trainers (it is OK to have init_params=True if you never run the param_init_net...).
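
A sketch of the safe pattern (assuming the standard model_helper API):

```python
from caffe2.python import model_helper

# The training model owns (and initializes) the parameters...
train_model = model_helper.ModelHelper(name="train", init_params=True)
# ...so the test/validation model should not re-initialize them:
test_model = model_helper.ModelHelper(name="test", init_params=False)
# Running test_model.param_init_net is then harmless; with init_params=True
# it would overwrite the trained params, and under data_parallel_model the
# overwritten values would not be re-synchronized to all GPUs.
```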

Reviewed By: asaadaldien

Differential Revision: D5509963

fbshipit-source-id: 63b1a16ec0af96e3790e226850f6e0e64689143f
2017-07-27 13:20:27 -07:00
Aapo Kyrola
af1e45c1e1 support appending net and converting them
Summary:
As per rushabhmshah99's request: he wants to append a pre-trained model (without training it) to the model.
So I added data_parallel_model.ConvertNetForDevice() to enable that. The unit test shows an example of
how to use this with AppendNet, and I also added a blurb to the function.

Differential Revision: D5503335

fbshipit-source-id: b2a5db5c1739dc97f46dd0d7606ed555d99255b8
2017-07-27 11:07:48 -07:00
Bangsheng Tang
d8443b8ffa BatchGatherOp
Summary:
1. added BatchGatherOp and BatchGatherGradientOp
2. unit tests

Reviewed By: xianjiec

Differential Revision: D5443965

fbshipit-source-id: bdcbb7f9f91c55484372a4bdb1727ae6d49e2018
2017-07-27 10:17:42 -07:00
Aapo Kyrola
3363681304 enable CreateCommonWorld to bootstrap from existing common world
Summary: Use romain-intel's ContextFactory to create common worlds from existing common worlds, thus bypassing the KV store completely. Changed data_parallel_model to automatically detect whether there is already a CW we can work from. CreateCommonWorldOp takes an optional second parameter, which is the existing CW.

Reviewed By: andrewwdye

Differential Revision: D5494956

fbshipit-source-id: 5f7a840bcd5fe4ea756fafeacc746bc2cf5078b0
2017-07-26 22:31:55 -07:00
Yangqing Jia
de92dbe4bb MKL code move
Summary:
Nothing gets changed; this just allows us to more easily deal with build
systems. Also, everything that is MKL-related now lives under mkl/.

Reviewed By: dzhulgakov

Differential Revision: D5505157

fbshipit-source-id: ddb2e6ac290a146a7cb495da23bb0e5b5594bd2a
2017-07-26 20:21:55 -07:00
Ahmed Taei
40b783b746 Fix flaky test due to numerical gradient approximation error.
Summary:
Use a smaller step size for gradient checks, and pass a seed to help reproduce the
test from logged inputs.

Reviewed By: Yangqing

Differential Revision: D5505698

fbshipit-source-id: fc308efe72d535695ba628944aee1913ba16b2f1
2017-07-26 18:58:19 -07:00
Jacqueline Xu
9bec54bbf1 Modify arc cosine feature map and semi random layers to initialize parameters as global constants
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, since the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part these weights won't be saved after training, because they don't exist on the trainer.

I fixed the issue here by adding an option to register the randomized parameters as model global constants, so that the same parameter values can be accessed everywhere. The parameters can also be saved when training is finished.

In this diff, I've:
- Updated randomized parameters to be added as global constants across distributed runs of the Arc Cosine Feature Map and Semi-Random Feature layers
- Updated unit tests
- Run an end-to-end test with multiple readers enabled to exercise the fix

Reviewed By: chocjy

Differential Revision: D5483372

fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
2017-07-26 16:37:00 -07:00
Szymon Piechowicz
54b171eae5 Caffe2: don't swallow exception stacktrace
Summary:
Caffe2: don't swallow exception stacktrace

Reviewed By: akyrola

Differential Revision: D5503227

fbshipit-source-id: 4e11d921652a094e20c46af19ba880390be8e997
2017-07-26 15:48:05 -07:00
Wojciech Glogowski
8f8dccd2ed distance_op_test from hypothesis_test refactored
Summary:
Moved the distance op tests out of hypothesis_test into distance_op_test and
refactored them.

Reviewed By: akyrola, asaadaldien

Differential Revision: D5495104

fbshipit-source-id: 4a90c75eabeb380ae9d150d6258e9b5b0fbfc5ca
2017-07-26 13:37:08 -07:00
Davin Wang
d89632b52c Support (U)INT8, (U)INT16 in data type conversion
Summary:
Data type conversion between a NumPy array and a Caffe2 tensor currently only supports 3 types: FLOAT, DOUBLE and INT32. Supporting 8-bit and 16-bit data types helps reduce the model size in some circumstances. I benefited from this by reducing the size of a data set from 8GB to 1GB using INT8.
Closes https://github.com/caffe2/caffe2/pull/930
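
A sketch of what the extended support enables (requires a Caffe2 build with this patch):

```python
import numpy as np
from caffe2.python import workspace

# float32 -> int8 cuts per-element storage 4x; such arrays now round-trip
# through Caffe2 tensors directly.
x8 = (np.random.rand(4, 4) * 127).astype(np.int8)
workspace.FeedBlob("data_int8", x8)
assert workspace.FetchBlob("data_int8").dtype == np.int8
```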

Reviewed By: Yangqing

Differential Revision: D5440929

Pulled By: akyrola

fbshipit-source-id: 3762da1d845e62a13ba384d1c144328b19dd663b
2017-07-26 11:23:53 -07:00
Dmytro Dzhulgakov
cf1ce29631 Fix GPU SparseAdaGrad with empty tensors
Summary: CUDA doesn't like 0-sized grids :)

Reviewed By: Yangqing

Differential Revision: D5495805

fbshipit-source-id: 6819513024978ee6bb70a39b25d23ced06465750
2017-07-25 23:50:54 -07:00
Artem Volkhin
2f5c96a730 Fix Flatten operator for empty tensors
Reviewed By: xianjiec

Differential Revision: D5487475

fbshipit-source-id: f1321e15352b0bbe039312f544a9c2ed78da8732
2017-07-25 17:51:42 -07:00
Andrew Tulloch
133dc2603e Support grouped convolutions in MKL
Reviewed By: Yangqing

Differential Revision: D5487692

fbshipit-source-id: 94fb66b3b104cf16dcad07743def4ea940515689
2017-07-25 14:19:02 -07:00
Andrew Tulloch
d86f32ae2e Implement simple graph rewrite functionality.
Reviewed By: Yangqing

Differential Revision: D5487075

fbshipit-source-id: f7c7867c5cbae39cf197cf5e7ed8a64149f33208
2017-07-25 14:19:01 -07:00
Andrew Tulloch
9e6ea2987f MKLReluOp supports in-place X/Y
Reviewed By: Yangqing

Differential Revision: D5487060

fbshipit-source-id: 35d2d450f46aefc3c9395be45af99e13d1c168ec
2017-07-25 14:19:00 -07:00