Commit graph

656 commits

Author SHA1 Message Date
Huazhong Ning
942f53b5a6 gradient impact of task layers on shared is configurable
Reviewed By: chocjy

Differential Revision: D4943948

fbshipit-source-id: 2e26dfb30c6893b60985f693a823646ed3d3e0e3
2017-05-11 20:34:04 -07:00
Ben Zhang
93f1d0ca7c L1 Operator
Summary: Adds the L1 Distance operator to distance_op.
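The row-wise L1 distance the operator computes can be sketched as follows (a minimal NumPy illustration, not the actual Caffe2 kernel):

```python
import numpy as np

def l1_distance(x, y):
    """Row-wise L1 distance: sum of absolute differences per sample."""
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    return np.abs(x - y).sum(axis=1)

x = np.array([[1.0, 2.0], [0.0, 0.0]])
y = np.array([[2.0, 4.0], [1.0, -1.0]])
print(l1_distance(x, y))  # [3. 2.]
```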

Reviewed By: bwasti

Differential Revision: D5007719

fbshipit-source-id: fd547c6645cf5f87305e9ebfd95ed918779c1d2a
2017-05-11 18:03:10 -07:00
Ahmed Taei
8df51a84ac Support 3D&1D SpatialBatchNorm[CPU]
Summary:
Generalize SpatialBatchNorm CPU Op to compute Spatial batch normalization for
1D, 2D & 3D input tensors.
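The generalization reduces over the batch axis and every spatial axis, keeping only the channel axis. A hedged NumPy sketch of the forward pass (names are illustrative, not Caffe2's):

```python
import numpy as np

def spatial_batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize over batch + spatial dims; x has shape (N, C, *spatial)."""
    axes = (0,) + tuple(range(2, x.ndim))  # all axes except channel
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    shape = (1, -1) + (1,) * (x.ndim - 2)  # broadcast gamma/beta over spatial dims
    return gamma.reshape(shape) * (x - mean) / np.sqrt(var + eps) + beta.reshape(shape)

# The same code handles 1D (N, C, L), 2D (N, C, H, W) and 3D (N, C, T, H, W) inputs.
x = np.random.randn(4, 3, 5, 6, 7)
y = spatial_batch_norm(x, np.ones(3), np.zeros(3))
print(y.shape)  # (4, 3, 5, 6, 7)
```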

Reviewed By: dutran

Differential Revision: D5043563

fbshipit-source-id: 7fcb933a628dd47f13aa622f63601a87382f09cd
2017-05-11 09:32:54 -07:00
Romain Cledat
e16ea46013 Extended ImageInputOp
Summary:
Added several features to the ImageInputOp:
  - bounding box (per image as well as default for the operator). For per-image, it
    only works in Caffe2 format and is passed as the third tensor in the form
    (ymin, xmin, height, width). For the operator, pass bounding_xmin, bounding_ymin,
    bounding_width and bounding_height as parameters.
  - per-channel mean/std. You can use the usual mean/std to pass a single
    value to be used for all channels or also pass mean_per_channel and std_per_channel
    to specify different values per channel. Order of channels is BGR.
  - A minimum size parameter that can be specified instead of the scale parameter.
    The minsize parameter will only scale the image if it is smaller than required.
    This differs from scale which will scale up as well as down. You can only specify
    one of scale or minsize.
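The scale-vs-minsize behavior described above can be sketched like this (an illustrative helper, not the operator's actual code):

```python
def target_size(height, width, scale=None, minsize=None):
    """Return the new (height, width) so the shorter side becomes `scale`;
    with `minsize`, only upscale images whose shorter side is too small."""
    assert (scale is None) != (minsize is None), "specify exactly one of scale/minsize"
    short = min(height, width)
    if minsize is not None and short >= minsize:
        return height, width  # minsize never scales down
    goal = scale if scale is not None else minsize
    ratio = goal / short
    return int(round(height * ratio)), int(round(width * ratio))

print(target_size(480, 640, scale=240))    # (240, 320)
print(target_size(480, 640, minsize=240))  # (480, 640) -- already big enough
```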

Added a test case to test some of the features

Differential Revision: D4874988

fbshipit-source-id: 437191052a46e9916defe8b100d7cc7864373f61
2017-05-10 17:52:01 -07:00
Yury Zemlyanskiy
e8c274cf16 Optimize memory usage for MI-LSTM
Summary:
Use ElementwiseLinearOps instead of manual Mul + Sum. That saves intermediate blobs.
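ElementwiseLinear fuses the per-feature multiply and add into one op, so the product blob that a separate Mul + Sum would materialize never exists. A rough NumPy illustration of the equivalence (assumed shapes, not the NMT net itself):

```python
import numpy as np

x = np.random.randn(8, 16)  # (batch, features)
w = np.random.randn(16)
b = np.random.randn(16)

# Manual version: the Mul output is an extra intermediate blob.
tmp = x * w           # Mul with broadcast -> materializes an (8, 16) blob
out_manual = tmp + b  # Sum

# ElementwiseLinear computes y[n, d] = x[n, d] * w[d] + b[d] in a single op,
# so no intermediate product blob is kept around.
out_fused = x * w + b

print(np.allclose(out_manual, out_fused))  # True
```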

For NMT use case

Before: https://our.intern.facebook.com/intern/fblearner/details/18060753
Time per step: 0.072
memory usage (per each of 2 GPUs): 9041MiB

After: https://our.intern.facebook.com/intern/fblearner/details/18107583
Time per step: 0.0715
Memory (per each GPU): 8560MiB

Reviewed By: akyrola

Differential Revision: D5038785

fbshipit-source-id: 4bc8155dbd0c87729e17236d68d62ca530aadb53
2017-05-10 16:53:43 -07:00
Xiaolong Wang
11bcdbc3f0 Load Parameters from Model
Summary:
In Dper utility, add a function `load_parameters_from_model_init_options` to
allow init parameters from pretrained models

Reviewed By: xianjiec

Differential Revision: D4926075

fbshipit-source-id: 5ab563140b5b072c9ed076bbba1aca43e71c6ac5
2017-05-10 10:33:04 -07:00
Yury Zemlyanskiy
3abd0cb623 Add axis argument to SoftmaxWithLoss
Summary: Add `axis` argument for SoftmaxWithLoss (it doesn't yet work for the spatial case).
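With an axis argument, the input is usually treated as a 2D matrix: dims before the axis are coalesced into a batch dimension and softmax runs over the rest. A hedged NumPy sketch of that convention (assumed from the common Caffe2 axis semantics, not this diff's code):

```python
import numpy as np

def softmax_with_axis(x, axis=1):
    """Softmax over dims [axis:], treating dims [:axis] as the batch."""
    outer = int(np.prod(x.shape[:axis]))
    x2 = x.reshape(outer, -1)
    e = np.exp(x2 - x2.max(axis=1, keepdims=True))  # stabilized exp
    return (e / e.sum(axis=1, keepdims=True)).reshape(x.shape)

x = np.random.randn(2, 3, 4)
p = softmax_with_axis(x, axis=2)  # softmax over the last dim only
print(p.reshape(6, 4).sum(axis=1))  # each row sums to 1
```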

Reviewed By: akyrola

Differential Revision: D5025797

fbshipit-source-id: 9e3cf39223af3f2c8bb357f8d9fe952b7349f913
2017-05-09 19:36:00 -07:00
Alisson Gusatti Azzolini
75bc9f5e77 Relax requirement on token uniqueness
Summary: Relax requirement on token uniqueness since a few use cases broke after the uniqueness requirement was added in a previous diff.

Reviewed By: kittipatv

Differential Revision: D5034132

fbshipit-source-id: 327eb065923e6ea152a360324316f81b7fb9564b
2017-05-09 19:36:00 -07:00
Yury Zemlyanskiy
48de1ea165 Drop extra Reshape in attention calculation
Summary: We can avoid this extra Reshape.

Reviewed By: jamesr66a

Differential Revision: D5032874

fbshipit-source-id: 92bd568bc6bec53d7f81a64cfa96d2c610823f8c
2017-05-09 17:16:36 -07:00
Yury Zemlyanskiy
ae924be3ac Removing extra Reshapes in MILSTM with new broadcasted ops
Summary: D4873222 introduced SumReduceLike and removed the use_grad_hack ... hack. Remove unnecessary reshapes and kill use_grad_hack parameters.

Reviewed By: jamesr66a

Differential Revision: D4894243

fbshipit-source-id: c4f3f84abf95572d436b58bbdc2b18b21583c2f1
2017-05-09 14:11:04 -07:00
Xiaolong Wang
add840510f Refactor Optimizer to Allow scale_learning_rate
Summary:
In transfer learning, parameters initialized from a pretrained model might require
a different learning rate than parameters initialized otherwise. To this end, here we
implement a Python solution where `base_learning_rate` is scaled by `scale`,
which is in turn set by `scale_learning_rate`. Alternatively, we could achieve the
same effect by rewriting the LearningRate operator in C++
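The effective rate is simply the base rate times a per-parameter scale; a minimal sketch of the idea (hypothetical names, not the Dper API):

```python
def scaled_learning_rates(params, base_learning_rate, scale_overrides=None):
    """Per-parameter LR: base rate times an optional scale (e.g. for pretrained params)."""
    scale_overrides = scale_overrides or {}
    return {p: base_learning_rate * scale_overrides.get(p, 1.0) for p in params}

lrs = scaled_learning_rates(
    ["fc_w", "fc_b", "pretrained_emb"],
    base_learning_rate=0.1,
    scale_overrides={"pretrained_emb": 0.01},  # fine-tune the pretrained part 100x slower
)
print(lrs["fc_w"], lrs["pretrained_emb"])
```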

Reviewed By: kennyhorror

Differential Revision: D4992827

fbshipit-source-id: 8d7e87a61c95b3eb8ef733ec436f4060e865c0ac
2017-05-09 13:16:21 -07:00
Alisson Gusatti Azzolini
20d8de8d51 Parameter cost estimation job
Summary:
Adds a parameter cost estimation step before the actual training starts. The costs are later used in order to better shard the parameters across instances of the parameter server.

Things I needed to modify:
- A few changes to make ModelLayerHelper picklable
- Add support for stopping a distributed job after a number of stats reporting steps.
- Refactored run_dist_job to support collocating the reader with the trainer even when PS are present.
- Option to disable dense updates (when num_dense_servers=0).

Currently there's a huge overhead posed by having to launch a child workflow. I'll try to address that in a subsequent diff.

This is WIP because the other workflows need to be migrated as well.

I can break this down into smaller diffs if reviewers would prefer it.

Reviewed By: kennyhorror

Differential Revision: D4974752

fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672
2017-05-09 13:02:24 -07:00
Alisson Gusatti Azzolini
bd8ed6641c Stabilize PythonOp token name
Summary: For distributed jobs, we were relying on the order the PythonOps were registered, which was very fragile.

Reviewed By: dzhulgakov

Differential Revision: D5016847

fbshipit-source-id: f5601467c5b0569d5e8a0efdd76abad0d703c5f5
2017-05-09 11:19:44 -07:00
Simon Layton
1d0ba2cfbd New cudnn ops
Summary:
cuDNN versions of dropout and LRN (for native fp16 support), port of Caffe's max pooling algo that uses an explicit mask to store locations (also supports fp16 storage)
Closes https://github.com/caffe2/caffe2/pull/396

Reviewed By: akyrola

Differential Revision: D4990880

Pulled By: asaadaldien

fbshipit-source-id: a716acffb656843e9b31e3e6808bd2d8aa959d03
2017-05-08 16:33:21 -07:00
Yury Zemlyanskiy
11052d03aa RNNCell API change: returns states and outputs
Summary:
Incorporating the definition of a cell's output and illustrating its usage by adding dropout to all types of cells.

I think we should try to get rid of aliases in RecurrentNetwork, so the output of applied_over_sequence is also always (state_1_all, state_2_all, ...). This way we can merge get_output_from_single_step, get_output_from_sequence and get_outputs_with_grads into a single method.

Let me know what you think!

Reviewed By: jhcross

Differential Revision: D4992913

fbshipit-source-id: 737939be336ad145f84e8733cd255d4f7188ef70
2017-05-08 15:19:48 -07:00
Yury Zemlyanskiy
b6a8dd1438 don't recompute small blob in attention
Summary: decoder_hidden_encoder_outputs_sum_tmp is tiny after D5010109, no need to recompute it.

Reviewed By: akyrola

Differential Revision: D5014335

fbshipit-source-id: cc9e8f91372889d10bd99c79366018cb3943a435
2017-05-08 13:06:06 -07:00
Kevin Matzen
0cb7774445 softplus op
Summary: Added softplus function, f(x) = ln(exp(x) + 1)
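Softplus as defined above, with the usual numerical stabilization so large x does not overflow exp (a NumPy sketch, not the Caffe2 kernel):

```python
import numpy as np

def softplus(x):
    """f(x) = ln(exp(x) + 1); logaddexp(0, x) computes this overflow-safely."""
    return np.logaddexp(0.0, x)

x = np.array([-1000.0, 0.0, 1000.0])
print(softplus(x))  # approximately [0, log(2), 1000] -- a smooth ReLU
```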

Reviewed By: akyrola

Differential Revision: D5011057

fbshipit-source-id: 5fddb1568fee625f81ea3a86a85d0f400c3ee278
2017-05-08 10:40:25 -07:00
Xianjie Chen
8a7f00d61b fix mean pooling
Summary:
Segment-based Ops require increasing segment ids without gaps. Lengths-based Ops do not
have this requirement.

Other pooling methods, e.g., LogExpMean, do not have Lengths-based Ops available yet.
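The difference between the two families can be sketched as follows (illustrative NumPy, assumed op semantics):

```python
import numpy as np

def lengths_mean(values, lengths):
    """LengthsMean-style pooling: consecutive rows grouped by run lengths."""
    out, start = [], 0
    for n in lengths:
        out.append(values[start:start + n].mean(axis=0))
        start += n
    return np.stack(out)

# Segment-based ops would instead take ids like [0, 0, 1, 1, 1]; those ids
# must be non-decreasing and gap-free, a constraint the lengths encoding avoids.
v = np.array([[1.0], [3.0], [2.0], [4.0], [6.0]])
print(lengths_mean(v, [2, 3]))  # [[2.], [4.]]
```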

Differential Revision: D5019165

fbshipit-source-id: ab01a220e10d4ed9fa2162939579d346607f905e
2017-05-08 01:09:07 -07:00
Jon Morton
ac1c63dda8 Add specialized ResizeNearest implementation for scale=2
Summary:
Specialized implementation of ResizeNearest for width_scale=2 and height_scale=2. This implementation doesn't use divides or calls to std::min, and is unrolled 2x over the width dimension. Also add a correctness test.

About 6x faster.
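For the scale=2 case every output pixel (y, x) reads input (y // 2, x // 2), so the generic divide/min indexing can be replaced by repeating each row and column twice. A NumPy sketch of the semantics (not the unrolled C++ kernel):

```python
import numpy as np

def resize_nearest_2x(img):
    """Nearest-neighbor 2x upsample over H and W of an (N, C, H, W) tensor."""
    return img.repeat(2, axis=2).repeat(2, axis=3)

x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)
print(resize_nearest_2x(x)[0, 0])
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]
#  [2. 2. 3. 3.]
#  [2. 2. 3. 3.]]
```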

Reviewed By: ajtulloch

Differential Revision: D4928579

fbshipit-source-id: 5cc92a52bd688690fee907b4333d9c84b666f9c9
2017-05-07 21:10:11 -07:00
Aapo Kyrola
711ea1d4ac fix external inputs handling in AppendNet v2
Summary: External inputs must be computed before updating the _ops_output structure; otherwise, if the net to be appended outputs the external input, it is not added correctly.

Differential Revision: D5013496

fbshipit-source-id: 6a83d0a6f1c63ef8ae7bec4d862c0ac2a690d47b
2017-05-05 21:50:57 -07:00
Du Tran
033ab9da1b Adding video data layer for caffe2
Summary: Adding a simple video data layer which allows reading video data from frames or videos and outputs a 5D tensor. It also allows multiple labels. The current implementation is based on ffmpeg.

Differential Revision: D4801798

fbshipit-source-id: 46448e9c65fb055c2d71855447383a33ade0e444
2017-05-05 14:16:38 -07:00
Aapo Kyrola
a61778a628 fix recompute_blobs_on_backward
Summary: My previous refactoring broke recompute_blobs_on_backward, which was cleared.

Reviewed By: urikz

Differential Revision: D5013351

fbshipit-source-id: 5945778c0cff2ee2c7f5ad7b59b58f4305fa6a05
2017-05-05 14:01:34 -07:00
James Cross
5c667ebe4e AttentionCell
Summary:
This diff creates a generalized AttentionCell class, which will allow us to construct attention decoders out of arbitrary RNNCell components (with a particular view to using stacked, multi-layer RNNs).

In order to do this, we introduce a new optional input for RNNCell._apply which allows us to provide an additional input that is not processed by prepare_input(). Note that this is an argument only to _apply, not apply, since it is only meant to be used for additional recurrent connections to "embedded" cells, not for standalone RNNs.

Reviewed By: urikz

Differential Revision: D4998465

fbshipit-source-id: 473009ea4917e86e365f9d23aa2f11a46a94fd65
2017-05-05 12:33:01 -07:00
Yury Zemlyanskiy
d7f20c94fd Optimize memory for RNN attention
Summary:
The fix should save us (source_len - 1) * target_len * batch_size * encoder_output_size * 4 bytes for the forward pass. Typically, these values are 100 * 100 * 128 * 512 * 4 = 2.4GB.
Not entirely sure about backward pass.
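The quoted figure can be checked with back-of-the-envelope arithmetic (fp32 = 4 bytes per element):

```python
# Savings for the forward pass, using the formula from the summary above.
source_len, target_len, batch_size, encoder_output_size = 100, 100, 128, 512
saved = (source_len - 1) * target_len * batch_size * encoder_output_size * 4
print(saved / 2**30)  # ~2.4 (GiB)
```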

Reviewed By: akyrola

Differential Revision: D5010109

fbshipit-source-id: 2ed68f3ebfd3b8362916d24af991482f1686e064
2017-05-05 12:18:50 -07:00
Eider Moore
0c6099ce25 Add __dir__ so autocomplete in iPython works.
Summary: It is good practice to provide __dir__ whenever __getattr__ is defined so that tooling will work intelligently.  In particular, it is hard to explore the available methods in iPython without tab completion.
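A minimal illustration of the pattern (generic Python, not the Caffe2 class in question):

```python
class OpCatalog:
    """Proxy that resolves operator names dynamically via __getattr__."""
    _ops = {"Add": "adds tensors", "Mul": "multiplies tensors"}

    def __getattr__(self, name):
        try:
            return self._ops[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # Expose the dynamic names so IPython tab completion can see them.
        return sorted(set(super().__dir__()) | set(self._ops))

c = OpCatalog()
print(c.Add)            # adds tensors
print("Mul" in dir(c))  # True
```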

Reviewed By: dzhulgakov

Differential Revision: D5006545

fbshipit-source-id: 1a150d91d54637d80b292764513943ff70d971b4
2017-05-05 11:32:06 -07:00
Heng Wang
8a2433eacb Add model saving and loading to resnet50_trainer.py
Summary:
Script caffe2/caffe2/python/examples/resnet50_trainer.py can be used to train a ResNet-50 model with Imagenet data (or similar).

However, currently the script does not actually save the model, so it is kind of useless.

Task 1: After each epoch, save the model in a file "<filename>_X.mdl" where X is the epoch number and <filename> is given as a command line parameter. By default, use "resnet50_model" as the filename.

Task 2: Add a functionality to restore the model from a previous file:
 - add a command line parameter "load_model", which user can use to specify a filename.
 - if this parameter is set, load the model parameters from the previous file
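The two tasks can be sketched as follows (illustrative helpers; `pickle` stands in for Caffe2's actual model serialization):

```python
import os
import pickle

def checkpoint_path(filename, epoch):
    """Task 1's naming scheme: <filename>_X.mdl for epoch X."""
    return "%s_%d.mdl" % (filename, epoch)

def save_model(params, filename, epoch):
    with open(checkpoint_path(filename, epoch), "wb") as f:
        pickle.dump(params, f)  # stand-in for real Caffe2 serialization

def maybe_load_model(load_model=None):
    """Task 2: restore parameters only if --load_model was given."""
    if load_model and os.path.exists(load_model):
        with open(load_model, "rb") as f:
            return pickle.load(f)
    return None

save_model({"conv1_w": [0.1]}, "resnet50_model", 3)
print(maybe_load_model("resnet50_model_3.mdl"))  # {'conv1_w': [0.1]}
```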

Reviewed By: prigoyal

Differential Revision: D4984340

fbshipit-source-id: 333e92679ba52a7effe9917fdfc2d55d652b868f
2017-05-05 10:08:37 -07:00
Aapo Kyrola
5c52392229 opsify AccumulateInputGradients
Summary:
As part of a project to turn all the gradient-accumulation business in RecurrentNetworkGradientOp into ops, this makes accumulateInputGradients into ops.

Also added a way to mark operators private so they don't appear in docs.

Reviewed By: salexspb

Differential Revision: D5006698

fbshipit-source-id: 226d7afb473290c8d0f936d2cc87640be3e06615
2017-05-05 09:13:39 -07:00
Romain Cledat
aa5e771042 Added tiles and axis as input parameters to Tile Operator
Summary:
Added the possibility to pass 'tiles' and 'axis' as inputs,
as opposed to arguments, for the Tile Operator. If provided, the input
values override the argument values. Now with proper CUDA code

Differential Revision: D4930347

fbshipit-source-id: b44b032b327c7d7bddfce63abf4e3289d7e74bfb
2017-05-04 23:46:51 -07:00
Xiaolong Wang
0d32ab4a45 Refactor FTRL optimizer to allow sending Alpha as input blob
Summary: Split from parent diff

Reviewed By: xianjiec

Differential Revision: D4992993

fbshipit-source-id: 9f8a79023b0c581e84bd5e82e2e730c9e1a86e1e
2017-05-04 22:57:00 -07:00
Kittipat Virochsiri
211eae127c LastNWindowCollector
Summary: Layer for LastNWindowCollector op. We need this since it's an in-place operator.

Reviewed By: chocjy

Differential Revision: D4981772

fbshipit-source-id: ec85dbf247d0944db422ad396771fa9308650883
2017-05-04 17:32:09 -07:00
Luke Yeager
b229b7ff11 Fix typo 'convlution'
Summary: Closes https://github.com/caffe2/caffe2/pull/470

Differential Revision: D5003850

Pulled By: pietern

fbshipit-source-id: 62ba13f58dae9f19a434f2075ff3ac143d34feb5
2017-05-04 17:02:35 -07:00
Aapo Kyrola
d312dcc881 lstm_benchmark use rnn_cell.LSTM multicell + assertion
Summary:
Use the rnn_cell multi-cell for the LSTM benchmark. While doing this, I had not changed the initial_states and got an inconsistent result from rnn_cell, so I added an assertion to check that the initial states length is 2 * num_layers.

+ fix division by zero error

Reviewed By: salexspb

Differential Revision: D5003177

fbshipit-source-id: a8250b825394c352428a0f067098dfcd7516ab2a
2017-05-04 17:02:32 -07:00
Kittipat Virochsiri
c34d5a838f Generalize LastNWindowCollector
Summary: Use `CopyItems` so that it accepts any type of tensor. Also, move the cursor to input blob so that it's checkpoint friendly. Output is now also part of input so that inference can work correctly.

Reviewed By: xianjiec

Differential Revision: D4920987

fbshipit-source-id: da532736225ec27f409ff763ff69a0629235151c
2017-05-04 16:05:15 -07:00
Luke Yeager
c8f444237f net_drawer: --input is required
Summary:
Before:
```
$ python -m caffe2.python.net_drawer
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/data/caffe2/install/caffe2/python/net_drawer.py", line 403, in <module>
    main()
  File "/data/caffe2/install/caffe2/python/net_drawer.py", line 365, in main
    with open(args.input, 'r') as fid:
TypeError: coercing to Unicode: need string or buffer, NoneType found
```
After:
```
$ python -m caffe2.python.net_drawer
usage: net_drawer.py [-h] --input INPUT [--output_prefix OUTPUT_PREFIX]
                     [--minimal] [--minimal_dependency] [--append_output]
                     [--rankdir RANKDIR]
net_drawer.py: error: argument --input is required
```
Closes https://github.com/caffe2/caffe2/pull/479

Differential Revision: D5003898

Pulled By: pietern

fbshipit-source-id: d121c331411ba4bbded81f9658ec787fa2fd3dc1
2017-05-04 11:45:57 -07:00
Aapo Kyrola
1a831ce8f2 Add direct enqueuing to enable RNN input, allow specify batch columns
Summary:
Add a parameter dont_rebatch to data_workers. This disables rebatching of input from the fetcher into equal-size chunks, which is not desired with RNNs, where for longer sequence lengths we might want smaller batches, etc.

For some reason the graceful-shutdown test interfered with other tests, so I removed it.

Reviewed By: jay-mahadeokar

Differential Revision: D4988549

fbshipit-source-id: cbab46d77c948f2e293e79e6eb538dde17d800ee
2017-05-03 14:49:44 -07:00
Pooya Davoodi
16821bc45d Add ScatterWeightedSum for GPU.
Summary:
- Adding ScatterWeightedSumOp for CUDA.
- This version does not support input weight (weight0). In other words, the input weight has to be 1.0, otherwise the op exits.
- To check the value of weight0, we copy its value from device to host at: https://github.com/caffe2/caffe2/pull/443/files#diff-2a77f80797072e8443f4867cb709fb40R244
Closes https://github.com/caffe2/caffe2/pull/443
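ScatterWeightedSum updates only the indexed slices: X0[idx] = w0 * X0[idx] + w1 * X1, and the CUDA port above fixes w0 = 1.0. A NumPy sketch of those semantics (assumed from the CPU op's definition):

```python
import numpy as np

def scatter_weighted_sum(x0, w0, indices, x1, w1):
    """x0[indices] = w0 * x0[indices] + w1 * x1 (slice-wise sparse update)."""
    assert w0 == 1.0, "the CUDA version only supports weight0 == 1.0"
    np.add.at(x0, indices, w1 * x1)  # add.at accumulates repeated indices correctly
    return x0

x0 = np.zeros((4, 2))
x1 = np.ones((2, 2))
out = scatter_weighted_sum(x0, 1.0, np.array([1, 3]), x1, 0.5)
print(out[1], out[3])  # [0.5 0.5] [0.5 0.5]
```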

Reviewed By: akyrola

Differential Revision: D4971910

Pulled By: asaadaldien

fbshipit-source-id: 2282e968f95364f0b3b8126502b053fe7a32ba20
2017-05-03 13:40:48 -07:00
James Cross
ddc4d101ad MultiRNNCell (Caffe2)
Summary: Add Python support for arbitrary (unidirectional) recurrent networks with MultiRNNCell abstraction. Since the combined step net for all layers is created at one time (in method _apply), this may be optimizable as-is. LSTM() function is extended to accept a list of numbers of units for the dim_out argument, producing a multi-layer LSTM in that case.

Reviewed By: salexspb

Differential Revision: D4965001

fbshipit-source-id: 39c069468d5b40bf803503cf62046a479ca83cbb
2017-05-03 10:02:31 -07:00
Pieter Noordhuis
aadad971e4 Fix pybind11 module name for MPI helpers
Summary: TSIA

Reviewed By: dzhulgakov

Differential Revision: D4981136

fbshipit-source-id: 62d0df8dccea0a3ecb6da150eea4752b100c04a8
2017-05-02 23:18:50 -07:00
Kittipat Virochsiri
3ca0de25da Prevent false overwriting of a field
Summary: The code snippet in the added unit test is invalid, but it may or may not cause an exception. Disable the syntax so people don't accidentally use it.

Reviewed By: dzhulgakov

Differential Revision: D4985030

fbshipit-source-id: ffa2b26f7b29128b196aba1b1001a97c87e381cf
2017-05-02 23:18:49 -07:00
Yury Zemlyanskiy
31643d5ecb Inference code for seq2seq model
Summary: Beam search implementation

Differential Revision: D4975939

fbshipit-source-id: 67d8b73390221583f36b4367f23626a2aa80f4b4
2017-05-02 22:47:28 -07:00
Yiming Wu
3504e1d836 cuda (sparse) lengths sum
Reviewed By: azzolini

Differential Revision: D4961327

fbshipit-source-id: 4ee61dcdd907c044876cb0de671ceee953c15129
2017-05-02 22:21:42 -07:00
Alexander Sidorov
379ac514b8 lstm_benchmark: add warm-up stage, support layers
Summary:
We need a warm-up stage because otherwise the first iteration
spends too much time doing all the allocations

Reviewed By: akyrola

Differential Revision: D4986201

fbshipit-source-id: f60a75520988ff3f1540bb157cdc69634f307db4
2017-05-02 20:34:00 -07:00
Kittipat Virochsiri
22d4eaeb9e JoinContext
Summary:
Layer to allow the model to follow different paths for each instantiation context and join later. Together with the tagging-system cleanup (a separate issue), this should reduce the need to write a layer to differentiate between contexts.

Re: tagging-system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become a set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.

Reviewed By: kennyhorror

Differential Revision: D4964949

fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
2017-05-02 17:32:26 -07:00
Aapo Kyrola
282298dd1c Data parallel model: Disable NCCL by default to hopefully reduce deadlocks
Summary: Make NCCL optional in data_parallel_model due to continuing reliability (deadlock) issues.

Reviewed By: pietern

Differential Revision: D4988950

fbshipit-source-id: 8a2192f01b5f3c0e847137cd37aefc69e553a56f
2017-05-02 16:09:17 -07:00
Janusz Kudelka
ee7b3c9b2b caffe2: rebatching queue for MultiTask
Summary:
RFC. This is a naive implementation of a Rebatching Queue for the MultiTask
effort. Full disclaimer: I'm very new to Caffe/Machine Learning and I'm doing
dodgy science here (under Dmytro's supervision), so please be extra tough on
this review so I can learn best practices :)

Differential Revision: D4871970

fbshipit-source-id: 924820ef0fce45b5e2bdabeec9885cbafa23a880
2017-05-02 15:22:46 -07:00
Yiming Wu
222b781f76 Ensure sparse_gradients feed to CPU
Summary: Ensure sparse gradient tensors are copied to the CPU

Reviewed By: dzhulgakov

Differential Revision: D4987701

fbshipit-source-id: 81f93c4f9d4b9bc5855cd4e9683d1a887b27e0cf
2017-05-02 15:01:26 -07:00
Kittipat Virochsiri
e8e36945cf make debug message more explicit & verbose
Summary: I ran into this earlier and the debug messages were not helpful enough

Reviewed By: kennyhorror

Differential Revision: D4985754

fbshipit-source-id: b3d12b5e2cfa1b54fca9126768c84c902664ef28
2017-05-02 12:39:14 -07:00
Krishna Vudata
1f3c7f8080 Handle net.external_inputs correctly in AppendNet
Summary:
When appending net A to net B, an external input of net A should not be added as
an external input of net B if net B is outputting that blob.

Reviewed By: dzhulgakov

Differential Revision: D4975921

fbshipit-source-id: a5c0ada7b96d851e57d345244d322dd93c7be8e4
2017-05-02 11:20:26 -07:00
Chonglin Sun
e8e93066e7 add workflow for user complicated embedding
Summary: Correctly propagate the request_only tag to all layers.

Reviewed By: kennyhorror

Differential Revision: D4751496

fbshipit-source-id: e65fd8cfe56d2989213d44e684a528ede691d316
2017-05-02 10:46:52 -07:00
Jiyan Yang
a458aa4b2a Fix tags to be based on EXCLUDE_FROM_{CONTEXT}
Summary: Cleaning up the tagging system. Introducing tags EXCLUDE_FROM_{CONTEXT}.

Reviewed By: kennyhorror

Differential Revision: D4974842

fbshipit-source-id: b0fa6772299bb70afa2192c39e45191c9f41336a
2017-05-02 09:32:27 -07:00