Commit graph

1099 commits

Author SHA1 Message Date
Yan Shang
57c93435e3 Dedup name in functional layer
Summary:
Before this fix, a functional layer name could appear several times in a
blob and cause confusion. This diff fixes the issue.

Reviewed By: kittipatv

Differential Revision: D5641354

fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2
2017-08-17 17:50:34 -07:00
Ben Zhang
cfbd116966 ApplyTransformIfFaster
Summary:
Implemented ApplyTransformIfFaster

Determine if a transform is faster, then return whichever net is better.

Reviewed By: bwasti

Differential Revision: D5534535

fbshipit-source-id: 509943205b0c454bf30fb01343ac4e88d1441c39
2017-08-17 15:36:51 -07:00
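The decision described in cfbd116966 (benchmark both nets, keep the faster one) can be sketched as follows. This is a minimal illustration, not the Caffe2 implementation: `apply_transform_if_faster`, `transform`, and `benchmark` are hypothetical names, and nets are stood in for by plain strings.

```python
def apply_transform_if_faster(net, transform, benchmark):
    """Return the transformed net only if it benchmarks faster than the original.

    `transform` maps a net to a new net; `benchmark` maps a net to a
    runtime (lower is better). Both are hypothetical callables.
    """
    transformed = transform(net)
    if benchmark(transformed) < benchmark(net):
        return transformed
    return net

# Toy stand-ins: a "net" is a string and its runtime is looked up in a table.
fake_runtimes = {"original": 10.0, "fused": 7.5}
best = apply_transform_if_faster(
    "original",
    transform=lambda net: "fused",
    benchmark=lambda net: fake_runtimes[net],
)
print(best)
```

Passing the benchmark in as a callable keeps the selection logic separate from how runtimes are measured.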
Luke Yeager
f7ece79949 Add fp16 and tensorcore support to resnet50_trainer
Summary:
Use like `--dtype=float16 --enable-tensor-core`
Closes https://github.com/caffe2/caffe2/pull/1093

Differential Revision: D5634840

Pulled By: harouwu

fbshipit-source-id: 18c1e70236ba5ef8661ff55fb524caae1be19310
2017-08-17 15:16:24 -07:00
Kittipat Virochsiri
fa984af0f9 use create_param() in layers
Summary: These layers were not codemoded

Reviewed By: chocjy

Differential Revision: D5645982

fbshipit-source-id: 4325f77a0f8152dfe6dfdeee59697b25ecb1de35
2017-08-17 13:47:57 -07:00
Aapo Kyrola
7fad4be4c6 Device-specific memongering
Summary:
Enforce that blobs don't mix between operators on different GPUs or CPU/GPU. Add test.

+ Fix memonger when no namescope is provided.

Reviewed By: asaadaldien

Differential Revision: D5644708

fbshipit-source-id: 0cb361efd6361b6e2138462584bab6b4de039b5d
2017-08-17 13:31:26 -07:00
James Reed
f388135d3f Layer norm brew wrapper
Summary: Implement a brew wrapper for the LayerNorm op. This adds the scalar weight and bias terms to the op.

Reviewed By: jmp84

Differential Revision: D5595836

fbshipit-source-id: 467b2e1158b0c454a149d4b26c47719826e98752
2017-08-17 11:17:47 -07:00
James Reed
e45e621b0e Implement layer norm gradient GPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: wickedfoo

Differential Revision: D5594445

fbshipit-source-id: 873643165c958fd5829fa7cf07d5d4b1b8b0ed59
2017-08-17 11:17:46 -07:00
James Reed
8e8e90f595 Implement layer normalization backward CPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: jmp84

Differential Revision: D5578306

fbshipit-source-id: 94d262f0317b3ee1b504e0110ad5135afe8350ca
2017-08-17 11:17:46 -07:00
James Reed
e16c40eb4f Implement layer normalization op forward GPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: wickedfoo

Differential Revision: D5552262

fbshipit-source-id: d0cddb0769623a1b3779e2114c19e6ebc57c0f0d
2017-08-17 11:17:45 -07:00
James Reed
474c043be5 Implement layer normalization op forward CPU
Summary: Implement layer normalization from https://arxiv.org/pdf/1607.06450.pdf

Reviewed By: akyrola

Differential Revision: D5543381

fbshipit-source-id: 1102e568439af6a60aad3b87017d5a997fb7dc16
2017-08-17 11:17:44 -07:00
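For reference, the forward pass that the layer normalization commits above implement can be sketched in plain Python. This is a minimal sketch of the computation from the cited paper, without the learned gain and bias. The function name and the `epsilon` default are assumptions and are not taken from the Caffe2 operator.

```python
import math

def layer_norm(row, epsilon=1e-5):
    """Normalize one sample's features to zero mean and (near) unit variance."""
    n = len(row)
    mean = sum(row) / n
    # Biased variance over the features of this single sample.
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [(x - mean) * inv_std for x in row]

out = layer_norm([1.0, 2.0, 3.0, 4.0])
print(out)
```

Unlike batch normalization, the statistics are computed per sample across its features, so the result does not depend on the batch size.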
Aapo Kyrola
e89474c496 fix forward_only mode
Summary:
Forward-only mode had broken at some point. Two things: RNNCell did not pass the parameter to recurrent.py and also recurrent.py was broken if forward_only=True after python3 codemod.

Added test to rnn_cell_test to actually check the forward only parameter is passed to prevent future breakage.

Reviewed By: jmp84

Differential Revision: D5639306

fbshipit-source-id: b1bbc39d59c3f3734b2f40a1c2f3740c733e0bd4
2017-08-17 10:19:04 -07:00
Jerry Zhang
a63e7314f3 Adding 1d-2d-3d Schemas for Conv and Pool
Summary: Add Conv and Pool operators with dimensions.

Reviewed By: bddppq

Differential Revision: D5588614

fbshipit-source-id: 2552c40dc3ca180a6ab51817d60f0b85b97885d5
2017-08-17 09:45:54 -07:00
Jerry Zhang
4ca5735753 Allow inplace for spatial_bn_op
Summary: att

Reviewed By: Yangqing

Differential Revision: D5644717

fbshipit-source-id: 1a020fe4ca7028056ce7bebddb7bfd1437998530
2017-08-17 09:18:55 -07:00
Badri Narayan Bhaskar
ae2aad9c0d Operator to Merge ID_LIST features
Summary:
As an alternative to sharing embeddings, we want to explore merging the ID_LISTs in the net.

This commit adds an operator to merge many ID_LIST features into a single one.

Differential Revision: D5481523

fbshipit-source-id: 446121122a32de5682d5d75a165370bc8d776d03
2017-08-17 01:16:00 -07:00
Jingfei Du
b3029df1d0 Added window mode for caffe2 sequence operator
Summary: This can be used for local attention to mask elements outside of a window

Reviewed By: jamesr66a

Differential Revision: D5643677

fbshipit-source-id: 92b33866258ccc7307d5bcf08234610aa3fb152d
2017-08-16 21:34:29 -07:00
Ahmed Taei
a0fe96d7cd Rewrite memonger DAG in C++.
Summary: This diff replaces the core of the memonger DAG algorithm, _compute_blob_recycling_for_dag, with a C++ implementation.

Reviewed By: akyrola

Differential Revision: D5544219

fbshipit-source-id: 9f868880c8d0eb997ad3dd39433f9d0b9216d303
2017-08-16 16:17:15 -07:00
Yiming Wu
a104dac193 remove unused code and bring back single benchmark mode
Summary: The old GPU single benchmark mode was lost in recent changes. We still need this mode to benchmark some operators. I also removed some unused ancient code.

Reviewed By: azzolini

Differential Revision: D5628501

fbshipit-source-id: c5d2c6c99af18c41bead5d86c46a42f05821e2ff
2017-08-16 14:06:31 -07:00
Kevin Wilfong
1f47a80e88 Caffe2: diagonal fill op
Summary: Caffe2: diagonal fill op

Reviewed By: panshen1

Differential Revision: D4775640

fbshipit-source-id: bb388ffe223e6b153d4cde1fdad6f84a2bb65b0f
2017-08-16 13:05:11 -07:00
Bor-Yiing Su
30616ee309 Fixes the broken checkpoint test.
Summary:
Since we temporarily disable checkpointing the readers, we need to
rename all the node names in the test to make it pass.

Reviewed By: azzolini

Differential Revision: D5640930

fbshipit-source-id: 1e61be31ddf9b6e28efd2eb8e6e91e63dcd83154
2017-08-16 11:24:50 -07:00
Lei Chen
14950a9082 Support session in distributed realtime trainer
Summary:
Convert from PlanDef ProtoBuf into python Plan object by recursively creating
Nets and ExecutionSteps.

Also support running Plan object directly in Session.

Reviewed By: azzolini

Differential Revision: D5608393

fbshipit-source-id: c0ae3b6da743a759af6db3b614a5a3935fe0b34c
2017-08-16 10:28:55 -07:00
Aapo Kyrola
a53192e334 Revert D5001637: [Caffe2][RNN] Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This reverts commit 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8

bypass-lint

Differential Revision: D5001637

fbshipit-source-id: 4d6250ae7e66ea0aa635a68d943d552e5db65b69
2017-08-16 03:21:49 -07:00
Aapo Kyrola
453c60ce28 Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This diff adds dependency-aware concurrent/parallel execution of operators in stepnets. For CPU, we use multi-threaded execution. For CUDA, we use multiple streams and cuda events for parallelism and dependency tracking.

Much of the diff is about computing dependency graph, which was quite tricky because we need to also avoid write-races of multiple operators running in multiple timesteps in parallel. Also, recurrent blobs "change name" when passing over timestep ("_prev"), so that needs to be handled as well.

This diff also restores the link-ops that I unlanded earlier.

The performance gain of this diff is very good for CPU (same perf as with static_dag, even better on forward-only). On CUDA, the gains are modest, at least with the sizes I was testing with.

Reviewed By: salexspb

Differential Revision: D5001637

fbshipit-source-id: 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8
2017-08-15 23:55:15 -07:00
Bor-Yiing Su
49ec942825 Temporarily disables the checkpoints for the readers.
Summary:
The hive reader checkpoints are broken because of D5582328.
This breaks our offline simulator test as well.
This is a temporary fix that disables the checkpoints for readers.

Reviewed By: azzolini

Differential Revision: D5637719

fbshipit-source-id: 4f31ae534cb7e981fcacbb721cbb2420249fad91
2017-08-15 19:36:11 -07:00
Yangqing Jia
1db7a99249 disable travis test for dpm test
Summary:
After this, the tests should go back to all green.
Closes https://github.com/caffe2/caffe2/pull/1058

Reviewed By: harouwu

Differential Revision: D5637495

Pulled By: Yangqing

fbshipit-source-id: ac3ab5a27bc56e3bb08fa81aa8ed186cb7e8832b
2017-08-15 19:17:41 -07:00
Luke Yeager
f92fdd850d Important typo in resnet50_trainer
Summary: Closes https://github.com/caffe2/caffe2/pull/1092

Reviewed By: Yangqing

Differential Revision: D5637489

Pulled By: harouwu

fbshipit-source-id: 13609a3e14a45e640849268821fd8565fd7aae4d
2017-08-15 19:03:15 -07:00
Douglas Chen
e95b79a69c Benchmark for embedding generation
Summary:
Adds a benchmark comparing two methods used to generate positional embeddings,
table-based and sinusoid (as in the Transformer paper).

Reviewed By: jamesr66a

Differential Revision: D5625633

fbshipit-source-id: faee2d20ea0c3d9c41479c5114fa010ac49fab24
2017-08-15 14:22:41 -07:00
Alexander Sidorov
52befa4802 DataParallelModel: take param_init_net into account in _InferBlobDevice
Summary:
Here is my example:

For static RNN timestep is created as a part of param_init_net. Before DPM assumed that it is CUDA blob by default and it participated in broadcasting causing Copy on line 798 to fail. No device mapping is correct for this blob.

Reviewed By: akyrola

Differential Revision: D5631716

fbshipit-source-id: 28c3eb17ecc3080c95c41d69a60bf7262d3907d4
2017-08-15 12:06:46 -07:00
Aapo Kyrola
c05c500a82 check _grad suffix
Summary:
Memonger had a subtle bug which caused it to recycle "splitinfo" outputs of Concat/Split. That is bad since they are on the CPU device, and would cause them to be reallocated. This caused a big slowdown with Kaiming's trainer.

The bug was that we checked for gradients as containing "_grad" in the name, although we should only allow it as a suffix. Admittedly, string checking is not an elegant way to do this, but that is how Caffe2 works now.

Reviewed By: asaadaldien

Differential Revision: D5627251

fbshipit-source-id: c12be2323109bf81c3725d8884c7ef024e010bd5
2017-08-14 19:47:59 -07:00
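The difference between the substring check and the suffix check described in c05c500a82 can be illustrated as follows. This is a sketch only: the function names and blob names are hypothetical and are not taken from the memonger source.

```python
def is_gradient_blob_buggy(name):
    # Substring check: matches "_grad" anywhere in the name.
    return "_grad" in name

def is_gradient_blob(name):
    # Suffix check: only names that end in "_grad" count as gradients.
    return name.endswith("_grad")

# Hypothetical blob names, for illustration only.
names = ["fc1_w_grad", "concat_grad_splitinfo", "data"]
print([n for n in names if is_gradient_blob_buggy(n)])
print([n for n in names if is_gradient_blob(n)])
```

With the substring check, a name that merely contains "_grad" in the middle is also treated as a gradient and becomes a candidate for recycling.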
Juan Miguel Pino
434fa7f694 Reduce memory usage for dot attention
Summary: Title

Differential Revision: D5569996

fbshipit-source-id: c705fc7870ac3e71a071c3f808ac885a82334af2
2017-08-14 12:35:50 -07:00
James Reed
ffd9316b03 Use SequenceMask op in attention code for sequence masking
Summary: Use the new SequenceMask op to mask out invalid positions in the attention mechanism rather than using PackSegments and UnpackSegments. This should help us on several fronts, including elision of host<>device copies and using fewer intermediate blobs

Differential Revision: D5619156

fbshipit-source-id: e59c644236cee02f853d8743f9a938fb10adc73b
2017-08-12 19:17:49 -07:00
James Reed
a985355935 Gradient for SequenceMaskOp
Summary: Implement backward pass for a SequenceMaskOp to replace https://github.com/caffe2/caffe2/blob/master/caffe2/python/attention.py#L54-L72.

Reviewed By: akyrola

Differential Revision: D5618373

fbshipit-source-id: b831fa69f51d9468c858961f922564159e12b46f
2017-08-12 14:34:29 -07:00
James Reed
0a828768e9 Implement SequenceMaskOp forward pass
Summary:
Implement forward pass for a SequenceMaskOp to replace https://github.com/caffe2/caffe2/blob/master/caffe2/python/attention.py#L54-L72.

This implements two modes: a sequence-length based mode and a matrix triangle mode.

Reviewed By: akyrola

Differential Revision: D5615493

fbshipit-source-id: a2ce4a8e655d9b720049010a7856be052c5567eb
2017-08-12 14:34:28 -07:00
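The two masking modes mentioned in 0a828768e9 can be sketched in plain Python. This is an illustration of the idea rather than the operator itself: the function names, the `-inf` fill value, and the choice of masking the entries above the diagonal are all assumptions.

```python
NEG_INF = float("-inf")

def mask_by_length(batch, lengths, fill=NEG_INF):
    """Sequence-length mode: in each row, replace positions at or past that row's length."""
    return [
        [x if j < n else fill for j, x in enumerate(row)]
        for row, n in zip(batch, lengths)
    ]

def mask_upper_triangle(matrix, fill=NEG_INF):
    """Triangle mode (assumed variant): replace entries strictly above the diagonal."""
    return [
        [x if j <= i else fill for j, x in enumerate(row)]
        for i, row in enumerate(matrix)
    ]

scores = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(mask_by_length(scores, lengths=[2, 3]))
print(mask_upper_triangle([[1.0, 2.0], [3.0, 4.0]]))
```

Filling with `-inf` before a softmax gives the masked positions zero weight, which is why it is a common choice in attention code.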
Bor-Yiing Su
8a5bdc383e Fixes the flaky upload test
Summary:
The LocalSession does not work with the multi-node definitions.
The test becomes flaky because of that. The fix is to create
different LocalSession for each Node(), and run each node
sequentially.

Differential Revision: D5617857

fbshipit-source-id: a8079a90291b4c8b5aa6b471c33c06d18e59976c
2017-08-11 18:58:24 -07:00
Bor-Yiing Su
404f8ee9b4 Extends the jobrunner to support uploading checkpoints.
Summary:
1. Adds one more step in the JobRunner class to upload checkpoints.
2. Adds one function to return the name of the checkpoint given
the name of the node.

Reviewed By: andrewwdye

Differential Revision: D5597130

fbshipit-source-id: 570a55785e6227859e1115326d6cab077f0e7f72
2017-08-11 14:17:17 -07:00
Zhaoming Wu
399fc9fb09 Added Nesterov
Summary: Added Nesterov momentum as an option for BMUF and corresponding tests

Reviewed By: asaadaldien

Differential Revision: D5599888

fbshipit-source-id: 30819c9e689347c8b75daddc7444bea9f54193ae
2017-08-11 13:52:43 -07:00
Jerry Pan
9372ff7a86 Caffe2: support Tensor in BlobsQueueDB
Summary: Caffe2: support Tensor in BlobsQueueDB

Reviewed By: kevinwilfong

Differential Revision: D5589616

fbshipit-source-id: 66aa6092b6403960c4858abd986771b58be94106
2017-08-11 11:21:14 -07:00
Simon Layton
85788a0f65 Add TensorCore support
Summary:
Add support for TensorCore convolution and gemm on Volta hardware.

Currently built on top of #1055
Closes https://github.com/caffe2/caffe2/pull/1056

Differential Revision: D5604068

Pulled By: Yangqing

fbshipit-source-id: 100f67e26ed5fabb1dbb31dcd77f7ecb84de4ee7
2017-08-10 20:16:48 -07:00
Alexander Sidorov
a7be496fe2 Revert D5589309: modify _LSTM into _RNN to adapt GRU
Summary:
This reverts commit f5af67dfe0842acd68223f6da3e96a81639e8049

bypass-lint

Differential Revision: D5589309

fbshipit-source-id: 79b0a3a9455829c3899472a1368ef36dc75f6e14
2017-08-10 16:42:41 -07:00
Kittipat Virochsiri
b91c2f5064 Make reservoir sampling thread safe
Summary: Guard reservoir sampling with a mutex and fix the bug in counting the number of new entries.

Reviewed By: chocjy

Differential Revision: D5503300

fbshipit-source-id: fd6b0bacb71fbab99d6d5df2c72da523fba02847
2017-08-10 15:27:21 -07:00
Kittipat Virochsiri
9c4872f4bc Reservoir sampling with object ID deduplication
Summary: Adding the option to dedup by object ID so that more frequent objects are not present more than once in the reservoir

Reviewed By: chocjy

Differential Revision: D5503109

fbshipit-source-id: e36c3ad8eea134d6c10a4c875fceadc0f843c976
2017-08-10 15:27:20 -07:00
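The idea behind 9c4872f4bc, reservoir sampling that never holds the same object twice, can be sketched as below. This is a simplified illustration built on the classic Algorithm R, not the Caffe2 operator. All names are hypothetical, and because an evicted ID can be re-admitted later, the sketch does not guarantee exactly uniform sampling over distinct IDs.

```python
import random

def reservoir_sample_dedup(stream, capacity, rng):
    """Algorithm R over (object_id, value) pairs, skipping IDs already held."""
    reservoir = []      # sampled (object_id, value) pairs
    held_ids = set()    # object IDs currently in the reservoir
    num_visited = 0     # items that were eligible for sampling so far
    for object_id, value in stream:
        if object_id in held_ids:
            continue    # dedup: a frequent object cannot occupy two slots
        num_visited += 1
        if len(reservoir) < capacity:
            reservoir.append((object_id, value))
            held_ids.add(object_id)
        else:
            slot = rng.randrange(num_visited)
            if slot < capacity:
                # Evict the old entry and keep the ID set in sync.
                held_ids.discard(reservoir[slot][0])
                reservoir[slot] = (object_id, value)
                held_ids.add(object_id)
    return reservoir

# Five distinct object IDs, each repeated many times in the stream.
stream = [(i % 5, i) for i in range(100)]
sample = reservoir_sample_dedup(stream, capacity=3, rng=random.Random(0))
print(sample)
```

Passing the random generator in explicitly makes the sketch reproducible in tests.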
Kittipat Virochsiri
f78af06f1b Features collection with reservoir sampling
Summary: Make the candidate pool less localized

Reviewed By: chocjy

Differential Revision: D5453289

fbshipit-source-id: 848cb7551d7112f6f47f2cf647bb0daca6eff341
2017-08-10 15:27:20 -07:00
Kevin Wilfong
5dba88b40b Caffe2 [easy]: Better exception logging in parallel_workers/data_workers
Summary: Instead of printing the exception using print(), use traceback.print_exc(). This way you get a stack trace.

Reviewed By: jay-mahadeokar

Differential Revision: D5604642

fbshipit-source-id: f8cb67e554305cd2fbed384a4a2040fa2b16e7c0
2017-08-10 15:27:19 -07:00
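The logging change in 5dba88b40b can be illustrated with a small standalone example. `run_worker` and `failing_task` are hypothetical names and are not taken from parallel_workers or data_workers.

```python
import traceback

def run_worker(task):
    """Run `task`; on failure, log the full stack trace and return None."""
    try:
        return task()
    except Exception:
        # print(e) would show only the message; print_exc() writes the
        # exception type, message, and stack trace to stderr.
        traceback.print_exc()
        return None

def failing_task():
    raise ValueError("bad batch")

result = run_worker(failing_task)
```

The trace written to stderr includes the frame inside `failing_task`, which is the information that a plain `print()` of the exception loses.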
James Cross
4758bd851b rectify args btw. train and translate
Summary: Make the command-line arguments pertaining to model architecture the same between train.py and translate.py. Also use the s() scoping function for all intermediate blobs in attention.py (this is for compatibility with multi-headed attention).

Differential Revision: D5594312

fbshipit-source-id: cadf51d854b5a9174ec913f32c655be2abf111e5
2017-08-10 15:27:18 -07:00
Christopher Hay
f2dfb40302 Added amplitude argument to SinusoidPositionEncodingOp
Summary: In order to control the absolute scale/magnitude of the output of this op, added a tuning parameter: amplitude

Reviewed By: jamesr66a

Differential Revision: D5596574

fbshipit-source-id: 3b7e316de55cce6fd686da70aa5658ec3e99b070
2017-08-10 15:27:17 -07:00
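The effect of an amplitude parameter on a sinusoid position encoding can be sketched as follows. The formula here follows the Transformer paper's formulation and is an assumption, since the log does not state the exact expression used by SinusoidPositionEncodingOp. The function name and the `alpha` parameter are likewise hypothetical.

```python
import math

def sinusoid_position_encoding(position, embedding_size, amplitude=1.0, alpha=10000.0):
    """Encode one position; `amplitude` scales every component of the output."""
    encoding = []
    for i in range(embedding_size):
        # Each sin/cos pair (dimensions 2k and 2k+1) shares one frequency.
        angle = position / (alpha ** (2 * (i // 2) / embedding_size))
        wave = math.sin(angle) if i % 2 == 0 else math.cos(angle)
        encoding.append(amplitude * wave)
    return encoding

enc = sinusoid_position_encoding(0, 4, amplitude=2.0)
print(enc)
```

Because sine and cosine are bounded by 1 in magnitude, every component of the output lies in the range from `-amplitude` to `amplitude`.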
Ahmed Taei
5bb1e6b817 Allow passing unsymmetric 2d kernels to brew.conv.
Reviewed By: jay-mahadeokar

Differential Revision: D5598523

fbshipit-source-id: 47135a8562f7c720badb2be677cb79730dc417a0
2017-08-10 15:27:16 -07:00
Kittipat Virochsiri
eb85258beb CreateMapOp
Summary: Add operator to create empty map

Reviewed By: xianjiec

Differential Revision: D5454652

fbshipit-source-id: ecad6cc58572b378962af08cf02063ef546ed58f
2017-08-09 13:32:19 -07:00
Tao Wu
7b86a34610 modify _LSTM into _RNN to adapt GRU
Summary: GRU differs from LSTM in that it has only hidden states and no cell states. Reusing the _LSTM code as-is is therefore problematic: we need to delete the part that creates the cell state, and change the many places that use a hard-coded 4 (hidden_all, hidden, cell_all, cell) into 2 (hidden_all, hidden). Otherwise GRU will break during the backward pass when the optimizer tries to apply a gradient to each of the parameters, because the cell state is never used and so there are no gradients for the corresponding parameters (i.e., cell_state_w, cell_state_b).

Differential Revision: D5589309

fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
2017-08-09 13:24:45 -07:00
Aaron Markham
784ba07bf3 updated downloader to use s3 url without a redirect via the vanity url
Summary:
Model downloader was broken after the move on s3 to the vanity url, download.caffe2.ai. Using this as the url base hits a redirect, and will result in the script throwing a 403 error.  Rather than upgrading to urllib2 or putting in a bunch of code to handle a redirect on urllib, we can just use the non-vanity base url.
Closes https://github.com/caffe2/caffe2/pull/1020

Reviewed By: Yangqing

Differential Revision: D5568686

Pulled By: aaronmarkham

fbshipit-source-id: d88a6b3e1b7955835fc03b036dc54dec48316e7f
2017-08-09 12:25:30 -07:00
Junjie Bai
1ce95090ca Add support for specifying engine preferences
Reviewed By: Yangqing

Differential Revision: D5460994

fbshipit-source-id: 08a8af699eebec37defc070389a8415b3e81ac16
2017-08-09 00:47:18 -07:00
Priya Goyal
5c77cc8182 Exposing num_workers as parameter and enable recycling activations
Summary: as promised, a separate diff for dpm changes I made in experimental code

Reviewed By: pietern

Differential Revision: D5551304

fbshipit-source-id: 9013aeab6c388b1c415ffb2e36fb8dd6b8cf90b0
2017-08-08 19:48:41 -07:00