Commit graph

55 commits

Author SHA1 Message Date
Kittipat Virochsiri
22d4eaeb9e JoinContext
Summary:
Layer to allow a model to follow different paths for each instantiation context and join them later. Together with the tagging system cleanup (a separate issue), this should reduce the need to write a layer to differentiate between contexts.

Re: tagging system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become the set of all EXCLUDE_FROM_* tags, except EXCLUDE_FROM_TRAIN.

Reviewed By: kennyhorror

Differential Revision: D4964949

fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
2017-05-02 17:32:26 -07:00
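A minimal sketch of the tag arithmetic proposed in the summary above, in plain Python (the context names and tag constants here are illustrative, not the actual Caffe2 identifiers):

  CONTEXTS = ['TRAIN', 'EVAL', 'PREDICTION']

  # One explicit exclusion tag per instantiation context.
  EXCLUDE_FROM = {ctx: 'EXCLUDE_FROM_' + ctx for ctx in CONTEXTS}

  # TRAIN_ONLY becomes the set of all EXCLUDE_FROM_* tags except
  # EXCLUDE_FROM_TRAIN, as proposed above.
  TRAIN_ONLY = {tag for ctx, tag in EXCLUDE_FROM.items() if ctx != 'TRAIN'}

  def is_instantiated(layer_tags, context):
      # A layer is instantiated in a context unless explicitly excluded.
      return EXCLUDE_FROM[context] not in layer_tags

  print(is_instantiated(TRAIN_ONLY, 'TRAIN'))  # True
  print(is_instantiated(TRAIN_ONLY, 'EVAL'))   # False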
Chonglin Sun
e8e93066e7 add workflow for complicated user embedding
Summary: Correctly propagate the request_only tag to all layers.

Reviewed By: kennyhorror

Differential Revision: D4751496

fbshipit-source-id: e65fd8cfe56d2989213d44e684a528ede691d316
2017-05-02 10:46:52 -07:00
Jiyan Yang
a458aa4b2a Fix tags to be based on EXCLUDE_FROM_{CONTEXT}
Summary: Cleaning up the tagging system. Introducing tags EXCLUDE_FROM_{CONTEXT}.

Reviewed By: kennyhorror

Differential Revision: D4974842

fbshipit-source-id: b0fa6772299bb70afa2192c39e45191c9f41336a
2017-05-02 09:32:27 -07:00
Kittipat Virochsiri
ffc6bad116 Concat axis=0
Summary: Previously, the code below would go out of bounds.

Reviewed By: xianjiec

Differential Revision: D4968037

fbshipit-source-id: 3760e2cddc919c45d85ac644ac3fabf72dbaf666
2017-05-01 12:19:34 -07:00
Jiyan Yang
795dc1c326 Remove loss ops from eval net
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which are unnecessary. This diff removes them from the eval net.

Differential Revision: D4934589

fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
2017-04-26 12:46:25 -07:00
Yangqing Jia
deb1327b6e Re-apply #266
Summary: Closes https://github.com/caffe2/caffe2/pull/404

Differential Revision: D4943280

Pulled By: Yangqing

fbshipit-source-id: c0988598d8ccb8329feac88382686324b90d4d46
2017-04-25 21:17:04 -07:00
Jiyan Yang
ef2701a57e MapToRange layer
Summary: A layer that takes raw ids as inputs and outputs the indices which can be used as labels. The mapping will be stored with the model.

Reviewed By: kittipatv

Differential Revision: D4902556

fbshipit-source-id: 647db47b0362142cdba997effa2ef7a5294c84ee
2017-04-25 16:03:58 -07:00
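A plain-Python illustration of what MapToRange does per the summary above -- raw ids in, dense indices out, with the mapping persisted alongside the model (a sketch, not the Caffe2 layer itself):

  class IdToIndexMap:
      def __init__(self):
          self.id_to_index = {}  # stored with the model

      def __call__(self, raw_ids):
          indices = []
          for raw_id in raw_ids:
              if raw_id not in self.id_to_index:
                  self.id_to_index[raw_id] = len(self.id_to_index)
              indices.append(self.id_to_index[raw_id])
          return indices

  mapper = IdToIndexMap()
  print(mapper([1001, 42, 1001, 7]))  # [0, 1, 0, 2]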
Yangqing Jia
a48062b1a2 temporarily fix sync script bugs by partially reverting https://github.com/caffe2/caffe2/pull/266/files 2017-04-24 15:49:22 -07:00
Huazhong Ning
f950a1b70f create bucket-based calibration - model manipulation
Summary: added a new context to layers.py

Reviewed By: kennyhorror

Differential Revision: D4817124

fbshipit-source-id: 36f08964b86092e81df24c1b9d4b167293a7ffb8
2017-04-18 22:01:23 -07:00
Huazhong Ning
ad6b53e401 allow to specify output dtypes for functional layers
Summary:
Currently, the functional layer infers the output types and shapes by running the operator once.
But in cases where special input data are needed to run the operator, the inference may fail.
This diff allows the caller to manually specify the output types and shapes for cases where automatic inference would fail.

Reviewed By: kennyhorror

Differential Revision: D4864003

fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc
2017-04-18 16:34:52 -07:00
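An illustrative sketch of why declaring output metadata helps, in plain Python (this mimics the inference-by-execution idea from the summary above; it is not the actual DPER functional-layer API):

  import numpy as np

  def infer_output_meta(op, sample_input):
      # Default path: run the operator once on sample data and read
      # the output's dtype and shape back from the result.
      out = op(sample_input)
      return out.dtype, out.shape

  def log_op(x):
      out = np.log(x)
      if not np.isfinite(out).all():
          raise ValueError('operator needs special input data')
      return out

  try:
      infer_output_meta(log_op, np.zeros(4))  # the dry run breaks on zeros
  except ValueError:
      # Manual path: the caller declares the output meta instead.
      declared_dtype, declared_shape = np.float32, (4,)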
Kittipat Virochsiri
0a726af42e Coerce input of FunctionalLayer to record
Summary: Having to pack the input into a schema doesn't make much sense, since the structure is not recognized by operators anyway.

Differential Revision: D4895686

fbshipit-source-id: df78884ed331f7bd0c69db4f86c682c52829ec76
2017-04-17 19:26:06 -07:00
Aaron Markham
b93a7b134a doxygen configs and updated python files to inc. doxygen tags (#266)
* updated ubuntu instructions

* updated ubuntu notes and troubleshooting

* updated tutorials using local files

* added doxygen python blocks for docs generation

* doxygen related files for generating docs
2017-04-14 16:30:33 -07:00
Kittipat Virochsiri
baf33161d4 GatherRecord layer
Summary: Perform gather on the whole record. This will be used for negative random sampling.

Reviewed By: kennyhorror

Differential Revision: D4882430

fbshipit-source-id: 19e20f7307064755dc4140afb5ba47a699260289
2017-04-13 15:02:44 -07:00
Huazhong Ning
15c6f637d6 create bucket-based calibration - layer
Summary:
The basic idea of bucket-based calibration:
1. given a model and a calibration data set
2. apply the model to the calibration data set and sort the prediction scores
3. bucketize the prediction scores
4. for the samples in each bucket, compute the proportion of positive samples
5. build a set of piecewise linear functions that map from the bucket range to the proportion
6. append an operator of piecewise linear transform to the prediction net, which is supposed to calibrate the raw predictions.
7. to support calibration in realtime training, we create a new type of Net -- a bucket calibration net. This needs a new Context for add_calibration_ops(), and support to export and load the new Net.

This includes a series of diffs.

This diff implements a layer that adds different operators for train/calibration/eval for bucket-based calibration.

Reviewed By: dragonxlwang

Differential Revision: D4817119

fbshipit-source-id: 44f8fcad2a94f40f7439cc1ad47e7bae5e17397d
2017-04-11 12:30:26 -07:00
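A numpy sketch of steps 2-5 above (bucket fitting plus a piecewise-constant simplification of the piecewise-linear map; the real implementation emits Caffe2 operators):

  import numpy as np

  def fit_bucket_calibration(scores, labels, num_buckets=10):
      order = np.argsort(scores)                     # step 2: sort scores
      scores, labels = scores[order], labels[order]
      bounds = np.quantile(scores, np.linspace(0, 1, num_buckets + 1))
      which = np.clip(np.searchsorted(bounds, scores, side='right') - 1,
                      0, num_buckets - 1)            # step 3: bucketize
      # step 4: proportion of positive samples per bucket
      props = np.array([labels[which == b].mean() if (which == b).any() else 0.0
                        for b in range(num_buckets)])
      return bounds, props                           # step 5: defines the map

  def calibrate(raw, bounds, props):
      b = np.clip(np.searchsorted(bounds, raw, side='right') - 1,
                  0, len(props) - 1)
      return props[b]

  rng = np.random.default_rng(0)
  s = rng.random(1000)
  y = (rng.random(1000) < s ** 2).astype(float)      # miscalibrated scores
  bounds, props = fit_bucket_calibration(s, y)
  print(calibrate(np.array([0.1, 0.5, 0.9]), bounds, props))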
Kittipat Virochsiri
5c32c82a6d Add option to subtract log odd from sampled trained prediction.
Summary: Useful for sampled softmax training

Differential Revision: D4782673

fbshipit-source-id: 88195de60070a0bc16f5e06b9aad4dffd0484546
2017-04-03 17:50:58 -07:00
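For reference, a sketch of the standard sampled-softmax correction this option points at (hedged: based on the usual derivation, not on the diff itself) -- each sampled class's logit is adjusted by the log of its sampling probability so that training on samples approximates the full softmax:

  import numpy as np

  def adjust_sampled_logits(logits, sample_probs):
      # logits: raw scores of the sampled classes
      # sample_probs: probability each class had of being sampled
      return logits - np.log(sample_probs)

  print(adjust_sampled_logits(np.array([2.0, 0.5]), np.array([0.9, 0.01])))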
Kittipat Virochsiri
3b4c950862 Add option to use id_score_list_features column
Summary: Somehow, feed non-ranking training data usually has this type of column. Add an option to support it.

Reviewed By: xianjiec, kennyhorror

Differential Revision: D4773960

fbshipit-source-id: 5a7ef4618a070e04f3cd8ddfcbf2b7441c00d92d
2017-04-03 17:03:09 -07:00
Xianjie Chen
9fc56793dd fix trunk for push and small cleanup
Summary:
multiple places broken, blocking the push :(

- fix the weighted training for ads and feeds
- fix the publishing if no exporter model is selected
- fix the feeds retrieval evaluation
- added the default config for retrieval workflows. plan to use for flow test (in next diff)
- clean up unused code
- smaller hash size for faster canary test

Reviewed By: chocjy

Differential Revision: D4817829

fbshipit-source-id: e3d407314268b6487c22b1ee91f158532dda8807
2017-04-02 23:35:49 -07:00
Jiyan Yang
b401cb48fe Make optimization methods configurable and allow flexible optimization settings
Summary:
This diff does the following:

1. Add optimization options to model options in the UI for all workflows.
2. Allow different parameters to use different optimizers (or same optimizer with different settings, eg, learning rate).
3. Remove the default values for the `sparseDedupAggregator` field in the thrift file as the default value for that should just be `None` instead of 'sum'.
4. Deprecate `fb/dper/layer_models/mlp_sparse.py`.
5. Add calibration to two tower workflows.

Reviewed By: kittipatv

Differential Revision: D4767004

fbshipit-source-id: de92ea63fb0ff33f8581b1693479b723a68cd2d1
2017-04-01 23:02:21 -07:00
Ou Jin
cd4160c894 distributed training for dper2
Summary:
Add distributed training to dper2 and keep the dper1 working.

* Created a ModelDelegator to wrap ModelHelper and LayerModelHelper and bridge the differences.
* To get the average length for sparse features, I extracted some information in feature_processor. There should be a better way to do it once we have the new compute_meta.
* Metrics currently run only on the first trainer.
* The model is saved correctly for evaluation, but I'm still not sure how to handle the weights for adagrad.

Reviewed By: kennyhorror

Differential Revision: D4767745

fbshipit-source-id: 0559d264827a7fd9327071e8367d1e84a936bea9
2017-03-30 19:04:50 -07:00
Kittipat Virochsiri
e1d64ea4d5 support multilabel in generic preprocessor
Summary:
Adding support for multilabel in the multiclass workflow. `input_feature_schema` and `trainer_extra_schema` are now functions that take the preprocessor option and output the schema. This allows dynamic schema definition based on the option.

Changing default value will be in the next diff.

Reviewed By: xianjiec

Differential Revision: D4750064

fbshipit-source-id: 896143f432e963bc1723c0153749efeb39a83bec
2017-03-29 15:20:54 -07:00
Kittipat Virochsiri
3eb3507367 uniform_sampling layer
Summary: This layer will be used to sample negative labels for sampled softmax.

Differential Revision: D4773444

fbshipit-source-id: 605a979c09d07531293dd9472da9d2fa7439c619
2017-03-29 14:36:12 -07:00
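A minimal numpy sketch of uniform negative sampling for sampled softmax (the rejection of the positive label is an assumption of this sketch; the actual layer emits Caffe2 sampling ops):

  import numpy as np

  def uniform_negative_samples(num_classes, num_samples, positive, rng):
      # Sample labels uniformly, resampling any draw that hits the positive.
      samples = rng.integers(0, num_classes, size=num_samples)
      while (samples == positive).any():
          hit = samples == positive
          samples[hit] = rng.integers(0, num_classes, size=hit.sum())
      return samples

  rng = np.random.default_rng(0)
  print(uniform_negative_samples(10000, 5, positive=42, rng=rng))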
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Andrey Malevich
7cc92b1260 Add eval net for layer_model_helper
Summary:
This diff is adding eval nets to layer model helper. It should be useful for
the cases when train/eval nets need some extra input (usually some supervision)
for train/eval. For example various sampled layers, etc.

Differential Revision: D4769453

fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
2017-03-29 04:03:40 -07:00
Kittipat Virochsiri
da36212259 SamplingTrain layer
Summary:
`SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces the sparse output of the wrapped layer. The output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced.

This is like the SampledFC function in the model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both training and prediction contexts.

I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This could probably be accomplished in some other way, but I think this is not too bad.

Reviewed By: xianjiec

Differential Revision: D4689887

fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
2017-03-27 23:31:55 -07:00
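A structural sketch of the wrapper/mixin pattern described above, in plain Python (class and method names are illustrative, not the Caffe2 layer API):

  class SamplingTrainableMixin:
      def full_output(self, x):
          raise NotImplementedError

      def sampled_output(self, x, indices):
          raise NotImplementedError

  class SamplingTrain:
      def __init__(self, wrapped, context):
          assert isinstance(wrapped, SamplingTrainableMixin)
          self.wrapped, self.context = wrapped, context

      def __call__(self, x, indices=None):
          if self.context == 'TRAIN':
              # Training: sparse output over the sampled indices only.
              return self.wrapped.sampled_output(x, indices)
          # Prediction: the full output of the wrapped layer.
          return self.wrapped.full_output(x)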
Huazhong Ning
8168e8ac25 allows to specify output names for functional layers
Summary:
currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.

Reviewed By: kennyhorror

Differential Revision: D4744949

fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
2017-03-23 13:18:58 -07:00
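A plain-Python illustration of the naming change (a sketch of the behavior, not the DPER API):

  def name_outputs(outputs, names=None):
      if names is None:
          # Old behavior: anonymous "field_i" names.
          names = ['field_%d' % i for i in range(len(outputs))]
      assert len(names) == len(outputs)
      return dict(zip(names, outputs))

  print(name_outputs([1.0, 2.0]))                      # {'field_0': 1.0, 'field_1': 2.0}
  print(name_outputs([1.0, 2.0], ['loss', 'logits']))  # named, easier to debug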
Kevin Waugh
d13f98de4e implemented DistillLRLoss
Summary: Created `BatchDistillLRLoss` layer and added support for it in DPer2.

Differential Revision: D4718333

fbshipit-source-id: b873954ea704daafed94ac65fef47a20d56858e2
2017-03-20 16:01:29 -07:00
Kittipat Virochsiri
4829bdb1ea BatchSoftmaxLoss layer
Summary: Similar to BatchLRLoss layer

Reviewed By: xianjiec

Differential Revision: D4689609

fbshipit-source-id: 89fa4b9d4145ce77cb2aaa7a5c0c1a24f901d88f
2017-03-17 10:19:06 -07:00
Kittipat Virochsiri
cea16ff7cd BatchSigmoidCrossEntropyLoss
Summary: To support the feed interest team

Reviewed By: kdub0

Differential Revision: D4719213

fbshipit-source-id: 8deb3544377fb06593399b101de66f3f845f93b5
2017-03-17 09:35:51 -07:00
Huazhong Ning
ad4ae4528f migrate mtml to dper2
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers

Reviewed By: kittipatv

Differential Revision: D4680215

fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
2017-03-16 17:48:05 -07:00
Kevin Waugh
2c8bf2525b added BatchL2Loss layer
Summary: Layer that takes a (label, prediction) pair and outputs the L2 loss.

Reviewed By: kittipatv

Differential Revision: D4702111

fbshipit-source-id: 09f2ede44d1b548e61096de741f1b2aa0b66bbcb
2017-03-16 17:32:20 -07:00
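A numpy stand-in for what BatchL2Loss computes per the summary (the 1/2 factor and batch averaging are assumptions of this sketch):

  import numpy as np

  def batch_l2_loss(label, prediction):
      return 0.5 * np.mean((prediction - label) ** 2)

  print(batch_l2_loss(np.array([1.0, 0.0]), np.array([0.9, 0.2])))  # 0.0125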
Xianjie Chen
b2ab7365be fix for special case when dense dim is 1
Summary: Otherwise it will fail here: https://fburl.com/puy5x2dq

Reviewed By: kittipatv

Differential Revision: D4719212

fbshipit-source-id: e0d8211f64dca00ee48df3235d2bc030ea30f208
2017-03-16 05:19:10 -07:00
Kittipat Virochsiri
61dd35f1d6 FCWithoutBias layer
Summary: For some embedding tasks, we don't want to include a bias term in the embedding computation.

Reviewed By: xianjiec

Differential Revision: D4689620

fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644
2017-03-15 11:03:37 -07:00
Kittipat Virochsiri
25b1221579 Allow scalar output in functional layer
Summary: Some operators, e.g., SoftmaxWithLoss, return scalar-typed tensors. This allows us to use those ops without having to write a layer manually.

Reviewed By: xianjiec, kennyhorror

Differential Revision: D4703982

fbshipit-source-id: f33969971c57fc037c9b44adb37af1caba4084b6
2017-03-14 15:32:47 -07:00
Xianjie Chen
e5858485ca small change to concat layer to make tensor board vis nicer
Summary:
Otherwise the blob will be in a different namescope, e.g., `_nested`: https://fburl.com/ntlsaezv.
This makes the TensorBoard visualization ugly.

Reviewed By: dzhulgakov

Differential Revision: D4696946

fbshipit-source-id: 73627feccd7c4896964e6c549b7241bcce4f49a7
2017-03-12 23:01:18 -07:00
Xianjie Chen
95501a0165 clean old unit test, add sum processor and sqrt pooling
Summary: The sum processor and sqrt pooling are added to mimic the DoubleHelix model.

Differential Revision: D4678413

fbshipit-source-id: fc1ccfe3c92c540ce5914dfd8ff1a040805c48db
2017-03-08 23:04:19 -08:00
Chonglin Sun
7472631e7f fix bug in Mean pooling
Summary: simple fix

Reviewed By: xianjiec

Differential Revision: D4655469

fbshipit-source-id: 6dbcfcd2f3f7f7bd74aca88af4f60c6ddffb9138
2017-03-06 11:31:10 -08:00
Qichao Que
2f68632a32 Add SparseNN workflow for feed.
Summary: Add SparseNN workflow for feed. I haven't fully thought through the changes needed for ads, as I added a property called 'preproc_output_schema' to LayerModelHelper.

Reviewed By: xianjiec

Differential Revision: D4585796

fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb
2017-03-01 11:02:38 -08:00
Artem Volkhin
000db87bc7 Half-floats support for the rest of segment ops
Summary:
Previously, the fp16 type was supported in the SparseLengthsSum operator; now it
works in all other segment operators as well.

Reviewed By: dzhulgakov

Differential Revision: D4624312

fbshipit-source-id: c9d72110e3762167270bb088405eaf9c56e88493
2017-02-28 11:19:15 -08:00
Andrey Malevich
a3726759c6 Add a way do describe layers in a more AdHoc manner.
Summary:
This diff is trying to address one of the concerns Xianjie has had - the requirement to create a layer for every operator and to pass shapes and other info around.

The basic idea of the diff:
1. Try to create a layer with the given name, but if it's not available, fall back on an operator with that name (which is expected to have no parameters).
2. For all operators that we're adding through this functional style of creation, try to use the C2 shape/type inference logic to get the output type. If that fails, just return an untyped record and expect the user to annotate it when it's really needed.

Reviewed By: xianjiec

Differential Revision: D4408771

fbshipit-source-id: aced7487571940d726424269970df0eb62670c39
2017-02-27 23:30:39 -08:00
Xianjie Chen
aed3aabc7f model and preprocessor can handle empty dense inputs
Summary: we may not need dense feature inputs in some models (e.g., double helix).

Reviewed By: dzhulgakov

Differential Revision: D4568755

fbshipit-source-id: 6850508f86fafb53f81783b2a2a38776be5455d7
2017-02-22 11:19:15 -08:00
Artem Volkhin
45e1905722 add support of fp16 to SparseLengthsSum and SparseLengthsMean
Summary: Another part of making DPER compatible with half-floats. This diff adds support of fp16 to the segment reduction operators used in DPER.

Reviewed By: dzhulgakov

Differential Revision: D4587560

fbshipit-source-id: 0ae10648a7286a820bffaee802464dd9464584bc
2017-02-22 11:05:55 -08:00
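For reference, the semantics of SparseLengthsSum sketched in numpy -- gather rows of DATA by INDICES, then sum within segments given by LENGTHS; fp16 support means DATA may now be float16 (a reference sketch, not the C2 kernel):

  import numpy as np

  def sparse_lengths_sum(data, indices, lengths):
      gathered = data[indices]
      out, pos = [], 0
      for n in lengths:
          out.append(gathered[pos:pos + n].sum(axis=0))
          pos += n
      return np.stack(out)

  data = np.arange(8, dtype=np.float16).reshape(4, 2)  # fp16 embedding table
  print(sparse_lengths_sum(data, indices=[0, 2, 1], lengths=[2, 1]))
  # [[4. 6.]
  #  [2. 3.]]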
Artem Volkhin
b2cf0fad15 Convert SparseLookup layer's embedding to fp16 blobs for predictor
Summary:
First part of adding half-float support to DPER 2.0. Let's add an option, use_half_floats, to enable converting some weights of the model from fp32 to fp16 before saving them to the predictor model parts. For now it covers the SparseLookup layer's embeddings. All conversion is done after training is finished, and saved models are ready to be used on remote predictors as-is (they will be stored compacted in memory). New fp16 blobs are saved to the model in place of the original ones, under the same names, so we don't modify MetaNetDef at all.

Next steps:
1) support on delivery side -- operators working with these blobs should support both float and float16 input types
2) benchmark performance to make sure there is no regression
 a) of serialization
 b) of delivery
3) support realtime training (I'm thinking about adding new pre-publishing net which will be executed each time the realtime trainer stops to publish a new snapshot)

Depends on D4567304

Reviewed By: kennyhorror

Differential Revision: D4571710

fbshipit-source-id: 19967a17d3bd84878d66e8c0ed8c5342bf38d979
2017-02-22 11:05:49 -08:00
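A sketch of the post-training conversion described above, with a dict standing in for the blob workspace (the blob name is hypothetical):

  import numpy as np

  workspace = {'sparse_lookup/w': np.random.randn(1000, 64).astype(np.float32)}

  def convert_to_half_floats(ws, blob_names):
      for name in blob_names:
          # Re-save under the same name, at half the memory footprint.
          ws[name] = ws[name].astype(np.float16)

  convert_to_half_floats(workspace, ['sparse_lookup/w'])
  print(workspace['sparse_lookup/w'].dtype)  # float16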
Xianjie Chen
8949abe10b more clear about supported output dimension
Summary: Do I understand correctly? It must be of size 1 for sigrid

Reviewed By: kennyhorror

Differential Revision: D4576541

fbshipit-source-id: 92fa8dc62e36ff095e14cceeb80b03c0028f5695
2017-02-16 21:01:52 -08:00
Xianjie Chen
d0621a2449 NextScopedBlob with well-defined behavior and respect namescope
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope.

`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.

This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two-tower sparse nn models.

Reviewed By: kennyhorror

Differential Revision: D4555423

fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
2017-02-16 17:16:36 -08:00
Andrey Malevich
86fb25cefa Rely on embedding size in split
Summary: As desc.

Differential Revision: D4471823

fbshipit-source-id: 2685c64c22556da1749b3e3e6b21a684a7231e7b
2017-01-27 19:44:31 -08:00
Vsevolod Oparin
5e5486491d Replace Gather + RowMul by SparseLengthsWeightedSum
Summary:
Improving performance using the SparseLengthsWeightedSum operator. Results from my run:
Before:

  8.98474 RowMul
  6.89952 Gather
  0.80991 LengthsSum
  2.02056 SparseLengthsWeightedSum
  Total: 18.71

After:

  1.075 Gather
  6.54999 SparseLengthsWeightedSum
  Total: 7.62

Log of run: P56992396

With skip_backward. Command:

  CLASSPATH=/mnt/vol/gfsetlprocstore-oregon/users/cxj/hivereader-wrapper-1.0-SNAPSHOT-standalone.jar OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 MKL_DYNAMIC=FALSE ./buck-out/gen/caffe2/caffe2/fb/dper/tools/speed_benchmark.par -loader_param /mnt/vol/gfsfblearner-altoona/flow/data/2017-01-22/d832bb7b-5598-422e-9fee-b3299a9c8c1f -negDownsampleRate 0.1 -hidden 'unary(dot{"num_dense": 6, "pooling_method": "PositionWeighted"}(128, 64)128-128, 1)' -model_type mlp_sparse -warmup_runs 10 -main_runs 1000 -run_individual -skip_backward 2>&1 | tee /tmp/log.txt

Before: P56993234$7509
After: P56992503$7344

Command:

  ./fblearner/nn/ads/canary all

https://our.intern.facebook.com/intern/fblearner/details/13320564/?notif_channel=cli

Cloned "caffe2 ads sparse nn canary" run: https://our.intern.facebook.com/intern/fblearner/details/13322337/

Reviewed By: xianjiec

Differential Revision: D4451073

fbshipit-source-id: 0a4e9693d7b8b0372b2efefa61154e987a493210
2017-01-24 20:44:21 -08:00
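A numpy check of the equivalence behind this change -- Gather + RowMul + LengthsSum collapses into a single SparseLengthsWeightedSum (reference semantics, not the C2 kernels):

  import numpy as np

  def sparse_lengths_weighted_sum(data, weights, indices, lengths):
      rows = data[indices] * np.asarray(weights)[:, None]
      out, pos = [], 0
      for n in lengths:
          out.append(rows[pos:pos + n].sum(axis=0))
          pos += n
      return np.stack(out)

  data = np.arange(12.0).reshape(4, 3)
  idx, w, lens = [0, 3, 1], [0.5, 2.0, 1.0], [2, 1]

  gathered = data[idx]                           # Gather
  scaled = gathered * np.array(w)[:, None]       # RowMul
  manual = np.stack([scaled[0:2].sum(axis=0),    # LengthsSum
                     scaled[2:3].sum(axis=0)])
  assert np.allclose(
      sparse_lengths_weighted_sum(data, w, idx, lens), manual)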
Andrey Malevich
ec51f887bf Create only one instance of SigridTransform in DPerExample.
Summary:
Until this moment, DPer example has been creating multiple copies of the
transform config in the net definition, which meant that I hit the ProtoBuf
limit (64MB) for certain Task requests (especially visible because of the
ValidationPipeline that I was adding).

After this diff we're going to store SigridTransforms in one instance per
machine for training (or 1 instance per reading).

The difference in plan sizes for a simple SparseNN model is ~30 MB (even including the fact that the second model has a validation plan as well).

TODO: Do similar logic for NNPreProc as well (it's also pretty large).

Reviewed By: dzhulgakov

Differential Revision: D4441441

fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
2017-01-22 19:29:16 -08:00
Andrey Malevich
8047b8dc83 Fix random issues with some of the layers getting missing from registry.
Summary:
It looks like for types that are created directly through the type(...)
function call, we don't store strong references anywhere. As a result,
a GC pass in Python may or may not clean up these classes, depending on the
phase of the moon and other random things. This means that in some
cases simple layers such as Relu might disappear.

cat_shame

Reviewed By: xianjiec

Differential Revision: D4396289

fbshipit-source-id: ba4e9b7ef54ee43349853b0acc3d3f40c74e4d73
2017-01-10 15:14:31 -08:00
Ievgen Soboliev
a7f8fe0423 introduce request net into prediction schema
Summary: As title. We want to have a request_only net which runs on user-only sparse features. Submitting to get early feedback.

Reviewed By: dzhulgakov

Differential Revision: D4282783

fbshipit-source-id: 71241bf5444550075884c788c2da4783659bc1e0
2016-12-22 15:59:27 -08:00
Ievgen Soboliev
1632f053e5 implement user-only metadata for input_record
Summary:
We want to implement a request-only net, and to do this we decided to split the work into two parts. The first part will propagate the required metadata and the second part will cut the nets properly.
This diff propagates the request_only metadata across the layers.

A few notes about implementation:
  - Each layer contains a field request_only which can be set based on the input_record. If all the scalars in the input_record are marked request_only, we mark the layer as request_only;
  - The Sparse-To-Dense layer sets the request_only metadata;
  - The SigridTransformation and SparseLookup layers propagate the request_only status;
  - For now we join request_only and other sparse features together in the input_record, but ideally we may want to separate them, because request_only should be served separately;

Reviewed By: xianjiec

Differential Revision: D4259505

fbshipit-source-id: db8a30ef92cba84f1a843981b9dde3a8b9633608
2016-12-15 12:01:29 -08:00
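A sketch of the propagation rule in the first note above -- a layer is request_only iff every scalar in its input_record is request_only (plain Python; the field layout is illustrative):

  def layer_is_request_only(input_record_scalars):
      return all(s.get('request_only', False) for s in input_record_scalars)

  print(layer_is_request_only([{'request_only': True}, {'request_only': True}]))  # True
  print(layer_is_request_only([{'request_only': True}, {}]))                      # False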