Summary: The word_rewards blob has a mixed data type: ConstantFill assigns long, but the blob is later filled with float32. This causes issues when running the net from the outputted protobuf. This change makes the data type float32 for the lifetime of the blob.
Reviewed By: jhcross
Differential Revision: D6486723
fbshipit-source-id: c4ce5185a0a6d71b08b1819f2355e9354823b701
Summary:
This can be used for testing and debugging. zdevito and I will primarily use this for our caffe2 script project
Closes https://github.com/caffe2/caffe2/pull/1585
Reviewed By: zdevito
Differential Revision: D6501209
Pulled By: jamesr66a
fbshipit-source-id: fdd65e422c44b74bb6926320af506dcae13327f3
Summary:
Adding if and while control ops to brew, also adding unit tests
Note: unlike net_builder, where we can figure out which blobs are external and which ones are local to subnets, in brew we need to use the external_blobs param explicitly to point at external blobs
Reviewed By: harouwu
Differential Revision: D6440508
fbshipit-source-id: c920f0af84b77ccb2d8462ffc7567bb1908c844a
Summary: A while ago, we had to change some blob names in `optimizer.py` (more specifically, names of `iteration_mutex` and `optimizer_iteration`) to handle corner cases when preparing a net for parallel execution.
Reviewed By: azzolini
Differential Revision: D6480819
fbshipit-source-id: a03a7aa9fad322a50e7785914b0eb0f8654e6d90
Summary:
Reduced the array sizes used in pack_ops_test to prevent timeouts
during Travis CI builds.
Reviewed By: enosair
Differential Revision: D6476703
fbshipit-source-id: 20ab871ae40349ca27186447a84135bbc5c351b1
Summary:
Adds a new `LSTMCell` subclass to the `rnn_cell` module that performs layer normalization on the fused input matrix. Moves around some code in `rnn_cell.py` to avoid copy-pasta. Adds relevant test cases to `rnn_cell_test.py`.
Had to fix `brew.layer_norm` first. See T24013870.
Reviewed By: jhcross
Differential Revision: D6454883
fbshipit-source-id: 0f4ea7a778cc5be6a7274f7b28c793f5dd7c6095
Summary:
Regardless of device checker/gradient checker, we cannot run a
backward pass with cuDNN when NHWC is used.
Closes https://github.com/caffe2/caffe2/pull/1566
Differential Revision: D6474181
Pulled By: pietern
fbshipit-source-id: 727d7b4f2a1431a4d6675ffb76c5b60d3d7fa712
Summary:
This is supplementary to commit ce8267d425444f60ae650389fb41838847a44a5e. It allows specifying a device for prepare_prediction_net() so the prediction extractor can work with GPU.
Closes https://github.com/caffe2/caffe2/pull/1035
Differential Revision: D6467420
Pulled By: salexspb
fbshipit-source-id: b5b9a1536fb516e90b5e4b615403086943cfbe93
Summary: Quick fix for unit test broken by D6454290. This is my fault for approving while the tests covering the single callsite were broken.
Reviewed By: goldsborough
Differential Revision: D6466566
fbshipit-source-id: 2683be3d6bb184286e64fbde3e572946e39030c7
Summary:
While working on layer normalization for LSTMs I encountered an issue where the layer norm parameters (which are the scale/gain and bias/shift from the paper) were not registered in the model for `brew.layer_norm`. salexspb explained that this is because it was using the `init_net_param` API instead of `create_param`. This diff fixes this.
While fixing this I noticed that `brew.layer_norm` actually had a bug where it was multiplying with the bias instead of adding it. Another issue was that the function gave the scale and bias a shape of `[1]`; however, the paper (https://arxiv.org/pdf/1607.06450.pdf) specifies that, like for batch norm, there is one scale and bias parameter per neuron, i.e. the shape should be `[1, axis_dimension]`. The API now takes an explicit `dim_in` parameter (also more consistent with other normalization functions in that module) so that this can be specified. See tests for how this now looks.
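For reference, layer normalization with the per-neuron scale/bias described above can be sketched in numpy as follows. This is a minimal illustration of the corrected behavior (multiply by the scale, then add the bias), not the actual `brew.layer_norm` implementation; the function name and the `eps` default are assumptions.

```python
import numpy as np

def layer_norm(x, scale, bias, eps=1e-5):
    # x: [batch, dim_in]; scale and bias: [1, dim_in], one parameter per neuron
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    normed = (x - mean) / np.sqrt(var + eps)
    # corrected behavior: multiply by the scale, then *add* the bias
    return normed * scale + bias
```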
Reviewed By: jhcross
Differential Revision: D6454290
fbshipit-source-id: fc00ca614de3190c40ab743e8984bec9e85fb58c
Summary:
Adding a check to pack_segments to make sure the lengths passed in add up as expected.
Additionally, started to address https://fb.facebook.com/groups/1405155842844877/permalink/1977332432293879/; this might not fix that issue, but the check is still useful even if it does not.
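The added consistency check amounts to verifying that the lengths sum to the number of rows in the data before packing. A numpy sketch of the idea (illustrative only, not the actual operator code):

```python
import numpy as np

def pack_segments(lengths, data):
    # the sanity check added by this diff: lengths must add up to the
    # number of rows in data
    assert sum(lengths) == data.shape[0], (
        "sum of lengths %d != data rows %d" % (sum(lengths), data.shape[0]))
    max_len = max(lengths)
    out = np.zeros((len(lengths), max_len) + data.shape[1:], dtype=data.dtype)
    offset = 0
    for i, n in enumerate(lengths):
        out[i, :n] = data[offset:offset + n]  # copy segment, rest stays zero-padded
        offset += n
    return out
```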
Reviewed By: salexspb
Differential Revision: D6443490
fbshipit-source-id: 680dc763a788a550d321d97a556c5b46e3402dd1
Summary: Replaced sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The fused op computes the cross-entropy loss of the sigmoid of its inputs. It is conceptually identical to a sigmoid layer followed by a cross-entropy loss layer, but provides a more numerically stable gradient.
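The stability argument can be illustrated with the standard log-sum-exp rearrangement. This numpy sketch is illustrative only, not the actual SigmoidCrossEntropyWithLogits kernel:

```python
import numpy as np

def sigmoid_xent_naive(logits, targets):
    # sigmoid followed by cross-entropy: log(p) or log(1 - p)
    # overflows to -inf for large |logits|
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p))

def sigmoid_xent_with_logits(logits, targets):
    # fused, numerically stable form:
    #   max(x, 0) - x * z + log(1 + exp(-|x|))
    return (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
```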
Reviewed By: xianjiec
Differential Revision: D6305455
fbshipit-source-id: 444c9f651fbdf13c3c52be5142769f8f98ed8770
Summary:
Get higher order interaction of embeddings, similar to cross net but applied in the embedding level.
Formula:
e_(l+1,i) = element_wise_mul[e_(0,i), \sum_j(e_(l,j) * w_(l,j))] + e_(l,i) + b
where l means the l-th layer of this higher order net, and i (j) index the embeddings in the list.
Finally, concat all the embeddings in the last layer, or concat the sum of each embedding, and attach to the output blob of the dot processor.
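The recurrence above can be sketched in numpy as follows. This is an illustrative sketch under assumed shapes (scalar per-embedding weights and a shared bias vector), not the actual implementation:

```python
import numpy as np

def higher_order_layer(e0, el, w, b):
    # e0, el: lists of embeddings from layer 0 and layer l, each of shape [dim]
    # w: per-embedding scalar weights for layer l; b: bias vector of shape [dim]
    # s = sum_j e_(l,j) * w_(l,j)
    s = sum(e * wj for e, wj in zip(el, w))
    # e_(l+1,i) = e_(0,i) (element-wise *) s + e_(l,i) + b
    return [e0i * s + eli + b for e0i, eli in zip(e0, el)]
```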
Differential Revision: D6244001
fbshipit-source-id: 96292914158347b79fc1299694d65605999b55e8
Summary:
Problem:
When we initialize a model from an existing model, we currently load the information for each layer parameter independently (in utils.py), including shape information. We have to load the whole model from the db_path every time we initialize one parameter (in layers.py). For example, in f31078253, the model needs to be initialized twice (not sure why); each time there are 152 layer parameters to load, and loading a model takes 10-50 min depending on resource status.
Restriction:
1. _infer_shape_from_initializer in layers.py is called from multiple other places besides the if branch of ModelInitDefinition.INIT_MODEL_PATH in load_parameters_from_model_init_options in utils.py, which is the root cause of f31078253. So we still need to support the load operator in _infer_shape_from_initializer, and we need to batch the shape-blob loading outside of LayerParameter.
2. In the if branch of ModelInitDefinition.PARAMS in load_parameters_from_model_init_options in utils.py, the db_path can differ across parameters, so it is hard to batch them.
Solution:
Batch the shape-blob loading in the if branch of ModelInitDefinition.INIT_MODEL_PATH in load_parameters_from_model_init_options in utils.py. We load the model once and generate the shape blobs of the layer parameters in the workspace, so that _infer_shape_from_initializer in layers.py can directly return the shape blobs cached in the workspace without reloading the model. At the same time, _infer_shape_from_initializer can still support a separate load operator if shape blobs are not pre-loaded into the workspace (this logic can be used for ways of initializing a model other than from an existing model).
Right now we batch 500 layer parameters per load, and it works fine. So for 152 layer parameters, one model loading is enough.
Reviewed By: xianjiec
Differential Revision: D6397607
fbshipit-source-id: 54f6f61d6d8b70c82b74c2d72ac56cd010a710da
Summary:
(Work in progress). This diff will allow shifting activations to other GPUs, in case the model does not fit into memory. To see the API, check the code in data_parallel_model_test, which shifts two activations from GPUs 0 and 1 to GPU 4, and from GPUs 2 and 3 to GPU 5.
I will need to test further on ResNets, and probably add copy operations to handle device change points.
Reviewed By: asaadaldien
Differential Revision: D5591674
fbshipit-source-id: eb12d23651a56d64fa4db91090c6474218705270
Summary:
This is a CUDA implementation of the RemovePadding operator, modeled on akyrola's implementation for AddPadding.
There's also an incidental spelling correction: GetAddPadingGradient -> GetAddPaddingGradient.
Reviewed By: akyrola
Differential Revision: D6439594
fbshipit-source-id: b29cd0c252021c58e150b901bbaad28a3bd3cc4a
Summary: Experimental code that allows you to write C2 NetDefs directly using python-like syntax. This includes the ability to write native control-flow (if, while) and have it turn into IfOp and WhileOp
Reviewed By: jamesr66a, dzhulgakov
Differential Revision: D6123298
fbshipit-source-id: 25fc078b5769be61ac7fb3aa9a7c95bd88dccc30
Summary: Support regression with output transform in MTML for feed.
Differential Revision: D6403523
fbshipit-source-id: faa0aab1227a27286b617e8e25adfbab3a349d2c
Summary:
With some test seeds this warning starts firing.
Should be addressed in a better way, not generating as many invalid examples.
Closes https://github.com/caffe2/caffe2/pull/1536
Reviewed By: bddppq
Differential Revision: D6437138
Pulled By: pietern
fbshipit-source-id: c619d928a585e3d887f686db5d98f841af10c56b
Summary:
TSIA. This is found in
https://github.com/caffe2/caffe2/pull/1530
Reviewed By: dzhulgakov
Differential Revision: D6434417
fbshipit-source-id: 2285c2f6252eb7f24e83357eb4887851b3adf690
Summary:
Async executor based on async_polling (D5985110):
- Tasks scheduling other tasks, using polling only when necessary (e.g.
CUDA->CPU case)
- Fully async, i.e. RunAsync immediately returns
Reviewed By: azzolini
Differential Revision: D6281681
fbshipit-source-id: 06e3723e1424ffab652c38ca7b279cf76e43fa44
Summary:
enosair caught a bug where the operator returned too early if the lengths output was not provided. Fixed and added testing.
Also noticed the op does not support the case when no lengths input is provided. Added a temporary CAFFE_THROW for this case; will fix later.
Reviewed By: enosair
Differential Revision: D6405585
fbshipit-source-id: a81717e1b39afde6e900ddd9049b820943aea9f1
Summary: CUDA version of the AddPadding op. It first executes a prefix sum using Cub to compute the cumulative lengths array. Then it launches a kernel that uses this information to fill the output tensor with the start padding, end padding, and the actual contents.
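The same two-step scheme can be sketched on the CPU in numpy (illustrative only; the real op runs the prefix sum with Cub and the fill in a CUDA kernel, and the names here are assumptions):

```python
import numpy as np

def add_padding(data, lengths, pad_width=1, pad_value=0.0):
    # step 1: exclusive prefix sum of lengths gives each segment's input offset
    offsets = np.concatenate([[0], np.cumsum(lengths)[:-1]])
    out_rows = data.shape[0] + 2 * pad_width * len(lengths)
    out = np.full((out_rows,) + data.shape[1:], pad_value, dtype=data.dtype)
    # step 2: copy each segment between its start and end padding
    out_off = 0
    for off, n in zip(offsets, lengths):
        out_off += pad_width                       # skip start padding
        out[out_off:out_off + n] = data[off:off + n]
        out_off += n + pad_width                   # contents + end padding
    return out, np.asarray(lengths) + 2 * pad_width
```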
Reviewed By: asaadaldien
Differential Revision: D6391413
fbshipit-source-id: 45b431e5976674729e53cb4752c7753c1d8a69e8
Summary:
so that users can use the 'WeightedSum' pooling method when there is a mix of id list and id score list features.
- It is still intuitive to have "WeightedSum" for id list, and we do not need to introduce a new "UnWeightedSum" etc.
Reviewed By: chocjy
Differential Revision: D6369270
fbshipit-source-id: 722fa08d1a7986bc6ecf4c7cb02bbae0825bcab4
Summary: Small changes I made while reading through the dper code base. All of them are nits, but they somewhat helped me understand things.
Reviewed By: xianjiec
Differential Revision: D6389380
fbshipit-source-id: 3412052e4fcba199c6ffc84c6f7ae11bf8ff6ee9
Summary:
A Caffe2 user was confused when model.TensorProtosDBInput([reader]) did not work. This is because of this outdated model helper function, which ignored the input blobs.
Added an assertion to enforce correct usage. I did not want to make this work with reader input as well, since this helper probably should not be used anyway.
Reviewed By: amanrajdce
Differential Revision: D6380326
fbshipit-source-id: 6a50c2861f7f58c06cbfe3e86bde0f17a2b443cb
Summary: Today when PythonOp throws an exception, we log the error and fail the op. Later we assert that the op/net/plan succeeds and throw with a generic message. The user must tail the logs to find the real error. Instead, align with the exception handling of other ops and throw directly. This will include the full context of the exception in the error message.
Reviewed By: Yangqing, akyrola
Differential Revision: D6359684
fbshipit-source-id: 85133ba6562759607a3971449120647cbacce946
Summary: change the interface so BMUF can run on cpus
Reviewed By: asaadaldien
Differential Revision: D6356026
fbshipit-source-id: f58a4da9f800d969145a1a376e118b0f3581f8c1
Summary: Reported by Simon Layton from NVIDIA: we had a couple of py3-incompatible expressions in data_parallel_model.
Reviewed By: azzolini
Differential Revision: D6349447
fbshipit-source-id: a09feb69396be43296400591a3bfed5b8c370b0d
Summary: Cast op cuda can deal with empty batch now.
Reviewed By: azzolini
Differential Revision: D6350138
fbshipit-source-id: 2f3d19f4d42ff34806aa9597690e66f6b4de1a6b
Summary:
Two ops: BatchSparseToDenseOp and DenseToBatchSparseOp, inverse operations of each other.
Details are described in the op docs.
These ops are used along with flexible topK, where the output is lengths, indices, and values.
We want to do softmax on the values, but the dimension of each batch is different, so these ops convert the sparse representation to dense and vice versa. The two ops are also gradient ops for each other.
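The sparse<->dense conversion the ops perform can be sketched in numpy as follows (an illustrative sketch; the function names and signatures are assumptions, not the actual op interface):

```python
import numpy as np

def batch_sparse_to_dense(lengths, indices, values, dense_dim, default=0.0):
    # (lengths, indices, values) -> dense [batch, dense_dim];
    # row i holds lengths[i] (index, value) pairs, unset entries get `default`
    out = np.full((len(lengths), dense_dim), default)
    offset = 0
    for row, n in enumerate(lengths):
        for j in range(offset, offset + n):
            out[row, int(indices[j])] = values[j]
        offset += n
    return out

def dense_to_batch_sparse(dense, lengths, indices):
    # inverse: gather the values back at the given indices per row
    values = []
    offset = 0
    for row, n in enumerate(lengths):
        for j in range(offset, offset + n):
            values.append(dense[row, int(indices[j])])
        offset += n
    return np.asarray(values)
```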
Reviewed By: chocjy
Differential Revision: D6288338
fbshipit-source-id: 0ba9e611058b39e46e7414dcc5f39cab29915fa3
Summary:
This is part one: it adds lambdaNDCG loss, which can be used to heuristically
optimize the NDCG metric.
Differential Revision: D5830650
fbshipit-source-id: 1eb696337c9a77727ad40219c68f6468e2e097a5
Summary: Came across this bug in the doc when I was figuring out NetBuilder from the code.
Reviewed By: volkhin
Differential Revision: D6341821
fbshipit-source-id: 8818f3d92681366bfe7b90d9d4da9f68ef6e4672