Summary:
Some installations of numba seem to be incompatible with asan, so we
will disable its import.
Reviewed By: dzhulgakov
Differential Revision: D6664055
fbshipit-source-id: 311774667e54bdbf328ef280ab2a52ecba1361f2
Summary:
In this PR I do the following:
1. split lstm_test_main into several tests for LSTM, MiLSTM and various Norm based versions
2. instead of looping over various gradient / optimization parameters, they are now drawn as random inputs through Hypothesis.
3. These changes make the tests faster and we can avoid limiting the number of examples.
4. Fix a minor bug with the gradient checker in the RNN unroll test running twice.
5. Generate the numpy seed through Hypothesis (see the sketch below). This helps Hypothesis avoid flaky tests.
Also note that the Norm tests sometimes fail. I haven't looked into it much; it could be just precision issues. The new test split should help identify these issues.
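For reference, a minimal sketch (not the actual test code) of drawing the numpy seed through Hypothesis so that failing examples stay reproducible; the test name and shapes below are made up:

```python
import numpy as np
from hypothesis import given, strategies as st

@given(seed=st.integers(min_value=0, max_value=2**32 - 1))
def test_lstm_with_drawn_seed(seed):
    # The seed is drawn by Hypothesis, so it is recorded, shrunk and
    # replayed on failure instead of silently varying between runs.
    np.random.seed(seed)
    inputs = np.random.randn(4, 8).astype(np.float32)
    assert inputs.shape == (4, 8)
```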
Closes https://github.com/caffe2/caffe2/pull/1678
Reviewed By: pietern
Differential Revision: D6657076
Pulled By: salexspb
fbshipit-source-id: 9f59c71ccd2c818156e9d2424c3423d450b8c8e2
Summary:
There were no dimensionality constraints on the generated indices
array, causing many examples to be generated and then filtered out. Instead,
we should ensure the probability of unique indices is high.
There is a better fix for this by using the `unique` keyword argument
to `hypothesis.extra.numpy.arrays`, but this is available only in
hypothesis version 3.28.0 and later.
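For reference, a minimal sketch of that better fix (requires hypothesis >= 3.28.0); the dtype, shape, and bounds here are illustrative only:

```python
import numpy as np
import hypothesis.extra.numpy as hnp
import hypothesis.strategies as st
from hypothesis import given

# Draw an indices array whose entries are guaranteed unique, so no
# examples need to be filtered out after generation.
indices_strategy = hnp.arrays(
    dtype=np.int64,
    shape=(10,),
    elements=st.integers(min_value=0, max_value=10**6),
    unique=True,
)

@given(indices=indices_strategy)
def test_unique_indices(indices):
    assert len(set(indices.tolist())) == len(indices)
```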
This is related to #1536 and #1599.
Once this change has proven to be OK, we can modify the other tests
that now have health check suppression enabled as well.
Closes https://github.com/caffe2/caffe2/pull/1686
Reviewed By: Yangqing
Differential Revision: D6651789
Pulled By: pietern
fbshipit-source-id: d80886c9ccf0a7a842a7580a279f33a2d6cca97c
Summary: Add Min and MinGradient Op
Reviewed By: jamesr66a
Differential Revision: D6608668
fbshipit-source-id: 7e1f8fa7a42a94f26152da0109d597e5deeb21c0
Summary:
hill: the learning rate changes according to the following 3 stages (sketched below)
1) linear warmup (increasing) for the first num_iter steps, starting from start_multiplier
2) inverse decay (decreasing) afterwards, controlled by (gamma, power)
3) lower bounded by end_multiplier
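A minimal sketch of the multiplier computed by this policy, assuming these parameter names and this exact interpolation (the op's formula may differ in detail):

```python
def hill_lr_multiplier(iteration, num_iter, start_multiplier,
                       gamma, power, end_multiplier):
    # Stage 1: linear warmup from start_multiplier up to 1.0.
    if iteration < num_iter:
        return start_multiplier + (1.0 - start_multiplier) * iteration / num_iter
    # Stage 2: inverse decay after the warmup phase.
    multiplier = (1.0 + gamma * (iteration - num_iter)) ** (-power)
    # Stage 3: never drop below end_multiplier.
    return max(multiplier, end_multiplier)
```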
Differential Revision: D6565379
fbshipit-source-id: 9c0e51fc825ba6a7765803a1f09479497057a9d9
Summary: Simple fallback implementation to support LengthsRangeFill, we can have native CUDA implementation later
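For context, a numpy reference of what LengthsRangeFill is expected to compute (my reading of the op's semantics, not the fallback code itself):

```python
import numpy as np

def lengths_range_fill_reference(lengths):
    # For each length L, emit the range [0, L); concatenate all ranges.
    return np.concatenate([np.arange(l) for l in lengths])

# Example: lengths [2, 3, 1] -> [0, 1, 0, 1, 2, 0]
print(lengths_range_fill_reference(np.array([2, 3, 1])))
```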
Reviewed By: pietern
Differential Revision: D6594031
fbshipit-source-id: b705234a591a61e8d1ee5f7524aceec3f4581f9c
Summary: Currently these operators are implemented in a complex meta-programming fashion. I removed those definitions and put modified CPU/CUDA implementations into reduction_front_back_ops.{cc,cu}. This will help future extension of these ops to support a lengths input.
Reviewed By: asaadaldien
Differential Revision: D6506568
fbshipit-source-id: 7323baf7c8e0eca37912f3ae28c02e37ad2e1103
Summary:
Commit 479e4ce5 didn't end up stopping the health checks from firing;
they are likely still caused by the remaining `assume` calls.
Closes https://github.com/caffe2/caffe2/pull/1625
Differential Revision: D6573036
Pulled By: pietern
fbshipit-source-id: eeb21bdd61dca0a632eb1ba9e529177ac2569bfd
Summary: A version of MILSTMCell which uses layer normalization (see https://arxiv.org/pdf/1607.06450.pdf). There's a lot of copypasta because we don't want to make the existing RNNCell classes harder to approach / understand by adding new options.
Differential Revision: D6564208
fbshipit-source-id: 0bc43e12b6c08ebdf5ea6af2c631f785c302bdb4
Summary: The `assume` statement in adagrad_test leads to a health check failure. Here we remove it by checking dc == hu.gpu_do instead.
Reviewed By: pietern
Differential Revision: D6513314
fbshipit-source-id: 4caf2d938e5f5935a95cca8abd99185182223d63
Summary:
This enables two learning rates for the Generator and Discriminator in a GAN. For each iteration i, it decides
whether to enable training on G (or D) based on the desired active_period and inactive_period for G (or D).
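One plausible way to make that per-iteration decision (a sketch only; the actual scheduling logic in this diff may differ):

```python
def is_active(iteration, active_period, inactive_period):
    # Train for `active_period` iterations, then pause for
    # `inactive_period` iterations, repeating the cycle.
    cycle = active_period + inactive_period
    return (iteration % cycle) < active_period

# Example: G trains on iterations 0-2, pauses on 3-4, and so on,
# while D can be given its own (active_period, inactive_period).
train_g = is_active(iteration=5, active_period=3, inactive_period=2)
```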
Reviewed By: dragonxlwang
Differential Revision: D6379325
fbshipit-source-id: 926f1041e25f48791b2ac1fc1a8eaa08db9639b8
Summary:
PR #1536 suppressed health checks for test_sparse_adagrad, but test_row_wise_sparse_adagrad also filters out too many examples. Suppress health checks for this test as well.
Closes https://github.com/caffe2/caffe2/pull/1599
Differential Revision: D6530850
Pulled By: pietern
fbshipit-source-id: c73f30d2e104565421e3e381b1cf66185edc833e
Summary:
This can be used for testing and debugging. zdevito and I will primarily use this for our caffe2 script project.
Closes https://github.com/caffe2/caffe2/pull/1585
Reviewed By: zdevito
Differential Revision: D6501209
Pulled By: jamesr66a
fbshipit-source-id: fdd65e422c44b74bb6926320af506dcae13327f3
Summary:
Reduced the array sizes used in pack_ops_test to prevent timeouts
during Travis CI builds.
Reviewed By: enosair
Differential Revision: D6476703
fbshipit-source-id: 20ab871ae40349ca27186447a84135bbc5c351b1
Summary:
Adds a new `LSTMCell` subclass to the `rnn_cell` module that performs layer normalization on the fused input matrix. Moves around some code in `rnn_cell.py` to avoid copy-pasta. Adds relevant test cases to `rnn_cell_test.py`.
Had to fix `brew.layer_norm` first. See T24013870.
Reviewed By: jhcross
Differential Revision: D6454883
fbshipit-source-id: 0f4ea7a778cc5be6a7274f7b28c793f5dd7c6095
Summary:
Regardless of device checker/gradient checker we cannot run a
backwards pass with cuDNN when NHWC is used.
Closes https://github.com/caffe2/caffe2/pull/1566
Differential Revision: D6474181
Pulled By: pietern
fbshipit-source-id: 727d7b4f2a1431a4d6675ffb76c5b60d3d7fa712
Summary: Quick fix for unit test broken by D6454290. This is my fault for approving while the tests covering the single callsite were broken.
Reviewed By: goldsborough
Differential Revision: D6466566
fbshipit-source-id: 2683be3d6bb184286e64fbde3e572946e39030c7
Summary:
While working on layer normalization for LSTMs I encountered an issue where the layer norm parameters (which are the scale/gain and bias/shift from the paper) were not registered in the model for `brew.layer_norm`. salexspb explained that this is because it was using the `init_net_param` API instead of `create_param`. This diff fixes this.
While fixing this I noticed that `brew.layer_norm` actually had a bug where it was multiplying by the bias instead of adding it. Another issue was that the function gave the scale and bias a shape of `[1]`; however, the paper (https://arxiv.org/pdf/1607.06450.pdf) specifies that, like for batch norm, there is one scale and bias parameter per neuron, i.e. the shape should be `[1, axis_dimension]`. The API now takes an explicit `dim_in` parameter (also more consistent with other normalization functions in that module) so that this can be specified. See tests for how this now looks.
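As a reference for the fixed behavior, a minimal numpy sketch of layer normalization with per-neuron scale and bias of shape `[1, dim_in]` (illustrative, not the `brew.layer_norm` implementation; the epsilon value is arbitrary):

```python
import numpy as np

def layer_norm_reference(x, scale, bias, epsilon=1e-4):
    # x: [batch, dim_in]; scale and bias: [1, dim_in], one parameter per
    # neuron as described in the paper (not a single scalar of shape [1]).
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    normalized = (x - mean) / np.sqrt(var + epsilon)
    return normalized * scale + bias  # multiply by scale, then ADD the bias
```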
Reviewed By: jhcross
Differential Revision: D6454290
fbshipit-source-id: fc00ca614de3190c40ab743e8984bec9e85fb58c
Summary:
Adding a check to pack_segments to make sure the lengths passed in add up as expected.
This also starts to address https://fb.facebook.com/groups/1405155842844877/permalink/1977332432293879/ ; it might not fully fix that issue, but the check is useful even if it does not.
Reviewed By: salexspb
Differential Revision: D6443490
fbshipit-source-id: 680dc763a788a550d321d97a556c5b46e3402dd1
Summary:
This is a CUDA implementation of the RemovePadding operator, modeled on akyrola's implementation for AddPadding.
There's also an incidental spelling correction: GetAddPadingGradient -> GetAddPaddingGradient.
Reviewed By: akyrola
Differential Revision: D6439594
fbshipit-source-id: b29cd0c252021c58e150b901bbaad28a3bd3cc4a
Summary:
With some test seeds this warning starts firing.
Should be addressed in a better way, not generating as many invalid examples.
Closes https://github.com/caffe2/caffe2/pull/1536
Reviewed By: bddppq
Differential Revision: D6437138
Pulled By: pietern
fbshipit-source-id: c619d928a585e3d887f686db5d98f841af10c56b
Summary:
TSIA. This is found in
https://github.com/caffe2/caffe2/pull/1530
Reviewed By: dzhulgakov
Differential Revision: D6434417
fbshipit-source-id: 2285c2f6252eb7f24e83357eb4887851b3adf690
Summary:
enosair caught a bug where the operator returned too early if the lengths output was not provided. Fixed and added testing.
Also noticed the op does not support the case when no lengths input is provided. Added a temporary CAFFE_THROW for this case; will fix later.
Reviewed By: enosair
Differential Revision: D6405585
fbshipit-source-id: a81717e1b39afde6e900ddd9049b820943aea9f1
Summary: CUDA version of the AddPadding op. It first executes a prefix sum using Cub to compute the cumulative lengths array. Then it launches a kernel that uses this information to fill the output tensor with the start padding, end padding, and the actual contents.
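For intuition, a numpy reference of the output layout (assuming zero padding and these argument names; the CUDA kernel of course uses the Cub prefix sum rather than a Python loop):

```python
import numpy as np

def add_padding_reference(data, lengths, start_pad_width, end_pad_width):
    # data has sum(lengths) rows; insert zero rows before and after each segment.
    start_pad = np.zeros((start_pad_width,) + data.shape[1:], dtype=data.dtype)
    end_pad = np.zeros((end_pad_width,) + data.shape[1:], dtype=data.dtype)
    pieces, offset = [], 0
    for length in lengths:
        pieces += [start_pad, data[offset:offset + length], end_pad]
        offset += length
    return np.concatenate(pieces)
```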
Reviewed By: asaadaldien
Differential Revision: D6391413
fbshipit-source-id: 45b431e5976674729e53cb4752c7753c1d8a69e8
Summary: The CUDA Cast op can now deal with an empty batch.
Reviewed By: azzolini
Differential Revision: D6350138
fbshipit-source-id: 2f3d19f4d42ff34806aa9597690e66f6b4de1a6b
Summary:
Two ops: BatchSparseToDenseOp and DenseToBatchSparseOp, inverse operations of each other. Details are described in the op docs.
These ops are used along with flexible topK, where the output is lengths, indices, and values.
We want to do a softmax on the values, but the dimension of each batch is different, so these ops convert the sparse representation to dense and vice versa. The two ops are also the gradient op for each other.
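A rough numpy reference of the BatchSparseToDense direction, based on my reading of the description above (argument names such as dense_last_dim are assumptions):

```python
import numpy as np

def batch_sparse_to_dense_reference(lengths, indices, values, dense_last_dim,
                                    default_value=0.0):
    # One dense row per batch item; each row scatters that item's
    # (index, value) pairs into a vector of width dense_last_dim.
    dense = np.full((len(lengths), dense_last_dim), default_value)
    offset = 0
    for row, length in enumerate(lengths):
        for j in range(offset, offset + length):
            dense[row, indices[j]] = values[j]
        offset += length
    return dense
```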
Reviewed By: chocjy
Differential Revision: D6288338
fbshipit-source-id: 0ba9e611058b39e46e7414dcc5f39cab29915fa3
Summary:
This is part one: it adds a lambdaNDCG loss which can be used to heuristically
optimize the NDCG metric.
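For context, the standard NDCG definition that the loss targets (not code from this diff):

```python
import numpy as np

def ndcg(relevances_in_ranked_order):
    rels = np.asarray(relevances_in_ranked_order, dtype=np.float64)
    # DCG: gains discounted by log2 of the rank position (positions 1..n).
    discounts = np.log2(np.arange(2, rels.size + 2))
    dcg = np.sum((2.0 ** rels - 1.0) / discounts)
    # Ideal DCG: the same gains with relevances sorted descending.
    ideal = np.sort(rels)[::-1]
    idcg = np.sum((2.0 ** ideal - 1.0) / discounts)
    return dcg / idcg if idcg > 0 else 0.0
```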
Differential Revision: D5830650
fbshipit-source-id: 1eb696337c9a77727ad40219c68f6468e2e097a5
Summary:
Datatypes was being handled badly in reference check, causing sporadic fails in CI. All batched mat-mul with fp16 data is performed as pseudo-fp16, with all math in fp32. Adjusted the reference implementation to reflect this.
Adjusted the gradient check threshold to the best I could get to consistently pass.
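A minimal sketch of the adjusted reference behavior, assuming the inputs are fp16 numpy arrays (not the actual test code):

```python
import numpy as np

def pseudo_fp16_batch_matmul_reference(x, y):
    # Inputs and outputs are fp16, but all math happens in fp32, matching
    # how the GPU op treats fp16 batched mat-mul as pseudo-fp16.
    out = np.matmul(x.astype(np.float32), y.astype(np.float32))
    return out.astype(np.float16)
```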
Closes https://github.com/caffe2/caffe2/pull/1406
Differential Revision: D6324431
Pulled By: pietern
fbshipit-source-id: 83ff2584438a11f7a6db4599a4fb0e75e9e15a3d
Summary: add NegateGradientOp: in the forward pass, this op simply copies the input to the output. In the backward pass, it flips the sign of the gradients.
Reviewed By: dragonxlwang
Differential Revision: D6314456
fbshipit-source-id: 56afd8b131eff9f7e120ab7e4e87461df49649d4
Summary: The topk GPU test was taking too much time, but there are still a variety of codepaths to test (k <= 1024, k > 1024, k == 1, k == n). Reduce the batch sizes and n to reduce time taken by the in-python CPU code equivalent.
Reviewed By: pietern
Differential Revision: D6272628
fbshipit-source-id: b8b8f3601f28bf64f144c73d7c9e915f40c84d70
Summary: The number of elements in the caffe2 blob can be larger than int32. Use size_t to prevent overflow.
Reviewed By: ajtulloch
Differential Revision: D6278363
fbshipit-source-id: 356e294c667a53360d8a65b56a63a39d5ce3384e
Summary:
Will probably rename this to adaptive topK to be aligned with the layer name.
The main difference from the top_k op is that K is not fixed as a layer parameter;
instead, this op takes in a blob that contains the K value for each row of the input data (batch mode).
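A rough numpy reference of that batch-mode behavior (a sketch under my reading of the op; the function and variable names are made up):

```python
import numpy as np

def flexible_top_k_reference(data, k_per_row):
    # For each row i, keep the k_per_row[i] largest entries; rows keep
    # different numbers of elements, so return lengths/indices/values.
    lengths, indices, values = [], [], []
    for row, k in zip(data, k_per_row):
        top = np.argsort(-row)[:int(k)]
        lengths.append(int(k))
        indices.extend(top.tolist())
        values.extend(row[top].tolist())
    return np.array(lengths), np.array(indices), np.array(values)
```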
Reviewed By: chocjy
Differential Revision: D6221209
fbshipit-source-id: f7fd575ff8f515d886d93278ad94fd17e8bd6fa5