* Update elementwise ops to support numpy-style broadcast
* Fix sqrt_op
* Fix compare ops
* Fix gradient test
* Fix optimizer legacy broadcast
* Fix legacy broadcast for elementwise ops
* Skip flaky test
* Fix eigen simple binary op
* Fix attention test
* Fix rnn test
* Fix LSTM test
* Fix tan grad
* Fix schema check
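The broadcasting semantics these elementwise ops now follow are NumPy's: trailing dimensions are aligned, and a dimension of size 1 (or a missing dimension) is stretched to match. A minimal illustration in plain NumPy (illustrative only; the Caffe2 ops mirror these rules, they do not call NumPy):

```python
import numpy as np

# NumPy-style broadcasting: trailing dimensions are aligned, and a
# size-1 (or missing) dimension is stretched to match the other operand.
a = np.arange(6, dtype=np.float32).reshape(2, 3)    # shape (2, 3)
b = np.array([10.0, 20.0, 30.0], dtype=np.float32)  # shape (3,)

c = a + b  # b is broadcast across the first axis of a
print(c.shape)  # (2, 3)
```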
Summary:
I would expect that tests marked "expected failure" indicate a known issue in the code that will be fixed later. Both of these tests simply verify proper error checking; nothing needs fixing.
Before (looks like something is wrong):
```
======================================= 2 xfailed in 0.27 seconds =======================================
```
After:
```
======================================= 2 passed in 0.28 seconds ========================================
```
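One way to express such error-checking tests so they report as passes rather than xfails is `pytest.raises`. A sketch with a hypothetical `set_learning_rate` validator (not from the codebase):

```python
import pytest

def set_learning_rate(lr):
    """Hypothetical helper that validates its argument."""
    if lr <= 0:
        raise ValueError("learning rate must be positive")
    return lr

# Instead of marking this xfail, assert that the error is actually raised:
def test_rejects_nonpositive_lr():
    with pytest.raises(ValueError):
        set_learning_rate(-0.1)
```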
/cc akyrola gsethi523
Closes https://github.com/caffe2/caffe2/pull/1209
Differential Revision: D5825373
Pulled By: akyrola
fbshipit-source-id: 1b98f503e4e406f69567d02425532f43bd16a465
Summary: A common source of confusion is how to use StopGradient, and a typical bug is forgetting to specify input=output. This adds a sanity check to the gradient builder that checks whether some StopGradient outputs are orphaned.
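The check can be sketched like this, over a simplified list-of-tuples stand-in for a Caffe2 NetDef (the real implementation lives in the gradient builder):

```python
def find_orphaned_stop_gradients(ops):
    """Return StopGradient outputs that no op in the net consumes.

    ops: list of (op_type, inputs, outputs) tuples; a simplified
    stand-in for a Caffe2 NetDef. The intended in-place pattern
    (input == output) is covered because the blob appears as an
    input of the StopGradient op itself.
    """
    consumed = set()
    for _, inputs, _ in ops:
        consumed.update(inputs)
    return [out
            for op_type, _, outputs in ops
            if op_type == "StopGradient"
            for out in outputs
            if out not in consumed]
```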
Reviewed By: dzhulgakov
Differential Revision: D5458341
fbshipit-source-id: 056fef4f0ee53eb10e66e9be0ecb55b55f9cc3d7
Summary:
It is a quite common question when users get some variant of "blob has version 2 but gradient expects version 1" in their backward pass; the error message is completely unhelpful.
To remedy this, I added proper debug information that tells the user how the version number of a blob was incremented over time, i.e. which ops caused the version to go up. This should help in understanding the issue.
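A rough sketch of the bookkeeping this adds, using a hypothetical `SsaVersionTracker` class (names are illustrative, not Caffe2's actual internals):

```python
from collections import defaultdict

class SsaVersionTracker:
    """Track which op bumped each blob's version, so that a
    'blob has version 2 but gradient expects version 1' error
    can report the ops responsible. Illustrative stand-in for
    the gradient builder's internal bookkeeping."""

    def __init__(self):
        self.versions = defaultdict(int)
        self.history = defaultdict(list)  # blob -> [(version, op_name)]

    def record_op(self, op_name, outputs):
        # Every time an op writes a blob, its SSA version increments.
        for blob in outputs:
            self.versions[blob] += 1
            self.history[blob].append((self.versions[blob], op_name))

    def debug_info(self, blob):
        # Human-readable trail of how the blob reached its version.
        return "; ".join("v%d by %s" % (v, op) for v, op in self.history[blob])
```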
Reviewed By: dzhulgakov
Differential Revision: D5358227
fbshipit-source-id: bc09d048ac33200c35d56460e44e86c2f2888f3f
Summary:
A few issues:
1. Randomization hurts memoization.
2. Even if we make it non-random, we can get key collisions when loading it back.
3. RNNs use prototxt for the step net, and apparently it's not forward-compatible the way normal protobuf is.
I am thinking of a better, less invasive solution now.
Reviewed By: jamesr66a
Differential Revision: D5272118
fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
Summary: Hard-to-debug problems arise when a gradient creator fails because the forward op is itself incorrect. Add checking of the schema before calling the creator, and clarify the error messages.
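The idea can be sketched as follows, with a hypothetical schema registry mapping op type to input/output arity bounds (a simplified stand-in for Caffe2's OpSchema):

```python
def check_schema_before_gradient(op_type, num_inputs, num_outputs, schemas):
    """Validate the forward op against its schema before invoking the
    gradient creator, so a malformed forward op fails with a clear
    message instead of an obscure error inside the creator.

    schemas: dict mapping op type -> (min_in, max_in, min_out, max_out);
    an illustrative stand-in for Caffe2's OpSchema registry.
    """
    if op_type not in schemas:
        raise ValueError("No schema registered for op '%s'" % op_type)
    min_in, max_in, min_out, max_out = schemas[op_type]
    if not (min_in <= num_inputs <= max_in):
        raise ValueError("Op '%s' expects %d..%d inputs, got %d"
                         % (op_type, min_in, max_in, num_inputs))
    if not (min_out <= num_outputs <= max_out):
        raise ValueError("Op '%s' expects %d..%d outputs, got %d"
                         % (op_type, min_out, max_out, num_outputs))
```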
Reviewed By: Yangqing
Differential Revision: D5256016
fbshipit-source-id: 78550f7e2ce5b88e26b69fdae4be0eece52edfea
Summary: This is going to show a Python Caffe2 user where a failed operator was created. The motivation for keeping this information out of the protobuf itself is to avoid making it too verbose, and to keep protobufs of a net readable after a simple print() call.
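A rough sketch of the approach, assuming a hypothetical `make_op` helper: the creation stack is recorded in a side registry rather than stored on the op itself, so printing the net stays readable:

```python
import traceback

def make_op(op_type, inputs, outputs, debug_info_registry):
    """Record where an operator was created without bloating the protobuf.
    Hypothetical helper: the creation stack is kept in a side registry
    and only surfaced when the operator fails."""
    op = {"type": op_type, "inputs": inputs, "outputs": outputs}
    # Keep the stack out of the op dict itself so print(net) stays terse.
    debug_info_registry[id(op)] = "".join(traceback.format_stack(limit=5))
    return op
```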
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
We waste extra memory by creating two autosplit gradient
blobs and then accumulating them into the main one. Sometimes, when Sum
/ Sub ops are involved, we can avoid wasting extra memory entirely.
Ideally we would not waste any memory at all and would make ops add to the same
blob rather than calculating separate results and then merging
them. But that would require a substantial change to the framework and
rewriting a lot of operators.
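The memory trade-off can be illustrated with NumPy (illustrative only; Caffe2 blobs are not NumPy arrays):

```python
import numpy as np

# Wasteful pattern: two autosplit gradient blobs are materialized,
# then summed into the main gradient (three buffers alive at once).
g_autosplit_0 = np.full(4, 2.0, dtype=np.float32)
g_autosplit_1 = np.full(4, 3.0, dtype=np.float32)
g_main = g_autosplit_0 + g_autosplit_1

# Cheaper pattern this diff moves toward: accumulate each
# contribution into the main blob in place, no autosplit copies.
g_acc = np.zeros(4, dtype=np.float32)
g_acc += np.full(4, 2.0, dtype=np.float32)  # first contribution
g_acc += np.full(4, 3.0, dtype=np.float32)  # second contribution
```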
Reviewed By: dzhulgakov
Differential Revision: D5157667
fbshipit-source-id: 8293824d6cdd971d8853ae90aee68e4a6d1e132b
Summary:
When building a multi-layer static RNN, the last timestep of
the first layer (and of the other layers except the last one) doesn't get a
gradient for the cell state, since normally the user only consumes results from
the last layer and the cell state doesn't go up either.
ZeroGradient provides a general solution for injecting zero-gradient
blobs. It is in some ways similar to the StopGradient operator, which is
also special-cased.
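What ZeroGradient amounts to can be sketched in one line (illustrative; the real operator works on Caffe2 blobs, not NumPy arrays):

```python
import numpy as np

def zero_gradient(forward_output):
    """Sketch of ZeroGradient: for a blob that otherwise receives no
    gradient (e.g. the cell state of a non-final static-RNN layer),
    inject a gradient of zeros shaped like the forward blob."""
    return np.zeros_like(forward_output)
```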
Reviewed By: bwasti
Differential Revision: D5198375
fbshipit-source-id: a21d0cfb3676a77fac72e5897a200d0bd25fc6de
Summary:
The bug repro is in a test. Generally speaking, accumulation was
not happening if len(ys) >= 2 (the list of blobs we compute gradients
from) and some blob in the net was both in the ys list and also received
a gradient propagated from another element of ys.
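The corrected accumulation rule can be sketched as a hypothetical helper over scalar gradients (not the actual generator code):

```python
def accumulate_gradient(grad_map, blob, grad):
    """When a blob is both seeded with an initial gradient (it is in ys)
    and also receives a gradient propagated from another element of ys,
    the two contributions must be summed, not overwritten. Scalar
    gradients are used here for brevity."""
    if blob in grad_map:
        grad_map[blob] = grad_map[blob] + grad  # accumulate
    else:
        grad_map[blob] = grad
```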
Reviewed By: akyrola
Differential Revision: D5121695
fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
Summary:
Fix an issue that amyzhang encountered. She was using ConstantFill to create a blob of the same size as another blob. This interrupted the gradient computation flow through the ConstantFill, since the gradient for the input blob was set to None (even though it already had another gradient assigned). The correct fix is to avoid overwriting an existing gradient assignment with None, UNLESS that blob is an output of the same op, as with the StopGradient op. (Note that Amy's problem was also fixed by using a fixed-shape ConstantFill and Add with broadcast=1 instead, which is a better solution anyway.)
Not sure if I explained this well, but see the new unit tests. Before this change, testAddAndDynamicConstant failed while testAddAndStaticConstant succeeded.
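The fixed assignment rule can be sketched as a hypothetical helper (names are illustrative, not the real gradient builder):

```python
def assign_gradient(grad_map, blob, grad, is_own_output=False):
    """Sketch of the fixed rule: a None gradient (e.g. from ConstantFill,
    which generates no input gradient) must not overwrite a gradient
    the blob already has, unless the blob is an output of that same op,
    as with StopGradient."""
    if grad is None and blob in grad_map and not is_own_output:
        return  # keep the existing gradient
    grad_map[blob] = grad
```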
Reviewed By: dzhulgakov
Differential Revision: D4861176
fbshipit-source-id: 3b53621bfaba2e36786a5e4664145038995f6616
Summary:
A bit too much stuff in one diff, sorry:
1. Add inference for gradient types by using the fact that x_grad is the gradient of x and must have the same shape. Using string matching for this is kind of awkward, but in addition I rely on the operator actually being a gradient op.
2. dzhulgakov was right: a scalar's shape is () and not (1). Sorry, my earlier claim was #fakenews.
3. Added inference functions for MakeTwoClass, MomentumSGDUpdate and the cross-entropy ops.
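Point 1 can be sketched with a hypothetical helper (the `_grad` suffix convention is from the diff; everything else is illustrative):

```python
def infer_gradient_shape(blob, known_shapes, is_gradient_op):
    """Sketch of point 1: if 'x_grad' is produced by a gradient op, it
    must have the same shape as 'x'. The string match alone would be
    fragile, so it only applies when the producing op is actually a
    gradient op."""
    if is_gradient_op and blob.endswith("_grad"):
        forward_blob = blob[:-len("_grad")]
        if forward_blob in known_shapes:
            return known_shapes[forward_blob]
    return None  # shape unknown

# Point 2: a scalar's shape is (), not (1,).
SCALAR_SHAPE = ()
```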
Reviewed By: dzhulgakov
Differential Revision: D4569758
fbshipit-source-id: 0db13f33819777fdddefe21d4b1ebf906fcaf98c
Summary:
This operator always outputs dense gradients regardless of
the input gradients. In the forward pass, it passes inputs through to outputs in place.
Reviewed By: xianjiec
Differential Revision: D4582511
fbshipit-source-id: 7eb2c5d2142aa05d373f06cab1e7f89d8b747d34
Summary:
Only tests for SparseFunHash for now
Closes https://github.com/caffe2/caffe2/pull/60
Reviewed By: Yangqing
Differential Revision: D4348961
Pulled By: bwasti
fbshipit-source-id: cd05d73ccc711b42a7d33e7a6b65a9d1a9bfa7e6
Summary:
This adds support for automatic aggregation of sparse gradients. We simply concatenate indices and values (no attempt to deduplicate, since this is already done before feeding into the optimizer). This should support various cases (indices and/or values can be generated by one or more gradient ops, or gradient outputs can be directly passed from inputs).
I tried to minimize the code footprint, but I introduced SparseGradGenMeta because GradGenMeta didn't lend itself well to being used with sparse gradients.
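The concatenation scheme can be sketched with NumPy (illustrative; real sparse gradients are Caffe2 blobs):

```python
import numpy as np

def aggregate_sparse_gradients(grads):
    """Sketch of the aggregation: each sparse gradient is an
    (indices, values) pair, and contributions are simply concatenated
    with no deduplication, since duplicate indices are handled
    downstream before the optimizer consumes them."""
    indices = np.concatenate([g[0] for g in grads])
    values = np.concatenate([g[1] for g in grads])
    return indices, values
```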
Reviewed By: dzhulgakov
Differential Revision: D4219788
fbshipit-source-id: 1d074664cffd82a8764e4b1473ada6bc46e6c51a