Summary: Due to popular demand, added an op that computes the element-wise square, plus its gradient (just for the fun of it).
Reviewed By: Yangqing
Differential Revision: D4664797
fbshipit-source-id: 0a29c7c249fdc72f51412bebd6ae352a7801cf05
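For reference, the math behind the square op and its gradient can be sketched in plain Python. This is a hypothetical reference implementation, not the operator code from this commit: the function names `square_forward` and `square_backward` are illustrative assumptions. Since Y = X², the chain rule gives dX = 2 · X · dY element-wise.

```python
def square_forward(x):
    """Hypothetical reference for the forward pass: Y[i] = X[i] ** 2."""
    return [xi * xi for xi in x]


def square_backward(x, dy):
    """Hypothetical reference for the gradient: dX[i] = 2 * X[i] * dY[i].

    Takes the forward input X and the upstream gradient dY.
    """
    if len(x) != len(dy):
        raise ValueError("X and dY must have the same length")
    return [2.0 * xi * dyi for xi, dyi in zip(x, dy)]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    print(square_forward(x))                   # [1.0, 4.0, 9.0]
    print(square_backward(x, [1.0, 1.0, 1.0]))  # [2.0, -4.0, 6.0]
```

The gradient only needs the forward input X, so a backward pass built this way does not have to keep the output Y around.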
Summary: DivOp was missing its CUDA gradient, so implemented it. Also added an operator test.
Differential Revision: D4396638
fbshipit-source-id: 9949e47aa3735bb418a0db003e2b2f4896056a71
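The gradient that the new CUDA path has to produce follows from Z = X / Y: dX = dZ / Y and dY = -dZ · X / Y², which can be rewritten as -dZ · Z / Y. The sketch below is a plain-Python, CPU-only reference under stated assumptions (equal-length inputs, no broadcasting); it is not the CUDA kernel from this commit. Computing dY from the forward output Z instead of X is one common choice and is assumed here. The names `div_forward` and `div_backward` are hypothetical.

```python
def div_forward(x, y):
    """Hypothetical reference for the forward pass: Z[i] = X[i] / Y[i]."""
    return [xi / yi for xi, yi in zip(x, y)]


def div_backward(y, z, dz):
    """Hypothetical reference for the gradient of Z = X / Y.

    Assumes equal-length inputs (no broadcasting).
    dX[i] = dZ[i] / Y[i]
    dY[i] = -dZ[i] * Z[i] / Y[i]   (equivalent to -dZ[i] * X[i] / Y[i]**2)
    """
    if not (len(y) == len(z) == len(dz)):
        raise ValueError("Y, Z and dZ must have the same length")
    dx = [dzi / yi for yi, dzi in zip(y, dz)]
    dy = [-dzi * zi / yi for yi, zi, dzi in zip(y, z, dz)]
    return dx, dy


if __name__ == "__main__":
    x, y = [6.0, 1.0], [2.0, 4.0]
    z = div_forward(x, y)                   # [3.0, 0.25]
    dx, dy = div_backward(y, z, [1.0, 1.0])
    print(dx)                               # [0.5, 0.25]
    print(dy)                               # [-1.5, -0.0625]
```

Reusing Z means the backward pass needs only Y, Z and dZ, so X does not have to be passed to the gradient op.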