Summary: GPU (CUDA) implementation of the Swish activation function in Caffe2.
Reviewed By: Yangqing, xianjiec
Differential Revision: D6656907
fbshipit-source-id: f5f2c667055abf679728d2b5d43998895ddec708
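The math this operator computes can be sketched as a scalar Python reference, not the actual CUDA kernel; the function names here are illustrative, and the gradient is derived from Swish(x) = x·sigmoid(x) by the product rule:

```python
import math

def sigmoid(x):
    # Numerically stable logistic function: never exponentiates
    # a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def swish(x):
    # Swish(x) = x * sigmoid(x)
    return x * sigmoid(x)

def swish_grad(x):
    # d/dx [x * sigmoid(x)] = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s + x * s * (1.0 - s)
```

For large positive x, Swish approaches the identity (sigmoid saturates at 1), and swish_grad(0) = 0.5, which is a quick sanity check for any kernel implementing it.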
Summary: I noticed that Sigmoid was taking an inordinate amount of time in our NMT benchmark, and its implementation did not look optimal. I replaced it with an Eigen version so that, once the Eigen update lands, we get proper AVX(2) vectorization.
Differential Revision: D5082464
fbshipit-source-id: aa951f7d730fc05198f7dd04076ec58d471b74c8
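A minimal sketch of the scalar computation the vectorized version replaces; this is a Python stand-in, not the Eigen code in the diff, and the Eigen expression named in the comment is an assumption about how such an op is typically written:

```python
import math

def sigmoid(x):
    # Stable form: exp() is only ever called on a non-positive argument,
    # so large |x| cannot overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_inplace(xs):
    # Scalar loop standing in for an Eigen array expression such as
    # Y = 1 / (1 + (-X).exp()), which Eigen can fuse into a single
    # SIMD (AVX/AVX2) pass over the buffer.
    for i, x in enumerate(xs):
        xs[i] = sigmoid(x)
    return xs
```

The benefit of the Eigen rewrite is that the whole expression compiles to one vectorized loop instead of repeated scalar libm calls.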
Summary: Due to popular demand, added an op to compute element-wise square + gradient for it (just for the fun of it).
Reviewed By: Yangqing
Differential Revision: D4664797
fbshipit-source-id: 0a29c7c249fdc72f51412bebd6ae352a7801cf05
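The forward and backward passes of an element-wise square op can be sketched as follows (a Python reference for the math only; names are illustrative, not the Caffe2 operator code):

```python
def sqr(xs):
    # Forward: y_i = x_i ** 2, element-wise.
    return [x * x for x in xs]

def sqr_gradient(xs, dys):
    # Backward: dL/dx_i = dL/dy_i * dy_i/dx_i = dy_i * 2 * x_i.
    return [dy * 2.0 * x for x, dy in zip(xs, dys)]
```

The gradient reuses the forward input, so the op only needs to keep X around for the backward pass, not Y.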
Summary: DivOp was missing a gradient for CUDA, so I implemented one. Also added an operator test.
Differential Revision: D4396638
fbshipit-source-id: 9949e47aa3735bb418a0db003e2b2f4896056a71
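For reference, the math a Div gradient has to implement follows from z = x / y; this is a hedged Python sketch of that math, not the CUDA kernel itself:

```python
def div_gradient(xs, ys, dzs):
    # For z = x / y:
    #   dz/dx =  1 / y     ->  dL/dx =  dz / y
    #   dz/dy = -x / y**2  ->  dL/dy = -dz * x / y**2
    dxs = [dz / y for y, dz in zip(ys, dzs)]
    dys = [-dz * x / (y * y) for x, y, dz in zip(xs, ys, dzs)]
    return dxs, dys
```

Note the y-gradient can also be written as -dz * z / y, which reuses the forward output z instead of recomputing x / y.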