pytorch/caffe2/perfkernels
Taiqing Wang 8cb1f2f9dc implement L2 regularization for Adagrad in caffe2 and dper (#37705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372

Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/)

**Problem formulation**

L(w) = J(w) + lambda/2 * ||w||^2
J(w) is the empirical loss, and lambda/2 * ||w||^2, the scaled squared L2 norm of the parameters, is the L2 regularizer.

dL(w)/dw_i = dJ(w)/dw_i + lambda * w_i
dL(w)/dw_i is the gradient of L(w) w.r.t. w_i.

To implement the L2 regularizer, lambda * w_i is added to the gradient of J(w) w.r.t. w_i. lambda is referred to as weight_decay in this implementation.
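
A minimal NumPy sketch of the resulting update rule (illustrative only; the actual vectorized kernels live in adagrad.h / adagrad_avx2.cc, and the function name and hyperparameter values here are assumptions):

```python
import numpy as np

def adagrad_step(w, g, h, lr=0.01, epsilon=1e-5, weight_decay=0.0):
    """One dense Adagrad step with L2 regularization folded into the gradient."""
    g_reg = g + weight_decay * w      # dL/dw_i = dJ/dw_i + lambda * w_i
    h_new = h + g_reg * g_reg         # accumulate squared gradients
    w_new = w - lr * g_reg / (np.sqrt(h_new) + epsilon)  # descent step
    return w_new, h_new

w, h = np.ones(4), np.zeros(4)
w, h = adagrad_step(w, g=np.full(4, 0.5), h=h, weight_decay=0.1)
```

With weight_decay=0.0 (the default) this reduces to plain Adagrad, so existing behavior is unchanged.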

**Code changes**
* In the initialization method of AdagradOptimizer, a new input argument, weight_decay, is added.
* In the _run function of AdagradOptimizer, weight decay is skipped for 1-d bias vectors.
* In the parameter update functions of Adagrad, weight_decay * w_i is added to the gradient. The default value of weight_decay is zero, so existing callers see no change (a usage sketch follows this list).
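
A hedged usage sketch, assuming the caffe2 Python optimizer API; the model construction is a placeholder, and build_adagrad forwarding the new weight_decay keyword to AdagradOptimizer is an assumption based on the change described above:

```python
from caffe2.python import model_helper, optimizer

# Placeholder model; net construction and gradient ops are elided.
model = model_helper.ModelHelper(name="sparse_nn_example")
# ... build the net, define a loss, call model.AddGradientOperators([loss]) ...

optimizer.build_adagrad(
    model,
    base_learning_rate=0.01,
    epsilon=1e-4,
    weight_decay=1e-4,  # lambda in the formulation above; 0.0 (default) disables it
)
```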

Test Plan:
`buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay`

`./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par`

Reviewed By: jspark1105

Differential Revision: D21258652

fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f
2020-05-03 10:42:49 -07:00
__init__.py
adagrad.cc implement L2 regularization for Adagrad in caffe2 and dper (#37705) 2020-05-03 10:42:49 -07:00
adagrad.h implement L2 regularization for Adagrad in caffe2 and dper (#37705) 2020-05-03 10:42:49 -07:00
adagrad_avx2.cc implement L2 regularization for Adagrad in caffe2 and dper (#37705) 2020-05-03 10:42:49 -07:00
CMakeLists.txt Fix AVX detection with clang-cl (#35653) 2020-03-30 07:53:37 -07:00
common.h [caffe2] Use cpuinfo in perfkernels to simplify build dependency (#36371) 2020-04-10 13:26:34 -07:00
common_avx.cc
common_avx2.cc
common_avx512.cc
cvtsh_ss_bugfix.h
embedding_lookup.cc
embedding_lookup.h
embedding_lookup_avx2.cc
embedding_lookup_fused_8bit_rowwise_avx2.cc
embedding_lookup_fused_8bit_rowwise_idx_avx2.cc [pytorch][embeddingbag_8bit] Add include_last_offset option to Fused 8bit EmbeddingBag and parallelize the op (#32683) 2020-01-29 16:04:56 -08:00
embedding_lookup_idx.cc [pytorch][embeddingbag] Parallelize the EmbeddingBag operator (#4049) 2020-01-23 21:29:44 -08:00
embedding_lookup_idx.h [pytorch][embeddingbag] Parallelize the EmbeddingBag operator (#4049) 2020-01-23 21:29:44 -08:00
embedding_lookup_idx_avx2.cc [pytorch][embeddingbag] Parallelize the EmbeddingBag operator (#4049) 2020-01-23 21:29:44 -08:00
fused_8bit_rowwise_conversion.cc
fused_8bit_rowwise_conversion.h
fused_8bit_rowwise_conversion_avx2.cc
fused_8bit_rowwise_embedding_lookup.cc
fused_8bit_rowwise_embedding_lookup.h
fused_8bit_rowwise_embedding_lookup_idx.cc [pytorch][embeddingbag_8bit] Add include_last_offset option to Fused 8bit EmbeddingBag and parallelize the op (#32683) 2020-01-29 16:04:56 -08:00
fused_8bit_rowwise_embedding_lookup_idx.h
hp_emblookup_codegen.py [pytorch][embeddingbag_8bit] Add include_last_offset option to Fused 8bit EmbeddingBag and parallelize the op (#32683) 2020-01-29 16:04:56 -08:00
lstm_unit_cpu-impl.h [caffe2] Explicit vectorization of LSTM operator (#35556) 2020-04-01 17:19:56 -07:00
lstm_unit_cpu.h [caffe2] Explicit vectorization of LSTM operator (#35556) 2020-04-01 17:19:56 -07:00
lstm_unit_cpu_avx2.cc [caffe2] Explicit vectorization of LSTM operator (#35556) 2020-04-01 17:19:56 -07:00
lstm_unit_cpu_common.cc [caffe2] Explicit vectorization of LSTM operator (#35556) 2020-04-01 17:19:56 -07:00
lstm_unit_cpu_common.h [caffe2] Explicit vectorization of LSTM operator (#35556) 2020-04-01 17:19:56 -07:00
math.h
math_cpu_avx2.cc
math_cpu_base.cc
typed_axpy.cc
typed_axpy.h
typed_axpy_avx.cc
typed_axpy_avx2.cc