Mirror of https://github.com/saymrwulf/pytorch.git (synced 2026-05-15 21:00:47 +00:00)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372

Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/)

**Problem formulation**

$$L(w) = J(w) + \frac{\lambda}{2} \lVert w \rVert^2$$

$J(w)$ is the empirical loss, and $\lVert w \rVert^2$ is the squared L2 norm of the parameters, a.k.a. the L2 regularizer. The gradient of $L(w)$ with respect to $w_i$ is

$$\frac{\partial L(w)}{\partial w_i} = \frac{\partial J(w)}{\partial w_i} + \lambda w_i$$

To implement the L2 regularizer, $\lambda w_i$ is therefore added to the gradient of $J(w)$ with respect to $w_i$. In this implementation, $\lambda$ is called the weight decay.

**Code changes**

* A new input argument, `weight_decay`, is added to the initialization method of `AdagradOptimizer`.
* In the `_run` function of `AdagradOptimizer`, weight decay is skipped for 1-D bias vectors.
* In the Adagrad parameter update functions, `weight_decay * w_i` is added to the gradient. The default value of `weight_decay` is zero, which leaves the original update unchanged.

Test Plan:

`buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay`

`./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par`

Reviewed By: jspark1105

Differential Revision: D21258652

fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f
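The regularized Adagrad update described in this summary can be illustrated with a small dense Python sketch. This is a minimal illustration under stated assumptions, not the caffe2 perfkernels code: the function names `adagrad_update_with_weight_decay` and `effective_weight_decay` are hypothetical, the sketch assumes a positive learning rate that is subtracted from the parameters, and it treats any 1-D parameter as a bias vector for the purpose of skipping weight decay.

```python
import math
from typing import List, Sequence, Tuple


def adagrad_update_with_weight_decay(
    w: List[float],
    g: List[float],
    h: List[float],
    lr: float,
    epsilon: float = 1e-8,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical dense Adagrad step on L(w) = J(w) + weight_decay / 2 * ||w||^2.

    w: parameters, g: gradient of J(w), h: accumulated squared gradients.
    Returns (new_w, new_h); the inputs are not modified.
    """
    new_w: List[float] = []
    new_h: List[float] = []
    for wi, gi, hi in zip(w, g, h):
        # Gradient of the regularized loss: dJ/dw_i + weight_decay * w_i.
        gi = gi + weight_decay * wi
        # Accumulate the squared (regularized) gradient.
        hi = hi + gi * gi
        new_h.append(hi)
        # Per-coordinate scaled step (assumes lr > 0 and a descent step).
        new_w.append(wi - lr * gi / (math.sqrt(hi) + epsilon))
    return new_w, new_h


def effective_weight_decay(param_shape: Sequence[int], weight_decay: float) -> float:
    """Hypothetical helper: skip weight decay for 1-D (bias) parameters."""
    return 0.0 if len(param_shape) == 1 else weight_decay


if __name__ == "__main__":
    shape = (2, 1)  # a 2-D weight, so weight decay applies
    wd = effective_weight_decay(shape, 0.5)
    w, h = adagrad_update_with_weight_decay(
        [1.0, -2.0], [0.0, 0.3], [0.0, 0.0], lr=0.1, weight_decay=wd
    )
    print(w, h)
```

In this sketch, `weight_decay * w_i` is folded into the gradient before the squared-gradient accumulation, so the regularization term also contributes to the per-coordinate Adagrad scaling. With `weight_decay = 0` the step reduces to plain Adagrad.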
| File |
|---|
| `__init__.py` |
| `adagrad.cc` |
| `adagrad.h` |
| `adagrad_avx2.cc` |
| `CMakeLists.txt` |
| `common.h` |
| `common_avx.cc` |
| `common_avx2.cc` |
| `common_avx512.cc` |
| `cvtsh_ss_bugfix.h` |
| `embedding_lookup.cc` |
| `embedding_lookup.h` |
| `embedding_lookup_avx2.cc` |
| `embedding_lookup_fused_8bit_rowwise_avx2.cc` |
| `embedding_lookup_fused_8bit_rowwise_idx_avx2.cc` |
| `embedding_lookup_idx.cc` |
| `embedding_lookup_idx.h` |
| `embedding_lookup_idx_avx2.cc` |
| `fused_8bit_rowwise_conversion.cc` |
| `fused_8bit_rowwise_conversion.h` |
| `fused_8bit_rowwise_conversion_avx2.cc` |
| `fused_8bit_rowwise_embedding_lookup.cc` |
| `fused_8bit_rowwise_embedding_lookup.h` |
| `fused_8bit_rowwise_embedding_lookup_idx.cc` |
| `fused_8bit_rowwise_embedding_lookup_idx.h` |
| `hp_emblookup_codegen.py` |
| `lstm_unit_cpu-impl.h` |
| `lstm_unit_cpu.h` |
| `lstm_unit_cpu_avx2.cc` |
| `lstm_unit_cpu_common.cc` |
| `lstm_unit_cpu_common.h` |
| `math.h` |
| `math_cpu_avx2.cc` |
| `math_cpu_base.cc` |
| `typed_axpy.cc` |
| `typed_axpy.h` |
| `typed_axpy_avx.cc` |
| `typed_axpy_avx2.cc` |