pytorch/caffe2/python
Taiqing Wang 8cb1f2f9dc implement L2 regularization for Adagrad in caffe2 and dper (#37705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372

Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/)

**Problem formulation**

L(w) = J(w) + lambda/2 * ||w||^2
J(w) is the empirical loss, and ||w||^2 is the squared L2 norm of the parameters, a.k.a. L2 regularizer.

dL(w)/ dw_i = dJ(w)/dw_i + lambda w_i
dL(w)/ dw_i is the gradient of L(w) w.r.t. w_i.

To implement the L2 regularizer, lambda * w_i is added to the gradient of J(w) w.r.t. w_i. lambda is called weight decay in this implementation.
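
The gradient identity above can be checked numerically. The following is a minimal sketch, not code from this change: the toy loss `empirical_loss` and all helper names are hypothetical and chosen only for illustration. It compares the analytic gradient dJ(w)/dw_i + lambda * w_i against a central finite difference of L(w).

```python
# Sketch: verify dL/dw_i = dJ/dw_i + lambda * w_i on a toy loss.
# All names here are illustrative assumptions, not part of caffe2.

def empirical_loss(w):
    # Hypothetical J(w): sum of (w_i - 1)^2.
    return sum((wi - 1.0) ** 2 for wi in w)

def empirical_grad(w):
    # Analytic dJ/dw_i for the toy loss above.
    return [2.0 * (wi - 1.0) for wi in w]

def regularized_loss(w, weight_decay):
    # L(w) = J(w) + lambda/2 * ||w||^2
    return empirical_loss(w) + 0.5 * weight_decay * sum(wi * wi for wi in w)

def regularized_grad(w, weight_decay):
    # dL/dw_i = dJ/dw_i + lambda * w_i
    return [g + weight_decay * wi for g, wi in zip(empirical_grad(w), w)]

def numerical_grad(f, w, eps=1e-6):
    # Central finite difference, one coordinate at a time.
    grads = []
    for i in range(len(w)):
        hi = list(w)
        lo = list(w)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2.0 * eps))
    return grads

w = [0.5, -2.0, 3.0]
weight_decay = 0.1
analytic = regularized_grad(w, weight_decay)
numeric = numerical_grad(lambda v: regularized_loss(v, weight_decay), w)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_diff)
```

With weight_decay set to zero, `regularized_grad` reduces to `empirical_grad`, which matches the default behavior described for this change.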

**Code changes**
* In the initialization method of AdagradOptimizer, a new input argument, weight_decay, is added.
* In the _run function of AdagradOptimizer, the weight decay will be skipped for 1d bias vectors.
* In the parameter update functions of Adagrad, weight_decay * w_i is added to the gradient before the update is applied. The default value for weight_decay is zero, which leaves the existing behavior unchanged.
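
The changes listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the caffe2 operator or the `AdagradOptimizer` code: `adagrad_step` and `effective_weight_decay` are hypothetical names, the update uses the textbook Adagrad form with a positive learning rate, and it assumes the squared-gradient history accumulates the regularized gradient.

```python
import math

def adagrad_step(w, grad, hist, lr=0.01, epsilon=1e-8, weight_decay=0.0):
    """One textbook Adagrad step on flat lists; returns (new_w, new_hist).

    Hypothetical helper for illustration only.
    """
    new_w, new_hist = [], []
    for wi, gi, hi in zip(w, grad, hist):
        gi = gi + weight_decay * wi   # L2 term: dJ/dw_i + lambda * w_i
        hi = hi + gi * gi             # assumed: history uses the regularized gradient
        new_w.append(wi - lr * gi / (math.sqrt(hi) + epsilon))
        new_hist.append(hi)
    return new_w, new_hist

def effective_weight_decay(weight_decay, ndim):
    # Mirrors the rule described above: skip weight decay for 1-D bias vectors.
    return 0.0 if ndim == 1 else weight_decay

w = [1.0, -2.0]
grad = [0.0, 0.0]   # zero empirical gradient isolates the effect of weight decay
hist = [0.0, 0.0]

plain_w, _ = adagrad_step(w, grad, hist)                      # default weight_decay=0.0
decayed_w, _ = adagrad_step(w, grad, hist, weight_decay=0.1)  # pulls weights toward zero
print(plain_w, decayed_w)
```

With a zero empirical gradient, the default call leaves the weights untouched, while a positive weight_decay moves each weight toward zero.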

Test Plan:
```
buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay
```

```
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par
```

Reviewed By: jspark1105

Differential Revision: D21258652

fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f
2020-05-03 10:42:49 -07:00
..
benchmarks [caffe2] open source 2/4-bit SLS operators (#34903) 2020-03-17 22:55:10 -07:00
docs
examples Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
fakelowp Make FakeLowP tests work (#36525) 2020-04-13 20:16:33 -07:00
helpers Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
ideep Remove (most) Python 2 support from Python code (#35615) 2020-04-22 09:23:14 -07:00
layers Add LN after specialzied output embeddings and flexible LCE (#35178) 2020-04-30 15:32:09 -07:00
mint
mkl Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
modeling Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
models Make caffe2/caffe2/python/models/seq2seq python3 compatible 2020-02-04 10:51:47 -08:00
onnx Skip c2 ref onnx model tests (#37591) 2020-04-30 14:32:47 -07:00
operator_test implement L2 regularization for Adagrad in caffe2 and dper (#37705) 2020-05-03 10:42:49 -07:00
predictor [DPER3][Shape inference] Update Shape Information in dper3 backend (#34475) 2020-03-19 13:49:34 -07:00
rnn
serialized_test [Caffe2] Fix shape inference for element-wise operators (#33431) 2020-02-25 09:03:06 -08:00
test
trt Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
__init__.py Fix dll load logic for Python 3.8 on Windows (#32215) 2020-01-22 08:33:34 -08:00
_import_c_extension.py [AMD] Remove num_gpu check for remote execution (#34318) 2020-03-06 09:53:57 -08:00
allcompare_test.py
attention.py
benchmark_generator.py
binarysize.py
brew.py
brew_test.py
build.py
cached_reader.py
caffe_translator.py
caffe_translator_test.py
checkpoint.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
checkpoint_test.py
CMakeLists.txt
cnn.py
compatibility.py
context.py
context_test.py
control.py
control_ops_grad.py
control_ops_grad_test.py
control_ops_util.py
control_test.py
convert.py
convert_test.py
convnet_benchmarks.py
convnet_benchmarks_test.py
core.py [net_transform] only skip ConstantFill for autogen_grad (#34628) 2020-03-11 19:09:52 -07:00
core_gradients_test.py
core_test.py
crf.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
crf_predict.py
crf_viterbi_test.py
data_parallel_model.py
data_parallel_model_test.py Skips test_equiv_recurrent (#29255) 2019-11-06 13:29:23 -08:00
data_workers.py
data_workers_test.py Disables test_atomic_ops and testInputOrder (#29145) 2019-11-05 16:53:53 -08:00
dataio.py
dataio_test.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
dataset.py
db_file_reader.py
db_test.py
device_checker.py
dlpack.h Fix typos (#30606) 2019-12-02 20:17:42 -08:00
dyndep.py
embedding_generation_benchmark.py
experiment_util.py
extension_loader.py
filler_test.py
functional.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
functional_test.py
fused_8bit_rowwise_conversion_ops_test.py [caffe2] make fused rowwise quant/dequant op work for N-dim tensors (#33426) 2020-02-19 23:29:42 -08:00
gradient_check_test.py
gradient_checker.py [caffe2] fix type and shape inference for common gradient ops (#35857) 2020-04-02 11:17:04 -07:00
gru_cell.py
hip_test_util.py
hsm_util.py
hypothesis_test.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
hypothesis_test_util.py [caffe2] fix type and shape inference for common gradient ops (#35857) 2020-04-02 11:17:04 -07:00
ideep_test_util.py
layer_model_helper.py Add transfer_learning_blob_name_mappings into layer_model_helper to support layer model transfer learning 2020-03-18 15:28:00 -07:00
layer_model_instantiator.py
layer_parameter_sharing_test.py
layer_test_util.py
layers_test.py FCTransposed to FbFCPacked (#29766) 2019-12-10 10:18:21 -08:00
lengths_reducer_fused_8bit_rowwise_ops_test.py [caffe2] fix how np.clip is used in lengths_reducer_fused_{4,8}_rowwise_ops_test (#32086) 2020-01-14 22:53:28 -08:00
lengths_reducer_rowwise_8bit_ops_test.py
lstm_benchmark.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
memonger.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
memonger_test.py
mkl_test_util.py
model_device_test.py
model_helper.py Fix TensorProtosDBInput AttributeError (#32274) 2020-01-29 12:05:43 -08:00
model_helper_test.py
modifier_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
mpi_python.cc Replace c10::guts::stuff with std::stuff (#30915) 2019-12-16 13:57:19 -08:00
muji.py
muji_test.py
net_builder.py
net_builder_test.py
net_drawer.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
net_printer.py
net_printer_test.py
nomnigraph.py
nomnigraph_test.py
nomnigraph_transformations.py
nomnigraph_transformations_test.py
normalizer.py Scale init for batch-norm and layer-norm (#31983) 2020-01-10 11:55:56 -08:00
normalizer_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
normalizer_test.py
numa_benchmark.py
numa_test.py
observer_test.py
operator_fp_exceptions_test.py
optimizer.py implement L2 regularization for Adagrad in caffe2 and dper (#37705) 2020-05-03 10:42:49 -07:00
optimizer_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
optimizer_test.py Implementation of STORM optimizer caffe2 python wrapper (#36399) 2020-04-14 23:05:45 -07:00
optimizer_test_util.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
parallel_workers.py get rid of deprecated thread.isAlive() to use py2.6 modern form is_alive() 2019-10-22 15:37:31 -07:00
parallel_workers_test.py ParallelWorkersTest.testParallelWorkersInitFun is flaky (#29045) 2019-11-01 13:59:02 -07:00
parallelize_bmuf_distributed_test.py
pipeline.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
pipeline_test.py
predictor_constants.py
pybind_state.cc [caffe2] create and register child ws in pybind (#36741) 2020-04-16 14:53:55 -07:00
pybind_state.h caffe2: preserve python exception type from PythonOp (#36267) 2020-04-09 12:43:24 -07:00
pybind_state_dlpack.cc
pybind_state_dlpack.h
pybind_state_gpu.cc
pybind_state_hip.cc Make caffe2/fb folder compatible with AMD (#29131) 2019-11-04 16:40:29 -08:00
pybind_state_ideep.cc Upgrade MKL-DNN to DNNL v1.2 (#32422) 2020-03-26 22:07:59 -07:00
pybind_state_int8.cc
pybind_state_nomni.cc
pybind_state_registry.cc
pybind_state_registry.h
python_op_test.py caffe2: preserve python exception type from PythonOp (#36267) 2020-04-09 12:43:24 -07:00
queue_util.py
record_queue.py
recurrent.py
regularizer.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
regularizer_context.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
regularizer_test.py
rnn_cell.py [caffe2] Remove python2 from operator_test (#33977) 2020-03-02 08:55:53 -08:00
schema.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
schema_test.py
scope.py
scope_test.py
session.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
session_test.py
sparse_to_dense_mask_test.py
sparse_to_dense_test.py
task.py Fix typos (#30606) 2019-12-02 20:17:42 -08:00
task_test.py
test_util.py
text_file_reader.py
timeout_guard.py
toy_regression_test.py
transformations.py
transformations_test.py
tt_core.py
tt_core_test.py
utils.py [C2] Introduce extra_info force CPU tags for auto-generated iteration counter blobs (#32607) 2020-02-05 23:49:27 -08:00
utils_test.py [C2] Introduce extra_info force CPU tags for auto-generated iteration counter blobs (#32607) 2020-02-05 23:49:27 -08:00
visualize.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
workspace.py Fix typos, via a Levenshtein-type corrector (#31523) 2020-01-17 16:03:19 -08:00
workspace_test.py Revert "Revert D18171156: Merge Tensor and Variable." (#29299) 2019-11-08 09:11:20 -08:00