From d1fe68e70bb72a425667c63007a68cfb2ea6fc4b Mon Sep 17 00:00:00 2001
From: Ilqar Ramazanli
Date: Fri, 23 Apr 2021 09:33:22 -0700
Subject: [PATCH] To add single and chained learning schedulers to docs (#56705)

Summary:
In the optimizer documentation, many of the learning rate scheduler
[examples](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate)
are provided according to a generic template. In this PR we provide a precise,
simple use-case example showing how to use a learning rate scheduler.
Moreover, a follow-up example shows how to chain two schedulers one after the other.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56705

Reviewed By: ezyang

Differential Revision: D27966704

Pulled By: iramazanli

fbshipit-source-id: f32b2d70d5cad7132335a9b13a2afa3ac3315a13
---
 docs/source/optim.rst | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/docs/source/optim.rst b/docs/source/optim.rst
index e64945fe462..b8ef01f8ec1 100644
--- a/docs/source/optim.rst
+++ b/docs/source/optim.rst
@@ -146,6 +146,45 @@ allows dynamic learning rate reducing based on some validation measurements.
 Learning rate scheduling should be applied after optimizer's update; e.g., you
 should write your code this way:
 
+Example::
+
+    model = [Parameter(torch.randn(2, 2, requires_grad=True))]
+    optimizer = SGD(model, 0.1)
+    scheduler = ExponentialLR(optimizer, gamma=0.9)
+
+    for epoch in range(20):
+        for input, target in dataset:
+            optimizer.zero_grad()
+            output = model(input)
+            loss = loss_fn(output, target)
+            loss.backward()
+            optimizer.step()
+        scheduler.step()
+
+Most learning rate schedulers can be called back-to-back (also referred to as
+chaining schedulers). The result is that each scheduler is applied one after the
+other on the learning rate obtained by the one preceding it.
+
+Example::
+
+    model = [Parameter(torch.randn(2, 2, requires_grad=True))]
+    optimizer = SGD(model, 0.1)
+    scheduler1 = ExponentialLR(optimizer, gamma=0.9)
+    scheduler2 = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
+
+    for epoch in range(20):
+        for input, target in dataset:
+            optimizer.zero_grad()
+            output = model(input)
+            loss = loss_fn(output, target)
+            loss.backward()
+            optimizer.step()
+        scheduler1.step()
+        scheduler2.step()
+
+In many places in the documentation, we will use the following template to refer to schedulers
+algorithms.
+
 >>> scheduler = ...
 >>> for epoch in range(100):
 >>>     train(...)
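
The snippets added by this patch deliberately use placeholder names (`dataset`, `loss_fn`, a bare list of `Parameter`s), so they are not runnable as-is. Below is a minimal, self-contained sketch of the chained-scheduler pattern the patch documents; the `nn.Linear` toy model, synthetic dataset, and printed learning rates are illustrative assumptions, not part of the PR.

```python
# Minimal runnable sketch of chaining two LR schedulers (assumptions: toy model and data).
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, MultiStepLR

model = nn.Linear(2, 2)          # stand-in for the docs' parameter list
loss_fn = nn.MSELoss()           # stand-in for the docs' loss_fn
optimizer = SGD(model.parameters(), lr=0.1)
scheduler1 = ExponentialLR(optimizer, gamma=0.9)
scheduler2 = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

# Synthetic dataset: a few (input, target) batches.
dataset = [(torch.randn(4, 2), torch.randn(4, 2)) for _ in range(5)]

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    # Schedulers step once per epoch, after the optimizer's updates;
    # each one adjusts the learning rate left by the previous one.
    scheduler1.step()
    scheduler2.step()
    print(epoch, scheduler2.get_last_lr())
```

With `milestones=[30, 80]` and only 20 epochs, `MultiStepLR` never fires here, so the printed learning rate simply decays by a factor of 0.9 per epoch; extending the loop past epoch 30 would show both schedulers compounding on the same optimizer.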