stable-baselines3/docs/modules/ddpg.rst
Antonin RAFFIN 897e98c4e2
Update documentation (#199)
* Update doc and add new example

* Add save/load replay buffer example

* Add save format + export doc

* Add example for get/set parameters

* Typos and minor edits

* Add results sections

* Add note about performance

* Add DDPG results

* Address comments

* Fix grammar/wording

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-28 09:55:16 +01:00

169 lines
4.1 KiB
ReStructuredText

.. _ddpg:
.. automodule:: stable_baselines3.ddpg
DDPG
====
`Deep Deterministic Policy Gradient (DDPG) <https://spinningup.openai.com/en/latest/algorithms/ddpg.html>`_ combines the
trick for DQN with the deterministic policy gradient, to obtain an algorithm for continuous actions.
.. note::
As ``DDPG`` can be seen as a special case of its successor :ref:`TD3 <td3>`,
they share the same policies and same implementation.
.. rubric:: Available Policies
.. autosummary::
:nosignatures:
MlpPolicy
CnnPolicy
Notes
-----
- Deterministic Policy Gradient: http://proceedings.mlr.press/v32/silver14.pdf
- DDPG Paper: https://arxiv.org/abs/1509.02971
- OpenAI Spinning Guide for DDPG: https://spinningup.openai.com/en/latest/algorithms/ddpg.html
Can I use?
----------
- Recurrent policies: ❌
- Multi processing: ❌
- Gym spaces:
============= ====== ===========
Space Action Observation
============= ====== ===========
Discrete ❌ ✔️
Box ✔️ ✔️
MultiDiscrete ❌ ✔️
MultiBinary ❌ ✔️
============= ====== ===========
Example
-------
.. code-block:: python
import gym
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
env = gym.make('Pendulum-v0')
# The noise objects for DDPG
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=10000, log_interval=10)
model.save("ddpg_pendulum")
env = model.get_env()
del model # remove to demonstrate saving and loading
model = DDPG.load("ddpg_pendulum")
obs = env.reset()
while True:
action, _states = model.predict(obs)
obs, rewards, dones, info = env.step(action)
env.render()
Results
-------
PyBullet Environments
^^^^^^^^^^^^^^^^^^^^^
Results on the PyBullet benchmark (1M steps) using 6 seeds.
The complete learning curves are available in the `associated issue #48 <https://github.com/DLR-RM/stable-baselines3/issues/48>`_.
.. note::
Hyperparameters of :ref:`TD3 <td3>` from the `gSDE paper <https://arxiv.org/abs/2005.05719>`_ were used for ``DDPG``.
*Gaussian* means that the unstructured Gaussian noise is used for exploration,
*gSDE* (generalized State-Dependent Exploration) is used otherwise.
+--------------+--------------+--------------+--------------+
| Environments | DDPG | TD3 | SAC |
+==============+==============+==============+==============+
| | Gaussian | Gaussian | gSDE |
+--------------+--------------+--------------+--------------+
| HalfCheetah | 2272 +/- 69 | 2774 +/- 35 | 2984 +/- 202 |
+--------------+--------------+--------------+--------------+
| Ant | 1651 +/- 407 | 3305 +/- 43 | 3102 +/- 37 |
+--------------+--------------+--------------+--------------+
| Hopper | 1201 +/- 211 | 2429 +/- 126 | 2262 +/- 1 |
+--------------+--------------+--------------+--------------+
| Walker2D | 882 +/- 186 | 2063 +/- 185 | 2136 +/- 67 |
+--------------+--------------+--------------+--------------+
How to replicate the results?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Clone the `rl-zoo repo <https://github.com/DLR-RM/rl-baselines3-zoo>`_:
.. code-block:: bash
git clone https://github.com/DLR-RM/rl-baselines3-zoo
cd rl-baselines3-zoo/
Run the benchmark (replace ``$ENV_ID`` by the envs mentioned above):
.. code-block:: bash
python train.py --algo ddpg --env $ENV_ID --eval-episodes 10 --eval-freq 10000
Plot the results:
.. code-block:: bash
python scripts/all_plots.py -a ddpg -e HalfCheetah Ant Hopper Walker2D -f logs/ -o logs/ddpg_results
python scripts/plot_from_file.py -i logs/ddpg_results.pkl -latex -l DDPG
Parameters
----------
.. autoclass:: DDPG
:members:
:inherited-members:
.. _ddpg_policies:
DDPG Policies
-------------
.. autoclass:: MlpPolicy
:members:
:inherited-members:
.. autoclass:: stable_baselines3.td3.policies.TD3Policy
:members:
:noindex:
.. autoclass:: CnnPolicy
:members: