.. _td3:

.. automodule:: stable_baselines3.td3


TD3
===

`Twin Delayed DDPG (TD3) <https://spinningup.openai.com/en/latest/algorithms/td3.html>`_, introduced in the paper *Addressing Function Approximation Error in Actor-Critic Methods*.

TD3 is a direct successor of DDPG and improves it using three major tricks: clipped double Q-learning, delayed policy updates and target policy smoothing.
We recommend reading the `OpenAI Spinning Up guide on TD3 <https://spinningup.openai.com/en/latest/algorithms/td3.html>`_ to learn more about them.

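The three tricks can be illustrated with a minimal, dependency-free sketch. This is not the library implementation: the function names below are hypothetical, the values are scalars rather than batched tensors, and the default values (``noise_std=0.2``, ``noise_clip=0.5``, ``policy_delay=2``, ``gamma=0.99``) are assumptions chosen for illustration.

.. code-block:: python

  import random


  def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                             low=-1.0, high=1.0, rng=random):
      """Target policy smoothing: perturb the target action with clipped Gaussian noise."""
      noise = rng.gauss(0.0, noise_std)
      noise = max(-noise_clip, min(noise_clip, noise))
      # Keep the perturbed action inside the (assumed) action bounds
      return max(low, min(high, target_action + noise))


  def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
      """Clipped double Q-learning: bootstrap from the smaller of the two target critics."""
      return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


  def should_update_actor(gradient_step, policy_delay=2):
      """Delayed policy updates: update the actor once every ``policy_delay`` critic updates."""
      return gradient_step % policy_delay == 0


  # The critics are evaluated at the smoothed target action (values here are made up)
  next_action = smoothed_target_action(0.9, rng=random.Random(0))
  target = td3_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0, gamma=0.5)

Taking the minimum of the two critic estimates counteracts overestimation of the Q-value, while the noise on the target action makes it harder for the policy to exploit sharp peaks in the critic.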

.. rubric:: Available Policies

.. autosummary::
    :nosignatures:

    MlpPolicy


Notes
-----

- Original paper: https://arxiv.org/pdf/1802.09477.pdf
- OpenAI Spinning Up guide for TD3: https://spinningup.openai.com/en/latest/algorithms/td3.html
- Original Implementation: https://github.com/sfujim/TD3

.. note::

    The default policy for TD3 differs slightly from the ``MlpPolicy`` of other algorithms: it uses ReLU instead of tanh activation,
    to match the original paper.


Can I use?
----------

-  Recurrent policies: ❌
-  Multi processing: ❌
-  Gym spaces:


============= ====== ===========
Space         Action Observation
============= ====== ===========
Discrete      ❌      ✔️
Box           ✔️       ✔️
MultiDiscrete ❌      ✔️
MultiBinary   ❌      ✔️
============= ====== ===========


Example
-------

.. code-block:: python

  import gym
  import numpy as np

  from stable_baselines3 import TD3
  from stable_baselines3.td3.policies import MlpPolicy
  from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise

  env = gym.make('Pendulum-v0')

  # Gaussian exploration noise added to the actions during training
  n_actions = env.action_space.shape[-1]
  action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

  model = TD3(MlpPolicy, env, action_noise=action_noise, verbose=1)
  model.learn(total_timesteps=10000, log_interval=10)
  model.save("td3_pendulum")
  # Retrieve the (vectorized) environment wrapped by the model
  env = model.get_env()

  del model  # remove to demonstrate saving and loading

  model = TD3.load("td3_pendulum")

  # Run the trained agent; the vectorized env resets itself at the end of an episode
  obs = env.reset()
  while True:
      action, _states = model.predict(obs)
      obs, rewards, dones, info = env.step(action)
      env.render()

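The example imports both ``NormalActionNoise`` and ``OrnsteinUhlenbeckActionNoise`` but only uses the former. To give an intuition for the difference, here is a rough standard-library sketch of the two noise processes. It is not the code from ``stable_baselines3.common.noise``: the function names are hypothetical and the parameters ``theta`` and ``dt`` (with their default values) are assumptions made for this illustration.

.. code-block:: python

  import math
  import random


  def gaussian_noise(n_steps, mean=0.0, sigma=0.1, rng=random):
      """Independent Gaussian samples: each step ignores the previous one."""
      return [rng.gauss(mean, sigma) for _ in range(n_steps)]


  def ornstein_uhlenbeck_noise(n_steps, mean=0.0, sigma=0.1, theta=0.15,
                               dt=1e-2, rng=random):
      """Discretized Ornstein-Uhlenbeck process: temporally correlated noise."""
      samples = []
      x = 0.0  # assumed initial noise value
      for _ in range(n_steps):
          # Drift back towards the mean, plus a random kick scaled by sqrt(dt)
          x = x + theta * (mean - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
          samples.append(x)
      return samples


  rng = random.Random(0)
  independent = gaussian_noise(5, rng=rng)
  correlated = ornstein_uhlenbeck_noise(5, rng=rng)

Because each Ornstein-Uhlenbeck sample starts from the previous one, consecutive perturbations tend to point in the same direction, whereas Gaussian samples are drawn independently at every step.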

Parameters
----------

.. autoclass:: TD3
  :members:
  :inherited-members:

.. _td3_policies:

TD3 Policies
-------------

.. autoclass:: MlpPolicy
  :members:
  :inherited-members:


.. .. autoclass:: CnnPolicy
..   :members:
..   :inherited-members: