stable-baselines3/docs/modules/cem_rl.rst
Antonin Raffin 9e250b6818 Build doc
2020-01-20 16:19:35 +01:00


.. _cem_rl:

.. automodule:: torchy_baselines.cem_rl

CEM RL
======

CEM RL combines the cross-entropy method (CEM) with Twin Delayed Deep Deterministic policy gradient (TD3).

.. rubric:: Available Policies

.. autosummary::
    :nosignatures:

    MlpPolicy

Notes
-----

- Original paper: https://arxiv.org/abs/1810.01222 and https://openreview.net/forum?id=BkeU5j0ctQ
- Original Implementation: https://github.com/apourchot/CEM-RL

.. note::

    CEM RL is currently implemented for TD3 only.

.. note::

    The default policy for CEM RL differs slightly from the other ``MlpPolicy`` classes:
    it uses ReLU instead of tanh activations, to match the original paper.

Can I use?
----------

- Recurrent policies: ❌
- Multi processing: ❌
- Gym spaces:


============= ====== ===========
Space         Action Observation
============= ====== ===========
Discrete      ❌      ❌
Box           ✔️      ✔️
MultiDiscrete ❌      ❌
MultiBinary   ❌      ❌
============= ====== ===========

Example
-------

.. code-block:: python

    import numpy as np

    from torchy_baselines import CEMRL
    from torchy_baselines.td3.policies import MlpPolicy

    # n_grad = 0 corresponds to CEM (in fact CMA-ES without history)
    model = CEMRL(MlpPolicy, 'Pendulum-v0', pop_size=10, n_grad=5, verbose=1)
    model.learn(total_timesteps=50000, log_interval=10)
    model.save("td3_pendulum")
    env = model.get_env()

    del model # remove to demonstrate saving and loading

    model = CEMRL.load("td3_pendulum")

    obs = env.reset()
    while True:
        action, _states = model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        env.render()

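
The comment in the example notes that ``n_grad = 0`` reduces CEM RL to a plain cross-entropy method.
As a rough illustration of that evolutionary part, the following is a minimal NumPy sketch of CEM with a
diagonal Gaussian. It is not the library's implementation and leaves out any refinements used in the original
paper: the function ``cem_minimize``, its arguments (``objective``, ``sigma``, ``n_elite``, ``n_iterations``)
and the toy quadratic cost are all hypothetical names chosen for this sketch. In CEM RL the sampled vectors
would be policy parameters scored by episode return rather than a toy cost.

.. code-block:: python

    import numpy as np


    def cem_minimize(objective, mean, sigma=2.0, pop_size=50, n_elite=10,
                     n_iterations=50, seed=0):
        """Plain CEM with a diagonal Gaussian (illustrative sketch only)."""
        rng = np.random.default_rng(seed)
        mean = np.asarray(mean, dtype=float)
        std = np.full_like(mean, sigma)
        for _ in range(n_iterations):
            # Sample a population of candidate parameter vectors
            population = mean + std * rng.standard_normal((pop_size, mean.size))
            scores = np.array([objective(x) for x in population])
            # Keep the candidates with the lowest cost (the "elite")
            elite = population[np.argsort(scores)[:n_elite]]
            # Refit the sampling distribution to the elite
            mean = elite.mean(axis=0)
            # Small floor (an assumption of this sketch) so the spread never hits zero
            std = elite.std(axis=0) + 1e-3
        return mean


    # Toy problem (hypothetical): recover a known target vector
    target = np.array([1.0, -2.0, 0.5])
    solution = cem_minimize(lambda x: np.sum((x - target) ** 2), mean=np.zeros(3))

Here the sampling distribution is refit to the best ``n_elite`` candidates at every iteration, so the mean
drifts toward low-cost regions while the spread shrinks.
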
Parameters
----------

.. autoclass:: CEMRL
  :members:
  :inherited-members:

.. _cemrl_policies:

CEM RL Policies
---------------

.. autoclass:: MlpPolicy
  :members:
  :inherited-members: