.. _cem_rl:

.. automodule:: torchy_baselines.cem_rl


CEM RL
======

Combines the cross-entropy method (CEM) with Twin Delayed Deep Deterministic policy gradient (TD3).


.. rubric:: Available Policies

.. autosummary::
    :nosignatures:

    MlpPolicy


Notes
-----

- Original paper: https://arxiv.org/abs/1810.01222 and https://openreview.net/forum?id=BkeU5j0ctQ
- Original implementation: https://github.com/apourchot/CEM-RL

.. note::

    CEM RL is currently only implemented with TD3.

.. note::

    The default policies for CEM RL differ slightly from the other ``MlpPolicy`` implementations:
    they use ReLU instead of tanh activations, to match the original paper.


Can I use?
----------

- Recurrent policies: ❌
- Multi processing: ❌
- Gym spaces:


============= ====== ===========
Space         Action Observation
============= ====== ===========
Discrete      ❌      ❌
Box           ✔️      ✔️
MultiDiscrete ❌      ❌
MultiBinary   ❌      ❌
============= ====== ===========


Example
-------

.. code-block:: python

  from torchy_baselines import CEMRL
  from torchy_baselines.td3.policies import MlpPolicy

  # n_grad = 0 corresponds to pure CEM (in fact CMA-ES without history)
  model = CEMRL(MlpPolicy, 'Pendulum-v0', pop_size=10, n_grad=5, verbose=1)
  model.learn(total_timesteps=50000, log_interval=10)
  model.save("cemrl_pendulum")
  env = model.get_env()

  del model  # remove to demonstrate saving and loading

  model = CEMRL.load("cemrl_pendulum")

  obs = env.reset()
  while True:
      action, _states = model.predict(obs)
      obs, rewards, dones, info = env.step(action)
      env.render()


Parameters
----------

.. autoclass:: CEMRL
  :members:
  :inherited-members:

.. _cemrl_policies:

CEM RL Policies
---------------

.. autoclass:: MlpPolicy
  :members:
  :inherited-members:
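
Each member of the CEM RL population is an actor whose flattened policy parameters are drawn from a diagonal Gaussian. ``n_grad`` of those actors are refined with TD3 gradient steps, then the whole population is ranked to update the Gaussian. The snippet below is a minimal NumPy sketch of one such iteration under stated assumptions, not the torchy_baselines implementation. ``cem_step``, ``improve``, ``extra_noise`` and the toy quadratic fitness are hypothetical names. The log-rank elite weights follow CMA-ES conventions and are an assumption, not something this page confirms.

.. code-block:: python

  import numpy as np


  def cem_step(mean, var, fitness, pop_size=10, n_grad=5, improve=None,
               extra_noise=1e-3, rng=None):
      """One CEM iteration over flattened actor parameters (illustrative sketch).

      ``improve`` is a hypothetical stand-in for the TD3 gradient steps that
      CEM RL applies to ``n_grad`` members; leaving it as None gives pure CEM.
      """
      rng = np.random.default_rng() if rng is None else rng
      # Sample pop_size parameter vectors from a diagonal Gaussian
      population = mean + np.sqrt(var) * rng.standard_normal((pop_size, mean.size))
      # Assumed hook: refine the first n_grad members with gradient steps
      if improve is not None:
          for i in range(n_grad):
              population[i] = improve(population[i])
      # Evaluate every member and keep the best half as elites
      scores = np.array([fitness(p) for p in population])
      n_elites = pop_size // 2
      elites = population[np.argsort(scores)[::-1][:n_elites]]
      # Assumed rank-based log weights (CMA-ES style): the best elite weighs most
      weights = np.log(1 + n_elites) - np.log(np.arange(1, n_elites + 1))
      weights /= weights.sum()
      new_mean = weights @ elites
      # Variance measured around the *old* mean ("CMA-ES without history"),
      # plus a small constant (assumption) to avoid premature collapse
      new_var = weights @ (elites - mean) ** 2 + extra_noise
      return new_mean, new_var


  # Toy usage: maximise a quadratic whose optimum is `target`
  rng = np.random.default_rng(0)
  target = np.array([1.0, -2.0, 0.5])
  mean, var = np.zeros(3), np.ones(3)
  for _ in range(100):
      mean, var = cem_step(mean, var, lambda p: -np.sum((p - target) ** 2), rng=rng)
  print(np.round(mean, 2))  # should end close to `target`

Measuring the variance around the previous mean keeps the search wide while the mean is still moving. It then lets the variance shrink once successive means stop changing.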