RL Algorithms
=============

This table displays the RL algorithms that are implemented in the Stable Baselines3 project,
along with some useful characteristics: support for discrete/continuous actions, multiprocessing.

=================== =========== ============ ================= =============== ================
Name                ``Box``     ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
=================== =========== ============ ================= =============== ================
ARS [#f1]_          ✔️          ✔️           ❌                 ❌               ✔️
A2C                 ✔️          ✔️           ✔️                ✔️              ✔️
DDPG                ✔️          ❌            ❌                 ❌               ✔️
DQN                 ❌           ✔️           ❌                 ❌               ✔️
HER                 ✔️          ✔️           ❌                 ❌               ❌
PPO                 ✔️          ✔️           ✔️                ✔️              ✔️
QR-DQN [#f1]_       ❌           ✔️           ❌                 ❌               ✔️
RecurrentPPO [#f1]_ ✔️          ✔️           ✔️                ✔️              ✔️
SAC                 ✔️          ❌            ❌                 ❌               ✔️
TD3                 ✔️          ❌            ❌                 ❌               ✔️
TQC [#f1]_          ✔️          ❌            ❌                 ❌               ✔️
TRPO [#f1]_         ✔️          ✔️           ✔️                ✔️              ✔️
Maskable PPO [#f1]_ ❌           ✔️           ✔️                ✔️              ✔️
=================== =========== ============ ================= =============== ================

.. [#f1] Implemented in `SB3 Contrib <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib>`_

.. note::

  ``Tuple`` observation spaces are not supported by any environment;
  however, single-level ``Dict`` spaces are (cf. :ref:`Examples <examples>`).

Actions ``gym.spaces``:

- ``Box``: An N-dimensional box that contains every point in the action
  space.
- ``Discrete``: A list of possible actions, where at each timestep only
  one of the actions can be used.
- ``MultiDiscrete``: A list of possible actions, where at each timestep only one action of each discrete set can be used.
- ``MultiBinary``: A list of possible actions, where at each timestep any of the actions can be used in any combination.
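
As a rough illustration of the four action spaces listed above, the sketch below builds one of each and draws a random action. It assumes the ``gymnasium`` package imported as ``gym`` (older SB3 releases use the ``gym`` package, which exposes the same ``spaces`` module). The shapes and bounds are arbitrary values chosen for the example.

.. code-block:: python

  import gymnasium as gym  # assumption: older setups would use `import gym`
  import numpy as np

  # Continuous actions: a 3-dimensional vector with each entry in [-1, 1].
  box = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
  # One action out of 4 per timestep.
  discrete = gym.spaces.Discrete(4)
  # Two discrete sets (sizes 3 and 2): one choice from each per timestep.
  multi_discrete = gym.spaces.MultiDiscrete([3, 2])
  # Four on/off switches that can be combined freely.
  multi_binary = gym.spaces.MultiBinary(4)

  for space in (box, discrete, multi_discrete, multi_binary):
      print(type(space).__name__, space.sample())
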

.. note::

  More algorithms (like QR-DQN or TQC) are implemented in our :ref:`contrib repo <sb3_contrib>`.

.. note::

  Some logging values (like ``ep_rew_mean``, ``ep_len_mean``) are only available when using a ``Monitor`` wrapper.
  See `Issue #339 <https://github.com/hill-a/stable-baselines/issues/339>`_ for more info.
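
A minimal sketch of wrapping an environment in ``Monitor`` before training is given below. The environment id, the algorithm, and the number of timesteps are arbitrary choices for illustration, and the ``gymnasium`` import assumes a recent SB3 release.

.. code-block:: python

  import gymnasium as gym  # assumption: older SB3 releases use `import gym`
  from stable_baselines3 import A2C
  from stable_baselines3.common.monitor import Monitor

  # Monitor records the reward and length of each finished episode.
  env = Monitor(gym.make("CartPole-v1"))

  model = A2C("MlpPolicy", env, device="cpu")
  model.learn(total_timesteps=1_000)

  # Finished episodes are stored in the model's episode info buffer as dicts
  # with "r" (episode reward) and "l" (episode length); the logged means are
  # computed from these entries.
  print(len(model.ep_info_buffer), "episodes recorded")
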

.. note::

  When using off-policy algorithms, `Time Limits <https://arxiv.org/abs/1712.00378>`_ (aka timeouts) are handled
  properly (cf. `issue #284 <https://github.com/DLR-RM/stable-baselines3/issues/284>`_).
  You can revert to SB3 < 1.1.0 behavior by passing ``handle_timeout_termination=False``
  via the ``replay_buffer_kwargs`` argument.
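
The option mentioned in the note above can be passed as in the following sketch. ``SAC`` and ``Pendulum-v1`` are arbitrary choices for illustration; other off-policy algorithms that accept ``replay_buffer_kwargs`` are expected to behave the same way.

.. code-block:: python

  from stable_baselines3 import SAC

  # Disable the timeout handling in the replay buffer, so that episodes cut
  # off by a time limit are stored like any other finished episode.
  model = SAC(
      "MlpPolicy",
      "Pendulum-v1",
      replay_buffer_kwargs=dict(handle_timeout_termination=False),
      device="cpu",
  )
  print(model.replay_buffer.handle_timeout_termination)
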


Reproducibility
---------------

Completely reproducible results are not guaranteed across PyTorch releases or different platforms.
Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.

In order to make computations deterministic, on your specific problem on one specific platform,
you need to pass a ``seed`` argument at the creation of a model.
If you pass an environment to the model using ``set_env()``, then you also need to seed the environment first.
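
The two steps described above can be sketched as follows. ``PPO``, ``CartPole-v1``, the small ``n_steps`` value, and the helper ``train_and_sample`` are assumptions made for this example only. On a single machine running on CPU, the two runs are expected to yield the same actions.

.. code-block:: python

  import numpy as np
  from stable_baselines3 import PPO
  from stable_baselines3.common.env_util import make_vec_env


  def train_and_sample(seed: int) -> np.ndarray:
      """Train briefly with a fixed seed and return a few sampled actions."""
      # Passing `seed` at creation seeds the model's random number generators.
      model = PPO("MlpPolicy", "CartPole-v1", seed=seed, n_steps=64, device="cpu")

      # An environment attached later with set_env() is seeded explicitly first.
      vec_env = make_vec_env("CartPole-v1", n_envs=1)
      vec_env.seed(seed)
      model.set_env(vec_env)

      model.learn(total_timesteps=128)

      obs = vec_env.reset()
      actions = []
      for _ in range(20):
          action, _ = model.predict(obs, deterministic=False)
          obs, _, _, _ = vec_env.step(action)
          actions.append(action)
      return np.concatenate(actions)


  first = train_and_sample(seed=0)
  second = train_and_sample(seed=0)
  print(np.array_equal(first, second))
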

Credit: part of the *Reproducibility* section comes from `PyTorch Documentation <https://pytorch.org/docs/stable/notes/randomness.html>`_