Update SB3 contrib algorithms (#604)

Antonin RAFFIN 2021-10-10 15:41:39 +02:00 committed by GitHub
parent 1881d904a0
commit 75aa31dcfb
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 23 additions and 12 deletions


@@ -72,7 +72,7 @@ Documentation: https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.h
We implement experimental features in a separate contrib repository: [SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)
This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Truncated Quantile Critics (TQC), Quantile Regression DQN (QR-DQN) or PPO with invalid action masking (Maskable PPO).
Documentation is available online: [https://sb3-contrib.readthedocs.io/](https://sb3-contrib.readthedocs.io/)
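
Invalid action masking, the idea behind Maskable PPO, is only named here. The snippet below is a rough sketch of the general technique, not SB3-Contrib's implementation. Before the softmax, it replaces the logits of invalid actions with negative infinity, so the policy gives those actions zero probability. The helper `masked_softmax` and its arguments are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], action_mask: Sequence[bool]) -> List[float]:
    """Turn policy logits into action probabilities, giving invalid actions zero mass.

    Hypothetical helper for illustration only; it is not part of the SB3 or
    SB3-Contrib API.
    """
    if len(logits) != len(action_mask):
        raise ValueError("logits and action_mask must have the same length")
    if not any(action_mask):
        raise ValueError("at least one action must be valid")
    # Replace the logits of invalid actions with -inf so exp() maps them to 0.
    masked = [x if valid else -math.inf for x, valid in zip(logits, action_mask)]
    # Subtract the largest (valid, hence finite) logit for numerical stability.
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Three discrete actions; the second one is invalid in the current state.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
print([round(p, 3) for p in probs])
```

The probability mass is shared only among the valid actions. The SB3-Contrib documentation linked above describes how Maskable PPO actually receives masks from an environment.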
@@ -167,7 +167,11 @@ All the following examples can be executed online using Google colab notebooks:
| PPO | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| SAC | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| TD3 | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| QR-DQN<sup>[1](#f1)</sup> | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| TQC<sup>[1](#f1)</sup> | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| Maskable PPO<sup>[1](#f1)</sup> | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

<b id="f1">1</b>: Implemented in the [SB3 Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib) GitHub repository.

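
Read as data, the support matrix maps each algorithm to the action spaces it accepts. The sketch below takes the action-space columns (`Box`, `Discrete`, `MultiDiscrete`, `MultiBinary`) from the algorithm tables in this commit, stores them in a Python dictionary, and filters that dictionary by space. `ACTION_SPACE_SUPPORT` and `algorithms_for` are hypothetical names used only for illustration and are not part of SB3.

```python
from typing import Dict, FrozenSet, List

# Action-space support copied from the algorithm tables in this commit
# (Box, Discrete, MultiDiscrete, MultiBinary columns only).
# Hypothetical data structure for illustration; not part of SB3.
ACTION_SPACE_SUPPORT: Dict[str, FrozenSet[str]] = {
    "A2C": frozenset({"Box", "Discrete", "MultiDiscrete", "MultiBinary"}),
    "DDPG": frozenset({"Box"}),
    "DQN": frozenset({"Discrete"}),
    "HER": frozenset({"Box", "Discrete"}),
    "PPO": frozenset({"Box", "Discrete", "MultiDiscrete", "MultiBinary"}),
    "SAC": frozenset({"Box"}),
    "TD3": frozenset({"Box"}),
    "QR-DQN": frozenset({"Discrete"}),
    "TQC": frozenset({"Box"}),
    "Maskable PPO": frozenset({"Discrete", "MultiDiscrete", "MultiBinary"}),
}


def algorithms_for(space: str) -> List[str]:
    """Return, sorted by name, the algorithms whose row marks `space` as supported."""
    return sorted(name for name, spaces in ACTION_SPACE_SUPPORT.items() if space in spaces)


print(algorithms_for("MultiDiscrete"))
```

For example, filtering on `MultiDiscrete` leaves only A2C, Maskable PPO and PPO, which matches the check marks in that column.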
Actions `gym.spaces`:
* `Box`: An N-dimensional box that contains every point in the action space.
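
Here is the `Box` definition in concrete terms. A box is a product of per-dimension intervals, so checking whether an action lies inside it is a per-coordinate bounds check. To keep the sketch runnable without Gym installed, it uses a hypothetical `SimpleBox` class instead of `gym.spaces.Box`. The bounds below are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SimpleBox:
    """Hypothetical stand-in for a Box space: a product of closed intervals [low_i, high_i]."""

    low: Tuple[float, ...]
    high: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.low) != len(self.high):
            raise ValueError("low and high must have the same number of dimensions")
        if any(lo > hi for lo, hi in zip(self.low, self.high)):
            raise ValueError("each lower bound must not exceed its upper bound")

    def contains(self, point: Sequence[float]) -> bool:
        # A point is inside the box when it has the right dimension and every
        # coordinate lies within that dimension's bounds.
        return len(point) == len(self.low) and all(
            lo <= x <= hi for x, lo, hi in zip(point, self.low, self.high)
        )


# A 2-dimensional action space, e.g. (steering, throttle) in [-1, 1] x [0, 1].
space = SimpleBox(low=(-1.0, 0.0), high=(1.0, 1.0))
print(space.contains([0.5, 0.2]), space.contains([0.5, 1.5]))
```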


@@ -5,19 +5,24 @@ This table displays the rl algorithms that are implemented in the Stable Baselin
along with some useful characteristics: support for discrete/continuous actions, multiprocessing.

=================== =========== ============ ================= =============== ================
Name                ``Box``     ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
=================== =========== ============ ================= =============== ================
A2C                 ✔️          ✔️           ✔️                ✔️              ✔️
DDPG                ✔️          ❌           ❌                ❌              ❌
DQN                 ❌          ✔️           ❌                ❌              ❌
HER                 ✔️          ✔️           ❌                ❌              ❌
PPO                 ✔️          ✔️           ✔️                ✔️              ✔️
SAC                 ✔️          ❌           ❌                ❌              ❌
TD3                 ✔️          ❌           ❌                ❌              ❌
QR-DQN [#f1]_       ❌          ✔️           ❌                ❌              ❌
TQC [#f1]_          ✔️          ❌           ❌                ❌              ❌
Maskable PPO [#f1]_ ❌          ✔️           ✔️                ✔️              ✔️
=================== =========== ============ ================= =============== ================

.. [#f1] Implemented in `SB3 Contrib <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib>`_

.. note::

  ``Tuple`` observation spaces are not supported by any of the algorithms above;
  however, single-level ``Dict`` spaces are (cf. :ref:`Examples <examples>`).


@@ -38,6 +38,7 @@ See documentation for the full list of included features.
- `Truncated Quantile Critics (TQC)`_
- `Quantile Regression DQN (QR-DQN)`_
- `PPO with invalid action masking (Maskable PPO) <https://arxiv.org/abs/2006.14171>`_

**Gym Wrappers**:


@@ -44,6 +44,7 @@ Documentation:
- Update Read the Docs environment (fixed ``docutils`` issue)
- Fix PPO environment name (@IljaAvadiev)
- Fix custom env doc and add env registration example
- Update algorithms from SB3 Contrib
Release 1.2.0 (2021-09-03)