mirror of
https://github.com/saymrwulf/stable-baselines3.git
synced 2026-05-28 22:56:53 +00:00
Update SB3 contrib algorithms (#604)
parent 1881d904a0
commit 75aa31dcfb
4 changed files with 23 additions and 12 deletions

@@ -72,7 +72,7 @@ Documentation: https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.h

 We implement experimental features in a separate contrib repository: [SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

-This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Truncated Quantile Critics (TQC) or Quantile Regression DQN (QR-DQN).
+This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Truncated Quantile Critics (TQC), Quantile Regression DQN (QR-DQN) or PPO with invalid action masking (Maskable PPO).

 Documentation is available online: [https://sb3-contrib.readthedocs.io/](https://sb3-contrib.readthedocs.io/)

@@ -167,7 +167,11 @@ All the following examples can be executed online using Google colab notebooks:
 | PPO | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
 | SAC | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
 | TD3 | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
+| QR-DQN<sup>[1](#f1)</sup> | :x: | :x: | :heavy_check_mark: | :x: | :x: | :x: |
+| TQC<sup>[1](#f1)</sup> | :x: | :heavy_check_mark: | :x: | :x: | :x: | :x: |
+| Maskable PPO<sup>[1](#f1)</sup> | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

+<b id="f1">1</b>: Implemented in [SB3 Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib) GitHub repository.

 Actions `gym.spaces`:
 * `Box`: A N-dimensional box that containes every point in the action space.

@@ -5,19 +5,24 @@ This table displays the rl algorithms that are implemented in the Stable Baselin
 along with some useful characteristics: support for discrete/continuous actions, multiprocessing.


-============ =========== ============ ================= =============== ================
-Name         ``Box``     ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
-============ =========== ============ ================= =============== ================
-A2C          ✔️          ✔️           ✔️                ✔️              ✔️
-DDPG         ✔️          ❌            ❌                 ❌               ❌
-DQN          ❌           ✔️           ❌                 ❌               ❌
-HER          ✔️          ✔️           ❌                 ❌               ❌
-PPO          ✔️          ✔️           ✔️                ✔️              ✔️
-SAC          ✔️          ❌            ❌                 ❌               ❌
-TD3          ✔️          ❌            ❌                 ❌               ❌
-============ =========== ============ ================= =============== ================
+=================== =========== ============ ================= =============== ================
+Name                ``Box``     ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
+=================== =========== ============ ================= =============== ================
+A2C                 ✔️          ✔️           ✔️                ✔️              ✔️
+DDPG                ✔️          ❌            ❌                 ❌               ❌
+DQN                 ❌           ✔️           ❌                 ❌               ❌
+HER                 ✔️          ✔️           ❌                 ❌               ❌
+PPO                 ✔️          ✔️           ✔️                ✔️              ✔️
+SAC                 ✔️          ❌            ❌                 ❌               ❌
+TD3                 ✔️          ❌            ❌                 ❌               ❌
+QR-DQN [#f1]_       ❌           ✔️           ❌                 ❌               ❌
+TQC [#f1]_          ✔️          ❌            ❌                 ❌               ❌
+Maskable PPO [#f1]_ ❌           ✔️           ✔️                ✔️              ✔️
+=================== =========== ============ ================= =============== ================


+.. [#f1] Implemented in `SB3 Contrib <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib>`_
+
 .. note::
     ``Tuple`` observation spaces are not supported by any environment
     however single-level ``Dict`` spaces are (cf. :ref:`Examples <examples>`).
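
The updated support table in this hunk can also be read as a lookup from algorithm to supported action spaces. The following is a minimal sketch of that reading, not part of the commit or of the SB3 API: the names `SUPPORTED_ACTION_SPACES` and `algorithms_for` are hypothetical, and the data is transcribed by hand from the new table (the "Multi Processing" column is left out).

```python
from typing import Dict, FrozenSet, List

# Hypothetical lookup table, transcribed from the updated docs table above.
# Keys are the algorithm names as written in the table; values are the
# gym action spaces marked with a check mark in that row.
SUPPORTED_ACTION_SPACES: Dict[str, FrozenSet[str]] = {
    "A2C": frozenset({"Box", "Discrete", "MultiDiscrete", "MultiBinary"}),
    "DDPG": frozenset({"Box"}),
    "DQN": frozenset({"Discrete"}),
    "HER": frozenset({"Box", "Discrete"}),
    "PPO": frozenset({"Box", "Discrete", "MultiDiscrete", "MultiBinary"}),
    "SAC": frozenset({"Box"}),
    "TD3": frozenset({"Box"}),
    # The three rows below are the SB3 Contrib algorithms added by this commit.
    "QR-DQN": frozenset({"Discrete"}),
    "TQC": frozenset({"Box"}),
    "Maskable PPO": frozenset({"Discrete", "MultiDiscrete", "MultiBinary"}),
}


def algorithms_for(space_name: str) -> List[str]:
    """Return, in sorted order, the algorithms whose row supports ``space_name``."""
    return sorted(
        name
        for name, spaces in SUPPORTED_ACTION_SPACES.items()
        if space_name in spaces
    )


if __name__ == "__main__":
    print(algorithms_for("MultiBinary"))  # ['A2C', 'Maskable PPO', 'PPO']
```

A lookup like this makes the effect of the commit easy to see: `MultiBinary` and `MultiDiscrete` gain one more option (Maskable PPO), while `Box` gains TQC and `Discrete` gains QR-DQN and Maskable PPO.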

@@ -38,6 +38,7 @@ See documentation for the full list of included features.

 - `Truncated Quantile Critics (TQC)`_
 - `Quantile Regression DQN (QR-DQN)`_
+- `PPO with invalid action masking (Maskable PPO) <https://arxiv.org/abs/2006.14171>`_

 **Gym Wrappers**:

@@ -44,6 +44,7 @@ Documentation:
 - Update read the doc env (fixed ``docutils`` issue)
 - Fix PPO environment name (@IljaAvadiev)
 - Fix custom env doc and add env registration example
+- Update algorithms from SB3 Contrib


 Release 1.2.0 (2021-09-03)