Update SB3 Contrib doc (ARS) and W&B integration (#726)

* Add ARS to SB3 contrib

* Add integration page
Antonin RAFFIN 2022-01-18 15:10:25 +01:00 committed by GitHub
parent e9a8979022
commit cd6e04705b
6 changed files with 60 additions and 1 deletions

@@ -160,6 +160,7 @@ All the following examples can be executed online using Google colab notebooks:
| **Name** | **Recurrent** | `Box` | `Discrete` | `MultiDiscrete` | `MultiBinary` | **Multi Processing** |
| ------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------ | --------------------------------- |
| ARS<sup>[1](#f1)</sup> | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
| A2C | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| DDPG | :x: | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| DQN | :x: | :x: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |

@@ -8,6 +8,7 @@ along with some useful characteristics: support for discrete/continuous actions,
=================== =========== ============ ================= =============== ================
Name                ``Box``     ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
=================== =========== ============ ================= =============== ================
ARS [#f1]_          ✔️          ✔️           ❌                ❌              ✔️
A2C                 ✔️          ✔️           ✔️                ✔️              ✔️
DDPG                ✔️          ❌           ❌                ❌              ✔️
DQN                 ❌          ✔️           ❌                ❌              ✔️

@@ -0,0 +1,49 @@
.. _integrations:

============
Integrations
============

Weights & Biases
================

Weights & Biases provides a callback for experiment tracking that lets you visualize and share results.

The full documentation is available here: https://docs.wandb.ai/guides/integrations/other/stable-baselines-3

.. code-block:: python

  import gym

  import wandb
  from wandb.integration.sb3 import WandbCallback

  from stable_baselines3 import PPO

  config = {
      "policy_type": "MlpPolicy",
      "total_timesteps": 25000,
      "env_name": "CartPole-v1",
  }
  run = wandb.init(
      project="sb3",
      config=config,
      sync_tensorboard=True,  # auto-upload sb3's tensorboard metrics
      # monitor_gym=True,  # auto-upload the videos of agents playing the game
      # save_code=True,  # optional
  )

  model = PPO(config["policy_type"], config["env_name"], verbose=1, tensorboard_log=f"runs/{run.id}")
  model.learn(
      total_timesteps=config["total_timesteps"],
      callback=WandbCallback(
          model_save_path=f"models/{run.id}",
          verbose=2,
      ),
  )
  run.finish()


Hugging Face
============

To be added.

@@ -8,7 +8,7 @@ We implement experimental features in a separate contrib repository:
`SB3-Contrib`_
This allows Stable-Baselines3 (SB3) to maintain a stable and compact core, while still
-providing the latest features, like Truncated Quantile Critics (TQC), Trust Region Policy Optimization (TRPO) or
+providing the latest features, like Truncated Quantile Critics (TQC), Augmented Random Search (ARS), Trust Region Policy Optimization (TRPO) or
Quantile Regression DQN (QR-DQN).
Why create this repository?
@@ -36,6 +36,7 @@ See documentation for the full list of included features.
**RL Algorithms**:
- `Augmented Random Search (ARS) <https://arxiv.org/abs/1803.07055>`_
- `Quantile Regression DQN (QR-DQN)`_
- `Truncated Quantile Critics (TQC)`_
- `Trust Region Policy Optimization (TRPO) <https://arxiv.org/abs/1502.05477>`_
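
For readers unfamiliar with Augmented Random Search, the idea behind the algorithm listed above can be sketched in a few lines of plain Python. This is only an illustration of the basic update (sample random directions, evaluate each in both the plus and minus direction, keep the best ones, and scale the step by the standard deviation of their rewards). It is not the SB3-Contrib implementation: the toy objective `reward`, the constant `TARGET`, the function `ars_sketch` and all hyperparameter values are hypothetical names and choices made for this example, and observation normalization is left out.

```python
import random
import statistics

TARGET = [1.0, -2.0, 0.5]  # hypothetical optimum of the toy objective


def reward(theta):
    """Toy stand-in for an episode return: negative squared distance to TARGET."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def ars_sketch(n_iterations=200, n_directions=8, top_b=4,
               step_size=0.05, noise=0.1, seed=0):
    """Basic random-search update on a parameter vector (illustrative only)."""
    rng = random.Random(seed)
    theta = [0.0] * len(TARGET)
    for _ in range(n_iterations):
        rollouts = []
        for _ in range(n_directions):
            # Sample a random direction and evaluate both perturbations.
            delta = [rng.gauss(0.0, 1.0) for _ in theta]
            r_plus = reward([t + noise * d for t, d in zip(theta, delta)])
            r_minus = reward([t - noise * d for t, d in zip(theta, delta)])
            rollouts.append((r_plus, r_minus, delta))
        # Keep the top_b directions, ranked by the better of their two rewards.
        rollouts.sort(key=lambda r: max(r[0], r[1]), reverse=True)
        best = rollouts[:top_b]
        # Scale the step by the standard deviation of the rewards that are used.
        sigma = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)])
        scale = step_size / (top_b * (sigma + 1e-8))  # epsilon avoids division by zero
        for r_plus, r_minus, delta in best:
            theta = [t + scale * (r_plus - r_minus) * d
                     for t, d in zip(theta, delta)]
    return theta


initial_reward = reward([0.0] * len(TARGET))
final_reward = reward(ars_sketch())
```

Comparing `final_reward` with `initial_reward` shows whether the search moved the parameters toward the optimum of the toy objective. In a real reinforcement learning setting, `reward` would be replaced by the return of a rollout of the policy parameterized by `theta`.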

@@ -48,6 +48,7 @@ Main Features
guide/custom_policy
guide/callbacks
guide/tensorboard
guide/integrations
guide/rl_zoo
guide/sb3_contrib
guide/imitation

@@ -23,6 +23,11 @@ New Features:
- Added ``skip`` option to ``VecTransposeImage`` to skip transforming the channel order when the heuristic is wrong
- Added ``copy()`` and ``combine()`` methods to ``RunningMeanStd``
SB3-Contrib
^^^^^^^^^^^
- Added Trust Region Policy Optimization (TRPO) (@cyprienc)
- Added Augmented Random Search (ARS) (@sgillen)
Bug Fixes:
^^^^^^^^^^
- Fixed a bug where ``set_env()`` with ``VecNormalize`` would result in an error with off-policy algorithms (thanks @cleversonahum)
@@ -57,6 +62,7 @@ Documentation:
- Updated SB3 Contrib doc
- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
- Fixed custom policy documentation (@IperGiove)
- Added doc on Weights & Biases integration
Release 1.3.0 (2021-10-23)
---------------------------