mirror of
https://github.com/saymrwulf/stable-baselines3.git
synced 2026-05-31 23:28:05 +00:00
Update SB3 Contrib doc (ARS) and W&B integration (#726)
* Add ARS to SB3 contrib
* Add integration page
parent
e9a8979022
commit
cd6e04705b
6 changed files with 60 additions and 1 deletions
@@ -160,6 +160,7 @@ All the following examples can be executed online using Google colab notebooks:

| **Name** | **Recurrent** | `Box` | `Discrete` | `MultiDiscrete` | `MultiBinary` | **Multi Processing** |
| ------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------ | --------------------------------- |
| ARS<sup>[1](#f1)</sup> | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
| A2C | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| DDPG | :x: | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| DQN | :x: | :x: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
@@ -8,6 +8,7 @@ along with some useful characteristics: support for discrete/continuous actions,
=================== =========== ============ ================= =============== ================
Name                ``Box``     ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
=================== =========== ============ ================= =============== ================
ARS [#f1]_          ✔️          ✔️           ❌                 ❌               ✔️
A2C                 ✔️          ✔️           ✔️                ✔️              ✔️
DDPG                ✔️          ❌            ❌                 ❌               ✔️
DQN                 ❌           ✔️           ❌                 ❌               ✔️
49
docs/guide/integrations.rst
Normal file
@@ -0,0 +1,49 @@
.. _integrations:

============
Integrations
============

Weights & Biases
================

Weights & Biases provides a callback for experiment tracking that lets you visualize and share results.

The full documentation is available here: https://docs.wandb.ai/guides/integrations/other/stable-baselines-3

.. code-block:: python

  import gym
  import wandb
  from wandb.integration.sb3 import WandbCallback

  from stable_baselines3 import PPO

  config = {
      "policy_type": "MlpPolicy",
      "total_timesteps": 25000,
      "env_name": "CartPole-v1",
  }
  run = wandb.init(
      project="sb3",
      config=config,
      sync_tensorboard=True,  # auto-upload sb3's tensorboard metrics
      # monitor_gym=True,  # auto-upload the videos of agents playing the game
      # save_code=True,  # optional
  )

  model = PPO(config["policy_type"], config["env_name"], verbose=1, tensorboard_log=f"runs/{run.id}")
  model.learn(
      total_timesteps=config["total_timesteps"],
      callback=WandbCallback(
          model_save_path=f"models/{run.id}",
          verbose=2,
      ),
  )
  run.finish()


Hugging Face
============

To be added.
@@ -8,7 +8,7 @@ We implement experimental features in a separate contrib repository:
`SB3-Contrib`_

This allows Stable-Baselines3 (SB3) to maintain a stable and compact core, while still
providing the latest features, like Truncated Quantile Critics (TQC), Trust Region Policy Optimization (TRPO) or
providing the latest features, like Truncated Quantile Critics (TQC), Augmented Random Search (ARS), Trust Region Policy Optimization (TRPO) or
Quantile Regression DQN (QR-DQN).

Why create this repository?
@@ -36,6 +36,7 @@ See documentation for the full list of included features.

**RL Algorithms**:

- `Augmented Random Search (ARS) <https://arxiv.org/abs/1803.07055>`_
- `Quantile Regression DQN (QR-DQN)`_
- `Truncated Quantile Critics (TQC)`_
- `Trust Region Policy Optimization (TRPO) <https://arxiv.org/abs/1502.05477>`_
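
For readers unfamiliar with the Augmented Random Search (ARS) entry added to the list above, the following is a minimal pure-Python sketch of a single ARS-style update on a toy objective. It is not the SB3-Contrib implementation: the names ``ars_step`` and ``toy_reward``, the hyperparameter values, and the quadratic reward are all assumptions chosen for illustration, and the observation normalization used by some ARS variants is omitted.

```python
import random
import statistics


def ars_step(theta, reward_fn, rng, n_directions=8, n_top=4,
             step_size=0.05, noise_std=0.1):
    """One ARS-style update without observation normalization (illustrative)."""
    rollouts = []
    for _ in range(n_directions):
        # Sample a random direction and evaluate the parameters on both sides.
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + noise_std * d for t, d in zip(theta, delta)]
        minus = [t - noise_std * d for t, d in zip(theta, delta)]
        rollouts.append((reward_fn(plus), reward_fn(minus), delta))

    # Keep only the directions whose better rollout scored highest.
    rollouts.sort(key=lambda r: max(r[0], r[1]), reverse=True)
    top = rollouts[:n_top]

    # Scale the step by the standard deviation of the rewards actually used.
    used = [r for r_plus, r_minus, _ in top for r in (r_plus, r_minus)]
    sigma_r = statistics.pstdev(used) or 1.0

    new_theta = list(theta)
    for r_plus, r_minus, delta in top:
        for i, d in enumerate(delta):
            new_theta[i] += step_size / (n_top * sigma_r) * (r_plus - r_minus) * d
    return new_theta


def toy_reward(theta):
    # Hypothetical stand-in for an episode return, maximized at theta = (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(300):
    theta = ars_step(theta, toy_reward, rng)
```

In practice the reward function would run an episode in an environment with a linear or small MLP policy parameterized by ``theta``; the toy quadratic only makes the update rule easy to follow.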
@@ -48,6 +48,7 @@ Main Features
  guide/custom_policy
  guide/callbacks
  guide/tensorboard
  guide/integrations
  guide/rl_zoo
  guide/sb3_contrib
  guide/imitation
@@ -23,6 +23,11 @@ New Features:
- Added ``skip`` option to ``VecTransposeImage`` to skip transforming the channel order when the heuristic is wrong
- Added ``copy()`` and ``combine()`` methods to ``RunningMeanStd``

SB3-Contrib
^^^^^^^^^^^
- Added Trust Region Policy Optimization (TRPO) (@cyprienc)
- Added Augmented Random Search (ARS) (@sgillen)

Bug Fixes:
^^^^^^^^^^
- Fixed a bug where ``set_env()`` with ``VecNormalize`` would result in an error with off-policy algorithms (thanks @cleversonahum)
@@ -57,6 +62,7 @@ Documentation:
- Updated SB3 Contrib doc
- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
- Fixed custom policy documentation (@IperGiove)
- Added doc on Weights & Biases integration

Release 1.3.0 (2021-10-23)
---------------------------