Stable-Baselines3 v1.0 (#354)

* Bump version and update doc

* Fix name

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docs/index.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update wording for RL zoo

Co-authored-by: Adam Gleave <adam@gleave.me>
Antonin RAFFIN 2021-03-17 14:20:31 +01:00 committed by GitHub
parent 237223f834
commit e3875b50a1
11 changed files with 75 additions and 17 deletions


@@ -36,7 +36,7 @@ you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselin
| Type hints | :heavy_check_mark: |
### Planned features (v1.1+)
### Planned features
Please take a look at the [Roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) and [Milestones](https://github.com/DLR-RM/stable-baselines3/milestones).
@@ -48,11 +48,13 @@ A migration guide from SB2 to SB3 can be found in the [documentation](https://st
Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)
## RL Baselines3 Zoo: A Collection of Trained RL Agents
## RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents
[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo). is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.
[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL).
It also provides basic scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.
Goals of this repository:
@@ -110,9 +112,9 @@ import gym
from stable_baselines3 import PPO
env = gym.make('CartPole-v1')
env = gym.make("CartPole-v1")
model = PPO('MlpPolicy', env, verbose=1)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
obs = env.reset()
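The quickstart hunk above ends at `env.reset()`, before the interaction loop that typically follows. Below is a dependency-free sketch of that reset/predict/step pattern; `CountingEnv` and `ConstantModel` are hypothetical stand-ins (a toy environment and a "trained" model), not SB3 or gym classes, so the sketch runs without either library installed.

```python
# Sketch of the interaction loop that usually follows env.reset().
# CountingEnv and ConstantModel are illustrative stand-ins, not SB3/gym code.

class CountingEnv:
    """Toy environment: episode ends after 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done, {}  # obs, reward, done, info

class ConstantModel:
    """Stand-in for a trained model: always predicts action 0."""
    def predict(self, obs, deterministic=True):
        return 0, None  # action, next hidden state (unused here)

env = CountingEnv()
model = ConstantModel()

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    total_reward += reward

print(total_reward)  # 3 steps, reward 1.0 each
```

With a real SB3 model and gym environment the loop body is identical; only the two stand-in classes change.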

BIN: docs/_static/img/net_arch.png (new binary file, 135 KiB)

BIN: docs/_static/img/sb3_loop.png (new binary file, 165 KiB)

BIN: docs/_static/img/sb3_policy.png (new binary file, 176 KiB)


@@ -13,9 +13,49 @@ and other type of input features (MlpPolicies).
which handles bounds more correctly.
SB3 Policy
^^^^^^^^^^
Custom Policy Architecture
^^^^^^^^^^^^^^^^^^^^^^^^^^
SB3 networks are separated into two main parts (see figure below):
- A features extractor (usually shared between actor and critic when applicable, to save computation)
whose role is to extract features (i.e. convert to a feature vector) from high-dimensional observations, for instance, a CNN that extracts features from images.
This is the ``features_extractor_class`` parameter. You can change the default parameters of that features extractor
by passing a ``features_extractor_kwargs`` parameter.
- A (fully-connected) network that maps the features to actions/value. Its architecture is controlled by the ``net_arch`` parameter.
.. note::
All observations are first pre-processed (e.g. images are normalized, discrete obs are converted to one-hot vectors, ...) before being fed to the features extractor.
In the case of vector observations, the features extractor is just a ``Flatten`` layer.
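The note above names two preprocessing cases: discrete observations become one-hot vectors, and vector observations are simply flattened. A minimal illustrative sketch of both (the function names are hypothetical, not SB3's actual preprocessing API):

```python
# Illustrative preprocessing, mirroring the note above (not SB3 internals):
# discrete observations -> one-hot vectors; vector observations -> flattened.

def one_hot(obs: int, n: int) -> list:
    """Encode a discrete observation as a one-hot vector of length n."""
    vec = [0.0] * n
    vec[obs] = 1.0
    return vec

def flatten(obs) -> list:
    """Flatten a (possibly nested) list observation into one feature vector."""
    out = []
    for item in obs:
        out.extend(item if isinstance(item, list) else [item])
    return out

print(one_hot(2, 4))              # [0.0, 0.0, 1.0, 0.0]
print(flatten([[1, 2], [3, 4]]))  # [1, 2, 3, 4]
```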
.. image:: ../_static/img/net_arch.png
SB3 policies are usually composed of several networks (actor/critic networks + target networks when applicable) together
with the associated optimizers.
Each of these networks has a features extractor followed by a fully-connected network.
.. note::
When we refer to "policy" in Stable-Baselines3, this is usually an abuse of language compared to RL terminology.
In SB3, "policy" refers to the class that handles all the networks useful for training,
so not only the network used to predict actions (the "learned controller").
.. image:: ../_static/img/sb3_policy.png
.. .. figure:: https://cdn-images-1.medium.com/max/960/1*h4WTQNVIsvMXJTCpXm_TAw.gif
Custom Network Architecture
^^^^^^^^^^^^^^^^^^^^^^^^^^^
One way of customising the policy network architecture is to pass arguments when creating the model,
using the ``policy_kwargs`` parameter:
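The hunk ends before the doc's own code example. As a dependency-free illustration, the sketch below shows how a ``net_arch`` specification of the shape used in this SB3 version (shared layers, then a dict with ``pi``/``vf`` branches) could expand into per-layer ``(in, out)`` sizes; ``expand_net_arch`` is a hypothetical helper, not an SB3 function.

```python
# Hypothetical helper: expand a net_arch spec of the form
# [shared_sizes..., dict(pi=[...], vf=[...])] into (in, out) layer sizes.
# This mirrors how the net_arch parameter is interpreted, without torch/SB3.

def expand_net_arch(input_dim, net_arch):
    shared, branches = [], {"pi": [], "vf": []}
    last = input_dim
    branch_spec = {}
    for item in net_arch:
        if isinstance(item, dict):
            branch_spec = item  # branch sizes for actor (pi) and critic (vf)
            break
        shared.append((last, item))  # shared fully-connected layer
        last = item
    for name in ("pi", "vf"):
        b_last = last  # each branch starts from the last shared layer size
        for size in branch_spec.get(name, []):
            branches[name].append((b_last, size))
            b_last = size
    return shared, branches

shared, branches = expand_net_arch(8, [64, dict(pi=[32], vf=[32, 16])])
print(shared)          # [(8, 64)]
print(branches["pi"])  # [(64, 32)]
print(branches["vf"])  # [(64, 32), (32, 16)]
```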


@@ -31,6 +31,9 @@ Each algorithm has two main methods:
- ``.train()`` which updates the parameters using samples from the buffer
.. image:: ../_static/img/sb3_loop.png
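The training loop the new figure illustrates alternates between the two methods named above: collect experience into a buffer, then update parameters from it. A toy sketch of that alternation (all class and method bodies are illustrative stand-ins, not SB3 internals):

```python
# Toy sketch of the training loop: alternate between collecting rollouts
# and training on the collected samples. Bodies are stand-ins, not SB3 code.

class ToyAlgo:
    def __init__(self):
        self.buffer = []
        self.updates = 0

    def collect_rollouts(self, n_steps):
        # Pretend to interact with the environment and store transitions.
        self.buffer.extend(range(n_steps))

    def train(self):
        # Pretend to update the parameters using samples from the buffer.
        self.updates += 1
        self.buffer.clear()

    def learn(self, total_timesteps, n_steps=4):
        steps = 0
        while steps < total_timesteps:
            self.collect_rollouts(n_steps)
            steps += n_steps
            self.train()

algo = ToyAlgo()
algo.learn(total_timesteps=12)
print(algo.updates)  # 3 collect/train phases for 12 timesteps at 4 steps each
```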
Where to start?
===============


@@ -98,7 +98,7 @@ Base-class (all algorithms)
Policies
^^^^^^^^
- ``cnn_extractor`` -> ``feature_extractor``, as ``feature_extractor`` in now used with ``MlpPolicy`` too
- ``cnn_extractor`` -> ``features_extractor``, as ``features_extractor`` is now used with ``MlpPolicy`` too
A2C
^^^


@@ -4,9 +4,11 @@
RL Baselines3 Zoo
==================
`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_. is a collection of pre-trained Reinforcement Learning agents using
Stable-Baselines3.
It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.
`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_ is a training framework for Reinforcement Learning (RL).
It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.
Goals of this repository:


@@ -12,9 +12,9 @@ It is the next major version of `Stable Baselines <https://github.com/hill-a/sta
Github repository: https://github.com/DLR-RM/stable-baselines3
RL Baselines3 Zoo (collection of pre-trained agents): https://github.com/DLR-RM/rl-baselines3-zoo
RL Baselines3 Zoo (training framework for SB3): https://github.com/DLR-RM/rl-baselines3-zoo
RL Baselines3 Zoo also offers a simple interface to train, evaluate agents and do hyperparameter tuning.
RL Baselines3 Zoo provides a collection of pre-trained agents, scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
SB3 Contrib (experimental RL code, latest algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib


@@ -3,13 +3,22 @@
Changelog
==========
Release 1.0rc2 (WIP)
Release 1.0 (2021-03-15)
-------------------------------
**First Major Version**
Breaking Changes:
^^^^^^^^^^^^^^^^^
- Removed ``stable_baselines3.common.cmd_util`` (already deprecated), please use ``env_util`` instead
.. warning::
A refactoring of the ``HER`` algorithm is planned together with support for dictionary observations
(see `PR #243 <https://github.com/DLR-RM/stable-baselines3/pull/243>`_ and `#351 <https://github.com/DLR-RM/stable-baselines3/pull/351>`_)
This will be a backward incompatible change (a model trained with a previous version of ``HER`` won't work with the new version).
New Features:
^^^^^^^^^^^^^
- Added support for ``custom_objects`` when loading models
@@ -24,7 +33,9 @@ Documentation:
- Added new project using SB3: rl_reach (@PierreExeter)
- Added note about slow-down when switching to PyTorch
- Add a note on continual learning and resetting environment
- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
- Updated the custom policy section
Pre-Release 0.11.1 (2021-02-27)
-------------------------------


@@ -1 +1 @@
1.0rc2
1.0