mirror of
https://github.com/saymrwulf/stable-baselines3.git
synced 2026-05-16 21:10:08 +00:00
Stable-Baselines3 v1.0 (#354)

* Bump version and update doc
* Fix name
* Apply suggestions from code review
* Update docs/index.rst
* Update wording for RL zoo

Co-authored-by: Adam Gleave <adam@gleave.me>
This commit is contained in:
parent
237223f834
commit
e3875b50a1
11 changed files with 75 additions and 17 deletions
README.md (14 lines changed)
@@ -36,7 +36,7 @@ you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselin
 | Type hints | :heavy_check_mark: |
-### Planned features (v1.1+)
+### Planned features
 Please take a look at the [Roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) and [Milestones](https://github.com/DLR-RM/stable-baselines3/milestones).
@@ -48,11 +48,13 @@ A migration guide from SB2 to SB3 can be found in the [documentation](https://st
 Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)
-## RL Baselines3 Zoo: A Collection of Trained RL Agents
+## RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents
-[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo). is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.
+[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL).
-It also provides basic scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.
 Goals of this repository:
@@ -110,9 +112,9 @@ import gym
 from stable_baselines3 import PPO
-env = gym.make('CartPole-v1')
+env = gym.make("CartPole-v1")
-model = PPO('MlpPolicy', env, verbose=1)
+model = PPO("MlpPolicy", env, verbose=1)
 model.learn(total_timesteps=10000)
 obs = env.reset()
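The quickstart snippet in this hunk stops at ``env.reset()``. As a sketch of the prediction loop that typically follows, here is the same control flow with stub ``model``/``env`` objects standing in for the real SB3 model and Gym environment (in the real API, ``predict`` returns ``(action, state)`` and ``step`` returns ``(obs, reward, done, info)``); the stubs are purely illustrative:

```python
# Stub stand-ins for the SB3 model and Gym env, so the loop shape is clear
# without requiring stable-baselines3 or gym to be installed.
class StubModel:
    def predict(self, obs, deterministic=True):
        return 0, None  # (action, recurrent hidden state)

class StubEnv:
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0]  # initial observation

    def step(self, action):
        self.steps += 1
        done = self.steps >= 3  # pretend the episode lasts 3 steps
        return [0.0], 1.0, done, {}

model, env = StubModel(), StubEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

With the real classes, `model` and `env` come from the lines above and the loop body is unchanged.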
BIN docs/_static/img/net_arch.png (vendored, new file, binary not shown, 135 KiB)
BIN docs/_static/img/sb3_loop.png (vendored, new file, binary not shown, 165 KiB)
BIN docs/_static/img/sb3_policy.png (vendored, new file, binary not shown, 176 KiB)
@@ -13,9 +13,49 @@ and other type of input features (MlpPolicies).
 which handles bounds more correctly.
-Custom Policy Architecture
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+SB3 Policy
+^^^^^^^^^^
+
+SB3 networks are separated into two main parts (see figure below):
+
+- A features extractor (usually shared between actor and critic when applicable, to save computation)
+  whose role is to extract features (i.e. convert to a feature vector) from high-dimensional observations, for instance, a CNN that extracts features from images.
+  This is the ``features_extractor_class`` parameter. You can change the default parameters of that features extractor
+  by passing a ``features_extractor_kwargs`` parameter.
+
+- A (fully-connected) network that maps the features to actions/value. Its architecture is controlled by the ``net_arch`` parameter.
+
+.. note::
+
+  All observations are first pre-processed (e.g. images are normalized, discrete obs are converted to one-hot vectors, ...) before being fed to the features extractor.
+  In the case of vector observations, the features extractor is just a ``Flatten`` layer.
+
+.. image:: ../_static/img/net_arch.png
+
+SB3 policies are usually composed of several networks (actor/critic networks + target networks when applicable) together
+with the associated optimizers.
+
+Each of these networks has a features extractor followed by a fully-connected network.
+
+.. note::
+
+  When we refer to "policy" in Stable-Baselines3, this is usually an abuse of language compared to RL terminology.
+  In SB3, "policy" refers to the class that handles all the networks useful for training,
+  so not only the network used to predict actions (the "learned controller").
+
+.. image:: ../_static/img/sb3_policy.png
+
+.. .. figure:: https://cdn-images-1.medium.com/max/960/1*h4WTQNVIsvMXJTCpXm_TAw.gif
+
+Custom Network Architecture
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 One way of customising the policy network architecture is to pass arguments when creating the model,
 using the ``policy_kwargs`` parameter:
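The example that this sentence introduces falls outside the hunk. As a minimal sketch of what such a ``policy_kwargs`` dict can look like (the ``net_arch`` layout with separate ``pi``/``vf`` sizes follows the convention described above; the concrete layer sizes are illustrative, and actually using the dict requires Stable-Baselines3 installed):

```python
# Illustrative policy_kwargs sketch: plain ints in net_arch would be shared
# layers; a dict entry gives separate layer sizes for the policy ("pi")
# and value ("vf") networks.
policy_kwargs = dict(
    net_arch=[dict(pi=[64, 64], vf=[64, 64])],
)
```

With SB3 installed, this would be passed at model creation, e.g. ``PPO("MlpPolicy", env, policy_kwargs=policy_kwargs)``.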
@@ -31,6 +31,9 @@ Each algorithm has two main methods:
 - ``.train()`` which updates the parameters using samples from the buffer
+
+.. image:: ../_static/img/sb3_loop.png
+
 Where to start?
 ===============
@@ -98,7 +98,7 @@ Base-class (all algorithms)
 Policies
 ^^^^^^^^
-- ``cnn_extractor`` -> ``feature_extractor``, as ``feature_extractor`` is now used with ``MlpPolicy`` too
+- ``cnn_extractor`` -> ``features_extractor``, as ``features_extractor`` is now used with ``MlpPolicy`` too
 A2C
 ^^^
@@ -4,9 +4,11 @@
 RL Baselines3 Zoo
 ==================
-`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_. is a collection of pre-trained Reinforcement Learning agents using
-Stable-Baselines3.
-It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.
+`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_ is a training framework for Reinforcement Learning (RL).
+
+It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+
+In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.
 Goals of this repository:
@@ -12,9 +12,9 @@ It is the next major version of `Stable Baselines <https://github.com/hill-a/sta
 Github repository: https://github.com/DLR-RM/stable-baselines3
-RL Baselines3 Zoo (collection of pre-trained agents): https://github.com/DLR-RM/rl-baselines3-zoo
+RL Baselines3 Zoo (training framework for SB3): https://github.com/DLR-RM/rl-baselines3-zoo
-RL Baselines3 Zoo also offers a simple interface to train, evaluate agents and do hyperparameter tuning.
+RL Baselines3 Zoo provides a collection of pre-trained agents, scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
 SB3 Contrib (experimental RL code, latest algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
@@ -3,13 +3,22 @@
 Changelog
 ==========
-Release 1.0rc2 (WIP)
+Release 1.0 (2021-03-15)
 -------------------------------
+
+**First Major Version**
 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
 - Removed ``stable_baselines3.common.cmd_util`` (already deprecated), please use ``env_util`` instead
+
+.. warning::
+
+  A refactoring of the ``HER`` algorithm is planned together with support for dictionary observations
+  (see `PR #243 <https://github.com/DLR-RM/stable-baselines3/pull/243>`_ and `#351 <https://github.com/DLR-RM/stable-baselines3/pull/351>`_)
+  This will be a backward incompatible change (model trained with previous version of ``HER`` won't work with the new version).
+
 New Features:
 ^^^^^^^^^^^^^
 - Added support for ``custom_objects`` when loading models
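As a hedged sketch of how the new ``custom_objects`` option is typically used: it is a dict mapping saved attribute names to replacement values, supplied at load time (e.g. ``PPO.load(path, custom_objects=custom_objects)``); the specific keys below are illustrative, not mandated by the API:

```python
# Illustrative custom_objects dict: each entry replaces the corresponding
# attribute of the saved model instead of unpickling the stored value,
# which helps when a pickled object cannot be deserialized.
custom_objects = {
    "learning_rate": 0.0,
    "clip_range": lambda _: 0.2,
}
```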
@@ -24,7 +33,9 @@ Documentation:
 - Added new project using SB3: rl_reach (@PierreExeter)
 - Added note about slow-down when switching to PyTorch
 - Add a note on continual learning and resetting environment
+- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
+- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
+- Updated the custom policy section
 Pre-Release 0.11.1 (2021-02-27)
 -------------------------------
@@ -1 +1 @@
-1.0rc2
+1.0