Update custom policy documentation (#312)

* Update README

* Update custom policy documentation

* Add discord link

* Add note about OpenCV headless version
Antonin RAFFIN 2021-02-06 18:19:58 +01:00 committed by GitHub
parent b01bde3e2d
commit 48a19a43ec
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
9 changed files with 30 additions and 20 deletions

@ -6,7 +6,7 @@ title: "[Bug] bug title"
---
**Important Note: We do not do technical support, nor consulting** and don't answer personal questions by email.
Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
If your issue is related to a **custom gym environment**, please use the custom gym env template.

@ -5,7 +5,7 @@ labels: question, custom gym env
---
**Important Note: We do not do technical support, nor consulting** and don't answer personal questions by email.
Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
### 🤖 Custom Gym Environment

@ -5,7 +5,7 @@ labels: documentation
---
**Important Note: We do not do technical support, nor consulting** and don't answer personal questions by email.
Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
### 📚 Documentation

@ -6,7 +6,7 @@ title: "[Feature Request] request title"
---
**Important Note: We do not do technical support, nor consulting** and don't answer personal questions by email.
Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
### 🚀 Feature

@ -6,7 +6,7 @@ title: "[Question] question title"
---
**Important Note: We do not do technical support, nor consulting** and don't answer personal questions by email.
Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
### Question

@ -203,7 +203,7 @@ please tell us when if you want your project to appear on this page ;)
To cite this repository in publications:
```
```bibtex
@misc{stable-baselines3,
author = {Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
title = {Stable Baselines3},
@ -219,6 +219,7 @@ To cite this repository in publications:
Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave) and [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli).
**Important Note: We do not do technical support, nor consulting** and don't answer personal questions by email.
Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
## How To Contribute

@ -27,8 +27,10 @@ using ``policy_kwargs`` parameter:
from stable_baselines3 import PPO
# Custom MLP policy of two layers of size 32 each with ReLU activation function
policy_kwargs = dict(activation_fn=th.nn.ReLU, net_arch=[32, 32])
# Custom actor (pi) and value function (vf) networks
# of two layers of size 32 each with ReLU activation function
policy_kwargs = dict(activation_fn=th.nn.ReLU,
                     net_arch=[dict(pi=[32, 32], vf=[32, 32])])
# Create the agent
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
# Retrieve the environment
@ -36,20 +38,11 @@ using ``policy_kwargs`` parameter:
# Train the agent
model.learn(total_timesteps=100000)
# Save the agent
model.save("ppo-cartpole")
model.save("ppo_cartpole")
del model
# the policy_kwargs are automatically loaded
model = PPO.load("ppo-cartpole")
You can also easily define a custom architecture for the policy (or value) network:
.. note::
Defining a custom policy class is equivalent to passing ``policy_kwargs``.
However, it lets you name the policy and so usually makes the code clearer.
``policy_kwargs`` is particularly useful when doing hyperparameter search.
model = PPO.load("ppo_cartpole")
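The two ``net_arch`` values in the hunk above differ in which hidden layers the actor and the critic share. As a dependency-free sketch, the hypothetical helper ``expand_net_arch`` (not part of Stable-Baselines3) mimics how the list format of this era is assumed to be read: integers are hidden layers shared by both networks, and a trailing dict splits into separate ``pi`` and ``vf`` layers. ``input_dim=4`` matches the 4-dimensional CartPole-v1 observation.

```python
# Hypothetical helper, NOT part of Stable-Baselines3: it mimics how the
# net_arch list of this era is read, to show which layers are shared.


def expand_net_arch(input_dim, net_arch):
    """Return the (in, out) sizes of the shared, pi and vf hidden layers."""
    layers = {"shared": [], "pi": [], "vf": []}
    last_shared = input_dim
    for item in net_arch:
        if isinstance(item, int):
            # An int adds a hidden layer shared by the actor and the critic.
            layers["shared"].append((last_shared, item))
            last_shared = item
        else:
            # A dict ends the shared trunk: pi and vf branch off separately.
            for key in ("pi", "vf"):
                last = last_shared
                for size in item.get(key, []):
                    layers[key].append((last, size))
                    last = size
            break
    return layers


# Old example: both hidden layers are shared by the actor and the critic.
print(expand_net_arch(4, [32, 32]))
# → {'shared': [(4, 32), (32, 32)], 'pi': [], 'vf': []}

# New example: the actor and the critic each get their own two layers.
print(expand_net_arch(4, [dict(pi=[32, 32], vf=[32, 32])]))
# → {'shared': [], 'pi': [(4, 32), (32, 32)], 'vf': [(4, 32), (32, 32)]}
```

The second form corresponds to the "separate policy architecture" that the changelog entry of this commit recommends.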
Custom Feature Extractor
@ -58,6 +51,15 @@ Custom Feature Extractor
If you want to have a custom feature extractor (e.g. custom CNN when using images), you can define a class
that derives from ``BaseFeaturesExtractor`` and then pass it to the model when training.
.. note::
By default the feature extractor is shared between the actor and the critic to save computation (when applicable).
However, this can be changed by defining a custom policy for on-policy algorithms or setting
``share_features_extractor=False`` in the ``policy_kwargs`` for off-policy algorithms
(and when applicable).
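The saving mentioned in the note above can be illustrated with a toy, dependency-free sketch: with one shared extractor, each observation goes through the extractor once per forward pass for both heads; with separate extractors, it goes through twice. ``CountingExtractor`` and ``forward`` are hypothetical names used only for this illustration, not Stable-Baselines3 classes.

```python
# Toy illustration, NOT the Stable-Baselines3 API: it only counts how often
# the observation goes through a features extractor.


class CountingExtractor:
    """Flattens a 2D observation and records how many times it was called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, obs):
        self.calls += 1
        return [float(x) for row in obs for x in row]


def forward(obs, actor_extractor, critic_extractor):
    """Return (actor_features, critic_features) for one observation."""
    actor_features = actor_extractor(obs)
    if critic_extractor is actor_extractor:
        # Shared extractor: reuse the features, one extractor pass in total.
        critic_features = actor_features
    else:
        # Separate extractors: the critic processes the observation again.
        critic_features = critic_extractor(obs)
    return actor_features, critic_features


obs = [[1, 2], [3, 4]]

shared = CountingExtractor()
forward(obs, shared, shared)
print(shared.calls)  # → 1

actor, critic = CountingExtractor(), CountingExtractor()
forward(obs, actor, critic)
print(actor.calls + critic.calls)  # → 2
```

Separate extractors cost an extra pass but let the actor and the critic learn different features, which is the trade-off the note leaves to the user.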
.. code-block:: python
import gym
@ -108,7 +110,6 @@ that derives from ``BaseFeaturesExtractor`` and then pass it to the model when t
On-Policy Algorithms
^^^^^^^^^^^^^^^^^^^^

@ -36,6 +36,12 @@ This includes an optional dependencies like Tensorboard, OpenCV or ``atari-py``
pip install stable-baselines3
.. note::
If you need to work with OpenCV on a machine without an X server (for instance inside a Docker image),
you will need to install ``opencv-python-headless``, see `issue #298 <https://github.com/DLR-RM/stable-baselines3/issues/298>`_.
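A minimal sketch of choosing between the two wheels, assuming a Linux machine where an unset ``DISPLAY`` environment variable signals that no X server is available (a common but not foolproof sign, and not meaningful on macOS or Windows). ``recommended_opencv_package`` is a hypothetical helper, not part of Stable-Baselines3 or OpenCV; ``opencv-python`` and ``opencv-python-headless`` are the PyPI package names.

```python
import os


def recommended_opencv_package(environ=None):
    """Suggest an OpenCV pip package based on whether an X display is set.

    Linux-only heuristic (assumption): an unset DISPLAY usually means no X
    server is reachable, e.g. inside a Docker image.
    """
    environ = os.environ if environ is None else environ
    if environ.get("DISPLAY"):
        return "opencv-python"
    return "opencv-python-headless"


print(recommended_opencv_package({}))  # → opencv-python-headless
print(recommended_opencv_package({"DISPLAY": ":0"}))  # → opencv-python
```

Both packages provide the same ``cv2`` module, so typically only one of them should be installed in a given environment.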
Bleeding-edge version
---------------------

@ -70,6 +70,8 @@ Documentation:
- Fix bug in the example code of DQN (@AptX395)
- Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
- Updated migration guide
- Updated custom policy doc (separate policy architecture recommended)
- Added a note about OpenCV headless version
Pre-Release 0.10.0 (2020-10-28)