mirror of
https://github.com/saymrwulf/stable-baselines3.git
synced 2026-06-30 03:38:13 +00:00
Update custom policy documentation (#312)
* Update README
* Update custom policy documentation
* Add discord link
* Add note about OpenCV headless version
parent b01bde3e2d
commit 48a19a43ec
9 changed files with 30 additions and 20 deletions

.github/ISSUE_TEMPLATE/bug_report.md (2 lines changed, vendored)
@@ -6,7 +6,7 @@ title: "[Bug] bug title"
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 If your issue is related to a **custom gym environment**, please use the custom gym env template.

.github/ISSUE_TEMPLATE/custom_env.md (2 lines changed, vendored)
@@ -5,7 +5,7 @@ labels: question, custom gym env
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.

 ### 🤖 Custom Gym Environment

.github/ISSUE_TEMPLATE/documentation.md (2 lines changed, vendored)
@@ -5,7 +5,7 @@ labels: documentation
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.

 ### 📚 Documentation

.github/ISSUE_TEMPLATE/feature_request.md (2 lines changed, vendored)
@@ -6,7 +6,7 @@ title: "[Feature Request] request title"
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 ### 🚀 Feature

.github/ISSUE_TEMPLATE/question.md (2 lines changed, vendored)
@@ -6,7 +6,7 @@ title: "[Question] question title"
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 ### Question

@@ -203,7 +203,7 @@ please tell us when if you want your project to appear on this page ;)

 To cite this repository in publications:

-```
+```bibtex
 @misc{stable-baselines3,
 author = {Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
 title = {Stable Baselines3},
@@ -219,6 +219,7 @@ To cite this repository in publications:
 Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave) and [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli).

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 ## How To Contribute

@@ -27,8 +27,10 @@ using ``policy_kwargs`` parameter:

 from stable_baselines3 import PPO

-# Custom MLP policy of two layers of size 32 each with Relu activation function
-policy_kwargs = dict(activation_fn=th.nn.ReLU, net_arch=[32, 32])
+# Custom actor (pi) and value function (vf) networks
+# of two layers of size 32 each with Relu activation function
+policy_kwargs = dict(activation_fn=th.nn.ReLU,
+net_arch=[dict(pi=[32, 32], vf=[32, 32])])
 # Create the agent
 model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
 # Retrieve the environment
@@ -36,20 +38,11 @@ using ``policy_kwargs`` parameter:
 # Train the agent
 model.learn(total_timesteps=100000)
 # Save the agent
-model.save("ppo-cartpole")
+model.save("ppo_cartpole")

 del model
 # the policy_kwargs are automatically loaded
-model = PPO.load("ppo-cartpole")
-
-
-You can also easily define a custom architecture for the policy (or value) network:
-
-.. note::
-
-Defining a custom policy class is equivalent to passing ``policy_kwargs``.
-However, it lets you name the policy and so usually makes the code clearer.
-``policy_kwargs`` is particularly useful when doing hyperparameter search.
+model = PPO.load("ppo_cartpole")


 Custom Feature Extractor
@@ -58,6 +51,15 @@ Custom Feature Extractor
 If you want to have a custom feature extractor (e.g. custom CNN when using images), you can define class
 that derives from ``BaseFeaturesExtractor`` and then pass it to the model when training.

+
+.. note::
+
+By default the feature extractor is shared between the actor and the critic to save computation (when applicable).
+However, this can be changed by defining a custom policy for on-policy algorithms or setting
+``share_features_extractor=False`` in the ``policy_kwargs`` for off-policy algorithms
+(and when applicable).
+
+
 .. code-block:: python

 import gym
@@ -108,7 +110,6 @@ that derives from ``BaseFeaturesExtractor`` and then pass it to the model when t



-
 On-Policy Algorithms
 ^^^^^^^^^^^^^^^^^^^^

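The custom policy hunks above switch the example to separate actor (`pi`) and value function (`vf`) networks, and the removed note mentions that `policy_kwargs` is handy for hyperparameter search. A minimal sketch of that idea follows, using only the standard library. The helper name `candidate_policy_kwargs` and the chosen widths and depths are assumptions for illustration; the `net_arch=[dict(pi=..., vf=...)]` form is copied from the diff and may differ in other Stable-Baselines3 versions.

```python
from itertools import product


def candidate_policy_kwargs(widths, depths):
    """Yield one policy_kwargs dict per (width, depth) pair.

    Hypothetical helper: each dict uses the list-of-dict ``net_arch`` form
    shown in the diff, with separate actor (pi) and value (vf) networks.
    """
    for width, depth in product(widths, depths):
        layers = [width] * depth
        # Copy the list so the pi and vf entries are independent objects.
        yield dict(net_arch=[dict(pi=list(layers), vf=list(layers))])


candidates = list(candidate_policy_kwargs(widths=[32, 64], depths=[1, 2]))
for kwargs in candidates:
    print(kwargs)
    # Assumed usage, following the diff (not executed in this sketch):
    # model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=kwargs, verbose=1)
```

Each generated dict can be passed as `policy_kwargs` in the same way as the hand-written one in the diff; `activation_fn` is left out here so the sketch does not depend on PyTorch.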
@@ -36,6 +36,12 @@ This includes an optional dependencies like Tensorboard, OpenCV or ``atari-py``
 pip install stable-baselines3


+.. note::
+
+If you need to work with OpenCV on a machine without a X-server (for instance inside a docker image),
+you will need to install ``opencv-python-headless``, see `issue #298 <https://github.com/DLR-RM/stable-baselines3/issues/298>`_.
+
+
 Bleeding-edge version
 ---------------------

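The note added in the installation hunk above recommends `opencv-python-headless` on machines without an X server. One way to see which OpenCV distribution is present is to query the installed package metadata. The sketch below assumes Python 3.8 or later (for `importlib.metadata`); the helper name `installed_opencv_distributions` is hypothetical, and the two distribution names are the PyPI names mentioned or implied by the note.

```python
from importlib import metadata

# PyPI distribution names: the regular build and the headless build.
OPENCV_DISTRIBUTIONS = ("opencv-python", "opencv-python-headless")


def installed_opencv_distributions(names=OPENCV_DISTRIBUTIONS):
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            pass
    return found


found = installed_opencv_distributions()
if "opencv-python" in found and "opencv-python-headless" not in found:
    print("Only opencv-python is installed; on a machine without an X server, "
          "consider installing opencv-python-headless instead.")
else:
    print(found)
```

This only inspects installed package metadata; it does not import `cv2` or test whether a display is available.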
@@ -70,6 +70,8 @@ Documentation:
 - Fix bug in the example code of DQN (@AptX395)
 - Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
 - Updated migration guide
+- Updated custom policy doc (separate policy architecture recommended)
+- Added a note about OpenCV headless version


 Pre-Release 0.10.0 (2020-10-28)