Update custom policy documentation (#312)

* Update README
* Update custom policy documentation
* Add discord link
* Add note about OpenCV headless version
2026-07-20 19:12:43 +00:00 · 2021-02-06 18:19:58 +01:00 · 48a19a43ec
commit 48a19a43ec
parent b01bde3e2d
9 changed files with 30 additions and 20 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -6,7 +6,7 @@ title: "[Bug] bug title"
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 If your issue is related to a **custom gym environment**, please use the custom gym env template.
--- a/.github/ISSUE_TEMPLATE/custom_env.md
+++ b/.github/ISSUE_TEMPLATE/custom_env.md
@@ -5,7 +5,7 @@ labels: question, custom gym env
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.

 ### 🤖 Custom Gym Environment

--- a/.github/ISSUE_TEMPLATE/documentation.md
+++ b/.github/ISSUE_TEMPLATE/documentation.md
@@ -5,7 +5,7 @@ labels: documentation
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.

 ### 📚 Documentation

--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -6,7 +6,7 @@ title: "[Feature Request] request title"
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 ### 🚀 Feature
--- a/.github/ISSUE_TEMPLATE/question.md
+++ b/.github/ISSUE_TEMPLATE/question.md
@@ -6,7 +6,7 @@ title: "[Question] question title"
 ---

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 ### Question
--- a/README.md
+++ b/README.md
@@ -203,7 +203,7 @@ please tell us when if you want your project to appear on this page ;)

 To cite this repository in publications:

-```
+```bibtex
@misc{stable-baselines3,
  author = {Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
  title = {Stable Baselines3},
@@ -219,6 +219,7 @@ To cite this repository in publications:
 Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave) and [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli).

 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.


 ## How To Contribute
--- a/docs/guide/custom_policy.rst
+++ b/docs/guide/custom_policy.rst
@@ -27,8 +27,10 @@ using ``policy_kwargs`` parameter:

  from stable_baselines3 import PPO

-  # Custom MLP policy of two layers of size 32 each with Relu activation function
-  policy_kwargs = dict(activation_fn=th.nn.ReLU, net_arch=[32, 32])
+  # Custom actor (pi) and value function (vf) networks
+  # of two layers of size 32 each with Relu activation function
+  policy_kwargs = dict(activation_fn=th.nn.ReLU,
+                       net_arch=[dict(pi=[32, 32], vf=[32, 32])])
  # Create the agent
  model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
  # Retrieve the environment
@@ -36,20 +38,11 @@ using ``policy_kwargs`` parameter:
  # Train the agent
  model.learn(total_timesteps=100000)
  # Save the agent
-  model.save("ppo-cartpole")
+  model.save("ppo_cartpole")

  del model
  # the policy_kwargs are automatically loaded
-  model = PPO.load("ppo-cartpole")
-
-
-You can also easily define a custom architecture for the policy (or value) network:
-
-.. note::
-
-    Defining a custom policy class is equivalent to passing ``policy_kwargs``.
-    However, it lets you name the policy and so usually makes the code clearer.
-    ``policy_kwargs`` is particularly useful when doing hyperparameter search.
+  model = PPO.load("ppo_cartpole")


 Custom Feature Extractor
@@ -58,6 +51,15 @@ Custom Feature Extractor
 If you want to have a custom feature extractor (e.g. custom CNN when using images), you can define class
 that derives from ``BaseFeaturesExtractor`` and then pass it to the model when training.

+
+.. note::
+
+  By default the feature extractor is shared between the actor and the critic to save computation (when applicable).
+  However, this can be changed by defining a custom policy for on-policy algorithms or setting
+  ``share_features_extractor=False`` in the ``policy_kwargs`` for off-policy algorithms
+  (and when applicable).
+
+
 .. code-block:: python

  import gym
@@ -108,7 +110,6 @@ that derives from ``BaseFeaturesExtractor`` and then pass it to the model when t



-
 On-Policy Algorithms
 ^^^^^^^^^^^^^^^^^^^^

--- a/docs/guide/install.rst
+++ b/docs/guide/install.rst
@@ -36,6 +36,12 @@ This includes an optional dependencies like Tensorboard, OpenCV or ``atari-py``
    pip install stable-baselines3


+.. note::
+
+  If you need to work with OpenCV on a machine without a X-server (for instance inside a docker image),
+  you will need to install ``opencv-python-headless``, see `issue #298 <https://github.com/DLR-RM/stable-baselines3/issues/298>`_.
+
+
 Bleeding-edge version
 ---------------------

--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -70,6 +70,8 @@ Documentation:
 - Fix bug in the example code of DQN (@AptX395)
 - Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
 - Updated migration guide
+- Updated custom policy doc (separate policy architecture recommended)
+- Added a note about OpenCV headless version


 Pre-Release 0.10.0 (2020-10-28)