diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
index 8b8fd5b..f67b4f2 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -6,7 +6,7 @@ title: "[Bug] bug title"
 ---
 
 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
 
 If your issue is related to a **custom gym environment**, please use the custom gym env template.
 
diff --git a/.github/ISSUE_TEMPLATE/custom_env.md b/.github/ISSUE_TEMPLATE/custom_env.md
index 0c9802e..bea9b59 100644
--- a/.github/ISSUE_TEMPLATE/custom_env.md
+++ b/.github/ISSUE_TEMPLATE/custom_env.md
@@ -5,7 +5,7 @@ labels: question, custom gym env
 ---
 
 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
 
 ### 🤖 Custom Gym Environment
 
diff --git a/.github/ISSUE_TEMPLATE/documentation.md b/.github/ISSUE_TEMPLATE/documentation.md
index f87070d..59e5da5 100644
--- a/.github/ISSUE_TEMPLATE/documentation.md
+++ b/.github/ISSUE_TEMPLATE/documentation.md
@@ -5,7 +5,7 @@ labels: documentation
 ---
 
 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
 
 ### 📚 Documentation
 
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
index 667b049..a650863 100644
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -6,7 +6,7 @@ title: "[Feature Request] request title"
 ---
 
 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
 
 ### 🚀 Feature
 
diff --git a/.github/ISSUE_TEMPLATE/question.md b/.github/ISSUE_TEMPLATE/question.md
index 30fe1f9..b3288d8 100644
--- a/.github/ISSUE_TEMPLATE/question.md
+++ b/.github/ISSUE_TEMPLATE/question.md
@@ -6,7 +6,7 @@ title: "[Question] question title"
 ---
 
 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
-Please post your question on [reddit](https://www.reddit.com/r/reinforcementlearning/) or [stack overflow](https://stackoverflow.com/) in that case.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
 
 ### Question
 
diff --git a/README.md b/README.md
index 600c5da..92c36f6 100644
--- a/README.md
+++ b/README.md
@@ -203,7 +203,7 @@ please tell us when if you want your project to appear on this page ;)
 
 To cite this repository in publications:
 
-```
+```bibtex
 @misc{stable-baselines3,
   author = {Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
   title = {Stable Baselines3},
@@ -219,6 +219,7 @@ To cite this repository in publications:
 Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave) and [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli).
 
 **Important Note: We do not do technical support, nor consulting** and don't answer personal questions per email.
+Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in that case.
 
 ## How To Contribute
 
diff --git a/docs/guide/custom_policy.rst b/docs/guide/custom_policy.rst
index 8970cda..1686258 100644
--- a/docs/guide/custom_policy.rst
+++ b/docs/guide/custom_policy.rst
@@ -27,8 +27,10 @@ using ``policy_kwargs`` parameter:
 
   from stable_baselines3 import PPO
 
-  # Custom MLP policy of two layers of size 32 each with Relu activation function
-  policy_kwargs = dict(activation_fn=th.nn.ReLU, net_arch=[32, 32])
+  # Custom actor (pi) and value function (vf) networks
+  # of two layers of size 32 each with Relu activation function
+  policy_kwargs = dict(activation_fn=th.nn.ReLU,
+                       net_arch=[dict(pi=[32, 32], vf=[32, 32])])
   # Create the agent
   model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs, verbose=1)
   # Retrieve the environment
@@ -36,20 +38,11 @@ using ``policy_kwargs`` parameter:
   # Train the agent
   model.learn(total_timesteps=100000)
   # Save the agent
-  model.save("ppo-cartpole")
+  model.save("ppo_cartpole")
 
   del model
   # the policy_kwargs are automatically loaded
-  model = PPO.load("ppo-cartpole")
-
-
-You can also easily define a custom architecture for the policy (or value) network:
-
-.. note::
-
-    Defining a custom policy class is equivalent to passing ``policy_kwargs``.
-    However, it lets you name the policy and so usually makes the code clearer.
-    ``policy_kwargs`` is particularly useful when doing hyperparameter search.
+  model = PPO.load("ppo_cartpole")
 
 
 Custom Feature Extractor
@@ -58,6 +51,15 @@ Custom Feature Extractor
 If you want to have a custom feature extractor (e.g. custom CNN when using images), you can define class
 that derives from ``BaseFeaturesExtractor`` and then pass it to the model when training.
 
+
+.. note::
+
+  By default the feature extractor is shared between the actor and the critic to save computation (when applicable).
+  However, this can be changed by defining a custom policy for on-policy algorithms or setting
+  ``share_features_extractor=False`` in the ``policy_kwargs`` for off-policy algorithms
+  (and when applicable).
+
+
 .. code-block:: python
 
   import gym
@@ -108,7 +110,6 @@ that derives from ``BaseFeaturesExtractor`` and then pass it to the model when t
 
 
 
-
 On-Policy Algorithms
 ^^^^^^^^^^^^^^^^^^^^
 
diff --git a/docs/guide/install.rst b/docs/guide/install.rst
index e5ec926..9632777 100644
--- a/docs/guide/install.rst
+++ b/docs/guide/install.rst
@@ -36,6 +36,12 @@ This includes an optional dependencies like Tensorboard, OpenCV or ``atari-py``
 
     pip install stable-baselines3
 
+.. note::
+
+  If you need to work with OpenCV on a machine without a X-server (for instance inside a docker image),
+  you will need to install ``opencv-python-headless``, see `issue #298 `_.
+
+
 Bleeding-edge version
 ---------------------
 
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
index 3ede50f..fbbcfe1 100644
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -70,6 +70,8 @@ Documentation:
 - Fix bug in the example code of DQN (@AptX395)
 - Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
 - Updated migration guide
+- Updated custom policy doc (separate policy architecture recommended)
+- Added a note about OpenCV headless version
 
 
 Pre-Release 0.10.0 (2020-10-28)