From c895c1d46f5d24cc49ccb20e99089a141fe7f4c1 Mon Sep 17 00:00:00 2001
From: Thomas Gubler
Date: Thu, 30 Dec 2021 11:28:12 +0100
Subject: [PATCH] Doc fix: A2C - fix guidance on RMSpropTFLike (#708)

* doc: A2C/migration: fix guidance on RMSpropTFLike

* Update changelog.rst

Co-authored-by: Antonin RAFFIN
---
 docs/guide/migration.rst | 2 +-
 docs/misc/changelog.rst  | 3 ++-
 docs/modules/a2c.rst     | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/guide/migration.rst b/docs/guide/migration.rst
index c9c15cd..d3bb9d1 100644
--- a/docs/guide/migration.rst
+++ b/docs/guide/migration.rst
@@ -113,7 +113,7 @@ A2C
 PyTorch implementation of RMSprop `differs from Tensorflow's `_,
 which leads to `different and potentially more unstable results `_.
 Use ``stable_baselines3.common.sb2_compat.rmsprop_tf_like.RMSpropTFLike`` optimizer to match the results
- with TensorFlow's implementation. This can be done through ``policy_kwargs``: ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, eps=1e-5))``
+ with TensorFlow's implementation. This can be done through ``policy_kwargs``: ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5)))``
 
 
 PPO
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
index 85598f7..09577fc 100644
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -54,6 +54,7 @@ Documentation:
 - Updated ``BaseAlgorithm.load`` docstring (@Demetrio92)
 - Added a note on ``load`` behavior in the examples (@Demetrio92)
 - Updated SB3 Contrib doc
+- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
 
 Release 1.3.0 (2021-10-23)
 ---------------------------
@@ -858,4 +859,4 @@ And all the contributors:
 @ShangqunYu @PierreExeter @JacopoPan @ltbd78 @tom-doerr @Atlis @liusida @09tangriro @amy12xx @juancroldan
 @benblack769 @bstee615 @c-rizz @skandermoalla @MihaiAnca13 @davidblom603 @ayeright @cyprienc
 @wkirgsn @AechPro @CUN-bjy @batu @IljaAvadiev @timokau @kachayev @cleversonahum
-@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92
+@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92 @thomasgubler
diff --git a/docs/modules/a2c.rst b/docs/modules/a2c.rst
index 3f672ee..e871424 100644
--- a/docs/modules/a2c.rst
+++ b/docs/modules/a2c.rst
@@ -14,7 +14,7 @@ It uses multiple workers to avoid the use of a replay buffer.
 If you find training unstable or want to match performance of stable-baselines A2C, consider using
 ``RMSpropTFLike`` optimizer from ``stable_baselines3.common.sb2_compat.rmsprop_tf_like``.
- You can change optimizer with ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, eps=1e-5))``.
+ You can change optimizer with ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5)))``.
 Read more `here `_.
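
For reviewers, the corrected guidance can be sketched in Python as follows. This is a minimal illustration, not part of the patch: the helper names `make_policy_kwargs` and `build_a2c`, the `"MlpPolicy"` policy string, and the `"CartPole-v1"` environment id are assumptions chosen for the example. The point it shows is that `eps` is nested under `optimizer_kwargs` instead of sitting at the top level of `policy_kwargs`.

```python
def make_policy_kwargs(optimizer_class, eps=1e-5):
    """Build policy_kwargs with eps nested under optimizer_kwargs.

    Hypothetical helper for illustration; only the resulting dict layout
    comes from the patched documentation.
    """
    return dict(
        optimizer_class=optimizer_class,
        optimizer_kwargs=dict(eps=eps),
    )


def build_a2c(env_id="CartPole-v1"):
    """Create an A2C model that uses RMSpropTFLike (assumed usage sketch)."""
    # Local imports so make_policy_kwargs stays usable without SB3 installed.
    from stable_baselines3 import A2C
    from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

    # "MlpPolicy" and the environment id are example choices, not from the patch.
    return A2C(
        "MlpPolicy",
        env_id,
        policy_kwargs=make_policy_kwargs(RMSpropTFLike),
    )
```

Calling `build_a2c()` requires `stable_baselines3` and an environment backend to be installed; `make_policy_kwargs` alone has no dependencies and can be used to check the dictionary layout.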