Mirror of https://github.com/saymrwulf/stable-baselines3.git (synced 2026-06-29 03:31:08 +00:00)
Doc fix: A2C - fix guidance on RMSpropTFLike (#708)
* doc: A2C/migration: fix guidance on RMSpropTFLike
* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Parent: 4a5dfaedfc
Commit: c895c1d46f
3 changed files with 4 additions and 3 deletions
@@ -113,7 +113,7 @@ A2C
 PyTorch implementation of RMSprop `differs from Tensorflow's <https://github.com/pytorch/pytorch/issues/23796>`_,
 which leads to `different and potentially more unstable results <https://github.com/DLR-RM/stable-baselines3/pull/110#issuecomment-663255241>`_.
 Use ``stable_baselines3.common.sb2_compat.rmsprop_tf_like.RMSpropTFLike`` optimizer to match the results
-with TensorFlow's implementation. This can be done through ``policy_kwargs``: ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, eps=1e-5))``
+with TensorFlow's implementation. This can be done through ``policy_kwargs``: ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5)))``


 PPO
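The fix comes down to where ``eps`` lands. The toy, pure-Python sketch below shows why it has to be nested in ``optimizer_kwargs``. ``HypotheticalRMSprop``, ``HypotheticalPolicy`` and ``build_policy`` are made-up stand-ins, not SB3 classes. The sketch assumes, as in SB3's policies, that the algorithm forwards ``policy_kwargs`` to the policy constructor and that the policy builds its optimizer as ``optimizer_class(params, lr=..., **optimizer_kwargs)``.

```python
class HypotheticalRMSprop:
    """Stand-in optimizer (hypothetical): eps is one of its hyperparameters."""

    def __init__(self, params, lr=1e-2, alpha=0.99, eps=1e-8):
        self.params = list(params)
        self.defaults = {"lr": lr, "alpha": alpha, "eps": eps}


class HypotheticalPolicy:
    """Stand-in policy (hypothetical): takes optimizer_class and optimizer_kwargs, but no eps."""

    def __init__(self, optimizer_class=HypotheticalRMSprop, optimizer_kwargs=None, lr=7e-4):
        optimizer_kwargs = optimizer_kwargs or {}
        # Only the nested optimizer_kwargs dict is unpacked into the optimizer constructor.
        self.optimizer = optimizer_class([0.0], lr=lr, **optimizer_kwargs)


def build_policy(policy_kwargs):
    # Hypothetical: the algorithm forwards policy_kwargs unchanged to the policy.
    return HypotheticalPolicy(**policy_kwargs)


# Fixed guidance: eps is nested in optimizer_kwargs and reaches the optimizer.
policy = build_policy(dict(optimizer_class=HypotheticalRMSprop, optimizer_kwargs=dict(eps=1e-5)))
print(policy.optimizer.defaults["eps"])  # 1e-05

# Old guidance: eps is a top-level policy kwarg, which this policy constructor rejects.
old_form_error = None
try:
    build_policy(dict(optimizer_class=HypotheticalRMSprop, eps=1e-5))
except TypeError as err:
    old_form_error = err
    print("TypeError:", err)
```

In this toy the old form fails with a ``TypeError``. The broader point is that ``eps`` is an optimizer argument, not a policy argument, so it has to travel inside ``optimizer_kwargs``.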
@@ -54,6 +54,7 @@ Documentation:
 - Updated ``BaseAlgorithm.load`` docstring (@Demetrio92)
 - Added a note on ``load`` behavior in the examples (@Demetrio92)
 - Updated SB3 Contrib doc
+- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)

 Release 1.3.0 (2021-10-23)
 ---------------------------
@@ -858,4 +859,4 @@ And all the contributors:
 @ShangqunYu @PierreExeter @JacopoPan @ltbd78 @tom-doerr @Atlis @liusida @09tangriro @amy12xx @juancroldan
 @benblack769 @bstee615 @c-rizz @skandermoalla @MihaiAnca13 @davidblom603 @ayeright @cyprienc
 @wkirgsn @AechPro @CUN-bjy @batu @IljaAvadiev @timokau @kachayev @cleversonahum
-@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92
+@eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92 @thomasgubler
@@ -14,7 +14,7 @@ It uses multiple workers to avoid the use of a replay buffer.

 If you find training unstable or want to match performance of stable-baselines A2C, consider using
 ``RMSpropTFLike`` optimizer from ``stable_baselines3.common.sb2_compat.rmsprop_tf_like``.
-You can change optimizer with ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, eps=1e-5))``.
+You can change optimizer with ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5)))``.
 Read more `here <https://github.com/DLR-RM/stable-baselines3/pull/110#issuecomment-663255241>`_.

