diff --git a/docs/conda_env.yml b/docs/conda_env.yml
index ac065b3..c9b1392 100644
--- a/docs/conda_env.yml
+++ b/docs/conda_env.yml
@@ -14,6 +14,6 @@ dependencies:
   - pandas
   - numpy>=1.20,<2.0
   - matplotlib
-  - sphinx>=5,<8
+  - sphinx>=5,<9
   - sphinx_rtd_theme>=1.3.0
   - sphinx_copybutton
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
index cf2a2a5..23a3413 100644
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -59,6 +59,7 @@ Bug Fixes:
 `SBX`_ (SB3 + Jax)
 ^^^^^^^^^^^^^^^^^^
 - Added CNN support for DQN
+- Bug fix for SAC and related algorithms, optimize log of ent coeff to be consistent with SB3
 
 Deprecations:
 ^^^^^^^^^^^^^
@@ -80,6 +81,7 @@ Documentation:
 ^^^^^^^^^^^^^^
 - Updated PPO doc to recommend using CPU with ``MlpPolicy``
 - Clarified documentation about planned features and citing software
+- Added a note about the fact we are optimizing log of ent coeff for SAC
 
 Release 2.3.2 (2024-04-27)
 --------------------------
diff --git a/docs/modules/dqn.rst b/docs/modules/dqn.rst
index 85d4866..78f70f6 100644
--- a/docs/modules/dqn.rst
+++ b/docs/modules/dqn.rst
@@ -25,6 +25,7 @@ Notes
 
 - Original paper: https://arxiv.org/abs/1312.5602
 - Further reference: https://www.nature.com/articles/nature14236
+- Tutorial "From Tabular Q-Learning to DQN": https://github.com/araffin/rlss23-dqn-tutorial
 
 .. note::
     This implementation provides only vanilla Deep Q-Learning and has no extensions such as Double-DQN, Dueling-DQN and Prioritized Experience Replay.
diff --git a/docs/modules/sac.rst b/docs/modules/sac.rst
index 960a282..cf6191b 100644
--- a/docs/modules/sac.rst
+++ b/docs/modules/sac.rst
@@ -35,6 +35,9 @@ Notes
     which is the equivalent to the inverse of reward scale in the original SAC paper.
     The main reason is that it avoids having too high errors when updating the Q functions.
 
+.. note::
+    When automatically adjusting the temperature (alpha/entropy coefficient), we optimize the logarithm of the entropy coefficient instead of the entropy coefficient itself. This is consistent with the original implementation and has proven to be more stable
+    (see issues `GH#36 `_, `#55 `_ and others).
 
 .. note::
 
diff --git a/setup.py b/setup.py
index 52f6264..15c02cb 100644
--- a/setup.py
+++ b/setup.py
@@ -101,7 +101,7 @@ setup(
             "black>=24.2.0,<25",
         ],
         "docs": [
-            "sphinx>=5,<8",
+            "sphinx>=5,<9",
             "sphinx-autobuild",
             "sphinx-rtd-theme>=1.3.0",
             # For spelling
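
For readers of the new SAC note, here is a minimal sketch of what "optimizing the logarithm of the entropy coefficient" can look like. It is not the SB3 or SBX code. It assumes a temperature loss of the form `-log_ent_coef * mean(log_prob + target_entropy)` and uses a plain gradient-descent step instead of a real optimizer. The function name `sgd_step_log_ent_coef` and its parameters are hypothetical and used for illustration only.

```python
import math


def sgd_step_log_ent_coef(log_ent_coef, log_probs, target_entropy, learning_rate=0.01):
    """One plain SGD step on the log of the entropy coefficient (illustrative only).

    Assumed loss: L(log_ent_coef) = -log_ent_coef * mean(log_prob + target_entropy),
    where log_probs are treated as constants (no gradient flows into the policy).
    """
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    # dL / d(log_ent_coef) = -mean(log_prob + target_entropy)
    grad = -mean_term
    return log_ent_coef - learning_rate * grad


# Hypothetical usage: the learned parameter is log_ent_coef, and the coefficient
# used in the actor/critic losses is recovered with exp(), so it stays positive.
log_ent_coef = 0.0  # corresponds to ent_coef = 1.0
log_ent_coef = sgd_step_log_ent_coef(
    log_ent_coef, log_probs=[-0.5, -0.5], target_entropy=-1.0, learning_rate=0.1
)
ent_coef = math.exp(log_ent_coef)
```

Because the learned parameter is the logarithm, `ent_coef` can never become negative, whatever the step size. In this sketch, the coefficient shrinks when the sampled entropy estimate (`-mean(log_probs)`) is above the target and grows when it is below.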