Add note about SAC ent coeff optimization (#2037)

* Allow new sphinx version

* Add note about SAC ent coeff and add DQN tutorial link
Antonin RAFFIN 2024-11-08 11:01:04 +01:00 committed by GitHub
parent 8f0b488bc5
commit e4f4f123e3
5 changed files with 8 additions and 2 deletions


@@ -14,6 +14,6 @@ dependencies:
- pandas
- numpy>=1.20,<2.0
- matplotlib
- sphinx>=5,<8
- sphinx>=5,<9
- sphinx_rtd_theme>=1.3.0
- sphinx_copybutton


@@ -59,6 +59,7 @@ Bug Fixes:
`SBX`_ (SB3 + Jax)
^^^^^^^^^^^^^^^^^^
- Added CNN support for DQN
- Bug fix for SAC and related algorithms, optimize log of ent coeff to be consistent with SB3
Deprecations:
^^^^^^^^^^^^^
@@ -80,6 +81,7 @@ Documentation:
^^^^^^^^^^^^^^
- Updated PPO doc to recommend using CPU with ``MlpPolicy``
- Clarified documentation about planned features and citing software
- Added a note about the fact we are optimizing log of ent coeff for SAC
Release 2.3.2 (2024-04-27)
--------------------------


@@ -25,6 +25,7 @@ Notes
- Original paper: https://arxiv.org/abs/1312.5602
- Further reference: https://www.nature.com/articles/nature14236
- Tutorial "From Tabular Q-Learning to DQN": https://github.com/araffin/rlss23-dqn-tutorial
.. note::
This implementation provides only vanilla Deep Q-Learning and has no extensions such as Double-DQN, Dueling-DQN and Prioritized Experience Replay.


@@ -35,6 +35,9 @@ Notes
which is the equivalent to the inverse of reward scale in the original SAC paper.
The main reason is that it avoids having too high errors when updating the Q functions.
.. note::
When automatically adjusting the temperature (alpha/entropy coefficient), we optimize the logarithm of the entropy coefficient instead of the entropy coefficient itself. This is consistent with the original implementation and has proven to be more stable
(see issues `GH#36 <https://github.com/DLR-RM/stable-baselines3/issues/36>`_, `#55 <https://github.com/araffin/sbx/issues/55>`_ and others).
.. note::
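
The SAC note added in this hunk can be illustrated with a minimal sketch. It assumes the usual temperature loss, `-(log_ent_coef * (log_prob + target_entropy))` averaged over a batch, with the policy term treated as a constant. The helper name `update_log_ent_coef`, the plain gradient-descent step, and all numeric values are hypothetical and chosen only for illustration; this is not the library's code, which relies on an optimizer and automatic differentiation.

```python
import math


def update_log_ent_coef(log_ent_coef, log_probs, target_entropy, lr=3e-4):
    """One plain gradient-descent step on the log of the entropy coefficient.

    Hypothetical helper: the loss is assumed to be
    -(log_ent_coef * mean(log_prob + target_entropy)),
    with the log-probabilities treated as constants.
    """
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_term  # derivative of the loss w.r.t. log_ent_coef
    return log_ent_coef - lr * grad


# Hypothetical values: initial coefficient 1.0, a 2D action space, a tiny batch.
log_ent_coef = math.log(1.0)
target_entropy = -2.0
log_probs = [-0.5, -1.0, -0.75]

for _ in range(100):
    log_ent_coef = update_log_ent_coef(log_ent_coef, log_probs, target_entropy, lr=0.1)

# Recover the coefficient by exponentiating, so it can never become negative.
ent_coef = math.exp(log_ent_coef)
```

Because the update acts on the logarithm, the recovered coefficient stays strictly positive however large a step is taken, which is one plausible reading of why this parameterization is reported as more stable.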


@@ -101,7 +101,7 @@ setup(
"black>=24.2.0,<25",
],
"docs": [
"sphinx>=5,<8",
"sphinx>=5,<9",
"sphinx-autobuild",
"sphinx-rtd-theme>=1.3.0",
# For spelling