mirror of https://github.com/saymrwulf/stable-baselines3.git
synced 2026-05-14 20:58:03 +00:00
Add note about SAC ent coeff optimization (#2037)

* Allow new sphinx version
* Add note about SAC ent coeff and add DQN tutorial link
parent 8f0b488bc5
commit e4f4f123e3
5 changed files with 8 additions and 2 deletions

@@ -14,6 +14,6 @@ dependencies:
- pandas
- numpy>=1.20,<2.0
- matplotlib
- sphinx>=5,<8
- sphinx>=5,<9
- sphinx_rtd_theme>=1.3.0
- sphinx_copybutton

@@ -59,6 +59,7 @@ Bug Fixes:
`SBX`_ (SB3 + Jax)
^^^^^^^^^^^^^^^^^^
- Added CNN support for DQN
- Bug fix for SAC and related algorithms, optimize log of ent coeff to be consistent with SB3

Deprecations:
^^^^^^^^^^^^^

@@ -80,6 +81,7 @@ Documentation:
^^^^^^^^^^^^^^
- Updated PPO doc to recommend using CPU with ``MlpPolicy``
- Clarified documentation about planned features and citing software
- Added a note about the fact we are optimizing log of ent coeff for SAC

Release 2.3.2 (2024-04-27)
--------------------------

@@ -25,6 +25,7 @@ Notes

- Original paper: https://arxiv.org/abs/1312.5602
- Further reference: https://www.nature.com/articles/nature14236
- Tutorial "From Tabular Q-Learning to DQN": https://github.com/araffin/rlss23-dqn-tutorial

.. note::
This implementation provides only vanilla Deep Q-Learning and has no extensions such as Double-DQN, Dueling-DQN and Prioritized Experience Replay.
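
The DQN note above says the implementation is vanilla Deep Q-Learning without Double-DQN. As a hedged illustration of what that difference means for the bootstrap target, the sketch below computes both targets for a single transition from plain Python lists of Q-values. The function names and numbers are hypothetical and are not part of the SB3 API. SB3 itself works on batched PyTorch tensors.

```python
from typing import Sequence


def vanilla_dqn_target(
    reward: float,
    done: bool,
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    bootstrap = max(q_target_next)
    # No bootstrapping from terminal states.
    return reward + gamma * (1.0 - float(done)) * bootstrap


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Double DQN (an extension SB3's DQN does not include):
    the online network selects the action, the target network evaluates it."""
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    bootstrap = q_target_next[best_action]
    return reward + gamma * (1.0 - float(done)) * bootstrap


if __name__ == "__main__":
    q_online_next = [1.0, 3.0, 2.0]  # hypothetical Q(s', .) from the online network
    q_target_next = [2.5, 1.0, 4.0]  # hypothetical Q(s', .) from the target network
    print(vanilla_dqn_target(1.0, False, q_target_next))  # 1.0 + 0.99 * max(...) = 1.0 + 0.99 * 4.0
    print(double_dqn_target(1.0, False, q_online_next, q_target_next))  # 1.0 + 0.99 * 1.0
    print(vanilla_dqn_target(1.0, True, q_target_next))  # terminal transition: reward only
```

The two targets differ whenever the online and target networks disagree on the best next action. Reducing the overestimation bias of the max operator is the usual motivation for Double DQN.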

@@ -35,6 +35,9 @@ Notes
which is the equivalent to the inverse of reward scale in the original SAC paper.
The main reason is that it avoids having too high errors when updating the Q functions.

.. note::
When automatically adjusting the temperature (alpha/entropy coefficient), we optimize the logarithm of the entropy coefficient instead of the entropy coefficient itself. This is consistent with the original implementation and has proven to be more stable
(see issues `GH#36 <https://github.com/DLR-RM/stable-baselines3/issues/36>`_, `#55 <https://github.com/araffin/sbx/issues/55>`_ and others).

.. note::

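
The SAC note above says the temperature is tuned through its logarithm. The following is a minimal pure-Python sketch of one practical effect of that choice. It assumes a single scalar gradient step and a commonly used form of the temperature loss, `-param * (log_prob + target_entropy)`, with the log-probability term treated as a constant. A step taken on `log(alpha)` can never make alpha negative, while the same step taken on alpha directly can. The function names, learning rate and numbers are illustrative assumptions, not SB3 code. In SB3 this update is done by a PyTorch optimizer rather than by hand.

```python
import math


def temperature_grad(mean_log_prob: float, target_entropy: float) -> float:
    """Gradient of -param * (mean_log_prob + target_entropy) w.r.t. param.

    The log-probability term is treated as a constant (no gradient into the policy).
    """
    return -(mean_log_prob + target_entropy)


def step_alpha_directly(alpha: float, grad: float, lr: float) -> float:
    """Plain gradient step on alpha itself: nothing keeps alpha positive."""
    return alpha - lr * grad


def step_log_alpha(alpha: float, grad: float, lr: float) -> float:
    """Gradient step on log(alpha), mapped back with exp: alpha stays > 0."""
    log_alpha = math.log(alpha) - lr * grad
    return math.exp(log_alpha)


if __name__ == "__main__":
    target_entropy = -1.0  # assumption: -dim(A) for a 1-D action space, a common heuristic
    mean_log_prob = -3.0   # entropy estimate of about 3 nats, above the target
    grad = temperature_grad(mean_log_prob, target_entropy)  # positive, so alpha should shrink
    print(step_alpha_directly(0.1, grad, lr=0.1))  # overshoots below zero
    print(step_log_alpha(0.1, grad, lr=0.1))       # shrinks but stays positive
```

Because the step is taken in log space, each update rescales alpha multiplicatively (by `exp(-lr * grad)`) instead of shifting it by a fixed amount.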

setup.py (2 lines changed)

@@ -101,7 +101,7 @@ setup(
"black>=24.2.0,<25",
],
"docs": [
"sphinx>=5,<8",
"sphinx>=5,<9",
"sphinx-autobuild",
"sphinx-rtd-theme>=1.3.0",
# For spelling
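
The commit relaxes the Sphinx pin from `>=5,<8` to `>=5,<9` in both the conda environment and the `docs` extra of `setup.py`, which makes Sphinx 8.x releases installable. Below is a simplified sketch of that range check. It assumes purely numeric dotted versions, with no pre-releases, epochs or local tags. The helper names and candidate version strings are hypothetical. For real PEP 440 handling, the `packaging` library's `SpecifierSet` is the appropriate tool.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as "7.4.7" (simplified, not PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version < upper, mirroring a ">=lower,<upper" pin."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ["4.5.0", "7.4.7", "8.0.2", "9.0.0"]:  # hypothetical candidates
        old_pin = in_range(candidate, "5", "8")  # sphinx>=5,<8
        new_pin = in_range(candidate, "5", "9")  # sphinx>=5,<9
        print(f"sphinx {candidate}: old pin {old_pin}, new pin {new_pin}")
```

Only the 8.x candidate changes status between the two pins. The lower bound and the exclusion of 9.x are unchanged.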