Merge branch 'master' into sde

2026-06-29 03:31:08 +00:00 · 2020-07-30 09:59:49 +02:00 · 2020-07-30 09:59:49 +02:00 · 151104c07d
commit 151104c07d
parent d2fb9b66b4 8f9aaaebe9
3 changed files with 4 additions and 3 deletions
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -32,6 +32,7 @@ Bug Fixes:
 - Fix target for updating q values in SAC: the entropy term was not conditioned by terminals states
 - Use ``cloudpickle.load`` instead of ``pickle.load`` in ``CloudpickleWrapper``. (@shwang)
 - Fixed a bug with orthogonal initialization when `bias=False` in custom policy (@rk37)
+- Fixed approximate entropy calculation in PPO and A2C. (@andyshih12)

 Deprecations:
 ^^^^^^^^^^^^^
@@ -355,4 +356,4 @@ And all the contributors:
@Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket
@MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow @Antymon @seheevic @justinkterry @edbeeching
@flodorner @KuKuXia @NeoExtended @PartiallyTyped @mmcenta @richardwu @kinalmehta @rolandgvc @tkelestemur @mloo3
-@tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37
+@tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37 @andyshih12
--- a/stable_baselines3/a2c/a2c.py
+++ b/stable_baselines3/a2c/a2c.py
@@ -141,7 +141,7 @@ class A2C(OnPolicyAlgorithm):
            # Entropy loss favor exploration
            if entropy is None:
                # Approximate entropy when no analytical form
-                entropy_loss = -log_prob.mean()
+                entropy_loss = -th.mean(-log_prob)
            else:
                entropy_loss = -th.mean(entropy)

--- a/stable_baselines3/ppo/ppo.py
+++ b/stable_baselines3/ppo/ppo.py
@@ -198,7 +198,7 @@ class PPO(OnPolicyAlgorithm):
                # Entropy loss favor exploration
                if entropy is None:
                    # Approximate entropy when no analytical form
-                    entropy_loss = -log_prob.mean()
+                    entropy_loss = -th.mean(-log_prob)
                else:
                    entropy_loss = -th.mean(entropy)
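
Note on the sign change in both hunks: when no analytical entropy is available, the entropy can be approximated from sampled actions as mean(-log_prob). The entropy loss should be the negative of that estimate, so that minimizing the loss raises entropy, matching the analytical branch `-th.mean(entropy)`. The old expression `-log_prob.mean()` equals the entropy estimate itself, so minimizing it would reduce entropy. The sketch below checks this in plain Python rather than PyTorch; the three-action policy and the helper names (`approx_entropy`, `entropy_loss_old`, `entropy_loss_new`) are made up for illustration and are not part of the library.

```python
import math
import random


def approx_entropy(log_probs):
    """Monte Carlo entropy estimate: mean of -log_prob over sampled actions."""
    return sum(-lp for lp in log_probs) / len(log_probs)


def entropy_loss_old(log_probs):
    # Mirrors the removed line: -log_prob.mean()
    return -(sum(log_probs) / len(log_probs))


def entropy_loss_new(log_probs):
    # Mirrors the added line: -th.mean(-log_prob)
    return -approx_entropy(log_probs)


# Hypothetical categorical policy over three actions (illustration only).
probs = [0.7, 0.2, 0.1]
exact_entropy = -sum(p * math.log(p) for p in probs)

# Sample actions from the policy and record their log-probabilities.
rng = random.Random(0)
actions = rng.choices(range(len(probs)), weights=probs, k=10_000)
log_probs = [math.log(probs[a]) for a in actions]

print("exact entropy:    ", exact_entropy)
print("sampled estimate: ", approx_entropy(log_probs))
print("old entropy_loss: ", entropy_loss_old(log_probs))  # positive: equals the entropy estimate
print("new entropy_loss: ", entropy_loss_new(log_probs))  # negative: minus the entropy estimate
```

With the new sign, the fallback branch and the analytical branch agree: both return minus an entropy value, so a positive entropy coefficient rewards exploration in either case.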