stable-baselines3/stable_baselines3
Rohan Tangri 2ada2dd0b2
Update PPO KL Divergence Estimator (#419)
* remove unused all_kl_divs memory

* new approximate KL divergence equation

* move KL check before update step

* update changelog

* add continue_training flag update to KL check

* add verbose check

* update changelog

* lint with black

* r -> log_ratio

* Add link to PR

* invert ratio

* Fix for Sphinx v4.0

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-10 13:21:00 +03:00
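The "new kl approximate equation", "invert ratio" and "move kl check before update step" items above concern the KL estimate that PPO uses for early stopping. What follows is a minimal NumPy sketch of that idea. It is not the stable-baselines3 code. It assumes per-sample log-probabilities of the rollout actions under the old (rollout) policy and the current policy. It uses the estimator mean((r - 1) - log r), where log r = log pi_new - log pi_old, which estimates KL(pi_old || pi_new) from samples drawn under pi_old. The helper names `approx_kl` and `should_stop_early` are hypothetical, and so is the default `factor=1.5`.

```python
import numpy as np
from typing import Optional


def approx_kl(new_log_prob: np.ndarray, old_log_prob: np.ndarray) -> float:
    """Sample-based estimate of KL(pi_old || pi_new).

    Assumes both arrays hold log-probabilities of the SAME actions, and that
    those actions were sampled from the old (rollout) policy.
    """
    # log r = log pi_new(a|s) - log pi_old(a|s)
    log_ratio = new_log_prob - old_log_prob
    # Per-sample term (r - 1) - log r is >= 0 because log x <= x - 1.
    # np.expm1 computes r - 1 accurately when r is close to 1.
    return float(np.mean(np.expm1(log_ratio) - log_ratio))


def should_stop_early(kl: float, target_kl: Optional[float], factor: float = 1.5) -> bool:
    """Return True when the KL estimate exceeds factor * target_kl.

    target_kl=None disables early stopping. The 1.5 factor is an assumption
    of this sketch.
    """
    return target_kl is not None and kl > factor * target_kl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative log-probabilities only: the "new" policy is a small perturbation of the old one.
    old = np.log(rng.uniform(0.1, 1.0, size=64))
    new = old + rng.normal(0.0, 0.1, size=64)
    kl = approx_kl(new, old)
    print(f"approx KL = {kl:.4f}, stop = {should_stop_early(kl, target_kl=0.01)}")
```

In a training loop, this sketch reads "move kl check before update step" as running the check on each minibatch before the optimizer step. An update that has already drifted past the threshold is then skipped rather than applied, and a `continue_training`-style flag ends the remaining epochs.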
a2c Add supported action spaces checks (#254) 2020-12-06 14:05:10 +02:00
common Policy Base for On-policy Algorithms (#412) (#415) 2021-05-04 12:59:36 +03:00
ddpg Fix default arguments + add bugbear (#363) 2021-03-25 11:35:21 +02:00
dqn Fix default arguments + add bugbear (#363) 2021-03-25 11:35:21 +02:00
her Fix for HER with custom objects (#343) 2021-03-06 15:57:27 +01:00
ppo Update PPO KL Divergence Estimator (#419) 2021-05-10 13:21:00 +03:00
sac Fix default arguments + add bugbear (#363) 2021-03-25 11:35:21 +02:00
td3 Fix default arguments + add bugbear (#363) 2021-03-25 11:35:21 +02:00
__init__.py Implement HER (#120) 2020-10-22 11:56:43 +02:00
py.typed Rename to stable-baselines3 2020-05-05 15:02:35 +02:00
version.txt Fixed saving of A2C and PPO policy when using gSDE (#401) 2021-04-19 12:23:02 +02:00