stable-baselines3/stable_baselines3/ppo
Rohan Tangri 2ada2dd0b2
Update PPO KL Divergence Estimator (#419)
* remove unused all_kl_divs memory

* new kl approximate equation

* move kl check before update step

* update changelog

* add continue_training flag update to kl check

* add verbose check

* update changelog

* lint with black

* r -> log_ratio

* Add link to PR

* invert ratio

* Fix for Sphinx v4.0

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-10 13:21:00 +03:00
__init__.py   Auto-formatting with black and isort (#97)    2020-07-16 16:12:16 +02:00
policies.py   Auto-formatting with black and isort (#97)    2020-07-16 16:12:16 +02:00
ppo.py        Update PPO KL Divergence Estimator (#419)     2021-05-10 13:21:00 +03:00
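
The commit message for #419 mentions a new approximate KL equation, a rename of r to log_ratio, and a KL check moved before the update step with a continue_training flag. The sketch below illustrates one way these pieces could fit together. It is not the code from ppo.py: it uses plain Python lists instead of tensors, the function names approx_kl and should_continue are hypothetical, and both the estimator form mean((exp(log_ratio) - 1) - log_ratio) and the 1.5 threshold factor are assumptions made for illustration.

```python
import math


def approx_kl(new_log_probs, old_log_probs):
    """Estimate the KL divergence between the old and new policy.

    Assumes the actions were sampled under the old policy and that the
    estimator is mean((exp(log_ratio) - 1) - log_ratio). Each term is
    non-negative, so the estimate never goes below zero.
    """
    total = 0.0
    for new_lp, old_lp in zip(new_log_probs, old_log_probs):
        log_ratio = new_lp - old_lp  # log(pi_new / pi_old)
        total += (math.exp(log_ratio) - 1.0) - log_ratio
    return total / len(new_log_probs)


def should_continue(approx_kl_div, target_kl, factor=1.5):
    """Return False when the estimate exceeds factor * target_kl.

    The factor of 1.5 is an assumption of this sketch. A target_kl of
    None disables the check entirely.
    """
    if target_kl is None:
        return True
    return approx_kl_div <= factor * target_kl


# Hypothetical minibatch of log-probabilities.
old_lp = [-1.0, -0.5, -2.0]
new_lp = [-0.9, -0.7, -1.8]

kl = approx_kl(new_lp, old_lp)
continue_training = should_continue(kl, target_kl=0.01)
if not continue_training:
    # Checked before the optimizer step, so the update is skipped.
    print(f"Early stopping due to reaching max kl: {kl:.4f}")
```

Running the check before the gradient step means a minibatch that has already drifted too far from the old policy does not contribute one more update.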