stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-27 22:55:17 +00:00

Author	SHA1	Message	Date
Rohan Tangri	2ada2dd0b2	Update PPO KL Divergence Estimator (#419) * remove unused all_kl_divs memory * new kl approximate equation * move kl check before update step * update changelog * add continue_training flag update to kl check * add verbose check * update changelog * lint with black * r -> log_ratio * Add link to PR * invert ratio * Fix for Sphinx v4.0 Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-05-10 13:21:00 +03:00
Cody Wild	b1aee71772	Improve error messages when PPO effective batch size is 1 and when last mini-batch is truncated (#270) * Add warning about total_env_steps not dividing neatly into batch size * Stylistic cleanup * Black reformatting * Add clearer documentation and update changelog * Update changelog.rst * Use specific RolloutBuffer terminology Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Change to minibatch language Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Cleaning up language describing rollout buffer requirements Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Switch to using env.num_envs * Working tests * Black and isort still fighting each other * codestyle finally happy * Basic test exists, possibly in the wrong file * Update phrasing Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-01-11 17:03:32 +01:00
Antonin RAFFIN	2b9fc1f923	Add supported action spaces checks (#254) * Add supported action spaces checks * Address comment	2020-12-06 14:05:10 +02:00
Antonin RAFFIN	d04aad2a20	Doc fixes and add `monitor_kwargs` parameter (#230) * Fix type annotation * Fix migration doc for A2C * Update version * Add `monitor_kwargs` argument * Update docs/guide/migration.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Fix make atari env * Fix docstring * Renamed LearningRateSchedule Co-authored-by: Adam Gleave <adam@gleave.me>	2020-11-20 10:28:54 +01:00
thisray	5ddda44a74	Fix arguments order of `explained_variance()` (#227) * Fix for arguments order in explained_variance() Fix for arguments order in explained_variance() in PPO * Fix for arguments order in explained_variance() Fix for arguments order in explained_variance() in a2c * Fix for arguments order in explained_variance() update changelog.rst	2020-11-16 16:27:46 +01:00
M. Ernestus	c74509ae9d	Add callable signatures to type annotations. (#215) * Add callback signature to the learning rate type annotations. * Add callback signature to the learning rate schedule type annotations. * Add missing type annotations for learning rate callbacks. * Add signature to old-style learning and evaluation callbacks. * Add signature to env wrapper callback. * Add type annotation to closure function. * Use MaybeCallback more consistently. * Update changelog. * Remove now unused List import. * Fix import order. * Add type alias for learning rate schedules. * Optimize imports. * Fix messed up import. * Remove resolved TODO. Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-15 17:50:28 +01:00
Stefan Heid	9d463bc476	Small docstring improvements related to the notion of Rollout (#206) * Small docstring improvements related to the notion of Rollout * documented changes in changelog.rst, added myself to contributers * Minor edits Co-authored-by: Stefan Heid <stefan.heid@upb.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-02 11:45:08 +01:00
Antonin RAFFIN	55912576ed	Cleanup docstring types (#169) * Cleanup docstring types * Update style * Test with js hack * Revert "Test with js hack" This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69. * Fix types * Fix typo * Update CONTRIBUTING example	2020-10-02 20:05:55 +03:00
Vsevolod Kompantsev	4fd408bec2	Fix PPO logging of clip_fractions (#150) * bugfix for PPO logging of clip_fractions * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-09-01 09:52:31 +02:00
Andy Shih	8f9aaaebe9	fix approximate entropy calculation in PPO and A2C (#130)	2020-07-29 21:19:41 +02:00
Antonin RAFFIN	23afedb254	Auto-formatting with black and isort (#97) * Add auto formatting with black and isort * Reformat code * Ignore typing errors * Add note about line length * Add minimum version for isort * Add commit-checks * Update docker image * Fixed lost import (during last merge) * Fix opencv dependency	2020-07-16 16:12:16 +02:00
Anssi	44f8218df0	Review of code (A2C, PPO and refactoring) (#35) * Split torch module code into torch_layers file * Updated reference to CNN * Change 'CxWxH' to 'CxHxW', as per common notion * Fix missing import in policies.py * Move PPOPolicy to OnlineActorCriticPolicy * Create OnPolicyRLModel from PPO, and make A2C and PPO inherit * Update A2C optimizer comment * Clean weight init scales for clarity * Fix A2C log_interval default parameter * Rename 'progress' to 'progress_remaining * Rename 'Models' to 'Algorithms' * Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy' * Move static functions out from BaseAlgorithm * Move on/off_policy base algorithms to their own files * Add files for A2C/PPO * Fix docs * Fix pytype * Update documentation on OnPolicyAlgorithm * Add proper doctstring for on_policy rollout gathering * Add bit clarification on the mlppolicy/cnnpolicy naming * Move static function is_vectorized_policies to utils.py * Checking docstrings, pep8 fixes * Update changelog * Clean changelog * Remove policy warnings for sac/td3 * Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-09 13:54:18 +02:00
Roland Gavrilescu	bb01253261	Tensorboard integration (#30) * init commit tensorboard-integration * Added tb logger to ppo (with output exclusions) * fixed truncated stdout * categorize stdout outputs by tag * separated exclusions from values, added missing logs * saving exclusions as dict instead of list * reformatting, auto run indexing * included renaming suggestions, fixed tests * tb support for sac * linting * moved logging to base class * tb support for td3 * removed histograms, non-verbose output working * modifed changelog * linting * fixed type error * moved logger config to utils * removed episode_rewards log from ppo * Enable tensorboard in tests * Remove unused import * Update logger sub titles * Minor edit for PPO * Update logger and tb log folder * Pass correct logger to Callbacks * updated docs * added tb example image to docs * add support for continuing training in tensorboard * added tensorboard to docs index * added tb test * moved logger config to _setup_learn, updated tests * accessing verbose from base class * Update doc and tests * Rename session -> time * Update version * Update logger truncate * Update types * Remove duplicated code Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-01 11:55:44 +02:00
Roland Gavrilescu	91adefdb4b	Support for MultiBinary / MultiDiscrete spaces (#13) * multicategorical dist and test * fixed List annotation * bernoulli dist and test * added distributions to preprocessing (needs testing) * fixed and tested distributions * added changelog and fixed ppo policy * minor fix * dist fixes, added test_spaces * clean up * modified changelog * additional fixes * minor changelog mod * hot encoding fix, flake8 clean up * lint tests * preprocessing fix * fixed bernoulli bug * removed commented prints * Update changelog.rst * included suggested modifications * linting fix * increased space dim * Update doc and tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-05-18 14:42:13 +02:00
Antonin RAFFIN	15ff6d47ee	Documentation update and style fixes (#21) * Update doc: add gSDE * Fix codestyle * Remove travis script * Add lint check to gitlab	2020-05-15 13:54:06 +02:00
Antonin RAFFIN	413a2386d9	Cleanup + reformat code	2020-05-08 15:10:46 +02:00
Antonin RAFFIN	d17f29c8ad	Add base doc	2020-05-07 10:10:51 +02:00
Antonin RAFFIN	d542732c8d	Rename to stable-baselines3	2020-05-05 15:02:35 +02:00

18 commits
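The most recent commit (#419) swaps PPO's approximate KL estimate for a "new kl approximate equation", renames `r` to `log_ratio`, and moves the KL check before the update step. Below is a minimal NumPy sketch of an estimator of that kind. It assumes the low-variance form mean((r - 1) - log r), where r = pi_new / pi_old is evaluated on actions sampled from the old policy. The helper names `approx_kl` and `should_stop_early`, and the 1.5 tolerance factor, are illustrative assumptions rather than the library's API. For the exact expression, see the PR itself.

```python
import numpy as np


def approx_kl(old_log_prob: np.ndarray, new_log_prob: np.ndarray) -> float:
    """Estimate KL(pi_old || pi_new) from log-probs of actions sampled under pi_old.

    Assumed estimator: mean((r - 1) - log r) with r = pi_new / pi_old.
    Every term is >= 0 because log r <= r - 1, so the estimate is never negative.
    """
    log_ratio = new_log_prob - old_log_prob  # log r, computed in log space
    ratio = np.exp(log_ratio)                # r
    return float(np.mean((ratio - 1.0) - log_ratio))


def should_stop_early(kl: float, target_kl, factor: float = 1.5) -> bool:
    """Hypothetical helper: skip further gradient steps once the KL estimate
    exceeds factor * target_kl. A target_kl of None disables the check."""
    return target_kl is not None and kl > factor * target_kl


if __name__ == "__main__":
    old = np.log(np.array([0.5, 0.2, 0.3]))
    new = np.log(np.array([0.4, 0.3, 0.3]))
    kl = approx_kl(old, new)
    print(kl, should_stop_early(kl, target_kl=0.01))
```

Running the check before the optimizer step, as the commit message describes, means a batch that has already drifted too far from the rollout policy stops the epoch loop before it can contribute another gradient update.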