stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-07-01 03:45:11 +00:00

Author	SHA1	Message	Date
AptX395	06498e8be7	Update the code in Example. (#273) Replace `Pendulum-v0` with `CartPole-v0`, otherwise the sample code will not run normally.	2021-01-04 14:24:38 +02:00
Antonin RAFFIN	944dfdafe4	Update doc: SB3-Contrib (#267) * Fix big when saving/loading q-net alone * Rename variables to match SB3-contrib * Update docker image * Set min version for tensorboard * Add SB3-Contrib to doc * Update DQN * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> * Update wording Co-authored-by: Adam Gleave <adam@gleave.me>	2020-12-21 16:17:24 +01:00
Lucas Alegre	b8c72a5348	Add SUMO-RL as example project in the docs (#257) * Add SUMO-RL as example project in the docs * Fixed docstring of AtariWrapper which was not inside of __init__ * Updated changelog regarding docs * Fix docstring of classes in atari_wrappers.py which were inside the constructor * Formated docstring with black Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-12-13 17:15:45 +01:00
M. Ernestus	e63e9d7d5e	Update github name of Maximilian Ernestus (#258) * update github name of Maximilian Ernestus * erniejunior -> ernestum * erniejunior -> ernestum * Update changelog.rst	2020-12-10 21:48:11 +01:00
Antonin RAFFIN	6b598323ae	Add eval success rate logging (#255) * Add eval success rate logging * Fix name clash * Log data * Bump version	2020-12-08 15:49:07 +01:00
Antonin RAFFIN	2b9fc1f923	Add supported action spaces checks (#254) * Add supported action spaces checks * Address comment	2020-12-06 14:05:10 +02:00
Antonin RAFFIN	e747e7e2b3	Add learning rate schedule example (#248) * Add learning rate schedule example * Update docs/guide/examples.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Address comments Co-authored-by: Adam Gleave <adam@gleave.me>	2020-12-02 14:54:18 +01:00
Antonin RAFFIN	723b341c61	Fix for saving big replay buffer, use pickle protocol>=4 (#239)	2020-11-24 16:13:00 +02:00
Antonin RAFFIN	3207bdab17	Automatically wrap with a Monitor when possible (#237) * Automatically wrap with a Monitor when possible * Update stable_baselines3/common/base_class.py Co-authored-by: Anssi <kaneran21@hotmail.com>	2020-11-20 18:08:00 +02:00
Megan Klaiber	852961139e	Fix bug with full HerReplayBuffer (#236) * Fix bug with full replay buffer * Updated changelog * Update tests/test_her.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-20 13:23:03 +01:00
Antonin RAFFIN	d04aad2a20	Doc fixes and add `monitor_kwargs` parameter (#230) * Fix type annotation * Fix migration doc for A2C * Update version * Add `monitor_kwargs` argument * Update docs/guide/migration.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Fix make atari env * Fix docstring * Renamed LearningRateSchedule Co-authored-by: Adam Gleave <adam@gleave.me>	2020-11-20 10:28:54 +01:00
Antonin RAFFIN	9069cf55f1	Fix DQN predict shape for single Gym env (#222) * Fix DQN predict shape for single Gym env * Remove unused imports	2020-11-17 00:43:26 +02:00
thisray	5ddda44a74	Fix arguments order of `explained_variance()` (#227) * Fix for arguments order in explained_variance() Fix for arguments order in explained_variance() in PPO * Fix for arguments order in explained_variance() Fix for arguments order in explained_variance() in a2c * Fix for arguments order in explained_variance() update changelog.rst	2020-11-16 16:27:46 +01:00
Anssi	18d10dbf42	Use Monitor episode reward/length for `evaluate_policy` (#220) * Update evaluate_policy to use monitor data if available * Update documentation * Cleaning up * Remove unnecessary typing trickery * Update doc * Rename is_wrapped to clarify it is for vecenvs * Add is_wrapped for regular envs * Add is_wrapped call for subprocvecenv and update code for circular imports * Move new functions back to env_util and fix imports * Update changelog * Clarify evaluate_policy docs * Add tests for wrapped modifying episode lengths * Fix tests * Update changelog * Minor edits * Add warn switch to evaluate_policy and update tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-16 11:52:28 +01:00
M. Ernestus	c74509ae9d	Add callable signatures to type annotations. (#215) * Add callback signature to the learning rate type annotations. * Add callback signature to the learning rate schedule type annotations. * Add missing type annotations for learning rate callbacks. * Add signature to old-style learning and evaluation callbacks. * Add signature to env wrapper callback. * Add type annotation to closure function. * Use MaybeCallback more consistently. * Update changelog. * Remove now unused List import. * Fix import order. * Add type alias for learning rate schedules. * Optimize imports. * Fix messed up import. * Remove resolved TODO. Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-15 17:50:28 +01:00
Anssi	e2b6f5460f	Avoid transposing channel-first envs (#213) * Add test for channel-first environments * Add support for channel-first envs, including more tests * Update changelog * Run black * Run black, again * Improve NatureCNN error message * Update image checks and FrameStack wrapper * Update tests * Update docs * Run isort * Reformat * Fixes: avoid breaking changes for non-image env * Add additional checks * Update docstring Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-03 12:34:09 +01:00
Stefan Heid	9d463bc476	Small docstring improvements related to the notion of Rollout (#206) * Small docstring improvements related to the notion of Rollout * documented changes in changelog.rst, added myself to contributers * Minor edits Co-authored-by: Stefan Heid <stefan.heid@upb.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-02 11:45:08 +01:00
Antonin RAFFIN	4a022dacd8	Add more issue templates (#211) * Update algo table * Add more templates * Add doc build in the checklist * Address comments	2020-11-02 10:43:25 +01:00
Antonin RAFFIN	b4188795f5	Release v0.10.0 (#204)	2020-10-28 13:01:56 +01:00
Antonin RAFFIN	897e98c4e2	Update documentation (#199) * Update doc and add new example * Add save/load replay buffer example * Add save format + export doc * Add example for get/set parameters * Typos and minor edits * Add results sections * Add note about performance * Add DDPG results * Address comments * Fix grammar/wording Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2020-10-28 09:55:16 +01:00
Antonin RAFFIN	6327cc6156	Fix env loading (#203) * Fix loading bug * [ci skip] Fix docstring	2020-10-27 23:12:52 +02:00
Antonin RAFFIN	0fc0dd1b21	Fix off policy features extractor (#198) * Faster tests * Fix feature extractor bug + add check * Add missing check * Allow TD3 features extractor to be separate * Add share features extractor option for SAC * Bug fixes * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Adam Gleave <adam@gleave.me>	2020-10-27 14:24:59 +01:00
Steven H. Wang	b252f4212c	Add imitation library docs (#200) * docs: Add imitation library docs * Fix doc syntax errors * Fix internal link; PDF->abstract for DAgger for consistency * Grammar * Update migration guide Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Adam Gleave <adam@gleave.me>	2020-10-24 17:33:26 +01:00
Megan Klaiber	dd6e361204	Implement HER (#120) * Added working her version, Online sampling is missing. * Updated test_her. * Added first version of online her sampling. Still problems with tensor dimensions. * Reformat * Fixed tests * Added some comments. * Updated changelog. * Add missing init file * Fixed some small bugs. * Reduced arguments for HER, small changes. * Added getattr. Fixed bug for online sampling. * Updated save/load funtions. Small changes. * Added her to init. * Updated save method. * Updated her ratio. * Move obs_wrapper * Added DQN test. * Fix potential bug * Offline and online her share same sample_goal function. * Changed lists into arrays. * Updated her test. * Fix online sampling * Fixed action bug. Updated time limit for episodes. * Updated convert_dict method to take keys as arguments. * Renamed obs dict wrapper. * Seed bit flipping env * Remove get_episode_dict * Add fast online sampling version * Added documentation. * Vectorized reward computation * Vectorized goal sampling * Update time limit for episodes in online her sampling. * Fix max episode length inference * Bug fix for Fetch envs * Fix for HER + gSDE * Reformat (new black version) * Added info dict to compute new reward. Check her_replay_buffer again. * Fix info buffer * Updated done flag. * Fixes for gSDE * Offline her version uses now HerReplayBuffer as episode storage. * Fix num_timesteps computation * Fix get torch params * Vectorized version for offline sampling. * Modified offline her sampling to use sample method of her_replay_buffer * Updated HER tests. * Updated documentation * Cleanup docstrings * Updated to review comments * Fix pytype * Update according to review comments. * Removed random goal strategy. Updated sample transitions. * Updated migration. Removed time signal removal. * Update doc * Fix potential load issue * Add VecNormalize support for dict obs * Updated saving/loading replay buffer for HER. * Fix test memory usage * Fixed save/load replay buffer. * Fixed save/load replay buffer * Fixed transition index after loading replay buffer in online sampling * Better error handling * Add tests for get_time_limit * More tests for VecNormalize with dict obs * Update doc * Improve HER description * Add test for sde support * Add comments * Add comments * Remove check that was always valid * Fix for terminal observation * Updated buffer size in offline version and reset of HER buffer * Reformat * Update doc * Remove np.empty + add doc * Fix loading * Updated loading replay buffer * Separate online and offline sampling + bug fixes * Update tensorboard log name * Version bump * Bug fix for special case Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-22 11:56:43 +02:00
Bernhard Raml	15e94a6d14	Add support to log videos via tensorboard (#196) * Add support to log videos via tensorboard The ability to look at renderings of agent's trajectories during training helps evaluate the performance of that agent. One can see what the agent actually does at various stages during training. For now only tensorboard is supported, as it is straightforward to implement. * Remove moviepy dependency from extra & doc update * Removed the moviepy dependency from the `extra` dependencies so the user can decide whether to install it or not * Update the video logging docu with proper naming, comments * Added a warning to the video logging docu explaining the moviepy dependency * Updated the video test, to check for a warning when moviepy is missing * Update doc * Update FormatUnsupportedError message * Also log the offending value making the error message more expressive * Fix reporting the correct format and update regression test * Use string description in FormatUnsupportedError * Instead of converting the value to string without the user's control the constructor takes a string representation of the value * Use string description in FormatUnsupportedError * Use a shorter string description for the error to reduce verbosity Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-22 11:33:58 +02:00
Anssi	19c1a89a3a	Rename cmd_util to env_util (#197) * Rename cmd_util to env_util * Fix docs and add missing newline * Address comments	2020-10-22 11:05:52 +02:00
Melvin Wang	856da19609	add check to ensure action space is non-dict non-tuple for env_checker nan check (#192) * add check to ensure action space is non-dict non-tuple for env_checker nan check * update changelog.rst * add regression test for new check * commit-checks * add more action space checks * update docstrings * add warning check	2020-10-19 00:23:51 +03:00
Anssi	37f48aa979	Fix initializing CUDA even when `device="cpu"` is used. (#194) * Fall back to 'cpu' device in policies instead of 'auto' * Update changelog	2020-10-18 20:51:56 +02:00
Bernhard Raml	97b81f9e9e	Fix ignoring the exclude in the logger's record function for json, csv and log logging formats (#190) * Fix ignoring the exclude in logger record For the logging formats json, csv, and log the exclude parameter of the logger's record function has been ignored. The necessary checks were missing from some of the format writer classes. Regression tests have been added to prevent this error in the future. * Fix docstring for filter_excluded_keys Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Added missing type hints to local functions * Update stable_baselines3/common/logger.py Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-16 17:34:49 +02:00
Wilson	fe6ade3089	Allow env_kwargs in make_vec_env when env ID string supplied (#189) * Allow env_kwargs in make_vec_env when env ID string supplied Resolves #188 * Update docs/misc/changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Add test for env kargs in make_vec_env * remove unnecessary args in test_vec_env_kwargs function * Fixes and reformat * Doc fix Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-16 11:09:19 +02:00
Antonin RAFFIN	2599f04940	Add custom arch for off-policy actor/critic networks (#182) * Add custom arch for off-policy actor/critic networks * Fix type hints * Address comments * Make sure number of updated parameters match in polyak * Add zip_strict for strict-length zipping * Fix building docs * Add test for zip strict * Faster tests Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2020-10-13 12:01:33 +02:00
Antonin RAFFIN	fc9527157a	Fix off-by-one GAE computation (#185) * Fix off-by-one GAE computation * Fix identity test * Revert gae loop	2020-10-13 00:10:54 +03:00
Antonin RAFFIN	fc6c5d3daa	Migration Guide (#123) * Start migration guide * Update guide * Add comment on RMSpropTFLike plus PPO/A2C migrations * Add note about set/get-parameters * Update migration guide * Update changelog and readme * Update doc + clean changelog * Address comments Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2020-10-11 23:22:12 +02:00
Antonin RAFFIN	a1e055695c	Improve typing coverage (#175) * Improve typing coverage * Even more types * Fixes * Update changelog * Unified docstrings * Improve error messages for unsupported spaces	2020-10-07 10:51:49 +02:00
Antonin RAFFIN	a10e3ae587	Release v0.9.0 (#174)	2020-10-04 17:12:35 +02:00
Antonin RAFFIN	55912576ed	Cleanup docstring types (#169) * Cleanup docstring types * Update style * Test with js hack * Revert "Test with js hack" This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69. * Fix types * Fix typo * Update CONTRIBUTING example	2020-10-02 20:05:55 +03:00
Antonin RAFFIN	2c924f52f5	Update docs (custom policy, type hints) (#167) * Change import * Update custom policy doc * Re-enable sphinx_autodoc_typehints * Update docker image * Attempt to fix read the doc build error * Add sphinx_autodoc_typehints to read the doc env * Fix pip version * Add full custom policy example * Fix	2020-09-29 20:41:14 +03:00
Antonin RAFFIN	44a723eecb	Fix loading of old versions and update changelog (#165)	2020-09-24 16:05:36 +02:00
Anssi	9855486488	Get/set parameters and review of saving and loading (#138) * Update comments and docstrings * Rename get_torch_variables to private and update docs * Clarify documentation on data, params and tensors * Make excluded_save_params private and update docs * Update get_torch_variable_names to get_torch_save_params for description * Simplify saving code and update docs on params vs tensors * Rename saved item tensors to pytorch_variables for clarity * Reformat * Fix a typo * Add get/set_parameters, update tests accordingly * Use f-strings for formatting * Fix load docstring * Reorganize functions in BaseClass * Update changelog * Add library version to the stored models * Actually run isort this time * Fix flake8 complaints and also fix testing code * Fix isort * ...and black * Fix set_random_seed Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2020-09-24 14:28:27 +02:00
mloo3	00595b09d8	Add actor/critic loss logging to td3 (#164) * add actor/critic loss logging to td3 * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-09-23 22:40:41 +02:00
Wilson	e908583e2a	Fix type annotation in make_vec_env (#162) * Fix type annotation in make_vec_env The variable `vec_env_cls` is a type and not an instance of either DummyVecEnv or SubprocVecEnv * Update changelog.rst	2020-09-23 10:34:35 +02:00
liorcohen5	f5104a5efc	Allow to set a device when loading a model (#154) * Added a 'device' keyword argument to BaseAlgorithm.load(). Edited the save and load test to also test the load method with all possible devices. Added the changes to the changelog * improved the load test to ensure that the model loads to the correct device. * improved the test: now the correctness is improved. If the get_device policy would change, it wouldn't break the test. * Update tests/test_save_load.py @araffin's suggestion during the PR process Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update tests/test_save_load.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Bug fixes: when comparing devices, comparing only device type since get_device() doesn't provide device index. Now the code loads all of the model parameters from the saved state dict straight into the required device. (fixed load_from_zip_file). * PR fixes: bug fix - a non-related test failed when running on GPU. updated the assertion to consider only types of devices. Also corrected a related bug in 'get_device()' method. * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-09-20 19:13:18 +02:00
Antonin RAFFIN	583d4b8e41	Minor: fix changelog	2020-09-10 16:56:27 +02:00
Vsevolod Kompantsev	4fd408bec2	Fix PPO logging of clip_fractions (#150) * bugfix for PPO logging of clip_fractions * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-09-01 09:52:31 +02:00
Francisco Caio	5fc90a7f7d	Add StopTrainingOnMaxEpisodes to callback collection (#147) * Add StopTrainingOnMaxEpisodes class to pre-made callback collection * Adjust instant when counters are incremented for both OnPolicy and OffPolicy algorithms * Improv to StopTrainingOnMaxEpisodes including output, tests and doc * Improv StopTrainingOnMaxEpisodes callback running _init_callback * Update callbacks.py * Update test_callbacks.py * Fix style * Update changelog.rst * Fix test Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2020-08-28 11:36:33 +02:00
Antonin RAFFIN	a1afc5e42f	Fix typos in SAC and TD3 (#145)	2020-08-23 17:44:35 +02:00
Stelios Tymvios	9003a09d5b	Callbacks have access to locals (#115) * callbacks have access to locals * changeloc * doc * callbacks have access to locals * changeloc * doc * Added update function for child callbacks * Pre-Release 0.8.0 (#134) * Fix double reset and improve typing coverage (#136) * Fix double reset and improve typing coverage * Revert minor edit * Add doc about types * Update child callbacks * cleaned imports * format * import order * Simplify tests and add comments Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-08-23 14:34:01 +02:00
Sam Toyer	42ef6d4677	Remove "device" argument from policies (#141) * Remove device arg from policies * Clean up for PR * Update test and doc * Fix codestyle Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-08-23 13:27:52 +02:00
Antonin RAFFIN	21e9994ff9	Fix double reset and improve typing coverage (#136) * Fix double reset and improve typing coverage * Revert minor edit * Add doc about types	2020-08-05 13:12:02 +03:00
Antonin RAFFIN	cceffd5ab2	Pre-Release 0.8.0 (#134)	2020-08-03 22:38:54 +02:00

144 commits