stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-18 21:30:19 +00:00

Author	SHA1	Message	Date
Antonin RAFFIN	bb16645c4e	Add `skip` option for `VecTransposeImage` and bug fix in frame stack (#700 ) * Update doc * Add comment * Add skip option to VecTransposeImage and fix bug in frame stack	2021-12-23 17:12:49 +02:00
Quentin Gallouédec	d496cd4d95	Consistent use of `device` as keyword argument (#702 ) * consistent device as keyword arg * Fixed ``device`` arg inconsistency in changelog	2021-12-22 11:43:59 +01:00
Demetrio92	798b16aaf7	more verbose documentation regarding `.load` vs `.set_parameters` (#696 ) * more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614) * add a note to explain the difference between `.load` and `.set_parameters` to the examples * fix typos Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-18 17:28:37 +02:00
hsuehch	222a69ca49	Eliminate extra empty lines in CSV monitor files on Windows (DLR-RM#692) (#695 ) * Added ``newline="\n"`` when opening CSV monitor files so that each line ends with ``\r\n`` instead of ``\r\r\n`` on Windows while Linux environments are not affected	2021-12-18 16:04:33 +02:00
Antonin RAFFIN	e24147390d	Improve tests and add check for float32 (#686 ) * Add additional checks * Improve tests and error message * Update changelog * Bump version * Update doc * Add tests for action space * Improve test	2021-12-09 14:14:33 +02:00
Antonin RAFFIN	77f4f5021d	Drop Python 3.6 support (#685 ) * Drop python 3.6 support * Update doc * Update gitlab CI * Update doc env * Fix gitlab CI	2021-12-06 12:54:43 +01:00
Antonin RAFFIN	507ed1762e	Multiprocessing support for off policy algorithms (#439 ) * Add multi-env training support for SAC * Fix for dict obs * Pytype fixes * Fix assert on number of envs * Remove for loop * Add support for Dict obs * Start cleanup * Update doc and bug fix * Add support for vectorized action noise and add multi env example for off-policy * Update version * Bug fix with VecNormalize * Update README table * Update variable names * Update changelog and version * Update doc and fix for `gradient_steps=-1` * Add test for `gradient_steps=-1` * Disable pytype pyi errors * Fix for DQN * Update comment on deepcopy * Remove episode_reward field * Fix RolloutReturn * Avoid modification by reference * Fix error message Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-01 22:30:09 +01:00
Antonin RAFFIN	2ebb8aa22b	Update Citation (#684 ) * Update citation * Remove cff file	2021-12-01 18:55:21 +01:00
Antonin RAFFIN	52c29dc497	Fix evaluation script for recurrent policies (#678 ) * Fix evaluation script for RNN * Add error message * Revert "Add error message" This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a. * Fix for pytype * Rename mask to `episode_start` * Fix type hint * Fix type hints * Remove confusing part of sentence Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-11-30 13:49:06 +01:00
Gary Briggs	8e5ede783f	Add a section on exporting to TFLite/Coral with demonstration (#679 ) * Add a section on exporting to TFLite/Coral with demonstration * Changelog to reflect new export documentation * Update docs/guide/export.rst Fingers on autopilot make word wrong Co-authored-by: Anssi <kaneran21@hotmail.com> * Update docs/guide/export.rst Better wording clarity Co-authored-by: Anssi <kaneran21@hotmail.com> * Update docs/guide/export.rst Better wording clarity Co-authored-by: Anssi <kaneran21@hotmail.com> * Clarify motivations and hardware * Update docs/misc/changelog.rst Make consistent with other changelog entries Co-authored-by: Anssi <kaneran21@hotmail.com> * Sphinx wants the section underline to be at least this long * Remove first-person voice * Typos Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-11-28 10:54:50 +01:00
Shyamal H Anadkat	3b68dc7312	Update GAE computation docstring (#655 ) * Fix typo in buffers.py * Revert "Fix typo in buffers.py" This reverts commit ca643d5e3a509ae1b8a65bf0de98f4609ca9d8da. * Ignore pytype errors * Update GAE computation docstring Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-11-25 10:53:42 +01:00
Parth Kothari	58e5506385	Edited Authors of DriverGym project (#669 )	2021-11-18 10:18:18 +01:00
Parth Kothari	1ac35eaef2	Add DriverGym project to SB3 project documentation (#665 ) * Added DriverGym project * Updated changelog * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-11-17 11:13:43 +01:00
Antonin RAFFIN	d228364ccf	Add timeout handling for on-policy algorithms (#658 ) * Add timeout handling for on-policy algorithms * Fixes * Fix infinite loop in eval * Skip type check for python 3.9 * Fix for discrete obs + add docstring * Fix A2C test * Removed unused helper * Add test for infinite horizon * typed ast should be fixed * Apply suggestions from code review Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-11-16 17:19:16 +01:00
Antonin RAFFIN	e75e1de4c1	Fix indentation in RL tips doc (#657 ) * Update rl_tips.rst indent fix to make if done and its following statement work * Fix indentation and update changelog * Skip type check for python 3.9 Co-authored-by: paulg <cove9988@gmail.com>	2021-11-10 16:54:20 +00:00
Antonin RAFFIN	2bb4500948	Fix `set_env` when using `VecNormalize` (#638 ) * Fix `set_env` when using `VecNormalize` * Update version	2021-11-02 13:52:26 +02:00
ac-93	98c1a637cf	add tactile-gym to the list of projects using SB3 (#640 ) * Update projects.rst * Update changelog.rst * Update projects.rst * Fix doc build Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-10-31 18:26:06 +01:00
Oleksii Kachaiev	0c17fedfac	Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636 ) * Store number of timesteps at the beginning of each learn cycle * Update changelog * Set default _num_timesteps_at_start in the constructor * Test case for FPS logger * Adjust test to cover both on-policy and off-policy algorithms * Fix formatting * Update test and add comment * Fix test Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-10-31 18:19:03 +01:00
Edouard Leurent	a2e3001598	Add highway-env to the list of projects using SB3 (#639 ) * Add highway-env to the list of projects using SB3 Many thanks for this fantastic library, keep up the good work! * Update changelog with added documentation	2021-10-30 13:53:36 +02:00
Oleksii Kachaiev	0503e694b2	Introduce norm_obs_keys param for VecNormalize environment wrapper (#631 ) * Implement new norm_obs_keys param for VecNormalize environment wrapper * Simplified doc string to avoid issues with lint and doc * Updated changelog * Update changelog.rst * Update test_vec_normalize.py * Update sanity checks * Fix backward compat * Update doc * Update changelog * Fix lint warnings * Fix tests * Minor edit * observation_space sanity check was applied twice Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2021-10-28 19:18:39 +02:00
Antonin RAFFIN	7b977d7b03	Release 1.3.0 (#625 )	2021-10-23 17:07:00 +02:00
Antonin RAFFIN	e907eca18e	Fix `set_env` to keep the number of timesteps (#615 ) * Fix for `set_env` * Add test and update changelog * Use underscores and f-strings * Add PyPi info * Update comments	2021-10-23 16:36:40 +02:00
Antonin RAFFIN	1564a85081	System info helper (#613 ) * Add `system_env_info` * Add `print_system_info` to load and store system info at save time * Remove TODO * Rename to `get_system_info` * Import as sb3 for consistency * Update changelog * Add warning for old SB3 versions * Use underscore literal for more clarity	2021-10-18 10:43:56 +02:00
Timo Kaufmann	09e9fc42eb	Use consistent logging keys (#605 ) * Use a consistent key to log the total timesteps This changes the timestep logging key of on-policy algorithms from `time/total_timesteps` to `time/total timesteps` (note the underscore/space). The off-policy algorithms and the eval callback already use the latter, so this behavior is more consistent. * Use underscores instead of spaces in logging keys Most keys already followed this policy and consistent behavior is friendlier to new users. * Minor edit and bump version Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-10-12 13:17:30 +02:00
Antonin RAFFIN	75aa31dcfb	Update SB3 contrib algorithms (#604 )	2021-10-10 15:41:39 +02:00
Antonin RAFFIN	1881d904a0	Doc fix and improve error messages (#598 ) * Fix custom env doc * Catch common mistake * Improve `EvalCallback` error message * Lint test * Update docs/guide/custom_env.rst Co-authored-by: Adam Gleave <adam@gleave.me>	2021-10-08 18:08:31 +02:00
Ilja Avadiev	740d61ada3	Doc fix environment mixup (#588 )	2021-09-29 10:16:59 +02:00
Antonin RAFFIN	306e49fda6	Fixes in `is_vectorized_observation` (#587 ) * Fix is vectorized bug in DQN * Fix sub-classed obs	2021-09-28 21:57:49 +02:00
Antonin RAFFIN	201fbffa8c	Remove `sde_net_arch` + Simplify policy (#584 ) * Remove `sde_net_arch` + Simplify policy * Add warning at load time	2021-09-28 22:32:54 +03:00
batu	89af49ca91	ONNX Documentation Update (#464 ) * Updated ONNX documentation First draft on the documentation explaining how to export SB3 models in the ONNX format * Updated changelog with ONNX documentation fix * Address comments * Update changelog.rst * Update rtd env * Fixes + add test example Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Anssi Kanervisto <anssk@Anssis-MacBook-Air.local> Co-authored-by: Anssi Kanervisto <kaneran21@hotmail.com>	2021-09-26 17:40:35 +02:00
Baek Junyeob	914bc10a0d	Add policy-distillation-baselines to project page (#578 ) * Update projects.rst * Update docs/misc/projects.rst * Apply suggestions from code review * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-09-20 16:30:16 +02:00
Adam Gleave	e825fbdd33	VecNormalize: allow non-continuous observations when norm_obs is False (#575 ) * VecNormalize: allow non-continuous observations when norm_obs is False * Update changelog, fix lint * Switch to environment present in new and old versions of Gym * Fix name Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-09-18 12:11:01 +02:00
Matthew Allen	76c212a854	Add RLGym to project page (#576 ) * Add RLGym to projects list. Per the request in this issue on our repo: https://github.com/lucas-emery/rocket-league-gym/issues/24 * Update changelog documentation section * Update changelog.rst * Update docs/misc/projects.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-09-18 11:47:22 +02:00
Wilhelm Kirchgässner	303df08a80	Add GEM project to project section of doc (#574 ) * add GEM project to project section of doc * Update docs/misc/projects.rst * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-09-18 11:10:04 +02:00
Cyprien	f3a35aa786	Add method `predict_values` for ActorCriticPolicy (#569 ) * feat: add method predict_values for ActorCriticPolicy * Fixes for new gym version * Reformat Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-09-15 14:03:04 +02:00
Antonin RAFFIN	16f8b21d9b	Add `get_distribution` for on-policy algorithms (#566 ) * feat: get_distribution method for ActorCriticPolicy New method get_distribution for class ActorCriticPolicy returning current action distribution given observations * doc: updating changelog.rst - adding block for Release 1.2.1a0 - adding cyprienc to contributors * style: make format * fix: updating version.txt Changing version from 1.2.0 to 1.2.1a0 * Update changelog * Add test for get distribution Co-authored-by: Cyprien <courtot.c@gmail.com>	2021-09-13 10:25:42 +02:00
Antonin RAFFIN	f8a0869073	Hotfix for Vecnormalize (#558 ) * Hotfix for Vecnormalize * Rename `ret` to `returns`	2021-09-08 12:30:20 +02:00
Antonin RAFFIN	f9e5753acd	Refactor `BasePolicy` predict (#559 )	2021-09-05 02:27:45 +03:00
Scott Brownlie	1afc2f3abe	Avoid putting target networks into training mode (#553 ) * make sure DQN policy is always in correct mode - train or eval * make set_training_mode an abstract method of the base policy - safer * update docstring of _build method to note that the target network is put into eval mode * use set_training_mode to put the dqn target network into eval mode * use set_training_mode to set the training model of the q-network * move set_training_mode abstract method from BasePolicy to BaseModel * set train and eval mode for TD3 * make sure critic is always in correct mode during train * set train and eval mode for SAC * add comment re batch norm and dropout * set train and eval mode for A2C and PPO * add tests for collect rollouts with batch norm * fix formatting * update change log * update version * remove Optional typing for batch size - causing type check to fail * Fix scipy dependency for toy text envs * implement set_training_mode method in BaseModel * move all tests of train/eval mode to test_train_eval_mode * call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm * remove extra calls to set_training_mode in train method of TD3 and SAC * Allow gradient_steps=0 * Refactor tests * Add comment + use aliases * Typos Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-08-30 17:42:41 +02:00
David Blom	3efab0d267	Training and evaluation: call model.train() and model.eval() (#537 ) * training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm * Add comment documentation * Fix train and eval for the Actor class * Run black * Add github handle to changelog * Add unit tests for PPO and DQN * Refactor unit test * Run black * unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic * documentation: add bugfix description to changelog * unit test: use learning_starts=0, decrease the size of the network and use more training steps * on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module * Rename unit test * unit test: use drop out probability of 0.5 * Call policy.train and policy.eval * Fixes + update tests * Remove unneeded eval Co-authored-by: David Blom <davidsblom@gmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-08-14 14:08:27 +02:00
MihaiAnca13	c41368f2ea	Docs examples warning - issue #526 (#530 ) * Update a2c.rst * Update ddpg.rst * Update dqn.rst * Update her.rst * Update ppo.rst * Update sac.rst * Update td3.rst * Update changelog.rst * modified message * Update examples.rst Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-08-09 16:23:25 +03:00
Antonin RAFFIN	be86883f36	Fix type annotations (#522 ) * Fix type annotations * Add citation file * Update CITATION.cff * Add note about tb logging Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-07-29 13:02:09 +02:00
Antonin RAFFIN	503425932f	Documentation fixes (#514 ) * Update multiprocessing example * Add VecEnvWrapper example * Update docs/guide/vec_envs.rst Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-07-18 20:51:41 +02:00
Antonin RAFFIN	2fa06ae8d2	Add Python3.9 CI + upgrade min PyTorch version (#503 ) * Add Python3.9 CI + upgrade min PyTorch version * Upgrade min PyTorch version	2021-07-06 09:32:03 +02:00
Antonin RAFFIN	5af35fa2cc	Release v1.1.0 (#497 )	2021-07-02 11:21:09 +02:00
Skander Moalla	abbf48e93e	Fix Inconsistencies with EvalCallback tensorboard logs (#492 ) * Make EvalCallback dump the evaluation logs it records #457. * Make test deterministic Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-07-01 15:43:08 +02:00
Carlo Rizzardo	066e1409d9	Corrected DictReplayBuffer observation dtype #484 (#486 ) * Fix observation buffer dtype in DictReplayBuffer * Formatting fix (line length) * Changelog update, bugfix DictReplaybuffer observations dtype	2021-06-22 13:41:26 +02:00
Antonin RAFFIN	b52c6fc18f	Fix logger setup (#469 ) * Make logger an attribute * Update doc * Fix logger reset when using multiple runs * Cleanup logger: remove `Logger.CURRENT` * Fix for PPO * Update tests and improve docstring * Add warning * Throw error when tensorboard not installed	2021-06-14 15:17:48 +02:00
Benjamin Steenhoek	180a2e3832	Remove recurrent policies from A2C docs (#470 ) * Remove recurrent policies from A2C docs Recurrent policies are not supported yet as of (https://github.com/DLR-RM/stable-baselines3/issues/160#issuecomment-694756355), but the docs say that A2C supports them. Changing it to avoid misleading. * Update changelog Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>	2021-06-07 19:39:49 +02:00
Benjamin Black	a038044d11	Added support for vector envs in evaluation (#447 ) * added vector env support to evaluate_policy * fixed linting and documentation * updated changelog * fixed code style issue * added tests for vec env * fixed formatting * renamed observations * added comments for vector evaluation * fixed issues * Cleanup + bump version * Add comment * Fix wrong count of episodes Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2021-05-28 12:40:29 +02:00
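
Note on commit 222a69ca49 (CSV monitor files on Windows): the message says that opening the file with ``newline="\n"`` makes each line end with ``\r\n`` instead of ``\r\r\n``. The sketch below is not the SB3 monitor code. It is a minimal, self-contained illustration that assumes the rows are written with Python's `csv.writer`, whose default line terminator is ``\r\n``. The helper name `write_rows` and the sample rows are hypothetical. Windows newline translation is simulated by passing ``newline="\r\n"`` explicitly, so the effect can be reproduced on any platform.

```python
import csv
import io


def write_rows(newline):
    """Write two CSV rows through a text layer and return the raw bytes.

    `newline` is forwarded to io.TextIOWrapper, which handles it the same
    way as the `newline` argument of the built-in open().
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # default lineterminator is "\r\n"
    writer.writerow(["r", "l", "t"])  # hypothetical header
    writer.writerow([1.0, 10, 0.5])  # hypothetical episode row
    text.flush()
    return raw.getvalue()


# newline="\r\n" mimics the default text mode on Windows: every "\n"
# written is translated to "\r\n", so the writer's "\r\n" becomes "\r\r\n".
windows_like = write_rows("\r\n")

# newline="\n" disables that translation, so "\r\n" is stored unchanged.
fixed = write_rows("\n")

print(repr(windows_like))
print(repr(fixed))
```

With ``newline="\n"`` the terminator emitted by the writer reaches the file untouched. On Linux the default translation is already a no-op, which is consistent with the commit's remark that Linux environments are not affected.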


229 commits