stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-07-18 18:52:30 +00:00

Author	SHA1	Message	Date
Juan Rocamonde	e22e372306	Fix duplicate key error in HumanOutputFormat (#1079 ) * Fix duplicate key error in HumanOutputFormat * Update changelog * Add test * Update changelog.rst Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Adam Gleave <adam@gleave.me>	2022-09-28 12:06:07 +02:00
Dominic Kerr	899eee6bd4	Automatically create missing directories of ``filename``s passed to ``ResultsWriter`` (#1072 ) * Create (if any) missing filename directories, passed into ResultsWriter * Fixed incorrect ``filename`` docstring (if ``filename`` were ``None``, the string method ``filename.endswith(Monitor.EXT)`` would raise an ``AttributeError``), and renamed ``reset_keywords`` docstring. * Added description of #1068 * Ignore pytype errors * Update changelog.rst Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-09-21 13:14:38 +02:00
Quentin Gallouédec	440735cbd0	Fix loading a model with different number of environments (#1058 ) * Fix loading with new `n_envs` * Update tests * Update changelog * Fix the fix * Remove `self._setup_model()` from `set_env()` * Raise `AssertionError` when setting env with a different `n_envs` * Update unit tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-09-17 11:10:03 +02:00
Quentin Gallouédec	29f6687b98	Raise error when observation keys and observation space keys don't match (#1047 ) * Raise error when observation keys and observation space keys don't match * Print the difference in keys * Update changelog	2022-09-05 14:54:58 +02:00
Sidney Tio	304c17dc78	Add append mode to Monitor (#1037 ) * Added option to override or use existing CSVs * Updated changelog for Monitor override * Changed default value to override * Simplify code and add test * Update version * Fix for pytype Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-08-31 11:53:44 +02:00
Hugh Perkins	2cc1477fa2	Fix advantage normalization with mini-batch size of 1 (#1028 ) * fix nan in advantages with batch size 1, for ppo * changelog * black * Simplify test * Bump version Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-08-25 11:50:08 +02:00
Anand Balakrishnan	59af0c1b01	`CheckpointCallback` can now save replay buffer and `VecNormalize` (#1030 ) * CheckpointCallback now saves replay buffer (if present) * VecNormalize stats are saved at checkpoints * Make checkpointing replay buffer and VecNormalize opt-in * Edit changelog * Add documentation for new parameters * Update docs/misc/changelog.rst * Add documentation for new parameters * Implement suggested edits * Reformat code * Fix git conflict * Add .pkl suffix to VecNormalize checkpoints * Add tests for new CheckpointCallback params * Merge CheckpointCallback tests * Update test and add helper for checkpoint path Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-08-25 10:57:51 +02:00
Honglu Fan	29a481a288	Include `running_mean` and `running_val` when updating target networks (#1004 ) * include `running_mean` and `running_val` when updating target networks in DQN, SAC, TD3. * Update stable_baselines3/common/utils.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Precompute batch norm parameters in `_setup_model` and directly copy them in the target update. * include `running_mean` and `running_val` when updating target networks in DQN, SAC, TD3. * Update stable_baselines3/common/utils.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Precompute batch norm parameters in `_setup_model` and directly copy them in the target update. * Fix `DictReplayBuffer.next_observations` type (#1013) * Fix DictReplayBuffer.next_observations type * Update changelog Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Fixed missing verbose parameter passing (#1011) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Support for `device=auto` buffers and set it as default value (#1009) * Default device is "auto" for buffer + auto device support in BufferBaseClass * Update docstring * Update tests * Unify tests * Update changelog * Fix tests on CUDA device Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de> * Precompute batch norm parameters in `_setup_model` and directly copy them in the target update. * Update test * Add comments and update tests * Bump version * Remove one extra space to conform code style. * Update docstrings Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Burak Demirbilek <BurakDmb@users.noreply.github.com> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-08-23 10:20:43 +02:00
Timothé	01cc127d32	Support hparams logging to tensorboard (#984 ) * create Hparam class & support in all OutputFormats * add hparams documentation & example * add hparam tests * remove unnecessary test & fix name * format changes * support hyperparameters logging to tensorboard * fix HParams class docstring * use more explicit variable names * raise error instead of warning * Unpin protobuf * Add test for logging hparams Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-08-22 22:06:54 +02:00
Quentin Gallouédec	73822c34da	Support for `device=auto` buffers and set it as default value (#1009 ) * Default device is "auto" for buffer + auto device support in BufferBaseClass * Update docstring * Update tests * Unify tests * Update changelog * Fix tests on CUDA device Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-08-16 17:54:55 +02:00
Quentin Gallouédec	c4f54fcf04	Handling multi-dimensional action spaces (#971 ) * Handle non 1D action shape * Revert changes of observation (out of the scope of this PR) * Apply changes to DictReplayBuffer * Update tests * Rollout buffer n-D actions space handling * Remove error when non 1D action space * ActorCriticPolicy return action with the proper shape * remove useless reshape * Update changelog * Add tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-08-06 14:19:20 +02:00
Adam Gleave	b1cc15970a	Use higher resolution time_ns() and avoid division by zero (#979 ) * Use higher resolution time and round up to eps * Update changelog * Add test case * Fix formatting, time()->time_ns * Bugfix: ns is integer not float * Move test to better place * Divide by 1e9 earlier	2022-07-25 23:02:53 +02:00
Quentin Gallouédec	fda3d4d748	Fix returned type in predict (#964 ) * `arr[0]` to `arr.squeeze(0)` * `squeeze(axis=0)` to `squeeze(0)` * Type testing * Add type test for unvectorized observation * `squeeze(0)` to `squeeze(axis=0)` * Treatment of the laziness symptoms * Update changelog * Update changelog Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-07-18 11:22:19 +02:00
Antonin RAFFIN	c1f1c3d3d7	Release v1.6.0 (#958 ) * Release v1.6.0 + update doc + add copy button * Update read the doc conda env * Update year * Fix bug in kl divergence check * Rephrase requirement for envpool and isaac gym	2022-07-12 22:50:23 +02:00
Max Weltevrede	ef10189d80	Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination (#948 ) * Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination * Modify test to avoid unsupported buffer configuration * Change from assertion to raising of ValueError * Update changelog * Update style for consistency * Use handle_timeout_termination when possible Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-07-04 15:08:54 +02:00
Antonin RAFFIN	49813d8c68	Update doc and add check for unbounded action space (#918 )	2022-05-25 16:24:21 +02:00
Antonin RAFFIN	0fadc94df3	Fix synchronization bug with EvalCallback (#907 )	2022-05-08 21:54:34 +03:00
Antonin RAFFIN	a6f5049a99	Upgrade code to Python 3.7+ syntax using `pyupgrade` (#887 ) * Upgrade code to Python 3.7+ syntax * Update changelog	2022-04-25 13:01:38 +03:00
Paul Scheikl	ed308a71be	Fixed unchecked None value in SubprocVecEnv (#808 ) * Fixed unchecked None value in SubprocVecEnv * Fixed unchecked None value in DummyVecEnv * Fix formatting * Update test and changelog * Improve test Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-04-12 16:05:40 +02:00
Antonin RAFFIN	39a4f9379a	Escape tensorboard log name (#857 ) * escape tensorboard log name Otherwise utils does not recognize the log. * Added fix to changelog * Modifications made by: make commit-checks . * Revert "Modifications made by: make commit-checks ." This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371. * Update changelog and add test Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>	2022-04-11 21:49:18 +02:00
Yifei Cheng	44e53ff811	Enable force_zip64 (#839 ) * Enable force_zip64 * mark tests as expensive * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-03-28 10:35:33 +02:00
Julio César Alves	cdaa9ab418	Callback to early stop the training if there is no model improvement after consecutive evaluations (#741 ) * Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback * Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement * Update the docs related to new StopTrainingOnNoModelImprovement callback * Update doc Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-02-25 11:56:47 +01:00
Quentin Gallouédec	13fcb12471	Fix normalization for `DictReplayBuffer` (#744 ) * Normalize samples DictReplayBuffer (#743) * Fixed sample normalization in ``DictReplayBuffer`` (#743) * Test buffer normalization * Rename test replay buffer * Bump version Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-02-23 13:04:57 +01:00
Boyuan Chen	7a01637128	Fix VecNormalization bug for Dict obs (#768 ) * fix #724 VecNormalization bug for Dict obs * update test and changelog * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-02-23 12:33:41 +01:00
Costa Huang	d2ebd2eeaa	Allow PPO to turn off advantage normalization (#763 ) * Allow PPO to turn off advantage normalization * update changelog * Add a test case * Update test and sanity check * Fix tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-02-22 15:29:21 +01:00
Antonin RAFFIN	7ce4bb8016	Pin gym version (#782 ) * Pin gym version * Cleanup warnings * Reformat	2022-02-21 23:12:54 +01:00
Adam Gleave	78afcbd6d9	HumanOutputFormat: make length configurable, throw error if keys alias (#756 ) * Make HumanOutputFormat length configurable and bump to 36 by default * Add test case * Updated changelog * Blacken * Blacken code * Fix GitLab CI: switch to Docker container with new black version * Incorporate suggestion * Add class docstring * Dummy commit to retrigger GitLab Co-authored-by: Anssi <kaneran21@hotmail.com>	2022-02-05 12:57:35 +02:00
Carlos Luis	5143cd19f7	Gym fixes - Follow up from #705 (#734 ) * fix Atari in CI * fix dtype and atari extra * Update setup.py * remove 3.6 * note about how to install Atari * pendulum-v1 * atari v5 * black * fix pendulum capitalization * add minimum version * moved things in changelog to breaking changes * partial v5 fix * env update to pass tests * mismatch env version fixed * Fix tests after merge * Include autorom in setup.py * Blacken code * Fix dtype issue in more robust way * Fix GitLab CI: switch to Docker container with new black version * Remove workaround from GitLab. (May need to rebuild Docker for this though.) * Revert to v4 * Update setup.py * Apply suggestions from code review * Remove unnecessary autorom * Consistent gym versions Co-authored-by: J K Terry <justinkterry@gmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: modanesh <mohamad4danesh@gmail.com> Co-authored-by: Adam Gleave <adam@gleave.me>	2022-02-04 15:13:57 -08:00
Adam Gleave	f488d0772a	Autoformat code with black (new version complains about new things) (#757 ) * Blacken code * Fix GitLab CI: switch to Docker container with new black version	2022-02-04 02:56:06 +02:00
Paul Scheikl	fc41600225	Fixed logging info_keywords in the VecMonitor class. (#730 ) * Writing the additional info_keywords into the episode infos that are passed to the results writer. Directly taken from the non-vec version of monitor. * Added test for monitoring info_keywords. * Removed unnecessary step of registering the env. Not using make_vec_env, because it applies a monitor wrapper to the env. * Reformat Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-01-19 17:17:22 +01:00
Antonin RAFFIN	e9a8979022	Add copy and combine method to running mean std (#716 ) * Add copy and combine method to running mean std * Update test * Faster test * Update test * Update test * Shift values in RMS test	2022-01-06 01:31:04 +02:00
Antonin RAFFIN	bb16645c4e	Add `skip` option for `VecTransposeImage` and bug fix in frame stack (#700 ) * Update doc * Add comment * Add skip option to VecTransposeImage and fix bug in frame stack	2021-12-23 17:12:49 +02:00
Antonin RAFFIN	e24147390d	Improve tests and add check for float32 (#686 ) * Add additional checks * Improve tests and error message * Update changelog * Bump version * Update doc * Add tests for action space * Improve test	2021-12-09 14:14:33 +02:00
Antonin RAFFIN	507ed1762e	Multiprocessing support for off policy algorithms (#439 ) * Add multi-env training support for SAC * Fix for dict obs * Pytype fixes * Fix assert on number of envs * Remove for loop * Add support for Dict obs * Start cleanup * Update doc and bug fix * Add support for vectorized action noise and add multi env example for off-policy * Update version * Bug fix with VecNormalize * Update README table * Update variable names * Update changelog and version * Update doc and fix for `gradient_steps=-1` * Add test for `gradient_steps=-1` * Disable pytype pyi errors * Fix for DQN * Update comment on deepcopy * Remove episode_reward field * Fix RolloutReturn * Avoid modification by reference * Fix error message Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-01 22:30:09 +01:00
Antonin RAFFIN	d228364ccf	Add timeout handling for on-policy algorithms (#658 ) * Add timeout handling for on-policy algorithms * Fixes * Fix infinite loop in eval * Skip type check for python 3.9 * Fix for discrete obs + add docstring * Fix A2C test * Removed unused helper * Add test for infinite horizon * typed ast should be fixed * Apply suggestions from code review Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-11-16 17:19:16 +01:00
Antonin RAFFIN	2bb4500948	Fix `set_env` when using `VecNormalize` (#638 ) * Fix `set_env` when using `VecNormalize` * Update version	2021-11-02 13:52:26 +02:00
Antonin Raffin	6daf82bf74	Relax test	2021-10-31 19:03:28 +01:00
Oleksii Kachaiev	0c17fedfac	Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636 ) * Store number of timesteps at the beginning of each learn cycle * Update changelog * Set default _num_timesteps_at_start in the constructor * Test case for FPS logger * Adjust test to cover both on-policy and off-policy algorithms * Fix formatting * Update test and add comment * Fix test Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-10-31 18:19:03 +01:00
Oleksii Kachaiev	0503e694b2	Introduce norm_obs_keys param for VecNormalize environment wrapper (#631 ) * Implement new norm_obs_keys param for VecNormalize environment wrapper * Simplified doc string to avoid issues with lint and doc * Updated changelog * Update changelog.rst * Update test_vec_normalize.py * Update sanity checks * Fix backward compat * Update doc * Update changelog * Fix lint warnings * Fix tests * Minor edit * observation_space sanity check was applied twice Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2021-10-28 19:18:39 +02:00
Antonin RAFFIN	e907eca18e	Fix `set_env` to keep the number of timesteps (#615 ) * Fix for `set_env` * Add test and update changelog * Use underscores and f-strings * Add PyPi info * Update comments	2021-10-23 16:36:40 +02:00
Antonin RAFFIN	1564a85081	System info helper (#613 ) * Add `system_env_info` * Add `print_system_info` to load and store system info at save time * Remove TODO * Rename to `get_system_info` * Import as sb3 for consistency * Update changelog * Add warning for old SB3 versions * Use underscore literal for more clarity	2021-10-18 10:43:56 +02:00
Antonin RAFFIN	1881d904a0	Doc fix and improve error messages (#598 ) * Fix custom env doc * Catch common mistake * Improve `EvalCallback` error message * Lint test * Update docs/guide/custom_env.rst Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Adam Gleave <adam@gleave.me>	2021-10-08 18:08:31 +02:00
Antonin RAFFIN	306e49fda6	Fixes in `is_vectorized_observation` (#587 ) * Fix is vectorized bug in DQN * Fix sub-classed obs	2021-09-28 21:57:49 +02:00
Antonin RAFFIN	201fbffa8c	Remove `sde_net_arch` + Simplify policy (#584 ) * Remove `sde_net_arch` + Simplify policy * Add warning at load time	2021-09-28 22:32:54 +03:00
Adam Gleave	e825fbdd33	VecNormalize: allow non-continuous observations when norm_obs is False (#575 ) * VecNormalize: allow non-continuous observations when norm_obs is False * Update changelog, fix lint * Switch to environment present in new and old versions of Gym * Fix name Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-09-18 12:11:01 +02:00
Cyprien	f3a35aa786	Add method `predict_values` for ActorCriticPolicy (#569 ) * feat: add method predict_values for ActorCriticPolicy * Fixes for new gym version * Reformat Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-09-15 14:03:04 +02:00
Antonin RAFFIN	16f8b21d9b	Add `get_distribution` for on-policy algorithms (#566 ) * feat: get_distribution method for ActorCriticPolicy New method get_distribution for class ActorCriticPolicy returning current action distribution given observations * doc: updating changelog.rst - adding block for Release 1.2.1a0 - adding cyprienc to contributors * style: make format * fix: updating version.txt Changing version from 1.2.0 to 1.2.1a0 * Update changelog * Add test for get distribution Co-authored-by: Cyprien <courtot.c@gmail.com>	2021-09-13 10:25:42 +02:00
Antonin RAFFIN	f8a0869073	Hotfix for Vecnormalize (#558 ) * Hotfix for Vecnormalize * Rename `ret` to `returns`	2021-09-08 12:30:20 +02:00
Scott Brownlie	1afc2f3abe	Avoid putting target networks into training mode (#553 ) * make sure DQN policy is always in correct mode - train or eval * make set_training_mode an abstract method of the base policy - safer * update docstring of _build method to note that the target network is put into eval mode * use set_training_mode to put the dqn target network into eval mode * use set_training_mode to set the training mode of the q-network * move set_training_mode abstract method from BasePolicy to BaseModel * set train and eval mode for TD3 * make sure critic is always in correct mode during train * set train and eval mode for SAC * add comment re batch norm and dropout * set train and eval mode for A2C and PPO * add tests for collect rollouts with batch norm * fix formatting * update change log * update version * remove Optional typing for batch size - causing type check to fail * Fix scipy dependency for toy text envs * implement set_training_mode method in BaseModel * move all tests of train/eval mode to test_train_eval_mode * call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm * remove extra calls to set_training_mode in train method of TD3 and SAC * Allow gradient_steps=0 * Refactor tests * Add comment + use aliases * Typos Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-08-30 17:42:41 +02:00
David Blom	3efab0d267	Training and evaluation: call model.train() and model.eval() (#537 ) * training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm * Add comment documentation * Fix train and eval for the Actor class * Run black * Add github handle to changelog * Add unit tests for PPO and DQN * Refactor unit test * Run black * unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic * documentation: add bugfix description to changelog * unit test: use learning_starts=0, decrease the size of the network and use more training steps * on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module * Rename unit test * unit test: use drop out probability of 0.5 * Call policy.train and policy.eval * Fixes + update tests * Remove unneeded eval Co-authored-by: David Blom <davidsblom@gmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-08-14 14:08:27 +02:00


221 commits