Commit graph

236 commits

Author SHA1 Message Date
Quentin Gallouédec
96b1a7cf01
env_id consistency in tests (#1224)
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-12-20 16:01:26 +01:00
Alex Pasquali
2cfcec4f50
Modified ActorCriticPolicy to support non-shared features extractor (#1148)
* Modified ActorCriticPolicy to support non-shared features extractor

* Refactored features extraction with non-shared features extractor in ActorCriticPolicy and updated doc

Doc update: added 'warning' on custom policy docs that says that, if the features extractor is non-shared, it's not possible to have shared layers in the mlp_extractor

* Moved attrib share_features_extractor in class

* Updated custom policy doc for non-shared features extractor

* Updated changelog

* Made some if-statements more readable if policies.py

The if-statements are related to the shared/non-shared features extractor in ActorCritic policies

* Simplify implementation and add run test

* Keep order in module gain to keep previous results consistents

* Fix test

* Improved docstring in policies.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Added some tests

* feature extractor -> features extractor

* Fix test

* Fix env_id in test

* Make features extractor parameter explicit

* Remove duplicate

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-20 15:12:05 +01:00
Antonin RAFFIN
8452106734
Fix support of image like normalized inputs (#1214)
* Fix support of image like normalized inputs

* Improve docstring and warning message.

* Don't check if obs is image when normalize_images is False (lil opt)

* Comment fix

* Fix normalize_images not passed to parent

* Check for subclasses too

* Remove useless multiline

* Update version and add comment

* Fix some typos

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-20 13:18:28 +01:00
Quentin Gallouédec
e39bc3da00
Add support for multidimensional spaces.MultiBinary observations (#1179)
* Fix `get_obs_shape` for multidimensi onnal Multibinary space

* Update changelog

* more tests

* fix multidiscrete one-hot encoding

* refactor tests

* Update changelog.rst

* Update changelog.rst

* batched obs and revert preprocess_obs changes

* Add support for multidimensional ``spaces.MultiBinary`` observations

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-08 18:46:41 +01:00
Quentin Gallouédec
0973b01b9d
Fix tests/test_distributions.py type hint (#1186)
* Fixed test_distribution type hint

* Impose list[int] for action dim
2022-11-29 11:27:59 +01:00
Quentin Gallouédec
e3b24829a5
Drop gym.GoalEnv and other minor changes initally from #780 (#1184)
* Various changes from #780

* Fix env_checker for goal_env detection
2022-11-28 18:22:31 +01:00
Antonin RAFFIN
cd630a3121
Fixes for flake8 6.0 (#1181) 2022-11-25 15:14:55 +01:00
Juan Rocamonde
68b190b667
Raise error when same env object instance is passed in vectorized environment (#1154)
* Raise error when same env object instance is passed in vectorized environment

* At to changelog

* Add raises to docstring

* Add test

* Also test make_vec_env

* Fix test

* Try to enable color for MyPy

* Update version and ignore lint warnings

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-11-22 14:28:58 +01:00
Quentin Gallouédec
abffa16198
Mypy type checking (#1143)
* Install and configure mypy

* Test if github CI uses setup.cfg for mypy

* force color output

* tab to space

* Try to fix regex

* follow_imports silent

* use space as indentation

* fix indentation setup.cfg

* Show error code

* Update doc

* Udate changelog

* Ignore mypy cache files from commit

* Update gitlab CI

* Add pytype and mypy entry in Makefile

* Make mypy happy

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-11-16 13:22:57 +01:00
Adam Gleave
4fb8aec215
Update evaluate_policy type annotation to support policies as well as RL algorithms (#1146)
* Add PolicyPredictor protocol and use it in evaluate_policy

* Update changelog

* Move Protocol to type_aliases to avoid circular import

* Add test for evaluate_policy on BasePolicy

* Remove unused import

* Use typing_extensions

* Move typing_extensions to 3rd party

* Add version range (typing_extensions uses SemVer)

* Import Protocol from typing_extensions only on Python<3.8

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Install typing_extensions only on Python<3.8

* Add missing sys import

* Fix import ordering

* Fix observation type hint in predict

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-11-03 15:36:19 +01:00
Quentin Gallouédec
d5d1a02c15
Allow model trained with python3.7 to be loaded with python3.8+ without the custom_objects workaround (#1123)
* Fix loading

* Remove documentation note

* Update changelog

* Revert save_format change

* Add test for errors while unpickling

* Update version and cleanup

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-17 17:33:47 +02:00
Juan Rocamonde
cdcdd32c51
Fix return type of evaluate_actions (#1118)
* Fix return type of ActorCriticPolicy.evaluate_actions to optional entropy tensor

* Update changelog.rst
2022-10-14 17:45:28 +02:00
Antonin RAFFIN
508f8ffd59
Remove deprecated features and attributes (#1104)
* Remove deprecated eval env

* Remove deprecated ret attribute

* Remove sde net arch

* Remove unused code

* Update test comment

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-10-11 10:55:16 +02:00
tobirohrer
d8a430e088
Deprecate create_eval_env, eval_env and eval_freq parameter (#1082)
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925

* added changelog entry

* added missing backtick

* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used

* Updated tests to ignore DeprecationWarnings

* Updated changelog entry

* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs

* Add alternative in docstring

* Update docstrings

* `eval_freq` warning in docstring

* Add deprecation comments in tests

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-10-10 15:39:38 +02:00
Antonin RAFFIN
7c21b79188
Add progress bar callback and argument (#1095)
* Add progress bar callback and argument

* Update doc

* Update changelog

* Upgrade pytype in docker image

* Use tqdm.write in the logger to have cleaner output

* Fix logger test

* Fix when doing multiple calls to learn()

* Address comments from code-review
2022-10-06 18:17:31 +02:00
Juan Rocamonde
e22e372306
Fix duplicate key error in HumanOutputFormat (#1079)
* Fix duplicate key error in HumanOutputFormat

* Update changelog

* Add test

* Update changelog.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>
2022-09-28 12:06:07 +02:00
Dominic Kerr
899eee6bd4
Automatically create missing directories of `filenames passed to ResultsWriter` (#1072)
* Create (if any) missing filename directories, passed into ResultsWriter

* Fixed incorrect ``filename`` docstring (if ``filename`` where ``None``, the string method ``filename.endswith(Monitor.EXT)`` would raise an ``AttributeError``), and renamed ``reset_keywords`` docstring.

* Added description of #1068

* Ignore pytype errors

* Update changelog.rst

Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-21 13:14:38 +02:00
Quentin Gallouédec
440735cbd0
Fix loading a model with different number of environments (#1058)
* Fix loading with new `n_envs`

* Update tests

* Update changelog

* Fix the fix

* Remove `self._setup_model()` from `set_env()`

* Raise `AssertionError` when setting env with a different `n_envs`

* Update unitests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-17 11:10:03 +02:00
Quentin Gallouédec
29f6687b98
Raise error when observation keys and observation space keys don't match (#1047)
* Raise error when observation keys and observation space keys don't match

* Print the difference in keys

* Update changelog
2022-09-05 14:54:58 +02:00
Sidney Tio
304c17dc78
Add append mode to Monitor (#1037)
* Added option to override or use existing CSVs

* Updated changelog for Monitor override

* Changed default value to override

* Simplify code and add test

* Update version

* Fix for pytype

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-31 11:53:44 +02:00
Hugh Perkins
2cc1477fa2
Fix advantage normalization with mini-batchsize of 1 (#1028)
* fix nan in advnatages with batch size 1, for ppo

* changelog

* black

* Simplify test

* Bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-25 11:50:08 +02:00
Anand Balakrishnan
59af0c1b01
CheckpointCallback can now save replay buffer and VecNormalize (#1030)
* CheckpointCallback now saves replay buffer (if present)

* VecNormalize stats are saved at checkpoints

* Make checkpointing replay buffer and VecNormalize opt-in

* Edit changelog

* Add documentation for new parameters

* Update docs/misc/changelog.rst

* Add documentation for new parameters

* Implement suggested edits

* Reformat code

* Fix git conflict

* Add .pkl suffix to VecNormalize checkpoints

* Add tests for new CheckpointCallback params

* Merge CheckpointCallback tests

* Update test and add helper for checkpoint path

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-25 10:57:51 +02:00
Honglu Fan
29a481a288
Include running_mean and running_val when updating target networks (#1004)
* include `running_mean` and `running_val` when updating target networks in DQN, SAC, TD3.

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* include `running_mean` and `running_val` when updating target networks in DQN, SAC, TD3.

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Fix `DictReplayBuffer.next_observations` type (#1013)

* Fix DictReplayBuffer.next_observations type

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fixed missing verbose parameter passing (#1011)

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Support for `device=auto` buffers and set it as default value (#1009)

* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Update test

* Add comments and update tests

* Bump version

* Remove one extra space to conform code style.

* Update docstrings

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Burak Demirbilek <BurakDmb@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-23 10:20:43 +02:00
Timothé
01cc127d32
Support hparams logging to tensorboard (#984)
* create Hparam class & support in all OutputFormats

* add hparams documentation & example

* add hparam tests

* remove unnecessary test & fix name

* format changes

* support hyperparameters logging to tensorboard

* fix HParams class docstring

* use more explicit variable names

* raise error instead of warning

* Unpin protobuf

* Add test for logging hparams

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-22 22:06:54 +02:00
Quentin Gallouédec
73822c34da
Support for device=auto buffers and set it as default value (#1009)
* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-16 17:54:55 +02:00
Quentin Gallouédec
c4f54fcf04
Handling multi-dimensional action spaces (#971)
* Handle non 1D action shape

* Revert changes of observation (out of the scope of this PR)

* Apply changes  to DictReplayBuffer

* Update tests

* Rollout buffer n-D actions space handling

* Remove error when non 1D action space

* ActorCriticPolicy return action with the proper shape

* remove useless reshape

* Update changelog

* Add tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-06 14:19:20 +02:00
Adam Gleave
b1cc15970a
Use higher resolution time_ns() and avoid division by zero (#979)
* Use higher resolution time and round up to eps

* Update changelog

* Add test case

* Fix formatting, time()->time_ns

* Bugfix: ns is integer not float

* Move test to better place

* Divide by 1e9 earlier
2022-07-25 23:02:53 +02:00
Quentin Gallouédec
fda3d4d748
Fix returned type in predict (#964)
* `arr[0]` to `arr.squeeze(0)`

* `squeeze(axis=0)` to `squeeze(0)`

* Type testing

* Add type test for unvectorized observation

* `squeeze(0)` to `squeeze(axis=0)`

* Treatment of the laziness symptoms

* Update changelog

* Udate changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-07-18 11:22:19 +02:00
Antonin RAFFIN
c1f1c3d3d7
Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button

* Update read the doc conda env

* Update year

* Fix bug in kl divergence check

* Rephrase requirement for envpool and isaac gym
2022-07-12 22:50:23 +02:00
Max Weltevrede
ef10189d80
Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination (#948)
* Prohibit simultaneous use of optimize_memory_buffer and handle_timeout_termination

* Modify test to avoid unsupported buffer configuration

* Change from assertion to raising of ValueError

* Update changelog

* Update style for consistency

* Use handle_timeout_termination when possible

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-04 15:08:54 +02:00
Antonin RAFFIN
49813d8c68
Update doc and add check for unbounded action space (#918) 2022-05-25 16:24:21 +02:00
Antonin RAFFIN
0fadc94df3
Fix synchronization bug with EvalCallback (#907) 2022-05-08 21:54:34 +03:00
Antonin RAFFIN
a6f5049a99
Upgrade code to Python 3.7+ syntax using pyupgrade (#887)
* Upgrade code to Python 3.7+ syntax

* Update changelog
2022-04-25 13:01:38 +03:00
Paul Scheikl
ed308a71be
Fixed unchecked None value in SubprocVecEnv (#808)
* Fixed unchecked None value in SubprocVecEnv

* Fixed unchecked None value in DummyVecEnv

* Fix formatting

* Update test and changelog

* Improve test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-04-12 16:05:40 +02:00
Antonin RAFFIN
39a4f9379a
Escape tensorboard log name (#857)
* escape tensorboard log name

Otherwise utils does not recognize the log.

* Added fix to changelog

* Modifications made by: make commit-checks .

* Revert "Modifications made by: make commit-checks ."

This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.

* Update changelog and add test

Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
2022-04-11 21:49:18 +02:00
Yifei Cheng
44e53ff811
Enable force_zip64 (#839)
* Enable force_zip64

* mark tests as expensive

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-03-28 10:35:33 +02:00
Julio César Alves
cdaa9ab418
Callback to early stop the training if there is no model improvement after consecutive evaluations (#741)
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback

* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement

* Update the docs related to new StopTrainingOnNoModelImprovement callback

* Update doc

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-25 11:56:47 +01:00
Quentin Gallouédec
13fcb12471
Fix normalization for DictReplayBuffer (#744)
* Normalize samples DictReplayBuffer (#743)

* Fixed sample normalization in ``DictReplayBuffer`` (#743)

* Test buffer normalization

* Rename test replay buffer

* Bump version

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-23 13:04:57 +01:00
Boyuan Chen
7a01637128
Fix VecNormalization bug for Dict obs (#768)
* fix #724 VecNormalization bug for Dict obs

* update test and changelog

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-02-23 12:33:41 +01:00
Costa Huang
d2ebd2eeaa
Allow PPO to turn off advantage normalization (#763)
* Allow PPO to turn of advantage normalization

* update changelog

* Add a test case

* Update test and sanity check

* Fix tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-22 15:29:21 +01:00
Antonin RAFFIN
7ce4bb8016
Pin gym version (#782)
* Pin gym version

* Cleanup warnings

* Reformat
2022-02-21 23:12:54 +01:00
Adam Gleave
78afcbd6d9
HumanOutputFormat: make length configurable, throw error if keys alias (#756)
* Make HumanOutputFormat length configurable and bump to 36 by default

* Add test case

* Updated changelog

* Blacken

* Blacken code

* Fix GitLab CI: switch to Docker container with new black version

* Incorporate suggestion

* Add class docstring

* Dummy commit to retrigger GitLab

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-05 12:57:35 +02:00
Carlos Luis
5143cd19f7
Gym fixes - Follow up from #705 (#734)
* fix Atari in CI

* fix dtype and atari extra

* Update setup.py

* remove 3.6

* note about how to install Atari

* pendulum-v1

* atari v5

* black

* fix pendulum capitalization

* add minimum version

* moved things in changelog to breaking changes

* partial v5 fix

* env update to pass tests

* mismatch env version fixed

* Fix tests after merge

* Include autorom in setup.py

* Blacken code

* Fix dtype issue in more robust way

* Fix GitLab CI: switch to Docker container with new black version

* Remove workaround from GitLab. (May need to rebuild Docker for this though.)

* Revert to v4

* Update setup.py

* Apply suggestions from code review

* Remove unnecessary autorom

* Consistent gym versions

Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
2022-02-04 15:13:57 -08:00
Adam Gleave
f488d0772a
Autoformat code with black (new version complains about new things) (#757)
* Blacken code

* Fix GitLab CI: switch to Docker container with new black version
2022-02-04 02:56:06 +02:00
Paul Scheikl
fc41600225
Fixed logging info_keywords in the VecMonitor class. (#730)
* Writing the additional info_keywords into the episode infos that are passed to the resulst writer. Directly taken from the non-vec version of monitor.

* Added test for monitoring info_keywords.

* Removed unnecessary step of registering the env. Not using make_vec_env, because it applies a monitor wrapper to the env.

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-01-19 17:17:22 +01:00
Antonin RAFFIN
e9a8979022
Add copy and combine method to running mean std (#716)
* Add copy and combine method to running mean std

* Update test

* Faster test

* Update test

* Update test

* Shift values in RMS test
2022-01-06 01:31:04 +02:00
Antonin RAFFIN
bb16645c4e
Add skip option for VecTransposeImage and bug fix in frame stack (#700)
* Update doc

* Add comment

* Add skip option to VecTransposeImage and fix bug in frame stack
2021-12-23 17:12:49 +02:00
Antonin RAFFIN
e24147390d
Improve tests and add check for float32 (#686)
* Add additional checks

* Improve tests and error message

* Update changelog

* Bump version

* Update doc

* Add tests for action space

* Improve test
2021-12-09 14:14:33 +02:00
Antonin RAFFIN
507ed1762e
Multiprocessing support for off policy algorithms (#439)
* Add multi-env training support for SAC

* Fix for dict obs

* Pytype fixes

* Fix assert on number of envs

* Remove for loop

* Add support for Dict obs

* Start cleanup

* Update doc and bug fix

* Add support for vectorized action noise
and add multi env example for off-policy

* Update version

* Bug fix with VecNormalize

* Update README table

* Update variable names

* Update changelog and version

* Update doc and fix for `gradient_steps=-1`

* Add test for `gradient_steps=-1`

* Disable pytype pyi errors

* Fix for DQN

* Update comment on deepcopy

* Remove episode_reward field

* Fix RolloutReturn

* Avoid modification by reference

* Fix error message

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-01 22:30:09 +01:00
Antonin RAFFIN
d228364ccf
Add timeout handling for on-policy algorithms (#658)
* Add timeout handling for on-policy algorithms

* Fixes

* Fix infinite loop in eval

* Skip type check for python 3.9

* Fix for discrete obs + add docstring

* Fix A2C test

* Removed unused helper

* Add test for infinite horizon

* typed ast should be fixed

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-16 17:19:16 +01:00