272 commits

Author SHA1 Message Date
Quentin Gallouédec
12e9917c24
Fix image-based normalized env loading (#1321)
* Fix

* Add test

* Update changelog

* fix memory error avoidance

* Update version

* image env test

* black

* check_shape_equal

* check shape equal in vecnormalize

* Allow spaces not to be box or dict

* rm `test_save_load_vecnormalized_image` in favor of `test_vec_env`

* Remove unused imports

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-02-15 14:17:18 +01:00
Vikas Kumar
69b94dd6a8
Rename "timesteps" to "episodes" in log_interval documentation (#1325)
* change timestamp to episode for logging

* update changelog

* minor format modif

* minor format modif

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-02-10 21:15:09 +01:00
Sidney Tio
489b1fdaf2
Add the argument dtype (default to float32) to the noise (#1301)
* Fixed noise to return float32

* Updated changelog

* Fixed test to use numpy arrays instead of python floats

* Sorted imports for tests

* Added dtype to constructor

* Removed dtype parameter for VectorizedActionNoise

* __init__ -> None; Capitalize and period in docstring when needed; fix dtype type hint; dtype in docstring

* fix dtype type hint

* Update version

* Clarify changelog [skip ci]

* empty commit to run ci

* Update docs/misc/changelog.rst

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-02-07 13:42:14 +01:00
Quentin Gallouédec
2e4a45020e
Refactor observation stacking (#1238)
* refactor stacking obs

* Improve docstring

* remove all StackedDictObservations

* Update tests and make stacked obs clearer

* Fix type check

* fix stacked_observation_space

* undo init change, deprecate StackedDictObservations

* deprecate stack_observation_space

* type hints

* ignore pytype errors

* undo vecenv doc change

* Deprecation warning in StackedDictObs docstring

* Fix vec_env.rst

* Fix __all__ sorting

* fix pytype ignore statement

* Update docstring

* stack

* Remove n_stack

* Update changelog

* Simplify code

* Rename test file

* Re-use variable for shift

* Fix doc build

* Remove pytype comment

* Disable pytype error

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-02-06 22:41:59 +01:00
adamfrly
411ff697dd
Ensure train/n_updates metric accounts for early stopping of training loop (#1311)
* Correct _n_updates when target_kl stops loop early

* Update changelog

* Simplify code

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-02-06 15:48:41 +01:00
Quentin Gallouédec
82bc63fca4
Upgrade black formatting (#1310)
* apply black

* Reformat tests

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-02-02 11:58:41 +01:00
Alex Pasquali
bea3c44ba5
Fixed typo in A2C's docstring (#1303)
2023-01-28 12:04:07 +01:00
Quentin Gallouédec
5ee9009535
Add sticky actions for Atari games (#1286)
* repeat_action_probability

* Add test

* Undo atari wrapper doc change since CI fails

* remove action_repeat_probability from make_atari_env

* Add sticky action wrapper and improve documentation

* Update changelog

* handle the case noop_max=0

* Update tests

* Comply to ALE implementation

* Reorder doc

* Add doc warning and don't wrap with sticky action when not needed

* fix docstring and reorder

* Move `action_repeat_probability` args at the last position

* Add ref

* Update doc and wrap with frameskip only if needed

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-26 10:32:58 +01:00
Quentin Gallouédec
637988c9cc
Fix Atari wrapper bug: tried to step environment that needs reset (#1297)
* fix 1060

* update changelog
2023-01-26 00:31:20 +01:00
Alex Pasquali
b702884c23
Removed shared layers in mlp_extractor (#1292)
* Modified actor-critic policies & MlpExtractor class

ActorCriticPolicy:
  - changed type hint of net_arch param: now it's a dict
  - removed check that if features extractor is not shared: no shared layers are allowed in the mlp_extractor regardless of the features extractor
ActorCriticCnnPolicy:
  - changed type hint of net_arch param: now it's a dict
MultiInputActorCriticPolicy:
  - changed type hint of net_arch param: now it's a dict
MlpExtractor:
  - changed type hint of net_arch param: now it's a dict
  - adapted networks creation
  - adapted methods: forward, forward_actor & forward_critic

* Removed shared layers in mlp_extractor

* Updated docs and changelog + reformat

* Updated custom policy tests

* Removed test on deprecation warning for shared layers in mlp_extractor

Now shared layers are removed

* Update version

* Update RL Zoo doc

* Fix linter warnings

* Add ruff to Makefile (experimental)

* Add backward compat code and minor updates

* Update tests

* Add backward compatibility

* Fix test

* Improve compat code

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-23 14:55:19 +01:00
Quentin Gallouédec
69fdf155e1
Downgrade sphinx-autodoc-typehints (#1291)
* Update setup.py

* black

* hotfix pytype
2023-01-23 10:56:45 +01:00
Quentin Gallouédec
92f7a6f23b
Fix test_vec_normalize.py, test_tensorboard.py and common/monitor.py type hint (#1194)
* Remove from mypy exclude

* type hint for metadata

* Union[float, int] -> float

* Remove useless __init__

* Type hint for model and logger in BaseCallback

* Type hint for metric_dict

* Update changelog

* fix test_tensorboard

* ignore gamma type checking

* Fix monitor type hint

* Update logger type hints

* Fix type annotation and bump version

* Fix circular import

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-13 18:28:22 +01:00
Yu Zheng
9bb1538b78
Fix outdated load_parameters to set_parameters (#1270)
* Update examples.rst

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-01-11 14:13:21 +01:00
Antonin RAFFIN
6b8905acdb
Release v1.7.0 (#1268)
2023-01-10 17:32:57 +01:00
Dominic Kerr
5aa6e7d340
Fix ProgressBarCallback under-reporting (#1260)
* Updated tqdm progress bar constructor to account for the effects of train_freq/n_steps/num_envs on total_timesteps. Ensure progress bar is "flushed" on training end.

* Added description of PR #1260. Fixed formatting typo

* Partial revert

Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-01-10 15:17:52 +01:00
Alex Pasquali
30a19848ce
Deprecation of shared layers in MlpExtractor (#1252)
* Deprecation warning for shared layers in MlpExtractor

* Updated changelog

* Updated custom policy doc

* Update doc and deprecation

* Fix doc build

* Minor edits

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-01-05 09:59:36 +01:00
Quentin Gallouédec
4fa17dcf0f
Standardize the use of from gym import spaces (#1240)
* generalize the use of `from gym import spaces`

* command line get system info

* Documentation line length for doc

* update changelog

* add space before os platform to avoid ref to other issue

* format

* get_system_info update in changelog

* fix type check error

* fix get system info

* add comment about regex

* update version
2023-01-02 14:51:11 +01:00
Antonin RAFFIN
e78ba6ffa4
Hotfix to load policies saved with SB3 <= v1.6 (#1234)
* Hotfix to load policies saved with SB3 <= v1.6

* Add warning and test

* Update doc
2022-12-22 23:58:30 +01:00
Antonin RAFFIN
3c028f3d5c
Fix load_from_tensor (#1231)
2022-12-22 17:28:18 +01:00
Quentin Gallouédec
5549b34231
Fix `stable_baselines3/common/vec_env/vec_check_nan.py` type hints (#1226)
* super() init style

* "async_step" arg to "event"; "news" to "dones"; improve docstring

* Remove vec_check_nan from mypy exclude

* Update changelog
2022-12-22 12:24:59 +01:00
Alex Pasquali
2cfcec4f50
Modified ActorCriticPolicy to support non-shared features extractor (#1148)
* Modified ActorCriticPolicy to support non-shared features extractor

* Refactored features extraction with non-shared features extractor in ActorCriticPolicy and updated doc

Doc update: added 'warning' on custom policy docs that says that, if the features extractor is non-shared, it's not possible to have shared layers in the mlp_extractor

* Moved attrib share_features_extractor in class

* Updated custom policy doc for non-shared features extractor

* Updated changelog

* Made some if-statements more readable in policies.py

The if-statements are related to the shared/non-shared features extractor in ActorCritic policies

* Simplify implementation and add run test

* Keep order in module gain to keep previous results consistent

* Fix test

* Improved docstring in policies.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Added some tests

* feature extractor -> features extractor

* Fix test

* Fix env_id in test

* Make features extractor parameter explicit

* Remove duplicate

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-20 15:12:05 +01:00
Antonin RAFFIN
8452106734
Fix support of image like normalized inputs (#1214)
* Fix support of image like normalized inputs

* Improve docstring and warning message.

* Don't check if obs is image when normalize_images is False (lil opt)

* Comment fix

* Fix normalize_images not passed to parent

* Check for subclasses too

* Remove useless multiline

* Update version and add comment

* Fix some typos

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-20 13:18:28 +01:00
Quentin Gallouédec
ca944fed2d
Update version (#1220)
* Replace .to(device) when possible

* fix numpy dep

* black

* Add warning for device != cpu and copy=False

* Update changelog

* Remove warning

* Update buffers.py

* Update version

* Fix type checking

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-12-19 13:53:00 +01:00
Antonin Raffin
213b06b0c6
Monkey-patch np.bool = bool
2022-12-19 13:20:48 +01:00
Quentin Gallouédec
68a40e0940
Construct tensors directly on GPU (#1218)
* Replace .to(device) when possible

* fix numpy dep

* black

* Add warning for device != cpu and copy=False

* Update changelog

* Remove warning

* Update buffers.py
2022-12-19 12:50:22 +01:00
Antonin RAFFIN
0c1bc0b1da
Fix stable_baselines3/common/atari_wrappers.py type hints (#1216)
* Fix `stable_baselines3/common/atari_wrappers.py` type hints

* Fix initialization

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-18 16:13:44 +01:00
Antonin RAFFIN
07094c3f2e
Fix stable_baselines3/common/preprocessing.py type hints (#1217)
2022-12-18 15:53:17 +01:00
Quentin Gallouédec
e39bc3da00
Add support for multidimensional spaces.MultiBinary observations (#1179)
* Fix `get_obs_shape` for multidimensional MultiBinary space

* Update changelog

* more tests

* fix multidiscrete one-hot encoding

* refactor tests

* Update changelog.rst

* Update changelog.rst

* batched obs and revert preprocess_obs changes

* Add support for multidimensional ``spaces.MultiBinary`` observations

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-08 18:46:41 +01:00
Quentin Gallouédec
002850f8ac
Fix stable_baselines3/common/torch_layers.py type hint (#1191)
* Remove torch layers from mypy exclude

* Make torch layers mypy compliant

* Extra type specification

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-11-29 23:46:32 +01:00
Zikang Xiong
852d635742
Exposed modules in __init__.py with __all__ (#1195)
* Exposed modules in __init__.py with __all__

* Remove flake8 ignore and update root __all__

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-11-29 23:33:46 +01:00
Quentin Gallouédec
b46396a664
Fix stable_baselines3/common/env_util.py type hint (#1192)
* Remove env_util from mypy exclude

* Fix make_atari_env type hint

* Update changelog
2022-11-29 15:36:55 +01:00
Quentin Gallouédec
5cd891317e
Add with_bias parameter to create_mlp (#1188)
* Add with_bias arg

* Update changelog

* move torch_layers to the last position

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-11-29 12:43:16 +01:00
Quentin Gallouédec
6902fac5e7
Fix stable_baselines3/common/type_aliases.py type hint (#1189)
2022-11-29 12:26:16 +01:00
Quentin Gallouédec
0973b01b9d
Fix tests/test_distributions.py type hint (#1186)
* Fixed test_distribution type hint

* Impose list[int] for action dim
2022-11-29 11:27:59 +01:00
Quentin Gallouédec
aee0ba03c7
Update changelog for #1184 (#1185)
2022-11-28 19:36:26 +01:00
Quentin Gallouédec
e3b24829a5
Drop gym.GoalEnv and other minor changes initially from #780 (#1184)
* Various changes from #780

* Fix env_checker for goal_env detection
2022-11-28 18:22:31 +01:00
Antonin RAFFIN
cd630a3121
Fixes for flake8 6.0 (#1181)
2022-11-25 15:14:55 +01:00
Juan Rocamonde
68b190b667
Raise error when same env object instance is passed in vectorized environment (#1154)
* Raise error when same env object instance is passed in vectorized environment

* Add to changelog

* Add raises to docstring

* Add test

* Also test make_vec_env

* Fix test

* Try to enable color for MyPy

* Update version and ignore lint warnings

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-11-22 14:28:58 +01:00
Quentin Gallouédec
f3abda5cbc
Fix Self return type (#1167)
* Fix Self annotation

* Update changelog

* Define type var on top

* ClassSelf to SelfClass

* annotate self

* Revert Running meanstd change

* Revert vecnormalize change (static method rejected)

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-11-22 13:42:39 +01:00
Adam Gleave
4fb8aec215
Update evaluate_policy type annotation to support policies as well as RL algorithms (#1146)
* Add PolicyPredictor protocol and use it in evaluate_policy

* Update changelog

* Move Protocol to type_aliases to avoid circular import

* Add test for evaluate_policy on BasePolicy

* Remove unused import

* Use typing_extensions

* Move typing_extensions to 3rd party

* Add version range (typing_extensions uses SemVer)

* Import Protocol from typing_extensions only on Python<3.8

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Install typing_extensions only on Python<3.8

* Add missing sys import

* Fix import ordering

* Fix observation type hint in predict

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-11-03 15:36:19 +01:00
Quentin Gallouédec
d5d1a02c15
Allow model trained with python3.7 to be loaded with python3.8+ without the custom_objects workaround (#1123)
* Fix loading

* Remove documentation note

* Update changelog

* Revert save_format change

* Add test for errors while unpickling

* Update version and cleanup

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-17 17:33:47 +02:00
Quentin Gallouédec
5ef10c8e69
Fix type annotation of `policy` in `BaseAlgorithm` and `OffPolicyAlgorithm` (#1120)
2022-10-17 10:16:20 +02:00
Juan Rocamonde
cdcdd32c51
Fix return type of evaluate_actions (#1118)
* Fix return type of ActorCriticPolicy.evaluate_actions to optional entropy tensor

* Update changelog.rst
2022-10-14 17:45:28 +02:00
Antonin RAFFIN
508f8ffd59
Remove deprecated features and attributes (#1104)
* Remove deprecated eval env

* Remove deprecated ret attribute

* Remove sde net arch

* Remove unused code

* Update test comment

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-10-11 10:55:16 +02:00
Antonin RAFFIN
e2f81bb70b
Release v1.6.2 (#1103)
* Release v1.6.2

* Remove Gitlab CI, no more minutes
2022-10-10 16:37:11 +02:00
tobirohrer
d8a430e088
Deprecate create_eval_env, eval_env and eval_freq parameter (#1082)
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925

* added changelog entry

* added missing backtick

* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used

* Updated tests to ignore DeprecationWarnings

* Updated changelog entry

* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs

* Add alternative in docstring

* Update docstrings

* `eval_freq` warning in docstring

* Add deprecation comments in tests

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-10-10 15:39:38 +02:00
Antonin RAFFIN
7c21b79188
Add progress bar callback and argument (#1095)
* Add progress bar callback and argument

* Update doc

* Update changelog

* Upgrade pytype in docker image

* Use tqdm.write in the logger to have cleaner output

* Fix logger test

* Fix when doing multiple calls to learn()

* Address comments from code-review
2022-10-06 18:17:31 +02:00
Alex Pasquali
6a8c9ddc8b
Updated type hint and extended docstring in make_vec_env and make_atari_env (#1085)
* Updated type hint and extended docstring in make_vec_env

The function itself was already working with callables, but this wasn't considered in the type hint of the function's signature.

Extended the description of the wrapper_class parameter with a link to a Github issue containing more details on the matter.

* Updated type hint in make_atari_env

The function itself was already working with callables, but this wasn't considered in the type hint of the function's signature.

* Updated docstring in make_atari_env

When modifying the type hint of the parameter 'env_id' (in this commit: fda6872f73c11075901ba88f2520f6316f818d1d), I forgot to update its description in the docstring.
Doing it now.

* Removed redundant type in env_id's type hint in make_vec_env and make_atari_env

Callable[..., gym.Env] already includes Type[gym.Env], as pointed out here: https://github.com/DLR-RM/stable-baselines3/pull/1085#issuecomment-1269685218

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-06 13:36:06 +02:00
Antonin RAFFIN
21300c9aaf
Release v1.6.1 (#1080)
2022-09-29 12:15:55 +02:00
Akhil
def0574d03
Fixed typos (#1076)
* Updated docstring from n_steps to n_rollout_steps

This must be a typo

* Fixed typo in a comment in ppo.py

* Update changelog

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-09-28 14:57:46 +02:00