Commit graph

317 commits

Author SHA1 Message Date
Antonin RAFFIN
49813d8c68
Update doc and add check for unbounded action space (#918) 2022-05-25 16:24:21 +02:00
TibiGG
2fcf8f91c1
Removed redundant double-check of nested Dict (#908)
* Removed redundant double-check of nested Dict observation space from BaseAlgorithm

* Update changelog

Co-authored-by: tibigg <tg4018@ic.ac.uk>
2022-05-09 14:36:15 +03:00
Antonin RAFFIN
0fadc94df3
Fix synchronization bug with EvalCallback (#907) 2022-05-08 21:54:34 +03:00
Thomas Rudolf
c2518dc160
Add doc to use mlflow logger (#889)
* ADD feature for mlflow logger via MLflowOutputFormat.

* Move MLFlow integration to doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-05-08 15:28:31 +02:00
Antonin RAFFIN
c5f0aa5de0
Update doc: PPO blog post and remark on timeouts (#896) 2022-05-01 16:26:34 +02:00
Antonin RAFFIN
a6f5049a99
Upgrade code to Python 3.7+ syntax using pyupgrade (#887)
* Upgrade code to Python 3.7+ syntax

* Update changelog
2022-04-25 13:01:38 +03:00
Bryan Collazo
3c468ff558
Update ppo documentation (remove redundant and) (#874)
* Update ppo documentation (remove redundant and)

PTAL, thanks!

* Update changelog

* Pin ale-py version

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-04-19 14:15:51 +02:00
Paul Scheikl
ed308a71be
Fixed unchecked None value in SubprocVecEnv (#808)
* Fixed unchecked None value in SubprocVecEnv

* Fixed unchecked None value in DummyVecEnv

* Fix formatting

* Update test and changelog

* Improve test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-04-12 16:05:40 +02:00
Antonin RAFFIN
39a4f9379a
Escape tensorboard log name (#857)
* escape tensorboard log name

Otherwise utils does not recognize the log.

* Added fix to changelog

* Modifications made by: make commit-checks .

* Revert "Modifications made by: make commit-checks ."

This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.

* Update changelog and add test

Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
2022-04-11 21:49:18 +02:00
Antonin RAFFIN
248f082cdc
Bump min PyTorch version (#855) 2022-04-11 18:34:15 +02:00
Quentin Gallouédec
16703b1314
Fix HER goal selection (#848)
* Goal sampled from next_achieved_goal instead of achived_goal

* No need to have special case for future anymore

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-11 17:50:02 +02:00
Grégoire Passault
254bb10c42
Replacing the policy registry with policy "aliases" (#842)
* Replacing the policy registry with policy "aliases"

* Fixing import order and SAC

* Changing arg. order to be sure policy_aliases is a kwarg

* Import orders

* Removing pytype error check

* Reformat

* Fix alias import

* Not using mutable {} as default for policy_aliases

* Empty aliases initialization

* Using static attributes for policy_aliases

* Fixing isort

* Fixing back bad merge

* Running isort

* Fixing aliases for A2C and PPO

* Using f-string

* Moving policy_aliases definition position

* Adding change in the changelog

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-08 21:21:53 +02:00
Yifei Cheng
44e53ff811
Enable force_zip64 (#839)
* Enable force_zip64

* mark tests as expensive

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-03-28 10:35:33 +02:00
Antonin RAFFIN
30772aa9f5
Release v1.5.0 (#835)
* Release v1.5.0

* Fix link
2022-03-25 14:38:22 +01:00
Grégoire Passault
00ac43b0a9
Removing dead code for handling time limits (#831)
* Removing dead code for handling time limits (see #829)

* Mentionning remove_time_limit_termination in the changelog

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-03-23 13:33:55 +01:00
Yuan
009bb0549a
Update tensorboard.rst in SummaryWriterCallback (#822)
* Update tensorboard.rst

* update changelog.rst

* update changelog.rst, add username
2022-03-15 21:48:52 +01:00
Antonin RAFFIN
e88eb1c9ca
Add explanation of logger output (#803)
* Add explanation of logger output

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Add example output

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-03-07 12:20:43 +01:00
Julio César Alves
cdaa9ab418
Callback to early stop the training if there is no model improvement after consecutive evaluations (#741)
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback

* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement

* Update the docs related to new StopTrainingOnNoModelImprovement callback

* Update doc

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-25 11:56:47 +01:00
Quentin Gallouédec
db5366fb51
None as default value for env in HerReplayBuffer.sample + DQN batch size typing fix (#790)
* `env` to `None` by default in `HerReplayBuffer.sample` (#788)

* Fix DQN batch_size typing

* Fix changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-24 15:51:01 +01:00
Quentin Gallouédec
13fcb12471
Fix normalization for DictReplayBuffer (#744)
* Normalize samples DictReplayBuffer (#743)

* Fixed sample normalization in ``DictReplayBuffer`` (#743)

* Test buffer normalization

* Rename test replay buffer

* Bump version

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-23 13:04:57 +01:00
Boyuan Chen
7a01637128
Fix VecNormalization bug for Dict obs (#768)
* fix #724 VecNormalization bug for Dict obs

* update test and changelog

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-02-23 12:33:41 +01:00
Costa Huang
d2ebd2eeaa
Allow PPO to turn off advantage normalization (#763)
* Allow PPO to turn of advantage normalization

* update changelog

* Add a test case

* Update test and sanity check

* Fix tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-22 15:29:21 +01:00
Antonin RAFFIN
7ce4bb8016
Pin gym version (#782)
* Pin gym version

* Cleanup warnings

* Reformat
2022-02-21 23:12:54 +01:00
Gianluca De Cola
58a98060f9
Update docstring on MlpExtractor. Resolves #736 (#774)
* Improve docstring on MlpExtractor.

* update changelog.
2022-02-16 01:50:17 +02:00
Gautam J
59bec30180
update docs fix indentation (#764)
* update docs fix indentation

Changed code block indentation from 2 spaces to 4 spaces for consistency.

* update changelog

* Update changelog.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-07 21:00:53 +02:00
Manuel
40bda9a918
Remove explict forward calls (#753)
* Remove explict forward calls

* Changelog and commit checks.

* Reverted test forward removal for super call.

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-06 22:27:12 +02:00
Adam Gleave
78afcbd6d9
HumanOutputFormat: make length configurable, throw error if keys alias (#756)
* Make HumanOutputFormat length configurable and bump to 36 by default

* Add test case

* Updated changelog

* Blacken

* Blacken code

* Fix GitLab CI: switch to Docker container with new black version

* Incorporate suggestion

* Add class docstring

* Dummy commit to retrigger GitLab

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-05 12:57:35 +02:00
Adam Gleave
9ff26dafed
Fix changelog (#760) 2022-02-05 02:12:17 +02:00
Carlos Luis
5143cd19f7
Gym fixes - Follow up from #705 (#734)
* fix Atari in CI

* fix dtype and atari extra

* Update setup.py

* remove 3.6

* note about how to install Atari

* pendulum-v1

* atari v5

* black

* fix pendulum capitalization

* add minimum version

* moved things in changelog to breaking changes

* partial v5 fix

* env update to pass tests

* mismatch env version fixed

* Fix tests after merge

* Include autorom in setup.py

* Blacken code

* Fix dtype issue in more robust way

* Fix GitLab CI: switch to Docker container with new black version

* Remove workaround from GitLab. (May need to rebuild Docker for this though.)

* Revert to v4

* Update setup.py

* Apply suggestions from code review

* Remove unnecessary autorom

* Consistent gym versions

Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
2022-02-04 15:13:57 -08:00
Armand du Parc Locmaria
44dfedc061
Add furuta pendulum project to project list (#742)
* add furuta pendulum project

* Update changelog to reflect addition to docs

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-04 11:39:49 +02:00
Antonin RAFFIN
54bcfa4544
Add Hugging Face integration to SB3 doc (#733)
* Add Hugging Face to SB3 doc

* Update doc + fixes

* Use SB3 model from the hub

* Bump version

* Fixes

Co-authored-by: simoninithomas <simonini_thomas@outlook.fr>
2022-01-20 10:04:12 +01:00
Paul Scheikl
fc41600225
Fixed logging info_keywords in the VecMonitor class. (#730)
* Writing the additional info_keywords into the episode infos that are passed to the resulst writer. Directly taken from the non-vec version of monitor.

* Added test for monitoring info_keywords.

* Removed unnecessary step of registering the env. Not using make_vec_env, because it applies a monitor wrapper to the env.

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-01-19 17:17:22 +01:00
Antonin RAFFIN
21f6a474a4
Release 1.4.0 (#729)
* Release 1.4.0

* Add integration section in the readme
2022-01-19 11:16:15 +01:00
Antonin RAFFIN
cd6e04705b
Update SB3 Contrib doc (ARS) and W&B integration (#726)
* Add ARS to SB3 contrib

* Add integration page
2022-01-18 15:10:25 +01:00
Antonin RAFFIN
e9a8979022
Add copy and combine method to running mean std (#716)
* Add copy and combine method to running mean std

* Update test

* Faster test

* Update test

* Update test

* Shift values in RMS test
2022-01-06 01:31:04 +02:00
IperGiove
d9e198e04f
Update custom_policy.rst (#711)
* Update custom_policy.rst

Added methods forward_actor and forward_critic in CustomNetwork class.

* Update doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-01-03 16:22:58 +01:00
Thomas Gubler
c895c1d46f
Doc fix: A2C - fix guidance on RMSpropTFLike (#708)
* doc: A2C/migration: fix guidance on RMSpropTFLike

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-12-30 11:28:12 +01:00
Antonin RAFFIN
4a5dfaedfc
Update SB3 contrib doc (+ fix backward compat) (#707)
* Fix `VecNormalize` load for SB3<= 1.3.0

* Update SB3 contrib doc

* Bump version
2021-12-29 14:25:09 +01:00
Antonin RAFFIN
bb16645c4e
Add skip option for VecTransposeImage and bug fix in frame stack (#700)
* Update doc

* Add comment

* Add skip option to VecTransposeImage and fix bug in frame stack
2021-12-23 17:12:49 +02:00
Quentin Gallouédec
d496cd4d95
Consistent use of device as keyword argument (#702)
* consistent device as keyword arg

* Fixed ``device`` arg inconsistency in changelog
2021-12-22 11:43:59 +01:00
Demetrio92
798b16aaf7
more verbose documentation regarding .load vs .set_parameters (#696)
* more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614)

* add a note to explain the difference between `.load` and `.set_parameters` to the examples

* fix typos

Co-authored-by: Anssi <kaneran21@hotmail.com>

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-18 17:28:37 +02:00
hsuehch
222a69ca49
Eliminate extra empty lines in CSV monitor files on Windows (DLR-RM#692) (#695)
* Added ``newline="\n"`` when opening CSV monitor files so that each line ends with ``\r\n`` instead of ``\r\r\n`` on Windows while Linux environments are not affected
2021-12-18 16:04:33 +02:00
Antonin RAFFIN
e24147390d
Improve tests and add check for float32 (#686)
* Add additional checks

* Improve tests and error message

* Update changelog

* Bump version

* Update doc

* Add tests for action space

* Improve test
2021-12-09 14:14:33 +02:00
Antonin RAFFIN
77f4f5021d
Drop Python 3.6 support (#685)
* Drop python 3.6 support

* Update doc

* Update gitlab CI

* Update doc env

* Fix gitlab CI
2021-12-06 12:54:43 +01:00
Antonin RAFFIN
507ed1762e
Multiprocessing support for off policy algorithms (#439)
* Add multi-env training support for SAC

* Fix for dict obs

* Pytype fixes

* Fix assert on number of envs

* Remove for loop

* Add support for Dict obs

* Start cleanup

* Update doc and bug fix

* Add support for vectorized action noise
and add multi env example for off-policy

* Update version

* Bug fix with VecNormalize

* Update README table

* Update variable names

* Update changelog and version

* Update doc and fix for `gradient_steps=-1`

* Add test for `gradient_steps=-1`

* Disable pytype pyi errors

* Fix for DQN

* Update comment on deepcopy

* Remove episode_reward field

* Fix RolloutReturn

* Avoid modification by reference

* Fix error message

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-01 22:30:09 +01:00
Antonin RAFFIN
2ebb8aa22b
Update Citation (#684)
* Update citation

* Remove cff file
2021-12-01 18:55:21 +01:00
Antonin RAFFIN
52c29dc497
Fix evaluation script for recurrent policies (#678)
* Fix evaluation script for RNN

* Add error message

* Revert "Add error message"

This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a.

* Fix for pytype

* Rename mask to `episode_start`

* Fix type hint

* Fix type hints

* Remove confusing part of sentence

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-30 13:49:06 +01:00
Gary Briggs
8e5ede783f
Add a section on exporting to TFLite/Coral with demonstration (#679)
* Add a section on exporting to TFLite/Coral with demonstration

* Changelog to reflect new export documentation

* Update docs/guide/export.rst

Fingers on autopilot make word wrong

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Update docs/guide/export.rst

Better wording clarity

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Update docs/guide/export.rst

Better wording clarity

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Clarify motivations and hardware

* Update docs/misc/changelog.rst

Make consistent with other changelog entries

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Sphinx wants the section underline to be at least this long

* Remove first-person voice

* Typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-28 10:54:50 +01:00
Shyamal H Anadkat
3b68dc7312
Update GAE computation docstring (#655)
* Fix typo in buffers.py

* Revert "Fix typo in buffers.py"

This reverts commit ca643d5e3a509ae1b8a65bf0de98f4609ca9d8da.

* Ignore pytype errors

* Update GAE computation docstring

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-11-25 10:53:42 +01:00
Parth Kothari
58e5506385
Editted Authors of DriverGym project (#669) 2021-11-18 10:18:18 +01:00
Parth Kothari
1ac35eaef2
Add DriverGym project to SB3 project documentation (#665)
* Added DriverGym project

* Updated changelog

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-11-17 11:13:43 +01:00
Antonin RAFFIN
d228364ccf
Add timeout handling for on-policy algorithms (#658)
* Add timeout handling for on-policy algorithms

* Fixes

* Fix infinite loop in eval

* Skip type check for python 3.9

* Fix for discrete obs + add docstring

* Fix A2C test

* Removed unused helper

* Add test for infinite horizon

* typed ast should be fixed

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-16 17:19:16 +01:00
Antonin RAFFIN
e75e1de4c1
Fix indentation in RL tips doc (#657)
* Update rl_tips.rst

indent fix to make if done and its following statement work

* Fix indentation and update changelog

* Skip type check for python 3.9

Co-authored-by: paulg <cove9988@gmail.com>
2021-11-10 16:54:20 +00:00
Antonin RAFFIN
2bb4500948
Fix set_env when using VecNormalize (#638)
* Fix `set_env` when using `VecNormalize`

* Update version
2021-11-02 13:52:26 +02:00
ac-93
98c1a637cf
add tactile-gym to the list of projects using SB3 (#640)
* Update projects.rst

* Update changelog.rst

* Update projects.rst

* Fix doc build

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:26:06 +01:00
Oleksii Kachaiev
0c17fedfac
Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636)
* Store number of timesteps at the beginning of each learn cycle

* Update changelog

* Set default _num_timesteps_at_start in the contructor

* Test case for FPS logger

* Adjust test to cover both on-policy and off-policy algorithms

* Fix formatting

* Update test and add comment

* Fix test

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:19:03 +01:00
Edouard Leurent
a2e3001598
Add highway-env to the list of projects using SB3 (#639)
* Add highway-env to the list of projects using SB3

Many thanks for this fantastic library, keep up the good work!

* Update changelog with added documentation
2021-10-30 13:53:36 +02:00
Oleksii Kachaiev
0503e694b2
Introduce norm_obs_keys param for VecNormalize environment wrapper (#631)
* Implement new norm_obs_keys param for VecNormalize environment wrapper

* Simplified doc string to avoid issues with lint and doc

* Updated changelog

* Update changelog.rst

* Update test_vec_normalize.py

* Update sanity checks

* Fix backward compat

* Update doc

* Update changelog

* Fix lint warnings

* Fix tests

* Minor edit

* observation_space sanity check was applied twice

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-10-28 19:18:39 +02:00
Antonin RAFFIN
7b977d7b03
Release 1.3.0 (#625) 2021-10-23 17:07:00 +02:00
Antonin RAFFIN
e907eca18e
Fix set_env to keep the number of timesteps (#615)
* Fix for `set_env`

* Add test and update changelog

* Use underscores and f-strings

* Add PyPi info

* Update comments
2021-10-23 16:36:40 +02:00
Antonin RAFFIN
1564a85081
System info helper (#613)
* Add `system_env_info`

* Add `print_system_info` to load
and store system info at save time

* Remove TODO

* Rename to `get_system_info`

* Import as sb3 for consistency

* Update changelog

* Add warning for old SB3 versions

* Use underscore litteral for more clarity
2021-10-18 10:43:56 +02:00
Timo Kaufmann
09e9fc42eb
Use consistent logging keys (#605)
* Use a consistent key to log the total timesteps

This changes the timestep logging key of on-policy algorithms from
`time/total_timesteps` to `time/total timesteps` (note the
underscore/space). The off-policy algorithms and the eval callback
already use the latter, so this behavior is more consistent.

* Use underscores instead of spaces in logging keys

Most keys already followed this policy and consistent behavior is
friendlier to new users.

* Minor edit and bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-10-12 13:17:30 +02:00
Antonin RAFFIN
75aa31dcfb
Update SB3 contrib algorithms (#604) 2021-10-10 15:41:39 +02:00
Antonin RAFFIN
1881d904a0
Doc fix and improve error messages (#598)
* Fix custom env doc

* Catch common mistake

* Improve `EvalCallback` error message

* Lint test

* Update docs/guide/custom_env.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>
2021-10-08 18:08:31 +02:00
Ilja Avadiev
740d61ada3
Doc fix environment mixup (#588) 2021-09-29 10:16:59 +02:00
Antonin RAFFIN
306e49fda6
Fixes in is_vectorized_observation (#587)
* Fix is vectorized bug in DQN

* Fix sub-classed obs
2021-09-28 21:57:49 +02:00
Antonin RAFFIN
201fbffa8c
Remove sde_net_arch + Simplify policy (#584)
* Remove `sde_net_arch` + Simplify policy

* Add warning at load time
2021-09-28 22:32:54 +03:00
batu
89af49ca91
ONNX Documentation Update (#464)
* Updated ONNX documentation

First draft on the documentation explaining how to export SB3 models in the ONNX format

* Updated changelog with ONNX documentation fix

* Address comments

* Update changelog.rst

* Update rtd env

* Fixes + add test example

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi Kanervisto <anssk@Anssis-MacBook-Air.local>
Co-authored-by: Anssi Kanervisto <kaneran21@hotmail.com>
2021-09-26 17:40:35 +02:00
Baek Junyeob
914bc10a0d
Add policy-distillation-baselines to project page (#578)
* Update projects.rst

* Update docs/misc/projects.rst

* Apply suggestions from code review

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-20 16:30:16 +02:00
Adam Gleave
e825fbdd33
VecNormalize: allow non-continuous observations when norm_obs is False (#575)
* VecNormalize: allow non-continuous observations when norm_obs is False

* Update changelog, fix lint

* Switch to environment present in new and old versions of Gym

* Fix name

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-18 12:11:01 +02:00
Matthew Allen
76c212a854
Add RLGym to project page (#576)
* Add RLGym to projects list.

Per the request in this issue on our repo: https://github.com/lucas-emery/rocket-league-gym/issues/24

* Update changelog documentation section

* Update changelog.rst

* Update docs/misc/projects.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:47:22 +02:00
Wilhelm Kirchgässner
303df08a80
Add GEM project to project section of doc (#574)
* add GEM project to project section of doc

* Update docs/misc/projects.rst

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:10:04 +02:00
Cyprien
f3a35aa786
Add method predict_values for ActorCriticPolicy (#569)
* feat: add method predict_values for ActorCriticPolicy

* Fixes for new gym version

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-15 14:03:04 +02:00
Antonin RAFFIN
16f8b21d9b
Add get_distribution for on-policy algorithms (#566)
* feat: get_distribution method for ActorCriticPolicy

New method get_distribution for class ActorCriticPolicy returning current action distribution given observations

* doc: updating changelog.rst

- adding block for Release 1.2.1a0
- adding cyprienc to contributors

* style: make format

* fix: updating version.txt

Changing version from 1.2.0 to 1.2.1a0

* Update changelog

* Add test for get distribution

Co-authored-by: Cyprien <courtot.c@gmail.com>
2021-09-13 10:25:42 +02:00
Antonin RAFFIN
f8a0869073
Hotfix for Vecnormalize (#558)
* Hotfix for Vecnormalize

* Rename `ret` to `returns`
2021-09-08 12:30:20 +02:00
Antonin RAFFIN
f9e5753acd
Refactor BasePolicy predict (#559) 2021-09-05 02:27:45 +03:00
Scott Brownlie
1afc2f3abe
Avoid putting target networks into training mode (#553)
* make sure DQN policy is always in correct mode - train or eval

* make set_training_mode an abstract method of the base policy - safer

* update docstring of _build method to note that the target network is put into eval mode

* use set_training_mode to put the dqn target network into eval mode

* use set_training_mode to set the training model of the q-network

* move set_training_mode abstract method from BasePolicy to BaseModel

* set train and eval mode for TD3

* make sure critic is always in correct mode during train

* set train and eval mode for SAC

* add comment re batch norm and dropout

* set train and eval mode for A2C and PPO

* add tests for collect rollouts with batch norm

* fix formatting

* update change log

* update version

* remove Optional typing for batch size - causing type check to fail

* Fix scipy dependency for toy text envs

* implement set_training_mode method in BaseModel

* move all tests of train/eval mode to test_train_eval_mode

* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm

* remove extra calls to set_training_mode in train method of TD3 and SAC

* Allow gradient_steps=0

* Refactor tests

* Add comment + use aliases

* Typos

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-30 17:42:41 +02:00
David Blom
3efab0d267
Training and evaluation: call model.train() and model.eval() (#537)
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm

* Add comment documentation

* Fix train and eval for the Actor class

* Run black

* Add github handle to changelog

* Add unit tests for PPO and DQN

* Refactor unit test

* Run black

* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic

* documentation: add bugfix description to changelog

* unit test: use learning_starts=0, decrease the size of the network and use more training steps

* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module

* Rename unit test

* unit test: use drop out probability of 0.5

* Call policy.train and policy.eval

* Fixes + update tests

* Remove unneeded eval

Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-14 14:08:27 +02:00
MihaiAnca13
c41368f2ea
Docs examples warning - issue #526 (#530)
* Update a2c.rst

* Update ddpg.rst

* Update dqn.rst

* Update her.rst

* Update ppo.rst

* Update sac.rst

* Update td3.rst

* Update changelog.rst

* modified message

* Update examples.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-08-09 16:23:25 +03:00
Antonin RAFFIN
be86883f36
Fix type annotations (#522)
* Fix type annotations

* Add citation file

* Update CITATION.cff

* Add note about tb logging

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-07-29 13:02:09 +02:00
Antonin RAFFIN
503425932f
Documentation fixes (#514)
* Update multiprocessing example

* Add VecEnvWrapper example

* Update docs/guide/vec_envs.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-07-18 20:51:41 +02:00
Antonin RAFFIN
2fa06ae8d2
Add Python3.9 CI + upgrade min PyTorch version (#503)
* Add Python3.9 CI + upgrade min PyTorch version

* Upgrade min PyTorch version
2021-07-06 09:32:03 +02:00
Antonin RAFFIN
5af35fa2cc
Release v1.1.0 (#497) 2021-07-02 11:21:09 +02:00
Skander Moalla
abbf48e93e
Fix Inconsistencies with EvalCallback tensorboard logs (#492)
* Make EvalCallback dump the evaluation logs it records #457.

* Make test deterministic

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-07-01 15:43:08 +02:00
Carlo Rizzardo
066e1409d9
Corrected DictReplayBuffer observation dtype #484 (#486)
* Fix observation buffer dtype in DictReplayBuffer

* Formatting fix (line length)

* Changelog update, bugfix DictReplaybuffer observations dtype
2021-06-22 13:41:26 +02:00
Antonin RAFFIN
b52c6fc18f
Fix logger setup (#469)
* Make logger an attribute

* Update doc

* Fix logger reset when using multiple runs

* Cleanup logger: remove `Logger.CURRENT`

* Fix for PPO

* Update tests and improve docstring

* Add warning

* Throw error when tensorboard not installed
2021-06-14 15:17:48 +02:00
Benjamin Steenhoek
180a2e3832
Remove recurrent policies from A2C docs (#470)
* Remove recurrent policies from A2C docs

Recurrent policies are not supported yet as of (https://github.com/DLR-RM/stable-baselines3/issues/160#issuecomment-694756355), but the docs say that A2C supports them. Changing it to avoid misleading.

* Update changelog

Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>
2021-06-07 19:39:49 +02:00
Benjamin Black
a038044d11
Added support for vector envs in evaluation (#447)
* added vector env support to evaluate_policy

* fixed linting and documentation

* updated changelog

* fixed code style issue

* added tests for vec env

* fixed formatting

* renamed observations

* added comments for vector evaluation

* fixed issues

* Cleanup + bump version

* Add comment

* Fix wrong count of episodes

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-05-28 12:40:29 +02:00
Antonin RAFFIN
88e1be9ff5
Documentation update (#450)
* Update migration guide

* Add sanity check

* Removed parameter ``channels_last`` from ``is_image_space``

* Pin docutils

* Clarify callback `save_freq` definition

* Update docs/misc/changelog.rst

* Update docs/misc/changelog.rst

* Fix typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-05-23 13:13:11 +02:00
Amanda Dsouza
18f4e3ace0
Added wrapper_kwargs argument to make_vec_env (#448)
* Added wrapper_kwargs to make_vec_env

* code black format

* Tmp fix for atari-py

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-23 11:33:34 +02:00
Rohan Tangri
df6f9de8f4
KL Divergence Helper Function (#431)
* add kl divergence wrapper

* add test

* update changelog

* black lint

* remove unused import

* Fix ent coef loading for SAC (#429)

* Fix ent coef loading for SAC

* Better fix and add comment

* add 'distribution' to base Distribution class

* add sample test

* revert to plain pytorch implementation

* black reformat

* Update docs/misc/changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Doc update (custom policy + fix her example) (#436)

* isort and black reformat

* float -> bool tensor

* add sanity test

* more concise kl code

* remove outdated comment

* all -> allclose assertion

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fix PyTorch warning

* Update gSDE entropy test

* Update entropy test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-05-20 19:01:07 +02:00
Antonin RAFFIN
378d197b00
Doc update (custom policy + fix her example) (#436) 2021-05-16 18:21:07 +02:00
Antonin RAFFIN
1ce911994b
Fix ent coef loading for SAC (#429)
* Fix ent coef loading for SAC

* Better fix and add comment
2021-05-12 12:21:54 +03:00
Jaden Travnik
75b6f3b3b0
Dictionary Observations (#243)
* First commit

* Fixing missing refs from a quick merge from master

* Reformat

* Adding DictBuffers

* Reformat

* Minor reformat

* added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy

* Ran black on buffers

* Ran isort

* Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow

* Running isort :facepalm

* Fixed typing issues

* Adding docstrings and typing. Using util for moving data to device.

* Fixed trailing commas

* Fix types

* Minor edits

* Avoid duplicating code

* Fix calls to parents

* Adding assert to buffers. Updating changelong

* Running format on buffers

* Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env

* Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type

* Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder

* Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation

* Fixes

* Running code style

* Update docstrings on torch_layers

* Decapitalize non-constant variables

* Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test

* Update doc

* Update doc

* Fix format

* Removing NineRoom env. Using nested preprocess. Removing mutable default args

* running code style

* Passing channel check through to stacked dict observations.

* Running black

* Adding channel control to SimpleMultiObsEnv. Passing check_channels to CombinedExtractor

* Remove optimize memory for dict buffers

* Update doc

* Move identity env

* Minor edits + bump version

* Update doc

* Fix doc build

* Bug fixes + add support for more type of dict env

* Fixes + add multi env test

* Add support for vectranspose

* Fix stacked obs for dict and add tests

* Add check for nested spaces. Fix dict-subprocvecenv test

* Fix (single) pytype error

* Simplify CombinedExtractor

* Fix tests

* Fix check

* Merge branch 'master' into feat/dict_observations

* Fix for net_arch with dict and vector obs

* Fixes

* Add consistency test

* Update env checker

* Add some docs on dict obs

* Update default CNN feature vector size

* Refactor HER (#351)

* Start refactoring HER

* Fixes

* Additional fixes

* Faster tests

* WIP: HER as a custom replay buffer

* New replay only version (working with DQN)

* Add support for all off-policy algorithms

* Fix saving/loading

* Remove ObsDictWrapper and add VecNormalize tests with dict

* Stable-Baselines3 v1.0 (#354)

* Bump version and update doc

* Fix name

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docs/index.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update wording for RL zoo

Co-authored-by: Adam Gleave <adam@gleave.me>

* Add gym-pybullet-drones project (#358)

* Update projects.rst

Added gym-pybullet-drones

* Update projects.rst

Longer title underline

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>

* Include SuperSuit in projects (#359)

* include supersuit

* longer title underline

* Update changelog.rst

* Fix default arguments + add bugbear (#363)

* Fix potential bug + add bug bear

* Remove unused variables

* Minor: version bump

* Add code of conduct + update doc (#373)

* Add code of conduct

* Fix DQN doc example

* Update doc (channel-last/first)

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>

* Make installation command compatible with ZSH (#376)

* Add quotes

* Add Zsh bracket info

* Add clarify pip installation line

* Make note bold

* Add Zsh pip installation note

* Add handle timeouts param

* Fixes

* Fixes (buffer size, extend test)

* Fix `max_episode_length` redefinition

* Fix potential issue

* Add some docs on dict obs

* Fix performance bug

* Fix slowdown

* Add package to install (#378)

* Add package to install

* Update docs packages installation command

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fix backward compat + add test

* Fix VecEnv detection

* Update doc

* Fix vec env check

* Support for `VecMonitor` for gym3-style environments (#311)

* add vectorized monitor

* auto format of the code

* add documentation and VecExtractDictObs

* refactor and add test cases

* add test cases and format

* avoid circular import and fix doc

* fix type

* fix type

* oops

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* add test cases

* update changelog

* fix mutable argument

* quick fix

* Apply suggestions from code review

* fix terminal observation for gym3 envs

* delete comment

* Update doc and bump version

* Add warning when already using `Monitor` wrapper

* Update vecmonitor tests

* Fixes

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Reformat

* Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392)

* Fix ent coef loading bug

* Add test

* Add comment

* Reuse save path

* Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375)

* Fix return computation + add test for GAE

* Rename `last_dones` to `episode_starts` for clarification

* Revert advantage

* Cleanup test

* Rename variable

* Clarify return computation

* Clarify docs

* Add multi-episode rollout test

* Reformat

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>

* Fixed saving of `A2C` and `PPO` policy when using gSDE (#401)

* Improve doc and replay buffer loading

* Add support for images

* Fix doc

* Update Procgen doc

* Update changelog

* Update docstrings

Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>

* Update doc and minor fixes

* Update doc

* Added note about MultiInputPolicy in error of NatureCNN

* Merge branch 'master' into feat/dict_observations

* Address comments

* Naming clarifications

* Actually saving the file would be nice

* Fix edge case when doing online sampling with HER

* Cleanup

* Add sanity check

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
2021-05-11 12:29:30 +02:00
Rohan Tangri
2ada2dd0b2
Update PPO KL Divergence Estimator (#419)
* remove unused all_kl_divs memory

* new kl approximate equation

* move kl check before update step

* update changelog

* add continue_training flag update to kl check

* add verbose check

* update changelog

* lint with black

* r -> log_ratio

* Add link to PR

* invert ratio

* Fix for Sphinx v4.0

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-10 13:21:00 +03:00
Rohan Tangri
35da0b59b9
Policy Base for On-policy Algorithms (#412) (#415)
* add policy_base input to OnPolicyAlgorithms

* update changelog

* Fix pytype error

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-04 12:59:36 +03:00
Antonin RAFFIN
6f822b9ed7
Reformat with new black version (#408)
* Reformat

* Update changelog
2021-04-26 15:58:19 +02:00
Antonin RAFFIN
c69f7cd5e6
Fixed saving of A2C and PPO policy when using gSDE (#401) 2021-04-19 12:23:02 +02:00
Antonin RAFFIN
5d47296b8d
Add test for GAE + rename RolloutBuffer.dones for clarification (#375)
* Fix return computation + add test for GAE

* Rename `last_dones` to `episode_starts` for clarification

* Revert advantage

* Cleanup test

* Rename variable

* Clarify return computation

* Clarify docs

* Add multi-episode rollout test

* Reformat

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2021-04-16 15:52:55 +02:00
Antonin RAFFIN
c4304029a2
Fixed loading of `ent_coef for SAC and TQC`, it was not optimized anymore (#392)
* Fix ent coef loading bug

* Add test

* Add comment

* Reuse save path
2021-04-15 14:50:43 +02:00