Commit graph

230 commits

Author SHA1 Message Date
Antonin RAFFIN
4a5dfaedfc
Update SB3 contrib doc (+ fix backward compat) (#707)
* Fix `VecNormalize` load for SB3 <= 1.3.0

* Update SB3 contrib doc

* Bump version
2021-12-29 14:25:09 +01:00
Antonin RAFFIN
bb16645c4e
Add skip option for VecTransposeImage and bug fix in frame stack (#700)
* Update doc

* Add comment

* Add skip option to VecTransposeImage and fix bug in frame stack
2021-12-23 17:12:49 +02:00
Quentin Gallouédec
d496cd4d95
Consistent use of device as keyword argument (#702)
* consistent device as keyword arg

* Fixed ``device`` arg inconsistency in changelog
2021-12-22 11:43:59 +01:00
Demetrio92
798b16aaf7
more verbose documentation regarding .load vs .set_parameters (#696)
* more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614)

* add a note to explain the difference between `.load` and `.set_parameters` to the examples

* fix typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-18 17:28:37 +02:00
hsuehch
222a69ca49
Eliminate extra empty lines in CSV monitor files on Windows (DLR-RM#692) (#695)
* Added ``newline="\n"`` when opening CSV monitor files so that each line ends with ``\r\n`` instead of ``\r\r\n`` on Windows while Linux environments are not affected
2021-12-18 16:04:33 +02:00
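The line-ending behavior described in the commit above can be reproduced on any platform with a minimal sketch. It assumes the monitor file is written with Python's `csv` module, whose default line terminator is `\r\n`; the helper name `write_rows` and the column names are hypothetical, and Windows text-mode translation is simulated by passing `newline="\r\n"` to `io.TextIOWrapper` rather than by opening a real file.

```python
import csv
import io


def write_rows(newline):
    """Write one CSV row through a text layer and return the raw bytes.

    `newline` plays the role of the `newline` argument of `open()`.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # csv's default lineterminator is "\r\n"
    writer.writerow(["r", "l", "t"])
    text.flush()
    return raw.getvalue()


# Simulated Windows default: every "\n" written becomes "\r\n",
# so the "\r\n" emitted by csv turns into "\r\r\n".
windows_default = write_rows("\r\n")

# With newline="\n" no translation happens and the row ends with "\r\n".
fixed = write_rows("\n")

print(windows_default)
print(fixed)
```

On Linux the platform line separator is already `\n`, so text mode writes `\n` unchanged either way, which is why the commit reports that Linux environments are not affected.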
Antonin RAFFIN
e24147390d
Improve tests and add check for float32 (#686)
* Add additional checks

* Improve tests and error message

* Update changelog

* Bump version

* Update doc

* Add tests for action space

* Improve test
2021-12-09 14:14:33 +02:00
Antonin RAFFIN
77f4f5021d
Drop Python 3.6 support (#685)
* Drop python 3.6 support

* Update doc

* Update gitlab CI

* Update doc env

* Fix gitlab CI
2021-12-06 12:54:43 +01:00
Antonin RAFFIN
507ed1762e
Multiprocessing support for off policy algorithms (#439)
* Add multi-env training support for SAC

* Fix for dict obs

* Pytype fixes

* Fix assert on number of envs

* Remove for loop

* Add support for Dict obs

* Start cleanup

* Update doc and bug fix

* Add support for vectorized action noise and add multi env example for off-policy

* Update version

* Bug fix with VecNormalize

* Update README table

* Update variable names

* Update changelog and version

* Update doc and fix for `gradient_steps=-1`

* Add test for `gradient_steps=-1`

* Disable pytype pyi errors

* Fix for DQN

* Update comment on deepcopy

* Remove episode_reward field

* Fix RolloutReturn

* Avoid modification by reference

* Fix error message

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-01 22:30:09 +01:00
Antonin RAFFIN
2ebb8aa22b
Update Citation (#684)
* Update citation

* Remove cff file
2021-12-01 18:55:21 +01:00
Antonin RAFFIN
52c29dc497
Fix evaluation script for recurrent policies (#678)
* Fix evaluation script for RNN

* Add error message

* Revert "Add error message"

This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a.

* Fix for pytype

* Rename mask to `episode_start`

* Fix type hint

* Fix type hints

* Remove confusing part of sentence

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-30 13:49:06 +01:00
Gary Briggs
8e5ede783f
Add a section on exporting to TFLite/Coral with demonstration (#679)
* Add a section on exporting to TFLite/Coral with demonstration

* Changelog to reflect new export documentation

* Update docs/guide/export.rst

Fingers on autopilot make word wrong

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Update docs/guide/export.rst

Better wording clarity

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Update docs/guide/export.rst

Better wording clarity

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Clarify motivations and hardware

* Update docs/misc/changelog.rst

Make consistent with other changelog entries

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Sphinx wants the section underline to be at least this long

* Remove first-person voice

* Typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-28 10:54:50 +01:00
Shyamal H Anadkat
3b68dc7312
Update GAE computation docstring (#655)
* Fix typo in buffers.py

* Revert "Fix typo in buffers.py"

This reverts commit ca643d5e3a509ae1b8a65bf0de98f4609ca9d8da.

* Ignore pytype errors

* Update GAE computation docstring

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-11-25 10:53:42 +01:00
Parth Kothari
58e5506385
Edited authors of DriverGym project (#669) 2021-11-18 10:18:18 +01:00
Parth Kothari
1ac35eaef2
Add DriverGym project to SB3 project documentation (#665)
* Added DriverGym project

* Updated changelog

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-11-17 11:13:43 +01:00
Antonin RAFFIN
d228364ccf
Add timeout handling for on-policy algorithms (#658)
* Add timeout handling for on-policy algorithms

* Fixes

* Fix infinite loop in eval

* Skip type check for python 3.9

* Fix for discrete obs + add docstring

* Fix A2C test

* Removed unused helper

* Add test for infinite horizon

* typed ast should be fixed

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-16 17:19:16 +01:00
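As a rough illustration of what timeout handling usually means for on-policy methods, the sketch below bootstraps the reward with the critic's value estimate when an episode ends because of a time limit rather than a true terminal state. This is an assumption about the technique, not a copy of the commit: the function name `bootstrap_on_timeout` and its arguments are hypothetical, and the `"TimeLimit.truncated"` info key is the one set by Gym's `TimeLimit` wrapper.

```python
def bootstrap_on_timeout(reward, done, info, terminal_value, gamma=0.99):
    """Return the reward to store for one transition.

    If the episode ended only because of a time limit, add the discounted
    value estimate of the final observation, so the return is not cut off
    as if the state were truly terminal.
    """
    truncated = info.get("TimeLimit.truncated", False)
    if done and truncated:
        return reward + gamma * terminal_value
    return reward


# Timeout: the value of the last observation is added to the reward.
print(bootstrap_on_timeout(1.0, True, {"TimeLimit.truncated": True}, 10.0, gamma=0.5))

# True termination: the reward is left unchanged.
print(bootstrap_on_timeout(1.0, True, {}, 10.0, gamma=0.5))
```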
Antonin RAFFIN
e75e1de4c1
Fix indentation in RL tips doc (#657)
* Update rl_tips.rst

Indent fix so that `if done` and the statement following it work

* Fix indentation and update changelog

* Skip type check for python 3.9

Co-authored-by: paulg <cove9988@gmail.com>
2021-11-10 16:54:20 +00:00
Antonin RAFFIN
2bb4500948
Fix set_env when using VecNormalize (#638)
* Fix `set_env` when using `VecNormalize`

* Update version
2021-11-02 13:52:26 +02:00
ac-93
98c1a637cf
add tactile-gym to the list of projects using SB3 (#640)
* Update projects.rst

* Update changelog.rst

* Update projects.rst

* Fix doc build

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:26:06 +01:00
Oleksii Kachaiev
0c17fedfac
Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636)
* Store number of timesteps at the beginning of each learn cycle

* Update changelog

* Set default _num_timesteps_at_start in the constructor

* Test case for FPS logger

* Adjust test to cover both on-policy and off-policy algorithms

* Fix formatting

* Update test and add comment

* Fix test

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:19:03 +01:00
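The adjustment in the commit above can be sketched as follows, assuming FPS is computed as timesteps divided by wall-clock time since the start of the current `learn()` call. The function names are hypothetical; only `_num_timesteps_at_start` and `reset_num_timesteps` come from the commit message.

```python
def naive_fps(num_timesteps, elapsed_seconds):
    """FPS from the total timestep counter; overestimates when the counter
    was not reset at the start of the current learn cycle."""
    return int(num_timesteps / max(elapsed_seconds, 1e-8))


def adjusted_fps(num_timesteps, num_timesteps_at_start, elapsed_seconds):
    """FPS counting only the timesteps collected during this learn cycle."""
    steps_this_cycle = num_timesteps - num_timesteps_at_start
    return int(steps_this_cycle / max(elapsed_seconds, 1e-8))


# Second learn() call with reset_num_timesteps=False:
# 10_000 steps were collected earlier, 2_000 more in the last 4 seconds.
print(naive_fps(12_000, 4.0))
print(adjusted_fps(12_000, 10_000, 4.0))
```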
Edouard Leurent
a2e3001598
Add highway-env to the list of projects using SB3 (#639)
* Add highway-env to the list of projects using SB3

Many thanks for this fantastic library, keep up the good work!

* Update changelog with added documentation
2021-10-30 13:53:36 +02:00
Oleksii Kachaiev
0503e694b2
Introduce norm_obs_keys param for VecNormalize environment wrapper (#631)
* Implement new norm_obs_keys param for VecNormalize environment wrapper

* Simplified doc string to avoid issues with lint and doc

* Updated changelog

* Update changelog.rst

* Update test_vec_normalize.py

* Update sanity checks

* Fix backward compat

* Update doc

* Update changelog

* Fix lint warnings

* Fix tests

* Minor edit

* observation_space sanity check was applied twice

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-10-28 19:18:39 +02:00
Antonin RAFFIN
7b977d7b03
Release 1.3.0 (#625) 2021-10-23 17:07:00 +02:00
Antonin RAFFIN
e907eca18e
Fix set_env to keep the number of timesteps (#615)
* Fix for `set_env`

* Add test and update changelog

* Use underscores and f-strings

* Add PyPi info

* Update comments
2021-10-23 16:36:40 +02:00
Antonin RAFFIN
1564a85081
System info helper (#613)
* Add `system_env_info`

* Add `print_system_info` to load and store system info at save time

* Remove TODO

* Rename to `get_system_info`

* Import as sb3 for consistency

* Update changelog

* Add warning for old SB3 versions

* Use underscore literal for more clarity
2021-10-18 10:43:56 +02:00
Timo Kaufmann
09e9fc42eb
Use consistent logging keys (#605)
* Use a consistent key to log the total timesteps

This changes the timestep logging key of on-policy algorithms from
`time/total_timesteps` to `time/total timesteps` (note the
underscore/space). The off-policy algorithms and the eval callback
already use the latter, so this behavior is more consistent.

* Use underscores instead of spaces in logging keys

Most keys already followed this policy and consistent behavior is
friendlier to new users.

* Minor edit and bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-10-12 13:17:30 +02:00
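The naming policy that the commit above settles on (underscores instead of spaces in logging keys) can be illustrated with a small check. This is only a sketch: the helper names and the sample keys other than `time/total_timesteps` are hypothetical, and nothing here is part of the library.

```python
def normalize_key(key):
    """Replace spaces with underscores in a logging key such as 'time/total timesteps'."""
    return key.replace(" ", "_")


def keys_with_spaces(keys):
    """Return the keys that violate the underscore convention."""
    return [key for key in keys if " " in key]


logged = ["time/total timesteps", "time/fps", "rollout/ep_rew_mean"]
print(keys_with_spaces(logged))
print([normalize_key(key) for key in logged])
```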
Antonin RAFFIN
75aa31dcfb
Update SB3 contrib algorithms (#604) 2021-10-10 15:41:39 +02:00
Antonin RAFFIN
1881d904a0
Doc fix and improve error messages (#598)
* Fix custom env doc

* Catch common mistake

* Improve `EvalCallback` error message

* Lint test

* Update docs/guide/custom_env.rst

Co-authored-by: Adam Gleave <adam@gleave.me>
2021-10-08 18:08:31 +02:00
Ilja Avadiev
740d61ada3
Doc fix environment mixup (#588) 2021-09-29 10:16:59 +02:00
Antonin RAFFIN
306e49fda6
Fixes in is_vectorized_observation (#587)
* Fix is vectorized bug in DQN

* Fix sub-classed obs
2021-09-28 21:57:49 +02:00
Antonin RAFFIN
201fbffa8c
Remove sde_net_arch + Simplify policy (#584)
* Remove `sde_net_arch` + Simplify policy

* Add warning at load time
2021-09-28 22:32:54 +03:00
batu
89af49ca91
ONNX Documentation Update (#464)
* Updated ONNX documentation

First draft on the documentation explaining how to export SB3 models in the ONNX format

* Updated changelog with ONNX documentation fix

* Address comments

* Update changelog.rst

* Update rtd env

* Fixes + add test example

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi Kanervisto <anssk@Anssis-MacBook-Air.local>
Co-authored-by: Anssi Kanervisto <kaneran21@hotmail.com>
2021-09-26 17:40:35 +02:00
Baek Junyeob
914bc10a0d
Add policy-distillation-baselines to project page (#578)
* Update projects.rst

* Update docs/misc/projects.rst

* Apply suggestions from code review

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-20 16:30:16 +02:00
Adam Gleave
e825fbdd33
VecNormalize: allow non-continuous observations when norm_obs is False (#575)
* VecNormalize: allow non-continuous observations when norm_obs is False

* Update changelog, fix lint

* Switch to environment present in new and old versions of Gym

* Fix name

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-18 12:11:01 +02:00
Matthew Allen
76c212a854
Add RLGym to project page (#576)
* Add RLGym to projects list.

Per the request in this issue on our repo: https://github.com/lucas-emery/rocket-league-gym/issues/24

* Update changelog documentation section

* Update changelog.rst

* Update docs/misc/projects.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:47:22 +02:00
Wilhelm Kirchgässner
303df08a80
Add GEM project to project section of doc (#574)
* add GEM project to project section of doc

* Update docs/misc/projects.rst

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:10:04 +02:00
Cyprien
f3a35aa786
Add method predict_values for ActorCriticPolicy (#569)
* feat: add method predict_values for ActorCriticPolicy

* Fixes for new gym version

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-15 14:03:04 +02:00
Antonin RAFFIN
16f8b21d9b
Add get_distribution for on-policy algorithms (#566)
* feat: get_distribution method for ActorCriticPolicy

New method get_distribution for class ActorCriticPolicy returning current action distribution given observations

* doc: updating changelog.rst

- adding block for Release 1.2.1a0
- adding cyprienc to contributors

* style: make format

* fix: updating version.txt

Changing version from 1.2.0 to 1.2.1a0

* Update changelog

* Add test for get distribution

Co-authored-by: Cyprien <courtot.c@gmail.com>
2021-09-13 10:25:42 +02:00
Antonin RAFFIN
f8a0869073
Hotfix for VecNormalize (#558)
* Hotfix for VecNormalize

* Rename `ret` to `returns`
2021-09-08 12:30:20 +02:00
Antonin RAFFIN
f9e5753acd
Refactor BasePolicy predict (#559) 2021-09-05 02:27:45 +03:00
Scott Brownlie
1afc2f3abe
Avoid putting target networks into training mode (#553)
* make sure DQN policy is always in correct mode - train or eval

* make set_training_mode an abstract method of the base policy - safer

* update docstring of _build method to note that the target network is put into eval mode

* use set_training_mode to put the dqn target network into eval mode

* use set_training_mode to set the training mode of the q-network

* move set_training_mode abstract method from BasePolicy to BaseModel

* set train and eval mode for TD3

* make sure critic is always in correct mode during train

* set train and eval mode for SAC

* add comment re batch norm and dropout

* set train and eval mode for A2C and PPO

* add tests for collect rollouts with batch norm

* fix formatting

* update change log

* update version

* remove Optional typing for batch size - causing type check to fail

* Fix scipy dependency for toy text envs

* implement set_training_mode method in BaseModel

* move all tests of train/eval mode to test_train_eval_mode

* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm

* remove extra calls to set_training_mode in train method of TD3 and SAC

* Allow gradient_steps=0

* Refactor tests

* Add comment + use aliases

* Typos

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-30 17:42:41 +02:00
David Blom
3efab0d267
Training and evaluation: call model.train() and model.eval() (#537)
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm

* Add comment documentation

* Fix train and eval for the Actor class

* Run black

* Add github handle to changelog

* Add unit tests for PPO and DQN

* Refactor unit test

* Run black

* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic

* documentation: add bugfix description to changelog

* unit test: use learning_starts=0, decrease the size of the network and use more training steps

* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.Module

* Rename unit test

* unit test: use dropout probability of 0.5

* Call policy.train and policy.eval

* Fixes + update tests

* Remove unneeded eval

Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-14 14:08:27 +02:00
MihaiAnca13
c41368f2ea
Docs examples warning - issue #526 (#530)
* Update a2c.rst

* Update ddpg.rst

* Update dqn.rst

* Update her.rst

* Update ppo.rst

* Update sac.rst

* Update td3.rst

* Update changelog.rst

* modified message

* Update examples.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-08-09 16:23:25 +03:00
Antonin RAFFIN
be86883f36
Fix type annotations (#522)
* Fix type annotations

* Add citation file

* Update CITATION.cff

* Add note about tb logging

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-07-29 13:02:09 +02:00
Antonin RAFFIN
503425932f
Documentation fixes (#514)
* Update multiprocessing example

* Add VecEnvWrapper example

* Update docs/guide/vec_envs.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-07-18 20:51:41 +02:00
Antonin RAFFIN
2fa06ae8d2
Add Python3.9 CI + upgrade min PyTorch version (#503)
* Add Python3.9 CI + upgrade min PyTorch version

* Upgrade min PyTorch version
2021-07-06 09:32:03 +02:00
Antonin RAFFIN
5af35fa2cc
Release v1.1.0 (#497) 2021-07-02 11:21:09 +02:00
Skander Moalla
abbf48e93e
Fix Inconsistencies with EvalCallback tensorboard logs (#492)
* Make EvalCallback dump the evaluation logs it records #457.

* Make test deterministic

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-07-01 15:43:08 +02:00
Carlo Rizzardo
066e1409d9
Corrected DictReplayBuffer observation dtype #484 (#486)
* Fix observation buffer dtype in DictReplayBuffer

* Formatting fix (line length)

* Changelog update, bugfix DictReplayBuffer observations dtype
2021-06-22 13:41:26 +02:00
Antonin RAFFIN
b52c6fc18f
Fix logger setup (#469)
* Make logger an attribute

* Update doc

* Fix logger reset when using multiple runs

* Cleanup logger: remove `Logger.CURRENT`

* Fix for PPO

* Update tests and improve docstring

* Add warning

* Throw error when tensorboard not installed
2021-06-14 15:17:48 +02:00
Benjamin Steenhoek
180a2e3832
Remove recurrent policies from A2C docs (#470)
* Remove recurrent policies from A2C docs

Recurrent policies are not supported yet (see https://github.com/DLR-RM/stable-baselines3/issues/160#issuecomment-694756355), but the docs say that A2C supports them. Changing the docs to avoid misleading readers.

* Update changelog

Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>
2021-06-07 19:39:49 +02:00