Commit graph

187 commits

Author SHA1 Message Date
Antonin RAFFIN
d228364ccf
Add timeout handling for on-policy algorithms (#658)
* Add timeout handling for on-policy algorithms

* Fixes

* Fix infinite loop in eval

* Skip type check for python 3.9

* Fix for discrete obs + add docstring

* Fix A2C test

* Removed unused helper

* Add test for infinite horizon

* typed ast should be fixed

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-16 17:19:16 +01:00
Antonin RAFFIN
2bb4500948
Fix set_env when using VecNormalize (#638)
* Fix `set_env` when using `VecNormalize`

* Update version
2021-11-02 13:52:26 +02:00
Antonin Raffin
6daf82bf74
Relax test 2021-10-31 19:03:28 +01:00
Oleksii Kachaiev
0c17fedfac
Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636)
* Store number of timesteps at the beginning of each learn cycle

* Update changelog

* Set default _num_timesteps_at_start in the contructor

* Test case for FPS logger

* Adjust test to cover both on-policy and off-policy algorithms

* Fix formatting

* Update test and add comment

* Fix test

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:19:03 +01:00
Oleksii Kachaiev
0503e694b2
Introduce norm_obs_keys param for VecNormalize environment wrapper (#631)
* Implement new norm_obs_keys param for VecNormalize environment wrapper

* Simplified doc string to avoid issues with lint and doc

* Updated changelog

* Update changelog.rst

* Update test_vec_normalize.py

* Update sanity checks

* Fix backward compat

* Update doc

* Update changelog

* Fix lint warnings

* Fix tests

* Minor edit

* observation_space sanity check was applied twice

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-10-28 19:18:39 +02:00
Antonin RAFFIN
e907eca18e
Fix set_env to keep the number of timesteps (#615)
* Fix for `set_env`

* Add test and update changelog

* Use underscores and f-strings

* Add PyPi info

* Update comments
2021-10-23 16:36:40 +02:00
Antonin RAFFIN
1564a85081
System info helper (#613)
* Add `system_env_info`

* Add `print_system_info` to load
and store system info at save time

* Remove TODO

* Rename to `get_system_info`

* Import as sb3 for consistency

* Update changelog

* Add warning for old SB3 versions

* Use underscore litteral for more clarity
2021-10-18 10:43:56 +02:00
Antonin RAFFIN
1881d904a0
Doc fix and improve error messages (#598)
* Fix custom env doc

* Catch common mistake

* Improve `EvalCallback` error message

* Lint test

* Update docs/guide/custom_env.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>
2021-10-08 18:08:31 +02:00
Antonin RAFFIN
306e49fda6
Fixes in is_vectorized_observation (#587)
* Fix is vectorized bug in DQN

* Fix sub-classed obs
2021-09-28 21:57:49 +02:00
Antonin RAFFIN
201fbffa8c
Remove sde_net_arch + Simplify policy (#584)
* Remove `sde_net_arch` + Simplify policy

* Add warning at load time
2021-09-28 22:32:54 +03:00
Adam Gleave
e825fbdd33
VecNormalize: allow non-continuous observations when norm_obs is False (#575)
* VecNormalize: allow non-continuous observations when norm_obs is False

* Update changelog, fix lint

* Switch to environment present in new and old versions of Gym

* Fix name

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-18 12:11:01 +02:00
Cyprien
f3a35aa786
Add method predict_values for ActorCriticPolicy (#569)
* feat: add method predict_values for ActorCriticPolicy

* Fixes for new gym version

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-15 14:03:04 +02:00
Antonin RAFFIN
16f8b21d9b
Add get_distribution for on-policy algorithms (#566)
* feat: get_distribution method for ActorCriticPolicy

New method get_distribution for class ActorCriticPolicy returning current action distribution given observations

* doc: updating changelog.rst

- adding block for Release 1.2.1a0
- adding cyprienc to contributors

* style: make format

* fix: updating version.txt

Changing version from 1.2.0 to 1.2.1a0

* Update changelog

* Add test for get distribution

Co-authored-by: Cyprien <courtot.c@gmail.com>
2021-09-13 10:25:42 +02:00
Antonin RAFFIN
f8a0869073
Hotfix for Vecnormalize (#558)
* Hotfix for Vecnormalize

* Rename `ret` to `returns`
2021-09-08 12:30:20 +02:00
Scott Brownlie
1afc2f3abe
Avoid putting target networks into training mode (#553)
* make sure DQN policy is always in correct mode - train or eval

* make set_training_mode an abstract method of the base policy - safer

* update docstring of _build method to note that the target network is put into eval mode

* use set_training_mode to put the dqn target network into eval mode

* use set_training_mode to set the training model of the q-network

* move set_training_mode abstract method from BasePolicy to BaseModel

* set train and eval mode for TD3

* make sure critic is always in correct mode during train

* set train and eval mode for SAC

* add comment re batch norm and dropout

* set train and eval mode for A2C and PPO

* add tests for collect rollouts with batch norm

* fix formatting

* update change log

* update version

* remove Optional typing for batch size - causing type check to fail

* Fix scipy dependency for toy text envs

* implement set_training_mode method in BaseModel

* move all tests of train/eval mode to test_train_eval_mode

* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm

* remove extra calls to set_training_mode in train method of TD3 and SAC

* Allow gradient_steps=0

* Refactor tests

* Add comment + use aliases

* Typos

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-30 17:42:41 +02:00
David Blom
3efab0d267
Training and evaluation: call model.train() and model.eval() (#537)
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm

* Add comment documentation

* Fix train and eval for the Actor class

* Run black

* Add github handle to changelog

* Add unit tests for PPO and DQN

* Refactor unit test

* Run black

* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic

* documentation: add bugfix description to changelog

* unit test: use learning_starts=0, decrease the size of the network and use more training steps

* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module

* Rename unit test

* unit test: use drop out probability of 0.5

* Call policy.train and policy.eval

* Fixes + update tests

* Remove unneeded eval

Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-14 14:08:27 +02:00
Skander Moalla
abbf48e93e
Fix Inconsistencies with EvalCallback tensorboard logs (#492)
* Make EvalCallback dump the evaluation logs it records #457.

* Make test deterministic

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-07-01 15:43:08 +02:00
Antonin RAFFIN
b52c6fc18f
Fix logger setup (#469)
* Make logger an attribute

* Update doc

* Fix logger reset when using multiple runs

* Cleanup logger: remove `Logger.CURRENT`

* Fix for PPO

* Update tests and improve docstring

* Add warning

* Throw error when tensorboard not installed
2021-06-14 15:17:48 +02:00
Benjamin Black
a038044d11
Added support for vector envs in evaluation (#447)
* added vector env support to evaluate_policy

* fixed linting and documentation

* updated changelog

* fixed code style issue

* added tests for vec env

* fixed formatting

* renamed observations

* added comments for vector evaluation

* fixed issues

* Cleanup + bump version

* Add comment

* Fix wrong count of episodes

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-05-28 12:40:29 +02:00
Antonin RAFFIN
88e1be9ff5
Documentation update (#450)
* Update migration guide

* Add sanity check

* Removed parameter ``channels_last`` from ``is_image_space``

* Pin docutils

* Clarify callback `save_freq` definition

* Update docs/misc/changelog.rst

* Update docs/misc/changelog.rst

* Fix typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-05-23 13:13:11 +02:00
Amanda Dsouza
18f4e3ace0
Added wrapper_kwargs argument to make_vec_env (#448)
* Added wrapper_kwargs to make_vec_env

* code black format

* Tmp fix for atari-py

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-23 11:33:34 +02:00
Rohan Tangri
df6f9de8f4
KL Divergence Helper Function (#431)
* add kl divergence wrapper

* add test

* update changelog

* black lint

* remove unused import

* Fix ent coef loading for SAC (#429)

* Fix ent coef loading for SAC

* Better fix and add comment

* add 'distribution' to base Distribution class

* add sample test

* revert to plain pytorch implementation

* black reformat

* Update docs/misc/changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Doc update (custom policy + fix her example) (#436)

* isort and black reformat

* float -> bool tensor

* add sanity test

* more concise kl code

* remove outdated comment

* all -> allclose assertion

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fix PyTorch warning

* Update gSDE entropy test

* Update entropy test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-05-20 19:01:07 +02:00
Antonin RAFFIN
1ce911994b
Fix ent coef loading for SAC (#429)
* Fix ent coef loading for SAC

* Better fix and add comment
2021-05-12 12:21:54 +03:00
Jaden Travnik
75b6f3b3b0
Dictionary Observations (#243)
* First commit

* Fixing missing refs from a quick merge from master

* Reformat

* Adding DictBuffers

* Reformat

* Minor reformat

* added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy

* Ran black on buffers

* Ran isort

* Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow

* Running isort :facepalm

* Fixed typing issues

* Adding docstrings and typing. Using util for moving data to device.

* Fixed trailing commas

* Fix types

* Minor edits

* Avoid duplicating code

* Fix calls to parents

* Adding assert to buffers. Updating changelong

* Running format on buffers

* Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env

* Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type

* Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder

* Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation

* Fixes

* Running code style

* Update docstrings on torch_layers

* Decapitalize non-constant variables

* Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test

* Update doc

* Update doc

* Fix format

* Removing NineRoom env. Using nested preprocess. Removing mutable default args

* running code style

* Passing channel check through to stacked dict observations.

* Running black

* Adding channel control to SimpleMultiObsEnv. Passing check_channels to CombinedExtractor

* Remove optimize memory for dict buffers

* Update doc

* Move identity env

* Minor edits + bump version

* Update doc

* Fix doc build

* Bug fixes + add support for more type of dict env

* Fixes + add multi env test

* Add support for vectranspose

* Fix stacked obs for dict and add tests

* Add check for nested spaces. Fix dict-subprocvecenv test

* Fix (single) pytype error

* Simplify CombinedExtractor

* Fix tests

* Fix check

* Merge branch 'master' into feat/dict_observations

* Fix for net_arch with dict and vector obs

* Fixes

* Add consistency test

* Update env checker

* Add some docs on dict obs

* Update default CNN feature vector size

* Refactor HER (#351)

* Start refactoring HER

* Fixes

* Additional fixes

* Faster tests

* WIP: HER as a custom replay buffer

* New replay only version (working with DQN)

* Add support for all off-policy algorithms

* Fix saving/loading

* Remove ObsDictWrapper and add VecNormalize tests with dict

* Stable-Baselines3 v1.0 (#354)

* Bump version and update doc

* Fix name

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docs/index.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update wording for RL zoo

Co-authored-by: Adam Gleave <adam@gleave.me>

* Add gym-pybullet-drones project (#358)

* Update projects.rst

Added gym-pybullet-drones

* Update projects.rst

Longer title underline

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>

* Include SuperSuit in projects (#359)

* include supersuit

* longer title underline

* Update changelog.rst

* Fix default arguments + add bugbear (#363)

* Fix potential bug + add bug bear

* Remove unused variables

* Minor: version bump

* Add code of conduct + update doc (#373)

* Add code of conduct

* Fix DQN doc example

* Update doc (channel-last/first)

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>

* Make installation command compatible with ZSH (#376)

* Add quotes

* Add Zsh bracket info

* Add clarify pip installation line

* Make note bold

* Add Zsh pip installation note

* Add handle timeouts param

* Fixes

* Fixes (buffer size, extend test)

* Fix `max_episode_length` redefinition

* Fix potential issue

* Add some docs on dict obs

* Fix performance bug

* Fix slowdown

* Add package to install (#378)

* Add package to install

* Update docs packages installation command

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fix backward compat + add test

* Fix VecEnv detection

* Update doc

* Fix vec env check

* Support for `VecMonitor` for gym3-style environments (#311)

* add vectorized monitor

* auto format of the code

* add documentation and VecExtractDictObs

* refactor and add test cases

* add test cases and format

* avoid circular import and fix doc

* fix type

* fix type

* oops

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* add test cases

* update changelog

* fix mutable argument

* quick fix

* Apply suggestions from code review

* fix terminal observation for gym3 envs

* delete comment

* Update doc and bump version

* Add warning when already using `Monitor` wrapper

* Update vecmonitor tests

* Fixes

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Reformat

* Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392)

* Fix ent coef loading bug

* Add test

* Add comment

* Reuse save path

* Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375)

* Fix return computation + add test for GAE

* Rename `last_dones` to `episode_starts` for clarification

* Revert advantage

* Cleanup test

* Rename variable

* Clarify return computation

* Clarify docs

* Add multi-episode rollout test

* Reformat

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>

* Fixed saving of `A2C` and `PPO` policy when using gSDE (#401)

* Improve doc and replay buffer loading

* Add support for images

* Fix doc

* Update Procgen doc

* Update changelog

* Update docstrings

Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>

* Update doc and minor fixes

* Update doc

* Added note about MultiInputPolicy in error of NatureCNN

* Merge branch 'master' into feat/dict_observations

* Address comments

* Naming clarifications

* Actually saving the file would be nice

* Fix edge case when doing online sampling with HER

* Cleanup

* Add sanity check

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
2021-05-11 12:29:30 +02:00
Antonin RAFFIN
c69f7cd5e6
Fixed saving of A2C and PPO policy when using gSDE (#401) 2021-04-19 12:23:02 +02:00
Antonin RAFFIN
5d47296b8d
Add test for GAE + rename RolloutBuffer.dones for clarification (#375)
* Fix return computation + add test for GAE

* Rename `last_dones` to `episode_starts` for clarification

* Revert advantage

* Cleanup test

* Rename variable

* Clarify return computation

* Clarify docs

* Add multi-episode rollout test

* Reformat

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2021-04-16 15:52:55 +02:00
Antonin RAFFIN
c4304029a2
Fixed loading of `ent_coef for SAC and TQC`, it was not optimized anymore (#392)
* Fix ent coef loading bug

* Add test

* Add comment

* Reuse save path
2021-04-15 14:50:43 +02:00
Costa Huang
ddbe0e93f9
Support for VecMonitor for gym3-style environments (#311)
* add vectorized monitor

* auto format of the code

* add documentation and VecExtractDictObs

* refactor and add test cases

* add test cases and format

* avoid circular import and fix doc

* fix type

* fix type

* oops

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* add test cases

* update changelog

* fix mutable argument

* quick fix

* Apply suggestions from code review

* fix terminal observation for gym3 envs

* delete comment

* Update doc and bump version

* Add warning when already using `Monitor` wrapper

* Update vecmonitor tests

* Fixes

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-04-13 18:09:31 +02:00
Antonin RAFFIN
237223f834
Fix for HER with custom objects (#343) 2021-03-06 15:57:27 +01:00
Antonin RAFFIN
c62e9259db
Add custom objects support + bug fix (#336)
* Add support for custom objects

* Add python 3.8 to the CI

* Bump version

* PyType fixes

* [ci skip] Fix typo

* Add note about slow-down + fix typos

* Minor edits to the doc

* Bug fix for DQN

* Update test

* Add test for custom objects
2021-03-06 15:17:43 +02:00
Antonin RAFFIN
d0d55f3767
Beta is over =)! V1.0rc0 (#334)
* Fix doc + bump version

* Removed cmd util

* Remove test
2021-03-01 13:35:21 +01:00
Antonin RAFFIN
b2c94a677d
Fix train_freq at load time (#332)
* Fix train_freq loading

* Update docker

* Add sanity checks + tests for train freq
2021-02-27 19:53:13 +01:00
M. Ernestus
0c50d75ecb
TD3 Code review (#245)
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.

* Add learning rate schedule example (#248)

* Add learning rate schedule example

* Update docs/guide/examples.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Address comments

Co-authored-by: Adam Gleave <adam@gleave.me>

* Add supported action spaces checks (#254)

* Add supported action spaces checks

* Address comment

* Use `pass` in an abstractmethod instead of deleting the arguments.

* Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways.

* Rename _get_data to _get_data_to_reconstruct_model.

_get_data was too generic and could have meant anything.

* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.

* Fix docstring of `train_freq` parameter.

* Black fixes.

* Fix TD3 delayed update + rename `_get_data()`

* Fix TD3 test

* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.

* Make one step the default train frequency.

* Black fixes.

* Change np.bool to bool.

* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of the off policy algorithm.

* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of HER.

* Use named tuple for train freq

* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.

* Black fixes.

* Revert to train_freq

* Fix terminal observation issues

* Typo

* Fix action noise bug in HER

* Add assert when loading HER models

* Update version

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
2021-02-27 17:33:50 +01:00
Lorenz Hetzel
b01bde3e2d
Add Support for Text Records to Logger, Add Hint on How To Access SummaryWriter in Docs. (#303)
* add support for text records to logger

* add note on how to access summary writer directly

* escape unicode chars for HumanOutputFormat

* update changelog

* fix formatting

* fix docs

* add tests

* fix formatting

* fix example, link to pytorch docs, update changelog

* move unicode escaping to own function, properly escape quotechars in csv formatter

* switch from n_calls to num_timesteps in example

* make step coherent in example

* use n_calls to check when to login example

* add small hint about log frequency

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* add comment about str is scalar type, improve test input

* Update tests

* Update test_logger.py

* use repr to handle strings in logger

* remove repr from text log output

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-02-01 11:56:33 +01:00
Antonin RAFFIN
d7c6aff252
Fix discrete obs support (#296)
* Fixed discrete obs support

* Suggest new edit, fix failed test

* Revert "Suggest new edit, fix failed test"

This reverts commit 6892bf05506bb5ad0e87016d8d382705ab72e6a4.

* Fix test

* Special case for discrete obs

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2021-01-21 02:42:33 +02:00
Cody Wild
b1aee71772
Improve error messages when PPO effective batch size is 1 and when last mini-batch is truncated (#270)
* Add warning about total_env_steps not dividing neatly into batch size

* Stylistic cleanup

* Black reformatting

* Add clearer documentation and update changelog

* Update changelog.rst

* Use specific RolloutBuffer terminology

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Change to minibatch language

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Cleaning up language describing rollout buffer requirements

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Switch to using env.num_envs

* Working tests

* Black and isort still fighting each other

* codestyle finally happy

* Basic test exists, possibly in the wrong file

* Update phrasing

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-01-11 17:03:32 +01:00
Carlos M. Casas Cuadrado
5993033c73
Add image and figure to tensorboard logger (#277)
* Added Image and Figure classes to logger. For now, these objects can only be logged by TensorBoardOutputFormat

* Added documentation for figure and image logging into tensorboard

* Updated changelog

* Minor changes to documentation. Reviewed supported types for logging images and figures

* Fix type for np arrays

* Added more explicit example for logging figures in the documentation. Added docstrings for parameters in logging auxiliary classes

* Added tests for image and figure logging

* Applied autoformatting

* Update doc

* Fix documentation example

* Bump version

Co-authored-by: Carlos Casas <ccasascuadrado@guidewire.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-01-08 15:47:08 +01:00
Antonin RAFFIN
944dfdafe4
Update doc: SB3-Contrib (#267)
* Fix big when saving/loading q-net alone

* Rename variables to match SB3-contrib

* Update docker image

* Set min version for tensorboard

* Add SB3-Contrib to doc

* Update DQN

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update wording

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-12-21 16:17:24 +01:00
Antonin RAFFIN
6b598323ae
Add eval success rate logging (#255)
* Add eval success rate logging

* Fix name clash

* Log data

* Bump version
2020-12-08 15:49:07 +01:00
Antonin RAFFIN
2b9fc1f923
Add supported action spaces checks (#254)
* Add supported action spaces checks

* Address comment
2020-12-06 14:05:10 +02:00
Antonin RAFFIN
3207bdab17
Automatically wrap with a Monitor when possible (#237)
* Automatically wrap with a Monitor when possible

* Update stable_baselines3/common/base_class.py

Co-authored-by: Anssi <kaneran21@hotmail.com>
2020-11-20 18:08:00 +02:00
Megan Klaiber
852961139e
Fix bug with full HerReplayBuffer (#236)
* Fix bug with full replay buffer

* Updated changelog

* Update tests/test_her.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-20 13:23:03 +01:00
Antonin RAFFIN
d04aad2a20
Doc fixes and add monitor_kwargs parameter (#230)
* Fix type annotation

* Fix migration doc for A2C

* Update version

* Add `monitor_kwargs` argument

* Update docs/guide/migration.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Fix make atari env

* Fix docstring

* Renamed LearningRateSchedule

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-11-20 10:28:54 +01:00
Antonin RAFFIN
9069cf55f1
Fix DQN predict shape for single Gym env (#222)
* Fix DQN predict shape for single Gym env

* Remove unused imports
2020-11-17 00:43:26 +02:00
Anssi
18d10dbf42
Use Monitor episode reward/length for evaluate_policy (#220)
* Update evaluate_policy to use monitor data if available

* Update documentation

* Cleaning up

* Remove unnecessary typing trickery

* Update doc

* Rename is_wrapped to clarify it is for vecenvs

* Add is_wrapped for regular envs

* Add is_wrapped call for subprocvecenv and update code for circular imports

* Move new functions back to env_util and fix imports

* Update changelog

* Clarify evaluate_policy docs

* Add tests for wrapped modifying episode lengths

* Fix tests

* Update changelog

* Minor edits

* Add warn switch to evaluate_policy and update tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-16 11:52:28 +01:00
Anssi
e2b6f5460f
Avoid transposing channel-first envs (#213)
* Add test for channel-first environments

* Add support for channel-first envs, including more tests

* Update changelog

* Run black

* Run black, again

* Improve NatureCNN error message

* Update image checks and FrameStack wrapper

* Update tests

* Update docs

* Run isort

* Reformat

* Fixes: avoid breaking changes for non-image env

* Add additional checks

* Update docstring

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-03 12:34:09 +01:00
Stefan Heid
9d463bc476
Small docstring improvements related to the notion of Rollout (#206)
* Small docstring improvements related to the notion of Rollout

* documented changes in changelog.rst, added myself to contributers

* Minor edits

Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-02 11:45:08 +01:00
Antonin RAFFIN
6327cc6156
Fix env loading (#203)
* Fix loading bug

* [ci skip] Fix docstring
2020-10-27 23:12:52 +02:00
Antonin RAFFIN
0fc0dd1b21
Fix off policy features extractor (#198)
* Faster tests

* Fix feature extractor bug + add check

* Add missing check

* Allow TD3 features extractor to be separate

* Add share features extractor option for SAC

* Bug fixes

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-10-27 14:24:59 +01:00
Megan Klaiber
dd6e361204
Implement HER (#120)
* Added working her version, Online sampling is missing.

* Updated test_her.

* Added first version of online her sampling. Still problems with tensor dimensions.

* Reformat

* Fixed tests

* Added some comments.

* Updated changelog.

* Add missing init file

* Fixed some small bugs.

* Reduced arguments for HER, small changes.

* Added getattr. Fixed bug for online sampling.

* Updated save/load funtions. Small changes.

* Added her to init.

* Updated save method.

* Updated her ratio.

* Move obs_wrapper

* Added DQN test.

* Fix potential bug

* Offline and online her share same sample_goal function.

* Changed lists into arrays.

* Updated her test.

* Fix online sampling

* Fixed action bug. Updated time limit for episodes.

* Updated convert_dict method to take keys as arguments.

* Renamed obs dict wrapper.

* Seed bit flipping env

* Remove get_episode_dict

* Add fast online sampling version

* Added documentation.

* Vectorized reward computation

* Vectorized goal sampling

* Update time limit for episodes in online her sampling.

* Fix max episode length inference

* Bug fix for Fetch envs

* Fix for HER + gSDE

* Reformat (new black version)

* Added info dict to compute new reward. Check her_replay_buffer again.

* Fix info buffer

* Updated done flag.

* Fixes for gSDE

* Offline her version uses now HerReplayBuffer as episode storage.

* Fix num_timesteps computation

* Fix get torch params

* Vectorized version for offline sampling.

* Modified offline her sampling to use sample method of her_replay_buffer

* Updated HER tests.

* Updated documentation

* Cleanup docstrings

* Updated to review comments

* Fix pytype

* Update according to review comments.

* Removed random goal strategy. Updated sample transitions.

* Updated migration. Removed time signal removal.

* Update doc

* Fix potential load issue

* Add VecNormalize support for dict obs

* Updated saving/loading replay buffer for HER.

* Fix test memory usage

* Fixed save/load replay buffer.

* Fixed save/load replay buffer

* Fixed transition index after loading replay buffer in online sampling

* Better error handling

* Add tests for get_time_limit

* More tests for VecNormalize with dict obs

* Update doc

* Improve HER description

* Add test for sde support

* Add comments

* Add comments

* Remove check that was always valid

* Fix for terminal observation

* Updated buffer size in offline version and reset of HER buffer

* Reformat

* Update doc

* Remove np.empty + add doc

* Fix loading

* Updated loading replay buffer

* Separate online and offline sampling + bug fixes

* Update tensorboard log name

* Version bump

* Bug fix for special case

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-22 11:56:43 +02:00