Commit graph

114 commits

Author SHA1 Message Date
Antonin RAFFIN
2599f04940
Add custom arch for off-policy actor/critic networks (#182)
* Add custom arch for off-policy actor/critic networks

* Fix type hints

* Address comments

* Make sure number of updated parameters match in polyak

* Add zip_strict for strict-length zipping

* Fix building docs

* Add test for zip strict

* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-13 12:01:33 +02:00
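The strict-length zipping and the parameter-count check in this commit can be sketched in plain Python. This is a minimal sketch, not the library code. `zip_strict` here is a stand-in written for illustration. `polyak_update` is assumed to work on plain lists of floats instead of PyTorch tensors, so the example has no dependencies.

```python
from itertools import zip_longest
from typing import Any, Iterable, Iterator, List, Tuple


def zip_strict(*iterables: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    """Like zip(), but raise ValueError if the iterables have different lengths."""
    sentinel = object()  # unique marker that cannot appear in user data
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        # Identity check (not ==) so elements with a custom __eq__ are never compared
        if any(item is sentinel for item in combo):
            raise ValueError("Iterables have different lengths")
        yield combo


def polyak_update(params: List[float], target_params: List[float], tau: float) -> None:
    """Soft update in place: target <- tau * param + (1 - tau) * target.

    Hypothetical float-list version; zip_strict makes a length mismatch fail
    loudly instead of silently skipping trailing parameters.
    """
    # Build the new values first so a mismatch leaves target_params untouched
    updated = [tau * p + (1.0 - tau) * t for p, t in zip_strict(params, target_params)]
    target_params[:] = updated
```

With plain `zip()`, a target network with one parameter fewer would be updated only partially, and no error would be raised. `zip_strict` turns that case into an exception.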
Antonin RAFFIN
fc9527157a
Fix off-by-one GAE computation (#185)
* Fix off-by-one GAE computation

* Fix identity test

* Revert gae loop
2020-10-13 00:10:54 +03:00
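For reference, Generalized Advantage Estimation (GAE) can be written as one backward pass over the rollout. The sketch below is a pure-Python illustration, not the RolloutBuffer code. The function name `compute_gae` and the `dones` convention (1 when the transition taken at step t ended the episode) are assumptions. A buffer that instead stores a flag marking the start of each episode has to read that flag at index t + 1, and an off-by-one error can easily slip in at that point.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[float],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout of a single environment.

    Assumed convention: dones[t] == 1 means the transition taken at step t
    ended the episode, so nothing is bootstrapped past it.
    """
    n_steps = len(rewards)
    advantages = [0.0] * n_steps
    gae = 0.0
    for t in reversed(range(n_steps)):
        # Value of the state that follows step t (last_value after the final step)
        next_value = last_value if t == n_steps - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # One-step TD error
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # Exponentially weighted sum of TD errors, cut at episode boundaries
        gae = delta + gamma * gae_lambda * next_non_terminal * gae
        advantages[t] = gae
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns
```

With `gae_lambda=1.0` the advantage equals the bootstrapped discounted return minus the value estimate. With `gae_lambda=0.0` it reduces to the one-step TD error.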
Antonin RAFFIN
fc6c5d3daa
Migration Guide (#123)
* Start migration guide

* Update guide

* Add comment on RMSpropTFLike plus PPO/A2C migrations

* Add note about set/get-parameters

* Update migration guide

* Update changelog and readme

* Update doc + clean changelog

* Address comments

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-11 23:22:12 +02:00
Antonin RAFFIN
a1e055695c
Improve typing coverage (#175)
* Improve typing coverage

* Even more types

* Fixes

* Update changelog

* Unified docstrings

* Improve error messages for unsupported spaces
2020-10-07 10:51:49 +02:00
Antonin RAFFIN
a10e3ae587
Release v0.9.0 (#174) 2020-10-04 17:12:35 +02:00
Antonin RAFFIN
55912576ed
Cleanup docstring types (#169)
* Cleanup docstring types

* Update style

* Test with js hack

* Revert "Test with js hack"

This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69.

* Fix types

* Fix typo

* Update CONTRIBUTING example
2020-10-02 20:05:55 +03:00
Antonin RAFFIN
2c924f52f5
Update docs (custom policy, type hints) (#167)
* Change import

* Update custom policy doc

* Re-enable sphinx_autodoc_typehints

* Update docker image

* Attempt to fix read the doc build error

* Add sphinx_autodoc_typehints to read the doc env

* Fix pip version

* Add full custom policy example

* Fix
2020-09-29 20:41:14 +03:00
Antonin RAFFIN
44a723eecb
Fix loading of old versions and update changelog (#165) 2020-09-24 16:05:36 +02:00
Anssi
9855486488
Get/set parameters and review of saving and loading (#138)
* Update comments and docstrings

* Rename get_torch_variables to private and update docs

* Clarify documentation on data, params and tensors

* Make excluded_save_params private and update docs

* Update get_torch_variable_names to get_torch_save_params for description

* Simplify saving code and update docs on params vs tensors

* Rename saved item tensors to pytorch_variables for clarity

* Reformat

* Fix a typo

* Add get/set_parameters, update tests accordingly

* Use f-strings for formatting

* Fix load docstring

* Reorganize functions in BaseClass

* Update changelog

* Add library version to the stored models

* Actually run isort this time

* Fix flake8 complaints and also fix testing code

* Fix isort

* ...and black

* Fix set_random_seed

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-09-24 14:28:27 +02:00
mloo3
00595b09d8
Add actor/critic loss logging to td3 (#164)
* add actor/critic loss logging to td3

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-09-23 22:40:41 +02:00
Wilson
e908583e2a
Fix type annotation in make_vec_env (#162)
* Fix type annotation in make_vec_env

The variable `vec_env_cls` is a type and not an instance of either DummyVecEnv or SubprocVecEnv

* Update changelog.rst
2020-09-23 10:34:35 +02:00
liorcohen5
f5104a5efc
Allow to set a device when loading a model (#154)
* Added a 'device' keyword argument to BaseAlgorithm.load().
Edited the save and load test to also test the load method with all possible devices.
Added the changes to the changelog

* improved the load test to ensure that the model loads to the correct device.

* improved the test's robustness: it no longer breaks if the get_device policy changes.

* Update tests/test_save_load.py

@araffin's suggestion during the PR process

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update tests/test_save_load.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Bug fixes: when comparing devices, compare only the device type, since get_device() doesn't provide the device index.
Now the code loads all of the model parameters from the saved state dict straight into the required device. (fixed load_from_zip_file).

* PR fixes: an unrelated test failed when running on GPU; updated the assertion to consider only device types. Also corrected a related bug in the 'get_device()' method.

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-09-20 19:13:18 +02:00
Antonin RAFFIN
583d4b8e41 Minor: fix changelog 2020-09-10 16:56:27 +02:00
Vsevolod Kompantsev
4fd408bec2
Fix PPO logging of clip_fractions (#150)
* bugfix for PPO logging of clip_fractions

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-09-01 09:52:31 +02:00
Francisco Caio
5fc90a7f7d
Add StopTrainingOnMaxEpisodes to callback collection (#147)
* Add StopTrainingOnMaxEpisodes class to pre-made callback collection

* Adjust the point at which counters are incremented for both OnPolicy and OffPolicy algorithms

* Improve StopTrainingOnMaxEpisodes output, tests and doc

* Improve StopTrainingOnMaxEpisodes callback by running _init_callback

* Update callbacks.py

* Update test_callbacks.py

* Fix style

* Update changelog.rst

* Fix test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-08-28 11:36:33 +02:00
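A callback of this kind mostly comes down to counting the `done` flags returned by the environments at each step. The sketch below is a hypothetical, framework-free illustration. The class `EpisodeBudget` and its `update` method are not part of Stable-Baselines3. It counts episodes across all environments combined, which may differ from how the library callback handles multiple environments.

```python
from typing import Sequence


class EpisodeBudget:
    """Hypothetical stand-alone counter: stop once enough episodes have finished.

    Not the SB3 callback API; it only mirrors the idea of StopTrainingOnMaxEpisodes.
    """

    def __init__(self, max_episodes: int) -> None:
        self.max_episodes = max_episodes
        self.n_episodes = 0

    def update(self, dones: Sequence[bool]) -> bool:
        """Count finished episodes for one vectorized step; return True to keep training."""
        # Each True entry is one environment that just finished an episode
        self.n_episodes += sum(bool(done) for done in dones)
        return self.n_episodes < self.max_episodes
```

Returning a boolean follows the SB3 convention that a callback's step hook returning `False` stops training.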
Antonin RAFFIN
a1afc5e42f
Fix typos in SAC and TD3 (#145) 2020-08-23 17:44:35 +02:00
Stelios Tymvios
9003a09d5b
Callbacks have access to locals (#115)
* callbacks have access to locals

* changelog

* doc

* callbacks have access to locals

* changelog

* doc

* Added update function for child callbacks

* Pre-Release 0.8.0 (#134)

* Fix double reset and improve typing coverage (#136)

* Fix double reset and improve typing coverage

* Revert minor edit

* Add doc about types

* Update child callbacks

* cleaned imports

* format

* import order

* Simplify tests and add comments

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 14:34:01 +02:00
Sam Toyer
42ef6d4677
Remove "device" argument from policies (#141)
* Remove device arg from policies

* Clean up for PR

* Update test and doc

* Fix codestyle

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 13:27:52 +02:00
Antonin RAFFIN
21e9994ff9
Fix double reset and improve typing coverage (#136)
* Fix double reset and improve typing coverage

* Revert minor edit

* Add doc about types
2020-08-05 13:12:02 +03:00
Antonin RAFFIN
cceffd5ab2
Pre-Release 0.8.0 (#134) 2020-08-03 22:38:54 +02:00
Anssi
2cd6a4f93b
Match performance with stable-baselines (discrete case) (#110)
* Fix storing correct episode dones

* Fix number of filters in NatureCNN network

* Add TF-like RMSprop for matching performance with sb2

* Remove stuff that was accidentally included

* Reformat

* Clarify variable naming

* Update changelog

* Add comment on RMSprop implementations to A2C

* Add test for RMSpropTFLike

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-03 22:22:51 +02:00
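Much of the gap that RMSpropTFLike addresses comes from how epsilon and the squared-gradient accumulator are handled. PyTorch's RMSprop adds epsilon after the square root. TensorFlow 1.x's RMSPropOptimizer adds it inside the square root and is commonly described as starting the accumulator at ones rather than zeros. The scalar sketch below shows only these two differences. `rmsprop_step` is a hypothetical helper. The full optimizer (momentum, centered variant, tensors) is not reproduced here.

```python
import math
from typing import Tuple


def rmsprop_step(grad: float, square_avg: float, lr: float = 0.01, alpha: float = 0.99,
                 eps: float = 1e-5, tf_like: bool = False) -> Tuple[float, float]:
    """One scalar RMSprop step; returns (parameter_delta, new_square_avg)."""
    # Exponential moving average of squared gradients (same in both variants)
    square_avg = alpha * square_avg + (1.0 - alpha) * grad * grad
    if tf_like:
        # TF1-style: epsilon added inside the square root
        denom = math.sqrt(square_avg + eps)
    else:
        # PyTorch-style: epsilon added after the square root
        denom = math.sqrt(square_avg) + eps
    return -lr * grad / denom, square_avg
```

Starting from `square_avg=0.0` (PyTorch-style) gives a first step roughly ten times larger than starting from `square_avg=1.0` (TF-style) with these settings. Early training can therefore diverge between the two implementations.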
RaphaelWag
3253ee11e7
Update custom_policy.rst (#125)
* Update custom_policy.rst

Fixed Typo

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-31 11:10:48 +02:00
Anssi
77cb3dd0ab
Separate feature extractor networks for DQN networks (#132)
* Separate feature extractor networks for DQN networks

* [ci skip] Bump version

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-30 20:48:30 +02:00
Andy Shih
8f9aaaebe9
fix approximate entropy calculation in PPO and A2C (#130) 2020-07-29 21:19:41 +02:00
rk37
bd2aae0c27
Fix ortho init when bias=False with custom policy (#126)
* Update policies.py

fix AttributeError that occurred when using a "bias=False" linear layer in a custom FeaturesExtractor #124

* Update changelog.rst

 update the changelog accordingly

* Update changelog.rst

Co-authored-by: Kong Lingchao <konglingchao@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-25 22:35:48 +02:00
Steven H. Wang
83530560b5
Fix CloudpickleWrapper load (#118)
* CloudpickleWrapper: Load using cloudpickle

* Update changelog
2020-07-21 10:12:39 +02:00
Stelios Tymvios
dbe8cfceb6
Optimized polyak updates (#106)
* quick polyak updates

* changelog

* typing

* reverted autoformatting

* reverted autofmt

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* parameter names in test

* cleanup

* Merge branch 'master' into polyak

* Update changelog

* Apply suggestions from code review

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines3/common/utils.py

* Update utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-17 15:53:28 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Antonin RAFFIN
5ff176b2f1
Implement DDPG (#92)
* Add DDPG + TD3 with any number of critics

* Allow any number of critics for SAC

* Update doc

* [ci skip] Update DDPG example

* Remove unused parameter

* Add DDPG to identity test

* Fix computation with n_critics=1,3

* Update doc

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docstrings for off-policy algos

* Add check for sde

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-07-16 14:14:22 +02:00
Antonin RAFFIN
208890dfc8
Ignore errors from new pytype version (#107) 2020-07-16 11:54:37 +02:00
Joel Joseph
3cf6e9714b
Update ppo.rst (#94)
* Update ppo.rst

minor correction from A2C to PPO

* Update changelog.rst

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-10 10:38:35 +02:00
Adam Gleave
e7130344de Add changelog entry 2020-07-07 19:03:46 -07:00
Antonin RAFFIN
3756d05f72
Refactored ContinuousCritic for SAC/TD3 (#78)
* Refactored ContinuousCritic for SAC/TD3

* Address comments

* Add pybullet notebook
2020-07-07 01:02:51 +03:00
Stelios Tymvios
4aa66ed34a
Automatically create paths for saved objects (#80)
* automatically create paths for saved objects

* Minor Corrections, more tests

* linting

* typing

* Correct mode checking

* corrected tests to reflect new verbose functionality
2020-07-03 01:14:21 +03:00
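Creating the target folders before writing takes only a few lines of `pathlib`. The helper below is a hypothetical illustration. Its name, signature and default `.pkl` suffix are assumptions, not the library's save utilities.

```python
import pathlib
import pickle
from typing import Any, Union


def save_pickle(obj: Any, path: Union[str, pathlib.Path], suffix: str = ".pkl") -> pathlib.Path:
    """Pickle obj to path, creating missing parent folders first (hypothetical helper)."""
    path = pathlib.Path(path)
    if path.suffix == "":
        # Assumed behaviour: add a default extension when none is given
        path = path.with_suffix(suffix)
    # Create every missing directory on the way; no error if it already exists
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as file_handler:
        pickle.dump(obj, file_handler)
    return path
```

Passing `exist_ok=True` makes repeated saves to the same folder safe. `parents=True` creates nested folders in a single call.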
Marios Koulakis
7d8ebb9e98
Udacity Reacher Project with Unity (#79)
* Add the reacher project to the sample projects

* Update the change log

* Remove github incompatible link notation

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-30 15:03:02 +02:00
Antonin RAFFIN
08e7519381
Fix q-target in SAC (#77)
* Fix q-target in SAC

* [ci skip] Update version
2020-06-29 17:58:55 +02:00
Noah
96b771f24e
Implement DQN (#28)
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug

* Changed Base Class

* refactor save to be consistent with overriding the excluded_save_params function; do not try to exclude the parameters twice.

* Added simple DQN policy

* Finished learn and train function
- missing correct loss computation

* changed collect_rollouts to work with discrete space

* moved discrete space collect_rollouts to dqn

* basic dqn working

* deleted SDE related code

* added gradient clipping and moved greedy policy to policy

* changed policy to implement target network
and added soft update (in fact the standard tau is 1, so it is a hard update)

* fixed policy setup

* rebase target_update_interval on _n_updates

* adapted all tests
all tests passing

* Move to stable-baseline3

* Fixes for DQN

* Fix tests + add CNNPolicy

* Allow any optimizer for DQN

* added some util functions to create an arbitrary linear schedule, fixed pickle problem with old exploration schedule

* more documentation

* changed buffer dtype

* refactor and document

* Added Sphinx Documentation
Updated changelog.rst

* removed custom collect_rollouts as it is no longer necessary

* Implemented suggestions to clean code and documentation.

* extracted some functions on tests to reduce duplicated code

* added support for exploration_fraction

* Fixed exploration_fraction

* Added documentation

* Fixed get_linear_fn -> proper progress scaling

* Merged master

* Added nature reference

* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1

* Fixed n_updates to be incremented correctly

* Correct train_freq

* Doc update

* added special parameter for DQN in tests

* different fix for test_discrete

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added RMSProp in optimizer_kwargs, as described in nature paper

* Exploration fraction is the inverse of 50,000,000 (total frames) / 1,000,000 (frames with linear schedule), i.e. 0.02, according to the Nature paper

* Changelog update for buffer dtype

* standard excluded parameters should always be excluded to ensure proper saving, unless intentionally included via the ``include`` parameter

* slightly more iterations on test_discrete to pass the test

* added param use_rms_prop instead of mutable default argument

* forgot alpha

* using huber loss, adam and learning rate 1e-4

* account for train_freq in update_target_network

* Added memory check for both buffers

* Doc updated for buffer allocation

* Added psutil Requirement

* Adapted test_identity.py

* Fixes with new SB3 version

* Fix for tensorboard name

* Convert assert to warning and fix tests

* Refactor off-policy algorithms

* Fixes

* test: remove next_obs in replay buffer

* Update changelog

* Fix tests and use tmp_path where possible

* Fix sampling bug in buffer

* Do not store next obs on episode termination

* Fix replay buffer sampling

* Update comment

* moved epsilon from policy to model

* Update predict method

* Update atari wrappers to match SB2

* Minor edit in the buffers

* Update changelog

* Merge branch 'master' into dqn

* Update DQN to new structure

* Fix tests and remove hardcoded path

* Fix for DQN

* Disable memory efficient replay buffer by default

* Fix docstring

* Add tests for memory efficient buffer

* Update changelog

* Split collect rollout

* Move target update outside `train()` for DQN

* Update changelog

* Update linear schedule doc

* Cleanup DQN code

* Minor edit

* Update version and docker images

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-29 11:16:54 +02:00
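The exploration schedule described in this commit (a linear decay over a fraction of training, then a constant value) can be written as a closure over `progress_remaining`. This is a sketch in the spirit of `get_linear_fn`, not its source. The name `linear_schedule` is hypothetical.

```python
from typing import Callable


def linear_schedule(start: float, end: float, end_fraction: float) -> Callable[[float], float]:
    """Return f(progress_remaining) that goes linearly from start to end.

    progress_remaining runs from 1.0 (beginning of training) to 0.0 (end);
    the value reaches `end` after `end_fraction` of training and stays there.
    """

    def schedule(progress_remaining: float) -> float:
        progress_done = 1.0 - progress_remaining
        if progress_done > end_fraction:
            return end
        return start + progress_done * (end - start) / end_fraction

    return schedule
```

For example, `linear_schedule(1.0, 0.05, 0.02)` decays epsilon over the first 2% of training, which matches the 1,000,000 / 50,000,000 frame ratio quoted from the Nature paper.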
JieQiang (Jay) Wei
e47da426c1
Update rl_zoo.rst (#72)
* Update rl_zoo.rst

a typo fixed.

* Update changelog.rst

Fixed a typo in zoo readme.

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-25 12:14:56 +02:00
Matthias K
977a615c82
Fixed SubprocVecEnv close. (#68)
Updated changelog.

Co-authored-by: Matthias K <wirspielen@web.de>
2020-06-20 18:01:37 +02:00
Tirafesi
644d2c17ac
save_replay_buffer now receives as argument the file path instead of the folder path (#63)
* save_replay_buffer now receives as argument the file path instead of the folder path

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-17 14:00:49 +02:00
Antonin RAFFIN
a861f33107
Update notebooks (#65) 2020-06-17 12:47:09 +02:00
Antonin RAFFIN
494ebfd20a
Hotfix PPO + gSDE (#53)
* Fix variable being passed with gradients

* Update changelog

* Bump version

* Fixes #54
2020-06-10 18:58:35 +02:00
Anssi
b833207142
Add some missing tests, update VecNormalize and RolloutBuffer (#50)
* Change saving/loading normalization parameters to use single pickle file

* Remove 'use_gae' from RolloutBuffer compute_returns function

* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument

* Update changelog

* Fix typo

* Use proper pytest.raises for catching errors in tests

* Add comment on GAE and how to obtain non-GAE behaviour

* Remove save/load_running_average from VecNormalize in favor of load/save

* Update changelog

* Update docstring

* Add accidentally removed tests for VecNormalize

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-10 12:09:04 +02:00
Anssi
44f8218df0
Review of code (A2C, PPO and refactoring) (#35)
* Split torch module code into torch_layers file

* Updated reference to CNN

* Change 'CxWxH' to 'CxHxW', following the common convention

* Fix missing import in policies.py

* Move PPOPolicy to OnlineActorCriticPolicy

* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit

* Update A2C optimizer comment

* Clean weight init scales for clarity

* Fix A2C log_interval default parameter

* Rename 'progress' to 'progress_remaining'

* Rename 'Models' to 'Algorithms'

* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'

* Move static functions out from BaseAlgorithm

* Move on/off_policy base algorithms to their own files

* Add  files for A2C/PPO

* Fix docs

* Fix pytype

* Update documentation on OnPolicyAlgorithm

* Add proper docstring for on_policy rollout gathering

* Add a bit of clarification on the mlppolicy/cnnpolicy naming

* Move static function is_vectorized_policies to utils.py

* Checking docstrings, pep8 fixes

* Update changelog

* Clean changelog

* Remove policy warnings for sac/td3

* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-09 13:54:18 +02:00
Antonin RAFFIN
11d33eb4ae
Fix gSDE loading issue in test mode (#45)
* Fix gSDE loading issue in test mode

* Forward `reset_noise` method

* Re-add `make_actor`

* Reformat
2020-06-08 11:15:10 +02:00
Antonin RAFFIN
353ea81080
Fix several VecEnv issues, add fork start method to tests (#43)
* Fix several VecEnv issues, add `fork` start method to tests

* Fix signature
2020-06-04 11:22:12 +02:00
Antonin RAFFIN
403fff5d50
Pre-Release v0.6.0 (#39)
* Prepare release

* Update docker images
2020-06-01 13:09:47 +02:00
Roland Gavrilescu
bb01253261
Tensorboard integration (#30)
* init commit tensorboard-integration

* Added tb logger to ppo (with output exclusions)

* fixed truncated stdout

* categorize stdout outputs by tag

* separated exclusions from values, added missing logs

* saving exclusions as dict instead of list

* reformatting, auto run indexing

* included renaming suggestions, fixed tests

* tb support for sac

* linting

* moved logging to base class

* tb support for td3

* removed histograms, non-verbose output working

* modified changelog

* linting

* fixed type error

* moved logger config to utils

* removed episode_rewards log from ppo

* Enable tensorboard in tests

* Remove unused import

* Update logger sub titles

* Minor edit for PPO

* Update logger and tb log folder

* Pass correct logger to Callbacks

* updated docs

* added tb example image to docs

* add support for continuing training in tensorboard

* added tensorboard to docs index

* added tb test

* moved logger config to _setup_learn, updated tests

* accessing verbose from base class

* Update doc and tests

* Rename session -> time

* Update version

* Update logger truncate

* Update types

* Remove duplicated code

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-01 11:55:44 +02:00
mloo3
42f432c79c
Fix TD3 Example Code Documentation (#38)
Fix TD3's example code
2020-06-01 11:37:42 +03:00
Stelios Tymvios
78e8d405d7
Implemented Vectorized Action Noise (#34)
* Implemented Vectorized Action Noise

Vectorized Action Noise allows for multiple instances of
ActionNoiseProcesses to run in parallel. This makes it easier to
run TD3/SAC/DDPG with VecEnv.

* fixed linting issues

* make test function name consistent

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* sanity checks and more detailed test

* Update stable_baselines3/common/noise.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added assertion error message in noises setter

* Corrected tests to reflect change to AssertionError from ValueError

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-05-27 09:53:01 +02:00
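A vectorized noise object only has to hold one independent noise process per environment and reset each one when that environment's episode ends. The sketch below is an illustration, not the code in `stable_baselines3/common/noise.py`. The class names are hypothetical, the Ornstein-Uhlenbeck process works on plain Python lists with its mean fixed at 0, and the parameter defaults are only illustrative.

```python
import math
import random
from typing import List, Optional, Sequence


class OrnsteinUhlenbeckNoise:
    """Per-dimension OU process with mean 0: x += theta*(0 - x)*dt + sigma*sqrt(dt)*N(0, 1)."""

    def __init__(self, dim: int, sigma: float = 0.2, theta: float = 0.15, dt: float = 1e-2,
                 seed: Optional[int] = None) -> None:
        self.dim, self.sigma, self.theta, self.dt = dim, sigma, theta, dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self) -> None:
        # Restart the process from its mean (zero here)
        self.state = [0.0] * self.dim

    def __call__(self) -> List[float]:
        self.state = [
            x + self.theta * (0.0 - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


class VectorizedNoise:
    """One independent noise process per environment (hypothetical sketch)."""

    def __init__(self, n_envs: int, dim: int, seed: int = 0) -> None:
        # Distinct seeds so the processes are not identical copies
        self.noises = [OrnsteinUhlenbeckNoise(dim, seed=seed + i) for i in range(n_envs)]

    def __call__(self) -> List[List[float]]:
        # One noise vector per environment, stacked like a batch of actions
        return [noise() for noise in self.noises]

    def reset(self, indices: Optional[Sequence[int]] = None) -> None:
        # Reset only the environments whose episode ended (all if indices is None)
        if indices is None:
            indices = range(len(self.noises))
        for i in indices:
            self.noises[i].reset()
```

With a VecEnv, the indices of environments that returned `done=True` would be passed to `reset`. The other processes keep their temporally correlated state.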