Commit graph

122 commits

Author SHA1 Message Date
Steven H. Wang
b252f4212c
Add imitation library docs (#200)
* docs: Add imitation library docs

* Fix doc syntax errors

* Fix internal link; PDF->abstract for DAgger for consistency

* Grammar

* Update migration guide

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
2020-10-24 17:33:26 +01:00
Megan Klaiber
dd6e361204
Implement HER (#120)
* Added working her version, Online sampling is missing.

* Updated test_her.

* Added first version of online her sampling. Still problems with tensor dimensions.

* Reformat

* Fixed tests

* Added some comments.

* Updated changelog.

* Add missing init file

* Fixed some small bugs.

* Reduced arguments for HER, small changes.

* Added getattr. Fixed bug for online sampling.

* Updated save/load functions. Small changes.

* Added her to init.

* Updated save method.

* Updated her ratio.

* Move obs_wrapper

* Added DQN test.

* Fix potential bug

* Offline and online her share same sample_goal function.

* Changed lists into arrays.

* Updated her test.

* Fix online sampling

* Fixed action bug. Updated time limit for episodes.

* Updated convert_dict method to take keys as arguments.

* Renamed obs dict wrapper.

* Seed bit flipping env

* Remove get_episode_dict

* Add fast online sampling version

* Added documentation.

* Vectorized reward computation

* Vectorized goal sampling

* Update time limit for episodes in online her sampling.

* Fix max episode length inference

* Bug fix for Fetch envs

* Fix for HER + gSDE

* Reformat (new black version)

* Added info dict to compute new reward. Check her_replay_buffer again.

* Fix info buffer

* Updated done flag.

* Fixes for gSDE

* Offline her version now uses HerReplayBuffer as episode storage.

* Fix num_timesteps computation

* Fix get torch params

* Vectorized version for offline sampling.

* Modified offline her sampling to use sample method of her_replay_buffer

* Updated HER tests.

* Updated documentation

* Cleanup docstrings

* Updated to review comments

* Fix pytype

* Update according to review comments.

* Removed random goal strategy. Updated sample transitions.

* Updated migration. Removed time signal removal.

* Update doc

* Fix potential load issue

* Add VecNormalize support for dict obs

* Updated saving/loading replay buffer for HER.

* Fix test memory usage

* Fixed save/load replay buffer.

* Fixed save/load replay buffer

* Fixed transition index after loading replay buffer in online sampling

* Better error handling

* Add tests for get_time_limit

* More tests for VecNormalize with dict obs

* Update doc

* Improve HER description

* Add test for sde support

* Add comments

* Add comments

* Remove check that was always valid

* Fix for terminal observation

* Updated buffer size in offline version and reset of HER buffer

* Reformat

* Update doc

* Remove np.empty + add doc

* Fix loading

* Updated loading replay buffer

* Separate online and offline sampling + bug fixes

* Update tensorboard log name

* Version bump

* Bug fix for special case

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-22 11:56:43 +02:00
Bernhard Raml
15e94a6d14
Add support to log videos via tensorboard (#196)
* Add support to log videos via tensorboard

The ability to look at renderings of agent's trajectories during
training helps evaluate the performance of that agent. One can see what
the agent actually does at various stages during training. For now only
tensorboard is supported, as it is straightforward to implement.

* Remove moviepy dependency from extra & doc update

* Removed the moviepy dependency from the `extra` dependencies so the
user can decide whether to install it or not

* Update the video logging documentation with proper naming, comments

* Added a warning to the video logging documentation explaining the moviepy
dependency

* Updated the video test, to check for a warning when moviepy is missing

* Update doc

* Update FormatUnsupportedError message

* Also log the offending value making the error message more expressive

* Fix reporting the correct format and update regression test

* Use string description in FormatUnsupportedError

* Instead of converting the value to string without the user's control
the constructor takes a string representation of the value

* Use string description in FormatUnsupportedError

* Use a shorter string description for the error to reduce verbosity

Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-22 11:33:58 +02:00
Anssi
19c1a89a3a
Rename cmd_util to env_util (#197)
* Rename cmd_util to env_util

* Fix docs and add missing newline

* Address comments
2020-10-22 11:05:52 +02:00
Melvin Wang
856da19609
add check to ensure action space is non-dict non-tuple for env_checker nan check (#192)
* add check to ensure action space is non-dict non-tuple for env_checker nan check

* update changelog.rst

* add regression test for new check

* commit-checks

* add more action space checks

* update docstrings

* add warning check
2020-10-19 00:23:51 +03:00
Anssi
37f48aa979
Fix initializing CUDA even when device="cpu" is used. (#194)
* Fall back to 'cpu' device in policies instead of 'auto'

* Update changelog
2020-10-18 20:51:56 +02:00
Bernhard Raml
97b81f9e9e
Fix ignoring the exclude in the logger's record function for json, csv and log logging formats (#190)
* Fix ignoring the exclude in logger record

For the logging formats json, csv, and log the exclude parameter of the
logger's record function has been ignored. The necessary checks were
missing from some of the format writer classes. Regression tests have
been added to prevent this error in the future.

* Fix docstring for filter_excluded_keys

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added missing type hints to local functions

* Update stable_baselines3/common/logger.py

Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-16 17:34:49 +02:00
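The `exclude` fix described in #190 above amounts to dropping excluded keys before each format writer emits a record. A minimal sketch, assuming a hypothetical mapping from each key to the format names it should be hidden from; the helper name and data layout below are illustrative and are not taken from `stable_baselines3/common/logger.py`:

```python
from typing import Any, Dict, Optional, Tuple, Union

# Hypothetical type: for each key, the format name(s) it must be hidden from.
Excluded = Dict[str, Optional[Union[str, Tuple[str, ...]]]]


def filter_for_format(key_values: Dict[str, Any], key_excluded: Excluded, fmt: str) -> Dict[str, Any]:
    """Return only the entries that the writer named `fmt` is allowed to emit."""

    def is_excluded(key: str) -> bool:
        excluded = key_excluded.get(key)
        if excluded is None:
            return False
        if isinstance(excluded, str):
            return excluded == fmt
        return fmt in excluded

    return {key: value for key, value in key_values.items() if not is_excluded(key)}


# Illustrative record: the video entry is excluded from the text-based formats.
record = {"train/loss": 0.5, "rollout/video": "<frames>"}
excluded = {"train/loss": None, "rollout/video": ("json", "csv", "log")}

print(filter_for_format(record, excluded, "json"))         # video entry is dropped
print(filter_for_format(record, excluded, "tensorboard"))  # both entries are kept
```

Every writer would need to apply such a filter for the exclusion to hold across formats, which is the gap the commit message describes.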
Wilson
fe6ade3089
Allow env_kwargs in make_vec_env when env ID string supplied (#189)
* Allow env_kwargs in make_vec_env when env ID string supplied

Resolves #188

* Update docs/misc/changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Add test for env kwargs in make_vec_env

* remove unnecessary args in test_vec_env_kwargs function

* Fixes and reformat

* Doc fix

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-16 11:09:19 +02:00
Antonin RAFFIN
2599f04940
Add custom arch for off-policy actor/critic networks (#182)
* Add custom arch for off-policy actor/critic networks

* Fix type hints

* Address comments

* Make sure number of updated parameters match in polyak

* Add zip_strict for strict-length zipping

* Fix building docs

* Add test for zip strict

* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-13 12:01:33 +02:00
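The strict-length zipping mentioned in #182 above guards the polyak update against silently skipping parameters when the two parameter lists differ in length. A minimal sketch under stated assumptions: it uses plain floats instead of tensors, and both functions are illustrative re-implementations, not the library's code:

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple


def zip_strict(*iterables: Iterable) -> Iterator[Tuple]:
    """Like zip(), but raise ValueError if the iterables differ in length."""
    sentinel = object()  # unique marker that cannot appear in user data
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if any(item is sentinel for item in combo):
            raise ValueError("Iterables have different lengths")
        yield combo


def polyak_update(params: List[float], target_params: List[float], tau: float) -> List[float]:
    """Return the soft-updated targets: (1 - tau) * target + tau * param."""
    return [(1.0 - tau) * target + tau * param for param, target in zip_strict(params, target_params)]


print(polyak_update([1.0, 2.0], [0.0, 0.0], tau=0.5))

try:
    polyak_update([1.0, 2.0], [0.0], tau=0.5)
except ValueError as error:
    print(f"Rejected: {error}")
```

With the built-in `zip()`, the second call would quietly update only the first parameter; the strict variant turns that mismatch into an error.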
Antonin RAFFIN
fc9527157a
Fix off-by-one GAE computation (#185)
* Fix off-by-one GAE computation

* Fix identity test

* Revert gae loop
2020-10-13 00:10:54 +03:00
Antonin RAFFIN
fc6c5d3daa
Migration Guide (#123)
* Start migration guide

* Update guide

* Add comment on RMSpropTFLike plus PPO/A2C migrations

* Add note about set/get-parameters

* Update migration guide

* Update changelog and readme

* Update doc + clean changelog

* Address comments

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-11 23:22:12 +02:00
Antonin RAFFIN
a1e055695c
Improve typing coverage (#175)
* Improve typing coverage

* Even more types

* Fixes

* Update changelog

* Unified docstrings

* Improve error messages for unsupported spaces
2020-10-07 10:51:49 +02:00
Antonin RAFFIN
a10e3ae587
Release v0.9.0 (#174) 2020-10-04 17:12:35 +02:00
Antonin RAFFIN
55912576ed
Cleanup docstring types (#169)
* Cleanup docstring types

* Update style

* Test with js hack

* Revert "Test with js hack"

This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69.

* Fix types

* Fix typo

* Update CONTRIBUTING example
2020-10-02 20:05:55 +03:00
Antonin RAFFIN
2c924f52f5
Update docs (custom policy, type hints) (#167)
* Change import

* Update custom policy doc

* Re-enable sphinx_autodoc_typehints

* Update docker image

* Attempt to fix read the doc build error

* Add sphinx_autodoc_typehints to read the doc env

* Fix pip version

* Add full custom policy example

* Fix
2020-09-29 20:41:14 +03:00
Antonin RAFFIN
44a723eecb
Fix loading of old versions and update changelog (#165) 2020-09-24 16:05:36 +02:00
Anssi
9855486488
Get/set parameters and review of saving and loading (#138)
* Update comments and docstrings

* Rename get_torch_variables to private and update docs

* Clarify documentation on data, params and tensors

* Make excluded_save_params private and update docs

* Update get_torch_variable_names to get_torch_save_params for description

* Simplify saving code and update docs on params vs tensors

* Rename saved item tensors to pytorch_variables for clarity

* Reformat

* Fix a typo

* Add get/set_parameters, update tests accordingly

* Use f-strings for formatting

* Fix load docstring

* Reorganize functions in BaseClass

* Update changelog

* Add library version to the stored models

* Actually run isort this time

* Fix flake8 complaints and also fix testing code

* Fix isort

* ...and black

* Fix set_random_seed

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-09-24 14:28:27 +02:00
mloo3
00595b09d8
Add actor/critic loss logging to td3 (#164)
* add actor/critic loss logging to td3

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-09-23 22:40:41 +02:00
Wilson
e908583e2a
Fix type annotation in make_vec_env (#162)
* Fix type annotation in make_vec_env

The variable `vec_env_cls` is a type and not an instance of either DummyVecEnv or SubprocVecEnv

* Update changelog.rst
2020-09-23 10:34:35 +02:00
liorcohen5
f5104a5efc
Allow to set a device when loading a model (#154)
* Added a 'device' keyword argument to BaseAlgorithm.load().
Edited the save and load test to also test the load method with all possible devices.
Added the changes to the changelog

* improved the load test to ensure that the model loads to the correct device.

* improved the correctness of the test: if the get_device policy changes, the test won't break.

* Update tests/test_save_load.py

@araffin's suggestion during the PR process

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update tests/test_save_load.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Bug fixes: when comparing devices, comparing only device type since get_device() doesn't provide device index.
Now the code loads all of the model parameters from the saved state dict straight into the required device. (fixed load_from_zip_file).

* PR fixes: bug fix - a non-related test failed when running on GPU. updated the assertion to consider only types of devices. Also corrected a related bug in 'get_device()' method.

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-09-20 19:13:18 +02:00
Antonin RAFFIN
583d4b8e41 Minor: fix changelog 2020-09-10 16:56:27 +02:00
Vsevolod Kompantsev
4fd408bec2
Fix PPO logging of clip_fractions (#150)
* bugfix for PPO logging of clip_fractions

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-09-01 09:52:31 +02:00
Francisco Caio
5fc90a7f7d
Add StopTrainingOnMaxEpisodes to callback collection (#147)
* Add StopTrainingOnMaxEpisodes class to pre-made callback collection

* Adjust instant when counters are incremented for both OnPolicy and OffPolicy algorithms

* Improve StopTrainingOnMaxEpisodes, including output, tests and doc

* Improve StopTrainingOnMaxEpisodes callback running _init_callback

* Update callbacks.py

* Update test_callbacks.py

* Fix style

* Update changelog.rst

* Fix test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-08-28 11:36:33 +02:00
Antonin RAFFIN
a1afc5e42f
Fix typos in SAC and TD3 (#145) 2020-08-23 17:44:35 +02:00
Stelios Tymvios
9003a09d5b
Callbacks have access to locals (#115)
* callbacks have access to locals

* changelog

* doc

* callbacks have access to locals

* changelog

* doc

* Added update function for child callbacks

* Pre-Release 0.8.0 (#134)

* Fix double reset and improve typing coverage (#136)

* Fix double reset and improve typing coverage

* Revert minor edit

* Add doc about types

* Update child callbacks

* cleaned imports

* format

* import order

* Simplify tests and add comments

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 14:34:01 +02:00
Sam Toyer
42ef6d4677
Remove "device" argument from policies (#141)
* Remove device arg from policies

* Clean up for PR

* Update test and doc

* Fix codestyle

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 13:27:52 +02:00
Antonin RAFFIN
21e9994ff9
Fix double reset and improve typing coverage (#136)
* Fix double reset and improve typing coverage

* Revert minor edit

* Add doc about types
2020-08-05 13:12:02 +03:00
Antonin RAFFIN
cceffd5ab2
Pre-Release 0.8.0 (#134) 2020-08-03 22:38:54 +02:00
Anssi
2cd6a4f93b
Match performance with stable-baselines (discrete case) (#110)
* Fix storing correct episode dones

* Fix number of filters in NatureCNN network

* Add TF-like RMSprop for matching performance with sb2

* Remove stuff that was accidentally included

* Reformat

* Clarify variable naming

* Update changelog

* Add comment on RMSprop implementations to A2C

* Add test for RMSpropTFLike

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-03 22:22:51 +02:00
RaphaelWag
3253ee11e7
Update custom_policy.rst (#125)
* Update custom_policy.rst

Fixed Typo

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-31 11:10:48 +02:00
Anssi
77cb3dd0ab
Separate feature extractor networks for DQN networks (#132)
* Separate feature extractor networks for DQN networks

* [ci skip] Bump version

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-30 20:48:30 +02:00
Andy Shih
8f9aaaebe9
fix approximate entropy calculation in PPO and A2C (#130) 2020-07-29 21:19:41 +02:00
rk37
bd2aae0c27
Fix ortho init when bias=False with custom policy (#126)
* Update policies.py

Fix AttributeError that occurred when using a "bias=False" linear layer in a custom FeaturesExtractor (#124)

* Update changelog.rst

Update the changelog accordingly

* Update changelog.rst

Co-authored-by: Kong Lingchao <konglingchao@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-25 22:35:48 +02:00
Steven H. Wang
83530560b5
Fix CloudpickleWrapper load (#118)
* CloudpickleWrapper: Load using cloudpickle

* Update changelog
2020-07-21 10:12:39 +02:00
Stelios Tymvios
dbe8cfceb6
Optimized polyak updates (#106)
* quick polyak updates

* changelog

* typing

* reverted autoformatting

* reverted autofmt

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* parameter names in test

* cleanup

* Merge branch 'master' into polyak

* Update changelog

* Apply suggestions from code review

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines3/common/utils.py

* Update utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-17 15:53:28 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Antonin RAFFIN
5ff176b2f1
Implement DDPG (#92)
* Add DDPG + TD3 with any number of critics

* Allow any number of critics for SAC

* Update doc

* [ci skip] Update DDPG example

* Remove unused parameter

* Add DDPG to identity test

* Fix computation with n_critics=1,3

* Update doc

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docstrings for off-policy algos

* Add check for sde

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-07-16 14:14:22 +02:00
Antonin RAFFIN
208890dfc8
Ignore errors from new pytype version (#107) 2020-07-16 11:54:37 +02:00
Joel Joseph
3cf6e9714b
Update ppo.rst (#94)
* Update ppo.rst

minor correction from A2C to PPO

* Update changelog.rst

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-10 10:38:35 +02:00
Adam Gleave
e7130344de Add changelog entry 2020-07-07 19:03:46 -07:00
Antonin RAFFIN
3756d05f72
Refactored ContinuousCritic for SAC/TD3 (#78)
* Refactored ContinuousCritic for SAC/TD3

* Address comments

* Add pybullet notebook
2020-07-07 01:02:51 +03:00
Stelios Tymvios
4aa66ed34a
Automatically create paths for saved objects (#80)
* automatically create paths for saved objects

* Minor Corrections, more tests

* linting

* typing

* Correct mode checking

* corrected tests to reflect new verbose functionality
2020-07-03 01:14:21 +03:00
Marios Koulakis
7d8ebb9e98
Udacity Reacher Project with Unity (#79)
* Add the reacher project to the sample projects

* Update the change log

* Remove github incompatible link notation

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-30 15:03:02 +02:00
Antonin RAFFIN
08e7519381
Fix q-target in SAC (#77)
* Fix q-target in SAC

* [ci skip] Update version
2020-06-29 17:58:55 +02:00
Noah
96b771f24e
Implement DQN (#28)
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug

* Changed Base Class

* refactor save to be consistent with overriding the excluded_save_params function. Do not try to exclude the parameters twice.

* Added simple DQN policy

* Finished learn and train function
- missing correct loss computation

* changed collect_rollouts to work with discrete space

* moved discrete space collect_rollouts to dqn

* basic dqn working

* deleted SDE related code

* added gradient clipping and moved greedy policy to policy

* changed policy to implement target network
and added soft update (in fact standard tau is 1, so hard update)

* fixed policy setup

* rebase target_update_intervall on _n_updates

* adapted all tests
all tests passing

* Move to stable-baseline3

* Fixes for DQN

* Fix tests + add CNNPolicy

* Allow any optimizer for DQN

* added some util functions to create an arbitrary linear schedule, fixed pickle problem with old exploration schedule

* more documentation

* changed buffer dtype

* refactor and document

* Added Sphinx Documentation
Updated changelog.rst

* removed custom collect_rollouts as it is no longer necessary

* Implemented suggestions to clean code and documentation.

* extracted some functions on tests to reduce duplicated code

* added support for exploration_fraction

* Fixed exploration_fraction

* Added documentation

* Fixed get_linear_fn -> proper progress scaling

* Merged master

* Added nature reference

* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1

* Fixed n_updates to be incremented correctly

* Correct train_freq

* Doc update

* added special parameter for DQN in tests

* different fix for test_discrete

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added RMSProp in optimizer_kwargs, as described in nature paper

* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper

* Changelog update for buffer dtype

* standard exclude parameters should always be excluded to ensure proper saving, unless intentionally included by the ``include`` parameter

* slightly more iterations on test_discrete to pass the test

* added param use_rms_prop instead of mutable default argument

* forgot alpha

* using huber loss, adam and learning rate 1e-4

* account for train_freq in update_target_network

* Added memory check for both buffers

* Doc updated for buffer allocation

* Added psutil Requirement

* Adapted test_identity.py

* Fixes with new SB3 version

* Fix for tensorboard name

* Convert assert to warning and fix tests

* Refactor off-policy algorithms

* Fixes

* test: remove next_obs in replay buffer

* Update changelog

* Fix tests and use tmp_path where possible

* Fix sampling bug in buffer

* Do not store next obs on episode termination

* Fix replay buffer sampling

* Update comment

* moved epsilon from policy to model

* Update predict method

* Update atari wrappers to match SB2

* Minor edit in the buffers

* Update changelog

* Merge branch 'master' into dqn

* Update DQN to new structure

* Fix tests and remove hardcoded path

* Fix for DQN

* Disable memory efficient replay buffer by default

* Fix docstring

* Add tests for memory efficient buffer

* Update changelog

* Split collect rollout

* Move target update outside `train()` for DQN

* Update changelog

* Update linear schedule doc

* Cleanup DQN code

* Minor edit

* Update version and docker images

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-29 11:16:54 +02:00
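The exploration-fraction arithmetic quoted in #28 above (1.000.000 schedule frames out of 50.000.000 total frames) can be checked with a short sketch. The helper `linear_epsilon` and the start/end epsilon values of 1.0 and 0.1 are assumptions made for illustration; they are not taken from the commit itself:

```python
TOTAL_FRAMES = 50_000_000    # total frames, as quoted in the commit message
SCHEDULE_FRAMES = 1_000_000  # frames covered by the linear schedule

# Fraction of training over which epsilon is annealed.
exploration_fraction = SCHEDULE_FRAMES / TOTAL_FRAMES


def linear_epsilon(progress: float, start: float, end: float, fraction: float) -> float:
    """Hypothetical helper: epsilon after `progress` (0.0 to 1.0) of training.

    Epsilon moves linearly from `start` to `end` during the first `fraction`
    of training and stays at `end` afterwards.
    """
    if progress >= fraction:
        return end
    return start + progress * (end - start) / fraction


print(exploration_fraction)
for progress in (0.0, 0.01, 0.02, 0.5):
    print(progress, linear_epsilon(progress, start=1.0, end=0.1, fraction=exploration_fraction))
```

Under these assumptions the schedule is finished after the first 2% of training, and epsilon stays at its final value for the remaining 98%.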
JieQiang (Jay) Wei
e47da426c1
Update rl_zoo.rst (#72)
* Update rl_zoo.rst

a typo fixed.

* Update changelog.rst

Fixed a typo in zoo readme.

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-25 12:14:56 +02:00
Matthias K
977a615c82
Fixed SubprocVecEnv close. (#68)
Updated changelog.

Co-authored-by: Matthias K <wirspielen@web.de>
2020-06-20 18:01:37 +02:00
Tirafesi
644d2c17ac
save_replay_buffer now receives as argument the file path instead of the folder path (#63)
* save_replay_buffer now receives as argument the file path instead of the folder path

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-17 14:00:49 +02:00
Antonin RAFFIN
a861f33107
Update notebooks (#65) 2020-06-17 12:47:09 +02:00
Antonin RAFFIN
494ebfd20a
Hotfix PPO + gSDE (#53)
* Fix variable being passed with gradients

* Update changelog

* Bump version

* Fixes #54
2020-06-10 18:58:35 +02:00