Commit graph

134 commits

Author SHA1 Message Date
Bernhard Raml
97b81f9e9e
Fix ignoring the exclude in the logger's record function for json, csv and log logging formats (#190)
* Fix ignoring the exclude in logger record

For the logging formats json, csv, and log the exclude parameter of the
logger's record function has been ignored. The necessary checks were
missing from some of the format writer classes. Regression tests have
been added to prevent this error in the future.

* Fix docstring for filter_excluded_keys

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added missing type hints to local functions

* Update stable_baselines3/common/logger.py

Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-16 17:34:49 +02:00
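
The fix described in this commit comes down to dropping excluded keys before a format writer emits them. The following is a minimal sketch of that idea. The name `filter_excluded_keys` appears in the commit message, but the signature, the data layout, and the sample keys below are assumptions for illustration, not the library's code.

```python
from typing import Any, Dict, Optional, Tuple, Union

# Assumption: each key maps to None, one format name, or a tuple of format names.
Excluded = Optional[Union[str, Tuple[str, ...]]]


def filter_excluded_keys(
    key_values: Dict[str, Any],
    key_excluded: Dict[str, Excluded],
    _format: str,
) -> Dict[str, Any]:
    """Return only the entries that may be written in the given format."""

    def is_excluded(key: str) -> bool:
        excluded = key_excluded.get(key)
        if excluded is None:
            return False
        if isinstance(excluded, str):
            return excluded == _format
        return _format in excluded

    return {k: v for k, v in key_values.items() if not is_excluded(k)}


# Hypothetical logged values: the second one must not reach json/csv/log writers.
values = {"train/loss": 0.25, "rollout/video": "<frames>"}
excluded = {"train/loss": None, "rollout/video": ("json", "csv", "log")}

json_row = filter_excluded_keys(values, excluded, "json")
tensorboard_row = filter_excluded_keys(values, excluded, "tensorboard")
```

Calling the same filter from every writer keeps the check in one place, which is the kind of omission the regression tests in this commit are meant to catch.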
Wilson
fe6ade3089
Allow env_kwargs in make_vec_env when env ID string supplied (#189)
* Allow env_kwargs in make_vec_env when env ID string supplied

Resolves #188

* Update docs/misc/changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Add test for env kwargs in make_vec_env

* remove unnecessary args in test_vec_env_kwargs function

* Fixes and reformat

* Doc fix

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-16 11:09:19 +02:00
Antonin RAFFIN
2599f04940
Add custom arch for off-policy actor/critic networks (#182)
* Add custom arch for off-policy actor/critic networks

* Fix type hints

* Address comments

* Make sure number of updated parameters match in polyak

* Add zip_strict for strict-length zipping

* Fix building docs

* Add test for zip strict

* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-13 12:01:33 +02:00
Antonin RAFFIN
fc9527157a
Fix off-by-one GAE computation (#185)
* Fix off-by-one GAE computation

* Fix identity test

* Revert gae loop
2020-10-13 00:10:54 +03:00
Anssi
9855486488
Get/set parameters and review of saving and loading (#138)
* Update comments and docstrings

* Rename get_torch_variables to private and update docs

* Clarify documentation on data, params and tensors

* Make excluded_save_params private and update docs

* Update get_torch_variable_names to get_torch_save_params for description

* Simplify saving code and update docs on params vs tensors

* Rename saved item tensors to pytorch_variables for clarity

* Reformat

* Fix a typo

* Add get/set_parameters, update tests accordingly

* Use f-strings for formatting

* Fix load docstring

* Reorganize functions in BaseClass

* Update changelog

* Add library version to the stored models

* Actually run isort this time

* Fix flake8 complaints and also fix testing code

* Fix isort

* ...and black

* Fix set_random_seed

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-09-24 14:28:27 +02:00
liorcohen5
f5104a5efc
Allow to set a device when loading a model (#154)
* Added a 'device' keyword argument to BaseAlgorithm.load().
Edited the save and load test to also test the load method with all possible devices.
Added the changes to the changelog

* improved the load test to ensure that the model loads to the correct device.

* improved the test: now the correctness is improved. If the get_device policy would change, it wouldn't break the test.

* Update tests/test_save_load.py

@araffin's suggestion during the PR process

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update tests/test_save_load.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Bug fixes: when comparing devices, comparing only device type since get_device() doesn't provide device index.
Now the code loads all of the model parameters from the saved state dict straight into the required device. (fixed load_from_zip_file).

* PR fixes: bug fix - a non-related test failed when running on GPU. updated the assertion to consider only types of devices. Also corrected a related bug in 'get_device()' method.

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-09-20 19:13:18 +02:00
Francisco Caio
5fc90a7f7d
Add StopTrainingOnMaxEpisodes to callback collection (#147)
* Add StopTrainingOnMaxEpisodes class to pre-made callback collection

* Adjust instant when counters are incremented for both OnPolicy and OffPolicy algorithms

* Improve StopTrainingOnMaxEpisodes, including output, tests and doc

* Improve StopTrainingOnMaxEpisodes callback by running _init_callback

* Update callbacks.py

* Update test_callbacks.py

* Fix style

* Update changelog.rst

* Fix test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-08-28 11:36:33 +02:00
Antonin RAFFIN
15d32c6a4a
Update black version + update docker image (#151)
* Update docker image

* Update black and reformat
2020-08-27 23:02:59 +02:00
Stelios Tymvios
9003a09d5b
Callbacks have access to locals (#115)
* callbacks have access to locals

* changelog

* doc

* callbacks have access to locals

* changelog

* doc

* Added update function for child callbacks

* Pre-Release 0.8.0 (#134)

* Fix double reset and improve typing coverage (#136)

* Fix double reset and improve typing coverage

* Revert minor edit

* Add doc about types

* Update child callbacks

* cleaned imports

* format

* import order

* Simplify tests and add comments

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 14:34:01 +02:00
Sam Toyer
42ef6d4677
Remove "device" argument from policies (#141)
* Remove device arg from policies

* Clean up for PR

* Update test and doc

* Fix codestyle

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 13:27:52 +02:00
Anssi
2cd6a4f93b
Match performance with stable-baselines (discrete case) (#110)
* Fix storing correct episode dones

* Fix number of filters in NatureCNN network

* Add TF-like RMSprop for matching performance with sb2

* Remove stuff that was accidentally included

* Reformat

* Clarify variable naming

* Update changelog

* Add comment on RMSprop implementations to A2C

* Add test for RMSpropTFLike

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-03 22:22:51 +02:00
Stelios Tymvios
dbe8cfceb6
Optimized polyak updates (#106)
* quick polyak updates

* changelog

* typing

* reverted autoformatting

* reverted autofmt

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* parameter names in test

* cleanup

* Merge branch 'master' into polyak

* Update changelog

* Apply suggestions from code review

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines3/common/utils.py

* Update utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-07-17 15:53:28 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Antonin RAFFIN
5ff176b2f1
Implement DDPG (#92)
* Add DDPG + TD3 with any number of critics

* Allow any number of critics for SAC

* Update doc

* [ci skip] Update DDPG example

* Remove unused parameter

* Add DDPG to identity test

* Fix computation with n_critics=1,3

* Update doc

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docstrings for off-policy algos

* Add check for sde

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-07-16 14:14:22 +02:00
Stelios Tymvios
4aa66ed34a
Automatically create paths for saved objects (#80)
* automatically create paths for saved objects

* Minor Corrections, more tests

* linting

* typing

* Correct mode checking

* corrected tests to reflect new verbose functionality
2020-07-03 01:14:21 +03:00
Noah
96b771f24e
Implement DQN (#28)
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug

* Changed Base Class

* refactor save to be consistent with overriding the excluded_save_params function. Do not try to exclude the parameters twice.

* Added simple DQN policy

* Finished learn and train function
- missing correct loss computation

* changed collect_rollouts to work with discrete space

* moved discrete space collect_rollouts to dqn

* basic dqn working

* deleted SDE related code

* added gradient clipping and moved greedy policy to policy

* changed policy to implement target network
and added soft update (in fact the standard tau is 1, so it is a hard update)

* fixed policy setup

* rebase target_update_intervall on _n_updates

* adapted all tests
all tests passing

* Move to stable-baseline3

* Fixes for DQN

* Fix tests + add CNNPolicy

* Allow any optimizer for DQN

* added some util functions to create an arbitrary linear schedule, fixed pickle problem with old exploration schedule

* more documentation

* changed buffer dtype

* refactor and document

* Added Sphinx Documentation
Updated changelog.rst

* removed custom collect_rollouts as it is no longer necessary

* Implemented suggestions to clean code and documentation.

* extracted some functions on tests to reduce duplicated code

* added support for exploration_fraction

* Fixed exploration_fraction

* Added documentation

* Fixed get_linear_fn -> proper progress scaling

* Merged master

* Added nature reference

* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1

* Fixed n_updates to be incremented correctly

* Correct train_freq

* Doc update

* added special parameter for DQN in tests

* different fix for test_discrete

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added RMSProp in optimizer_kwargs, as described in nature paper

* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper

* Changelog update for buffer dtype

* standard exclude parameters should always be excluded to ensure proper saving, unless intentionally included by the ``include`` parameter

* slightly more iterations on test_discrete to pass the test

* added param use_rms_prop instead of mutable default argument

* forgot alpha

* using huber loss, adam and learning rate 1e-4

* account for train_freq in update_target_network

* Added memory check for both buffers

* Doc updated for buffer allocation

* Added psutil Requirement

* Adapted test_identity.py

* Fixes with new SB3 version

* Fix for tensorboard name

* Convert assert to warning and fix tests

* Refactor off-policy algorithms

* Fixes

* test: remove next_obs in replay buffer

* Update changelog

* Fix tests and use tmp_path where possible

* Fix sampling bug in buffer

* Do not store next obs on episode termination

* Fix replay buffer sampling

* Update comment

* moved epsilon from policy to model

* Update predict method

* Update atari wrappers to match SB2

* Minor edit in the buffers

* Update changelog

* Merge branch 'master' into dqn

* Update DQN to new structure

* Fix tests and remove hardcoded path

* Fix for DQN

* Disable memory efficient replay buffer by default

* Fix docstring

* Add tests for memory efficient buffer

* Update changelog

* Split collect rollout

* Move target update outside `train()` for DQN

* Update changelog

* Update linear schedule doc

* Cleanup DQN code

* Minor edit

* Update version and docker images

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-29 11:16:54 +02:00
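
The exploration-fraction arithmetic mentioned in this commit can be checked directly: with 50,000,000 total frames and a 1,000,000-frame linear schedule, the fraction is 1,000,000 / 50,000,000 = 0.02. The sketch below uses a hypothetical helper to show how such a fraction drives a linear schedule. `linear_schedule` and its parameters are assumptions written for this illustration, not the library's `get_linear_fn`, and the start and end epsilon values are placeholders rather than values taken from the commit.

```python
from typing import Callable


def linear_schedule(start: float, end: float, end_fraction: float) -> Callable[[float], float]:
    """Interpolate linearly from start to end over the first end_fraction of training.

    The returned function takes progress_remaining, which goes from 1
    (start of training) down to 0 (end of training).
    """

    def schedule(progress_remaining: float) -> float:
        progress = 1.0 - progress_remaining
        if progress >= end_fraction:
            return end
        return start + progress * (end - start) / end_fraction

    return schedule


total_frames = 50_000_000
schedule_frames = 1_000_000
exploration_fraction = schedule_frames / total_frames  # 0.02

# Illustrative epsilon values only.
epsilon = linear_schedule(start=1.0, end=0.1, end_fraction=exploration_fraction)
```

With these numbers, epsilon decays during the first 2% of training and then stays at its final value.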
Tirafesi
644d2c17ac
save_replay_buffer now receives as argument the file path instead of the folder path (#63)
* save_replay_buffer now receives as argument the file path instead of the folder path

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-17 14:00:49 +02:00
Antonin RAFFIN
494ebfd20a
Hotfix PPO + gSDE (#53)
* Fix variable being passed with gradients

* Update changelog

* Bump version

* Fixes #54
2020-06-10 18:58:35 +02:00
Anssi
b833207142
Add some missing tests, update VecNormalize and RolloutBuffer (#50)
* Change saving/loading normalization parameters to use single pickle file

* Remove 'use_gae' from RolloutBuffer compute_returns function

* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument

* Update changelog

* Fix typo

* Use proper pytest.raises for catching errors in tests

* Add comment on GAE and how to obtain non-GAE behaviour

* Remove save/load_running_average from VecNormalize in favor of load/save

* Update changelog

* Update docstring

* Add accidentally removed tests for VecNormalize

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-10 12:09:04 +02:00
Anssi
44f8218df0
Review of code (A2C, PPO and refactoring) (#35)
* Split torch module code into torch_layers file

* Updated reference to CNN

* Change 'CxWxH' to 'CxHxW', as per common notion

* Fix missing import in policies.py

* Move PPOPolicy to OnlineActorCriticPolicy

* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit

* Update A2C optimizer comment

* Clean weight init scales for clarity

* Fix A2C log_interval default parameter

* Rename 'progress' to 'progress_remaining'

* Rename 'Models' to 'Algorithms'

* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'

* Move static functions out from BaseAlgorithm

* Move on/off_policy base algorithms to their own files

* Add  files for A2C/PPO

* Fix docs

* Fix pytype

* Update documentation on OnPolicyAlgorithm

* Add proper docstring for on_policy rollout gathering

* Add a bit of clarification on the mlppolicy/cnnpolicy naming

* Move static function is_vectorized_policies to utils.py

* Checking docstrings, pep8 fixes

* Update changelog

* Clean changelog

* Remove policy warnings for sac/td3

* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-09 13:54:18 +02:00
Antonin RAFFIN
353ea81080
Fix several VecEnv issues, add fork start method to tests (#43)
* Fix several VecEnv issues, add `fork` start method to tests

* Fix signature
2020-06-04 11:22:12 +02:00
Roland Gavrilescu
bb01253261
Tensorboard integration (#30)
* init commit tensorboard-integration

* Added tb logger to ppo (with output exclusions)

* fixed truncated stdout

* categorize stdout outputs by tag

* separated exclusions from values, added missing logs

* saving exclusions as dict instead of list

* reformatting, auto run indexing

* included renaming suggestions, fixed tests

* tb support for sac

* linting

* moved logging to base class

* tb support for td3

* removed histograms, non-verbose output working

* modified changelog

* linting

* fixed type error

* moved logger config to utils

* removed episode_rewards log from ppo

* Enable tensorboard in tests

* Remove unused import

* Update logger sub titles

* Minor edit for PPO

* Update logger and tb log folder

* Pass correct logger to Callbacks

* updated docs

* added tb example image to docs

* add support for continuing training in tensorboard

* added tensorboard to docs index

* added tb test

* moved logger config to _setup_learn, updated tests

* accessing verbose from base class

* Update doc and tests

* Rename session -> time

* Update version

* Update logger truncate

* Update types

* Remove duplicated code

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-01 11:55:44 +02:00
Stelios Tymvios
78e8d405d7
Implemented Vectorized Action Noise (#34)
* Implemented Vectorized Action Noise

Vectorized Action Noise allows for multiple instances of
ActionNoiseProcesses to run in parallel. This makes it easier to
run TD3/SAC/DDPG with VecEnv.

* fixed linting issues

* make test function name consistent

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* sanity checks and more detailed test

* Update stable_baselines3/common/noise.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added assertion error message in noises setter

* Corrected tests to reflect change to AssertionError from ValueError

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-05-27 09:53:01 +02:00
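
The idea behind vectorized action noise in this commit is to hold one independent noise process per environment and stack their samples. The sketch below illustrates this under stated assumptions: `GaussianNoise` and `VectorizedNoise` are hypothetical stand-ins written with the standard library only, not the classes in `stable_baselines3/common/noise.py`.

```python
import copy
import random
from typing import List


class GaussianNoise:
    """Hypothetical single-environment noise process."""

    def __init__(self, mean: float, sigma: float, dim: int, seed: int = 0) -> None:
        self.mean, self.sigma, self.dim = mean, sigma, dim
        self.rng = random.Random(seed)

    def __call__(self) -> List[float]:
        return [self.rng.gauss(self.mean, self.sigma) for _ in range(self.dim)]


class VectorizedNoise:
    """Hypothetical wrapper holding one independent noise process per environment."""

    def __init__(self, base_noise: GaussianNoise, n_envs: int) -> None:
        assert n_envs > 0, "n_envs must be a positive integer"
        self.noises = []
        for env_idx in range(n_envs):
            noise = copy.deepcopy(base_noise)
            noise.rng = random.Random(env_idx)  # distinct random stream per environment
            self.noises.append(noise)

    def __call__(self) -> List[List[float]]:
        # One row of noise per environment: shape (n_envs, dim).
        return [noise() for noise in self.noises]


vec_noise = VectorizedNoise(GaussianNoise(mean=0.0, sigma=0.1, dim=2), n_envs=4)
sample = vec_noise()
```

Each row of `sample` would be added to the action of the matching environment, so the environments do not share a single noise stream.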
Roland Gavrilescu
91adefdb4b
Support for MultiBinary / MultiDiscrete spaces (#13)
* multicategorical dist and test

* fixed List annotation

* bernoulli dist and test

* added distributions to preprocessing (needs testing)

* fixed and tested distributions

* added changelog and fixed ppo policy

* minor fix

* dist fixes, added test_spaces

* clean up

* modified changelog

* additional fixes

* minor changelog mod

* hot encoding fix, flake8 clean up

* lint tests

* preprocessing fix

* fixed bernoulli bug

* removed commented prints

* Update changelog.rst

* included suggested modifications

* linting fix

* increased space dim

* Update doc and tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-05-18 14:42:13 +02:00
Antonin RAFFIN
15ff6d47ee
Documentation update and style fixes (#21)
* Update doc: add gSDE

* Fix codestyle

* Remove travis script

* Add lint check to gitlab
2020-05-15 13:54:06 +02:00
Antonin RAFFIN
54f6f5b6fb
Add flake8 linter and Github CI (#19)
* Cleanup code

* Add flake8 lint and github workflow

* Update build matrix

* Relax precision for python3.7
2020-05-12 17:55:01 +02:00
Antonin RAFFIN
257a40ef4b
Add Gitlab CI (#12)
* Test gitlab-ci

* Try different image

* Add pytest and doc build

* Fix command

* Fix image used for CI

* Separate pytest builds

* Fix weird seg fault in docker image due to FakeImageEnv

* Fix make command

* [ci skip] Add space in the badges

* Fix CI failures

* Re-install opencv

* Use opencv-headless

* Test with new docker image
2020-05-09 23:10:49 +02:00
Antonin RAFFIN
c20af230f7 Remove SDE support for TD3 2020-05-08 15:00:34 +02:00
Antonin RAFFIN
aa0ff8a59b Update atari test 2020-05-07 16:36:48 +02:00
Antonin RAFFIN
8046a24719 More doc + sync VecEnvs + atari 2020-05-07 16:08:23 +02:00
Antonin RAFFIN
cf1ae840c8 Sync identity envs 2020-05-05 16:52:22 +02:00
Antonin RAFFIN
04d85ac2e2 Fix import in tests 2020-05-05 16:32:08 +02:00
Antonin RAFFIN
2c34a4d694 Sync with Stable-Baselines 2020-05-05 16:28:38 +02:00
Antonin RAFFIN
d542732c8d Rename to stable-baselines3 2020-05-05 15:02:35 +02:00
Antonin RAFFIN
7ae54206ce Reformat and code cleanup 2020-04-23 15:18:21 +02:00
Antonin RAFFIN
f38ddcb278 Allow any number of channels 2020-04-22 16:11:23 +02:00
Antonin RAFFIN
f3cb0688c4 Fix custom optimizer 2020-04-22 13:21:11 +02:00
Antonin RAFFIN
041f2bc59a Cleanup, bug fixes + more tests 2020-04-22 13:14:22 +02:00
Antonin RAFFIN
73fb8d1c63 Add CNN support for TD3 2020-04-22 11:05:46 +02:00
Antonin RAFFIN
8aac9e819d Add VecTransposeImage and fix for SAC 2020-04-21 20:41:58 +02:00
Antonin RAFFIN
93c2a01f91 Start CNN support (failing for SAC) 2020-04-21 16:22:46 +02:00
Antonin RAFFIN
f347474e6a Independent save/load for policies 2020-04-20 15:59:44 +02:00
Antonin RAFFIN
aa1026ee87 Added `optimizer` and `optimizer_kwargs` to `policy_kwargs` 2020-04-17 15:13:45 +02:00
Antonin RAFFIN
71ce9ef2f4 Add test for actor 2020-03-31 18:26:26 +02:00
Antonin RAFFIN
c264403816 Rename for consistency
+ add _predict to actors
+ improve sac actor code
2020-03-31 17:48:23 +02:00
Antonin RAFFIN
2bbf6a9462 Minor: remove comment 2020-03-31 16:40:53 +02:00
Antonin RAFFIN
fdecd512db Add save/load weights for policies and refactor action distributions 2020-03-31 16:29:13 +02:00
Antonin RAFFIN
b782f3a208 Fix for test failures 2020-03-31 10:18:56 +02:00
Antonin RAFFIN
fa599c65a6 Add support for Discrete observation spaces 2020-03-25 16:42:05 +01:00
Antonin RAFFIN
4b2092f55a Remove base network 2020-03-23 15:31:14 +01:00