Antonin RAFFIN
b52c6fc18f
Fix logger setup (#469)
...
* Make logger an attribute
* Update doc
* Fix logger reset when using multiple runs
* Cleanup logger: remove `Logger.CURRENT`
* Fix for PPO
* Update tests and improve docstring
* Add warning
* Throw error when tensorboard not installed
2021-06-14 15:17:48 +02:00
Jaden Travnik
75b6f3b3b0
Dictionary Observations (#243)
...
* First commit
* Fixing missing refs from a quick merge from master
* Reformat
* Adding DictBuffers
* Reformat
* Minor reformat
* added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy
* Ran black on buffers
* Ran isort
* Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow
* Running isort :facepalm:
* Fixed typing issues
* Adding docstrings and typing. Using util for moving data to device.
* Fixed trailing commas
* Fix types
* Minor edits
* Avoid duplicating code
* Fix calls to parents
* Adding assert to buffers. Updating changelog
* Running format on buffers
* Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env
* Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type
* Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder
* Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation
* Fixes
* Running code style
* Update docstrings on torch_layers
* Decapitalize non-constant variables
* Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test
* Update doc
* Update doc
* Fix format
* Removing NineRoom env. Using nested preprocess. Removing mutable default args
* running code style
* Passing channel check through to stacked dict observations.
* Running black
* Adding channel control to SimpleMultiObsEnv. Passing check_channels to CombinedExtractor
* Remove optimize memory for dict buffers
* Update doc
* Move identity env
* Minor edits + bump version
* Update doc
* Fix doc build
* Bug fixes + add support for more types of dict env
* Fixes + add multi env test
* Add support for vectranspose
* Fix stacked obs for dict and add tests
* Add check for nested spaces. Fix dict-subprocvecenv test
* Fix (single) pytype error
* Simplify CombinedExtractor
* Fix tests
* Fix check
* Merge branch 'master' into feat/dict_observations
* Fix for net_arch with dict and vector obs
* Fixes
* Add consistency test
* Update env checker
* Add some docs on dict obs
* Update default CNN feature vector size
* Refactor HER (#351)
* Start refactoring HER
* Fixes
* Additional fixes
* Faster tests
* WIP: HER as a custom replay buffer
* New replay only version (working with DQN)
* Add support for all off-policy algorithms
* Fix saving/loading
* Remove ObsDictWrapper and add VecNormalize tests with dict
* Stable-Baselines3 v1.0 (#354)
* Bump version and update doc
* Fix name
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docs/index.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording for RL zoo
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add gym-pybullet-drones project (#358)
* Update projects.rst
Added gym-pybullet-drones
* Update projects.rst
Longer title underline
* Update changelog
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Include SuperSuit in projects (#359)
* include supersuit
* longer title underline
* Update changelog.rst
* Fix default arguments + add bugbear (#363)
* Fix potential bug + add bug bear
* Remove unused variables
* Minor: version bump
* Add code of conduct + update doc (#373)
* Add code of conduct
* Fix DQN doc example
* Update doc (channel-last/first)
* Apply suggestions from code review
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
* Make installation command compatible with ZSH (#376)
* Add quotes
* Add Zsh bracket info
* Clarify pip installation line
* Make note bold
* Add Zsh pip installation note
* Add handle timeouts param
* Fixes
* Fixes (buffer size, extend test)
* Fix `max_episode_length` redefinition
* Fix potential issue
* Add some docs on dict obs
* Fix performance bug
* Fix slowdown
* Add package to install (#378)
* Add package to install
* Update docs packages installation command
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix backward compat + add test
* Fix VecEnv detection
* Update doc
* Fix vec env check
* Support for `VecMonitor` for gym3-style environments (#311)
* add vectorized monitor
* auto format of the code
* add documentation and VecExtractDictObs
* refactor and add test cases
* add test cases and format
* avoid circular import and fix doc
* fix type
* fix type
* oops
* Update stable_baselines3/common/monitor.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update stable_baselines3/common/monitor.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add test cases
* update changelog
* fix mutable argument
* quick fix
* Apply suggestions from code review
* fix terminal observation for gym3 envs
* delete comment
* Update doc and bump version
* Add warning when already using `Monitor` wrapper
* Update vecmonitor tests
* Fixes
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Reformat
* Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392)
* Fix ent coef loading bug
* Add test
* Add comment
* Reuse save path
* Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375)
* Fix return computation + add test for GAE
* Rename `last_dones` to `episode_starts` for clarification
* Revert advantage
* Cleanup test
* Rename variable
* Clarify return computation
* Clarify docs
* Add multi-episode rollout test
* Reformat
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Fixed saving of `A2C` and `PPO` policy when using gSDE (#401)
* Improve doc and replay buffer loading
* Add support for images
* Fix doc
* Update Procgen doc
* Update changelog
* Update docstrings
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
* Update doc and minor fixes
* Update doc
* Added note about MultiInputPolicy in error of NatureCNN
* Merge branch 'master' into feat/dict_observations
* Address comments
* Naming clarifications
* Actually saving the file would be nice
* Fix edge case when doing online sampling with HER
* Cleanup
* Add sanity check
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
2021-05-11 12:29:30 +02:00
Rohan Tangri
35da0b59b9
Policy Base for On-policy Algorithms (#412) (#415)
...
* add policy_base input to OnPolicyAlgorithms
* update changelog
* Fix pytype error
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-04 12:59:36 +03:00
Antonin RAFFIN
5d47296b8d
Add test for GAE + rename RolloutBuffer.dones for clarification (#375)
...
* Fix return computation + add test for GAE
* Rename `last_dones` to `episode_starts` for clarification
* Revert advantage
* Cleanup test
* Rename variable
* Clarify return computation
* Clarify docs
* Add multi-episode rollout test
* Reformat
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2021-04-16 15:52:55 +02:00
Antonin RAFFIN
c62e9259db
Add custom objects support + bug fix (#336)
...
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
2021-03-06 15:17:43 +02:00
Antonin RAFFIN
2b9fc1f923
Add supported action spaces checks (#254)
...
* Add supported action spaces checks
* Address comment
2020-12-06 14:05:10 +02:00
Antonin RAFFIN
d04aad2a20
Doc fixes and add monitor_kwargs parameter (#230)
...
* Fix type annotation
* Fix migration doc for A2C
* Update version
* Add `monitor_kwargs` argument
* Update docs/guide/migration.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Fix make atari env
* Fix docstring
* Renamed LearningRateSchedule
Co-authored-by: Adam Gleave <adam@gleave.me>
2020-11-20 10:28:54 +01:00
M. Ernestus
c74509ae9d
Add callable signatures to type annotations. (#215)
...
* Add callback signature to the learning rate type annotations.
* Add callback signature to the learning rate schedule type annotations.
* Add missing type annotations for learning rate callbacks.
* Add signature to old-style learning and evaluation callbacks.
* Add signature to env wrapper callback.
* Add type annotation to closure function.
* Use MaybeCallback more consistently.
* Update changelog.
* Remove now unused List import.
* Fix import order.
* Add type alias for learning rate schedules.
* Optimize imports.
* Fix messed up import.
* Remove resolved TODO.
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-15 17:50:28 +01:00
Stefan Heid
9d463bc476
Small docstring improvements related to the notion of Rollout (#206)
...
* Small docstring improvements related to the notion of Rollout
* documented changes in changelog.rst, added myself to contributors
* Minor edits
Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-02 11:45:08 +01:00
Antonin RAFFIN
fc9527157a
Fix off-by-one GAE computation (#185)
...
* Fix off-by-one GAE computation
* Fix identity test
* Revert gae loop
2020-10-13 00:10:54 +03:00
Antonin RAFFIN
55912576ed
Cleanup docstring types (#169)
...
* Cleanup docstring types
* Update style
* Test with js hack
* Revert "Test with js hack"
This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69.
* Fix types
* Fix typo
* Update CONTRIBUTING example
2020-10-02 20:05:55 +03:00
Anssi
9855486488
Get/set parameters and review of saving and loading (#138)
...
* Update comments and docstrings
* Rename get_torch_variables to private and update docs
* Clarify documentation on data, params and tensors
* Make excluded_save_params private and update docs
* Update get_torch_variable_names to get_torch_save_params for description
* Simplify saving code and update docs on params vs tensors
* Rename saved item tensors to pytorch_variables for clarity
* Reformat
* Fix a typo
* Add get/set_parameters, update tests accordingly
* Use f-strings for formatting
* Fix load docstring
* Reorganize functions in BaseClass
* Update changelog
* Add library version to the stored models
* Actually run isort this time
* Fix flake8 complaints and also fix testing code
* Fix isort
* ...and black
* Fix set_random_seed
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-09-24 14:28:27 +02:00
Francisco Caio
5fc90a7f7d
Add StopTrainingOnMaxEpisodes to callback collection (#147)
...
* Add StopTrainingOnMaxEpisodes class to pre-made callback collection
* Adjust instant when counters are incremented for both OnPolicy and OffPolicy algorithms
* Improvements to StopTrainingOnMaxEpisodes including output, tests and doc
* Improve StopTrainingOnMaxEpisodes callback running _init_callback
* Update callbacks.py
* Update test_callbacks.py
* Fix style
* Update changelog.rst
* Fix test
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2020-08-28 11:36:33 +02:00
Stelios Tymvios
9003a09d5b
Callbacks have access to locals (#115)
...
* callbacks have access to locals
* changelog
* doc
* callbacks have access to locals
* changelog
* doc
* Added update function for child callbacks
* Pre-Release 0.8.0 (#134)
* Fix double reset and improve typing coverage (#136)
* Fix double reset and improve typing coverage
* Revert minor edit
* Add doc about types
* Update child callbacks
* cleaned imports
* format
* import order
* Simplify tests and add comments
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 14:34:01 +02:00
Sam Toyer
42ef6d4677
Remove "device" argument from policies (#141)
...
* Remove device arg from policies
* Clean up for PR
* Update test and doc
* Fix codestyle
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-23 13:27:52 +02:00
Anssi
2cd6a4f93b
Match performance with stable-baselines (discrete case) (#110)
...
* Fix storing correct episode dones
* Fix number of filters in NatureCNN network
* Add TF-like RMSprop for matching performance with sb2
* Remove stuff that was accidentally included
* Reformat
* Clarify variable naming
* Update changelog
* Add comment on RMSprop implementations to A2C
* Add test for RMSpropTFLike
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-08-03 22:22:51 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
...
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Adam Gleave
e61d34a6f0
Fix typing, key error
2020-07-02 21:35:06 -07:00
Anssi
44f8218df0
Review of code (A2C, PPO and refactoring) (#35)
...
* Split torch module code into torch_layers file
* Updated reference to CNN
* Change 'CxWxH' to 'CxHxW', as per common notation
* Fix missing import in policies.py
* Move PPOPolicy to OnlineActorCriticPolicy
* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit
* Update A2C optimizer comment
* Clean weight init scales for clarity
* Fix A2C log_interval default parameter
* Rename 'progress' to 'progress_remaining'
* Rename 'Models' to 'Algorithms'
* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'
* Move static functions out from BaseAlgorithm
* Move on/off_policy base algorithms to their own files
* Add files for A2C/PPO
* Fix docs
* Fix pytype
* Update documentation on OnPolicyAlgorithm
* Add proper docstring for on_policy rollout gathering
* Add a bit of clarification on the mlppolicy/cnnpolicy naming
* Move static function is_vectorized_policies to utils.py
* Checking docstrings, pep8 fixes
* Update changelog
* Clean changelog
* Remove policy warnings for sac/td3
* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-09 13:54:18 +02:00