* Fix storing correct episode dones
* Fix number of filters in NatureCNN network
* Add TF-like RMSprop for matching performance with sb2
* Remove stuff that was accidentally included
* Reformat
* Clarify variable naming
* Update changelog
* Add comment on RMSprop implementations to A2C
* Add test for RMSpropTFLike
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
* Add DDPG + TD3 with any number of critics
* Allow any number of critics for SAC
* Update doc
* [ci skip] Update DDPG example
* Remove unused parameter
* Add DDPG to identity test
* Fix computation with n_critics=1,3
* Update doc
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docstrings for off-policy algos
* Add check for sde
Co-authored-by: Adam Gleave <adam@gleave.me>
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug
* Changed Base Class
* refactor save, to be consistence with overriding the excluded_save_params function. Do not try to exclude the parameters twice.
* Added simple DQN policy
* Finished learn and train function
- missing correct loss computation
* changed collect_rollouts to work with discrete space
* moved discrete space collect_rollouts to dqn
* basic dqn working
* deleted SDE related code
* added gradient clipping and moved greedy policy to policy
* changed policy to implement target network
and added soft update(in fact standart tau is 1 so hard update)
* fixed policy setup
* rebase target_update_intervall on _n_updates
* adapted all tests
all tests passing
* Move to stable-baseline3
* Fixes for DQN
* Fix tests + add CNNPolicy
* Allow any optimizer for DQN
* added some util functions to create a arbitrary linear schedule, fixed pickle problem with old exploration schedule
* more documentation
* changed buffer dtype
* refactor and document
* Added Sphinx Documentation
Updated changelog.rst
* removed custom collect_rollouts as it is no longer necessary
* Implemented suggestions to clean code and documentation.
* extracted some functions on tests to reduce duplicated code
* added support for exploration_fraction
* Fixed exploration_fraction
* Added documentation
* Fixed get_linear_fn -> proper progress scaling
* Merged master
* Added nature reference
* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1
* Fixed n_updates to be incremented correctly
* Correct train_freq
* Doc update
* added special parameter for DQN in tests
* different fix for test_discrete
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added RMSProp in optimizer_kwargs, as described in nature paper
* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper
* Changelog update for buffer dtype
* standard exlude parameters should be always excluded to assure proper saving only if intentionally included by ``include`` parameter
* slightly more iterations on test_discrete to pass the test
* added param use_rms_prop instead of mutable default argument
* forgot alpha
* using huber loss, adam and learning rate 1e-4
* account for train_freq in update_target_network
* Added memory check for both buffers
* Doc updated for buffer allocation
* Added psutil Requirement
* Adapted test_identity.py
* Fixes with new SB3 version
* Fix for tensorboard name
* Convert assert to warning and fix tests
* Refactor off-policy algorithms
* Fixes
* test: remove next_obs in replay buffer
* Update changelog
* Fix tests and use tmp_path where possible
* Fix sampling bug in buffer
* Do not store next obs on episode termination
* Fix replay buffer sampling
* Update comment
* moved epsilon from policy to model
* Update predict method
* Update atari wrappers to match SB2
* Minor edit in the buffers
* Update changelog
* Merge branch 'master' into dqn
* Update DQN to new structure
* Fix tests and remove hardcoded path
* Fix for DQN
* Disable memory efficient replay buffer by default
* Fix docstring
* Add tests for memory efficient buffer
* Update changelog
* Split collect rollout
* Move target update outside `train()` for DQN
* Update changelog
* Update linear schedule doc
* Cleanup DQN code
* Minor edit
* Update version and docker images
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* save_replay_buffer now receives as argument the file path instead of the folder path
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change saving/loading normalization parameters to use single pickle file
* Remove 'use_gae' from RolloutBuffer compute_returns function
* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument
* Update changelog
* Fix typo
* Use proper pytest.raises for catching errors in tests
* Add comment on GAE and how to obtain non-GAE behaviour
* Remove save/load_running_average from VecNormalize in favor of load/save
* Update changelog
* Update docstring
* Add accidentally removed tests for VecNormalize
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Split torch module code into torch_layers file
* Updated reference to CNN
* Change 'CxWxH' to 'CxHxW', as per common notion
* Fix missing import in policies.py
* Move PPOPolicy to OnlineActorCriticPolicy
* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit
* Update A2C optimizer comment
* Clean weight init scales for clarity
* Fix A2C log_interval default parameter
* Rename 'progress' to 'progress_remaining
* Rename 'Models' to 'Algorithms'
* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'
* Move static functions out from BaseAlgorithm
* Move on/off_policy base algorithms to their own files
* Add files for A2C/PPO
* Fix docs
* Fix pytype
* Update documentation on OnPolicyAlgorithm
* Add proper doctstring for on_policy rollout gathering
* Add bit clarification on the mlppolicy/cnnpolicy naming
* Move static function is_vectorized_policies to utils.py
* Checking docstrings, pep8 fixes
* Update changelog
* Clean changelog
* Remove policy warnings for sac/td3
* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* init commit tensorboard-integration
* Added tb logger to ppo (with output exclusions)
* fixed truncated stdout
* categorize stdout outputs by tag
* separated exclusions from values, added missing logs
* saving exclusions as dict instead of list
* reformatting, auto run indexing
* included renaming suggestions, fixed tests
* tb support for sac
* linting
* moved logging to base class
* tb support for td3
* removed histograms, non-verbose output working
* modifed changelog
* linting
* fixed type error
* moved logger config to utils
* removed episode_rewards log from ppo
* Enable tensorboard in tests
* Remove unused import
* Update logger sub titles
* Minor edit for PPO
* Update logger and tb log folder
* Pass correct logger to Callbacks
* updated docs
* added tb example image to docs
* add support for continuing training in tensorboard
* added tensorboard to docs index
* added tb test
* moved logger config to _setup_learn, updated tests
* accessing verbose from base class
* Update doc and tests
* Rename session -> time
* Update version
* Update logger truncate
* Update types
* Remove duplicated code
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Implemented Vectorized Action Noise
Vectorized Action Noise allows for multiple instances of
ActionNoiseProcesses to run in parallel. This makes it easier to
run TD3/SAC/DDPG with VecEnv.
* fixed linting issues
* make test function name consistent
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* sanity checks and more detailed test
* Update stable_baselines3/common/noise.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added assertion error message in noises setter
* Corrected tests to reflect change to AssertionError from ValueError
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Test gitlab-ci
* Try different image
* Add pytest and doc build
* Fix command
* Fix image used for CI
* Seperate pytest builds
* Fix weird seg fault in docker image due to FakeImageEnv
* Fix make command
* [ci skip] Add space in the badges
* Fix CI failures
* Re-install opencv
* Use opencv-headless
* Test with new docker image