* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.
* Add learning rate schedule example (#248)
* Add learning rate schedule example
* Update docs/guide/examples.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Address comments
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add supported action spaces checks (#254)
* Add supported action spaces checks
* Address comment
* Use `pass` in an abstractmethod instead of deleting the arguments.
* Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways.
* Rename _get_data to _get_data_to_reconstruct_model.
_get_data was too generic and could have meant anything.
* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.
* Fix docstring of `train_freq` parameter.
* Black fixes.
* Fix TD3 delayed update + rename `_get_data()`
* Fix TD3 test
* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.
* Make one step the default train frequency.
* Black fixes.
* Change np.bool to bool.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of the off policy algorithm.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of HER.
* Use named tuple for train freq
* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.
* Black fixes.
* Revert to train_freq
* Fix terminal observation issues
* Typo
* Fix action noise bug in HER
* Add assert when loading HER models
* Update version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
* add support for text records to logger
* add note on how to access summary writer directly
* escape unicode chars for HumanOutputFormat
* update changelog
* fix formatting
* fix docs
* add tests
* fix formatting
* fix example, link to pytorch docs, update changelog
* move unicode escaping to own function, properly escape quotechars in csv formatter
* switch from n_calls to num_timesteps in example
* make step coherent in example
* use n_calls to check when to login example
* add small hint about log frequency
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add comment about str is scalar type, improve test input
* Update tests
* Update test_logger.py
* use repr to handle strings in logger
* remove repr from text log output
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed discrete obs support
* Suggest new edit, fix failed test
* Revert "Suggest new edit, fix failed test"
This reverts commit 6892bf05506bb5ad0e87016d8d382705ab72e6a4.
* Fix test
* Special case for discrete obs
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Add warning about total_env_steps not dividing neatly into batch size
* Stylistic cleanup
* Black reformatting
* Add clearer documentation and update changelog
* Update changelog.rst
* Use specific RolloutBuffer terminology
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change to minibatch language
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Cleaning up language describing rollout buffer requirements
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Switch to using env.num_envs
* Working tests
* Black and isort still fighting each other
* codestyle finally happy
* Basic test exists, possibly in the wrong file
* Update phrasing
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added Image and Figure classes to logger. For now, these objects can only be logged by TensorBoardOutputFormat
* Added documentation for figure and image logging into tensorboard
* Updated changelog
* Minor changes to documentation. Reviewed supported types for logging images and figures
* Fix type for np arrays
* Added more explicit example for logging figures in the documentation. Added docstrings for parameters in logging auxiliary classes
* Added tests for image and figure logging
* Applied autoformatting
* Update doc
* Fix documentation example
* Bump version
Co-authored-by: Carlos Casas <ccasascuadrado@guidewire.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix big when saving/loading q-net alone
* Rename variables to match SB3-contrib
* Update docker image
* Set min version for tensorboard
* Add SB3-Contrib to doc
* Update DQN
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update evaluate_policy to use monitor data if available
* Update documentation
* Cleaning up
* Remove unnecessary typing trickery
* Update doc
* Rename is_wrapped to clarify it is for vecenvs
* Add is_wrapped for regular envs
* Add is_wrapped call for subprocvecenv and update code for circular imports
* Move new functions back to env_util and fix imports
* Update changelog
* Clarify evaluate_policy docs
* Add tests for wrapped modifying episode lengths
* Fix tests
* Update changelog
* Minor edits
* Add warn switch to evaluate_policy and update tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Small docstring improvements related to the notion of Rollout
* documented changes in changelog.rst, added myself to contributers
* Minor edits
Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added working her version, Online sampling is missing.
* Updated test_her.
* Added first version of online her sampling. Still problems with tensor dimensions.
* Reformat
* Fixed tests
* Added some comments.
* Updated changelog.
* Add missing init file
* Fixed some small bugs.
* Reduced arguments for HER, small changes.
* Added getattr. Fixed bug for online sampling.
* Updated save/load funtions. Small changes.
* Added her to init.
* Updated save method.
* Updated her ratio.
* Move obs_wrapper
* Added DQN test.
* Fix potential bug
* Offline and online her share same sample_goal function.
* Changed lists into arrays.
* Updated her test.
* Fix online sampling
* Fixed action bug. Updated time limit for episodes.
* Updated convert_dict method to take keys as arguments.
* Renamed obs dict wrapper.
* Seed bit flipping env
* Remove get_episode_dict
* Add fast online sampling version
* Added documentation.
* Vectorized reward computation
* Vectorized goal sampling
* Update time limit for episodes in online her sampling.
* Fix max episode length inference
* Bug fix for Fetch envs
* Fix for HER + gSDE
* Reformat (new black version)
* Added info dict to compute new reward. Check her_replay_buffer again.
* Fix info buffer
* Updated done flag.
* Fixes for gSDE
* Offline her version uses now HerReplayBuffer as episode storage.
* Fix num_timesteps computation
* Fix get torch params
* Vectorized version for offline sampling.
* Modified offline her sampling to use sample method of her_replay_buffer
* Updated HER tests.
* Updated documentation
* Cleanup docstrings
* Updated to review comments
* Fix pytype
* Update according to review comments.
* Removed random goal strategy. Updated sample transitions.
* Updated migration. Removed time signal removal.
* Update doc
* Fix potential load issue
* Add VecNormalize support for dict obs
* Updated saving/loading replay buffer for HER.
* Fix test memory usage
* Fixed save/load replay buffer.
* Fixed save/load replay buffer
* Fixed transition index after loading replay buffer in online sampling
* Better error handling
* Add tests for get_time_limit
* More tests for VecNormalize with dict obs
* Update doc
* Improve HER description
* Add test for sde support
* Add comments
* Add comments
* Remove check that was always valid
* Fix for terminal observation
* Updated buffer size in offline version and reset of HER buffer
* Reformat
* Update doc
* Remove np.empty + add doc
* Fix loading
* Updated loading replay buffer
* Separate online and offline sampling + bug fixes
* Update tensorboard log name
* Version bump
* Bug fix for special case
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add support to log videos via tensorboard
The ability to look at renderings of agent's trajectories during
training helps evaluate the performance of that agent. One can see what
the agent actually does at various stages during training. For now only
tensorboard is supported, as it is straightforward to implement.
* Remove moviepy dependency from extra & doc update
* Removed the moviepy dependency from the `extra` dependencies so the
user can decide whether to install it or not
* Update the video logging docu with proper naming, comments
* Added a warning to the video logging docu explaining the moviepy
dependency
* Updated the video test, to check for a warning when moviepy is missing
* Update doc
* Update FormatUnsupportedError message
* Also log the offending value making the error message more expressive
* Fix reporting the correct format and update regression test
* Use string description in FormatUnsupportedError
* Instead of converting the value to string without the user's control
the constructor takes a string representation of the value
* Use string description in FormatUnsupportedError
* Use a shorter string description for the error to reduce verbosity
Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add check to ensure action space is non-dict non-tuple for env_checker nan check
* update changelog.rst
* add regression test for new check
* commit-checks
* add more action space checks
* update docstrings
* add warning check
* Fix ignoring the exclude in logger record
For the logging formats json, csv, and log the exclude parameter of the
logger's record function has been ignored. The necessary checks were
missing from some of the format writer classes. Regression tests have
been added to prevent this error in the future.
* Fix docstring for filter_excluded_keys
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added missing type hints to local functions
* Update stable_baselines3/common/logger.py
Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Allow env_kwargs in make_vec_env when env ID string supplied
Resolves#188
* Update docs/misc/changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add test for env kargs in make_vec_env
* remove unnecessary args in test_vec_env_kwargs function
* Fixes and reformat
* Doc fix
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add custom arch for off-policy actor/critic networks
* Fix type hints
* Address comments
* Make sure number of updated parameters match in polyak
* Add zip_strict for strict-length zipping
* Fix building docs
* Add test for zip strict
* Faster tests
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Update comments and docstrings
* Rename get_torch_variables to private and update docs
* Clarify documentation on data, params and tensors
* Make excluded_save_params private and update docs
* Update get_torch_variable_names to get_torch_save_params for description
* Simplify saving code and update docs on params vs tensors
* Rename saved item tensors to pytorch_variables for clarity
* Reformat
* Fix a typo
* Add get/set_parameters, update tests accordingly
* Use f-strings for formatting
* Fix load docstring
* Reorganize functions in BaseClass
* Update changelog
* Add library version to the stored models
* Actually run isort this time
* Fix flake8 complaints and also fix testing code
* Fix isort
* ...and black
* Fix set_random_seed
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Added a 'device' keyword argument to BaseAlgorithm.load().
Edited the save and load test to also test the load method with all possible devices.
Added the changes to the changelog
* improved the load test to ensure that the model loads to the correct device.
* improved the test: now the correctness is improved. If the get_device policy would change, it wouldn't break the test.
* Update tests/test_save_load.py
@araffin's suggestion during the PR process
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update tests/test_save_load.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Bug fixes: when comparing devices, comparing only device type since get_device() doesn't provide device index.
Now the code loads all of the model parameters from the saved state dict straight into the required device. (fixed load_from_zip_file).
* PR fixes: bug fix - a non-related test failed when running on GPU. updated the assertion to consider only types of devices. Also corrected a related bug in 'get_device()' method.
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix storing correct episode dones
* Fix number of filters in NatureCNN network
* Add TF-like RMSprop for matching performance with sb2
* Remove stuff that was accidentally included
* Reformat
* Clarify variable naming
* Update changelog
* Add comment on RMSprop implementations to A2C
* Add test for RMSpropTFLike
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
* Add DDPG + TD3 with any number of critics
* Allow any number of critics for SAC
* Update doc
* [ci skip] Update DDPG example
* Remove unused parameter
* Add DDPG to identity test
* Fix computation with n_critics=1,3
* Update doc
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docstrings for off-policy algos
* Add check for sde
Co-authored-by: Adam Gleave <adam@gleave.me>
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug
* Changed Base Class
* refactor save, to be consistence with overriding the excluded_save_params function. Do not try to exclude the parameters twice.
* Added simple DQN policy
* Finished learn and train function
- missing correct loss computation
* changed collect_rollouts to work with discrete space
* moved discrete space collect_rollouts to dqn
* basic dqn working
* deleted SDE related code
* added gradient clipping and moved greedy policy to policy
* changed policy to implement target network
and added soft update(in fact standart tau is 1 so hard update)
* fixed policy setup
* rebase target_update_intervall on _n_updates
* adapted all tests
all tests passing
* Move to stable-baseline3
* Fixes for DQN
* Fix tests + add CNNPolicy
* Allow any optimizer for DQN
* added some util functions to create a arbitrary linear schedule, fixed pickle problem with old exploration schedule
* more documentation
* changed buffer dtype
* refactor and document
* Added Sphinx Documentation
Updated changelog.rst
* removed custom collect_rollouts as it is no longer necessary
* Implemented suggestions to clean code and documentation.
* extracted some functions on tests to reduce duplicated code
* added support for exploration_fraction
* Fixed exploration_fraction
* Added documentation
* Fixed get_linear_fn -> proper progress scaling
* Merged master
* Added nature reference
* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1
* Fixed n_updates to be incremented correctly
* Correct train_freq
* Doc update
* added special parameter for DQN in tests
* different fix for test_discrete
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added RMSProp in optimizer_kwargs, as described in nature paper
* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper
* Changelog update for buffer dtype
* standard exlude parameters should be always excluded to assure proper saving only if intentionally included by ``include`` parameter
* slightly more iterations on test_discrete to pass the test
* added param use_rms_prop instead of mutable default argument
* forgot alpha
* using huber loss, adam and learning rate 1e-4
* account for train_freq in update_target_network
* Added memory check for both buffers
* Doc updated for buffer allocation
* Added psutil Requirement
* Adapted test_identity.py
* Fixes with new SB3 version
* Fix for tensorboard name
* Convert assert to warning and fix tests
* Refactor off-policy algorithms
* Fixes
* test: remove next_obs in replay buffer
* Update changelog
* Fix tests and use tmp_path where possible
* Fix sampling bug in buffer
* Do not store next obs on episode termination
* Fix replay buffer sampling
* Update comment
* moved epsilon from policy to model
* Update predict method
* Update atari wrappers to match SB2
* Minor edit in the buffers
* Update changelog
* Merge branch 'master' into dqn
* Update DQN to new structure
* Fix tests and remove hardcoded path
* Fix for DQN
* Disable memory efficient replay buffer by default
* Fix docstring
* Add tests for memory efficient buffer
* Update changelog
* Split collect rollout
* Move target update outside `train()` for DQN
* Update changelog
* Update linear schedule doc
* Cleanup DQN code
* Minor edit
* Update version and docker images
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* save_replay_buffer now receives as argument the file path instead of the folder path
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change saving/loading normalization parameters to use single pickle file
* Remove 'use_gae' from RolloutBuffer compute_returns function
* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument
* Update changelog
* Fix typo
* Use proper pytest.raises for catching errors in tests
* Add comment on GAE and how to obtain non-GAE behaviour
* Remove save/load_running_average from VecNormalize in favor of load/save
* Update changelog
* Update docstring
* Add accidentally removed tests for VecNormalize
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Split torch module code into torch_layers file
* Updated reference to CNN
* Change 'CxWxH' to 'CxHxW', as per common notion
* Fix missing import in policies.py
* Move PPOPolicy to OnlineActorCriticPolicy
* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit
* Update A2C optimizer comment
* Clean weight init scales for clarity
* Fix A2C log_interval default parameter
* Rename 'progress' to 'progress_remaining
* Rename 'Models' to 'Algorithms'
* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'
* Move static functions out from BaseAlgorithm
* Move on/off_policy base algorithms to their own files
* Add files for A2C/PPO
* Fix docs
* Fix pytype
* Update documentation on OnPolicyAlgorithm
* Add proper doctstring for on_policy rollout gathering
* Add bit clarification on the mlppolicy/cnnpolicy naming
* Move static function is_vectorized_policies to utils.py
* Checking docstrings, pep8 fixes
* Update changelog
* Clean changelog
* Remove policy warnings for sac/td3
* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* init commit tensorboard-integration
* Added tb logger to ppo (with output exclusions)
* fixed truncated stdout
* categorize stdout outputs by tag
* separated exclusions from values, added missing logs
* saving exclusions as dict instead of list
* reformatting, auto run indexing
* included renaming suggestions, fixed tests
* tb support for sac
* linting
* moved logging to base class
* tb support for td3
* removed histograms, non-verbose output working
* modifed changelog
* linting
* fixed type error
* moved logger config to utils
* removed episode_rewards log from ppo
* Enable tensorboard in tests
* Remove unused import
* Update logger sub titles
* Minor edit for PPO
* Update logger and tb log folder
* Pass correct logger to Callbacks
* updated docs
* added tb example image to docs
* add support for continuing training in tensorboard
* added tensorboard to docs index
* added tb test
* moved logger config to _setup_learn, updated tests
* accessing verbose from base class
* Update doc and tests
* Rename session -> time
* Update version
* Update logger truncate
* Update types
* Remove duplicated code
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Implemented Vectorized Action Noise
Vectorized Action Noise allows for multiple instances of
ActionNoiseProcesses to run in parallel. This makes it easier to
run TD3/SAC/DDPG with VecEnv.
* fixed linting issues
* make test function name consistent
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* sanity checks and more detailed test
* Update stable_baselines3/common/noise.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added assertion error message in noises setter
* Corrected tests to reflect change to AssertionError from ValueError
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>