* Bump version and update doc
* Fix name
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docs/index.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording for RL zoo
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.
* Add learning rate schedule example (#248)
* Add learning rate schedule example
* Update docs/guide/examples.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Address comments
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add supported action spaces checks (#254)
* Add supported action spaces checks
* Address comment
* Use `pass` in an abstractmethod instead of deleting the arguments.
* Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways.
* Rename _get_data to _get_data_to_reconstruct_model.
_get_data was too generic and could have meant anything.
* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.
* Fix docstring of `train_freq` parameter.
* Black fixes.
* Fix TD3 delayed update + rename `_get_data()`
* Fix TD3 test
* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.
* Make one step the default train frequency.
* Black fixes.
* Change np.bool to bool.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of the off policy algorithm.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of HER.
* Use named tuple for train freq
* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.
* Black fixes.
* Revert to train_freq
* Fix terminal observation issues
* Typo
* Fix action noise bug in HER
* Add assert when loading HER models
* Update version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
* add support for text records to logger
* add note on how to access summary writer directly
* escape unicode chars for HumanOutputFormat
* update changelog
* fix formatting
* fix docs
* add tests
* fix formatting
* fix example, link to pytorch docs, update changelog
* move unicode escaping to own function, properly escape quotechars in csv formatter
* switch from n_calls to num_timesteps in example
* make step coherent in example
* use n_calls to check when to login example
* add small hint about log frequency
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add comment about str is scalar type, improve test input
* Update tests
* Update test_logger.py
* use repr to handle strings in logger
* remove repr from text log output
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed discrete obs support
* Suggest new edit, fix failed test
* Revert "Suggest new edit, fix failed test"
This reverts commit 6892bf05506bb5ad0e87016d8d382705ab72e6a4.
* Fix test
* Special case for discrete obs
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Add warning about total_env_steps not dividing neatly into batch size
* Stylistic cleanup
* Black reformatting
* Add clearer documentation and update changelog
* Update changelog.rst
* Use specific RolloutBuffer terminology
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change to minibatch language
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Cleaning up language describing rollout buffer requirements
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Switch to using env.num_envs
* Working tests
* Black and isort still fighting each other
* codestyle finally happy
* Basic test exists, possibly in the wrong file
* Update phrasing
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added Image and Figure classes to logger. For now, these objects can only be logged by TensorBoardOutputFormat
* Added documentation for figure and image logging into tensorboard
* Updated changelog
* Minor changes to documentation. Reviewed supported types for logging images and figures
* Fix type for np arrays
* Added more explicit example for logging figures in the documentation. Added docstrings for parameters in logging auxiliary classes
* Added tests for image and figure logging
* Applied autoformatting
* Update doc
* Fix documentation example
* Bump version
Co-authored-by: Carlos Casas <ccasascuadrado@guidewire.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix big when saving/loading q-net alone
* Rename variables to match SB3-contrib
* Update docker image
* Set min version for tensorboard
* Add SB3-Contrib to doc
* Update DQN
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add SUMO-RL as example project in the docs
* Fixed docstring of AtariWrapper which was not inside of __init__
* Updated changelog regarding docs
* Fix docstring of classes in atari_wrappers.py which were inside the constructor
* Formated docstring with black
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix for arguments order in explained_variance()
Fix for arguments order in explained_variance() in PPO
* Fix for arguments order in explained_variance()
Fix for arguments order in explained_variance() in a2c
* Fix for arguments order in explained_variance()
update changelog.rst
* Update evaluate_policy to use monitor data if available
* Update documentation
* Cleaning up
* Remove unnecessary typing trickery
* Update doc
* Rename is_wrapped to clarify it is for vecenvs
* Add is_wrapped for regular envs
* Add is_wrapped call for subprocvecenv and update code for circular imports
* Move new functions back to env_util and fix imports
* Update changelog
* Clarify evaluate_policy docs
* Add tests for wrapped modifying episode lengths
* Fix tests
* Update changelog
* Minor edits
* Add warn switch to evaluate_policy and update tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add callback signature to the learning rate type annotations.
* Add callback signature to the learning rate schedule type annotations.
* Add missing type annotations for learning rate callbacks.
* Add signature to old-style learning and evaluation callbacks.
* Add signature to env wrapper callback.
* Add type annotation to closure function.
* Use MaybeCallback more consistently.
* Update changelog.
* Remove now unused List import.
* Fix import order.
* Add type alias for learning rate schedules.
* Optimize imports.
* Fix messed up import.
* Remove resolved TODO.
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Small docstring improvements related to the notion of Rollout
* documented changes in changelog.rst, added myself to contributers
* Minor edits
Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update doc and add new example
* Add save/load replay buffer example
* Add save format + export doc
* Add example for get/set parameters
* Typos and minor edits
* Add results sections
* Add note about performance
* Add DDPG results
* Address comments
* Fix grammar/wording
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Added working her version, Online sampling is missing.
* Updated test_her.
* Added first version of online her sampling. Still problems with tensor dimensions.
* Reformat
* Fixed tests
* Added some comments.
* Updated changelog.
* Add missing init file
* Fixed some small bugs.
* Reduced arguments for HER, small changes.
* Added getattr. Fixed bug for online sampling.
* Updated save/load funtions. Small changes.
* Added her to init.
* Updated save method.
* Updated her ratio.
* Move obs_wrapper
* Added DQN test.
* Fix potential bug
* Offline and online her share same sample_goal function.
* Changed lists into arrays.
* Updated her test.
* Fix online sampling
* Fixed action bug. Updated time limit for episodes.
* Updated convert_dict method to take keys as arguments.
* Renamed obs dict wrapper.
* Seed bit flipping env
* Remove get_episode_dict
* Add fast online sampling version
* Added documentation.
* Vectorized reward computation
* Vectorized goal sampling
* Update time limit for episodes in online her sampling.
* Fix max episode length inference
* Bug fix for Fetch envs
* Fix for HER + gSDE
* Reformat (new black version)
* Added info dict to compute new reward. Check her_replay_buffer again.
* Fix info buffer
* Updated done flag.
* Fixes for gSDE
* Offline her version uses now HerReplayBuffer as episode storage.
* Fix num_timesteps computation
* Fix get torch params
* Vectorized version for offline sampling.
* Modified offline her sampling to use sample method of her_replay_buffer
* Updated HER tests.
* Updated documentation
* Cleanup docstrings
* Updated to review comments
* Fix pytype
* Update according to review comments.
* Removed random goal strategy. Updated sample transitions.
* Updated migration. Removed time signal removal.
* Update doc
* Fix potential load issue
* Add VecNormalize support for dict obs
* Updated saving/loading replay buffer for HER.
* Fix test memory usage
* Fixed save/load replay buffer.
* Fixed save/load replay buffer
* Fixed transition index after loading replay buffer in online sampling
* Better error handling
* Add tests for get_time_limit
* More tests for VecNormalize with dict obs
* Update doc
* Improve HER description
* Add test for sde support
* Add comments
* Add comments
* Remove check that was always valid
* Fix for terminal observation
* Updated buffer size in offline version and reset of HER buffer
* Reformat
* Update doc
* Remove np.empty + add doc
* Fix loading
* Updated loading replay buffer
* Separate online and offline sampling + bug fixes
* Update tensorboard log name
* Version bump
* Bug fix for special case
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>