* more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614)
* add a note to explain the difference between `.load` and `.set_parameters` to the examples
* fix typos
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Add multi-env training support for SAC
* Fix for dict obs
* Pytype fixes
* Fix assert on number of envs
* Remove for loop
* Add support for Dict obs
* Start cleanup
* Update doc and bug fix
* Add support for vectorized action noise
and add multi env example for off-policy
* Update version
* Bug fix with VecNormalize
* Update README table
* Update variable names
* Update changelog and version
* Update doc and fix for `gradient_steps=-1`
* Add test for `gradient_steps=-1`
* Disable pytype pyi errors
* Fix for DQN
* Update comment on deepcopy
* Remove episode_reward field
* Fix RolloutReturn
* Avoid modification by reference
* Fix error message
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Add a section on exporting to TFLite/Coral with demonstration
* Changelog to reflect new export documentation
* Update docs/guide/export.rst
Fingers on autopilot make word wrong
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Update docs/guide/export.rst
Better wording clarity
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Update docs/guide/export.rst
Better wording clarity
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Clarify motivations and hardware
* Update docs/misc/changelog.rst
Make consistent with other changelog entries
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Sphinx wants the section underline to be at least this long
* Remove first-person voice
* Typos
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Update rl_tips.rst
indent fix to make `if done` and its following statement work
* Fix indentation and update changelog
* Skip type check for python 3.9
Co-authored-by: paulg <cove9988@gmail.com>
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore literal for more clarity
* Updated ONNX documentation
First draft on the documentation explaining how to export SB3 models in the ONNX format
* Updated changelog with ONNX documentation fix
* Address comments
* Update changelog.rst
* Update rtd env
* Fixes + add test example
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi Kanervisto <anssk@Anssis-MacBook-Air.local>
Co-authored-by: Anssi Kanervisto <kaneran21@hotmail.com>
* Bump version and update doc
* Fix name
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docs/index.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording for RL zoo
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.
* Add learning rate schedule example (#248)
* Add learning rate schedule example
* Update docs/guide/examples.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Address comments
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add supported action spaces checks (#254)
* Add supported action spaces checks
* Address comment
* Use `pass` in an abstractmethod instead of deleting the arguments.
* Remove the "deterministic" keyword from the forward method of the TD3 Actor since it is always deterministic anyway.
* Rename _get_data to _get_data_to_reconstruct_model.
_get_data was too generic and could have meant anything.
* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.
* Fix docstring of `train_freq` parameter.
* Black fixes.
* Fix TD3 delayed update + rename `_get_data()`
* Fix TD3 test
* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.
* Make one step the default train frequency.
* Black fixes.
* Change np.bool to bool.
* Use the tuple format to specify a number of steps in terms of steps or episodes in the collect_rollouts of the off policy algorithm.
* Use the tuple format to specify a number of steps in terms of steps or episodes in the collect_rollouts of HER.
* Use named tuple for train freq
* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.
* Black fixes.
* Revert to train_freq
* Fix terminal observation issues
* Typo
* Fix action noise bug in HER
* Add assert when loading HER models
* Update version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
* add support for text records to logger
* add note on how to access summary writer directly
* escape unicode chars for HumanOutputFormat
* update changelog
* fix formatting
* fix docs
* add tests
* fix formatting
* fix example, link to pytorch docs, update changelog
* move unicode escaping to own function, properly escape quotechars in csv formatter
* switch from n_calls to num_timesteps in example
* make step coherent in example
* use n_calls to check when to log in example
* add small hint about log frequency
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add comment that str is a scalar type, improve test input
* Update tests
* Update test_logger.py
* use repr to handle strings in logger
* remove repr from text log output
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added Image and Figure classes to logger. For now, these objects can only be logged by TensorBoardOutputFormat
* Added documentation for figure and image logging into tensorboard
* Updated changelog
* Minor changes to documentation. Reviewed supported types for logging images and figures
* Fix type for np arrays
* Added more explicit example for logging figures in the documentation. Added docstrings for parameters in logging auxiliary classes
* Added tests for image and figure logging
* Applied autoformatting
* Update doc
* Fix documentation example
* Bump version
Co-authored-by: Carlos Casas <ccasascuadrado@guidewire.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix bug when saving/loading q-net alone
* Rename variables to match SB3-contrib
* Update docker image
* Set min version for tensorboard
* Add SB3-Contrib to doc
* Update DQN
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update evaluate_policy to use monitor data if available
* Update documentation
* Cleaning up
* Remove unnecessary typing trickery
* Update doc
* Rename is_wrapped to clarify it is for vecenvs
* Add is_wrapped for regular envs
* Add is_wrapped call for subprocvecenv and update code for circular imports
* Move new functions back to env_util and fix imports
* Update changelog
* Clarify evaluate_policy docs
* Add tests for wrappers modifying episode lengths
* Fix tests
* Update changelog
* Minor edits
* Add warn switch to evaluate_policy and update tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update doc and add new example
* Add save/load replay buffer example
* Add save format + export doc
* Add example for get/set parameters
* Typos and minor edits
* Add results sections
* Add note about performance
* Add DDPG results
* Address comments
* Fix grammar/wording
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Added working HER version; online sampling is missing.
* Updated test_her.
* Added first version of online HER sampling. Still has problems with tensor dimensions.
* Reformat
* Fixed tests
* Added some comments.
* Updated changelog.
* Add missing init file
* Fixed some small bugs.
* Reduced arguments for HER, small changes.
* Added getattr. Fixed bug for online sampling.
* Updated save/load functions. Small changes.
* Added her to init.
* Updated save method.
* Updated her ratio.
* Move obs_wrapper
* Added DQN test.
* Fix potential bug
* Offline and online HER share the same sample_goal function.
* Changed lists into arrays.
* Updated her test.
* Fix online sampling
* Fixed action bug. Updated time limit for episodes.
* Updated convert_dict method to take keys as arguments.
* Renamed obs dict wrapper.
* Seed bit flipping env
* Remove get_episode_dict
* Add fast online sampling version
* Added documentation.
* Vectorized reward computation
* Vectorized goal sampling
* Update time limit for episodes in online her sampling.
* Fix max episode length inference
* Bug fix for Fetch envs
* Fix for HER + gSDE
* Reformat (new black version)
* Added info dict to compute new reward. Check her_replay_buffer again.
* Fix info buffer
* Updated done flag.
* Fixes for gSDE
* Offline HER version now uses HerReplayBuffer as episode storage.
* Fix num_timesteps computation
* Fix get torch params
* Vectorized version for offline sampling.
* Modified offline her sampling to use sample method of her_replay_buffer
* Updated HER tests.
* Updated documentation
* Cleanup docstrings
* Updated to review comments
* Fix pytype
* Update according to review comments.
* Removed random goal strategy. Updated sample transitions.
* Updated migration. Removed time signal removal.
* Update doc
* Fix potential load issue
* Add VecNormalize support for dict obs
* Updated saving/loading replay buffer for HER.
* Fix test memory usage
* Fixed save/load replay buffer.
* Fixed save/load replay buffer
* Fixed transition index after loading replay buffer in online sampling
* Better error handling
* Add tests for get_time_limit
* More tests for VecNormalize with dict obs
* Update doc
* Improve HER description
* Add test for sde support
* Add comments
* Add comments
* Remove check that was always valid
* Fix for terminal observation
* Updated buffer size in offline version and reset of HER buffer
* Reformat
* Update doc
* Remove np.empty + add doc
* Fix loading
* Updated loading replay buffer
* Separate online and offline sampling + bug fixes
* Update tensorboard log name
* Version bump
* Bug fix for special case
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add support to log videos via tensorboard
The ability to look at renderings of agent's trajectories during
training helps evaluate the performance of that agent. One can see what
the agent actually does at various stages during training. For now only
tensorboard is supported, as it is straightforward to implement.
* Remove moviepy dependency from extra & doc update
* Removed the moviepy dependency from the `extra` dependencies so the
user can decide whether to install it or not
* Update the video logging documentation with proper naming and comments
* Added a warning to the video logging documentation explaining the moviepy
dependency
* Updated the video test, to check for a warning when moviepy is missing
* Update doc
* Update FormatUnsupportedError message
* Also log the offending value, making the error message more expressive
* Fix reporting the correct format and update regression test
* Use string description in FormatUnsupportedError
* Instead of converting the value to a string without the user's control,
the constructor takes a string representation of the value
* Use string description in FormatUnsupportedError
* Use a shorter string description for the error to reduce verbosity
Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add custom arch for off-policy actor/critic networks
* Fix type hints
* Address comments
* Make sure number of updated parameters match in polyak
* Add zip_strict for strict-length zipping
* Fix building docs
* Add test for zip strict
* Faster tests
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>