* Store number of timesteps at the beginning of each learn cycle
* Update changelog
* Set default _num_timesteps_at_start in the constructor
* Test case for FPS logger
* Adjust test to cover both on-policy and off-policy algorithms
* Fix formatting
* Update test and add comment
* Fix test
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
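
A minimal sketch of the idea behind the FPS logger change above: if the timestep counter is recorded at the start of each learn cycle, FPS can be computed from the steps taken in the current cycle only, which matters when training continues from a previous run. The class `FpsTracker` and its method names are hypothetical and written for illustration; only `_num_timesteps_at_start` is taken from the commit messages.

```python
import time


class FpsTracker:
    """Hypothetical helper illustrating the FPS computation; not the library's class."""

    def __init__(self) -> None:
        self.num_timesteps = 0
        # Default set in the constructor so the attribute always exists
        self._num_timesteps_at_start = 0
        self.start_time = time.time()

    def start_learn_cycle(self, reset_num_timesteps: bool = False) -> None:
        """Record the timestep counter and the wall-clock time at the start of a cycle."""
        if reset_num_timesteps:
            self.num_timesteps = 0
        self._num_timesteps_at_start = self.num_timesteps
        self.start_time = time.time()

    def fps(self, now: float) -> int:
        """Steps per second, counting only steps taken since the cycle started."""
        elapsed = max(now - self.start_time, 1e-8)  # avoid division by zero
        steps_this_cycle = self.num_timesteps - self._num_timesteps_at_start
        return int(steps_this_cycle / elapsed)
```

Without the stored starting value, a second learn cycle would divide the total number of timesteps by the duration of the current cycle alone and overestimate the FPS.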
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore literal for more clarity
* VecNormalize: allow non-continuous observations when norm_obs is False
* Update changelog, fix lint
* Switch to environment present in new and old versions of Gym
* Fix name
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: add method predict_values for ActorCriticPolicy
* Fixes for new gym version
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: get_distribution method for ActorCriticPolicy
New method get_distribution for class ActorCriticPolicy returning current action distribution given observations
* doc: updating changelog.rst
- adding block for Release 1.2.1a0
- adding cyprienc to contributors
* style: make format
* fix: updating version.txt
Changing version from 1.2.0 to 1.2.1a0
* Update changelog
* Add test for get distribution
Co-authored-by: Cyprien <courtot.c@gmail.com>
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training mode of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on-policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training, since the policy is a th.nn.Module
* Rename unit test
* unit test: use dropout probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.
* Add learning rate schedule example (#248)
* Add learning rate schedule example
* Update docs/guide/examples.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Address comments
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add supported action spaces checks (#254)
* Add supported action spaces checks
* Address comment
* Use `pass` in an abstractmethod instead of deleting the arguments.
* Remove the "deterministic" keyword from the forward method of the TD3 Actor since it is always deterministic anyway.
* Rename _get_data to _get_data_to_reconstruct_model.
_get_data was too generic and could have meant anything.
* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.
* Fix docstring of `train_freq` parameter.
* Black fixes.
* Fix TD3 delayed update + rename `_get_data()`
* Fix TD3 test
* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.
* Make one step the default train frequency.
* Black fixes.
* Change np.bool to bool.
* Use the tuple format to specify a number of steps or episodes in the collect_rollouts of the off-policy algorithm.
* Use the tuple format to specify a number of steps or episodes in the collect_rollouts of HER.
* Use named tuple for train freq
* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.
* Black fixes.
* Revert to train_freq
* Fix terminal observation issues
* Typo
* Fix action noise bug in HER
* Add assert when loading HER models
* Update version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
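
The `train_freq` commits above describe accepting either an integer or a tuple, normalizing it to a named tuple in the constructor, and asserting on invalid input. The following is a rough standalone sketch of that normalization, not the library's code: `TrainFreq` is named in the commit messages, while `TrainFrequencyUnit` and `normalize_train_freq` are names assumed here for illustration.

```python
from enum import Enum
from typing import NamedTuple, Tuple, Union


class TrainFrequencyUnit(Enum):
    """Assumed unit names for illustration."""

    STEP = "step"
    EPISODE = "episode"


class TrainFreq(NamedTuple):
    frequency: int
    unit: TrainFrequencyUnit


def normalize_train_freq(train_freq: Union[int, Tuple[int, str]]) -> TrainFreq:
    """Convert an int or a (frequency, unit) tuple into a TrainFreq named tuple."""
    if not isinstance(train_freq, tuple):
        # A bare integer is interpreted as a number of steps
        train_freq = (train_freq, "step")
    assert len(train_freq) == 2, f"Expected (frequency, unit), got {train_freq}"
    frequency, unit = train_freq
    assert isinstance(frequency, int) and frequency > 0, "frequency must be a positive int"
    # Enum lookup raises ValueError for an unknown unit string
    return TrainFreq(frequency, TrainFrequencyUnit(unit))
```

With this shape, `normalize_train_freq(1)` corresponds to the "one step" default mentioned above, and `normalize_train_freq((2, "episode"))` covers the use case of the removed `n_episodes_rollout` parameter.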
* add support for text records to logger
* add note on how to access summary writer directly
* escape unicode chars for HumanOutputFormat
* update changelog
* fix formatting
* fix docs
* add tests
* fix formatting
* fix example, link to pytorch docs, update changelog
* move unicode escaping to own function, properly escape quotechars in csv formatter
* switch from n_calls to num_timesteps in example
* make step coherent in example
* use n_calls to check when to log in example
* add small hint about log frequency
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add comment that str is a scalar type, improve test input
* Update tests
* Update test_logger.py
* use repr to handle strings in logger
* remove repr from text log output
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed discrete obs support
* Suggest new edit, fix failed test
* Revert "Suggest new edit, fix failed test"
This reverts commit 6892bf05506bb5ad0e87016d8d382705ab72e6a4.
* Fix test
* Special case for discrete obs
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Add warning about total_env_steps not dividing neatly into batch size
* Stylistic cleanup
* Black reformatting
* Add clearer documentation and update changelog
* Update changelog.rst
* Use specific RolloutBuffer terminology
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change to minibatch language
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Cleaning up language describing rollout buffer requirements
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Switch to using env.num_envs
* Working tests
* Black and isort still fighting each other
* codestyle finally happy
* Basic test exists, possibly in the wrong file
* Update phrasing
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
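
A hedged sketch of the kind of check the commits above describe: warning when the rollout buffer size (steps per environment times `env.num_envs`) is not a multiple of the minibatch size, so the last minibatch would be truncated. The function name `check_minibatch_size` and the warning text are assumptions made for this example and are not copied from the library.

```python
import warnings


def check_minibatch_size(n_steps: int, n_envs: int, batch_size: int) -> int:
    """Warn if the rollout buffer cannot be split into equal minibatches.

    Returns the size of the truncated last minibatch (0 if there is none).
    """
    buffer_size = n_steps * n_envs
    full_batches = buffer_size // batch_size
    remainder = buffer_size % batch_size
    if remainder > 0:
        warnings.warn(
            f"Minibatch size {batch_size} does not divide the rollout buffer size "
            f"{buffer_size} (n_steps={n_steps}, n_envs={n_envs}): after "
            f"{full_batches} full minibatches there will be a truncated "
            f"minibatch of size {remainder}."
        )
    return remainder
```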
* Added Image and Figure classes to logger. For now, these objects can only be logged by TensorBoardOutputFormat
* Added documentation for figure and image logging into tensorboard
* Updated changelog
* Minor changes to documentation. Reviewed supported types for logging images and figures
* Fix type for np arrays
* Added more explicit example for logging figures in the documentation. Added docstrings for parameters in logging auxiliary classes
* Added tests for image and figure logging
* Applied autoformatting
* Update doc
* Fix documentation example
* Bump version
Co-authored-by: Carlos Casas <ccasascuadrado@guidewire.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix bug when saving/loading q-net alone
* Rename variables to match SB3-contrib
* Update docker image
* Set min version for tensorboard
* Add SB3-Contrib to doc
* Update DQN
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update evaluate_policy to use monitor data if available
* Update documentation
* Cleaning up
* Remove unnecessary typing trickery
* Update doc
* Rename is_wrapped to clarify it is for vecenvs
* Add is_wrapped for regular envs
* Add is_wrapped call for subprocvecenv and update code for circular imports
* Move new functions back to env_util and fix imports
* Update changelog
* Clarify evaluate_policy docs
* Add tests for wrapped modifying episode lengths
* Fix tests
* Update changelog
* Minor edits
* Add warn switch to evaluate_policy and update tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
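
To illustrate the `is_wrapped` helper for regular envs mentioned above, here is a minimal sketch that walks a chain of wrappers looking for a given wrapper class. It assumes each wrapper exposes the wrapped environment through an `env` attribute; the classes `BaseEnv`, `Wrapper`, and `Monitor` below are stand-ins defined only for this example.

```python
from typing import Any, Type


def is_wrapped(env: Any, wrapper_class: Type) -> bool:
    """Return True if `env` or any environment it wraps is a `wrapper_class`."""
    current = env
    while current is not None:
        if isinstance(current, wrapper_class):
            return True
        # Assumption: wrappers store the inner environment in `.env`
        current = getattr(current, "env", None)
    return False


# Stand-in classes for illustration only
class BaseEnv:
    pass


class Wrapper:
    def __init__(self, env: Any) -> None:
        self.env = env


class Monitor(Wrapper):
    pass


class TimeLimit(Wrapper):
    pass
```

A check like `is_wrapped(env, Monitor)` is one way an evaluation routine could decide whether monitor data is available and warn otherwise.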
* Small docstring improvements related to the notion of Rollout
* documented changes in changelog.rst, added myself to contributors
* Minor edits
Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added working HER version; online sampling is missing.
* Updated test_her.
* Added first version of online her sampling. Still problems with tensor dimensions.
* Reformat
* Fixed tests
* Added some comments.
* Updated changelog.
* Add missing init file
* Fixed some small bugs.
* Reduced arguments for HER, small changes.
* Added getattr. Fixed bug for online sampling.
* Updated save/load functions. Small changes.
* Added her to init.
* Updated save method.
* Updated her ratio.
* Move obs_wrapper
* Added DQN test.
* Fix potential bug
* Offline and online HER share the same sample_goal function.
* Changed lists into arrays.
* Updated her test.
* Fix online sampling
* Fixed action bug. Updated time limit for episodes.
* Updated convert_dict method to take keys as arguments.
* Renamed obs dict wrapper.
* Seed bit flipping env
* Remove get_episode_dict
* Add fast online sampling version
* Added documentation.
* Vectorized reward computation
* Vectorized goal sampling
* Update time limit for episodes in online her sampling.
* Fix max episode length inference
* Bug fix for Fetch envs
* Fix for HER + gSDE
* Reformat (new black version)
* Added info dict to compute new reward. Check her_replay_buffer again.
* Fix info buffer
* Updated done flag.
* Fixes for gSDE
* Offline HER version now uses HerReplayBuffer as episode storage.
* Fix num_timesteps computation
* Fix get torch params
* Vectorized version for offline sampling.
* Modified offline her sampling to use sample method of her_replay_buffer
* Updated HER tests.
* Updated documentation
* Cleanup docstrings
* Updated to review comments
* Fix pytype
* Update according to review comments.
* Removed random goal strategy. Updated sample transitions.
* Updated migration. Removed time signal removal.
* Update doc
* Fix potential load issue
* Add VecNormalize support for dict obs
* Updated saving/loading replay buffer for HER.
* Fix test memory usage
* Fixed save/load replay buffer.
* Fixed save/load replay buffer
* Fixed transition index after loading replay buffer in online sampling
* Better error handling
* Add tests for get_time_limit
* More tests for VecNormalize with dict obs
* Update doc
* Improve HER description
* Add test for sde support
* Add comments
* Add comments
* Remove check that was always valid
* Fix for terminal observation
* Updated buffer size in offline version and reset of HER buffer
* Reformat
* Update doc
* Remove np.empty + add doc
* Fix loading
* Updated loading replay buffer
* Separate online and offline sampling + bug fixes
* Update tensorboard log name
* Version bump
* Bug fix for special case
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add support to log videos via tensorboard
The ability to look at renderings of an agent's trajectories during
training helps evaluate the performance of that agent. One can see what
the agent actually does at various stages during training. For now only
tensorboard is supported, as it is straightforward to implement.
* Remove moviepy dependency from extra & doc update
* Removed the moviepy dependency from the `extra` dependencies so the
user can decide whether to install it or not
* Update the video logging docs with proper naming, comments
* Added a warning to the video logging docs explaining the moviepy
dependency
* Updated the video test, to check for a warning when moviepy is missing
* Update doc
* Update FormatUnsupportedError message
* Also log the offending value making the error message more expressive
* Fix reporting the correct format and update regression test
* Use string description in FormatUnsupportedError
* Instead of converting the value to string without the user's control,
the constructor takes a string representation of the value
* Use string description in FormatUnsupportedError
* Use a shorter string description for the error to reduce verbosity
Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add check to ensure action space is non-dict non-tuple for env_checker nan check
* update changelog.rst
* add regression test for new check
* commit-checks
* add more action space checks
* update docstrings
* add warning check