* Allow PPO to turn of advantage normalization
* update changelog
* Add a test case
* Update test and sanity check
* Fix tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Make HumanOutputFormat length configurable and bump to 36 by default
* Add test case
* Updated changelog
* Blacken
* Blacken code
* Fix GitLab CI: switch to Docker container with new black version
* Incorporate suggestion
* Add class docstring
* Dummy commit to retrigger GitLab
Co-authored-by: Anssi <kaneran21@hotmail.com>
* fix Atari in CI
* fix dtype and atari extra
* Update setup.py
* remove 3.6
* note about how to install Atari
* pendulum-v1
* atari v5
* black
* fix pendulum capitalization
* add minimum version
* moved things in changelog to breaking changes
* partial v5 fix
* env update to pass tests
* mismatch env version fixed
* Fix tests after merge
* Include autorom in setup.py
* Blacken code
* Fix dtype issue in more robust way
* Fix GitLab CI: switch to Docker container with new black version
* Remove workaround from GitLab. (May need to rebuild Docker for this though.)
* Revert to v4
* Update setup.py
* Apply suggestions from code review
* Remove unnecessary autorom
* Consistent gym versions
Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
* Writing the additional info_keywords into the episode infos that are passed to the resulst writer. Directly taken from the non-vec version of monitor.
* Added test for monitoring info_keywords.
* Removed unnecessary step of registering the env. Not using make_vec_env, because it applies a monitor wrapper to the env.
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add multi-env training support for SAC
* Fix for dict obs
* Pytype fixes
* Fix assert on number of envs
* Remove for loop
* Add support for Dict obs
* Start cleanup
* Update doc and bug fix
* Add support for vectorized action noise
and add multi env example for off-policy
* Update version
* Bug fix with VecNormalize
* Update README table
* Update variable names
* Update changelog and version
* Update doc and fix for `gradient_steps=-1`
* Add test for `gradient_steps=-1`
* Disable pytype pyi errors
* Fix for DQN
* Update comment on deepcopy
* Remove episode_reward field
* Fix RolloutReturn
* Avoid modification by reference
* Fix error message
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Store number of timesteps at the beginning of each learn cycle
* Update changelog
* Set default _num_timesteps_at_start in the contructor
* Test case for FPS logger
* Adjust test to cover both on-policy and off-policy algorithms
* Fix formatting
* Update test and add comment
* Fix test
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore litteral for more clarity
* VecNormalize: allow non-continuous observations when norm_obs is False
* Update changelog, fix lint
* Switch to environment present in new and old versions of Gym
* Fix name
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: add method predict_values for ActorCriticPolicy
* Fixes for new gym version
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: get_distribution method for ActorCriticPolicy
New method get_distribution for class ActorCriticPolicy returning current action distribution given observations
* doc: updating changelog.rst
- adding block for Release 1.2.1a0
- adding cyprienc to contributors
* style: make format
* fix: updating version.txt
Changing version from 1.2.0 to 1.2.1a0
* Update changelog
* Add test for get distribution
Co-authored-by: Cyprien <courtot.c@gmail.com>
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training model of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module
* Rename unit test
* unit test: use drop out probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.
* Add learning rate schedule example (#248)
* Add learning rate schedule example
* Update docs/guide/examples.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Address comments
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add supported action spaces checks (#254)
* Add supported action spaces checks
* Address comment
* Use `pass` in an abstractmethod instead of deleting the arguments.
* Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways.
* Rename _get_data to _get_data_to_reconstruct_model.
_get_data was too generic and could have meant anything.
* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.
* Fix docstring of `train_freq` parameter.
* Black fixes.
* Fix TD3 delayed update + rename `_get_data()`
* Fix TD3 test
* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.
* Make one step the default train frequency.
* Black fixes.
* Change np.bool to bool.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of the off policy algorithm.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of HER.
* Use named tuple for train freq
* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.
* Black fixes.
* Revert to train_freq
* Fix terminal observation issues
* Typo
* Fix action noise bug in HER
* Add assert when loading HER models
* Update version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
* add support for text records to logger
* add note on how to access summary writer directly
* escape unicode chars for HumanOutputFormat
* update changelog
* fix formatting
* fix docs
* add tests
* fix formatting
* fix example, link to pytorch docs, update changelog
* move unicode escaping to own function, properly escape quotechars in csv formatter
* switch from n_calls to num_timesteps in example
* make step coherent in example
* use n_calls to check when to login example
* add small hint about log frequency
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add comment about str is scalar type, improve test input
* Update tests
* Update test_logger.py
* use repr to handle strings in logger
* remove repr from text log output
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed discrete obs support
* Suggest new edit, fix failed test
* Revert "Suggest new edit, fix failed test"
This reverts commit 6892bf05506bb5ad0e87016d8d382705ab72e6a4.
* Fix test
* Special case for discrete obs
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Add warning about total_env_steps not dividing neatly into batch size
* Stylistic cleanup
* Black reformatting
* Add clearer documentation and update changelog
* Update changelog.rst
* Use specific RolloutBuffer terminology
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change to minibatch language
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Cleaning up language describing rollout buffer requirements
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Switch to using env.num_envs
* Working tests
* Black and isort still fighting each other
* codestyle finally happy
* Basic test exists, possibly in the wrong file
* Update phrasing
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added Image and Figure classes to logger. For now, these objects can only be logged by TensorBoardOutputFormat
* Added documentation for figure and image logging into tensorboard
* Updated changelog
* Minor changes to documentation. Reviewed supported types for logging images and figures
* Fix type for np arrays
* Added more explicit example for logging figures in the documentation. Added docstrings for parameters in logging auxiliary classes
* Added tests for image and figure logging
* Applied autoformatting
* Update doc
* Fix documentation example
* Bump version
Co-authored-by: Carlos Casas <ccasascuadrado@guidewire.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix big when saving/loading q-net alone
* Rename variables to match SB3-contrib
* Update docker image
* Set min version for tensorboard
* Add SB3-Contrib to doc
* Update DQN
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording
Co-authored-by: Adam Gleave <adam@gleave.me>