* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925
* added changelog entry
* added missing backtick
* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used
* Updated tests to ignore DeprecationWarnings
* Updated changelog entry
* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs
* Add alternative in docstring
* Update docstrings
* `eval_freq` warning in docstring
* Add deprecation comments in tests
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
* Add progress bar callback and argument
* Update doc
* Update changelog
* Upgrade pytype in docker image
* Use tqdm.write in the logger to have cleaner output
* Fix logger test
* Fix when doing multiple calls to learn()
* Address comments from code-review
* Updated docstring from n_steps to n_rollout_steps
This must be a typo
* Fixed typo in a comment in ppo.py
* Update changelog
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix return type for load, learn in BaseAlgorithm
* Update changelog
* Add typing extensions to dependencies
* Import directly from typing for python >3.11
* Reorder changelog to reflect merge order
* Roll back to typevar solution
* Updated changelog
* Remove typing extensions requirement
* Update base_class.py
* Remove final point in changelog
* Additional type fixes across project
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Use higher resolution time and round up to eps
* Update changelog
* Add test case
* Fix formatting, time()->time_ns
* Bugfix: ns is integer not float
* Move test to better place
* Divide by 1e9 earlier
* Replacing the policy registry with policy "aliases"
* Fixing import order and SAC
* Changing arg. order to be sure policy_aliases is a kwarg
* Import orders
* Removing pytype error check
* Reformat
* Fix alias import
* Not using mutable {} as default for policy_aliases
* Empty aliases initialization
* Using static attributes for policy_aliases
* Fixing isort
* Fixing back bad merge
* Running isort
* Fixing aliases for A2C and PPO
* Using f-string
* Moving policy_aliases definition position
* Adding change in the changelog
* Update version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add multi-env training support for SAC
* Fix for dict obs
* Pytype fixes
* Fix assert on number of envs
* Remove for loop
* Add support for Dict obs
* Start cleanup
* Update doc and bug fix
* Add support for vectorized action noise
and add multi env example for off-policy
* Update version
* Bug fix with VecNormalize
* Update README table
* Update variable names
* Update changelog and version
* Update doc and fix for `gradient_steps=-1`
* Add test for `gradient_steps=-1`
* Disable pytype pyi errors
* Fix for DQN
* Update comment on deepcopy
* Remove episode_reward field
* Fix RolloutReturn
* Avoid modification by reference
* Fix error message
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Store number of timesteps at the beginning of each learn cycle
* Update changelog
* Set default _num_timesteps_at_start in the contructor
* Test case for FPS logger
* Adjust test to cover both on-policy and off-policy algorithms
* Fix formatting
* Update test and add comment
* Fix test
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training model of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module
* Rename unit test
* unit test: use drop out probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Add callback signature to the learning rate type annotations.
* Add callback signature to the learning rate schedule type annotations.
* Add missing type annotations for learning rate callbacks.
* Add signature to old-style learning and evaluation callbacks.
* Add signature to env wrapper callback.
* Add type annotation to closure function.
* Use MaybeCallback more consistently.
* Update changelog.
* Remove now unused List import.
* Fix import order.
* Add type alias for learning rate schedules.
* Optimize imports.
* Fix messed up import.
* Remove resolved TODO.
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Small docstring improvements related to the notion of Rollout
* documented changes in changelog.rst, added myself to contributers
* Minor edits
Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update comments and docstrings
* Rename get_torch_variables to private and update docs
* Clarify documentation on data, params and tensors
* Make excluded_save_params private and update docs
* Update get_torch_variable_names to get_torch_save_params for description
* Simplify saving code and update docs on params vs tensors
* Rename saved item tensors to pytorch_variables for clarity
* Reformat
* Fix a typo
* Add get/set_parameters, update tests accordingly
* Use f-strings for formatting
* Fix load docstring
* Reorganize functions in BaseClass
* Update changelog
* Add library version to the stored models
* Actually run isort this time
* Fix flake8 complaints and also fix testing code
* Fix isort
* ...and black
* Fix set_random_seed
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix storing correct episode dones
* Fix number of filters in NatureCNN network
* Add TF-like RMSprop for matching performance with sb2
* Remove stuff that was accidentally included
* Reformat
* Clarify variable naming
* Update changelog
* Add comment on RMSprop implementations to A2C
* Add test for RMSpropTFLike
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
* Split torch module code into torch_layers file
* Updated reference to CNN
* Change 'CxWxH' to 'CxHxW', as per common notion
* Fix missing import in policies.py
* Move PPOPolicy to OnlineActorCriticPolicy
* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit
* Update A2C optimizer comment
* Clean weight init scales for clarity
* Fix A2C log_interval default parameter
* Rename 'progress' to 'progress_remaining
* Rename 'Models' to 'Algorithms'
* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'
* Move static functions out from BaseAlgorithm
* Move on/off_policy base algorithms to their own files
* Add files for A2C/PPO
* Fix docs
* Fix pytype
* Update documentation on OnPolicyAlgorithm
* Add proper doctstring for on_policy rollout gathering
* Add bit clarification on the mlppolicy/cnnpolicy naming
* Move static function is_vectorized_policies to utils.py
* Checking docstrings, pep8 fixes
* Update changelog
* Clean changelog
* Remove policy warnings for sac/td3
* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>