* Add success rate in monitor for on policy algorithms
* Update changelog
* Apply `make commit-checks` refactoring
* Assert buffers are not None in _dump_logs
* Automatic refactoring of the type hinting
* Add success_rate logging test for on policy algorithms
* Update changelog
* Reformat
* Fix tests and update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm
* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.
* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.
* Make use of the rollout buffer kwargs.
* Update version
* Add test and update doc
---------
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
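The two new parameters follow a common "class plus kwargs" injection pattern. The sketch below is illustrative only: `RolloutBuffer`, `OnPolicySketch` and `TaggedBuffer` are hypothetical stand-ins, not the library classes, and the real constructor takes many more arguments.

```python
from typing import Any, Dict, Optional, Type


class RolloutBuffer:
    """Hypothetical stand-in for a rollout buffer; it only records its config."""

    def __init__(self, buffer_size: int, n_envs: int = 1, **kwargs: Any) -> None:
        self.buffer_size = buffer_size
        self.n_envs = n_envs
        self.extra = kwargs  # any buffer-specific options passed through


class OnPolicySketch:
    """Hypothetical algorithm showing how a buffer class and its kwargs are injected."""

    def __init__(
        self,
        n_steps: int,
        n_envs: int = 1,
        rollout_buffer_class: Optional[Type[RolloutBuffer]] = None,
        rollout_buffer_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Fall back to the default class when none is given.
        if rollout_buffer_class is None:
            rollout_buffer_class = RolloutBuffer
        # Use None as the default to avoid a shared mutable {} across instances.
        if rollout_buffer_kwargs is None:
            rollout_buffer_kwargs = {}
        self.rollout_buffer = rollout_buffer_class(n_steps, n_envs=n_envs, **rollout_buffer_kwargs)


class TaggedBuffer(RolloutBuffer):
    """A user-defined buffer subclass."""
```

Usage: `OnPolicySketch(2048, n_envs=4, rollout_buffer_class=TaggedBuffer, rollout_buffer_kwargs={"gamma": 0.98})` builds a `TaggedBuffer` with the extra option forwarded.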
* fix: Follow PEP8 guidelines and test falsy values with `not` rather than `is False`.
https://docs.python.org/2/library/stdtypes.html#truth-value-testing
* chore: Update changelog in line with the intent of the changes in PR #1707
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix: Change `is False` to `not` as per PEP8
* chore: Remove superfluous comment about `is False`
* test: One On- and one Off-Policy algorithm (A2C and SAC respectively), with settings to speed up testing
* Update changelog
* chore: Remove EvalCallback as it's not actually required
* Update changelog.rst
* Rm duplicated "others" section in changelog.rst
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
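An illustrative case where `is False` and `not` disagree: a NumPy boolean is falsy but is not the `False` singleton. This is a hypothetical example of the general pitfall, not necessarily the exact value that triggered the change.

```python
import numpy as np


def is_false_identity(value: object) -> bool:
    # Identity check: only the built-in bool singleton False matches.
    return value is False


def is_falsy(value: object) -> bool:
    # Truth-value test: False, None, 0, "", [] and np.bool_(False) all count as false.
    return not value


flag = np.bool_(False)  # e.g. a "done" flag read back from a NumPy array
```

Here `is_false_identity(flag)` returns `False` while `is_falsy(flag)` returns `True`, which is why the truth-value test is the safer form.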
* prevents squash_output if not use_sde, see #1592
* update changelog
* add unscaling of actions taken during training
* add test regarding squashing and unsquashing
* avoids try-except block
* format Gymnasium code with black
* makes mypy pass
* makes pytype pass
* sort imports
* makes error message in assert statement clearer
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* improves code commenting
* replaces full env with wrapper
* Cleanup code
* Reformat
---------
Co-authored-by: PatrickHelm <patrick.helm@gmx.net>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
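"Unscaling" actions maps values in the normalized range [-1, 1] back to the `Box` bounds `[low, high]`, and scaling is its inverse. A minimal NumPy sketch of that affine mapping follows; the function names are illustrative and this is not presented as the library's implementation.

```python
import numpy as np


def scale_action(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    # Map an action from [low, high] to [-1, 1].
    return 2.0 * ((action - low) / (high - low)) - 1.0


def unscale_action(scaled_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    # Map an action from [-1, 1] back to [low, high].
    return low + 0.5 * (scaled_action + 1.0) * (high - low)


low = np.array([-2.0, 0.0])
high = np.array([2.0, 10.0])
action = np.array([1.0, 2.5])
scaled = scale_action(action, low, high)  # [0.5, -0.5]
```

Applying `unscale_action` to `scaled` recovers `action`, so a policy that outputs squashed actions in [-1, 1] can hand the unscaled values to the environment.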
* Fix failing set_env test
* Fix test failing due to deprecation of env.seed
* Adjust mean reward threshold in failing test
* Fix her test failing due to rng
* Change seed and revert reward threshold to 90
* Pin gym version
* Make VecEnv compatible with gym seeding change
* Revert change to VecEnv reset signature
* Change subprocenv seed cmd to call reset instead
* Fix type check
* Add backward compat
* Add `compat_gym_seed` helper
* Add goal env checks in env_checker
* Add docs on HER requirements for envs
* Capture user warning in test with inverted box space
* Update ale-py version
* Fix randint
* Allow noop_max to be zero
* Update changelog
* Update docker image
* Update doc conda env and dockerfile
* Custom envs should not have any warnings
* Fix test for numpy >= 1.21
* Add check for vectorized compute reward
* Bump to gym 0.24
* Fix gym default step docstring
* Test downgrading gym
* Revert "Test downgrading gym"
This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb.
* Fix protobuf error
* Fix in dependencies
* Fix protobuf dep
* Use newest version of cartpole
* Update gym
* Fix warning
* Loosen required scipy version
* Scipy no longer needed
* Try gym 0.25
* Silence warnings from gym
* Filter warnings during tests
* Update doc
* Update requirements
* Add gym 26 compat in vec env
* Fixes in envs and tests for gym 0.26+
* Enforce gym 0.26 api
* format
* Fix formatting
* Fix dependencies
* Fix syntax
* Cleanup doc and warnings
* Faster tests
* Higher budget for HER perf test (revert prev change)
* Fixes and update doc
* Fix doc build
* Fix breaking change
* Fixes for rendering
* Rename variables in monitor
* update render method for gym 0.26 API
Backward compatible (the mode argument is still allowed) while using the gym 0.26 API (the render mode is determined at environment creation)
* update tests and docs to new gym render API
* undo removal of render modes metadata check
* set rgb_array as default render mode for gym.make
* undo changes & raise warning if not 'rgb_array'
* Fix type check
* Remove recursion and fix type checking
* Remove hacks for protobuf and gym 0.24
* Fix type annotations
* reuse existing render_mode attribute
* return tiled images for 'human' render mode
* Allow to use opencv for human render, fix typos
* Add warning when using non-zero start with Discrete (fixes #1197)
* Fix type checking
* Bug fixes and handle more cases
* Throw proper warnings
* Update test
* Fix new metadata name
* Ignore numpy warnings
* Fixes in vec recorder
* Global ignore
* Filter local warning too
* Monkey patch not needed for gym 26
* Add doc of VecEnv vs Gym API
* Add render test
* Fix return type
* Update VecEnv vs Gym API doc
* Fix for custom render mode
* Fix return type
* Fix type checking
* check test env test_buffer
* skip render check
* check env test_dict_env
* test_env test_gae
* check envs in remaining tests
* Update tests
* Add warning for Discrete action space with non-zero (#1295)
* Fix atari annotation
* ignore get_action_meanings [attr-defined]
* Fix mypy issues
* Add patch for gym/gymnasium transition
* Switch to gymnasium
* Rely on signature instead of version
* More patches
* Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39
* Fix doc build
* Fix pytype errors
* Fix atari requirement
* Update env checker due to change in dtype for Discrete
* Fix type hint
* Convert spaces for saved models
* Ignore pytype
* Remove gitlab CI
* Disable pytype for convert space
* Fix undefined info
* Fix undefined info
* Upgrade shimmy
* Fix wrappers type annotation (need PR from Gymnasium)
* Fix gymnasium dependency
* Fix dependency declaration
* Cap pygame version for python 3.7
* Point to master branch (v0.28.0)
* Fix: use main not master branch
* Rename done to terminated
* Fix pygame dependency for python 3.7
* Rename gym to gymnasium
* Update Gymnasium
* Fix test
* Fix tests
* Forks don't have access to private variables
* Fix linter warnings
* Update read the doc env
* Fix env checker for GoalEnv
* Fix import
* Update env checker (more info) and fix dtype
* Use micromamba for Docker
* Update dependencies
* Clarify VecEnv doc
* Fix Gymnasium version
* Copy file only after mamba install
* [ci skip] Update docker doc
* Polish code
* Reformat
* Remove deprecated features
* Ignore warning
* Update doc
* Update examples and changelog
* Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436)
* Fix SAC type hints, improve DQN ones
* Fix A2C and TD3 type hints
* Fix PPO type hints
* Fix on-policy type hints
* Fix base class type annotation, do not use defaults
* Update version
* Disable mypy for python 3.7
* Rename Gym26StepReturn
* Update continuous critic type annotation
* Fix pytype complain
---------
Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com>
Co-authored-by: tlips <thomas.lips@ugent.be>
Co-authored-by: tlpss <thomas17.lips@gmail.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
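"Rely on signature instead of version" can be illustrated by inspecting `reset()` to decide whether it accepts a `seed` keyword (gym 0.26+/Gymnasium style) or whether the old `env.seed()` call is needed. A sketch under that assumption; `reset_with_seed` and both dummy envs are hypothetical.

```python
import inspect
from typing import Any, Optional


def reset_with_seed(env: Any, seed: Optional[int] = None) -> Any:
    # Check the reset() signature rather than comparing version strings.
    if "seed" in inspect.signature(env.reset).parameters:
        return env.reset(seed=seed)
    # Old-style API: seed separately, then reset.
    if seed is not None:
        env.seed(seed)
    return env.reset()


class NewStyleEnv:
    """Hypothetical env using the reset(seed=...) API."""

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        return "obs", {"seed": seed}


class OldStyleEnv:
    """Hypothetical env using the deprecated seed() + reset() API."""

    def __init__(self) -> None:
        self.last_seed: Optional[int] = None

    def seed(self, seed: int) -> None:
        self.last_seed = seed

    def reset(self):
        return "obs"
```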
* generalize the use of `from gym import spaces`
* command line get system info
* Documentation line length for doc
* update changelog
* add space before os platform to avoid ref to other issue
* format
* get_system_info update in changelog
* fix type check error
* fix get system info
* add comment about regex
* update version
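On GitHub, a kernel version string such as `#1 SMP ...` pasted into an issue is rendered as a link to issue #1. A sketch of the kind of regex fix described above, assuming the space is inserted right after `#`; `format_os_info` is a hypothetical helper and the library's exact pattern may differ.

```python
import platform
import re


def format_os_info(platform_str: str, version_str: str) -> str:
    # Turn "#<digit>" into "# <digit>" so GitHub does not auto-link an issue.
    return re.sub(r"#(\d)", r"# \1", f"{platform_str} {version_str}")


os_info = format_os_info(platform.platform(), platform.version())
```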
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925
* added changelog entry
* added missing backtick
* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used
* Updated tests to ignore DeprecationWarnings
* Updated changelog entry
* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs
* Add alternative in docstring
* Update docstrings
* `eval_freq` warning in docstring
* Add deprecation comments in tests
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
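The deprecation relies on `warnings.warn` with `stacklevel`, which controls which stack frame the warning is attributed to. A minimal sketch; `learn_sketch` and its message are hypothetical, not the library's `learn()`.

```python
import warnings
from typing import Optional


def learn_sketch(total_timesteps: int, eval_freq: Optional[int] = None) -> int:
    if eval_freq is not None:
        # stacklevel=2 reports the warning at the caller's line, which is the
        # line the user has to change, rather than at this function.
        warnings.warn(
            "Parameter `eval_freq` is deprecated; use an evaluation callback instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return total_timesteps
```

Since `DeprecationWarning` is hidden by default outside `__main__`, tests typically enable or explicitly ignore it, as the commits above do.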
* Add progress bar callback and argument
* Update doc
* Update changelog
* Upgrade pytype in docker image
* Use tqdm.write in the logger to have cleaner output
* Fix logger test
* Fix when doing multiple calls to learn()
* Address comments from code-review
* Updated docstring from n_steps to n_rollout_steps
This must be a typo
* Fixed typo in a comment in ppo.py
* Update changelog
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix return type for load, learn in BaseAlgorithm
* Update changelog
* Add typing extensions to dependencies
* Import directly from typing for python >3.11
* Reorder changelog to reflect merge order
* Roll back to typevar solution
* Updated changelog
* Remove typing extensions requirement
* Update base_class.py
* Remove final point in changelog
* Additional type fixes across project
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
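The "typevar solution" types `learn()` so that it returns the caller's own subclass, which `typing.Self` provides directly only from Python 3.11. A sketch with hypothetical class names:

```python
from typing import TypeVar

SelfAlgorithm = TypeVar("SelfAlgorithm", bound="BaseAlgorithmSketch")


class BaseAlgorithmSketch:
    def __init__(self) -> None:
        self.num_timesteps = 0

    def learn(self: SelfAlgorithm, total_timesteps: int) -> SelfAlgorithm:
        self.num_timesteps += total_timesteps
        # Returning self typed as the concrete subclass lets a type checker
        # infer PPOSketch (not the base class) for chained calls.
        return self


class PPOSketch(BaseAlgorithmSketch):
    def ppo_only(self) -> str:
        return "ppo"


model = PPOSketch().learn(100)
```

A type checker infers `model` as `PPOSketch`, so `model.ppo_only()` type-checks without a cast.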
* Use higher resolution time and round up to eps
* Update changelog
* Add test case
* Fix formatting, time()->time_ns
* Bugfix: ns is integer not float
* Move test to better place
* Divide by 1e9 earlier
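A sketch of the FPS computation these commits describe: `time.time_ns()` returns an integer, so it is converted to seconds before use, and the elapsed time is clamped to machine epsilon to avoid a division by zero. Function names are hypothetical.

```python
import sys
import time


def elapsed_seconds(start_time_ns: int, now_ns: int) -> float:
    # Convert integer nanoseconds to float seconds first, then clamp to
    # machine epsilon so very fast calls never produce a zero duration.
    return max((now_ns - start_time_ns) / 1e9, sys.float_info.epsilon)


def fps(n_steps: int, start_time_ns: int, now_ns: int) -> int:
    return int(n_steps / elapsed_seconds(start_time_ns, now_ns))


start = time.time_ns()
current_fps = fps(1000, start, time.time_ns())
```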
* Replacing the policy registry with policy "aliases"
* Fixing import order and SAC
* Changing arg. order to be sure policy_aliases is a kwarg
* Import orders
* Removing pytype error check
* Reformat
* Fix alias import
* Not using mutable {} as default for policy_aliases
* Empty aliases initialization
* Using static attributes for policy_aliases
* Fixing isort
* Fixing back bad merge
* Running isort
* Fixing aliases for A2C and PPO
* Using f-string
* Moving policy_aliases definition position
* Adding change in the changelog
* Update version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
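Policy "aliases" replace a global registry with a per-algorithm class attribute mapping names to policy classes. A sketch of the idea with hypothetical classes; the lookup method name is illustrative.

```python
from typing import ClassVar, Dict, Type, Union


class BasePolicySketch:
    pass


class MlpPolicySketch(BasePolicySketch):
    pass


class CnnPolicySketch(BasePolicySketch):
    pass


class AlgorithmSketch:
    # Static (class-level) attribute instead of a mutable {} default argument.
    policy_aliases: ClassVar[Dict[str, Type[BasePolicySketch]]] = {}

    @classmethod
    def resolve_policy(cls, policy: Union[str, Type[BasePolicySketch]]) -> Type[BasePolicySketch]:
        # Strings are looked up in this algorithm's own alias table.
        if isinstance(policy, str):
            if policy not in cls.policy_aliases:
                raise ValueError(f"Policy {policy} unknown")
            return cls.policy_aliases[policy]
        # A class passed directly is used as-is.
        return policy


class PPOSketch(AlgorithmSketch):
    policy_aliases: ClassVar[Dict[str, Type[BasePolicySketch]]] = {
        "MlpPolicy": MlpPolicySketch,
        "CnnPolicy": CnnPolicySketch,
    }
```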
* Add multi-env training support for SAC
* Fix for dict obs
* Pytype fixes
* Fix assert on number of envs
* Remove for loop
* Add support for Dict obs
* Start cleanup
* Update doc and bug fix
* Add support for vectorized action noise
and add multi env example for off-policy
* Update version
* Bug fix with VecNormalize
* Update README table
* Update variable names
* Update changelog and version
* Update doc and fix for `gradient_steps=-1`
* Add test for `gradient_steps=-1`
* Disable pytype pyi errors
* Fix for DQN
* Update comment on deepcopy
* Remove episode_reward field
* Fix RolloutReturn
* Avoid modification by reference
* Fix error message
Co-authored-by: Anssi <kaneran21@hotmail.com>
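With several envs, action noise needs one independent noise process per env, returned with shape `(n_envs, action_dim)`, and stateful noise must be reset only for envs whose episode ended. A sketch assuming an Ornstein-Uhlenbeck process as the per-env noise; both classes are hypothetical, not the library's noise classes.

```python
from typing import Iterable, Optional

import numpy as np


class OUNoiseSketch:
    """Ornstein-Uhlenbeck noise for one env (illustrative)."""

    def __init__(self, mean: np.ndarray, sigma: np.ndarray, theta: float = 0.15,
                 dt: float = 1e-2, seed: Optional[int] = None) -> None:
        self.mean, self.sigma, self.theta, self.dt = mean, sigma, theta, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self) -> None:
        self.prev = np.zeros_like(self.mean)

    def __call__(self) -> np.ndarray:
        # Mean-reverting step plus a Gaussian increment scaled by sqrt(dt).
        noise = (self.prev + self.theta * (self.mean - self.prev) * self.dt
                 + self.sigma * np.sqrt(self.dt) * self.rng.normal(size=self.mean.shape))
        self.prev = noise
        return noise


class VectorizedNoiseSketch:
    """One independent noise process per env."""

    def __init__(self, noises: Iterable[OUNoiseSketch]) -> None:
        self.noises = list(noises)

    def __call__(self) -> np.ndarray:
        # Stack per-env samples: shape (n_envs, action_dim).
        return np.stack([noise() for noise in self.noises])

    def reset(self, indices: Optional[Iterable[int]] = None) -> None:
        # Reset only the envs whose episode ended (all envs if indices is None).
        for i in range(len(self.noises)) if indices is None else indices:
            self.noises[i].reset()


vec_noise = VectorizedNoiseSketch(OUNoiseSketch(np.zeros(2), 0.1 * np.ones(2), seed=i) for i in range(3))
```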
* Store number of timesteps at the beginning of each learn cycle
* Update changelog
* Set default _num_timesteps_at_start in the constructor
* Test case for FPS logger
* Adjust test to cover both on-policy and off-policy algorithms
* Fix formatting
* Update test and add comment
* Fix test
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training model of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training, as the policy is a th.nn.Module
* Rename unit test
* unit test: use drop out probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Add callback signature to the learning rate type annotations.
* Add callback signature to the learning rate schedule type annotations.
* Add missing type annotations for learning rate callbacks.
* Add signature to old-style learning and evaluation callbacks.
* Add signature to env wrapper callback.
* Add type annotation to closure function.
* Use MaybeCallback more consistently.
* Update changelog.
* Remove now unused List import.
* Fix import order.
* Add type alias for learning rate schedules.
* Optimize imports.
* Fix messed up import.
* Remove resolved TODO.
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
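A learning-rate schedule here is a callable from `progress_remaining` (1.0 at the start of training, 0.0 at the end) to a value. A sketch of the `Schedule` type alias, a linear schedule, and how a constant is wrapped; the helper names are illustrative.

```python
from typing import Callable, Union

Schedule = Callable[[float], float]


def progress_remaining(num_timesteps: int, total_timesteps: int) -> float:
    # Goes from 1.0 (start) down to 0.0 (end of training).
    return 1.0 - float(num_timesteps) / float(total_timesteps)


def linear_schedule(initial_value: float) -> Schedule:
    def func(progress: float) -> float:
        # Decays linearly from initial_value to 0.
        return progress * initial_value

    return func


def as_schedule(value: Union[float, Schedule]) -> Schedule:
    # Wrap a constant so callers can always call the schedule.
    if callable(value):
        return value
    constant = float(value)
    return lambda _: constant


lr_schedule = linear_schedule(3e-4)
```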
* Small docstring improvements related to the notion of Rollout
* documented changes in changelog.rst, added myself to contributors
* Minor edits
Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update comments and docstrings
* Rename get_torch_variables to private and update docs
* Clarify documentation on data, params and tensors
* Make excluded_save_params private and update docs
* Update get_torch_variable_names to get_torch_save_params for description
* Simplify saving code and update docs on params vs tensors
* Rename saved item tensors to pytorch_variables for clarity
* Reformat
* Fix a typo
* Add get/set_parameters, update tests accordingly
* Use f-strings for formatting
* Fix load docstring
* Reorganize functions in BaseClass
* Update changelog
* Add library version to the stored models
* Actually run isort this time
* Fix flake8 complaints and also fix testing code
* Fix isort
* ...and black
* Fix set_random_seed
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix storing correct episode dones
* Fix number of filters in NatureCNN network
* Add TF-like RMSprop for matching performance with sb2
* Remove stuff that was accidentally included
* Reformat
* Clarify variable naming
* Update changelog
* Add comment on RMSprop implementations to A2C
* Add test for RMSpropTFLike
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
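The differences usually cited between TensorFlow-style and PyTorch RMSprop are that TF adds epsilon inside the square root and starts the squared-gradient average at ones rather than zeros. A single-step NumPy sketch under that assumption (momentum and centering omitted; not the library's optimizer code):

```python
import numpy as np


def rmsprop_step(param, grad, square_avg, lr=0.01, alpha=0.99, eps=1e-5, tf_like=False):
    # Exponential moving average of squared gradients.
    square_avg = alpha * square_avg + (1 - alpha) * grad**2
    if tf_like:
        denom = np.sqrt(square_avg + eps)  # epsilon inside the square root
    else:
        denom = np.sqrt(square_avg) + eps  # PyTorch: epsilon outside
    return param - lr * grad / denom, square_avg


p0, g = np.array([0.0]), np.array([1.0])
p_torch, _ = rmsprop_step(p0, g, np.zeros(1))             # PyTorch: average starts at 0
p_tf, _ = rmsprop_step(p0, g, np.ones(1), tf_like=True)    # TF-like: average starts at 1
```

With zero initialization the first update is roughly ten times larger than with ones, one reason why matching the TF behaviour can matter for reproducing sb2 results.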
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
* Split torch module code into torch_layers file
* Updated reference to CNN
* Change 'CxWxH' to 'CxHxW', as per common convention
* Fix missing import in policies.py
* Move PPOPolicy to OnlineActorCriticPolicy
* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit
* Update A2C optimizer comment
* Clean weight init scales for clarity
* Fix A2C log_interval default parameter
* Rename 'progress' to 'progress_remaining'
* Rename 'Models' to 'Algorithms'
* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'
* Move static functions out from BaseAlgorithm
* Move on/off_policy base algorithms to their own files
* Add files for A2C/PPO
* Fix docs
* Fix pytype
* Update documentation on OnPolicyAlgorithm
* Add proper docstring for on_policy rollout gathering
* Add a bit of clarification on the mlppolicy/cnnpolicy naming
* Move static function is_vectorized_policies to utils.py
* Checking docstrings, pep8 fixes
* Update changelog
* Clean changelog
* Remove policy warnings for sac/td3
* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>