* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm
* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.
* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.
* Make use of the rollout buffer kwargs.
* Update version
* Add test and update doc
---------
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix failing set_env test
* Fix test failiing due to deprectation of env.seed
* Adjust mean reward threshold in failing test
* Fix her test failing due to rng
* Change seed and revert reward threshold to 90
* Pin gym version
* Make VecEnv compatible with gym seeding change
* Revert change to VecEnv reset signature
* Change subprocenv seed cmd to call reset instead
* Fix type check
* Add backward compat
* Add `compat_gym_seed` helper
* Add goal env checks in env_checker
* Add docs on HER requirements for envs
* Capture user warning in test with inverted box space
* Update ale-py version
* Fix randint
* Allow noop_max to be zero
* Update changelog
* Update docker image
* Update doc conda env and dockerfile
* Custom envs should not have any warnings
* Fix test for numpy >= 1.21
* Add check for vectorized compute reward
* Bump to gym 0.24
* Fix gym default step docstring
* Test downgrading gym
* Revert "Test downgrading gym"
This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb.
* Fix protobuf error
* Fix in dependencies
* Fix protobuf dep
* Use newest version of cartpole
* Update gym
* Fix warning
* Loosen required scipy version
* Scipy no longer needed
* Try gym 0.25
* Silence warnings from gym
* Filter warnings during tests
* Update doc
* Update requirements
* Add gym 26 compat in vec env
* Fixes in envs and tests for gym 0.26+
* Enforce gym 0.26 api
* format
* Fix formatting
* Fix dependencies
* Fix syntax
* Cleanup doc and warnings
* Faster tests
* Higher budget for HER perf test (revert prev change)
* Fixes and update doc
* Fix doc build
* Fix breaking change
* Fixes for rendering
* Rename variables in monitor
* update render method for gym 0.26 API
backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation)
* update tests and docs to new gym render API
* undo removal of render modes metatadata check
* set rgb_array as default render mode for gym.make
* undo changes & raise warning if not 'rgb_array'
* Fix type check
* Remove recursion and fix type checking
* Remove hacks for protobuf and gym 0.24
* Fix type annotations
* reuse existing render_mode attribute
* return tiled images for 'human' render mode
* Allow to use opencv for human render, fix typos
* Add warning when using non-zero start with Discrete (fixes#1197)
* Fix type checking
* Bug fixes and handle more cases
* Throw proper warnings
* Update test
* Fix new metadata name
* Ignore numpy warnings
* Fixes in vec recorder
* Global ignore
* Filter local warning too
* Monkey patch not needed for gym 26
* Add doc of VecEnv vs Gym API
* Add render test
* Fix return type
* Update VecEnv vs Gym API doc
* Fix for custom render mode
* Fix return type
* Fix type checking
* check test env test_buffer
* skip render check
* check env test_dict_env
* test_env test_gae
* check envs in remaining tests
* Update tests
* Add warning for Discrete action space with non-zero (#1295)
* Fix atari annotation
* ignore get_action_meanings [attr-defined]
* Fix mypy issues
* Add patch for gym/gymnasium transition
* Switch to gymnasium
* Rely on signature instead of version
* More patches
* Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39
* Fix doc build
* Fix pytype errors
* Fix atari requirement
* Update env checker due to change in dtype for Discrete
* Fix type hint
* Convert spaces for saved models
* Ignore pytype
* Remove gitlab CI
* Disable pytype for convert space
* Fix undefined info
* Fix undefined info
* Upgrade shimmy
* Fix wrappers type annotation (need PR from Gymnasium)
* Fix gymnasium dependency
* Fix dependency declaration
* Cap pygame version for python 3.7
* Point to master branch (v0.28.0)
* Fix: use main not master branch
* Rename done to terminated
* Fix pygame dependency for python 3.7
* Rename gym to gymnasium
* Update Gymnasium
* Fix test
* Fix tests
* Forks don't have access to private variables
* Fix linter warnings
* Update read the doc env
* Fix env checker for GoalEnv
* Fix import
* Update env checker (more info) and fix dtype
* Use micromamab for Docker
* Update dependencies
* Clarify VecEnv doc
* Fix Gymnasium version
* Copy file only after mamba install
* [ci skip] Update docker doc
* Polish code
* Reformat
* Remove deprecated features
* Ignore warning
* Update doc
* Update examples and changelog
* Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436)
* Fix SAC type hints, improve DQN ones
* Fix A2C and TD3 type hints
* Fix PPO type hints
* Fix on-policy type hints
* Fix base class type annotation, do not use defaults
* Update version
* Disable mypy for python 3.7
* Rename Gym26StepReturn
* Update continuous critic type annotation
* Fix pytype complain
---------
Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com>
Co-authored-by: tlips <thomas.lips@ugent.be>
Co-authored-by: tlpss <thomas17.lips@gmail.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925
* added changelog entry
* added missing backtick
* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used
* Updated tests to ignore DeprecationWarnings
* Updated changelog entry
* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs
* Add alternative in docstring
* Update docstrings
* `eval_freq` warning in docstring
* Add deprecation comments in tests
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
* Add progress bar callback and argument
* Update doc
* Update changelog
* Upgrade pytype in docker image
* Use tqdm.write in the logger to have cleaner output
* Fix logger test
* Fix when doing multiple calls to learn()
* Address comments from code-review
* Updated docstring from n_steps to n_rollout_steps
This must be a typo
* Fixed typo in a comment in ppo.py
* Update changelog
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix return type for load, learn in BaseAlgorithm
* Update changelog
* Add typing extensions to dependencies
* Import directly from typing for python >3.11
* Reorder changelog to reflect merge order
* Roll back to typevar solution
* Updated changelog
* Remove typing extensions requirement
* Update base_class.py
* Remove final point in changelog
* Additional type fixes across project
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* fix nan in advnatages with batch size 1, for ppo
* changelog
* black
* Simplify test
* Bump version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Replacing the policy registry with policy "aliases"
* Fixing import order and SAC
* Changing arg. order to be sure policy_aliases is a kwarg
* Import orders
* Removing pytype error check
* Reformat
* Fix alias import
* Not using mutable {} as default for policy_aliases
* Empty aliases initialization
* Using static attributes for policy_aliases
* Fixing isort
* Fixing back bad merge
* Running isort
* Fixing aliases for A2C and PPO
* Using f-string
* Moving policy_aliases definition position
* Adding change in the changelog
* Update version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Allow PPO to turn of advantage normalization
* update changelog
* Add a test case
* Update test and sanity check
* Fix tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore litteral for more clarity
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training model of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module
* Rename unit test
* unit test: use drop out probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add warning about total_env_steps not dividing neatly into batch size
* Stylistic cleanup
* Black reformatting
* Add clearer documentation and update changelog
* Update changelog.rst
* Use specific RolloutBuffer terminology
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change to minibatch language
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Cleaning up language describing rollout buffer requirements
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Switch to using env.num_envs
* Working tests
* Black and isort still fighting each other
* codestyle finally happy
* Basic test exists, possibly in the wrong file
* Update phrasing
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix for arguments order in explained_variance()
Fix for arguments order in explained_variance() in PPO
* Fix for arguments order in explained_variance()
Fix for arguments order in explained_variance() in a2c
* Fix for arguments order in explained_variance()
update changelog.rst
* Add callback signature to the learning rate type annotations.
* Add callback signature to the learning rate schedule type annotations.
* Add missing type annotations for learning rate callbacks.
* Add signature to old-style learning and evaluation callbacks.
* Add signature to env wrapper callback.
* Add type annotation to closure function.
* Use MaybeCallback more consistently.
* Update changelog.
* Remove now unused List import.
* Fix import order.
* Add type alias for learning rate schedules.
* Optimize imports.
* Fix messed up import.
* Remove resolved TODO.
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Small docstring improvements related to the notion of Rollout
* documented changes in changelog.rst, added myself to contributers
* Minor edits
Co-authored-by: Stefan Heid <stefan.heid@upb.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
* Split torch module code into torch_layers file
* Updated reference to CNN
* Change 'CxWxH' to 'CxHxW', as per common notion
* Fix missing import in policies.py
* Move PPOPolicy to OnlineActorCriticPolicy
* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit
* Update A2C optimizer comment
* Clean weight init scales for clarity
* Fix A2C log_interval default parameter
* Rename 'progress' to 'progress_remaining
* Rename 'Models' to 'Algorithms'
* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'
* Move static functions out from BaseAlgorithm
* Move on/off_policy base algorithms to their own files
* Add files for A2C/PPO
* Fix docs
* Fix pytype
* Update documentation on OnPolicyAlgorithm
* Add proper doctstring for on_policy rollout gathering
* Add bit clarification on the mlppolicy/cnnpolicy naming
* Move static function is_vectorized_policies to utils.py
* Checking docstrings, pep8 fixes
* Update changelog
* Clean changelog
* Remove policy warnings for sac/td3
* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* init commit tensorboard-integration
* Added tb logger to ppo (with output exclusions)
* fixed truncated stdout
* categorize stdout outputs by tag
* separated exclusions from values, added missing logs
* saving exclusions as dict instead of list
* reformatting, auto run indexing
* included renaming suggestions, fixed tests
* tb support for sac
* linting
* moved logging to base class
* tb support for td3
* removed histograms, non-verbose output working
* modifed changelog
* linting
* fixed type error
* moved logger config to utils
* removed episode_rewards log from ppo
* Enable tensorboard in tests
* Remove unused import
* Update logger sub titles
* Minor edit for PPO
* Update logger and tb log folder
* Pass correct logger to Callbacks
* updated docs
* added tb example image to docs
* add support for continuing training in tensorboard
* added tensorboard to docs index
* added tb test
* moved logger config to _setup_learn, updated tests
* accessing verbose from base class
* Update doc and tests
* Rename session -> time
* Update version
* Update logger truncate
* Update types
* Remove duplicated code
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>