* Fixing #1791
* Update test and version
* Add test for callback after eval
* Fix mypy error
* Remove tqdm warnings
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* fix: Follow PEP8 guidelines and test for falsy values with `not` rather than `is False`.
https://docs.python.org/2/library/stdtypes.html#truth-value-testing
* chore: Update changelog in line with intent of changes in PR #1707
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix: Change `is False` to `not` as per PEP8
* chore: Remove superfluous comment about `is False`
* test: One On- and one Off-Policy algorithm (A2C and SAC respectively), with settings to speed up testing
* Update changelog
* chore: Remove EvalCallback as it's not actually required
* Update changelog.rst
* Rm duplicated "others" section in changelog.rst
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fix failing set_env test
* Fix test failing due to deprecation of env.seed
* Adjust mean reward threshold in failing test
* Fix her test failing due to rng
* Change seed and revert reward threshold to 90
* Pin gym version
* Make VecEnv compatible with gym seeding change
* Revert change to VecEnv reset signature
* Change subprocenv seed cmd to call reset instead
* Fix type check
* Add backward compat
* Add `compat_gym_seed` helper
* Add goal env checks in env_checker
* Add docs on HER requirements for envs
* Capture user warning in test with inverted box space
* Update ale-py version
* Fix randint
* Allow noop_max to be zero
* Update changelog
* Update docker image
* Update doc conda env and dockerfile
* Custom envs should not have any warnings
* Fix test for numpy >= 1.21
* Add check for vectorized compute reward
* Bump to gym 0.24
* Fix gym default step docstring
* Test downgrading gym
* Revert "Test downgrading gym"
This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb.
* Fix protobuf error
* Fix in dependencies
* Fix protobuf dep
* Use newest version of cartpole
* Update gym
* Fix warning
* Loosen required scipy version
* Scipy no longer needed
* Try gym 0.25
* Silence warnings from gym
* Filter warnings during tests
* Update doc
* Update requirements
* Add gym 0.26 compat in vec env
* Fixes in envs and tests for gym 0.26+
* Enforce gym 0.26 api
* format
* Fix formatting
* Fix dependencies
* Fix syntax
* Cleanup doc and warnings
* Faster tests
* Higher budget for HER perf test (revert prev change)
* Fixes and update doc
* Fix doc build
* Fix breaking change
* Fixes for rendering
* Rename variables in monitor
* update render method for gym 0.26 API
backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation)
* update tests and docs to new gym render API
* undo removal of render modes metadata check
* set rgb_array as default render mode for gym.make
* undo changes & raise warning if not 'rgb_array'
* Fix type check
* Remove recursion and fix type checking
* Remove hacks for protobuf and gym 0.24
* Fix type annotations
* reuse existing render_mode attribute
* return tiled images for 'human' render mode
* Allow using opencv for human render, fix typos
* Add warning when using non-zero start with Discrete (fixes #1197)
* Fix type checking
* Bug fixes and handle more cases
* Throw proper warnings
* Update test
* Fix new metadata name
* Ignore numpy warnings
* Fixes in vec recorder
* Global ignore
* Filter local warning too
* Monkey patch not needed for gym 0.26
* Add doc of VecEnv vs Gym API
* Add render test
* Fix return type
* Update VecEnv vs Gym API doc
* Fix for custom render mode
* Fix return type
* Fix type checking
* check test env test_buffer
* skip render check
* check env test_dict_env
* test_env test_gae
* check envs in remaining tests
* Update tests
* Add warning for Discrete action space with non-zero (#1295)
* Fix atari annotation
* ignore get_action_meanings [attr-defined]
* Fix mypy issues
* Add patch for gym/gymnasium transition
* Switch to gymnasium
* Rely on signature instead of version
* More patches
* Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39
* Fix doc build
* Fix pytype errors
* Fix atari requirement
* Update env checker due to change in dtype for Discrete
* Fix type hint
* Convert spaces for saved models
* Ignore pytype
* Remove gitlab CI
* Disable pytype for convert space
* Fix undefined info
* Fix undefined info
* Upgrade shimmy
* Fix wrappers type annotation (need PR from Gymnasium)
* Fix gymnasium dependency
* Fix dependency declaration
* Cap pygame version for python 3.7
* Point to master branch (v0.28.0)
* Fix: use main not master branch
* Rename done to terminated
* Fix pygame dependency for python 3.7
* Rename gym to gymnasium
* Update Gymnasium
* Fix test
* Fix tests
* Forks don't have access to private variables
* Fix linter warnings
* Update read the doc env
* Fix env checker for GoalEnv
* Fix import
* Update env checker (more info) and fix dtype
* Use micromamba for Docker
* Update dependencies
* Clarify VecEnv doc
* Fix Gymnasium version
* Copy file only after mamba install
* [ci skip] Update docker doc
* Polish code
* Reformat
* Remove deprecated features
* Ignore warning
* Update doc
* Update examples and changelog
* Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436)
* Fix SAC type hints, improve DQN ones
* Fix A2C and TD3 type hints
* Fix PPO type hints
* Fix on-policy type hints
* Fix base class type annotation, do not use defaults
* Update version
* Disable mypy for python 3.7
* Rename Gym26StepReturn
* Update continuous critic type annotation
* Fix pytype complaint
---------
Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com>
Co-authored-by: tlips <thomas.lips@ugent.be>
Co-authored-by: tlpss <thomas17.lips@gmail.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
* Add progress bar callback and argument
* Update doc
* Update changelog
* Upgrade pytype in docker image
* Use tqdm.write in the logger to have cleaner output
* Fix logger test
* Fix when doing multiple calls to learn()
* Address comments from code-review
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback
* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement
* Update the docs related to new StopTrainingOnNoModelImprovement callback
* Update doc
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* fix Atari in CI
* fix dtype and atari extra
* Update setup.py
* remove 3.6
* note about how to install Atari
* pendulum-v1
* atari v5
* black
* fix pendulum capitalization
* add minimum version
* moved things in changelog to breaking changes
* partial v5 fix
* env update to pass tests
* mismatch env version fixed
* Fix tests after merge
* Include autorom in setup.py
* Blacken code
* Fix dtype issue in more robust way
* Fix GitLab CI: switch to Docker container with new black version
* Remove workaround from GitLab. (May need to rebuild Docker for this though.)
* Revert to v4
* Update setup.py
* Apply suggestions from code review
* Remove unnecessary autorom
* Consistent gym versions
Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update evaluate_policy to use monitor data if available
* Update documentation
* Cleaning up
* Remove unnecessary typing trickery
* Update doc
* Rename is_wrapped to clarify it is for vecenvs
* Add is_wrapped for regular envs
* Add is_wrapped call for subprocvecenv and update code for circular imports
* Move new functions back to env_util and fix imports
* Update changelog
* Clarify evaluate_policy docs
* Add tests for wrappers modifying episode lengths
* Fix tests
* Update changelog
* Minor edits
* Add warn switch to evaluate_policy and update tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
* Add DDPG + TD3 with any number of critics
* Allow any number of critics for SAC
* Update doc
* [ci skip] Update DDPG example
* Remove unused parameter
* Add DDPG to identity test
* Fix computation with n_critics=1,3
* Update doc
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docstrings for off-policy algos
* Add check for sde
Co-authored-by: Adam Gleave <adam@gleave.me>
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug
* Changed Base Class
* refactor save to be consistent with overriding the excluded_save_params function. Do not try to exclude the parameters twice.
* Added simple DQN policy
* Finished learn and train function
- missing correct loss computation
* changed collect_rollouts to work with discrete space
* moved discrete space collect_rollouts to dqn
* basic dqn working
* deleted SDE related code
* added gradient clipping and moved greedy policy to policy
* changed policy to implement target network
and added soft update (in fact the standard tau is 1, so it is a hard update)
* fixed policy setup
* rebase target_update_interval on _n_updates
* adapted all tests
all tests passing
* Move to stable-baseline3
* Fixes for DQN
* Fix tests + add CNNPolicy
* Allow any optimizer for DQN
* added some util functions to create an arbitrary linear schedule, fixed pickle problem with old exploration schedule
* more documentation
* changed buffer dtype
* refactor and document
* Added Sphinx Documentation
Updated changelog.rst
* removed custom collect_rollouts as it is no longer necessary
* Implemented suggestions to clean code and documentation.
* extracted some functions on tests to reduce duplicated code
* added support for exploration_fraction
* Fixed exploration_fraction
* Added documentation
* Fixed get_linear_fn -> proper progress scaling
* Merged master
* Added nature reference
* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1
* Fixed n_updates to be incremented correctly
* Correct train_freq
* Doc update
* added special parameter for DQN in tests
* different fix for test_discrete
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added RMSProp in optimizer_kwargs, as described in nature paper
* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper
* Changelog update for buffer dtype
* standard exclude parameters should always be excluded to ensure proper saving; they are saved only if intentionally included via the ``include`` parameter
* slightly more iterations on test_discrete to pass the test
* added param use_rms_prop instead of mutable default argument
* forgot alpha
* using huber loss, adam and learning rate 1e-4
* account for train_freq in update_target_network
* Added memory check for both buffers
* Doc updated for buffer allocation
* Added psutil Requirement
* Adapted test_identity.py
* Fixes with new SB3 version
* Fix for tensorboard name
* Convert assert to warning and fix tests
* Refactor off-policy algorithms
* Fixes
* test: remove next_obs in replay buffer
* Update changelog
* Fix tests and use tmp_path where possible
* Fix sampling bug in buffer
* Do not store next obs on episode termination
* Fix replay buffer sampling
* Update comment
* moved epsilon from policy to model
* Update predict method
* Update atari wrappers to match SB2
* Minor edit in the buffers
* Update changelog
* Merge branch 'master' into dqn
* Update DQN to new structure
* Fix tests and remove hardcoded path
* Fix for DQN
* Disable memory efficient replay buffer by default
* Fix docstring
* Add tests for memory efficient buffer
* Update changelog
* Split collect rollout
* Move target update outside `train()` for DQN
* Update changelog
* Update linear schedule doc
* Cleanup DQN code
* Minor edit
* Update version and docker images
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>