* Update Gymnasium to v1.0.0a1
* Comment out `gymnasium.wrappers.monitor` (todo update to VideoRecord)
* Fix ruff warnings
* Register Atari envs
* Update `getattr` to `Env.get_wrapper_attr`
* Reorder imports
* Fix `seed` order
* Fix collecting `max_steps`
* Copy and paste video recorder to prevent the need to rewrite the vec video recorder wrapper
* Use `typing.List` rather than list
* Fix env attribute forwarding
* Separate out env attribute collection from its utilisation
* Update for Gymnasium alpha 2
* Remove assert for OrderedDict
* Update setup.py
* Add type: ignore
* Test with Gymnasium main
* Remove `gymnasium.logger.debug/info`
* Fix github CI yaml
* Run gym 0.29.1 on python 3.10
* Update lower bounds
* Integrate video recorder
* Remove ordered dict
* Update changelog
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update documentation
Added a note and sample code to the PPO documentation stating that the CPU should primarily be used unless a CNN is used. Added a warning for both PPO and A2C when the user runs on GPU without a CNN; see issue #1245.
* Add warning to base class and add test
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add np.ndarray as a recognized type for TB histograms.
Torch histograms allow th.Tensor, np.ndarray, and caffe2 formatted strings. This commit expands the TensorBoardOutputFormat's capabilities to log the two former types.
* Update changelog to reflect bug fix
* fix: add try/except for when either np or torch isn't at the required version. See https://github.com/DLR-RM/stable-baselines3/pull/1635 for more details
* fix: Add comment describing the test for when add_histogram should not have been called
* Cleanup
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add support for pre and post linear modules in `create_mlp`
* Disable mypy for python 3.8
* Reformat toml file
* Update docstring
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Add some comments
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Fixing #1791
* Update test and version
* Add test for callback after eval
* Fix mypy error
* Remove tqdm warnings
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix loading a model with net_arch=None
* Remove redundant get
* Dummy commit
* Add to contributors
* Update test and version
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* create failing test for unpickle error
* Fix learning_rate argument causing failure in weights_only=True if passed a function with non-float types
* Updated with feedback from araffin on PR#1901
* Update test and version
* Update changelog and SBX doc
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add success rate in monitor for on policy algorithms
* Update changelog
* make commit-checks refactoring
* Assert buffers are not none in _dump_logs
* Automatic refactoring of the type hinting
* Add success_rate logging test for on policy algorithms
* Update changelog
* Reformat
* Fix tests and update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm
* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.
* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.
* Make use of the rollout buffer kwargs.
* Update version
* Add test and update doc
---------
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Update signatures, and test with options
* Update changelog and black formatting
* Finish implementation (fixes, doc, tests)
* Use deepcopy to avoid side effects (modif by reference)
* Fix for mypy
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
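The side effect that the deepcopy change guards against can be illustrated with a minimal, standalone sketch. The names `reset_with_options` and `mutating_reset` are hypothetical and are not taken from the library; the example only assumes that some callee may modify the options dictionary it receives.

```python
import copy


def reset_with_options(reset_fn, options):
    """Pass a deep copy so the callee cannot mutate the caller's dict."""
    return reset_fn(copy.deepcopy(options))


def mutating_reset(options):
    # Hypothetical reset function that modifies its argument in place.
    options["goal"].append(99)
    return options


caller_options = {"goal": [1, 2]}
seen_by_env = reset_with_options(mutating_reset, caller_options)
# `caller_options` is left untouched; only the copy was modified.
```

A shallow copy would not be enough here, because the nested list would still be shared between the caller and the callee.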
* fix: Follow PEP8 guidelines and test for falsy values with `not` rather than `is False`.
https://docs.python.org/2/library/stdtypes.html#truth-value-testing
* chore: Update changelog in line with the intent of the changes in PR #1707
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix: Change `is False` to `not` as per PEP8
* chore: Remove superfluous comment about `is False`
* test: One On- and one Off-Policy algorithm (A2C and SAC respectively), with settings to speed up testing
* Update changelog
* chore: Remove EvalCallback as it's not actually required
* Update changelog.rst
* Rm duplicated "others" section in changelog.rst
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
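The difference between `is False` and `not` that motivates the change above can be shown with a minimal, standalone sketch. The helper names are hypothetical and do not come from the library.

```python
def disabled_by_identity(flag) -> bool:
    """Only the singleton ``False`` passes an identity check."""
    return flag is False


def disabled_by_truthiness(flag) -> bool:
    """Any falsy value (False, 0, None, empty containers) passes."""
    return not flag


# Falsy values that are not the ``False`` singleton behave differently.
falsy_values = [False, 0, None, "", []]
identity_results = [disabled_by_identity(v) for v in falsy_values]
truthiness_results = [disabled_by_truthiness(v) for v in falsy_values]
```

The identity check accepts only the first value, while the truthiness check accepts all five.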
* prevents squash_output if not use_sde, see #1592
* update changelog
* add unscaling of actions taken during training
* add test regarding squashing and unsquashing
* avoids try-except block
* format Gymnasium code with black
* makes mypy pass
* makes pytype pass
* sort imports
* makes error message in assert statement clearer
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* improves code commenting
* replaces full env with wrapper
* Cleanup code
* Reformat
---------
Co-authored-by: PatrickHelm <patrick.helm@gmx.net>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix type hints in `common/utils.py`
* Fix `VecTranspose` type annotations
* Fix types for callbacks
* Update changelog
* Fix video recorder type hints
* Fix save utils type hints
* Allow BytesIO
* Improve error message
* Make logger and training env properties
* Clarify which open_path fn is called
* Fix bug in env_checker.py bounds warning message
* Fix bug where Gym Environment Checker does not output the correct warning message when dealing with observation spaces that have different upper and different lower bounds
* Update test_env_checker.py with more comprehensive tests
* Make naming consistent
* Update version
* Catch all invalid indices at once
---------
Co-authored-by: gabo_tor <gabriel0torre@gmail.com>
* Added test cases where off-policy algorithms fail with a float64 action space
* casting observations and actions to `np.float32` to unify behaviour between `ReplayBuffer` and `RolloutBuffer`. Fixing issue #1145
* reformatted using black
* making test more restrictive by checking that the model's action is float64
* added changelog entry
* undo cast of observations as `preprocessing.preprocess_obs()` casts them to float32 anyway.
* Casting to float32 only if `action.dtype` is float64
* Added cast to `DictReplayBuffer` as well
* Added tests for multiple variations of continuous action types and observation spaces
* applied reformatting by `make commit-checks`
* Added typing and comment referring to description in merge request
* Apply linter for single element slice
* Rename helper and refactor tests
* Update changelog and docstring
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update setup.py to v0.29.0
* Remove invalid test
* Loosen version and update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fix env checker single-step-env edge case
Before this change, env checker failed to `reset()` the tested
environment before calling `step()` when checking for `Inf` / `NaN`.
This could cause environments which happened to have only one `step()`
available before the episode was terminated to fail.
This is now fixed.
* Code review fixes#1
As suggested by Antonin Raffin <antonin.raffin@ensta.org>.
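The single-step edge case described above can be sketched without any dependencies. `OneStepEnv` and `check_finite` are hypothetical stand-ins for a real environment and for the checker routine; the five-element step return is an assumption made for illustration.

```python
import math


class OneStepEnv:
    """Hypothetical toy environment whose episode ends after one step."""

    def __init__(self) -> None:
        self._needs_reset = True

    def reset(self):
        self._needs_reset = False
        return 0.0, {}

    def step(self, action):
        if self._needs_reset:
            raise RuntimeError("step() called before reset()")
        self._needs_reset = True  # the episode terminates immediately
        return 0.0, 1.0, True, False, {}


def check_finite(env, action=0) -> bool:
    """Reset first, then step once and check that the values are finite."""
    env.reset()
    obs, reward, terminated, truncated, info = env.step(action)
    return math.isfinite(obs) and math.isfinite(reward)
```

Because `check_finite` always resets before stepping, it can be called repeatedly on an environment that allows only one step per episode.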
* Switch from List to Sequence for `seed()` type hint
* Fix logger type hints
* Improve replay buffer type hints
* Fix custom envs type annotations
* Fix VecMonitor type hints
* Fix RMSprop type hint
* Fix vec extract dict obs type hints
* Fix vec frame stack type annotations
* Fix base vec env type hints
* Fix dummy vec env type hints
* Fix for mypy
* Fixes for the tests
* mypy doesn't like when we overwrite type
* fix step of SimpleMultiObsEnv
* remove useless type specification
* Rm useless type hint
* Improve logger type hint
* format
* rm useless type hint
* Re-add variables in constructor, remove unused import
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Fix failing set_env test
* Fix test failing due to deprecation of env.seed
* Adjust mean reward threshold in failing test
* Fix her test failing due to rng
* Change seed and revert reward threshold to 90
* Pin gym version
* Make VecEnv compatible with gym seeding change
* Revert change to VecEnv reset signature
* Change subprocenv seed cmd to call reset instead
* Fix type check
* Add backward compat
* Add `compat_gym_seed` helper
* Add goal env checks in env_checker
* Add docs on HER requirements for envs
* Capture user warning in test with inverted box space
* Update ale-py version
* Fix randint
* Allow noop_max to be zero
* Update changelog
* Update docker image
* Update doc conda env and dockerfile
* Custom envs should not have any warnings
* Fix test for numpy >= 1.21
* Add check for vectorized compute reward
* Bump to gym 0.24
* Fix gym default step docstring
* Test downgrading gym
* Revert "Test downgrading gym"
This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb.
* Fix protobuf error
* Fix in dependencies
* Fix protobuf dep
* Use newest version of cartpole
* Update gym
* Fix warning
* Loosen required scipy version
* Scipy no longer needed
* Try gym 0.25
* Silence warnings from gym
* Filter warnings during tests
* Update doc
* Update requirements
* Add gym 26 compat in vec env
* Fixes in envs and tests for gym 0.26+
* Enforce gym 0.26 api
* format
* Fix formatting
* Fix dependencies
* Fix syntax
* Cleanup doc and warnings
* Faster tests
* Higher budget for HER perf test (revert prev change)
* Fixes and update doc
* Fix doc build
* Fix breaking change
* Fixes for rendering
* Rename variables in monitor
* update render method for gym 0.26 API
backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation)
* update tests and docs to new gym render API
* undo removal of render modes metadata check
* set rgb_array as default render mode for gym.make
* undo changes & raise warning if not 'rgb_array'
* Fix type check
* Remove recursion and fix type checking
* Remove hacks for protobuf and gym 0.24
* Fix type annotations
* reuse existing render_mode attribute
* return tiled images for 'human' render mode
* Allow to use opencv for human render, fix typos
* Add warning when using non-zero start with Discrete (fixes #1197)
* Fix type checking
* Bug fixes and handle more cases
* Throw proper warnings
* Update test
* Fix new metadata name
* Ignore numpy warnings
* Fixes in vec recorder
* Global ignore
* Filter local warning too
* Monkey patch not needed for gym 26
* Add doc of VecEnv vs Gym API
* Add render test
* Fix return type
* Update VecEnv vs Gym API doc
* Fix for custom render mode
* Fix return type
* Fix type checking
* check test env test_buffer
* skip render check
* check env test_dict_env
* test_env test_gae
* check envs in remaining tests
* Update tests
* Add warning for Discrete action space with non-zero (#1295)
* Fix atari annotation
* ignore get_action_meanings [attr-defined]
* Fix mypy issues
* Add patch for gym/gymnasium transition
* Switch to gymnasium
* Rely on signature instead of version
* More patches
* Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39
* Fix doc build
* Fix pytype errors
* Fix atari requirement
* Update env checker due to change in dtype for Discrete
* Fix type hint
* Convert spaces for saved models
* Ignore pytype
* Remove gitlab CI
* Disable pytype for convert space
* Fix undefined info
* Fix undefined info
* Upgrade shimmy
* Fix wrappers type annotation (need PR from Gymnasium)
* Fix gymnasium dependency
* Fix dependency declaration
* Cap pygame version for python 3.7
* Point to master branch (v0.28.0)
* Fix: use main not master branch
* Rename done to terminated
* Fix pygame dependency for python 3.7
* Rename gym to gymnasium
* Update Gymnasium
* Fix test
* Fix tests
* Forks don't have access to private variables
* Fix linter warnings
* Update read the doc env
* Fix env checker for GoalEnv
* Fix import
* Update env checker (more info) and fix dtype
* Use micromamba for Docker
* Update dependencies
* Clarify VecEnv doc
* Fix Gymnasium version
* Copy file only after mamba install
* [ci skip] Update docker doc
* Polish code
* Reformat
* Remove deprecated features
* Ignore warning
* Update doc
* Update examples and changelog
* Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436)
* Fix SAC type hints, improve DQN ones
* Fix A2C and TD3 type hints
* Fix PPO type hints
* Fix on-policy type hints
* Fix base class type annotation, do not use defaults
* Update version
* Disable mypy for python 3.7
* Rename Gym26StepReturn
* Update continuous critic type annotation
* Fix pytype complaint
---------
Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com>
Co-authored-by: tlips <thomas.lips@ugent.be>
Co-authored-by: tlpss <thomas17.lips@gmail.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
* VecExtractDictObs handle terminal_observation
* Added VecExtractDictObs handle terminal_observation to changelog
* Update changelog.rst
* Update test_vec_extract_dict_obs.py
Add random dones in env to test if terminal_observation is properly handled
* Made test deterministic
* Fixed bug in test
* Improved test
* Fix format in test
* Update test
* Fix type hint
* Ignore pytype warning
* Ignore pytype
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Rename the observations variable in the evaluation util to avoid shadowing
This enables a callback in evaluate_policy to have access to the
observation vector that is fed to the environment step function,
which is currently shadowed by the output observation.
* Update changelog
* Add test
* Move assignment outside of the loop
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
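The shadowing problem addressed by the rename above can be sketched with a simplified evaluation loop. The function `evaluate` and its arguments are hypothetical; the sketch only assumes that the callback receives `locals()` and `globals()` after each step.

```python
def evaluate(policy, env_step, initial_obs, n_steps, callback=None):
    """Simplified evaluation loop that keeps input and output separate."""
    observations = initial_obs
    for _ in range(n_steps):
        actions = policy(observations)
        # Store the output under a new name so that `observations` still
        # refers to the input that produced `actions`.
        new_observations, rewards = env_step(actions)
        if callback is not None:
            callback(locals(), globals())
        observations = new_observations
    return observations


seen = []


def record(local_vars, global_vars):
    # Both the input and the output observation are visible here.
    seen.append((local_vars["observations"], local_vars["new_observations"]))


final_obs = evaluate(lambda o: o + 1, lambda a: (a * 10, 1.0), 1, 2, record)
```

Had the step output been assigned directly to `observations`, the callback would have seen only the new value and lost the one used to compute the action.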
* Fix type hints for DQN
* [ci skip] Remove commented line
* Refine types
* Fix vectorized obs detection
* Fix for pytype
* Fix check at load time to create replay buffer
* One config file to rule them all
* Delete unused config
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* add instructions for running single tests in the README, add assertions for observation_space
* update changelog
* address linting warnings
* correct pytest command in the README
* correct review comments, run make commit-checks
* truncate lines that are too long
* address make lint warning about checking module availability
* fix tests
* use f-strings for formatting assertion messages
* fix type issue
* Refactor tests, improve error messages
---------
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>