* Update Gymnasium to v1.0.0a1
* Comment out `gymnasium.wrappers.monitor` (TODO: update to VideoRecord)
* Fix ruff warnings
* Register Atari envs
* Update `getattr` to `Env.get_wrapper_attr`
* Reorder imports
* Fix `seed` order
* Fix collecting `max_steps`
* Copy and paste the video recorder to avoid rewriting the vec video recorder wrapper
* Use `typing.List` rather than list
* Fix env attribute forwarding
* Separate out env attribute collection from its utilisation
* Update for Gymnasium alpha 2
* Remove assert for OrderedDict
* Update setup.py
* Add type: ignore
* Test with Gymnasium main
* Remove `gymnasium.logger.debug/info`
* Fix github CI yaml
* Run gym 0.29.1 on python 3.10
* Update lower bounds
* Integrate video recorder
* Remove ordered dict
* Update changelog
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update readme and clarify planned features
* Fix rtd python version
* Fix pip version for rtd
* Update rtd ubuntu and mambaforge
* Add upper bound for gymnasium
* [ci skip] Update readme
* Update documentation
Added a comment and sample code to the PPO documentation stating that the CPU should primarily be used unless a CNN is used. Added a warning for both PPO and A2C, shown when the user runs on the GPU without a CNN, that the CPU should be used instead. See issue #1245.
* Add warning to base class and add test
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
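The warning described in the commit above could look roughly like the following sketch. This is not the code from the commit: the helper name `warn_if_gpu_without_cnn` and its string-based arguments are hypothetical, chosen so that the example runs with the standard library only.

```python
import warnings


def warn_if_gpu_without_cnn(device_type: str, policy_name: str) -> bool:
    """Hypothetical helper: warn when an on-policy algorithm runs on the GPU without a CNN.

    :param device_type: device type as a string, e.g. "cpu" or "cuda" (assumed naming)
    :param policy_name: policy class name, e.g. "MlpPolicy" or "CnnPolicy" (assumed naming)
    :return: True if a warning was emitted, False otherwise
    """
    # Assumption for this sketch: CNN policies are recognisable by their class name.
    uses_cnn = "Cnn" in policy_name
    if device_type == "cuda" and not uses_cnn:
        warnings.warn(
            f"Running {policy_name} on the GPU: PPO and A2C are primarily intended "
            "to run on the CPU unless a CNN policy is used.",
            UserWarning,
        )
        return True
    return False
```

Calling `warn_if_gpu_without_cnn("cuda", "MlpPolicy")` emits a `UserWarning`, while a CNN policy on the GPU or any policy on the CPU stays silent.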
* Add np.ndarray as a recognized type for TB histograms.
Torch histograms allow th.Tensor, np.ndarray, and caffe2 formatted strings. This commit expands the TensorBoardOutputFormat's capabilities to log the first two types.
* Update changelog to reflect bug fix
* fix: add try/except for the case where either np or torch is not at the required version. See https://github.com/DLR-RM/stable-baselines3/pull/1635 for more details
* fix: Add comment describing the test for when add_histogram should not have been called
* Cleanup
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add support for pre and post linear modules in `create_mlp`
* Disable mypy for python 3.8
* Reformat toml file
* Update docstring
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Add some comments
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Updated DQN optimizer input to only include q_network parameters
* Update version
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixing #1791
* Update test and version
* Add test for callback after eval
* Fix mypy error
* Remove tqdm warnings
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix loading a model with net_arch=None
* Remove redundant get
* Dummy commit
* Add to contributors
* Update test and version
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fix memory leak in base_class.py
Loading the data return value is unnecessary because it is unused, and loading it causes a memory leak through the ep_info_buffer variable. I found this while loading a PPO learner from storage on a multi-GPU system: the ep_info_buffer is loaded to the memory location it was on when it was saved to disk, instead of the target loading location, and is then never cleaned up.
* Update changelog.rst
* Update changelog
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* create failing test for unpickle error
* Fix learning_rate argument causing failure in weights_only=True if passed a function with non-float types
* Updated with feedback from araffin on PR #1901
* Update test and version
* Update changelog and SBX doc
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
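One way to guard against a `learning_rate` function that returns non-float types is to cast its output to a plain Python `float`. The sketch below only illustrates that idea and is not the code from the commit: the wrapper name `as_float_schedule` is hypothetical, and `fractions.Fraction` stands in for whatever non-float type the schedule might return.

```python
from fractions import Fraction
from typing import Callable, Union

Number = Union[int, float, Fraction]


def as_float_schedule(schedule: Callable[[float], Number]) -> Callable[[float], float]:
    """Hypothetical wrapper: make sure a schedule always returns a built-in float."""

    def wrapped(progress_remaining: float) -> float:
        # Cast whatever numeric type the schedule produces to a plain float.
        return float(schedule(progress_remaining))

    return wrapped


def fraction_schedule(progress_remaining: float) -> Fraction:
    """Example schedule that returns a non-float type (a Fraction)."""
    return Fraction(1, 1000) * Fraction(progress_remaining)


learning_rate = as_float_schedule(fraction_schedule)
```

With this wrapper, `learning_rate(0.5)` is a built-in `float` even though `fraction_schedule(0.5)` is a `Fraction`.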
* Add success rate in monitor for on policy algorithms
* Update changelog
* make commit-checks refactoring
* Assert buffers are not none in _dump_logs
* Automatic refactoring of the type hinting
* Add success_rate logging test for on policy algorithms
* Update changelog
* Reformat
* Fix tests and update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
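The success rate mentioned above can be thought of as the mean of per-episode success flags over a sliding window. The sketch below shows that computation with the standard library only. The function names, the window size, and the `is_success` key in the episode info dict are assumptions of this example, not the code from the commit.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional


def record_episode_end(success_buffer: Deque[bool], info: Dict[str, Any]) -> None:
    """Hypothetical helper: store the success flag of a finished episode, if present."""
    maybe_is_success = info.get("is_success")
    if maybe_is_success is not None:
        success_buffer.append(bool(maybe_is_success))


def success_rate(success_buffer: Deque[bool]) -> Optional[float]:
    """Return the mean success over the window, or None when nothing was recorded."""
    if len(success_buffer) == 0:
        return None
    return sum(success_buffer) / len(success_buffer)


# Sliding window over the last 3 episodes (window size chosen for illustration).
buffer: Deque[bool] = deque(maxlen=3)
episode_infos = [
    {"is_success": True},
    {},  # no success information: this episode is skipped
    {"is_success": False},
    {"is_success": True},
    {"is_success": True},
]
for episode_info in episode_infos:
    record_episode_end(buffer, episode_info)
```

Episodes that report no success flag are skipped, so environments without success information produce no value rather than a misleading 0.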
* Fix docstring for log_interval inside the learn method in the base class.
* Updated changelog.
* Update docstring
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix VecNormalize type hints
* Fix VecEnv utils type annotations
* Apply suggestions from code review
Co-authored-by: M. Ernestus <maximilian@ernestus.de>
* Remove PyType
---------
Co-authored-by: M. Ernestus <maximilian@ernestus.de>