* Updated DQN optimizer input to only include q_network parameters
* Update version
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixing #1791
* Update test and version
* Add test for callback after eval
* Fix mypy error
* Remove tqdm warnings
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix loading a model with net_arch=None
* Remove redundant get
* Dummy commit
* Add to contributors
* Update test and version
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fix memory leak in base_class.py
The data return value does not need to be loaded because it is unused, and loading it causes a memory leak through the ep_info_buffer variable. I found this while loading a PPO learner from storage on a multi-GPU system: the ep_info_buffer is loaded to the memory location it was on when it was saved to disk, rather than to the target loading location, and is then never cleaned up.
* Update changelog.rst
* Update changelog
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Create failing test for unpickle error
* Fix learning_rate argument causing a failure with weights_only=True when passed a function that returns non-float types
* Updated with feedback from araffin on PR #1901
* Update test and version
* Update changelog and SBX doc
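
One way to guard against a schedule function that returns non-float values is to wrap it so that its output is always cast to a built-in `float`. The snippet below is an illustrative sketch only, not necessarily the change made in this PR: `as_float_schedule` and `fraction_schedule` are hypothetical names, and `Fraction` stands in for any non-float numeric type.

```python
from fractions import Fraction
from typing import Callable


def as_float_schedule(schedule: Callable[[float], object]) -> Callable[[float], float]:
    """Hypothetical helper: wrap a schedule so it always returns a built-in float."""

    def wrapped(progress_remaining: float) -> float:
        # Cast whatever the user-supplied schedule returns to a plain float.
        return float(schedule(progress_remaining))

    return wrapped


def fraction_schedule(progress_remaining: float) -> Fraction:
    """Example schedule returning a non-float type (assumption for illustration)."""
    return Fraction(3, 10000) * Fraction(progress_remaining)


safe_schedule = as_float_schedule(fraction_schedule)
value = safe_schedule(0.5)
```

Here `fraction_schedule(0.5)` returns a `Fraction`, while `safe_schedule(0.5)` returns a plain `float`.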
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add success rate in monitor for on policy algorithms
* Update changelog
* make commit-checks refactoring
* Assert buffers are not none in _dump_logs
* Automatic refactoring of the type hinting
* Add success_rate logging test for on policy algorithms
* Update changelog
* Reformat
* Fix tests and update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fix docstring for log_interval inside the learn method in the base class.
* Updated changelog.
* Update docstring
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix VecNormalize type hints
* Fix VecEnv utils type annotations
* Apply suggestions from code review
Co-authored-by: M. Ernestus <maximilian@ernestus.de>
* Remove PyType
---------
Co-authored-by: M. Ernestus <maximilian@ernestus.de>
* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm
* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.
* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.
* Make use of the rollout buffer kwargs.
* Update version
* Add test and update doc
---------
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Update signatures, and test with options
* Update changelog and black formatting
* Finish implementation (fixes, doc, tests)
* Use deepcopy to avoid side effects (modification by reference)
* Fix for mypy
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* fix: Follow PEP8 guidelines and test for falsy values with `not` rather than `is False`.
https://docs.python.org/2/library/stdtypes.html#truth-value-testing
* chore: Update changelog in line with intent of changes in PR #1707
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix: Change `is False` to `not` as per PEP8
* chore: Remove superfluous comment about `is False`
* test: One On- and one Off-Policy algorithm (A2C and SAC respectively), with settings to speed up testing
* Update changelog
* chore: Remove EvalCallback as it's not actually required
* Update changelog.rst
* Remove duplicated "others" section in changelog.rst
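
The difference between `is False` and `not` can be shown with a small, self-contained sketch. The helper names and the list of sample values are assumptions chosen for illustration and are not taken from the PR: `is False` is an identity check that matches only the `False` singleton, whereas `not` follows Python's truth-value testing and treats every falsy value the same way.

```python
def check_is_false(flag: object) -> bool:
    """Identity check: True only for the False singleton itself."""
    return flag is False


def check_not(flag: object) -> bool:
    """Truth-value check: True for any falsy value."""
    return not flag


# Sample falsy values (illustrative, not from the PR).
falsy_values = [False, 0, 0.0, "", [], None]

identity_results = [check_is_false(v) for v in falsy_values]
truthiness_results = [check_not(v) for v in falsy_values]
```

Only the first entry of `identity_results` is `True`, while every entry of `truthiness_results` is `True`.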
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>