Commit graph

13 commits

Author SHA1 Message Date
Adam Gleave
4fb8aec215
Update evaluate_policy type annotation to support policies as well as RL algorithms (#1146)
* Add PolicyPredictor protocol and use it in evaluate_policy

* Update changelog

* Move Protocol to type_aliases to avoid circular import

* Add test for evaluate_policy on BasePolicy

* Remove unused import

* Use typing_extensions

* Move typing_extensions to 3rd party

* Add version range (typing_extensions uses SemVer)

* Import Protocol from typing_extensions only on Python<3.8

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Install typing_extensions only on Python<3.8

* Add missing sys import

* Fix import ordering

* Fix observation type hint in predict

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-11-03 15:36:19 +01:00
Antonin RAFFIN
52c29dc497
Fix evaluation script for recurrent policies (#678)
* Fix evaluation script for RNN

* Add error message

* Revert "Add error message"

This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a.

* Fix for pytype

* Rename mask to `episode_start`

* Fix type hint

* Fix type hints

* Remove confusing part of sentence

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-30 13:49:06 +01:00
Benjamin Black
a038044d11
Added support for vector envs in evaluation (#447)
* added vector env support to evaluate_policy

* fixed linting and documentation

* updated changelog

* fixed code style issue

* added tests for vec env

* fixed formatting

* renamed observations

* added comments for vector evaluation

* fixed issues

* Cleanup + bump version

* Add comment

* Fix wrong count of episodes

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-05-28 12:40:29 +02:00
Costa Huang
ddbe0e93f9
Support for VecMonitor for gym3-style environments (#311)
* add vectorized monitor

* auto format of the code

* add documentation and VecExtractDictObs

* refactor and add test cases

* add test cases and format

* avoid circular import and fix doc

* fix type

* fix type

* oops

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update stable_baselines3/common/monitor.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* add test cases

* update changelog

* fix mutable argument

* quick fix

* Apply suggestions from code review

* fix terminal observation for gym3 envs

* delete comment

* Update doc and bump version

* Add warning when already using `Monitor` wrapper

* Update vecmonitor tests

* Fixes

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-04-13 18:09:31 +02:00
Antonin RAFFIN
c62e9259db
Add custom objects support + bug fix (#336)
* Add support for custom objects

* Add python 3.8 to the CI

* Bump version

* PyType fixes

* [ci skip] Fix typo

* Add note about slow-down + fix typos

* Minor edits to the doc

* Bug fix for DQN

* Update test

* Add test for custom objects
2021-03-06 15:17:43 +02:00
Anssi
18d10dbf42
Use Monitor episode reward/length for evaluate_policy (#220)
* Update evaluate_policy to use monitor data if available

* Update documentation

* Cleaning up

* Remove unnecessary typing trickery

* Update doc

* Rename is_wrapped to clarify it is for vecenvs

* Add is_wrapped for regular envs

* Add is_wrapped call for subprocvecenv and update code for circular imports

* Move new functions back to env_util and fix imports

* Update changelog

* Clarify evaluate_policy docs

* Add tests for wrapped modifying episode lengths

* Fix tests

* Update changelog

* Minor edits

* Add warn switch to evaluate_policy and update tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-16 11:52:28 +01:00
M. Ernestus
c74509ae9d
Add callable signatures to type annotations. (#215)
* Add callback signature to the learning rate type annotations.

* Add callback signature to the learning rate schedule type annotations.

* Add missing type annotations for learning rate callbacks.

* Add signature to old-style learning and evaluation callbacks.

* Add signature to env wrapper callback.

* Add type annotation to closure function.

* Use MaybeCallback more consistently.

* Update changelog.

* Remove now unused List import.

* Fix import order.

* Add type alias for learning rate schedules.

* Optimize imports.

* Fix messed up import.

* Remove resolved TODO.

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-15 17:50:28 +01:00
Antonin RAFFIN
55912576ed
Cleanup docstring types (#169)
* Cleanup docstring types

* Update style

* Test with js hack

* Revert "Test with js hack"

This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69.

* Fix types

* Fix typo

* Update CONTRIBUTING example
2020-10-02 20:05:55 +03:00
Antonin RAFFIN
2c924f52f5
Update docs (custom policy, type hints) (#167)
* Change import

* Update custom policy doc

* Re-enable sphinx_autodoc_typehints

* Update docker image

* Attempt to fix read the doc build error

* Add sphinx_autodoc_typehints to read the doc env

* Fix pip version

* Add full custom policy example

* Fix
2020-09-29 20:41:14 +03:00
Antonin RAFFIN
21e9994ff9
Fix double reset and improve typing coverage (#136)
* Fix double reset and improve typing coverage

* Revert minor edit

* Add doc about types
2020-08-05 13:12:02 +03:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Anssi
44f8218df0
Review of code (A2C, PPO and refactoring) (#35)
* Split torch module code into torch_layers file

* Updated reference to CNN

* Change 'CxWxH' to 'CxHxW', as per common notion

* Fix missing import in policies.py

* Move PPOPolicy to OnlineActorCriticPolicy

* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit

* Update A2C optimizer comment

* Clean weight init scales for clarity

* Fix A2C log_interval default parameter

* Rename 'progress' to 'progress_remaining

* Rename 'Models' to 'Algorithms'

* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'

* Move static functions out from BaseAlgorithm

* Move on/off_policy base algorithms to their own files

* Add  files for A2C/PPO

* Fix docs

* Fix pytype

* Update documentation on OnPolicyAlgorithm

* Add proper doctstring for on_policy rollout gathering

* Add bit clarification on the mlppolicy/cnnpolicy naming

* Move static function is_vectorized_policies to utils.py

* Checking docstrings, pep8 fixes

* Update changelog

* Clean changelog

* Remove policy warnings for sac/td3

* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-09 13:54:18 +02:00
Antonin RAFFIN
d542732c8d Rename to stable-baselines3 2020-05-05 15:02:35 +02:00
Renamed from torchy_baselines/common/evaluation.py (Browse further)