Commit graph

68 commits

Author SHA1 Message Date
Quentin Gallouédec
69fdf155e1
Downgrade sphinx-autodoc-typehints (#1291)
* Update setup.py

* black

* hotfix pytype
2023-01-23 10:56:45 +01:00
Quentin Gallouédec
9aff1137a9
Add support for Python 3.10 (#1227)
* Add python 3.10 and 3.11

* Update setup

* Fix CI

* Drop 3.11 (because of pytorch)

* Update changelog

* revert unwanted change in setup.cfg

* Remove remark about pytorch
2022-12-21 15:52:48 +01:00
Antonin Raffin
213b06b0c6
Monkey-patch np.bool = bool 2022-12-19 13:20:48 +01:00
Quentin Gallouédec
68a40e0940
Construct tensors directly on GPU (#1218)
* Replace .to(device) when possible

* fix numpy dep

* black

* Add warning for device != cpu and copy=False

* Update changelog

* Remove warning

* Update buffers.py
2022-12-19 12:50:22 +01:00
Quentin Gallouédec
e3b24829a5
Drop gym.GoalEnv and other minor changes initially from #780 (#1184)
* Various changes from #780

* Fix env_checker for goal_env detection
2022-11-28 18:22:31 +01:00
Quentin Gallouédec
abffa16198
Mypy type checking (#1143)
* Install and configure mypy

* Test if github CI uses setup.cfg for mypy

* force color output

* tab to space

* Try to fix regex

* follow_imports silent

* use space as indentation

* fix indentation setup.cfg

* Show error code

* Update doc

* Update changelog

* Ignore mypy cache files from commit

* Update gitlab CI

* Add pytype and mypy entry in Makefile

* Make mypy happy

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-11-16 13:22:57 +01:00
Adam Gleave
4fb8aec215
Update evaluate_policy type annotation to support policies as well as RL algorithms (#1146)
* Add PolicyPredictor protocol and use it in evaluate_policy

* Update changelog

* Move Protocol to type_aliases to avoid circular import

* Add test for evaluate_policy on BasePolicy

* Remove unused import

* Use typing_extensions

* Move typing_extensions to 3rd party

* Add version range (typing_extensions uses SemVer)

* Import Protocol from typing_extensions only on Python<3.8

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Install typing_extensions only on Python<3.8

* Add missing sys import

* Fix import ordering

* Fix observation type hint in predict

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-11-03 15:36:19 +01:00
Antonin RAFFIN
e2f81bb70b
Release v1.6.2 (#1103)
* Release v1.6.2

* Remove Gitlab CI, no more minutes
2022-10-10 16:37:11 +02:00
Antonin RAFFIN
7c21b79188
Add progress bar callback and argument (#1095)
* Add progress bar callback and argument

* Update doc

* Update changelog

* Upgrade pytype in docker image

* Use tqdm.write in the logger to have cleaner output

* Fix logger test

* Fix when doing multiple calls to learn()

* Address comments from code-review
2022-10-06 18:17:31 +02:00
Quentin Gallouédec
d3eb0e3ed6
Fix importlib dependency (#1088)
* Set requirement ``importlib-metadata~=4.13``

* Update changelog
2022-10-03 12:03:51 +02:00
Timothé
01cc127d32
Support hparams logging to tensorboard (#984)
* create Hparam class & support in all OutputFormats

* add hparams documentation & example

* add hparam tests

* remove unnecessary test & fix name

* format changes

* support hyperparameters logging to tensorboard

* fix HParams class docstring

* use more explicit variable names

* raise error instead of warning

* Unpin protobuf

* Add test for logging hparams

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-22 22:06:54 +02:00
Antonin RAFFIN
c1f1c3d3d7
Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button

* Update read the doc conda env

* Update year

* Fix bug in kl divergence check

* Rephrase requirement for envpool and isaac gym
2022-07-12 22:50:23 +02:00
Antonin RAFFIN
4b89fbf283
Fix issues due to newer version of protobuf and sphinx (#924) 2022-05-29 21:09:50 +02:00
Antonin RAFFIN
a6f5049a99
Upgrade code to Python 3.7+ syntax using pyupgrade (#887)
* Upgrade code to Python 3.7+ syntax

* Update changelog
2022-04-25 13:01:38 +03:00
Bryan Collazo
3c468ff558
Update ppo documentation (remove redundant and) (#874)
* Update ppo documentation (remove redundant and)

PTAL, thanks!

* Update changelog

* Pin ale-py version

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-04-19 14:15:51 +02:00
Antonin RAFFIN
248f082cdc
Bump min PyTorch version (#855) 2022-04-11 18:34:15 +02:00
Antonin RAFFIN
7ce4bb8016
Pin gym version (#782)
* Pin gym version

* Cleanup warnings

* Reformat
2022-02-21 23:12:54 +01:00
Carlos Luis
5143cd19f7
Gym fixes - Follow up from #705 (#734)
* fix Atari in CI

* fix dtype and atari extra

* Update setup.py

* remove 3.6

* note about how to install Atari

* pendulum-v1

* atari v5

* black

* fix pendulum capitalization

* add minimum version

* moved things in changelog to breaking changes

* partial v5 fix

* env update to pass tests

* mismatch env version fixed

* Fix tests after merge

* Include autorom in setup.py

* Blacken code

* Fix dtype issue in more robust way

* Fix GitLab CI: switch to Docker container with new black version

* Remove workaround from GitLab. (May need to rebuild Docker for this though.)

* Revert to v4

* Update setup.py

* Apply suggestions from code review

* Remove unnecessary autorom

* Consistent gym versions

Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
2022-02-04 15:13:57 -08:00
IperGiove
d9e198e04f
Update custom_policy.rst (#711)
* Update custom_policy.rst

Added methods forward_actor and forward_critic in CustomNetwork class.

* Update doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-01-03 16:22:58 +01:00
Antonin RAFFIN
77f4f5021d
Drop Python 3.6 support (#685)
* Drop python 3.6 support

* Update doc

* Update gitlab CI

* Update doc env

* Fix gitlab CI
2021-12-06 12:54:43 +01:00
Antonin RAFFIN
e907eca18e
Fix set_env to keep the number of timesteps (#615)
* Fix for `set_env`

* Add test and update changelog

* Use underscores and f-strings

* Add PyPi info

* Update comments
2021-10-23 16:36:40 +02:00
Cyprien
f3a35aa786
Add method predict_values for ActorCriticPolicy (#569)
* feat: add method predict_values for ActorCriticPolicy

* Fixes for new gym version

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-15 14:03:04 +02:00
Scott Brownlie
1afc2f3abe
Avoid putting target networks into training mode (#553)
* make sure DQN policy is always in correct mode - train or eval

* make set_training_mode an abstract method of the base policy - safer

* update docstring of _build method to note that the target network is put into eval mode

* use set_training_mode to put the dqn target network into eval mode

* use set_training_mode to set the training model of the q-network

* move set_training_mode abstract method from BasePolicy to BaseModel

* set train and eval mode for TD3

* make sure critic is always in correct mode during train

* set train and eval mode for SAC

* add comment re batch norm and dropout

* set train and eval mode for A2C and PPO

* add tests for collect rollouts with batch norm

* fix formatting

* update change log

* update version

* remove Optional typing for batch size - causing type check to fail

* Fix scipy dependency for toy text envs

* implement set_training_mode method in BaseModel

* move all tests of train/eval mode to test_train_eval_mode

* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm

* remove extra calls to set_training_mode in train method of TD3 and SAC

* Allow gradient_steps=0

* Refactor tests

* Add comment + use aliases

* Typos

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-30 17:42:41 +02:00
Antonin RAFFIN
2fa06ae8d2
Add Python3.9 CI + upgrade min PyTorch version (#503)
* Add Python3.9 CI + upgrade min PyTorch version

* Upgrade min PyTorch version
2021-07-06 09:32:03 +02:00
Antonin RAFFIN
8a08078ea2
Fix default arguments + add bugbear (#363)
* Fix potential bug + add bug bear

* Remove unused variables

* Minor: version bump
2021-03-25 11:35:21 +02:00
Antonin RAFFIN
944dfdafe4
Update doc: SB3-Contrib (#267)
* Fix bug when saving/loading q-net alone

* Rename variables to match SB3-contrib

* Update docker image

* Set min version for tensorboard

* Add SB3-Contrib to doc

* Update DQN

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update wording

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-12-21 16:17:24 +01:00
Antonin RAFFIN
2c924f52f5
Update docs (custom policy, type hints) (#167)
* Change import

* Update custom policy doc

* Re-enable sphinx_autodoc_typehints

* Update docker image

* Attempt to fix read the doc build error

* Add sphinx_autodoc_typehints to read the doc env

* Fix pip version

* Add full custom policy example

* Fix
2020-09-29 20:41:14 +03:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Noah
96b771f24e
Implement DQN (#28)
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug

* Changed Base Class

* refactor save to be consistent with overriding the excluded_save_params function. Do not try to exclude the parameters twice.

* Added simple DQN policy

* Finished learn and train function
- missing correct loss computation

* changed collect_rollouts to work with discrete space

* moved discrete space collect_rollouts to dqn

* basic dqn working

* deleted SDE related code

* added gradient clipping and moved greedy policy to policy

* changed policy to implement target network
and added soft update (in fact the standard tau is 1, so a hard update)

* fixed policy setup

* rebase target_update_intervall on _n_updates

* adapted all tests
all tests passing

* Move to stable-baseline3

* Fixes for DQN

* Fix tests + add CNNPolicy

* Allow any optimizer for DQN

* added some util functions to create an arbitrary linear schedule, fixed pickle problem with old exploration schedule

* more documentation

* changed buffer dtype

* refactor and document

* Added Sphinx Documentation
Updated changelog.rst

* removed custom collect_rollouts as it is no longer necessary

* Implemented suggestions to clean code and documentation.

* extracted some functions on tests to reduce duplicated code

* added support for exploration_fraction

* Fixed exploration_fraction

* Added documentation

* Fixed get_linear_fn -> proper progress scaling

* Merged master

* Added nature reference

* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1

* Fixed n_updates to be incremented correctly

* Correct train_freq

* Doc update

* added special parameter for DQN in tests

* different fix for test_discrete

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added RMSProp in optimizer_kwargs, as described in nature paper

* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper

* Changelog update for buffer dtype

* standard exclude parameters should always be excluded to ensure proper saving, unless intentionally included by the ``include`` parameter

* slightly more iterations on test_discrete to pass the test

* added param use_rms_prop instead of mutable default argument

* forgot alpha

* using huber loss, adam and learning rate 1e-4

* account for train_freq in update_target_network

* Added memory check for both buffers

* Doc updated for buffer allocation

* Added psutil Requirement

* Adapted test_identity.py

* Fixes with new SB3 version

* Fix for tensorboard name

* Convert assert to warning and fix tests

* Refactor off-policy algorithms

* Fixes

* test: remove next_obs in replay buffer

* Update changelog

* Fix tests and use tmp_path where possible

* Fix sampling bug in buffer

* Do not store next obs on episode termination

* Fix replay buffer sampling

* Update comment

* moved epsilon from policy to model

* Update predict method

* Update atari wrappers to match SB2

* Minor edit in the buffers

* Update changelog

* Merge branch 'master' into dqn

* Update DQN to new structure

* Fix tests and remove hardcoded path

* Fix for DQN

* Disable memory efficient replay buffer by default

* Fix docstring

* Add tests for memory efficient buffer

* Update changelog

* Split collect rollout

* Move target update outside `train()` for DQN

* Update changelog

* Update linear schedule doc

* Cleanup DQN code

* Minor edit

* Update version and docker images

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-29 11:16:54 +02:00
Roland Gavrilescu
bb01253261
Tensorboard integration (#30)
* init commit tensorboard-integration

* Added tb logger to ppo (with output exclusions)

* fixed truncated stdout

* categorize stdout outputs by tag

* separated exclusions from values, added missing logs

* saving exclusions as dict instead of list

* reformatting, auto run indexing

* included renaming suggestions, fixed tests

* tb support for sac

* linting

* moved logging to base class

* tb support for td3

* removed histograms, non-verbose output working

* modified changelog

* linting

* fixed type error

* moved logger config to utils

* removed episode_rewards log from ppo

* Enable tensorboard in tests

* Remove unused import

* Update logger sub titles

* Minor edit for PPO

* Update logger and tb log folder

* Pass correct logger to Callbacks

* updated docs

* added tb example image to docs

* add support for continuing training in tensorboard

* added tensorboard to docs index

* added tb test

* moved logger config to _setup_learn, updated tests

* accessing verbose from base class

* Update doc and tests

* Rename session -> time

* Update version

* Update logger truncate

* Update types

* Remove duplicated code

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-01 11:55:44 +02:00
Antonin RAFFIN
54f6f5b6fb
Add flake8 linter and Github CI (#19)
* Cleanup code

* Add flake8 lint and github workflow

* Update build matrix

* Relax precision for python3.7
2020-05-12 17:55:01 +02:00
Antonin RAFFIN
299c28140e
Use conda env for building the doc (#16) 2020-05-11 17:39:41 +02:00
Antonin RAFFIN
b1794ebc52 [ci skip] Simplify quickstart example 2020-05-11 15:32:01 +02:00
Antonin RAFFIN
97aea21349 Update minimum gym version 2020-05-08 12:43:42 +02:00
Antonin RAFFIN
26981f1247 Build the doc 2020-05-07 17:35:29 +02:00
Antonin RAFFIN
e6ff4bbd6c Update setup 2020-05-07 16:24:19 +02:00
Antonin RAFFIN
aa66012764 Update requirements 2020-05-07 16:21:33 +02:00
Antonin RAFFIN
8046a24719 More doc + sync VecEnvs + atari 2020-05-07 16:08:23 +02:00
Antonin RAFFIN
73afaf157c Add version.txt to package 2020-05-07 12:19:29 +02:00
Antonin RAFFIN
d17f29c8ad Add base doc 2020-05-07 10:10:51 +02:00
Antonin RAFFIN
94b1267817 Update long description 2020-05-07 08:59:54 +02:00
Antonin RAFFIN
2c34a4d694 Sync with Stable-Baselines 2020-05-05 16:28:38 +02:00
Antonin RAFFIN
d542732c8d Rename to stable-baselines3 2020-05-05 15:02:35 +02:00
Antonin RAFFIN
9485b90a41 Sync predict with SB and add version file 2020-03-18 15:11:19 +01:00
Antonin RAFFIN
b37c23c149 Bump version and fix 2020-03-16 14:05:21 +01:00
Antonin Raffin
70e601c03c Improve code and bump version 2020-03-12 15:34:35 +01:00
Antonin Raffin
6ebad92e1b Remove default seed and bump dependencies 2020-03-10 17:43:54 +01:00
Antonin Raffin
80fb62e22d Bump version 2020-03-10 17:10:15 +01:00
Antonin Raffin
26ccf499b3 Use normal sampling for SAC 2020-02-21 14:50:28 +01:00
Antonin Raffin
809a3d3d38 Release 0.2.0 2020-02-14 14:39:24 +01:00