Commit graph

32 commits

Author SHA1 Message Date
Antonin RAFFIN
b2c94a677d
Fix train_freq at load time (#332)
* Fix train_freq loading

* Update docker

* Add sanity checks + tests for train freq
2021-02-27 19:53:13 +01:00
Antonin RAFFIN
0fc0dd1b21
Fix off policy features extractor (#198)
* Faster tests

* Fix feature extractor bug + add check

* Add missing check

* Allow TD3 features extractor to be separate

* Add share features extractor option for SAC

* Bug fixes

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-10-27 14:24:59 +01:00
Antonin RAFFIN
2599f04940
Add custom arch for off-policy actor/critic networks (#182)
* Add custom arch for off-policy actor/critic networks

* Fix type hints

* Address comments

* Make sure number of updated parameters match in polyak

* Add zip_strict for strict-length zipping

* Fix building docs

* Add test for zip strict

* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-13 12:01:33 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Antonin RAFFIN
5ff176b2f1
Implement DDPG (#92)
* Add DDPG + TD3 with any number of critics

* Allow any number of critics for SAC

* Update doc

* [ci skip] Update DDPG example

* Remove unused parameter

* Add DDPG to identity test

* Fix computation with n_critics=1,3

* Update doc

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docstrings for off-policy algos

* Add check for sde

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-07-16 14:14:22 +02:00
Noah
96b771f24e
Implement DQN (#28)
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug

* Changed Base Class

* refactor save, to be consistence with overriding the excluded_save_params function. Do not try to exclude the parameters twice.

* Added simple DQN policy

* Finished learn and train function
- missing correct loss computation

* changed collect_rollouts to work with discrete space

* moved discrete space collect_rollouts to dqn

* basic dqn working

* deleted SDE related code

* added gradient clipping and moved greedy policy to policy

* changed policy to implement target network
and added soft update(in fact standart tau is 1 so hard update)

* fixed policy setup

* rebase target_update_intervall on _n_updates

* adapted all tests
all tests passing

* Move to stable-baseline3

* Fixes for DQN

* Fix tests + add CNNPolicy

* Allow any optimizer for DQN

* added some util functions to create a arbitrary linear schedule, fixed pickle problem with old exploration schedule

* more documentation

* changed buffer dtype

* refactor and document

* Added Sphinx Documentation
Updated changelog.rst

* removed custom collect_rollouts as it is no longer necessary

* Implemented suggestions to clean code and documentation.

* extracted some functions on tests to reduce duplicated code

* added support for exploration_fraction

* Fixed exploration_fraction

* Added documentation

* Fixed get_linear_fn -> proper progress scaling

* Merged master

* Added nature reference

* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1

* Fixed n_updates to be incremented correctly

* Correct train_freq

* Doc update

* added special parameter for DQN in tests

* different fix for test_discrete

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added RMSProp in optimizer_kwargs, as described in nature paper

* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper

* Changelog update for buffer dtype

* standard exlude parameters should be always excluded to assure proper saving only if intentionally included by ``include`` parameter

* slightly more iterations on test_discrete to pass the test

* added param use_rms_prop instead of mutable default argument

* forgot alpha

* using huber loss, adam and learning rate 1e-4

* account for train_freq in update_target_network

* Added memory check for both buffers

* Doc updated for buffer allocation

* Added psutil Requirement

* Adapted test_identity.py

* Fixes with new SB3 version

* Fix for tensorboard name

* Convert assert to warning and fix tests

* Refactor off-policy algorithms

* Fixes

* test: remove next_obs in replay buffer

* Update changelog

* Fix tests and use tmp_path where possible

* Fix sampling bug in buffer

* Do not store next obs on episode termination

* Fix replay buffer sampling

* Update comment

* moved epsilon from policy to model

* Update predict method

* Update atari wrappers to match SB2

* Minor edit in the buffers

* Update changelog

* Merge branch 'master' into dqn

* Update DQN to new structure

* Fix tests and remove hardcoded path

* Fix for DQN

* Disable memory efficient replay buffer by default

* Fix docstring

* Add tests for memory efficient buffer

* Update changelog

* Split collect rollout

* Move target update outside `train()` for DQN

* Update changelog

* Update linear schedule doc

* Cleanup DQN code

* Minor edit

* Update version and docker images

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-29 11:16:54 +02:00
Anssi
b833207142
Add some missing tests, update VecNormalize and RolloutBuffer (#50)
* Change saving/loading normalization parameters to use single pickle file

* Remove 'use_gae' from RolloutBuffer compute_returns function

* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument

* Update changelog

* Fix typo

* Use proper pytest.raises for catching errors in tests

* Add comment on GAE and how to obtain non-GAE behaviour

* Remove save/load_running_average from VecNormalize in favor of load/save

* Update changelog

* Update docstring

* Add accidentally removed tests for VecNormalize

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-10 12:09:04 +02:00
Antonin RAFFIN
d542732c8d Rename to stable-baselines3 2020-05-05 15:02:35 +02:00
Antonin RAFFIN
7ae54206ce Reformat and code cleanup 2020-04-23 15:18:21 +02:00
Antonin RAFFIN
dcb54b5301 Remove CEMRL 2020-03-23 14:48:38 +01:00
Antonin Raffin
b64873ffff Sync callbacks 2020-03-12 12:34:25 +01:00
Antonin Raffin
18f38f8cf5 Reformat 2020-03-12 11:12:10 +01:00
Antonin Raffin
e31b139c47 Add test for predict method 2020-02-14 14:03:41 +01:00
Antonin Raffin
b66003cfb3 Add callback support 2020-01-27 14:32:31 +01:00
Antonin Raffin
0117cc37f4 Merge branch 'master' into feat/sde-features 2019-12-05 16:33:41 +01:00
Antonin Raffin
21e655ecbf Add test for SAC with different entropy temperature 2019-12-02 11:47:52 +01:00
Noah Dormann
cfb822aa91 Corrected test_run.py 2019-11-21 16:54:30 +01:00
Noah Dormann
17f84053b3 save implementation for a2c needed before uncommenting save and load test in test_run.py::test_onpolicy 2019-11-21 14:44:02 +01:00
Noah Dormann
fb5f192fc4 Implemented Changes suggested from Antonin-Raffin
Added Optimizer saving
2019-11-21 14:39:44 +01:00
Noah Dormann
a7655ca6e1 Reformated every file with PEP 8 errors 2019-11-21 13:01:03 +01:00
Noah Dormann
cc744a48b5 first save and load features 2019-11-12 17:03:57 +01:00
Antonin Raffin
0ad743c85d Add A2C 2019-10-25 10:59:15 +02:00
Antonin Raffin
ef50bb81e8 Add support for categorical distribution 2019-10-08 13:06:38 +02:00
Antonin Raffin
37ab9d10f1 Rescale actions and add action noise 2019-10-07 16:26:03 +02:00
Antonin Raffin
32648d9029 Add docstrings 2019-09-24 15:30:58 +02:00
Antonin Raffin
d22caac616 Working SAC 2019-09-24 14:15:12 +02:00
Antonin RAFFIN
2469ff3859 Reformat 2019-09-21 17:17:09 +02:00
Antonin Raffin
a9b8276efb Attempt to fix loss of perf because of VecEnvs 2019-09-20 18:06:08 +02:00
Antonin Raffin
0e727a5f72 Full compat for VecEnv + bug fixes for cuda 2019-09-20 16:43:19 +02:00
Antonin RAFFIN
fe8b415cbf First sign of life 2019-09-19 16:21:28 +02:00
Antonin RAFFIN
e1c1d5c4ab Bug fixes (not working yet) 2019-09-18 22:12:32 +02:00
Antonin RAFFIN
6bb7e183d2 Running PPO (not working yet) 2019-09-18 15:35:17 +02:00