Commit graph

39 commits

Author SHA1 Message Date
Antonin RAFFIN
508f8ffd59
Remove deprecated features and attributes (#1104)
* Remove deprecated eval env

* Remove deprecated ret attribute

* Remove sde net arch

* Remove unused code

* Update test comment

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-10-11 10:55:16 +02:00
tobirohrer
d8a430e088
Deprecate create_eval_env, eval_env and eval_freq parameter (#1082)
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925

* added changelog entry

* added missing backtick

* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used

* Updated tests to ignore DeprecationWarnings

* Updated changelog entry

* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs

* Add alternative in docstring

* Update docstrings

* `eval_freq` warning in docstring

* Add deprecation comments in tests

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-10-10 15:39:38 +02:00
Hugh Perkins
2cc1477fa2
Fix advantage normalization with mini-batchsize of 1 (#1028)
* fix nan in advantages with batch size 1, for ppo

* changelog

* black

* Simplify test

* Bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-25 11:50:08 +02:00
Adam Gleave
b1cc15970a
Use higher resolution time_ns() and avoid division by zero (#979)
* Use higher resolution time and round up to eps

* Update changelog

* Add test case

* Fix formatting, time()->time_ns

* Bugfix: ns is integer not float

* Move test to better place

* Divide by 1e9 earlier
2022-07-25 23:02:53 +02:00
Costa Huang
d2ebd2eeaa
Allow PPO to turn off advantage normalization (#763)
* Allow PPO to turn off advantage normalization

* update changelog

* Add a test case

* Update test and sanity check

* Fix tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-22 15:29:21 +01:00
Carlos Luis
5143cd19f7
Gym fixes - Follow up from #705 (#734)
* fix Atari in CI

* fix dtype and atari extra

* Update setup.py

* remove 3.6

* note about how to install Atari

* pendulum-v1

* atari v5

* black

* fix pendulum capitalization

* add minimum version

* moved things in changelog to breaking changes

* partial v5 fix

* env update to pass tests

* mismatch env version fixed

* Fix tests after merge

* Include autorom in setup.py

* Blacken code

* Fix dtype issue in more robust way

* Fix GitLab CI: switch to Docker container with new black version

* Remove workaround from GitLab. (May need to rebuild Docker for this though.)

* Revert to v4

* Update setup.py

* Apply suggestions from code review

* Remove unnecessary autorom

* Consistent gym versions

Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
2022-02-04 15:13:57 -08:00
Antonin RAFFIN
507ed1762e
Multiprocessing support for off policy algorithms (#439)
* Add multi-env training support for SAC

* Fix for dict obs

* Pytype fixes

* Fix assert on number of envs

* Remove for loop

* Add support for Dict obs

* Start cleanup

* Update doc and bug fix

* Add support for vectorized action noise
and add multi env example for off-policy

* Update version

* Bug fix with VecNormalize

* Update README table

* Update variable names

* Update changelog and version

* Update doc and fix for `gradient_steps=-1`

* Add test for `gradient_steps=-1`

* Disable pytype pyi errors

* Fix for DQN

* Update comment on deepcopy

* Remove episode_reward field

* Fix RolloutReturn

* Avoid modification by reference

* Fix error message

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-01 22:30:09 +01:00
Antonin RAFFIN
b2c94a677d
Fix train_freq at load time (#332)
* Fix train_freq loading

* Update docker

* Add sanity checks + tests for train freq
2021-02-27 19:53:13 +01:00
Antonin RAFFIN
0fc0dd1b21
Fix off policy features extractor (#198)
* Faster tests

* Fix feature extractor bug + add check

* Add missing check

* Allow TD3 features extractor to be separate

* Add share features extractor option for SAC

* Bug fixes

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-10-27 14:24:59 +01:00
Antonin RAFFIN
2599f04940
Add custom arch for off-policy actor/critic networks (#182)
* Add custom arch for off-policy actor/critic networks

* Fix type hints

* Address comments

* Make sure number of updated parameters match in polyak

* Add zip_strict for strict-length zipping

* Fix building docs

* Add test for zip strict

* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-13 12:01:33 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Antonin RAFFIN
5ff176b2f1
Implement DDPG (#92)
* Add DDPG + TD3 with any number of critics

* Allow any number of critics for SAC

* Update doc

* [ci skip] Update DDPG example

* Remove unused parameter

* Add DDPG to identity test

* Fix computation with n_critics=1,3

* Update doc

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docstrings for off-policy algos

* Add check for sde

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-07-16 14:14:22 +02:00
Noah
96b771f24e
Implement DQN (#28)
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug

* Changed Base Class

* refactor save to be consistent with overriding the excluded_save_params function. Do not try to exclude the parameters twice.

* Added simple DQN policy

* Finished learn and train function
- missing correct loss computation

* changed collect_rollouts to work with discrete space

* moved discrete space collect_rollouts to dqn

* basic dqn working

* deleted SDE related code

* added gradient clipping and moved greedy policy to policy

* changed policy to implement target network
and added soft update (in fact the standard tau is 1, so it is a hard update)

* fixed policy setup

* rebase target_update_intervall on _n_updates

* adapted all tests
all tests passing

* Move to stable-baselines3

* Fixes for DQN

* Fix tests + add CNNPolicy

* Allow any optimizer for DQN

* added some util functions to create an arbitrary linear schedule, fixed pickle problem with old exploration schedule

* more documentation

* changed buffer dtype

* refactor and document

* Added Sphinx Documentation
Updated changelog.rst

* removed custom collect_rollouts as it is no longer necessary

* Implemented suggestions to clean code and documentation.

* extracted some functions on tests to reduce duplicated code

* added support for exploration_fraction

* Fixed exploration_fraction

* Added documentation

* Fixed get_linear_fn -> proper progress scaling

* Merged master

* Added nature reference

* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1

* Fixed n_updates to be incremented correctly

* Correct train_freq

* Doc update

* added special parameter for DQN in tests

* different fix for test_discrete

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Update docs/modules/dqn.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Added RMSProp in optimizer_kwargs, as described in nature paper

* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper

* Changelog update for buffer dtype

* standard excluded parameters should always be excluded to ensure proper saving, unless intentionally included via the ``include`` parameter

* slightly more iterations on test_discrete to pass the test

* added param use_rms_prop instead of mutable default argument

* forgot alpha

* using huber loss, adam and learning rate 1e-4

* account for train_freq in update_target_network

* Added memory check for both buffers

* Doc updated for buffer allocation

* Added psutil Requirement

* Adapted test_identity.py

* Fixes with new SB3 version

* Fix for tensorboard name

* Convert assert to warning and fix tests

* Refactor off-policy algorithms

* Fixes

* test: remove next_obs in replay buffer

* Update changelog

* Fix tests and use tmp_path where possible

* Fix sampling bug in buffer

* Do not store next obs on episode termination

* Fix replay buffer sampling

* Update comment

* moved epsilon from policy to model

* Update predict method

* Update atari wrappers to match SB2

* Minor edit in the buffers

* Update changelog

* Merge branch 'master' into dqn

* Update DQN to new structure

* Fix tests and remove hardcoded path

* Fix for DQN

* Disable memory efficient replay buffer by default

* Fix docstring

* Add tests for memory efficient buffer

* Update changelog

* Split collect rollout

* Move target update outside `train()` for DQN

* Update changelog

* Update linear schedule doc

* Cleanup DQN code

* Minor edit

* Update version and docker images

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-29 11:16:54 +02:00
Anssi
b833207142
Add some missing tests, update VecNormalize and RolloutBuffer (#50)
* Change saving/loading normalization parameters to use single pickle file

* Remove 'use_gae' from RolloutBuffer compute_returns function

* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument

* Update changelog

* Fix typo

* Use proper pytest.raises for catching errors in tests

* Add comment on GAE and how to obtain non-GAE behaviour

* Remove save/load_running_average from VecNormalize in favor of load/save

* Update changelog

* Update docstring

* Add accidentally removed tests for VecNormalize

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-10 12:09:04 +02:00
Antonin RAFFIN
d542732c8d Rename to stable-baselines3 2020-05-05 15:02:35 +02:00
Antonin RAFFIN
7ae54206ce Reformat and code cleanup 2020-04-23 15:18:21 +02:00
Antonin RAFFIN
dcb54b5301 Remove CEMRL 2020-03-23 14:48:38 +01:00
Antonin Raffin
b64873ffff Sync callbacks 2020-03-12 12:34:25 +01:00
Antonin Raffin
18f38f8cf5 Reformat 2020-03-12 11:12:10 +01:00
Antonin Raffin
e31b139c47 Add test for predict method 2020-02-14 14:03:41 +01:00
Antonin Raffin
b66003cfb3 Add callback support 2020-01-27 14:32:31 +01:00
Antonin Raffin
0117cc37f4 Merge branch 'master' into feat/sde-features 2019-12-05 16:33:41 +01:00
Antonin Raffin
21e655ecbf Add test for SAC with different entropy temperature 2019-12-02 11:47:52 +01:00
Noah Dormann
cfb822aa91 Corrected test_run.py 2019-11-21 16:54:30 +01:00
Noah Dormann
17f84053b3 save implementation for a2c needed before uncommenting save and load test in test_run.py::test_onpolicy 2019-11-21 14:44:02 +01:00
Noah Dormann
fb5f192fc4 Implemented Changes suggested from Antonin-Raffin
Added Optimizer saving
2019-11-21 14:39:44 +01:00
Noah Dormann
a7655ca6e1 Reformatted every file with PEP 8 errors 2019-11-21 13:01:03 +01:00
Noah Dormann
cc744a48b5 first save and load features 2019-11-12 17:03:57 +01:00
Antonin Raffin
0ad743c85d Add A2C 2019-10-25 10:59:15 +02:00
Antonin Raffin
ef50bb81e8 Add support for categorical distribution 2019-10-08 13:06:38 +02:00
Antonin Raffin
37ab9d10f1 Rescale actions and add action noise 2019-10-07 16:26:03 +02:00
Antonin Raffin
32648d9029 Add docstrings 2019-09-24 15:30:58 +02:00
Antonin Raffin
d22caac616 Working SAC 2019-09-24 14:15:12 +02:00
Antonin RAFFIN
2469ff3859 Reformat 2019-09-21 17:17:09 +02:00
Antonin Raffin
a9b8276efb Attempt to fix loss of perf because of VecEnvs 2019-09-20 18:06:08 +02:00
Antonin Raffin
0e727a5f72 Full compat for VecEnv + bug fixes for cuda 2019-09-20 16:43:19 +02:00
Antonin RAFFIN
fe8b415cbf First sign of life 2019-09-19 16:21:28 +02:00
Antonin RAFFIN
e1c1d5c4ab Bug fixes (not working yet) 2019-09-18 22:12:32 +02:00
Antonin RAFFIN
6bb7e183d2 Running PPO (not working yet) 2019-09-18 15:35:17 +02:00