Commit graph

21 commits

Author SHA1 Message Date
Megan Klaiber
dd6e361204
Implement HER (#120)
* Added working her version, Online sampling is missing.

* Updated test_her.

* Added first version of online her sampling. Still problems with tensor dimensions.

* Reformat

* Fixed tests

* Added some comments.

* Updated changelog.

* Add missing init file

* Fixed some small bugs.

* Reduced arguments for HER, small changes.

* Added getattr. Fixed bug for online sampling.

* Updated save/load funtions. Small changes.

* Added her to init.

* Updated save method.

* Updated her ratio.

* Move obs_wrapper

* Added DQN test.

* Fix potential bug

* Offline and online her share same sample_goal function.

* Changed lists into arrays.

* Updated her test.

* Fix online sampling

* Fixed action bug. Updated time limit for episodes.

* Updated convert_dict method to take keys as arguments.

* Renamed obs dict wrapper.

* Seed bit flipping env

* Remove get_episode_dict

* Add fast online sampling version

* Added documentation.

* Vectorized reward computation

* Vectorized goal sampling

* Update time limit for episodes in online her sampling.

* Fix max episode length inference

* Bug fix for Fetch envs

* Fix for HER + gSDE

* Reformat (new black version)

* Added info dict to compute new reward. Check her_replay_buffer again.

* Fix info buffer

* Updated done flag.

* Fixes for gSDE

* Offline her version uses now HerReplayBuffer as episode storage.

* Fix num_timesteps computation

* Fix get torch params

* Vectorized version for offline sampling.

* Modified offline her sampling to use sample method of her_replay_buffer

* Updated HER tests.

* Updated documentation

* Cleanup docstrings

* Updated to review comments

* Fix pytype

* Update according to review comments.

* Removed random goal strategy. Updated sample transitions.

* Updated migration. Removed time signal removal.

* Update doc

* Fix potential load issue

* Add VecNormalize support for dict obs

* Updated saving/loading replay buffer for HER.

* Fix test memory usage

* Fixed save/load replay buffer.

* Fixed save/load replay buffer

* Fixed transition index after loading replay buffer in online sampling

* Better error handling

* Add tests for get_time_limit

* More tests for VecNormalize with dict obs

* Update doc

* Improve HER description

* Add test for sde support

* Add comments

* Add comments

* Remove check that was always valid

* Fix for terminal observation

* Updated buffer size in offline version and reset of HER buffer

* Reformat

* Update doc

* Remove np.empty + add doc

* Fix loading

* Updated loading replay buffer

* Separate online and offline sampling + bug fixes

* Update tensorboard log name

* Version bump

* Bug fix for special case

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-22 11:56:43 +02:00
Antonin RAFFIN
2599f04940
Add custom arch for off-policy actor/critic networks (#182)
* Add custom arch for off-policy actor/critic networks

* Fix type hints

* Address comments

* Make sure number of updated parameters match in polyak

* Add zip_strict for strict-length zipping

* Fix building docs

* Add test for zip strict

* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-13 12:01:33 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Antonin RAFFIN
c20af230f7 Remove SDE support for TD3 2020-05-08 15:00:34 +02:00
Antonin RAFFIN
d542732c8d Rename to stable-baselines3 2020-05-05 15:02:35 +02:00
Antonin RAFFIN
fd9e73cfb8 Fix entropy computation 2020-03-19 10:19:48 +01:00
Antonin Raffin
037986a91d Add test for expln 2020-03-11 16:35:13 +01:00
Antonin Raffin
c542009641 Clean up code + bug fixes 2020-01-20 11:17:55 +01:00
Antonin Raffin
3cdd5f20af Bug fix + add test for sde net arch 2019-12-02 14:14:48 +01:00
Antonin Raffin
5483e02d1a Add SDE support for SAC 2019-11-26 15:26:12 +01:00
Antonin Raffin
d26fcf4566 Fix grad computation for sde test 2019-11-26 11:57:48 +01:00
Antonin Raffin
0885dbe74b Bug fix in choosing the distribution 2019-11-25 15:02:10 +01:00
Antonin Raffin
5d6649d92b Enable separate feature extraction for SDE 2019-11-25 14:54:13 +01:00
Antonin Raffin
ad32aa60f3 Add sde scheduler 2019-11-18 16:03:08 +01:00
Antonin Raffin
d8a7556d84 Merge branch 'feat/sde' into feat/offpolicy-sde 2019-11-18 15:14:05 +01:00
Antonin Raffin
5d353d598c Start cleanup + update docstrings 2019-11-18 14:09:31 +01:00
Antonin Raffin
fb64072859 Update sde test 2019-11-15 11:07:49 +01:00
Antonin Raffin
db87e0d36a Quick and dirty SDE version for TD3 2019-11-07 17:31:52 +01:00
Antonin Raffin
72a6f18e43 Add sde test + fix random seed 2019-10-31 14:14:30 +01:00
Antonin Raffin
42d50ed09b Add expln 2019-10-29 15:15:54 +01:00
Antonin Raffin
c15b4bda1e Add first draft of SDE 2019-10-28 18:24:13 +01:00