23 commits

Author SHA1 Message Date
Antonin RAFFIN
c62e9259db
Add custom objects support + bug fix (#336)
* Add support for custom objects

* Add python 3.8 to the CI

* Bump version

* PyType fixes

* [ci skip] Fix typo

* Add note about slow-down + fix typos

* Minor edits to the doc

* Bug fix for DQN

* Update test

* Add test for custom objects
2021-03-06 15:17:43 +02:00
M. Ernestus
0c50d75ecb
TD3 Code review (#245)
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.

* Add learning rate schedule example (#248)

* Add learning rate schedule example

* Update docs/guide/examples.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Address comments

Co-authored-by: Adam Gleave <adam@gleave.me>

* Add supported action spaces checks (#254)

* Add supported action spaces checks

* Address comment

* Use `pass` in an abstractmethod instead of deleting the arguments.

* Remove the "deterministic" keyword from the forward method of the TD3 Actor, since it is always deterministic anyway.

* Rename _get_data to _get_data_to_reconstruct_model.

_get_data was too generic and could have meant anything.

* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.

* Fix docstring of `train_freq` parameter.

* Black fixes.

* Fix TD3 delayed update + rename `_get_data()`

* Fix TD3 test

* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.

* Make one step the default train frequency.

* Black fixes.

* Change np.bool to bool.

* Use the tuple format to specify a duration in terms of steps or episodes in the collect_rollouts of the off-policy algorithm.

* Use the tuple format to specify a duration in terms of steps or episodes in the collect_rollouts of HER.

* Use named tuple for train freq

* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.

* Black fixes.

* Revert to train_freq

* Fix terminal observation issues

* Typo

* Fix action noise bug in HER

* Add assert when loading HER models

* Update version

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
2021-02-27 17:33:50 +01:00
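
The `train_freq` change listed in the commit above (accept a tuple, normalize it in the constructor, assert instead of warn, default to one step) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the commit: `TrainFreq` is named in the log, but `TrainFrequencyUnit`, `normalize_train_freq`, and the `"step"` / `"episode"` strings are hypothetical names chosen for this example.

```python
from enum import Enum
from typing import NamedTuple, Tuple, Union


class TrainFrequencyUnit(Enum):
    # Hypothetical unit names; the log only says "steps or episodes".
    STEP = "step"
    EPISODE = "episode"


class TrainFreq(NamedTuple):
    # Named tuple so call sites can read .frequency and .unit explicitly.
    frequency: int
    unit: TrainFrequencyUnit


def normalize_train_freq(train_freq: Union[int, Tuple[int, str]] = 1) -> TrainFreq:
    """Convert an int or a (frequency, unit) tuple into a TrainFreq.

    A bare int is interpreted as a number of steps, and the default is
    one step, mirroring the "one step default" item in the log.
    """
    if isinstance(train_freq, TrainFreq):
        return train_freq
    if not isinstance(train_freq, tuple):
        train_freq = (train_freq, "step")
    # Fail loudly on malformed input instead of emitting a warning.
    assert len(train_freq) == 2, f"Expected (frequency, unit), got {train_freq}"
    frequency, unit = train_freq
    assert isinstance(frequency, int) and frequency > 0, "frequency must be a positive int"
    # Enum lookup raises ValueError for an unknown unit string.
    return TrainFreq(frequency, TrainFrequencyUnit(unit))
```

With this sketch, `normalize_train_freq(4)` is treated as four steps, while `normalize_train_freq((2, "episode"))` is treated as two episodes, which is one way a separate `n_episodes_rollout` parameter becomes unnecessary.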
Antonin RAFFIN
c722c4f5bd
Fix numpy warning and update migration guide (#307)
2021-02-01 11:24:44 +01:00
Anssi
18d10dbf42
Use Monitor episode reward/length for evaluate_policy (#220)
* Update evaluate_policy to use monitor data if available

* Update documentation

* Cleaning up

* Remove unnecessary typing trickery

* Update doc

* Rename is_wrapped to clarify it is for vecenvs

* Add is_wrapped for regular envs

* Add is_wrapped call for subprocvecenv and update code for circular imports

* Move new functions back to env_util and fix imports

* Update changelog

* Clarify evaluate_policy docs

* Add tests for wrapped modifying episode lengths

* Fix tests

* Update changelog

* Minor edits

* Add warn switch to evaluate_policy and update tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-16 11:52:28 +01:00
Anssi
e2b6f5460f
Avoid transposing channel-first envs (#213)
* Add test for channel-first environments

* Add support for channel-first envs, including more tests

* Update changelog

* Run black

* Run black, again

* Improve NatureCNN error message

* Update image checks and FrameStack wrapper

* Update tests

* Update docs

* Run isort

* Reformat

* Fixes: avoid breaking changes for non-image env

* Add additional checks

* Update docstring

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-11-03 12:34:09 +01:00
Megan Klaiber
dd6e361204
Implement HER (#120)
* Added working her version, Online sampling is missing.

* Updated test_her.

* Added first version of online her sampling. Still problems with tensor dimensions.

* Reformat

* Fixed tests

* Added some comments.

* Updated changelog.

* Add missing init file

* Fixed some small bugs.

* Reduced arguments for HER, small changes.

* Added getattr. Fixed bug for online sampling.

* Updated save/load functions. Small changes.

* Added her to init.

* Updated save method.

* Updated her ratio.

* Move obs_wrapper

* Added DQN test.

* Fix potential bug

* Offline and online her share same sample_goal function.

* Changed lists into arrays.

* Updated her test.

* Fix online sampling

* Fixed action bug. Updated time limit for episodes.

* Updated convert_dict method to take keys as arguments.

* Renamed obs dict wrapper.

* Seed bit flipping env

* Remove get_episode_dict

* Add fast online sampling version

* Added documentation.

* Vectorized reward computation

* Vectorized goal sampling

* Update time limit for episodes in online her sampling.

* Fix max episode length inference

* Bug fix for Fetch envs

* Fix for HER + gSDE

* Reformat (new black version)

* Added info dict to compute new reward. Check her_replay_buffer again.

* Fix info buffer

* Updated done flag.

* Fixes for gSDE

* Offline HER version now uses HerReplayBuffer as episode storage.

* Fix num_timesteps computation

* Fix get torch params

* Vectorized version for offline sampling.

* Modified offline her sampling to use sample method of her_replay_buffer

* Updated HER tests.

* Updated documentation

* Cleanup docstrings

* Updated to review comments

* Fix pytype

* Update according to review comments.

* Removed random goal strategy. Updated sample transitions.

* Updated migration. Removed time signal removal.

* Update doc

* Fix potential load issue

* Add VecNormalize support for dict obs

* Updated saving/loading replay buffer for HER.

* Fix test memory usage

* Fixed save/load replay buffer.

* Fixed save/load replay buffer

* Fixed transition index after loading replay buffer in online sampling

* Better error handling

* Add tests for get_time_limit

* More tests for VecNormalize with dict obs

* Update doc

* Improve HER description

* Add test for sde support

* Add comments

* Add comments

* Remove check that was always valid

* Fix for terminal observation

* Updated buffer size in offline version and reset of HER buffer

* Reformat

* Update doc

* Remove np.empty + add doc

* Fix loading

* Updated loading replay buffer

* Separate online and offline sampling + bug fixes

* Update tensorboard log name

* Version bump

* Bug fix for special case

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-22 11:56:43 +02:00
Antonin RAFFIN
a1e055695c
Improve typing coverage (#175)
* Improve typing coverage

* Even more types

* Fixes

* Update changelog

* Unified docstrings

* Improve error messages for unsupported spaces
2020-10-07 10:51:49 +02:00
Antonin RAFFIN
55912576ed
Cleanup docstring types (#169)
* Cleanup docstring types

* Update style

* Test with js hack

* Revert "Test with js hack"

This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69.

* Fix types

* Fix typo

* Update CONTRIBUTING example
2020-10-02 20:05:55 +03:00
Antonin RAFFIN
21e9994ff9
Fix double reset and improve typing coverage (#136)
* Fix double reset and improve typing coverage

* Revert minor edit

* Add doc about types
2020-08-05 13:12:02 +03:00
Steven H. Wang
83530560b5
Fix CloudpickleWrapper load (#118)
* CloudpickleWrapper: Load using cloudpickle

* Update changelog
2020-07-21 10:12:39 +02:00
Antonin RAFFIN
23afedb254
Auto-formatting with black and isort (#97)
* Add auto formatting with black and isort

* Reformat code

* Ignore typing errors

* Add note about line length

* Add minimum version for isort

* Add commit-checks

* Update docker image

* Fixed lost import (during last merge)

* Fix opencv dependency
2020-07-16 16:12:16 +02:00
Matthias K
977a615c82
Fixed SubprocVecEnv close. (#68)
Updated changelog.

Co-authored-by: Matthias K <wirspielen@web.de>
2020-06-20 18:01:37 +02:00
Anssi
b833207142
Add some missing tests, update VecNormalize and RolloutBuffer (#50)
* Change saving/loading normalization parameters to use single pickle file

* Remove 'use_gae' from RolloutBuffer compute_returns function

* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument

* Update changelog

* Fix typo

* Use proper pytest.raises for catching errors in tests

* Add comment on GAE and how to obtain non-GAE behaviour

* Remove save/load_running_average from VecNormalize in favor of load/save

* Update changelog

* Update docstring

* Add accidentally removed tests for VecNormalize

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-10 12:09:04 +02:00
Anssi
44f8218df0
Review of code (A2C, PPO and refactoring) (#35)
* Split torch module code into torch_layers file

* Updated reference to CNN

* Change 'CxWxH' to 'CxHxW', as per the common convention

* Fix missing import in policies.py

* Move PPOPolicy to OnlineActorCriticPolicy

* Create OnPolicyRLModel from PPO, and make A2C and PPO inherit

* Update A2C optimizer comment

* Clean weight init scales for clarity

* Fix A2C log_interval default parameter

* Rename 'progress' to 'progress_remaining'

* Rename 'Models' to 'Algorithms'

* Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy'

* Move static functions out from BaseAlgorithm

* Move on/off_policy base algorithms to their own files

* Add  files for A2C/PPO

* Fix docs

* Fix pytype

* Update documentation on OnPolicyAlgorithm

* Add proper docstring for on_policy rollout gathering

* Add a bit of clarification on the mlppolicy/cnnpolicy naming

* Move static function is_vectorized_policies to utils.py

* Checking docstrings, pep8 fixes

* Update changelog

* Clean changelog

* Remove policy warnings for sac/td3

* Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-06-09 13:54:18 +02:00
Antonin RAFFIN
353ea81080
Fix several VecEnv issues, add fork start method to tests (#43)
* Fix several VecEnv issues, add `fork` start method to tests

* Fix signature
2020-06-04 11:22:12 +02:00
Antonin RAFFIN
15ff6d47ee
Documentation update and style fixes (#21)
* Update doc: add gSDE

* Fix codestyle

* Remove travis script

* Add lint check to gitlab
2020-05-15 13:54:06 +02:00
Antonin RAFFIN
54f6f5b6fb
Add flake8 linter and Github CI (#19)
* Cleanup code

* Add flake8 lint and github workflow

* Update build matrix

* Relax precision for python3.7
2020-05-12 17:55:01 +02:00
Antonin RAFFIN
413a2386d9
Cleanup + reformat code
2020-05-08 15:10:46 +02:00
Antonin RAFFIN
623f821571
Update examples
2020-05-08 12:14:33 +02:00
Antonin RAFFIN
aa0ff8a59b
Update atari test
2020-05-07 16:36:48 +02:00
Antonin RAFFIN
8046a24719
More doc + sync VecEnvs + atari
2020-05-07 16:08:23 +02:00
Antonin RAFFIN
2c34a4d694
Sync with Stable-Baselines
2020-05-05 16:28:38 +02:00
Antonin RAFFIN
d542732c8d
Rename to stable-baselines3
2020-05-05 15:02:35 +02:00