Commit graph

16 commits

Author SHA1 Message Date
Quentin Gallouédec
c5adad82b2
Multiprocessing support for HerReplayBuffer (#704)
* IM compat. modif from old fork

* mp her working, without offline sampling

* update readme and doc

* fix discrete action/obs space case

* handle offline sampling

* fix pos to be consistent with the old version

* improve typing and docstring

* fix discrete obs special case

* new her, using episode uid

* deal with full buffer

* offline not implemented

* info storage; compute_reward as arg; offline sampling error

* offline sampling; timeout_termination; fix last_trans detection

* rm max_episode_length from tests

* fix loading and loading test

* Fix episode sampling strategy

* Episode interrupted not valid

* Typo

* Fix infos sampling, next_obs desired goals, offline sampling

* update tests for multienvs

* speed up code

* handle timeout sampling when sampling

* give up ep_uid for ep_start and ep_length

* speed up sampling

* Improve docstring

* Typos and renaming

* Fix typing

* Fix linter warnings

* Renaming + add note

* fix reward type

* Fix future sampling strategy

* Fix future goal selection strategy

* env_fn as lambda

* Re-fix linter warnings

* Formatting

* Fix offline sampling

* restore the initial performance budget

* Remove max_episode_length for HerReplayBuffer kwargs

* SubprocVecEnv compat test

* Dedicated SubprocVecEnv test; rm n_envs from parametrization

* Back to using the env arg instead of compute_reward

* Up VecEnv import

* fix lint warnings

* fix docstring

* Fix device issue

* actor_loss_modifier in SAC and TD3

* Merge RewardModifier and ActorLossModifier into Surgeon

* update surgeon for rnd

* fix unintended merge

* fix unintended merge

* fix unintended merge

* Rm unintended merge

* Fix KeyError

* Remove useless `all_inds`

* Minor docstring format

* Fix hint

* speedup!

* Speedup again

* speedup

* np.nonzero

* fix env normalization

* flat sampling for speedup

* typo

* drop online

* format

* remove observation from env_checker (see #1335)

* update changelog

* default device to "auto"

* add comment for info storage

* add comment for ep_start and ep_length attributes

* a[b][c] to a[b, c]

* comment flatnonzero and unravel_index

* update _sample_goals docstring

* Fix future goal sampling for split episode

* add informative error message for learning_starts too small

* use keyword arg for env

* try fix pytype

* Update stable_baselines3/common/off_policy_algorithm.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Add `copy_info_dict` option

* Ignore pytype

* Update changelog

* Rename variables and improve documentation

* Ignore new bugbear rule

* Add note about future strategy

* Add deprecation warning

* Fix bug trying to pickle buffer kwargs

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-03-20 12:03:57 +01:00
tobirohrer
d8a430e088
Deprecate create_eval_env, eval_env and eval_freq parameter (#1082)
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925

* added changelog entry

* added missing backtick

* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used

* Updated tests to ignore DeprecationWarnings

* Updated changelog entry

* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs

* Add alternative in docstring

* Update docstrings

* `eval_freq` warning in docstring

* Add deprecation comments in tests

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-10-10 15:39:38 +02:00
Antonin RAFFIN
a18b91e01a
Replace "nature" with "Nature" (magazine) to reduce confusion (#965)
* Replace "nature" with "Nature" (magazine) to reduce confusion

* Replace "nature" with "Nature" (magazine) to reduce confusion

* Update changelog

Co-authored-by: mel <callmesolis@gmail.com>
2022-07-15 22:48:27 +02:00
IperGiove
d9e198e04f
Update custom_policy.rst (#711)
* Update custom_policy.rst

Added methods forward_actor and forward_critic in CustomNetwork class.

* Update doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-01-03 16:22:58 +01:00
Thomas Gubler
c895c1d46f
Doc fix: A2C - fix guidance on RMSpropTFLike (#708)
* doc: A2C/migration: fix guidance on RMSpropTFLike

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-12-30 11:28:12 +01:00
Antonin RAFFIN
88e1be9ff5
Documentation update (#450)
* Update migration guide

* Add sanity check

* Removed parameter ``channels_last`` from ``is_image_space``

* Pin docutils

* Clarify callback `save_freq` definition

* Update docs/misc/changelog.rst

* Update docs/misc/changelog.rst

* Fix typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-05-23 13:13:11 +02:00
Antonin RAFFIN
e3875b50a1
Stable-Baselines3 v1.0 (#354)
* Bump version and update doc

* Fix name

* Apply suggestions from code review

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update docs/index.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Update wording for RL zoo

Co-authored-by: Adam Gleave <adam@gleave.me>
2021-03-17 14:20:31 +01:00
Antonin RAFFIN
c62e9259db
Add custom objects support + bug fix (#336)
* Add support for custom objects

* Add python 3.8 to the CI

* Bump version

* PyType fixes

* [ci skip] Fix typo

* Add note about slow-down + fix typos

* Minor edits to the doc

* Bug fix for DQN

* Update test

* Add test for custom objects
2021-03-06 15:17:43 +02:00
Antonin RAFFIN
c722c4f5bd
Fix numpy warning and update migration guide (#307)
2021-02-01 11:24:44 +01:00
Antonin RAFFIN
d04aad2a20
Doc fixes and add monitor_kwargs parameter (#230)
* Fix type annotation

* Fix migration doc for A2C

* Update version

* Add `monitor_kwargs` argument

* Update docs/guide/migration.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

* Fix make atari env

* Fix docstring

* Renamed LearningRateSchedule

Co-authored-by: Adam Gleave <adam@gleave.me>
2020-11-20 10:28:54 +01:00
Antonin RAFFIN
897e98c4e2
Update documentation (#199)
* Update doc and add new example

* Add save/load replay buffer example

* Add save format + export doc

* Add example for get/set parameters

* Typos and minor edits

* Add results sections

* Add note about performance

* Add DDPG results

* Address comments

* Fix grammar/wording

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-28 09:55:16 +01:00
Steven H. Wang
b252f4212c
Add imitation library docs (#200)
* docs: Add imitation library docs

* Fix doc syntax errors

* Fix internal link; PDF->abstract for DAgger for consistency

* Grammar

* Update migration guide

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
2020-10-24 17:33:26 +01:00
Megan Klaiber
dd6e361204
Implement HER (#120)
* Added working HER version; online sampling is missing.

* Updated test_her.

* Added first version of online her sampling. Still problems with tensor dimensions.

* Reformat

* Fixed tests

* Added some comments.

* Updated changelog.

* Add missing init file

* Fixed some small bugs.

* Reduced arguments for HER, small changes.

* Added getattr. Fixed bug for online sampling.

* Updated save/load functions. Small changes.

* Added her to init.

* Updated save method.

* Updated her ratio.

* Move obs_wrapper

* Added DQN test.

* Fix potential bug

* Offline and online her share same sample_goal function.

* Changed lists into arrays.

* Updated her test.

* Fix online sampling

* Fixed action bug. Updated time limit for episodes.

* Updated convert_dict method to take keys as arguments.

* Renamed obs dict wrapper.

* Seed bit flipping env

* Remove get_episode_dict

* Add fast online sampling version

* Added documentation.

* Vectorized reward computation

* Vectorized goal sampling

* Update time limit for episodes in online her sampling.

* Fix max episode length inference

* Bug fix for Fetch envs

* Fix for HER + gSDE

* Reformat (new black version)

* Added info dict to compute new reward. Check her_replay_buffer again.

* Fix info buffer

* Updated done flag.

* Fixes for gSDE

* Offline her version uses now HerReplayBuffer as episode storage.

* Fix num_timesteps computation

* Fix get torch params

* Vectorized version for offline sampling.

* Modified offline her sampling to use sample method of her_replay_buffer

* Updated HER tests.

* Updated documentation

* Cleanup docstrings

* Updated to review comments

* Fix pytype

* Update according to review comments.

* Removed random goal strategy. Updated sample transitions.

* Updated migration. Removed time signal removal.

* Update doc

* Fix potential load issue

* Add VecNormalize support for dict obs

* Updated saving/loading replay buffer for HER.

* Fix test memory usage

* Fixed save/load replay buffer.

* Fixed save/load replay buffer

* Fixed transition index after loading replay buffer in online sampling

* Better error handling

* Add tests for get_time_limit

* More tests for VecNormalize with dict obs

* Update doc

* Improve HER description

* Add test for sde support

* Add comments

* Add comments

* Remove check that was always valid

* Fix for terminal observation

* Updated buffer size in offline version and reset of HER buffer

* Reformat

* Update doc

* Remove np.empty + add doc

* Fix loading

* Updated loading replay buffer

* Separate online and offline sampling + bug fixes

* Update tensorboard log name

* Version bump

* Bug fix for special case

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-10-22 11:56:43 +02:00
Anssi
19c1a89a3a
Rename cmd_util to env_util (#197)
* Rename cmd_util to env_util

* Fix docs and add missing newline

* Address comments
2020-10-22 11:05:52 +02:00
Antonin RAFFIN
fc6c5d3daa
Migration Guide (#123)
* Start migration guide

* Update guide

* Add comment on RMSpropTFLike plus PPO/A2C migrations

* Add note about set/get-parameters

* Update migration guide

* Update changelog and readme

* Update doc + clean changelog

* Address comments

Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
2020-10-11 23:22:12 +02:00
Antonin RAFFIN
d17f29c8ad
Add base doc
2020-05-07 10:10:51 +02:00