Commit graph

129 commits

Author SHA1 Message Date
Quentin Gallouédec
c5adad82b2
Multiprocessing support for HerReplayBuffer (#704)
* IM compat. modif from old fork

* mp her working, without offline sampling

* update readme and doc

* fix discrete action/obs space case

* handle offline sampling

* fix pos to be consistent with the old version

* improve typing and docstring

* fix discrete obs special case

* new her, using episode uid

* deal with full buffer

* offline not implemented

* info storage; compute_reward as arg; offline sampling error

* offline sampling; timeout_termination; fix last_trans detection

* rm max_episode_length from tests

* fix loading and loading test

* Fix episode sampling strategy

* Episode interrupted not valid

* Typo

* Fix infos sampling, next_obs desired goals, offline sampling

* update tests for multienvs

* speed up code

* handle timeout sampling when sampling

* give up ep_uid for ep_start and ep_length

* speed up sampling

* Improve docstring

* Typos and renaming

* Fix typing

* Fix linter warnings

* Renaming + add note

* fix reward type

* Fix future sampling strategy

* Fix future goal selection strategy

* env_fn as lambda

* Re-fix linter warnings

* Formatting

* Fix offline sampling

* restore the initial performance budget

* Remove max_episode_length for HerReplayBuffer kwargs

* SubprocVecEnv compat test

* Dedicated SubprocVecEnv test; rm n_envs from parametrization

* Back to using the env arg instead of compute_reward

* Up VecEnv import

* fix lint warnings

* fix docstring

* Fix device issue

* actor_loss_modifier in SAC and TD3

* Merge RewardModifier and ActorLossModifier into Surgeon

* update surgeon for rnd

* fix unintended merge

* fix unintended merge

* fix unintended merge

* Rm unintended merge

* Fix KeyError

* Remove useless `all_inds`

* Minor docstring format

* Fix hint

* speedup!

* Speedup again

* speedup

* np.nonzero

* fix env normalization

* flat sampling for speedup

* typo

* drop online

* format

* remove observation from env_checker (see #1335)

* update changelog

* default device to "auto"

* add comment for info storage

* add comment for ep_start and ep_length attributes

* a[b][c] to a[b, c]

* comment flatnonzero and unravel_index

* update _sample_goals docstring

* Fix future goal sampling for split episode

* add informative error message for learning_starts too small

* use keyword arg for env

* try to fix pytype

* Update stable_baselines3/common/off_policy_algorithm.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Add `copy_info_dict` option

* Ignore pytype

* Update changelog

* Rename variables and improve documentation

* Ignore new bugbear rule

* Add note about future strategy

* Add deprecation warning

* Fix bug trying to pickle buffer kwargs

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-03-20 12:03:57 +01:00
Antonin RAFFIN
e5deeed16e
Update doc about Gymnasium support (#1382) 2023-03-14 12:43:19 +01:00
Antonin RAFFIN
f0382a25bd
Add documentation about default network architecture (#1353)
* Add documentation about default network architecture

* [ci skip] Rename custom policy section to Policy Networks
2023-03-02 14:14:57 +01:00
harveybellini
7a1e429702
Remove Note from examples - Code works (#1330)
* Remove Note

Gif creation works with Atari Environments using the script provided below.

* Update changelog

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-02-15 13:14:02 +01:00
Quentin Gallouédec
2e4a45020e
Refactor observation stacking (#1238)
* refactor stacking obs

* Improve docstring

* remove all StackedDictObservations

* Update tests and make stacked obs clearer

* Fix type check

* fix stacked_observation_space

* undo init change, deprecate StackedDictObservations

* deprecate stack_observation_space

* type hints

* ignore pytype errors

* undo vecenv doc change

* Deprecation warning in StackedDictObs docstring

* Fix vec_env.rst

* Fix __all__ sorting

* fix pytype ignore statement

* Update docstring

* stack

* Remove n_stack

* Update changelog

* Simplify code

* Rename test file

* Re-use variable for shift

* Fix doc build

* Remove pytype comment

* Disable pytype error

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-02-06 22:41:59 +01:00
Alex Pasquali
b702884c23
Removed shared layers in mlp_extractor (#1292)
* Modified actor-critic policies & MlpExtractor class

ActorCriticPolicy:
  - changed type hint of net_arch param: now it's a dict
  - removed the non-shared features extractor check: shared layers are no longer allowed in the mlp_extractor, regardless of the features extractor
ActorCriticCnnPolicy:
  - changed type hint of net_arch param: now it's a dict
MultiInputActorCriticPolicy:
  - changed type hint of net_arch param: now it's a dict
MlpExtractor:
  - changed type hint of net_arch param: now it's a dict
  - adapted networks creation
  - adapted methods: forward, forward_actor & forward_critic

* Removed shared layers in mlp_extractor

* Updated docs and changelog + reformat

* Updated custom policy tests

* Removed test on deprecation warning for share layers in mlp_extractor

Now shared layers are removed

* Update version

* Update RL Zoo doc

* Fix linter warnings

* Add ruff to Makefile (experimental)

* Add backward compat code and minor updates

* Update tests

* Add backward compatibility

* Fix test

* Improve compat code

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-23 14:55:19 +01:00
Quentin Gallouédec
92f7a6f23b
Fix test_vec_normalize.py, test_tensorboard.py and common/monitor.py type hint (#1194)
* Remove from mypy exclude

* type hint for metadata

* Union[float, int] -> float

* Remove useless __init__

* Type hint for model and logger in BaseCallback

* Type hint for metric_dict

* Update changelog

* fix test_tensorboard

* ignore gamma type checking

* Fix monitor type hint

* Update logger type hints

* Fix type annotation and bump version

* Fix circular import

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-13 18:28:22 +01:00
Yu Zheng
9bb1538b78
Fix outdated load_parameters to set_parameters (#1270)
* Update examples.rst

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-01-11 14:13:21 +01:00
Alex Pasquali
30a19848ce
Deprecation of shared layers in MlpExtractor (#1252)
* Deprecation warning for shared layers in Mlpextractor

* Updated changelog

* Updated custom policy doc

* Update doc and deprecation

* Fix doc build

* Minor edits

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-01-05 09:59:36 +01:00
Quentin Gallouédec
4fa17dcf0f
Standardize the use of from gym import spaces (#1240)
* generalize the use of `from gym import spaces`

* command line get system info

* Documentation line length for doc

* update changelog

* add space before OS platform to avoid ref to other issue

* format

* get_system_info update in changelog

* fix type check error

* fix get system info

* add comment about regex

* update version
2023-01-02 14:51:11 +01:00
Antonin RAFFIN
7202ece85b
Update tensorboard callback doc (#1221)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-21 12:51:28 +01:00
Quentin Gallouédec
96b1a7cf01
env_id consistency in tests (#1224)
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-12-20 16:01:26 +01:00
Alex Pasquali
2cfcec4f50
Modified ActorCriticPolicy to support non-shared features extractor (#1148)
* Modified ActorCriticPolicy to support non-shared features extractor

* Refactored features extraction with non-shared features extractor in ActorCriticPolicy and updated doc

Doc update: added a warning to the custom policy docs stating that, if the features extractor is not shared, the mlp_extractor cannot have shared layers

* Moved attrib share_features_extractor in class

* Updated custom policy doc for non-shared features extractor

* Updated changelog

* Made some if-statements more readable in policies.py

The if-statements are related to the shared/non-shared features extractor in ActorCritic policies

* Simplify implementation and add run test

* Keep order in module gain to keep previous results consistent

* Fix test

* Improved docstring in policies.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Added some tests

* feature extractor -> features extractor

* Fix test

* Fix env_id in test

* Make features extractor parameter explicit

* Remove duplicate

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-20 15:12:05 +01:00
Antonin RAFFIN
8452106734
Fix support of image like normalized inputs (#1214)
* Fix support of image like normalized inputs

* Improve docstring and warning message.

* Don't check if obs is image when normalize_images is False (lil opt)

* Comment fix

* Fix normalize_images not passed to parent

* Check for subclasses too

* Remove useless multiline

* Update version and add comment

* Fix some typos

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-20 13:18:28 +01:00
Alex Pasquali
6d55a09f81
Updated custom policy docs to better explain the `mlp_extractor`'s dimensions (#1196)
* Updated custom policy docs

Better explained how the dimensions of the mlp_extractor work, including the action net and the value net after the layers specified in net_arch.

* Improved custom policy doc

Section: Custom Network Architecture.
Explained with greater detail that an action net and a value net will be added on top of the net_arch.

* Improved custom policy doc

Section: Custom Network Architecture.
Merged a comment into a note

* Alignment

Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-12-12 16:19:51 +01:00
Athanasios Theocharis
f7d7ed3fa7
Update custom_policy.rst (#1183)
* Update custom_policy.rst

* Update changelog

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-06 17:51:52 +01:00
Quentin Gallouédec
e3b24829a5
Drop gym.GoalEnv and other minor changes initially from #780 (#1184)
* Various changes from #780

* Fix env_checker for goal_env detection
2022-11-28 18:22:31 +01:00
Franz Srambical
8641b05b09
Fix typo in documentation (#1177) 2022-11-15 15:00:03 +01:00
Antonin RAFFIN
0532a5719c
Fix integration documentation (#1135) 2022-10-24 13:20:58 +02:00
Antonin Raffin
37a942c8f9
Fixes 2022-10-24 12:53:48 +02:00
Thomas Simonini
0274aaf056
Update docs/guide/integrations.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-24 11:22:33 +02:00
Thomas Simonini
714737c986
Update Hugging Face Integration Documentation 2022-10-24 10:55:30 +02:00
Sam Toyer
5e8f06b3cb
Link to full imitation docs (#1106) 2022-10-10 21:36:30 -07:00
Antonin RAFFIN
e2f81bb70b
Release v1.6.2 (#1103)
* Release v1.6.2

* Remove Gitlab CI, no more minutes
2022-10-10 16:37:11 +02:00
tobirohrer
d8a430e088
Deprecate create_eval_env, eval_env and eval_freq parameter (#1082)
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925

* added changelog entry

* added missing backtick

* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used

* Updated tests to ignore DeprecationWarnings

* Updated changelog entry

* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs

* Add alternative in docstring

* Update docstrings

* `eval_freq` warning in docstring

* Add deprecation comments in tests

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-10-10 15:39:38 +02:00
Antonin RAFFIN
7c21b79188
Add progress bar callback and argument (#1095)
* Add progress bar callback and argument

* Update doc

* Update changelog

* Upgrade pytype in docker image

* Use tqdm.write in the logger to have cleaner output

* Fix logger test

* Fix when doing multiple calls to learn()

* Address comments from code-review
2022-10-06 18:17:31 +02:00
Quentin Gallouédec
a697401e03
Standardized the use of `"` for string representation (#1086)
* Replace ``'`` by ``" `` in python code

* Update changelog

* Rm whitespace
2022-10-03 15:15:39 +02:00
Antonin RAFFIN
537a82a7fd
Update export doc (fixes + add torch jit) (#1074)
* Update export doc (fixes + add torch jit)

* Fix conflicts

* Update according to code review comments

* fix torch -> th

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-09-30 14:30:40 +02:00
Alex Pasquali
d0b129ecc3
Updated custom policy docs (#1067) 2022-09-18 09:17:57 +02:00
Quentin Gallouédec
98e786f744
Clarify and standardize verbosity documentation (#1056)
* Standardize the use of verbosity: > to >=

* Make verbose docstring more specific

* Update changelog
2022-09-09 16:46:28 +02:00
Luke Fisher
a7f30b04e3
Updated minor grammar error (#1041)
"an history" -> "a history"
2022-08-31 18:04:15 +02:00
Anand Balakrishnan
59af0c1b01
CheckpointCallback can now save replay buffer and VecNormalize (#1030)
* CheckpointCallback now saves replay buffer (if present)

* VecNormalize stats are saved at checkpoints

* Make checkpointing replay buffer and VecNormalize opt-in

* Edit changelog

* Add documentation for new parameters

* Update docs/misc/changelog.rst

* Add documentation for new parameters

* Implement suggested edits

* Reformat code

* Fix git conflict

* Add .pkl suffix to VecNormalize checkpoints

* Add tests for new CheckpointCallback params

* Merge CheckpointCallback tests

* Update test and add helper for checkpoint path

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-25 10:57:51 +02:00
Timothé
01cc127d32
Support hparams logging to tensorboard (#984)
* create Hparam class & support in all OutputFormats

* add hparams documentation & example

* add hparam tests

* remove unnecessary test & fix name

* format changes

* support hyperparameters logging to tensorboard

* fix HParams class docstring

* use more explicit variable names

* raise error instead of warning

* Unpin protobuf

* Add test for logging hparams

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-22 22:06:54 +02:00
jlp-ue
6ce33f5bd2
Fix url in docs (#1000)
* fixed URL in docs

* Update changelog.rst
2022-08-05 17:54:48 +02:00
Marsel Khisamutdinov
d532362e94
Adds info on split tensorboard graphs (#989)
* Add info on split tensorboard graphs.

* Change wording to make it look better.

* Update changelog.rst

* Rephrase and add link to issue

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-30 12:44:25 +02:00
Antonin RAFFIN
a18b91e01a
Replace "nature" with "Nature" (magazine) to reduce confusion (#965)
* Replace "nature" with "Nature" (magazine) to reduce confusion

* Replace "nature" with "Nature" (magazine) to reduce confusion

* Update changelog

Co-authored-by: mel <callmesolis@gmail.com>
2022-07-15 22:48:27 +02:00
Antonin RAFFIN
c1f1c3d3d7
Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button

* Update read the doc conda env

* Update year

* Fix bug in kl divergence check

* Rephrase requirement for envpool and isaac gym
2022-07-12 22:50:23 +02:00
Antonin RAFFIN
d68f0a2411
Update doc: SB3 Contrib RecurrentPPO (#927)
* Update doc: contrib update

* Update docs/misc/changelog.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Address Anssi comments

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-05-31 18:11:16 +02:00
Antonin RAFFIN
49813d8c68
Update doc and add check for unbounded action space (#918) 2022-05-25 16:24:21 +02:00
Thomas Rudolf
c2518dc160
Add doc to use mlflow logger (#889)
* ADD feature for mlflow logger via MLflowOutputFormat.

* Move MLFlow integration to doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-05-08 15:28:31 +02:00
Marsel Khisamutdinov
e98ae129de
Fix a grammatical mistake (#899)
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-05-03 16:27:48 +02:00
Antonin RAFFIN
c5f0aa5de0
Update doc: PPO blog post and remark on timeouts (#896) 2022-05-01 16:26:34 +02:00
Antonin RAFFIN
248f082cdc
Bump min PyTorch version (#855) 2022-04-11 18:34:15 +02:00
Yuan
009bb0549a
Update tensorboard.rst in SummaryWriterCallback (#822)
* Update tensorboard.rst

* update changelog.rst

* update changelog.rst, add username
2022-03-15 21:48:52 +01:00
Antonin RAFFIN
e88eb1c9ca
Add explanation of logger output (#803)
* Add explanation of logger output

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Add example output

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-03-07 12:20:43 +01:00
Julio César Alves
cdaa9ab418
Callback to early stop the training if there is no model improvement after consecutive evaluations (#741)
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback

* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement

* Update the docs related to new StopTrainingOnNoModelImprovement callback

* Update doc

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-25 11:56:47 +01:00
Gautam J
59bec30180
update docs fix indentation (#764)
* update docs fix indentation

Changed code block indentation from 2 spaces to 4 spaces for consistency.

* update changelog

* Update changelog.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-07 21:00:53 +02:00
Ashish Dutt
954daaac37
Custom environment page modified. The following fixes are committed in response to issue #755. (#758)
* Page modified. The following fixes are committed in response to issue #755.

- fixed the broken URL on creating a custom Gym environment. Also added appropriate advice by citing the official OpenAI Gym documentation.
- SB3 text tweaked.

* modified page

- updated the in-line text hyperlinks to follow the Sphinx reStructuredText format.

* modified page

- updated the in-line text hyperlinks to follow the Sphinx reStructuredText format.
- updated text grammar

* Language

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-05 13:36:36 +02:00
Carlos Luis
5143cd19f7
Gym fixes - Follow up from #705 (#734)
* fix Atari in CI

* fix dtype and atari extra

* Update setup.py

* remove 3.6

* note about how to install Atari

* pendulum-v1

* atari v5

* black

* fix pendulum capitalization

* add minimum version

* moved things in changelog to breaking changes

* partial v5 fix

* env update to pass tests

* mismatch env version fixed

* Fix tests after merge

* Include autorom in setup.py

* Blacken code

* Fix dtype issue in more robust way

* Fix GitLab CI: switch to Docker container with new black version

* Remove workaround from GitLab. (May need to rebuild Docker for this though.)

* Revert to v4

* Update setup.py

* Apply suggestions from code review

* Remove unnecessary autorom

* Consistent gym versions

Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
2022-02-04 15:13:57 -08:00
Antonin RAFFIN
54bcfa4544
Add Hugging Face integration to SB3 doc (#733)
* Add Hugging Face to SB3 doc

* Update doc + fixes

* Use SB3 model from the hub

* Bump version

* Fixes

Co-authored-by: simoninithomas <simonini_thomas@outlook.fr>
2022-01-20 10:04:12 +01:00