Commit graph

521 commits

Author SHA1 Message Date
Kallinteris Andreas
9c338f917a
vec_envs fix seed() causing a reset (#1486)
* `dummy_vec_env` fix `seed()` causing a reset

* rename `seed`

* fixes

* bug fix

* fix seed return type

* Cleanup seeding, add test and remove compat wrapper

* Update env checker and tests

* Add deterministic test for make_vec_env

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-05-20 10:30:54 +02:00
Antonin RAFFIN
fd0cd82339
Update outdated custom env doc (#1490)
* Update outdated custom env doc

* fix render_mode and term/trunc/reset_info

* gym -> gymnasium

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-05-08 13:48:26 +02:00
Quentin Gallouédec
9cebedc89f
Fix Colab logger error (#1484)
* fix HumanOutputFormat

* update version

* update changelog

* TextIO annotation, TextIOBase isinstance

* update changelog

* test for HumanOutputFormat with custom TextIO

* rm extra test line

* Update tests/test_logger.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-05-05 14:26:39 +02:00
Antonin RAFFIN
63a0bb9da1
Type annotation bundle (logger, vec env, custom envs) (#1479)
* Switch from List to Sequence for `seed()` type hint

* Fix logger type hints

* Improve replay buffer type hints

* Fix custom envs type annotations

* Fix VecMonitor type hints

* Fix RMSprop type hint

* Fix vec extract dict obs type hints

* Fix vec frame stack type annotations

* Fix base vec env type hints

* Fix dummy vec env type hints

* Fix for mypy

* Fixes for the tests

* mypy doesn't like when we overwrite type

* fix step of SimpleMultiObsEnv

* remove useless type specification

* Rm useless type hint

* Improve logger type hint

* format

* rm useless type hint

* Re-add variables in constructor, remove unused import

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-05-04 20:27:15 +02:00
Sidney Tio
d6ddee9366
Add EvalCallback example (#1468)
* Moved 'Monitoring Training' to subsubsection of 'Using callbacks'

* Added EvalCallback example

* Updated Changelogs

* Edited the language

* Moved subsection headers up one level

* added make_vec_env into EvalCallback example

* Added parameters to the top for readability

* Added note on multiple training environments

* Added more clarity to eval_freq note

* Apply suggestions from code review

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-05-02 18:02:36 +02:00
Übertreiber
4f9805eeb8
Fix overly relaxed version requirement on NumPy (#1472)
Since commit 489b1fda, this package has been using
`numpy.typing.DTypeLike`, which was only added in [NumPy 1.20][1].

[1]: https://numpy.org/doc/stable/release/1.20.0-notes.html#numpy-is-now-typed

Co-authored-by: troiganto <troiganto@proton.me>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-04-27 19:07:53 +02:00
Tobias Rohrer
6cbb2c9303
Fix DQN target update interval for multi-env (#1463)
* Calculating target update interval per environment in `_on_step()`. See GitHub issue #1373

* Added changelog entry and changed test comment

* Added requested changes from code review

* Update version

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-04-27 18:35:33 +02:00
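The per-environment interval described in #1463 can be illustrated with a minimal sketch. It assumes that each call to the step hook advances all `n_envs` environments by one step, so the configured interval (in timesteps) is divided by `n_envs`. The function names below are hypothetical and are not taken from the library.

```python
def should_update_target(n_calls: int, target_update_interval: int, n_envs: int) -> bool:
    """Return True when the target network should be synchronized.

    `n_calls` counts calls to the step hook. Each call advances `n_envs`
    environments, so the interval is divided by `n_envs` (never below 1).
    """
    calls_between_updates = max(target_update_interval // n_envs, 1)
    return n_calls % calls_between_updates == 0


def count_updates(total_timesteps: int, target_update_interval: int, n_envs: int) -> int:
    """Count target updates over a run of `total_timesteps` environment steps."""
    n_calls = total_timesteps // n_envs
    return sum(
        should_update_target(call, target_update_interval, n_envs)
        for call in range(1, n_calls + 1)
    )
```

With this scaling, the number of target updates for a fixed timestep budget stays the same whether one or several environments are used, as long as the interval is divisible by `n_envs`.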
Lei He
dc09d81f9c
Added UAV_Navigation_DRL_AirSim to the project page (#1462)
* Update changelog.rst

* Update projects.rst

* Update grammar and fix doc build

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-04-20 23:12:57 +02:00
Antonin RAFFIN
96526ed08a
Update issue templates and env infos (#1451)
* Update issue templates and env infos

* Fix pytype
2023-04-14 13:50:14 +02:00
Antonin RAFFIN
40e0b9d2c8
Add Gymnasium support (#1327)
* Fix failing set_env test

* Fix test failing due to deprecation of env.seed

* Adjust mean reward threshold in failing test

* Fix her test failing due to rng

* Change seed and revert reward threshold to 90

* Pin gym version

* Make VecEnv compatible with gym seeding change

* Revert change to VecEnv reset signature

* Change subprocenv seed cmd to call reset instead

* Fix type check

* Add backward compat

* Add `compat_gym_seed` helper

* Add goal env checks in env_checker

* Add docs on HER requirements for envs

* Capture user warning in test with inverted box space

* Update ale-py version

* Fix randint

* Allow noop_max to be zero

* Update changelog

* Update docker image

* Update doc conda env and dockerfile

* Custom envs should not have any warnings

* Fix test for numpy >= 1.21

* Add check for vectorized compute reward

* Bump to gym 0.24

* Fix gym default step docstring

* Test downgrading gym

* Revert "Test downgrading gym"

This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb.

* Fix protobuf error

* Fix in dependencies

* Fix protobuf dep

* Use newest version of cartpole

* Update gym

* Fix warning

* Loosen required scipy version

* Scipy no longer needed

* Try gym 0.25

* Silence warnings from gym

* Filter warnings during tests

* Update doc

* Update requirements

* Add gym 26 compat in vec env

* Fixes in envs and tests for gym 0.26+

* Enforce gym 0.26 api

* format

* Fix formatting

* Fix dependencies

* Fix syntax

* Cleanup doc and warnings

* Faster tests

* Higher budget for HER perf test (revert prev change)

* Fixes and update doc

* Fix doc build

* Fix breaking change

* Fixes for rendering

* Rename variables in monitor

* update render method for gym 0.26 API

backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation)

* update tests and docs to new gym render API

* undo removal of render modes metadata check

* set rgb_array as default render mode for gym.make

* undo changes & raise warning if not 'rgb_array'

* Fix type check

* Remove recursion and fix type checking

* Remove hacks for protobuf and gym 0.24

* Fix type annotations

* reuse existing render_mode attribute

* return tiled images for 'human' render mode

* Allow to use opencv for human render, fix typos

* Add warning when using non-zero start with Discrete (fixes #1197)

* Fix type checking

* Bug fixes and handle more cases

* Throw proper warnings

* Update test

* Fix new metadata name

* Ignore numpy warnings

* Fixes in vec recorder

* Global ignore

* Filter local warning too

* Monkey patch not needed for gym 26

* Add doc of VecEnv vs Gym API

* Add render test

* Fix return type

* Update VecEnv vs Gym API doc

* Fix for custom render mode

* Fix return type

* Fix type checking

* check test env test_buffer

* skip render check

* check env test_dict_env

* test_env test_gae

* check envs in remaining tests

* Update tests

* Add warning for Discrete action space with non-zero (#1295)

* Fix atari annotation

* ignore get_action_meanings [attr-defined]

* Fix mypy issues

* Add patch for gym/gymnasium transition

* Switch to gymnasium

* Rely on signature instead of version

* More patches

* Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39

* Fix doc build

* Fix pytype errors

* Fix atari requirement

* Update env checker due to change in dtype for Discrete

* Fix type hint

* Convert spaces for saved models

* Ignore pytype

* Remove gitlab CI

* Disable pytype for convert space

* Fix undefined info

* Fix undefined info

* Upgrade shimmy

* Fix wrappers type annotation (need PR from Gymnasium)

* Fix gymnasium dependency

* Fix dependency declaration

* Cap pygame version for python 3.7

* Point to master branch (v0.28.0)

* Fix: use main not master branch

* Rename done to terminated

* Fix pygame dependency for python 3.7

* Rename gym to gymnasium

* Update Gymnasium

* Fix test

* Fix tests

* Forks don't have access to private variables

* Fix linter warnings

* Update read the doc env

* Fix env checker for GoalEnv

* Fix import

* Update env checker (more info) and fix dtype

* Use micromamba for Docker

* Update dependencies

* Clarify VecEnv doc

* Fix Gymnasium version

* Copy file only after mamba install

* [ci skip] Update docker doc

* Polish code

* Reformat

* Remove deprecated features

* Ignore warning

* Update doc

* Update examples and changelog

* Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436)

* Fix SAC type hints, improve DQN ones

* Fix A2C and TD3 type hints

* Fix PPO type hints

* Fix on-policy type hints

* Fix base class type annotation, do not use defaults

* Update version

* Disable mypy for python 3.7

* Rename Gym26StepReturn

* Update continuous critic type annotation

* Fix pytype complaint

---------

Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com>
Co-authored-by: tlips <thomas.lips@ugent.be>
Co-authored-by: tlpss <thomas17.lips@gmail.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2023-04-14 13:13:59 +02:00
WeberSamuel
15c9daa2ba
Fix VecExtractDictObs does not handle terminal observation (#1443)
* VecExtractDictObs handle terminal_observation

* Added VecExtractDictObs handle terminal_output to changelog

* Update changelog.rst

* Update test_vec_extract_dict_obs.py

Add random dones in env to test if terminal_observation is properly handled

* Made test deterministic

* Fixed bug in test

* Improved test

* Fix format in test

* Update test

* Fix type hint

* Ignore pytype warning

* Ignore pytype

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-04-12 15:20:04 +02:00
npit
4232f9daa9
Rename the observations variable in the evaluation util to avoid shadowing (#1288)
* Rename the observations variable in the evaluation util to avoid shadowing

This enables a callback in evaluate_policy to have access to the
observation vector that is fed to the environment step function,
which is currently shadowed by the output observation.

* Update changelog

* Add test

* Move assignment outside of the loop

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-04-11 18:00:33 +02:00
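The shadowing issue addressed in #1288 can be sketched with a toy evaluation loop. This is only an illustration under stated assumptions: `CountingEnv`, `evaluate`, and the variable name `new_observations` are hypothetical stand-ins, not the library's code. The point is that giving the step output its own name keeps the input observation visible to a callback that inspects the local variables.

```python
from typing import Any, Callable, Dict, List, Tuple


class CountingEnv:
    """Toy stand-in for an environment: the observation is a step counter."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done


def evaluate(env: CountingEnv, callback: Callable[[Dict[str, Any]], None]) -> None:
    observations = env.reset()
    done = False
    while not done:
        action = 0  # a real policy would compute this from `observations`
        # The step output gets its own name, so `observations` still holds
        # the input observation when the callback runs.
        new_observations, reward, done = env.step(action)
        callback(locals())
        observations = new_observations


seen: List[Tuple[int, int]] = []
evaluate(
    CountingEnv(),
    lambda local_vars: seen.append(
        (local_vars["observations"], local_vars["new_observations"])
    ),
)
```

Had the step output been assigned directly to `observations`, the callback would only ever see the post-step value.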
Antonin RAFFIN
84f5511e08
Update changelog and cleanup (#1434)
2023-04-08 15:36:55 +02:00
Jonas Reiher
12250eb761
Add stats window argument (#1424)
* added stats_window_size argument

* updated changelog

* docstring info updated

* added missing tensorboard log docstring

* added stats_window_size argument for all models

* fixed stats_window_size test

* Update version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-04-05 11:33:26 +02:00
Antonin RAFFIN
5a70af8abd
Fix type hints for DQN (#1354)
* Fix type hints for DQN

* [ci skip] Remove commented line

* Refine types

* Fix vectorized obs detection

* Fix for pytype

* Fix check at load time to create replay buffer

* One config file to rule them all

* Delete unused config

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-03-30 11:31:47 +02:00
Omar Younis
a60b0179e0
Fix: Reshape action in DictRolloutBuffer (#1395)
* reshape action in DictRolloutBuffer

* improve buffer test

* update changelog

* add comment

* Update comments and version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-03-29 16:25:05 +02:00
Fiete
b6aa507a22
Make check_env assertions in regards to observation_space more actionable (#1400)
* add instructions for running single tests in the README, add assertions for observation_space

* update changelog

* address linting warnings

* correct pytest command in the README

* correct review comments, run make commit-checks

* truncate lines that are too long

* address make lint warning about checking module availability

* fix tests

* use f-strings for formatting assertion messages

* fix type issue

* Refactor tests, improve error messages

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-03-29 15:26:03 +02:00
Quentin Gallouédec
c5adad82b2
Multiprocessing support for HerReplayBuffer (#704)
* IM compat. modif from old fork

* mp her working, without offline sampling

* update readme and doc

* fix discrete action/obs space case

* handle offline sampling

* fix pos to be consistent with the old version

* improve typing and docstring

* fix discrete obs special case

* new her, using episode uid

* deal with full buffer

* offline not implemented

* info storage; compute_reward as arg; offline sampling error

* offline sampling; timeout_termination; fix last_trans detection

* rm max_episode_length from tests

* fix loading and loading test

* Fix episode sampling strategy

* Episode interrupted not valid

* Typo

* Fix infos sampling, next_obs desired goals, offline sampling

* update tests for multienvs

* speed up code

* handle timeout sampling when sampling

* give up ep_uid for ep_start and ep_length

* speed up sampling

* Improve docstring

* Typos and renaming

* Fix typing

* Fix linter warnings

* Renaming + add note

* fix reward type

* Fix future sampling strategy

* Fix future goal selection strategy

* env_fn as lambda

* Re-fix linter warnings

* Formatting

* Fix offline sampling

* restore the initial performance budget

* Remove max_episode_length for HerReplayBuffer kwargs

* SubprocVecEnv compat test

* Dedicated SubprocVecEnv test rm n_envs from parametrization

* Back to using the env arg instead of compute_reward

* Up VecEnv import

* fix lint warnings

* fix docstring

* Fix device issue

* actor_loss_modifier in SAC and TD3

* Merge RewardModifier and ActorLossModifier into Surgeon

* update surgeon for rnd

* fix unintended merge

* fix unintended merge

* fix unintended merge

* Rm unintended merge

* Fix KeyError

* Remove useless `all_inds`

* Minor docstring format

* Fix hint

* speedup!

* Speedup again

* speedup

* np.nonzero

* fix env normalization

* flat sampling for speedup

* typo

* drop online

* format

* remove observation from env_checker (see #1335)

* update changelog

* default device to "auto"

* add comment for info storage

* add comment for ep_start and ep_length attributes

* a[b][c] to a[b, c]

* comment flatnonzero and unravel_index

* update _sample_goals docstring

* Fix future goal sampling for split episode

* add informative error message for learning_starts too small

* use keyword arg for env

* try to fix pytype

* Update stable_baselines3/common/off_policy_algorithm.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Add `copy_info_dict` option

* Ignore pytype

* Update changelog

* Rename variables and improve documentation

* Ignore new bug bear rule

* Add note about future strategy

* Add deprecation warning

* Fix bug trying to pickle buffer kwargs

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-03-20 12:03:57 +01:00
Antonin RAFFIN
e5deeed16e
Update doc about Gymnasium support (#1382)
2023-03-14 12:43:19 +01:00
Antonin RAFFIN
470771b5c2
Fix Atari Roms download, enable RUF linting (#1379)
* Add extra no Atari and fix CI for forks

* Enable ruff rules

* Change to no roms
2023-03-12 18:47:52 +01:00
Antonin RAFFIN
10e83865ec
Switch to pyproject.toml and ruff (#1361)
* Switch to `pyproject.toml` and `ruff`

* Fix for Atari ROMs and mypy

* Switch order in CI, lint first
2023-03-11 22:15:26 +01:00
Antonin RAFFIN
f0382a25bd
Add documentation about default network architecture (#1353)
* Add documentation about default network architecture

* [ci skip] Rename custom policy section to Policy Networks
2023-03-02 14:14:57 +01:00
Antonin RAFFIN
ed8783cb73
Add support for dict/tuple obs space for VecCheckNaN (#1348)
* Add support for dict/tuple obs space for VecCheckNaN

* Handle list too

* Address comments from code review

* Ignore B028 (explicit stack level)
2023-02-27 13:45:17 +01:00
Antonin RAFFIN
085bdd5a68
Remove deprecated usage of feature extractor (#1296)
* Remove deprecated usage of feature extractor

* Update changelog and version

* Update changelog.rst
2023-02-19 12:53:10 +01:00
Quentin Gallouédec
12e9917c24
Fix image-based normalized env loading (#1321)
* Fix

* Add test

* Update changelog

* fix memory error avoidance

* Update version

* image env test

* black

* check_shape_equal

* check shape equal in vecnormalize

* Allow spaces not to be box or dict

* rm `test_save_load_vecnormalized_image` in favor of `test_vec_env`

* Remove unused imports

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-02-15 14:17:18 +01:00
harveybellini
7a1e429702
Remove Note from examples - Code works (#1330)
* Remove Note

Gif creation works with Atari Environments using the script provided below.

* Update changelog

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-02-15 13:14:02 +01:00
Vikas Kumar
69b94dd6a8
Rename "timesteps" to "episodes" in log_interval documentation (#1325)
* change timestamp to episode for logging

* update changelog

* minor format modif

* minor format modif

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-02-10 21:15:09 +01:00
Sidney Tio
489b1fdaf2
Add the argument dtype (default to float32) to the noise (#1301)
* Fixed noise to return float32

* Updated changelog

* Fixed test to use numpy arrays instead of python floats

* Sorted imports for tests

* Added dtype to constructor

* Removed dtype parameter for VectorizedActionNoise

* __init__ -> None; Capitalize and period in docstring when needed; fix dtype type hint; dtype in docstring

* fix dtype type hint

* Update version

* Clarify changelog [skip ci]

* empty commit to run ci

* Update docs/misc/changelog.rst

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-02-07 13:42:14 +01:00
Quentin Gallouédec
2e4a45020e
Refactor observation stacking (#1238)
* refactor stacking obs

* Improve docstring

* remove all StackedDictObservations

* Update tests and make stacked obs clearer

* Fix type check

* fix stacked_observation_space

* undo init change, deprecate StackedDictObservations

* deprecate stack_observation_space

* type hints

* ignore pytype errors

* undo vecenv doc change

* Deprecation warning in StackedDictObs docstring

* Fix vec_env.rst

* Fix __all__ sorting

* fix pytype ignore statement

* Update docstring

* stack

* Remove n_stack

* Update changelog

* Simplify code

* Rename test file

* Re-use variable for shift

* Fix doc build

* Remove pytype comment

* Disable pytype error

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-02-06 22:41:59 +01:00
adamfrly
411ff697dd
Ensure train/n_updates metric accounts for early stopping of training loop (#1311)
* Correct _n_updates when target_kl stops loop early

* Update changelog

* Simplify code

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-02-06 15:48:41 +01:00
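The counting issue behind #1311 can be sketched as follows. This is a simplified illustration, not the library's training loop: it assumes one approximate KL value per epoch and a plain `> target_kl` stopping test, and `run_epochs` is a hypothetical name. The counter is incremented once per epoch that actually runs, instead of adding `n_epochs` unconditionally.

```python
from typing import List, Optional


def run_epochs(n_epochs: int, kl_per_epoch: List[float], target_kl: Optional[float]) -> int:
    """Return the number of epochs that ran before the loop ended or stopped early."""
    n_updates = 0
    for epoch in range(n_epochs):
        n_updates += 1  # count the epoch that is running now
        # Early stopping: leave the loop once the KL estimate exceeds the target.
        if target_kl is not None and kl_per_epoch[epoch] > target_kl:
            break
    return n_updates
```

With early stopping disabled (`target_kl=None`) the count equals `n_epochs`. When the loop breaks at the third epoch, the count is 3 rather than the configured total.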
Marco Tröster
d0c1a87faf
Add scaling section to A2C documentation (#1250)
* add scaling section to A2C documentation

* add cross-reference to vectorized envs article

* turn it as note

* update changelog

* add Bonifatius94 to the list of contributors

* fix issue number

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-02-02 12:34:38 +01:00
Alex Pasquali
bea3c44ba5
Fixed typo in A2C's docstring (#1303)
2023-01-28 12:04:07 +01:00
Quentin Gallouédec
5ee9009535
Add sticky actions for Atari games (#1286)
* repeat_action_probability

* Add test

* Undo atari wrapper doc change since CI fails

* remove action_repeat_probability from make_atari_env

* Add sticky action wrapper and improve documentation

* Update changelog

* handle the case noop_max=0

* Update tests

* Comply to ALE implementation

* Reorder doc

* Add doc warning and don't wrap with sticky action when not needed

* fix docstring and reorder

* Move `action_repeat_probability` args at the last position

* Add ref

* Update doc and wrap with frameskip only if needed

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-26 10:32:58 +01:00
Quentin Gallouédec
637988c9cc
Fix Atari wrapper bug: tried to step environment that needs reset (#1297)
* fix 1060

* update changelog
2023-01-26 00:31:20 +01:00
Alex Pasquali
b702884c23
Removed shared layers in mlp_extractor (#1292)
* Modified actor-critic policies & MlpExtractor class

ActorCriticPolicy:
  - changed type hint of net_arch param: now it's a dict
  - removed the check on whether the features extractor is shared: shared layers are no longer allowed in the mlp_extractor, regardless of the features extractor
ActorCriticCnnPolicy:
  - changed type hint of net_arch param: now it's a dict
MultiInputActorCriticPolicy:
  - changed type hint of net_arch param: now it's a dict
MlpExtractor:
  - changed type hint of net_arch param: now it's a dict
  - adapted networks creation
  - adapted methods: forward, forward_actor & forward_critic

* Removed shared layers in mlp_extractor

* Updated docs and changelog + reformat

* Updated custom policy tests

* Removed test on deprecation warning for share layers in mlp_extractor

Now shared layers are removed

* Update version

* Update RL Zoo doc

* Fix linter warnings

* Add ruff to Makefile (experimental)

* Add backward compat code and minor updates

* Update tests

* Add backward compatibility

* Fix test

* Improve compat code

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-23 14:55:19 +01:00
Quentin Gallouédec
92f7a6f23b
Fix test_vec_normalize.py, test_tensorboard.py and common/monitor.py type hint (#1194)
* Remove from mypy exclude

* type hint for metadata

* Union[float, int] -> float

* Remove useless __init__

* Type hint for model and logger in BaseCallback

* Type hint for metric_dict

* Update changelog

* fix test_tensorboard

* ignore gamma type checking

* Fix monitor type hint

* Update logger type hints

* Fix type annotation and bump version

* Fix circular import

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-01-13 18:28:22 +01:00
Yu Zheng
9bb1538b78
Fix outdated load_parameters to set_parameters (#1270)
* Update examples.rst

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-01-11 14:13:21 +01:00
Antonin RAFFIN
6b8905acdb
Release v1.7.0 (#1268)
2023-01-10 17:32:57 +01:00
Dominic Kerr
5aa6e7d340
Fix ProgressBarCallback under-reporting (#1260)
* Updated tqdm progress bar constructor to account for the effects of train_freq/n_steps/num_envs on total_timesteps. Ensure progress bar is "flushed" on training end.

* Added description of PR #1260. Fixed formatting typo

* Partial revert

Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-01-10 15:17:52 +01:00
Alex Pasquali
30a19848ce
Deprecation of shared layers in MlpExtractor (#1252)
* Deprecation warning for shared layers in MlpExtractor

* Updated changelog

* Updated custom policy doc

* Update doc and deprecation

* Fix doc build

* Minor edits

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-01-05 09:59:36 +01:00
Quentin Gallouédec
4fa17dcf0f
Standardize the use of from gym import spaces (#1240)
* generalize the use of `from gym import spaces`

* command line get system info

* Documentation line length for doc

* update changelog

* add space before os platform to avoid ref to other issue

* format

* get_system_info update in changelog

* fix type check error

* fix get system info

* add comment about regex

* update version
2023-01-02 14:51:11 +01:00
Friedrich Yuan
2bb8ef5e63
Add RLeXplore to the project page (#1246)
* Update project page

Adding the repo "rl-exploration-baselines" to the project page.

* Update changelog.rst

* Update projects.rst

* Update changelog.rst

* Update docs/misc/projects.rst

* Update changelog.rst

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-28 15:06:09 +01:00
Antonin RAFFIN
e78ba6ffa4
Hotfix to load policies saved with SB3 <= v1.6 (#1234)
* Hotfix to load policies saved with SB3 <= v1.6

* Add warning and test

* Update doc
2022-12-22 23:58:30 +01:00
Antonin RAFFIN
3c028f3d5c
Fix load_from_tensor (#1231)
2022-12-22 17:28:18 +01:00
Quentin Gallouédec
5549b34231
Fix `stable_baselines3/common/vec_env/vec_check_nan.py` type hints (#1226)
* super() init style

* "async_step" arg to "event"; "news" to "dones"; improve docstring

* Remove vec_check_nan from mypy exclude

* Update changelog
2022-12-22 12:24:59 +01:00
Quentin Gallouédec
9aff1137a9
Add support for Python 3.10 (#1227)
* Add python 3.10 and 3.11

* Update setup

* Fix CI

* Drop 3.11 (because of pytorch)

* Update changelog

* revert unwanted change in setup.cfg

* Remove remark about pytorch
2022-12-21 15:52:48 +01:00
Antonin RAFFIN
7202ece85b
Update tensorboard callback doc (#1221)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-21 12:51:28 +01:00
Quentin Gallouédec
96b1a7cf01
env_id consistency in tests (#1224)
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-12-20 16:01:26 +01:00
Quentin Gallouédec
7fb8336f40
Update PR template (#1225)
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-12-20 15:13:42 +01:00
Alex Pasquali
2cfcec4f50
Modified ActorCriticPolicy to support non-shared features extractor (#1148)
* Modified ActorCriticPolicy to support non-shared features extractor

* Refactored features extraction with non-shared features extractor in ActorCriticPolicy and updated doc

Doc update: added 'warning' on custom policy docs that says that, if the features extractor is non-shared, it's not possible to have shared layers in the mlp_extractor

* Moved attrib share_features_extractor in class

* Updated custom policy doc for non-shared features extractor

* Updated changelog

* Made some if-statements more readable in policies.py

The if-statements are related to the shared/non-shared features extractor in ActorCritic policies

* Simplify implementation and add run test

* Keep order in module gain to keep previous results consistent

* Fix test

* Improved docstring in policies.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Added some tests

* feature extractor -> features extractor

* Fix test

* Fix env_id in test

* Make features extractor parameter explicit

* Remove duplicate

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-20 15:12:05 +01:00
Antonin RAFFIN
8452106734
Fix support of image like normalized inputs (#1214)
* Fix support of image like normalized inputs

* Improve docstring and warning message.

* Don't check if obs is image when normalize_images is False (lil opt)

* Comment fix

* Fix normalize_images not passed to parent

* Check for subclasses too

* Remove useless multiline

* Update version and add comment

* Fix some typos

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-20 13:18:28 +01:00
Quentin Gallouédec
ca944fed2d
Update version (#1220)
* Replace .to(device) when possible

* fix numpy dep

* black

* Add warning for device != cpu and copy=False

* Update changelog

* Remove warning

* Update buffers.py

* Update version

* Fix type checking

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-12-19 13:53:00 +01:00
Antonin Raffin
9af2d11b6e
Update changelog
2022-12-19 13:21:10 +01:00
Quentin Gallouédec
68a40e0940
Construct tensors directly on GPU (#1218)
* Replace .to(device) when possible

* fix numpy dep

* black

* Add warning for device != cpu and copy=False

* Update changelog

* Remove warning

* Update buffers.py
2022-12-19 12:50:22 +01:00
Antonin RAFFIN
0c1bc0b1da
Fix stable_baselines3/common/atari_wrappers.py type hints (#1216)
* Fix `stable_baselines3/common/atari_wrappers.py` type hints

* Fix initialization

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-12-18 16:13:44 +01:00
Antonin RAFFIN
07094c3f2e
Fix stable_baselines3/common/preprocessing.py type hints (#1217)
2022-12-18 15:53:17 +01:00
Alex Pasquali
6d55a09f81
Updated custom policy docs to better explain the `mlp_extractor`'s dimensions (#1196)
* Updated custom policy docs

Better explained how the dimensions of the mlp_extractor work, including the action net and the value net after the layers specified in net_arch.

* Improved custom policy doc

Section: Custom Network Architecture.
Explained with greater detail that an action net and a value net will be added on top of the net_arch.

* Improved custom policy doc

Section: Custom Network Architecture.
Merged a comment into a note

* Alignment

Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-12-12 16:19:51 +01:00
Quentin Gallouédec
e39bc3da00
Add support for multidimensional spaces.MultiBinary observations (#1179)
* Fix `get_obs_shape` for multidimensional MultiBinary space

* Update changelog

* more tests

* fix multidiscrete one-hot encoding

* refactor tests

* Update changelog.rst

* Update changelog.rst

* batched obs and revert preprocess_obs changes

* Add support for multidimensional ``spaces.MultiBinary`` observations

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-08 18:46:41 +01:00
Quentin Gallouédec
6763a864c8
Upgrade CI/github-actions (#1204)
* checkout v2 -> v3; setup-python v2 -> v4

* Update changelog.rst
2022-12-07 16:43:47 +01:00
Athanasios Theocharis
f7d7ed3fa7
Update custom_policy.rst (#1183)
* Update custom_policy.rst

* Update changelog

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-12-06 17:51:52 +01:00
Quentin Gallouédec
002850f8ac
Fix stable_baselines3/common/torch_layers.py type hint (#1191)
* Remove torch layers from mypy exclude

* Make torch layers mypy compliant

* Extra type specification

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-11-29 23:46:32 +01:00
Zikang Xiong
852d635742
Exposed modules in __init__.py with __all__ (#1195)
* Exposed modules in __init__.py with __all__

* Remove flake8 ignore and update root __all__

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-11-29 23:33:46 +01:00
Quentin Gallouédec
b46396a664
Fix stable_baselines3/common/env_util.py type hint (#1192)
* Remove env_util from mypy exclude

* Fix make_atari_env type hint

* Update changelog
2022-11-29 15:36:55 +01:00
Quentin Gallouédec
5cd891317e
Add with_bias parameter to create_mlp (#1188)
* Add with_bias arg

* Update changelog

* move torch_layers to the last position

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-11-29 12:43:16 +01:00
Quentin Gallouédec
6902fac5e7
Fix stable_baselines3/common/type_aliases.py type hint (#1189)
2022-11-29 12:26:16 +01:00
Quentin Gallouédec
0973b01b9d
Fix tests/test_distributions.py type hint (#1186)
* Fixed test_distribution type hint

* Impose list[int] for action dim
2022-11-29 11:27:59 +01:00
Quentin Gallouédec
aee0ba03c7
Update changelog for #1184 (#1185)
2022-11-28 19:36:26 +01:00
Quentin Gallouédec
e3b24829a5
Drop gym.GoalEnv and other minor changes initially from #780 (#1184)
* Various changes from #780

* Fix env_checker for goal_env detection
2022-11-28 18:22:31 +01:00
Antonin RAFFIN
cd630a3121
Fixes for flake8 6.0 (#1181) 2022-11-25 15:14:55 +01:00
Juan Rocamonde
68b190b667
Raise error when same env object instance is passed in vectorized environment (#1154)
* Raise error when same env object instance is passed in vectorized environment

* Add to changelog

* Add raises to docstring

* Add test

* Also test make_vec_env

* Fix test

* Try to enable color for MyPy

* Update version and ignore lint warnings

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-11-22 14:28:58 +01:00
Quentin Gallouédec
f3abda5cbc
Fix Self return type (#1167)
* Fix Self annotation

* Update changelog

* Define type var on top

* ClassSelf to SelfClass

* annotate self

* Revert Running meanstd change

* Revert vecnormalize change (static method rejected)

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-11-22 13:42:39 +01:00
Quentin Gallouédec
abffa16198
Mypy type checking (#1143)
* Install and configure mypy

* Test if github CI uses setup.cfg for mypy

* force color output

* tab to space

* Try to fix regex

* follow_imports silent

* use space as indentation

* fix indentation setup.cfg

* Show error code

* Update doc

* Update changelog

* Ignore mypy cache files from commit

* Update gitlab CI

* Add pytype and mypy entry in Makefile

* Make mypy happy

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-11-16 13:22:57 +01:00
Franz Srambical
8641b05b09
Fix typo in documentation (#1177) 2022-11-15 15:00:03 +01:00
Taimur Shahzad Gill
7e1db1aaaa
Fixed errors in the documentation (#1159)
* Fixed errors in the documentation

Fixed grammatical and punctuation errors, and improved the sentence structure.

* Added username in the contributors
2022-11-07 15:38:41 +01:00
Adam Gleave
4fb8aec215
Update evaluate_policy type annotation to support policies as well as RL algorithms (#1146)
* Add PolicyPredictor protocol and use it in evaluate_policy

* Update changelog

* Move Protocol to type_aliases to avoid circular import

* Add test for evaluate_policy on BasePolicy

* Remove unused import

* Use typing_extensions

* Move typing_extensions to 3rd party

* Add version range (typing_extensions uses SemVer)

* Import Protocol from typing_extensions only on Python<3.8

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Install typing_extensions only on Python<3.8

* Add missing sys import

* Fix import ordering

* Fix observation type hint in predict

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-11-03 15:36:19 +01:00
Antonin RAFFIN
0532a5719c
Fix integration documentation (#1135) 2022-10-24 13:20:58 +02:00
Antonin Raffin
37a942c8f9
Fixes 2022-10-24 12:53:48 +02:00
Thomas Simonini
0274aaf056
Update docs/guide/integrations.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-24 11:22:33 +02:00
Thomas Simonini
fc6c111cc3 Changelog Update 2022-10-24 11:03:20 +02:00
Thomas Simonini
714737c986 Update Hugging Face Integration Documentation 2022-10-24 10:55:30 +02:00
Quentin Gallouédec
d5d1a02c15
Allow model trained with python3.7 to be loaded with python3.8+ without the custom_objects workaround (#1123)
* Fix loading

* Remove documentation note

* Update changelog

* Revert save_format change

* Add test for errors while unpickling

* Update version and cleanup

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-17 17:33:47 +02:00
Quentin Gallouédec
5ef10c8e69
Fix type annotation of `policy` in `BaseAlgorithm` and `OffPolicyAlgorithm` (#1120) 2022-10-17 10:16:20 +02:00
Juan Rocamonde
cdcdd32c51
Fix return type of evaluate_actions (#1118)
* Fix return type of ActorCriticPolicy.evaluate_actions to optional entropy tensor

* Update changelog.rst
2022-10-14 17:45:28 +02:00
Quentin Gallouédec
1bff6215b6
New Issue forms (#1111)
* Update bug report template

* .md -> .yml

* System info section

* Custom env issue form

* documentation form

* Question template

* Feature request template

* Rm old templates

* Update changelog
2022-10-13 17:46:21 +02:00
Antonin RAFFIN
508f8ffd59
Remove deprecated features and attributes (#1104)
* Remove deprecated eval env

* Remove deprecated ret attribute

* Remove sde net arch

* Remove unused code

* Update test comment

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-10-11 10:55:16 +02:00
Sam Toyer
5e8f06b3cb
Link to full imitation docs (#1106) 2022-10-10 21:36:30 -07:00
Antonin RAFFIN
e2f81bb70b
Release v1.6.2 (#1103)
* Release v1.6.2

* Remove Gitlab CI, no more minutes
2022-10-10 16:37:11 +02:00
tobirohrer
d8a430e088
Deprecate create_eval_env, eval_env and eval_freq parameter (#1082)
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925

* added changelog entry

* added missing backtick

* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used

* Updated tests to ignore DeprecationWarnings

* Updated changelog entry

* - Removed the `create_eval_env` parameter from the examples in the docs
- Removed information about the `create_eval_env` parameter from the migration docs
- Added information about deprecation of the `create_eval_env` parameter in the docs

* Add alternative in docstring

* Update docstrings

* `eval_freq` warning in docstring

* Add deprecation comments in tests

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2022-10-10 15:39:38 +02:00
Antonin RAFFIN
7c21b79188
Add progress bar callback and argument (#1095)
* Add progress bar callback and argument

* Update doc

* Update changelog

* Upgrade pytype in docker image

* Use tqdm.write in the logger to have cleaner output

* Fix logger test

* Fix when doing multiple calls to learn()

* Address comments from code-review
2022-10-06 18:17:31 +02:00
Alex Pasquali
6a8c9ddc8b
Updated type hint and extended docstring in make_vec_env and make_atari_env (#1085)
* Updated type hint and extended docstring in make_vec_env

The function itself was already working with callables, but this wasn't reflected in the type hint of the function's signature.

Extended the description of the wrapper_class parameter with a link to a Github issue containing more details on the matter.

* Updated type hint in make_atari_env

The function itself was already working with callables, but this wasn't reflected in the type hint of the function's signature.

* Updated docstring in make_atari_env

When modifying the type hint of the parameter 'env_id' (in this commit: fda6872f73c11075901ba88f2520f6316f818d1d), I forgot to update its description in the docstring.
Doing it now.

* Removed redundant type in env_id's type hint in make_vec_env and make_atari_env

Callable[..., gym.Env] already includes Type[gym.Env], as pointed out here: https://github.com/DLR-RM/stable-baselines3/pull/1085#issuecomment-1269685218

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-06 13:36:06 +02:00
Quentin Gallouédec
a697401e03
Standardized the use of `"` for string representation (#1086)
* Replace ``'`` by ``"`` in Python code

* Update changelog

* Rm whitespace
2022-10-03 15:15:39 +02:00
Quentin Gallouédec
d3eb0e3ed6
Fix importlib dependency (#1088)
* Set requirement ``importlib-metadata~=4.13``

* Update changelog
2022-10-03 12:03:51 +02:00
Antonin RAFFIN
537a82a7fd
Update export doc (fixes + add torch jit) (#1074)
* Update export doc (fixes + add torch jit)

* Fix conflicts

* Update according to code review comments

* fix torch -> th

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-09-30 14:30:40 +02:00
Antonin RAFFIN
21300c9aaf
Release v1.6.1 (#1080) 2022-09-29 12:15:55 +02:00
Akhil
def0574d03
Fixed typos (#1076)
* Updated docstring from n_steps to n_rollout_steps

This must be a typo

* Fixed typo in a comment in ppo.py

* Update changelog

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-09-28 14:57:46 +02:00
Juan Rocamonde
e22e372306
Fix duplicate key error in HumanOutputFormat (#1079)
* Fix duplicate key error in HumanOutputFormat

* Update changelog

* Add test

* Update changelog.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>
2022-09-28 12:06:07 +02:00
Juan Rocamonde
432b3f876d
Fix return type for load, learn in BaseAlgorithm (#1043)
* Fix return type for load, learn in BaseAlgorithm

* Update changelog

* Add typing extensions to dependencies

* Import directly from typing for python >3.11

* Reorder changelog to reflect merge order

* Roll back to typevar solution

* Updated changelog

* Remove typing extensions requirement

* Update base_class.py

* Remove final point in changelog

* Additional type fixes across project

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-26 12:13:56 +02:00
Dominic Kerr
899eee6bd4
Automatically create missing directories of filenames passed to `ResultsWriter` (#1072)
* Create (if any) missing filename directories, passed into ResultsWriter

* Fixed incorrect ``filename`` docstring (if ``filename`` were ``None``, the string method ``filename.endswith(Monitor.EXT)`` would raise an ``AttributeError``), and renamed ``reset_keywords`` docstring.

* Added description of #1068

* Ignore pytype errors

* Update changelog.rst

Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-21 13:14:38 +02:00
Alex Pasquali
d0b129ecc3
Updated custom policy docs (#1067) 2022-09-18 09:17:57 +02:00
Quentin Gallouédec
440735cbd0
Fix loading a model with different number of environments (#1058)
* Fix loading with new `n_envs`

* Update tests

* Update changelog

* Fix the fix

* Remove `self._setup_model()` from `set_env()`

* Raise `AssertionError` when setting env with a different `n_envs`

* Update unit tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-17 11:10:03 +02:00
Juan Rocamonde
18b29a68e8
Remove forward() method from common.policies.BaseModel (#1061)
* Remove forward() method.

* Updated changelog
2022-09-11 18:39:13 +02:00
Quentin Gallouédec
98e786f744
Clarify and standardize verbosity documentation (#1056)
* Standardize the use of verbosity: > to >=

* Make verbose docstring more specific

* Update changelog
2022-09-09 16:46:28 +02:00
Quentin Gallouédec
29f6687b98
Raise error when observation keys and observation space keys don't match (#1047)
* Raise error when observation keys and observation space keys don't match

* Print the difference in keys

* Update changelog
2022-09-05 14:54:58 +02:00
Juan Rocamonde
fdca786f09
Fix replay_buffer_class type annotation (#1042)
* Fix replay_buffer_class type annotation

* Update changelog

* Further replacement of same type annotation issue

* Formatting

* Rolled back formatting changes for consistency
2022-09-01 20:10:01 -07:00
Luke Fisher
a7f30b04e3
Updated minor grammar error (#1041)
"an history" -> "a history"
2022-08-31 18:04:15 +02:00
Sidney Tio
304c17dc78
Add append mode to Monitor (#1037)
* Added option to override or use existing CSVs

* Updated changelog for Monitor override

* Changed default value to override

* Simplify code and add test

* Update version

* Fix for pytype

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-31 11:53:44 +02:00
Hugh Perkins
2cc1477fa2
Fix advantage normalization with mini-batchsize of 1 (#1028)
* fix nan in advantages with batch size 1, for ppo

* changelog

* black

* Simplify test

* Bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-25 11:50:08 +02:00
Anand Balakrishnan
59af0c1b01
CheckpointCallback can now save replay buffer and VecNormalize (#1030)
* CheckpointCallback now saves replay buffer (if present)

* VecNormalize stats are saved at checkpoints

* Make checkpointing replay buffer and VecNormalize opt-in

* Edit changelog

* Add documentation for new parameters

* Update docs/misc/changelog.rst

* Add documentation for new parameters

* Implement suggested edits

* Reformat code

* Fix git conflict

* Add .pkl suffix to VecNormalize checkpoints

* Add tests for new CheckpointCallback params

* Merge CheckpointCallback tests

* Update test and add helper for checkpoint path

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-25 10:57:51 +02:00
Honglu Fan
29a481a288
Include running_mean and running_var when updating target networks (#1004)
* include `running_mean` and `running_var` when updating target networks in DQN, SAC, TD3.

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* include `running_mean` and `running_var` when updating target networks in DQN, SAC, TD3.

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Fix `DictReplayBuffer.next_observations` type (#1013)

* Fix DictReplayBuffer.next_observations type

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fixed missing verbose parameter passing (#1011)

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Support for `device=auto` buffers and set it as default value (#1009)

* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Update test

* Add comments and update tests

* Bump version

* Remove one extra space to conform code style.

* Update docstrings

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Burak Demirbilek <BurakDmb@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-23 10:20:43 +02:00
Timothé
01cc127d32
Support hparams logging to tensorboard (#984)
* create Hparam class & support in all OutputFormats

* add hparams documentation & example

* add hparam tests

* remove unnecessary test & fix name

* format changes

* support hyperparameters logging to tensorboard

* fix HParams class docstring

* use more explicit variable names

* raise error instead of warning

* Unpin protobuf

* Add test for logging hparams

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-22 22:06:54 +02:00
Antonin RAFFIN
57e0054e62
Add Quentin to the list of maintainers (#1014) 2022-08-17 09:55:40 +02:00
Quentin Gallouédec
73822c34da
Support for device=auto buffers and set it as default value (#1009)
* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-16 17:54:55 +02:00
Burak Demirbilek
792e3bcc27
Fixed missing verbose parameter passing (#1011)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-08-16 13:32:32 +02:00
Quentin Gallouédec
a30d36002b
Fix DictReplayBuffer.next_observations type (#1013)
* Fix DictReplayBuffer.next_observations type

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-16 10:53:22 +02:00
Quentin Gallouédec
c4f54fcf04
Handling multi-dimensional action spaces (#971)
* Handle non 1D action shape

* Revert changes of observation (out of the scope of this PR)

* Apply changes to DictReplayBuffer

* Update tests

* Rollout buffer n-D actions space handling

* Remove error when non 1D action space

* ActorCriticPolicy return action with the proper shape

* remove useless reshape

* Update changelog

* Add tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-06 14:19:20 +02:00
jlp-ue
6ce33f5bd2
Fix url in docs (#1000)
* fixed URL in docs

* Update changelog.rst
2022-08-05 17:54:48 +02:00
Francesco Lucianò
646d6d38b6
Fixed typo in PPO doc (#983)
* Fixed typo

Fixed typo

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-30 12:52:35 +02:00
Marsel Khisamutdinov
d532362e94
Adds info on split tensorboard graphs (#989)
* Add info on split tensorboard graphs.

* Change wording to make it look better.

* Update changelog.rst

* Rephrase and add link to issue

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-30 12:44:25 +02:00
Adam Gleave
b1cc15970a
Use higher resolution time_ns() and avoid division by zero (#979)
* Use higher resolution time and round up to eps

* Update changelog

* Add test case

* Fix formatting, time()->time_ns

* Bugfix: ns is integer not float

* Move test to better place

* Divide by 1e9 earlier
2022-07-25 23:02:53 +02:00
Quentin Gallouédec
fda3d4d748
Fix returned type in predict (#964)
* `arr[0]` to `arr.squeeze(0)`

* `squeeze(axis=0)` to `squeeze(0)`

* Type testing

* Add type test for unvectorized observation

* `squeeze(0)` to `squeeze(axis=0)`

* Treatment of the laziness symptoms

* Update changelog

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-07-18 11:22:19 +02:00
Antonin RAFFIN
a18b91e01a
Replace "nature" with "Nature" (magazine) to reduce confusion (#965)
* Replace "nature" with "Nature" (magazine) to reduce confusion

* Replace "nature" with "Nature" (magazine) to reduce confusion

* Update changelog

Co-authored-by: mel <callmesolis@gmail.com>
2022-07-15 22:48:27 +02:00
Antonin Raffin
38706f12f3
Use ICLR url for PPO blog post 2022-07-12 23:48:05 +02:00
Antonin RAFFIN
c1f1c3d3d7
Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button

* Update read the doc conda env

* Update year

* Fix bug in kl divergence check

* Rephrase requirement for envpool and isaac gym
2022-07-12 22:50:23 +02:00
Max Weltevrede
ef10189d80
Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination (#948)
* Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination

* Modify test to avoid unsupported buffer configuration

* Change from assertion to raising of ValueError

* Update changelog

* Update style for consistency

* Use handle_timeout_termination when possible

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-04 15:08:54 +02:00
Ram Rachum
d64bcb401a
Fix exception cause in base_class.py (#940) 2022-06-21 20:58:02 +01:00
Antonin RAFFIN
7ce7b6a8c2
Update defaults for offpolicy algos with features extractor (#935) 2022-06-18 10:52:52 +02:00
Antonin RAFFIN
d68f0a2411
Update doc: SB3 Contrib RecurrentPPO (#927)
* Update doc: contrib update

* Update docs/misc/changelog.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Address Anssi comments

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-05-31 18:11:16 +02:00
Antonin RAFFIN
4b89fbf283
Fix issues due to newer version of protobuf and sphinx (#924) 2022-05-29 21:09:50 +02:00
Antonin RAFFIN
49813d8c68
Update doc and add check for unbounded action space (#918) 2022-05-25 16:24:21 +02:00
TibiGG
2fcf8f91c1
Removed redundant double-check of nested Dict (#908)
* Removed redundant double-check of nested Dict observation space from BaseAlgorithm

* Update changelog

Co-authored-by: tibigg <tg4018@ic.ac.uk>
2022-05-09 14:36:15 +03:00
Antonin RAFFIN
0fadc94df3
Fix synchronization bug with EvalCallback (#907) 2022-05-08 21:54:34 +03:00
Thomas Rudolf
c2518dc160
Add doc to use mlflow logger (#889)
* ADD feature for mlflow logger via MLflowOutputFormat.

* Move MLFlow integration to doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-05-08 15:28:31 +02:00
Marsel Khisamutdinov
e98ae129de
Fix a grammatical mistake (#899)
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-05-03 16:27:48 +02:00
Antonin RAFFIN
c5f0aa5de0
Update doc: PPO blog post and remark on timeouts (#896) 2022-05-01 16:26:34 +02:00
Antonin RAFFIN
a6f5049a99
Upgrade code to Python 3.7+ syntax using pyupgrade (#887)
* Upgrade code to Python 3.7+ syntax

* Update changelog
2022-04-25 13:01:38 +03:00
Bryan Collazo
3c468ff558
Update ppo documentation (remove redundant and) (#874)
* Update ppo documentation (remove redundant and)

PTAL, thanks!

* Update changelog

* Pin ale-py version

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-04-19 14:15:51 +02:00
Paul Scheikl
ed308a71be
Fixed unchecked None value in SubprocVecEnv (#808)
* Fixed unchecked None value in SubprocVecEnv

* Fixed unchecked None value in DummyVecEnv

* Fix formatting

* Update test and changelog

* Improve test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-04-12 16:05:40 +02:00
Antonin RAFFIN
39a4f9379a
Escape tensorboard log name (#857)
* escape tensorboard log name

Otherwise utils does not recognize the log.

* Added fix to changelog

* Modifications made by: make commit-checks .

* Revert "Modifications made by: make commit-checks ."

This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.

* Update changelog and add test

Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
2022-04-11 21:49:18 +02:00
Antonin RAFFIN
248f082cdc
Bump min PyTorch version (#855) 2022-04-11 18:34:15 +02:00
Quentin Gallouédec
16703b1314
Fix HER goal selection (#848)
* Goal sampled from next_achieved_goal instead of achieved_goal

* No need to have special case for future anymore

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-11 17:50:02 +02:00
Grégoire Passault
254bb10c42
Replacing the policy registry with policy "aliases" (#842)
* Replacing the policy registry with policy "aliases"

* Fixing import order and SAC

* Changing arg. order to be sure policy_aliases is a kwarg

* Import orders

* Removing pytype error check

* Reformat

* Fix alias import

* Not using mutable {} as default for policy_aliases

* Empty aliases initialization

* Using static attributes for policy_aliases

* Fixing isort

* Fixing back bad merge

* Running isort

* Fixing aliases for A2C and PPO

* Using f-string

* Moving policy_aliases definition position

* Adding change in the changelog

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-08 21:21:53 +02:00
Yifei Cheng
44e53ff811
Enable force_zip64 (#839)
* Enable force_zip64

* mark tests as expensive

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-03-28 10:35:33 +02:00
Antonin RAFFIN
30772aa9f5
Release v1.5.0 (#835)
* Release v1.5.0

* Fix link
2022-03-25 14:38:22 +01:00
Grégoire Passault
00ac43b0a9
Removing dead code for handling time limits (#831)
* Removing dead code for handling time limits (see #829)

* Mentioning remove_time_limit_termination in the changelog

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-03-23 13:33:55 +01:00
Yuan
009bb0549a
Update tensorboard.rst in SummaryWriterCallback (#822)
* Update tensorboard.rst

* update changelog.rst

* update changelog.rst, add username
2022-03-15 21:48:52 +01:00
Antonin RAFFIN
e88eb1c9ca
Add explanation of logger output (#803)
* Add explanation of logger output

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Add example output

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-03-07 12:20:43 +01:00
Julio César Alves
cdaa9ab418
Callback to early stop the training if there is no model improvement after consecutive evaluations (#741)
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback

* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement

* Update the docs related to new StopTrainingOnNoModelImprovement callback

* Update doc

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-25 11:56:47 +01:00
Quentin Gallouédec
db5366fb51
None as default value for env in HerReplayBuffer.sample + DQN batch size typing fix (#790)
* `env` to `None` by default in `HerReplayBuffer.sample` (#788)

* Fix DQN batch_size typing

* Fix changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-24 15:51:01 +01:00
Quentin Gallouédec
13fcb12471
Fix normalization for DictReplayBuffer (#744)
* Normalize samples DictReplayBuffer (#743)

* Fixed sample normalization in ``DictReplayBuffer`` (#743)

* Test buffer normalization

* Rename test replay buffer

* Bump version

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-23 13:04:57 +01:00
Boyuan Chen
7a01637128
Fix VecNormalization bug for Dict obs (#768)
* fix #724 VecNormalization bug for Dict obs

* update test and changelog

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-02-23 12:33:41 +01:00