295 commits

Author SHA1 Message Date
Antonin RAFFIN
daaebd0a52
Drop python 3.8 and add python 3.12 support (#2041)
* Drop python 3.8 support, add python 3.12 support

* Upgrade to python 3.9 syntax

* Fixes for Numpy v2

* Fix doc warning
2024-11-18 15:40:36 +01:00
Mark Towers
8f0b488bc5
Update Gymnasium to v1.0.0 (#1837)
* Update Gymnasium to v1.0.0a1

* Comment out `gymnasium.wrappers.monitor` (todo update to VideoRecord)

* Fix ruff warnings

* Register Atari envs

* Update `getattr` to `Env.get_wrapper_attr`

* Reorder imports

* Fix `seed` order

* Fix collecting `max_steps`

* Copy and paste video recorder to prevent the need to rewrite the vec video recorder wrapper

* Use `typing.List` rather than list

* Fix env attribute forwarding

* Separate out env attribute collection from its utilisation

* Update for Gymnasium alpha 2

* Remove assert for OrderedDict

* Update setup.py

* Add type: ignore

* Test with Gymnasium main

* Remove `gymnasium.logger.debug/info`

* Fix github CI yaml

* Run gym 0.29.1 on python 3.10

* Update lower bounds

* Integrate video recorder

* Remove ordered dict

* Update changelog

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2024-11-04 12:03:12 +01:00
Antonin RAFFIN
3d59b5c86b
Use uv on GitHub CI for faster download and update changelog (#2026)
* Use uv on GitHub CI for faster download and update changelog

* Fix new mypy issues
2024-10-24 15:20:05 +02:00
Devin White
56c153f048
Add warning when using PPO on GPU and update doc (#2017)
* Update documentation

Added a comment and sample code to the PPO documentation noting that the CPU should primarily be used unless a CNN is used. Added a warning for both PPO and A2C that the CPU should be used when running on GPU without a CNN; see Issue #1245.

* Add warning to base class and add test

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2024-10-07 11:24:47 +02:00
Antonin RAFFIN
512eea923a
Warn users when using multi-dim MultiDiscrete obs space (#2003)
* Update env checker to warn users when using multi-dim MultiDiscrete obs space

* Update changelog
2024-09-13 13:15:23 +02:00
Antonin RAFFIN
4a7631b71d
Fix test device for buffers (#1993)
* Prevent test_device from being a noop

* Update changelog

---------

Co-authored-by: Adrià Garriga-Alonso <adria@far.ai>
2024-08-18 12:33:22 +02:00
Jan-Hendrik Ewers
4a1137ba3a
Add np.ndarray as a recognized type for TB histograms. (#1635)
* Add np.ndarray as a recognized type for TB histograms.

Torch histograms allow th.Tensor, np.ndarray, and caffe2 formatted strings. This commit expands the TensorBoardOutputFormat's capabilities to log the first two types.

* Update changelog to reflect bug fix

* fix: try/catch for if either np or torch aren't at the required versions. See https://github.com/DLR-RM/stable-baselines3/pull/1635 for more details

* fix: Add comment describing the test for when add_histogram should not have been called

* Cleanup

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2024-08-02 11:55:27 +02:00
Antonin RAFFIN
bd3c0c6530
Fix loading of optimizer with older DQN models (#1978) 2024-07-26 14:57:55 +02:00
Antonin RAFFIN
000544cc1f
Add support for pre and post linear modules in create_mlp (#1975)
* Add support for pre and post linear modules in `create_mlp`

* Disable mypy for python 3.8

* Reformat toml file

* Update docstring

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Add some comments

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-07-22 13:42:33 +02:00
Dominik Baron
24ebf1a1df
Remove unnecessary SDE resampling in PPO update (#1933)
* Remove unnecessary SDE resampling in PPO update

* Update changelog.rst

* Update version

* Update PyTorch version on CI

* Update ruff

* Limit NumPy version

* Reformat

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2024-06-29 20:07:32 +02:00
will-maclean
4efee92fba
Set CallbackList children's parent correctly (#1939)
* Fixing #1791

* Update test and version

* Add test for callback after eval

* Fix mypy error

* Remove tqdm warnings

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2024-06-07 14:07:28 +02:00
Joe Ksiazek
0b06d8ab20
Fix error when loading a model that has net_arch manually set to None (#1937)
* Fix loading a model with net_arch=None

* Remove redundant get

* Dummy commit

* Add to contributors

* Update test and version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2024-06-05 17:27:40 +02:00
Chris Schindlbeck
4317c62598
Fix various typos (#1926)
* Fix various typos

* Update changelog

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2024-05-15 15:19:39 +02:00
Mark Smith
9a749389d3
Cast learning_rate to float lambda for pickle safety when doing model.load (#1901)
* create failing test for unpickle error

* Fix learning_rate argument causing failure in weights_only=True if passed a function with non-float types

* Updated with feedback from araffin on PR#1901

* Update test and version

* Update changelog and SBX doc

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2024-04-22 10:04:01 +02:00
Corentin
071226d3e8
Log success rate for on policy algorithms (#1870)
* Add success rate in monitor for on policy algorithms

* Update changelog

* make commit-checks refactoring

* Assert buffers are not none in _dump_logs

* Automatic refactoring of the type hinting

* Add success_rate logging test for on policy algorithms

* Update changelog

* Reformat

* Fix tests and update changelog

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2024-03-22 12:13:48 +01:00
Antonin RAFFIN
a8e905977f
Update env checker for spaces with non-zero start (#1845)
* Update ruff

* Update env checker for non-zero start
2024-02-19 16:44:02 +01:00
Quentin Gallouédec
e3dea4b2e0
Release 2.2.1: Hotfix file closing (#1754)
* new closing policy

* revert #1742

* Add tests and update changelog

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-11-17 23:50:23 +01:00
Antonin RAFFIN
23fbeb5975
Fix resource warning (#1742)
* Fix resource warning

* Add test and update changelog

* Fix for new mypy version
2023-11-16 17:11:13 +01:00
Antonin RAFFIN
018ea5ab67
Fix distributions type hints (#1733)
* Fix distributions type hints

* Add test for multi binary action space

* Fix test
2023-11-06 10:09:01 +01:00
M. Ernestus
69afefc91d
Add rollout_buffer_class parameter to on-policy algorithms (#1720)
* Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm

* Add rollout_buffer_class and rollout_buffer_kwargs to PPO.

* Add rollout_buffer_class and rollout_buffer_kwargs to A2C.

* Make use of the rollout buffer kwargs.

* Update version

* Add test and update doc

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-10-27 17:36:24 +02:00
Hosseinkhan Rémy
aab545901f
Add support for setting options at reset with VecEnv (#1606)
* Update signatures, and test with options

* Update changelog and black formatting

* Finish implementation (fixes, doc, tests)

* Use deepcopy to avoid side effects (modif by reference)

* Fix for mypy

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-10-23 13:38:48 +02:00
Jan-Hendrik Ewers
2ddf015cd9
fix: Follow PEP8 guidelines and evaluate falsy to truthy with not rather than is False. (#1707)
* fix: Follow PEP8 guidelines and evaluate falsy to truthy with `not` rather than `is False`.

https://docs.python.org/2/library/stdtypes.html#truth-value-testing

* chore: Update changelog inline with intent of changes in PR #1707

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix: Change `is False` to `not` as per PEP8

* chore: Remove superfluous comment about `is False`

* test: One On- and one Off-Policy algorithm (A2C and SAC respectively), with settings to speed up testing

* Update changelog

* chore: Remove EvalCallback as it's not actually required

* Update changelog.rst

* Rm duplicated "others" section in changelog.rst

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-10-09 12:21:12 +02:00
Antonin RAFFIN
2ca94cb73d
Add check for common mistake when mixing Gym/VecEnv API (#1696) 2023-09-25 12:39:22 +02:00
Corentin
f4c5b1e5e2
Fix check_env for Sequence observation space (#1690)
* Fix Sequence obs env_checker

* Fix Sequence obs env_checker

* Add test : env_checker for Sequence obs

* Add test : env_checker for Sequence obs

* Cleanup and improve env checker messages

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-09-24 12:36:52 +02:00
Antonin RAFFIN
99712760c8
Fix render_mode when loading VecNormalize (#1671)
* Fix render_mode when loading VecNormalize

* Switch from isort to ruff, and cap black version

* Add test and update changelog
2023-09-12 11:28:32 +02:00
Patrick Helm
e071796549
Fixes replay buffer device after loading in OffPolicyAlgorithm (#1662)
* sets replay buffer device after loading

* update changelog

* update changelog

* correct changelog

* add test for replay buffer device

* Fix test to actually test the bug fix

* [ci skip] Update version

* [ci skip] Update docker images

---------

Co-authored-by: PatrickHelm <patrick.helm@gmx.net>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-09-03 12:50:02 +02:00
PatrickHelm
16c6a886db
Fix squash output unscaling when using gSDE (#1652)
* prevents squash_output if not use_sde, see #1592

* update changelog

* add unscaling of actions taken during training

* add test regarding squashing and unsquashing

* avoids try-except block

* format Gymnasium code with black

* makes mypy pass

* makes pytype pass

* sort imports

* makes error message in assert statement clearer

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* improves code commenting

* replaces full env with wrapper

* Cleanup code

* Reformat

---------

Co-authored-by: PatrickHelm <patrick.helm@gmx.net>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-09-01 17:58:15 +02:00
Antonin RAFFIN
e9f0f23ce4
Fix type hints for callbacks, utils and VecTranspose (#1648)
* Fix type hints in `common/utils.py`

* Fix `VecTranspose` type annotations

* Fix types for callbacks

* Update changelog

* Fix video recorder type hints

* Fix save utils type hints

* Allow BytesIO

* Improve error message

* Make logger and training env properties

* Clarify which open_path fn is called
2023-08-29 16:04:08 +02:00
Antonin RAFFIN
17f02a8ae1
Fix env checker bounds, expose all invalid indices at once (#1638)
* Fix bug in env_checker.py bounds warning message

* Fix bug where Gym Environment Checker does not output the correct warning message when dealing with observation spaces that have different upper and different lower bounds

* Update test_env_checker.py with more comprehensive tests

* Make naming consistent

* Update version

* Catch all invalid indices at once

---------

Co-authored-by: gabo_tor <gabriel0torre@gmail.com>
2023-08-02 16:43:45 +02:00
Tobias Rohrer
ba77dd7c61
Fix to use float64 actions for off policy algorithms (#1572)
* Added test cases where off policy algorithms fail with float64 actionspace

* casting observations and actions to `np.float32` to unify behaviour between `ReplayBuffer` and `RolloutBuffer`. Fixing issue #1145

* reformatted using black

* making test more restrictive by checking models action is float64

* added changelog entry

* undo cast of observations as `preprocessing.preprocess_obs()` casts them to float32 anyways.

* - Casting to float32 only, if action.dtype is float64
- Added cast to `DictReplayBuffer` as well

* Added tests for multiple variations of continuous action types and observation spaces

* applied reformatting by `make commit-checks`

* Added typing and comment referring to description in merge request

* Apply linter for single element slice

* Rename helper and refactor tests

* Update changelog and docstring

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-07-24 16:38:03 +02:00
Antonin RAFFIN
a730b9b66a
Relax logger check for Windows (#1615)
* Relax logger check for Windows

* Update tests
2023-07-21 07:02:38 +02:00
Mark Towers
61e1060525
Update Gymnasium to v0.29.0 (#1610)
* Update setup.py to v0.29.0

* Remove invalid test

* Loosen version and update changelog

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-07-18 14:22:22 +02:00
Antonin RAFFIN
ffe26ccf95
Fix render bug for vec env wrappers (#1525)
* Fix render bug for vec env wrappers

* Fix tests and update changelog

* Better fix, backward compatible

* remove render_mode from VecEnv init

* Make DictObsVecEnv inherit from VecEnv

* format

* Fix env_is_wrapped

* try/except getting render mode (https://github.com/DLR-RM/stable-baselines3/pull/1525#discussion_r1206888921)

* update version

* Fix env_is_wrapped in test_vec_extract_dict

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2023-06-07 16:20:40 +02:00
Lara Bergmann
32778ddc94
Fix wrong truncation in HER replay buffer (#1543)
* fix episode start idx that leads to wrong episode length

* add episode length test

* Update changelog

* Reformat files

* Use replay_buffer.dones to test HER truncation warning

* truncate_last_trajectory: sample truncated episode and handle infinite horizon tasks

* make test_truncate_last_trajectory independent of learning

* Add timeout comment HER truncate_last_trajectory

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update version.txt

* Update version

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-06-07 15:57:12 +02:00
lutogniew
e76316341d
Fix env checker single-step-env edge case (#1524)
* Fix env checker single-step-env edge case

Before this change, env checker failed to `reset()` the tested
environment before calling `step()` when checking for `Inf` / `NaN`.
This could cause environments which happened to have only one `step()`
available before the episode was terminated to fail.

This is now fixed.

* Code review fixes #1

As suggested by Antonin Raffin <antonin.raffin@ensta.org>.
2023-05-25 17:12:32 +02:00
Kallinteris Andreas
9c338f917a
vec_envs fix seed() causing a reset (#1486)
* `dummy_vec_env` fix `seed()` causing a reset

* rename `seed`

* fixes

* bug fix

* fix seed return type

* Cleanup seeding, add test and remove compat wrapper

* Update env checker and tests

* Add deterministic test for make_vec_env

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-05-20 10:30:54 +02:00
Quentin Gallouédec
9cebedc89f
Fix Colab logger error (#1484)
* fix HumanOutputFormat

* update version

* update changelog

* TextIO annotation, TextIOBase isinstance

* update changelog

* test for HumanOutputFormat with custom TextIO

* rm extra test line

* Update tests/test_logger.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-05-05 14:26:39 +02:00
Antonin RAFFIN
63a0bb9da1
Type annotation bundle (logger, vec env, custom envs) (#1479)
* Switch from List to Sequence for `seed()` type hint

* Fix logger type hints

* Improve replay buffer type hints

* Fix custom envs type annotations

* Fix VecMonitor type hints

* Fix RMSprop type hint

* Fix vec extract dict obs type hints

* Fix vec frame stack type annotations

* Fix base vec env type hints

* Fix dummy vec env type hints

* Fix for mypy

* Fixes for the tests

* mypy doesn't like when we overwrite type

* fix step of SimpleMultiObsEnv

* remove useless type specification

* Rm useless type hint

* Improve logger type hint

* format

* rm useless type hint

* Re-add variables in constructor, remove unused import

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-05-04 20:27:15 +02:00
Tobias Rohrer
6cbb2c9303
Fix DQN target update interval for multi-env (#1463)
* Calculating target update interval per environment in `_on_step()`. See GitHub issue #1373

* Added changelog entry and changed test comment

* Added requested changes from code review

* Update version

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-04-27 18:35:33 +02:00
Antonin RAFFIN
40e0b9d2c8
Add Gymnasium support (#1327)
* Fix failing set_env test

* Fix test failing due to deprecation of env.seed

* Adjust mean reward threshold in failing test

* Fix her test failing due to rng

* Change seed and revert reward threshold to 90

* Pin gym version

* Make VecEnv compatible with gym seeding change

* Revert change to VecEnv reset signature

* Change subprocenv seed cmd to call reset instead

* Fix type check

* Add backward compat

* Add `compat_gym_seed` helper

* Add goal env checks in env_checker

* Add docs on HER requirements for envs

* Capture user warning in test with inverted box space

* Update ale-py version

* Fix randint

* Allow noop_max to be zero

* Update changelog

* Update docker image

* Update doc conda env and dockerfile

* Custom envs should not have any warnings

* Fix test for numpy >= 1.21

* Add check for vectorized compute reward

* Bump to gym 0.24

* Fix gym default step docstring

* Test downgrading gym

* Revert "Test downgrading gym"

This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb.

* Fix protobuf error

* Fix in dependencies

* Fix protobuf dep

* Use newest version of cartpole

* Update gym

* Fix warning

* Loosen required scipy version

* Scipy no longer needed

* Try gym 0.25

* Silence warnings from gym

* Filter warnings during tests

* Update doc

* Update requirements

* Add gym 26 compat in vec env

* Fixes in envs and tests for gym 0.26+

* Enforce gym 0.26 api

* format

* Fix formatting

* Fix dependencies

* Fix syntax

* Cleanup doc and warnings

* Faster tests

* Higher budget for HER perf test (revert prev change)

* Fixes and update doc

* Fix doc build

* Fix breaking change

* Fixes for rendering

* Rename variables in monitor

* update render method for gym 0.26 API

backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation)

* update tests and docs to new gym render API

* undo removal of render modes metadata check

* set rgb_array as default render mode for gym.make

* undo changes & raise warning if not 'rgb_array'

* Fix type check

* Remove recursion and fix type checking

* Remove hacks for protobuf and gym 0.24

* Fix type annotations

* reuse existing render_mode attribute

* return tiled images for 'human' render mode

* Allow to use opencv for human render, fix typos

* Add warning when using non-zero start with Discrete (fixes #1197)

* Fix type checking

* Bug fixes and handle more cases

* Throw proper warnings

* Update test

* Fix new metadata name

* Ignore numpy warnings

* Fixes in vec recorder

* Global ignore

* Filter local warning too

* Monkey patch not needed for gym 26

* Add doc of VecEnv vs Gym API

* Add render test

* Fix return type

* Update VecEnv vs Gym API doc

* Fix for custom render mode

* Fix return type

* Fix type checking

* check test env test_buffer

* skip render check

* check env test_dict_env

* test_env test_gae

* check envs in remaining tests

* Update tests

* Add warning for Discrete action space with non-zero (#1295)

* Fix atari annotation

* ignore get_action_meanings [attr-defined]

* Fix mypy issues

* Add patch for gym/gymnasium transition

* Switch to gymnasium

* Rely on signature instead of version

* More patches

* Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39

* Fix doc build

* Fix pytype errors

* Fix atari requirement

* Update env checker due to change in dtype for Discrete

* Fix type hint

* Convert spaces for saved models

* Ignore pytype

* Remove gitlab CI

* Disable pytype for convert space

* Fix undefined info

* Fix undefined info

* Upgrade shimmy

* Fix wrappers type annotation (need PR from Gymnasium)

* Fix gymnasium dependency

* Fix dependency declaration

* Cap pygame version for python 3.7

* Point to master branch (v0.28.0)

* Fix: use main not master branch

* Rename done to terminated

* Fix pygame dependency for python 3.7

* Rename gym to gymnasium

* Update Gymnasium

* Fix test

* Fix tests

* Forks don't have access to private variables

* Fix linter warnings

* Update read the doc env

* Fix env checker for GoalEnv

* Fix import

* Update env checker (more info) and fix dtype

* Use micromamba for Docker

* Update dependencies

* Clarify VecEnv doc

* Fix Gymnasium version

* Copy file only after mamba install

* [ci skip] Update docker doc

* Polish code

* Reformat

* Remove deprecated features

* Ignore warning

* Update doc

* Update examples and changelog

* Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436)

* Fix SAC type hints, improve DQN ones

* Fix A2C and TD3 type hints

* Fix PPO type hints

* Fix on-policy type hints

* Fix base class type annotation, do not use defaults

* Update version

* Disable mypy for python 3.7

* Rename Gym26StepReturn

* Update continuous critic type annotation

* Fix pytype complaint

---------

Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com>
Co-authored-by: tlips <thomas.lips@ugent.be>
Co-authored-by: tlpss <thomas17.lips@gmail.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
2023-04-14 13:13:59 +02:00
WeberSamuel
15c9daa2ba
Fix VecExtractDictObs does not handle terminal observation (#1443)
* VecExtractDictObs handle terminal_observation

* Added VecExtractDictObs handle terminal_output to changelog

* Update changelog.rst

* Update test_vec_extract_dict_obs.py

Add random dones in env to test if terminal_observation is properly handled

* Made test deterministic

* Fixed bug in test

* Improved test

* Fix format in test

* Update test

* Fix type hint

* Ignore pytype warning

* Ignore pytype

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-04-12 15:20:04 +02:00
npit
4232f9daa9
Rename the observations variable in the evaluation util to avoid shadowing (#1288)
* Rename the observations variable in the evaluation util to avoid shadowing

This enables a callback in evaluate_policy to have access to the
observation vector that is fed to the environment step function,
which is currently shadowed by the output observation.

* Update changelog

* Add test

* Move assignment outside of the loop

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-04-11 18:00:33 +02:00
Antonin RAFFIN
84f5511e08
Update changelog and cleanup (#1434) 2023-04-08 15:36:55 +02:00
Jonas Reiher
12250eb761
Add stats window argument (#1424)
* added stats_window_size argument

* updated changelog

* docstring info updated

* added missing tensorboard log docstring

* added stats_window_size argument for all models

* fixed stats_window_size test

* Update version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-04-05 11:33:26 +02:00
Antonin RAFFIN
5a70af8abd
Fix type hints for DQN (#1354)
* Fix type hints for DQN

* [ci skip] Remove commented line

* Refine types

* Fix vectorized obs detection

* Fix for pytype

* Fix check at load time to create replay buffer

* One config file to rule them all

* Delete unused config

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-03-30 11:31:47 +02:00
Omar Younis
a60b0179e0
Fix: Reshape action in DictRolloutBuffer (#1395)
* reshape action in DictRolloutBuffer

* improve buffer test

* update changelog

* add comment

* Update comments and version

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-03-29 16:25:05 +02:00
Fiete
b6aa507a22
Make check_env assertions in regards to observation_space more actionable (#1400)
* add instructions for running single tests in the README, add assertions for observation_space

* update changelog

* address linting warnings

* correct pytest command in the README

* correct review comments, run make commit-checks

* truncate lines that are too long

* address make lint warning about checking module availability

* fix tests

* use f-strings for formatting assertion messages

* fix type issue

* Refactor tests, improve error messages

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-03-29 15:26:03 +02:00
Quentin Gallouédec
c5adad82b2
Multiprocessing support for HerReplayBuffer (#704)
* IM compat. modif from old fork

* mp her working, without offline sampling

* update readme and doc

* fix discrete action/obs space case

* handle offline sampling

* fix pos to be consistent with the old version

* improve typing and docstring

* fix discrete obs special case

* new her, using episode uid

* deal with full buffer

* offline not implemented

* info storage; compute_reward as arg; offline sampling error

* offline sampling; timeout_termination; fix last_trans detection

* rm max_episode_length from tests

* fix loading and loading test

* Fix episode sampling strategy

* Episode interrupted not valid

* Typo

* Fix infos sampling, next_obs desired goals, offline sampling

* update tests for multienvs

* speed up code

* handle timeout sampling when sampling

* give up ep_uid for ep_start and ep_length

* speed up sampling

* Improve docstring

* Typos and renaming

* Fix typing

* Fix linter warnings

* Renaming + add note

* fix reward type

* Fix future sampling strategy

* Fix future goal selection strategy

* env_fn as lambda

* Re-fix linter warnings

* Formatting

* Fix offline sampling

* restore the initial performance budget

* Remove max_episode_length for HerReplayBuffer kwargs

* SubprocVecEnv compat test

* Dedicated SubprocVecEnv test rm n_envs from parametrization

* Back to using the env arg instead of compute_reward

* Up VecEnv import

* fix lint warnings

* fix docstring

* Fix device issue

* actor_loss_modifier in SAC and TD3

* Merge RewardModifier and ActorLossModifier into Surgeon

* update surgeon for rnd

* fix unintended merge

* fix unintended merge

* fix unintended merge

* Rm unintended merge

* Fix KeyError

* Remove useless `all_inds`

* Minor docstring format

* Fix hint

* speedup!

* Speedup again

* speedup

* np.nonzero

* fix env normalization

* flat sampling for speedup

* typo

* drop online

* format

* remove observation from env_checker (see #1335)

* update changelog

* default device to "auto"

* add comment for info storage

* add comment for ep_start and ep_length attributes

* a[b][c] to a[b, c]

* comment flatnonzero and unravel_index

* update _sample_goals docstring

* Fix future goal sampling for split episode

* add informative error message for learning_starts too small

* use keyword arg for env

* try fix pytype

* Update stable_baselines3/common/off_policy_algorithm.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Add `copy_info_dict` option

* Ignore pytype

* Update changelog

* Rename variables and improve documentation

* Ignore new bug bear rule

* Add note about future strategy

* Add deprecation warning

* Fix bug trying to pickle buffer kwargs

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2023-03-20 12:03:57 +01:00
Antonin RAFFIN
470771b5c2
Fix Atari Roms download, enable RUF linting (#1379)
* Add extra no Atari and fix CI for forks

* Enable ruff rules

* Change to no roms
2023-03-12 18:47:52 +01:00
Quentin Gallouédec
12e9917c24
Fix image-based normalized env loading (#1321)
* Fix

* Add test

* Update changelog

* fix memory error avoidance

* Update version

* image env test

* black

* check_shape_equal

* check shape equal in vecnormalize

* Allow spaces not to be box or dict

* rm `test_save_load_vecnormalized_image` in favor of `test_vec_env`

* Remove unused imports

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-02-15 14:17:18 +01:00