Commit graph

304 commits

Author SHA1 Message Date
Alex Pasquali
6a8c9ddc8b
Updated type hint and extended docstring in make_vec_env and make_atari_env (#1085)
* Updated type hint and extended docstring in make_vec_env

The function itself was already working with callables, but it wasn't considerent in the type hint of the function's signature.

Extended the description of the wrapper_class parameter with a link to a Github issue containing more details on the matter.

* Updated type hint in make_atari_env

The function itself was already working with callables, but it wasn't considerent in the type hint of the function's signature.

* Updated docstring in make_atari_env

When modifying the type hint of the parameter 'env_id' (in this commit: fda6872f73c11075901ba88f2520f6316f818d1d), I forgot to update its description in the docstrig.
Doing it now.

* Removed redundant type in env_id's type hint in make_vec_env and make_atari_env

Callable[..., gym.Env] already includes Type[gym.Env], as pointed out here: https://github.com/DLR-RM/stable-baselines3/pull/1085#issuecomment-1269685218

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-06 13:36:06 +02:00
Quentin Gallouédec
a697401e03
Standardized the use of `"` for string representation (#1086)
* Replace ``'`` by ``" `` in python code

* Update changelog

* Rm whitespace
2022-10-03 15:15:39 +02:00
Quentin Gallouédec
d3eb0e3ed6
Fix importlib dependency (#1088)
* Set requirement ``importlib-metadata~=4.13``

* Update changelog
2022-10-03 12:03:51 +02:00
Antonin RAFFIN
537a82a7fd
Update export doc (fixes + add torch jit) (#1074)
* Update export doc (fixes + add torch jit)

* Fix conflicts

* Update according to code review comments

* fix torch -> th

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-09-30 14:30:40 +02:00
Antonin RAFFIN
21300c9aaf
Release v1.6.1 (#1080) 2022-09-29 12:15:55 +02:00
Akhil
def0574d03
Fixed typos (#1076)
* Updated docstring from n_steps to n_rollout_steps

This must be a typo

* Fixed typo in a comment in ppo.py

* Update changelog

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-09-28 14:57:46 +02:00
Juan Rocamonde
e22e372306
Fix duplicate key error in HumanOutputFormat (#1079)
* Fix duplicate key error in HumanOutputFormat

* Update changelog

* Add test

* Update changelog.rst

Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>
2022-09-28 12:06:07 +02:00
Juan Rocamonde
432b3f876d
Fix return type for load, learn in BaseAlgorithm (#1043)
* Fix return type for load, learn in BaseAlgorithm

* Update changelog

* Add typing extensions to dependencies

* Import directly from typing for python >3.11

* Reorder changelog to reflect merge order

* Roll back to typevar solution

* Updated changelog

* Remove typing extensions requirement

* Update base_class.py

* Remove final point in changelog

* Additional type fixes across project

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-26 12:13:56 +02:00
Dominic Kerr
899eee6bd4
Automatically create missing directories of `filenames passed to ResultsWriter` (#1072)
* Create (if any) missing filename directories, passed into ResultsWriter

* Fixed incorrect ``filename`` docstring (if ``filename`` where ``None``, the string method ``filename.endswith(Monitor.EXT)`` would raise an ``AttributeError``), and renamed ``reset_keywords`` docstring.

* Added description of #1068

* Ignore pytype errors

* Update changelog.rst

Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-21 13:14:38 +02:00
Alex Pasquali
d0b129ecc3
Updated custom policy docs (#1067) 2022-09-18 09:17:57 +02:00
Quentin Gallouédec
440735cbd0
Fix loading a model with different number of environments (#1058)
* Fix loading with new `n_envs`

* Update tests

* Update changelog

* Fix the fix

* Remove `self._setup_model()` from `set_env()`

* Raise `AssertionError` when setting env with a different `n_envs`

* Update unitests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-09-17 11:10:03 +02:00
Juan Rocamonde
18b29a68e8
Remove forward() method from common.policies.BaseModel (#1061)
* Remove forward() method.

* Updated changelog
2022-09-11 18:39:13 +02:00
Quentin Gallouédec
98e786f744
Clarify and standardize verbosity documentation (#1056)
* Standardize the use of verbosity: > to >=

* Make verbose docstring more specific

* Update changelog
2022-09-09 16:46:28 +02:00
Quentin Gallouédec
29f6687b98
Raise error when observation keys and observation space keys don't match (#1047)
* Raise error when observation keys and observation space keys don't match

* Print the difference in keys

* Update changelog
2022-09-05 14:54:58 +02:00
Juan Rocamonde
fdca786f09
Fix replay_buffer_class type annotation (#1042)
* Fix replay_buffer_class type annotation

* Update changelog

* Further replacement of same type annotation issue

* Formatting

* Rolled back formatting changes for consistency
2022-09-01 20:10:01 -07:00
Sidney Tio
304c17dc78
Add append mode to Monitor (#1037)
* Added option to override or use existing CSVs

* Updated changelog for Monitor override

* Changed default value to override

* Simplify code and add test

* Update version

* Fix for pytype

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-31 11:53:44 +02:00
Hugh Perkins
2cc1477fa2
Fix advantage normalization with mini-batchsize of 1 (#1028)
* fix nan in advnatages with batch size 1, for ppo

* changelog

* black

* Simplify test

* Bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-25 11:50:08 +02:00
Anand Balakrishnan
59af0c1b01
CheckpointCallback can now save replay buffer and VecNormalize (#1030)
* CheckpointCallback now saves replay buffer (if present)

* VecNormalize stats are saved at checkpoints

* Make checkpointing replay buffer and VecNormalize opt-in

* Edit changelog

* Add documentation for new parameters

* Update docs/misc/changelog.rst

* Add documentation for new parameters

* Implement suggested edits

* Reformat code

* Fix git conflict

* Add .pkl suffix to VecNormalize checkpoints

* Add tests for new CheckpointCallback params

* Merge CheckpointCallback tests

* Update test and add helper for checkpoint path

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-25 10:57:51 +02:00
Honglu Fan
29a481a288
Include running_mean and running_val when updating target networks (#1004)
* include `running_mean` and `running_val` when updating target networks in DQN, SAC, TD3.

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* include `running_mean` and `running_val` when updating target networks in DQN, SAC, TD3.

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Fix `DictReplayBuffer.next_observations` type (#1013)

* Fix DictReplayBuffer.next_observations type

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fixed missing verbose parameter passing (#1011)

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Support for `device=auto` buffers and set it as default value (#1009)

* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Update test

* Add comments and update tests

* Bump version

* Remove one extra space to conform code style.

* Update docstrings

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Burak Demirbilek <BurakDmb@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-23 10:20:43 +02:00
Timothé
01cc127d32
Support hparams logging to tensorboard (#984)
* create Hparam class & support in all OutputFormats

* add hparams documentation & example

* add hparam tests

* remove unnecessary test & fix name

* format changes

* support hyperparameters logging to tensorboard

* fix HParams class docstring

* use more explicit variable names

* raise error instead of warning

* Unpin protobuf

* Add test for logging hparams

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-22 22:06:54 +02:00
Antonin RAFFIN
57e0054e62
Add Quentin to the list of maintainers (#1014) 2022-08-17 09:55:40 +02:00
Quentin Gallouédec
73822c34da
Support for device=auto buffers and set it as default value (#1009)
* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-16 17:54:55 +02:00
Burak Demirbilek
792e3bcc27
Fixed missing verbose parameter passing (#1011)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-08-16 13:32:32 +02:00
Quentin Gallouédec
a30d36002b
Fix DictReplayBuffer.next_observations type (#1013)
* Fix DictReplayBuffer.next_observations type

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-16 10:53:22 +02:00
Quentin Gallouédec
c4f54fcf04
Handling multi-dimensional action spaces (#971)
* Handle non 1D action shape

* Revert changes of observation (out of the scope of this PR)

* Apply changes  to DictReplayBuffer

* Update tests

* Rollout buffer n-D actions space handling

* Remove error when non 1D action space

* ActorCriticPolicy return action with the proper shape

* remove useless reshape

* Update changelog

* Add tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-06 14:19:20 +02:00
jlp-ue
6ce33f5bd2
Fix url in docs (#1000)
* fixed URL in docs

* Update changelog.rst
2022-08-05 17:54:48 +02:00
Francesco Lucianò
646d6d38b6
Fixed typo in PPO doc (#983)
* Fixed typo

Fixed typo

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-30 12:52:35 +02:00
Marsel Khisamutdinov
d532362e94
Adds info on split tensorboard graphs (#989)
* Add info on split tensorboard graphs.

* Change wording to make it look better.

* Update changelog.rst

* Rephrase and add link to issue

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-30 12:44:25 +02:00
Adam Gleave
b1cc15970a
Use higher resolution time_ns() and avoid division by zero (#979)
* Use higher resolution time and round up to eps

* Update changelog

* Add test case

* Fix formatting, time()->time_ns

* Bugfix: ns is integer not float

* Move test to better place

* Divide by 1e9 earlier
2022-07-25 23:02:53 +02:00
Quentin Gallouédec
fda3d4d748
Fix returned type in predict (#964)
* `arr[0]` to `arr.squeeze(0)`

* `squeeze(axis=0)` to `squeeze(0)`

* Type testing

* Add type test for unvectorized observation

* `squeeze(0)` to `squeeze(axis=0)`

* Treatment of the laziness symptoms

* Update changelog

* Udate changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-07-18 11:22:19 +02:00
Antonin RAFFIN
a18b91e01a
Replace "nature" with "Nature" (magazine) to reduce confusion (#965)
* Replace "nature" with "Nature" (magazine) to reduce confusion

* Replace "nature" with "Nature" (magazine) to reduce confusion

* Update changelog

Co-authored-by: mel <callmesolis@gmail.com>
2022-07-15 22:48:27 +02:00
Antonin RAFFIN
c1f1c3d3d7
Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button

* Update read the doc conda env

* Update year

* Fix bug in kl divergence check

* Rephrase requirement for envpool and isaac gym
2022-07-12 22:50:23 +02:00
Max Weltevrede
ef10189d80
Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination (#948)
* Prohibit simultaneous use of optimize_memory_buffer and handle_timeout_termination

* Modify test to avoid unsupported buffer configuration

* Change from assertion to raising of ValueError

* Update changelog

* Update style for consistency

* Use handle_timeout_termination when possible

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-04 15:08:54 +02:00
Ram Rachum
d64bcb401a
Fix exception cause in base_class.py (#940) 2022-06-21 20:58:02 +01:00
Antonin RAFFIN
7ce7b6a8c2
Update defaults for offpolicy algos with features extractor (#935) 2022-06-18 10:52:52 +02:00
Antonin RAFFIN
d68f0a2411
Update doc: SB3 Contrib RecurrentPPO (#927)
* Update doc: contrib update

* Update docs/misc/changelog.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Address Anssi comments

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-05-31 18:11:16 +02:00
Antonin RAFFIN
4b89fbf283
Fix issues due to newer version of protobuf and sphinx (#924) 2022-05-29 21:09:50 +02:00
Antonin RAFFIN
49813d8c68
Update doc and add check for unbounded action space (#918) 2022-05-25 16:24:21 +02:00
TibiGG
2fcf8f91c1
Removed redundant double-check of nested Dict (#908)
* Removed redundant double-check of nested Dict observation space from BaseAlgorithm

* Update changelog

Co-authored-by: tibigg <tg4018@ic.ac.uk>
2022-05-09 14:36:15 +03:00
Antonin RAFFIN
0fadc94df3
Fix synchronization bug with EvalCallback (#907) 2022-05-08 21:54:34 +03:00
Thomas Rudolf
c2518dc160
Add doc to use mlflow logger (#889)
* ADD feature for mlflow logger via MLflowOutputFormat.

* Move MLFlow integration to doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-05-08 15:28:31 +02:00
Antonin RAFFIN
c5f0aa5de0
Update doc: PPO blog post and remark on timeouts (#896) 2022-05-01 16:26:34 +02:00
Antonin RAFFIN
a6f5049a99
Upgrade code to Python 3.7+ syntax using pyupgrade (#887)
* Upgrade code to Python 3.7+ syntax

* Update changelog
2022-04-25 13:01:38 +03:00
Bryan Collazo
3c468ff558
Update ppo documentation (remove redundant and) (#874)
* Update ppo documentation (remove redundant and)

PTAL, thanks!

* Update changelog

* Pin ale-py version

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-04-19 14:15:51 +02:00
Paul Scheikl
ed308a71be
Fixed unchecked None value in SubprocVecEnv (#808)
* Fixed unchecked None value in SubprocVecEnv

* Fixed unchecked None value in DummyVecEnv

* Fix formatting

* Update test and changelog

* Improve test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-04-12 16:05:40 +02:00
Antonin RAFFIN
39a4f9379a
Escape tensorboard log name (#857)
* escape tensorboard log name

Otherwise utils does not recognize the log.

* Added fix to changelog

* Modifications made by: make commit-checks .

* Revert "Modifications made by: make commit-checks ."

This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.

* Update changelog and add test

Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
2022-04-11 21:49:18 +02:00
Antonin RAFFIN
248f082cdc
Bump min PyTorch version (#855) 2022-04-11 18:34:15 +02:00
Quentin Gallouédec
16703b1314
Fix HER goal selection (#848)
* Goal sampled from next_achieved_goal instead of achived_goal

* No need to have special case for future anymore

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-11 17:50:02 +02:00
Grégoire Passault
254bb10c42
Replacing the policy registry with policy "aliases" (#842)
* Replacing the policy registry with policy "aliases"

* Fixing import order and SAC

* Changing arg. order to be sure policy_aliases is a kwarg

* Import orders

* Removing pytype error check

* Reformat

* Fix alias import

* Not using mutable {} as default for policy_aliases

* Empty aliases initialization

* Using static attributes for policy_aliases

* Fixing isort

* Fixing back bad merge

* Running isort

* Fixing aliases for A2C and PPO

* Using f-string

* Moving policy_aliases definition position

* Adding change in the changelog

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-08 21:21:53 +02:00
Yifei Cheng
44e53ff811
Enable force_zip64 (#839)
* Enable force_zip64

* mark tests as expensive

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-03-28 10:35:33 +02:00