Commit graph

343 commits

Author SHA1 Message Date
Juan Rocamonde
18b29a68e8
Remove forward() method from common.policies.BaseModel (#1061)
* Remove forward() method.

* Updated changelog
2022-09-11 18:39:13 +02:00
Quentin Gallouédec
98e786f744
Clarify and standardize verbosity documentation (#1056)
* Standardize the use of verbosity: > to >=

* Make verbose docstring more specific

* Update changelog
2022-09-09 16:46:28 +02:00
Quentin Gallouédec
29f6687b98
Raise error when observation keys and observation space keys don't match (#1047)
* Raise error when observation keys and observation space keys don't match

* Print the difference in keys

* Update changelog
2022-09-05 14:54:58 +02:00
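Note on #1047: a minimal sketch of the kind of check this commit describes, assuming a dict observation and a set of expected space keys. The helper name `check_observation_keys` and the error wording are hypothetical, not the library's own code.

```python
def check_observation_keys(observation: dict, space_keys: set) -> None:
    """Raise a ValueError listing the keys that differ (hypothetical helper)."""
    obs_keys = set(observation.keys())
    if obs_keys != space_keys:
        # Symmetric difference: keys present on one side only
        difference = sorted(obs_keys ^ space_keys)
        raise ValueError(
            "Observation keys must match observation space keys; "
            f"differing keys: {difference}"
        )


# Matching keys pass silently; a mismatch reports which keys differ.
check_observation_keys({"image": 0, "vector": 1}, {"image", "vector"})
```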
Juan Rocamonde
fdca786f09
Fix replay_buffer_class type annotation (#1042)
* Fix replay_buffer_class type annotation

* Update changelog

* Further replacement of same type annotation issue

* Formatting

* Rolled back formatting changes for consistency
2022-09-01 20:10:01 -07:00
Sidney Tio
304c17dc78
Add append mode to Monitor (#1037)
* Added option to override or use existing CSVs

* Updated changelog for Monitor override

* Changed default value to override

* Simplify code and add test

* Update version

* Fix for pytype

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-31 11:53:44 +02:00
Hugh Perkins
2cc1477fa2
Fix advantage normalization with mini-batchsize of 1 (#1028)
* Fix NaN in advantages with batch size 1, for PPO

* changelog

* black

* Simplify test

* Bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-08-25 11:50:08 +02:00
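Note on #1028: a pure-Python sketch of why normalizing a single advantage can produce NaN, assuming an unbiased (n - 1) standard deviation is used. The function names and the `eps` default are illustrative assumptions; the actual fix may differ in detail.

```python
import math


def sample_std(values):
    """Unbiased (n - 1) standard deviation; undefined (NaN) for one value."""
    n = len(values)
    if n < 2:
        return math.nan  # n - 1 == 0, so the estimator is undefined
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def normalize_advantages(advantages, eps=1e-8):
    """Normalize to zero mean and unit std, skipping mini-batches of size 1."""
    if len(advantages) <= 1:
        # Normalization is meaningless here and would yield NaN
        return list(advantages)
    mean = sum(advantages) / len(advantages)
    std = sample_std(advantages)
    return [(a - mean) / (std + eps) for a in advantages]
```

With this guard a mini-batch of one passes through unchanged instead of turning the loss into NaN.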
Anand Balakrishnan
59af0c1b01
CheckpointCallback can now save replay buffer and VecNormalize (#1030)
* CheckpointCallback now saves replay buffer (if present)

* VecNormalize stats are saved at checkpoints

* Make checkpointing replay buffer and VecNormalize opt-in

* Edit changelog

* Add documentation for new parameters

* Update docs/misc/changelog.rst

* Add documentation for new parameters

* Implement suggested edits

* Reformat code

* Fix git conflict

* Add .pkl suffix to VecNormalize checkpoints

* Add tests for new CheckpointCallback params

* Merge CheckpointCallback tests

* Update test and add helper for checkpoint path

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-25 10:57:51 +02:00
Honglu Fan
29a481a288
Include running_mean and running_val when updating target networks (#1004)
* include `running_mean` and `running_val` when updating target networks in DQN, SAC, TD3.

* Update stable_baselines3/common/utils.py

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Fix `DictReplayBuffer.next_observations` type (#1013)

* Fix DictReplayBuffer.next_observations type

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

* Fixed missing verbose parameter passing (#1011)

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Support for `device=auto` buffers and set it as default value (#1009)

* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>

* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.

* Update test

* Add comments and update tests

* Bump version

* Remove one extra space to conform code style.

* Update docstrings

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Burak Demirbilek <BurakDmb@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-23 10:20:43 +02:00
Timothé
01cc127d32
Support hparams logging to tensorboard (#984)
* create Hparam class & support in all OutputFormats

* add hparams documentation & example

* add hparam tests

* remove unnecessary test & fix name

* format changes

* support hyperparameters logging to tensorboard

* fix HParams class docstring

* use more explicit variable names

* raise error instead of warning

* Unpin protobuf

* Add test for logging hparams

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-22 22:06:54 +02:00
Antonin RAFFIN
57e0054e62
Add Quentin to the list of maintainers (#1014)
2022-08-17 09:55:40 +02:00
Quentin Gallouédec
73822c34da
Support for device=auto buffers and set it as default value (#1009)
* Default device is "auto" for buffer + auto device support in BufferBaseClass

* Update docstring

* Update tests

* Unify tests

* Update changelog

* Fix tests on CUDA device

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-08-16 17:54:55 +02:00
Burak Demirbilek
792e3bcc27
Fixed missing verbose parameter passing (#1011)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2022-08-16 13:32:32 +02:00
Quentin Gallouédec
a30d36002b
Fix DictReplayBuffer.next_observations type (#1013)
* Fix DictReplayBuffer.next_observations type

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-16 10:53:22 +02:00
Quentin Gallouédec
c4f54fcf04
Handling multi-dimensional action spaces (#971)
* Handle non 1D action shape

* Revert changes of observation (out of the scope of this PR)

* Apply changes to DictReplayBuffer

* Update tests

* Rollout buffer n-D actions space handling

* Remove error when non 1D action space

* ActorCriticPolicy return action with the proper shape

* remove useless reshape

* Update changelog

* Add tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-06 14:19:20 +02:00
jlp-ue
6ce33f5bd2
Fix url in docs (#1000)
* fixed URL in docs

* Update changelog.rst
2022-08-05 17:54:48 +02:00
Francesco Lucianò
646d6d38b6
Fixed typo in PPO doc (#983)
* Fixed typo

Fixed typo

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-30 12:52:35 +02:00
Marsel Khisamutdinov
d532362e94
Adds info on split tensorboard graphs (#989)
* Add info on split tensorboard graphs.

* Change wording to make it look better.

* Update changelog.rst

* Rephrase and add link to issue

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-30 12:44:25 +02:00
Adam Gleave
b1cc15970a
Use higher resolution time_ns() and avoid division by zero (#979)
* Use higher resolution time and round up to eps

* Update changelog

* Add test case

* Fix formatting, time()->time_ns

* Bugfix: ns is integer not float

* Move test to better place

* Divide by 1e9 earlier
2022-07-25 23:02:53 +02:00
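Note on #979: a minimal sketch of the idea in the bullets above, assuming the elapsed time is taken from `time.time_ns()`, converted to seconds early, and clamped to machine epsilon before dividing. The function name `compute_fps` is hypothetical.

```python
import sys
import time


def compute_fps(num_timesteps: int, start_time_ns: int, now_ns: int) -> int:
    """Frames per second from integer nanosecond timestamps."""
    # time_ns() returns an int, so divide by 1e9 early to get float seconds,
    # then clamp to epsilon so that a zero interval cannot divide by zero
    time_elapsed = max((now_ns - start_time_ns) / 1e9, sys.float_info.epsilon)
    return int(num_timesteps / time_elapsed)


start = time.time_ns()
fps = compute_fps(100, start, time.time_ns())  # safe even if no time elapsed
```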
Quentin Gallouédec
fda3d4d748
Fix returned type in predict (#964)
* `arr[0]` to `arr.squeeze(0)`

* `squeeze(axis=0)` to `squeeze(0)`

* Type testing

* Add type test for unvectorized observation

* `squeeze(0)` to `squeeze(axis=0)`

* Treatment of the laziness symptoms

* Update changelog

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-07-18 11:22:19 +02:00
Antonin RAFFIN
a18b91e01a
Replace "nature" with "Nature" (magazine) to reduce confusion (#965)
* Replace "nature" with "Nature" (magazine) to reduce confusion

* Update changelog

Co-authored-by: mel <callmesolis@gmail.com>
2022-07-15 22:48:27 +02:00
Antonin RAFFIN
c1f1c3d3d7
Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button

* Update read the doc conda env

* Update year

* Fix bug in kl divergence check

* Rephrase requirement for envpool and isaac gym
2022-07-12 22:50:23 +02:00
Max Weltevrede
ef10189d80
Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination (#948)
* Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination

* Modify test to avoid unsupported buffer configuration

* Change from assertion to raising of ValueError

* Update changelog

* Update style for consistency

* Use handle_timeout_termination when possible

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-04 15:08:54 +02:00
Ram Rachum
d64bcb401a
Fix exception cause in base_class.py (#940)
2022-06-21 20:58:02 +01:00
Antonin RAFFIN
7ce7b6a8c2
Update defaults for offpolicy algos with features extractor (#935)
2022-06-18 10:52:52 +02:00
Antonin RAFFIN
d68f0a2411
Update doc: SB3 Contrib RecurrentPPO (#927)
* Update doc: contrib update

* Update docs/misc/changelog.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Address Anssi comments

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-05-31 18:11:16 +02:00
Antonin RAFFIN
4b89fbf283
Fix issues due to newer version of protobuf and sphinx (#924)
2022-05-29 21:09:50 +02:00
Antonin RAFFIN
49813d8c68
Update doc and add check for unbounded action space (#918)
2022-05-25 16:24:21 +02:00
TibiGG
2fcf8f91c1
Removed redundant double-check of nested Dict (#908)
* Removed redundant double-check of nested Dict observation space from BaseAlgorithm

* Update changelog

Co-authored-by: tibigg <tg4018@ic.ac.uk>
2022-05-09 14:36:15 +03:00
Antonin RAFFIN
0fadc94df3
Fix synchronization bug with EvalCallback (#907)
2022-05-08 21:54:34 +03:00
Thomas Rudolf
c2518dc160
Add doc to use mlflow logger (#889)
* ADD feature for mlflow logger via MLflowOutputFormat.

* Move MLFlow integration to doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-05-08 15:28:31 +02:00
Antonin RAFFIN
c5f0aa5de0
Update doc: PPO blog post and remark on timeouts (#896)
2022-05-01 16:26:34 +02:00
Antonin RAFFIN
a6f5049a99
Upgrade code to Python 3.7+ syntax using pyupgrade (#887)
* Upgrade code to Python 3.7+ syntax

* Update changelog
2022-04-25 13:01:38 +03:00
Bryan Collazo
3c468ff558
Update ppo documentation (remove redundant and) (#874)
* Update ppo documentation (remove redundant and)

PTAL, thanks!

* Update changelog

* Pin ale-py version

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-04-19 14:15:51 +02:00
Paul Scheikl
ed308a71be
Fixed unchecked None value in SubprocVecEnv (#808)
* Fixed unchecked None value in SubprocVecEnv

* Fixed unchecked None value in DummyVecEnv

* Fix formatting

* Update test and changelog

* Improve test

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-04-12 16:05:40 +02:00
Antonin RAFFIN
39a4f9379a
Escape tensorboard log name (#857)
* escape tensorboard log name

Otherwise utils does not recognize the log.

* Added fix to changelog

* Modifications made by: make commit-checks .

* Revert "Modifications made by: make commit-checks ."

This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.

* Update changelog and add test

Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
2022-04-11 21:49:18 +02:00
Antonin RAFFIN
248f082cdc
Bump min PyTorch version (#855)
2022-04-11 18:34:15 +02:00
Quentin Gallouédec
16703b1314
Fix HER goal selection (#848)
* Goal sampled from next_achieved_goal instead of achieved_goal

* No need to have special case for future anymore

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-11 17:50:02 +02:00
Grégoire Passault
254bb10c42
Replacing the policy registry with policy "aliases" (#842)
* Replacing the policy registry with policy "aliases"

* Fixing import order and SAC

* Changing arg. order to be sure policy_aliases is a kwarg

* Import orders

* Removing pytype error check

* Reformat

* Fix alias import

* Not using mutable {} as default for policy_aliases

* Empty aliases initialization

* Using static attributes for policy_aliases

* Fixing isort

* Fixing back bad merge

* Running isort

* Fixing aliases for A2C and PPO

* Using f-string

* Moving policy_aliases definition position

* Adding change in the changelog

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-08 21:21:53 +02:00
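Note on #842: one bullet above mentions avoiding a mutable `{}` as a default argument. The sketch below illustrates that general Python pitfall with hypothetical function names; it is not the library's code.

```python
def register_bad(name, aliases={}):
    # The default dict is created once, at definition time, and then
    # shared by every call that does not pass its own dict
    aliases[name] = True
    return aliases


def register_good(name, aliases=None):
    # Use None as a sentinel and create a fresh dict per call
    if aliases is None:
        aliases = {}
    aliases[name] = True
    return aliases


register_bad("MlpPolicy")
print(register_bad("CnnPolicy"))   # {'MlpPolicy': True, 'CnnPolicy': True}
register_good("MlpPolicy")
print(register_good("CnnPolicy"))  # {'CnnPolicy': True}
```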
Yifei Cheng
44e53ff811
Enable force_zip64 (#839)
* Enable force_zip64

* mark tests as expensive

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-03-28 10:35:33 +02:00
Antonin RAFFIN
30772aa9f5
Release v1.5.0 (#835)
* Release v1.5.0

* Fix link
2022-03-25 14:38:22 +01:00
Grégoire Passault
00ac43b0a9
Removing dead code for handling time limits (#831)
* Removing dead code for handling time limits (see #829)

* Mentioning remove_time_limit_termination in the changelog

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-03-23 13:33:55 +01:00
Yuan
009bb0549a
Update tensorboard.rst in SummaryWriterCallback (#822)
* Update tensorboard.rst

* update changelog.rst

* update changelog.rst, add username
2022-03-15 21:48:52 +01:00
Antonin RAFFIN
e88eb1c9ca
Add explanation of logger output (#803)
* Add explanation of logger output

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Add example output

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-03-07 12:20:43 +01:00
Julio César Alves
cdaa9ab418
Callback to early stop the training if there is no model improvement after consecutive evaluations (#741)
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback

* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement

* Update the docs related to new StopTrainingOnNoModelImprovement callback

* Update doc

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-25 11:56:47 +01:00
Quentin Gallouédec
db5366fb51
None as default value for env in HerReplayBuffer.sample + DQN batch size typing fix (#790)
* `env` to `None` by default in `HerReplayBuffer.sample` (#788)

* Fix DQN batch_size typing

* Fix changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2022-02-24 15:51:01 +01:00
Quentin Gallouédec
13fcb12471
Fix normalization for DictReplayBuffer (#744)
* Normalize samples DictReplayBuffer (#743)

* Fixed sample normalization in ``DictReplayBuffer`` (#743)

* Test buffer normalization

* Rename test replay buffer

* Bump version

Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-23 13:04:57 +01:00
Boyuan Chen
7a01637128
Fix VecNormalization bug for Dict obs (#768)
* fix #724 VecNormalization bug for Dict obs

* update test and changelog

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-02-23 12:33:41 +01:00
Costa Huang
d2ebd2eeaa
Allow PPO to turn off advantage normalization (#763)
* Allow PPO to turn off advantage normalization

* update changelog

* Add a test case

* Update test and sanity check

* Fix tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-02-22 15:29:21 +01:00
Antonin RAFFIN
7ce4bb8016
Pin gym version (#782)
* Pin gym version

* Cleanup warnings

* Reformat
2022-02-21 23:12:54 +01:00
Gianluca De Cola
58a98060f9
Update docstring on MlpExtractor. Resolves #736 (#774)
* Improve docstring on MlpExtractor.

* update changelog.
2022-02-16 01:50:17 +02:00
Gautam J
59bec30180
update docs fix indentation (#764)
* update docs fix indentation

Changed code block indentation from 2 spaces to 4 spaces for consistency.

* update changelog

* Update changelog.rst

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-07 21:00:53 +02:00
Manuel
40bda9a918
Remove explicit forward calls (#753)
* Remove explicit forward calls

* Changelog and commit checks.

* Reverted test forward removal for super call.

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-06 22:27:12 +02:00
Adam Gleave
78afcbd6d9
HumanOutputFormat: make length configurable, throw error if keys alias (#756)
* Make HumanOutputFormat length configurable and bump to 36 by default

* Add test case

* Updated changelog

* Blacken

* Blacken code

* Fix GitLab CI: switch to Docker container with new black version

* Incorporate suggestion

* Add class docstring

* Dummy commit to retrigger GitLab

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-05 12:57:35 +02:00
Adam Gleave
9ff26dafed
Fix changelog (#760)
2022-02-05 02:12:17 +02:00
Carlos Luis
5143cd19f7
Gym fixes - Follow up from #705 (#734)
* fix Atari in CI

* fix dtype and atari extra

* Update setup.py

* remove 3.6

* note about how to install Atari

* pendulum-v1

* atari v5

* black

* fix pendulum capitalization

* add minimum version

* moved things in changelog to breaking changes

* partial v5 fix

* env update to pass tests

* mismatch env version fixed

* Fix tests after merge

* Include autorom in setup.py

* Blacken code

* Fix dtype issue in more robust way

* Fix GitLab CI: switch to Docker container with new black version

* Remove workaround from GitLab. (May need to rebuild Docker for this though.)

* Revert to v4

* Update setup.py

* Apply suggestions from code review

* Remove unnecessary autorom

* Consistent gym versions

Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
2022-02-04 15:13:57 -08:00
Armand du Parc Locmaria
44dfedc061
Add furuta pendulum project to project list (#742)
* add furuta pendulum project

* Update changelog to reflect addition to docs

Co-authored-by: Anssi <kaneran21@hotmail.com>
2022-02-04 11:39:49 +02:00
Antonin RAFFIN
54bcfa4544
Add Hugging Face integration to SB3 doc (#733)
* Add Hugging Face to SB3 doc

* Update doc + fixes

* Use SB3 model from the hub

* Bump version

* Fixes

Co-authored-by: simoninithomas <simonini_thomas@outlook.fr>
2022-01-20 10:04:12 +01:00
Paul Scheikl
fc41600225
Fixed logging info_keywords in the VecMonitor class. (#730)
* Write the additional info_keywords into the episode infos that are passed to the results writer. Taken directly from the non-vec version of Monitor.

* Added test for monitoring info_keywords.

* Removed unnecessary step of registering the env. Not using make_vec_env, because it applies a monitor wrapper to the env.

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-01-19 17:17:22 +01:00
Antonin RAFFIN
21f6a474a4
Release 1.4.0 (#729)
* Release 1.4.0

* Add integration section in the readme
2022-01-19 11:16:15 +01:00
Antonin RAFFIN
cd6e04705b
Update SB3 Contrib doc (ARS) and W&B integration (#726)
* Add ARS to SB3 contrib

* Add integration page
2022-01-18 15:10:25 +01:00
Antonin RAFFIN
e9a8979022
Add copy and combine method to running mean std (#716)
* Add copy and combine method to running mean std

* Update test

* Faster test

* Update test

* Update test

* Shift values in RMS test
2022-01-06 01:31:04 +02:00
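Note on #716: a scalar sketch of what copying and combining running mean/variance statistics can look like, assuming the standard parallel-moments formula. The class name `RunningStats` and its fields are hypothetical and simplified to floats; the real class operates on arrays.

```python
from dataclasses import dataclass, replace


@dataclass
class RunningStats:
    mean: float = 0.0
    var: float = 1.0    # population variance
    count: float = 1e-4

    def copy(self) -> "RunningStats":
        """Return an independent copy of the statistics."""
        return replace(self)

    def combine(self, other: "RunningStats") -> None:
        """Merge another set of statistics into this one, in place."""
        delta = other.mean - self.mean
        total = self.count + other.count
        new_mean = self.mean + delta * other.count / total
        # Sum of squared deviations of both parts plus a correction term
        m2 = (
            self.var * self.count
            + other.var * other.count
            + delta ** 2 * self.count * other.count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total
```

Combining the statistics of `[1, 1]` and `[3, 3]` this way gives mean 2 and variance 1, the same as computing them over `[1, 1, 3, 3]` directly.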
IperGiove
d9e198e04f
Update custom_policy.rst (#711)
* Update custom_policy.rst

Added methods forward_actor and forward_critic in CustomNetwork class.

* Update doc

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-01-03 16:22:58 +01:00
Thomas Gubler
c895c1d46f
Doc fix: A2C - fix guidance on RMSpropTFLike (#708)
* doc: A2C/migration: fix guidance on RMSpropTFLike

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-12-30 11:28:12 +01:00
Antonin RAFFIN
4a5dfaedfc
Update SB3 contrib doc (+ fix backward compat) (#707)
* Fix `VecNormalize` load for SB3<= 1.3.0

* Update SB3 contrib doc

* Bump version
2021-12-29 14:25:09 +01:00
Antonin RAFFIN
bb16645c4e
Add skip option for VecTransposeImage and bug fix in frame stack (#700)
* Update doc

* Add comment

* Add skip option to VecTransposeImage and fix bug in frame stack
2021-12-23 17:12:49 +02:00
Quentin Gallouédec
d496cd4d95
Consistent use of device as keyword argument (#702)
* consistent device as keyword arg

* Fixed ``device`` arg inconsistency in changelog
2021-12-22 11:43:59 +01:00
Demetrio92
798b16aaf7
more verbose documentation regarding .load vs .set_parameters (#696)
* more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614)

* add a note to explain the difference between `.load` and `.set_parameters` to the examples

* fix typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-18 17:28:37 +02:00
hsuehch
222a69ca49
Eliminate extra empty lines in CSV monitor files on Windows (DLR-RM#692) (#695)
* Added ``newline="\n"`` when opening CSV monitor files so that each line ends with ``\r\n`` instead of ``\r\r\n`` on Windows while Linux environments are not affected
2021-12-18 16:04:33 +02:00
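Note on #695: a small sketch of why the extra empty lines appear, assuming the CSV is written with `csv.writer` and its default `"\r\n"` line terminator. An in-memory `io.TextIOWrapper` with `newline="\r\n"` stands in for the default text-mode newline translation on Windows; the function name `write_rows` is hypothetical.

```python
import csv
import io


def write_rows(newline: str) -> bytes:
    """Write one CSV row through a text layer using the given newline mode."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # default lineterminator is "\r\n"
    writer.writerow(["r", "l", "t"])
    text.flush()
    return raw.getvalue()


# "\n" translated to "\r\n" (as on Windows): the row ends in "\r\r\n"
print(write_rows("\r\n"))  # b'r,l,t\r\r\n'
# newline="\n" disables translation: the row ends in "\r\n"
print(write_rows("\n"))    # b'r,l,t\r\n'
```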
Antonin RAFFIN
e24147390d
Improve tests and add check for float32 (#686)
* Add additional checks

* Improve tests and error message

* Update changelog

* Bump version

* Update doc

* Add tests for action space

* Improve test
2021-12-09 14:14:33 +02:00
Antonin RAFFIN
77f4f5021d
Drop Python 3.6 support (#685)
* Drop python 3.6 support

* Update doc

* Update gitlab CI

* Update doc env

* Fix gitlab CI
2021-12-06 12:54:43 +01:00
Antonin RAFFIN
507ed1762e
Multiprocessing support for off policy algorithms (#439)
* Add multi-env training support for SAC

* Fix for dict obs

* Pytype fixes

* Fix assert on number of envs

* Remove for loop

* Add support for Dict obs

* Start cleanup

* Update doc and bug fix

* Add support for vectorized action noise
and add multi env example for off-policy

* Update version

* Bug fix with VecNormalize

* Update README table

* Update variable names

* Update changelog and version

* Update doc and fix for `gradient_steps=-1`

* Add test for `gradient_steps=-1`

* Disable pytype pyi errors

* Fix for DQN

* Update comment on deepcopy

* Remove episode_reward field

* Fix RolloutReturn

* Avoid modification by reference

* Fix error message

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-12-01 22:30:09 +01:00
Antonin RAFFIN
2ebb8aa22b
Update Citation (#684)
* Update citation

* Remove cff file
2021-12-01 18:55:21 +01:00
Antonin RAFFIN
52c29dc497
Fix evaluation script for recurrent policies (#678)
* Fix evaluation script for RNN

* Add error message

* Revert "Add error message"

This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a.

* Fix for pytype

* Rename mask to `episode_start`

* Fix type hint

* Fix type hints

* Remove confusing part of sentence

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-30 13:49:06 +01:00
Gary Briggs
8e5ede783f
Add a section on exporting to TFLite/Coral with demonstration (#679)
* Add a section on exporting to TFLite/Coral with demonstration

* Changelog to reflect new export documentation

* Update docs/guide/export.rst

Fingers on autopilot make word wrong

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Update docs/guide/export.rst

Better wording clarity

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Update docs/guide/export.rst

Better wording clarity

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Clarify motivations and hardware

* Update docs/misc/changelog.rst

Make consistent with other changelog entries

Co-authored-by: Anssi <kaneran21@hotmail.com>

* Sphinx wants the section underline to be at least this long

* Remove first-person voice

* Typos

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-28 10:54:50 +01:00
Shyamal H Anadkat
3b68dc7312
Update GAE computation docstring (#655)
* Fix typo in buffers.py

* Revert "Fix typo in buffers.py"

This reverts commit ca643d5e3a509ae1b8a65bf0de98f4609ca9d8da.

* Ignore pytype errors

* Update GAE computation docstring

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-11-25 10:53:42 +01:00
Parth Kothari
58e5506385
Edited Authors of DriverGym project (#669)
2021-11-18 10:18:18 +01:00
Parth Kothari
1ac35eaef2
Add DriverGym project to SB3 project documentation (#665)
* Added DriverGym project

* Updated changelog

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-11-17 11:13:43 +01:00
Antonin RAFFIN
d228364ccf
Add timeout handling for on-policy algorithms (#658)
* Add timeout handling for on-policy algorithms

* Fixes

* Fix infinite loop in eval

* Skip type check for python 3.9

* Fix for discrete obs + add docstring

* Fix A2C test

* Removed unused helper

* Add test for infinite horizon

* typed ast should be fixed

* Apply suggestions from code review

Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-16 17:19:16 +01:00
Antonin RAFFIN
e75e1de4c1
Fix indentation in RL tips doc (#657)
* Update rl_tips.rst

indent fix to make if done and its following statement work

* Fix indentation and update changelog

* Skip type check for python 3.9

Co-authored-by: paulg <cove9988@gmail.com>
2021-11-10 16:54:20 +00:00
Antonin RAFFIN
2bb4500948
Fix set_env when using VecNormalize (#638)
* Fix `set_env` when using `VecNormalize`

* Update version
2021-11-02 13:52:26 +02:00
ac-93
98c1a637cf
add tactile-gym to the list of projects using SB3 (#640)
* Update projects.rst

* Update changelog.rst

* Update projects.rst

* Fix doc build

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:26:06 +01:00
Oleksii Kachaiev
0c17fedfac
Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636)
* Store number of timesteps at the beginning of each learn cycle

* Update changelog

* Set default _num_timesteps_at_start in the constructor

* Test case for FPS logger

* Adjust test to cover both on-policy and off-policy algorithms

* Fix formatting

* Update test and add comment

* Fix test

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:19:03 +01:00
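Note on #636: a sketch of the adjustment described above, assuming the timestep counter keeps growing across `learn()` calls when `reset_num_timesteps=False`, so only the steps taken since the current call started should be divided by the elapsed time. The function and argument names are hypothetical.

```python
import sys


def fps_since_learn_start(
    num_timesteps: int, num_timesteps_at_start: int, elapsed_seconds: float
) -> int:
    """FPS for the current learn cycle only."""
    # Subtract the steps accumulated before this learn cycle began
    steps_this_cycle = num_timesteps - num_timesteps_at_start
    return int(steps_this_cycle / max(elapsed_seconds, sys.float_info.epsilon))


# 1500 total steps, 1000 of them from earlier calls, 5 seconds elapsed
print(fps_since_learn_start(1500, 1000, 5.0))  # 100
```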
Edouard Leurent
a2e3001598
Add highway-env to the list of projects using SB3 (#639)
* Add highway-env to the list of projects using SB3

Many thanks for this fantastic library, keep up the good work!

* Update changelog with added documentation
2021-10-30 13:53:36 +02:00
Oleksii Kachaiev
0503e694b2
Introduce norm_obs_keys param for VecNormalize environment wrapper (#631)
* Implement new norm_obs_keys param for VecNormalize environment wrapper

* Simplified doc string to avoid issues with lint and doc

* Updated changelog

* Update changelog.rst

* Update test_vec_normalize.py

* Update sanity checks

* Fix backward compat

* Update doc

* Update changelog

* Fix lint warnings

* Fix tests

* Minor edit

* observation_space sanity check was applied twice

Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-10-28 19:18:39 +02:00
Antonin RAFFIN
7b977d7b03
Release 1.3.0 (#625)
2021-10-23 17:07:00 +02:00
Antonin RAFFIN
e907eca18e
Fix set_env to keep the number of timesteps (#615)
* Fix for `set_env`

* Add test and update changelog

* Use underscores and f-strings

* Add PyPi info

* Update comments
2021-10-23 16:36:40 +02:00
Antonin RAFFIN
1564a85081
System info helper (#613)
* Add `system_env_info`

* Add `print_system_info` to load
and store system info at save time

* Remove TODO

* Rename to `get_system_info`

* Import as sb3 for consistency

* Update changelog

* Add warning for old SB3 versions

* Use underscore literal for more clarity
2021-10-18 10:43:56 +02:00
Timo Kaufmann
09e9fc42eb
Use consistent logging keys (#605)
* Use a consistent key to log the total timesteps

This changes the timestep logging key of on-policy algorithms from
`time/total_timesteps` to `time/total timesteps` (note the
underscore/space). The off-policy algorithms and the eval callback
already use the latter, so this behavior is more consistent.

* Use underscores instead of spaces in logging keys

Most keys already followed this policy and consistent behavior is
friendlier to new users.

* Minor edit and bump version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-10-12 13:17:30 +02:00
Antonin RAFFIN
75aa31dcfb
Update SB3 contrib algorithms (#604)
2021-10-10 15:41:39 +02:00
Antonin RAFFIN
1881d904a0
Doc fix and improve error messages (#598)
* Fix custom env doc

* Catch common mistake

* Improve `EvalCallback` error message

* Lint test

* Update docs/guide/custom_env.rst

Co-authored-by: Adam Gleave <adam@gleave.me>
2021-10-08 18:08:31 +02:00
Ilja Avadiev
740d61ada3
Doc fix environment mixup (#588)
2021-09-29 10:16:59 +02:00
Antonin RAFFIN
306e49fda6
Fixes in is_vectorized_observation (#587)
* Fix is vectorized bug in DQN

* Fix sub-classed obs
2021-09-28 21:57:49 +02:00
Antonin RAFFIN
201fbffa8c
Remove sde_net_arch + Simplify policy (#584)
* Remove `sde_net_arch` + Simplify policy

* Add warning at load time
2021-09-28 22:32:54 +03:00
batu
89af49ca91
ONNX Documentation Update (#464)
* Updated ONNX documentation

First draft on the documentation explaining how to export SB3 models in the ONNX format

* Updated changelog with ONNX documentation fix

* Address comments

* Update changelog.rst

* Update rtd env

* Fixes + add test example

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi Kanervisto <anssk@Anssis-MacBook-Air.local>
Co-authored-by: Anssi Kanervisto <kaneran21@hotmail.com>
2021-09-26 17:40:35 +02:00
Baek Junyeob
914bc10a0d
Add policy-distillation-baselines to project page (#578)
* Update projects.rst

* Update docs/misc/projects.rst

* Apply suggestions from code review

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-20 16:30:16 +02:00
Adam Gleave
e825fbdd33
VecNormalize: allow non-continuous observations when norm_obs is False (#575)
* VecNormalize: allow non-continuous observations when norm_obs is False

* Update changelog, fix lint

* Switch to environment present in new and old versions of Gym

* Fix name

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-18 12:11:01 +02:00
Matthew Allen
76c212a854
Add RLGym to project page (#576)
* Add RLGym to projects list.

Per the request in this issue on our repo: https://github.com/lucas-emery/rocket-league-gym/issues/24

* Update changelog documentation section

* Update changelog.rst

* Update docs/misc/projects.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:47:22 +02:00
Wilhelm Kirchgässner
303df08a80
Add GEM project to project section of doc (#574)
* add GEM project to project section of doc

* Update docs/misc/projects.rst

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:10:04 +02:00
Cyprien
f3a35aa786
Add method predict_values for ActorCriticPolicy (#569)
* feat: add method predict_values for ActorCriticPolicy

* Fixes for new gym version

* Reformat

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-15 14:03:04 +02:00
Antonin RAFFIN
16f8b21d9b
Add get_distribution for on-policy algorithms (#566)
* feat: get_distribution method for ActorCriticPolicy

New method get_distribution for class ActorCriticPolicy returning current action distribution given observations

* doc: updating changelog.rst

- adding block for Release 1.2.1a0
- adding cyprienc to contributors

* style: make format

* fix: updating version.txt

Changing version from 1.2.0 to 1.2.1a0

* Update changelog

* Add test for get distribution

Co-authored-by: Cyprien <courtot.c@gmail.com>
2021-09-13 10:25:42 +02:00