* Add with_bias arg
* Update changelog
* move torch_layers to the last position
* Update version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Raise error when same env object instance is passed in vectorized environment
* Add to changelog
* Add raises to docstring
* Add test
* Also test make_vec_env
* Fix test
* Try to enable color for MyPy
* Update version and ignore lint warnings
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
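The duplicate-instance check from the vectorized-environment change above can be sketched as follows. This is a minimal illustration, not the library's code: `check_distinct_envs` and `ToyEnv` are hypothetical names, and the sketch assumes the vectorized environment receives a list of zero-argument factories.

```python
from typing import Callable, List


def check_distinct_envs(env_fns: List[Callable[[], object]]) -> List[object]:
    """Build one env per factory and reject repeated object instances."""
    envs = [fn() for fn in env_fns]
    # Two factories returning the same object would silently share state.
    if len({id(env) for env in envs}) != len(envs):
        raise ValueError("Each factory must return a new environment instance.")
    return envs


class ToyEnv:
    """Stand-in for an environment class (assumption for this sketch)."""


shared = ToyEnv()
ok = check_distinct_envs([ToyEnv, ToyEnv])  # two fresh instances pass
try:
    check_distinct_envs([lambda: shared, lambda: shared])
    raised = False
except ValueError:
    raised = True  # the shared instance is rejected
```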
* Install and configure mypy
* Test if github CI uses setup.cfg for mypy
* force color output
* tab to space
* Try to fix regex
* follow_imports silent
* use space as indentation
* fix indentation setup.cfg
* Show error code
* Update doc
* Update changelog
* Ignore mypy cache files from commit
* Update gitlab CI
* Add pytype and mypy entry in Makefile
* Make mypy happy
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed errors in the documentation
Fixed grammatical and punctuation errors, and improved the sentence structure.
* Added username in the contributors
* Add PolicyPredictor protocol and use it in evaluate_policy
* Update changelog
* Move Protocol to type_aliases to avoid circular import
* Add test for evaluate_policy on BasePolicy
* Remove unused import
* Use typing_extensions
* Move typing_extensions to 3rd party
* Add version range (typing_extensions uses SemVer)
* Import Protocol from typing_extensions only on Python<3.8
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Install typing_extensions only on Python<3.8
* Add missing sys import
* Fix import ordering
* Fix observation type hint in predict
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
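A rough sketch of the protocol approach from the commits above: a structural type with a `predict` method, with `Protocol` imported from `typing_extensions` only on Python older than 3.8. The `predict` signature is simplified here, and `ConstantPolicy` and `evaluate` are hypothetical names used only for illustration.

```python
import sys
from typing import Any, List, Optional, Tuple

if sys.version_info >= (3, 8):
    from typing import Protocol
else:  # typing_extensions backports Protocol to older interpreters
    from typing_extensions import Protocol


class PolicyPredictor(Protocol):
    def predict(self, observation: Any, deterministic: bool = False) -> Tuple[Any, Optional[Any]]:
        ...


class ConstantPolicy:
    """Satisfies the protocol structurally, without inheriting from it."""

    def predict(self, observation: Any, deterministic: bool = False) -> Tuple[Any, Optional[Any]]:
        return 0, None


def evaluate(model: PolicyPredictor, observations: List[Any]) -> List[Any]:
    # Anything exposing a compatible predict() is accepted by type checkers.
    return [model.predict(obs, deterministic=True)[0] for obs in observations]


actions = evaluate(ConstantPolicy(), [1, 2, 3])
```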
* Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925
* added changelog entry
* added missing backtick
* deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used
* Updated tests to ignore DeprecationWarnings
* Updated changelog entry
* Removed the `create_eval_env` parameter from the examples in the docs
* Removed information about the `create_eval_env` parameter from the migration docs
* Added information about deprecation of the `create_eval_env` parameter in the docs
* Add alternative in docstring
* Update docstrings
* `eval_freq` warning in docstring
* Add deprecation comments in tests
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
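The deprecation pattern described above might look roughly like this. The function body, the message text, and the `stacklevel` value are assumptions for this sketch. The point is that `stacklevel` controls which frame the warning is attributed to, so the user sees their own call site rather than library internals.

```python
import warnings


def learn(total_timesteps: int, eval_env=None, eval_freq: int = -1) -> int:
    """Toy stand-in for a learn() method with deprecated parameters."""
    if eval_env is not None or eval_freq != -1:
        # stacklevel=2 attributes the warning to the caller of learn().
        warnings.warn(
            "Parameters `eval_env` and `eval_freq` are deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
    return total_timesteps


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    learn(10, eval_freq=100)
```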
* Add progress bar callback and argument
* Update doc
* Update changelog
* Upgrade pytype in docker image
* Use tqdm.write in the logger to have cleaner output
* Fix logger test
* Fix when doing multiple calls to learn()
* Address comments from code-review
* Updated type hint and extended docstring in make_vec_env
The function already worked with callables, but the type hint in its signature did not reflect this.
Extended the description of the wrapper_class parameter with a link to a Github issue containing more details on the matter.
* Updated type hint in make_atari_env
The function already worked with callables, but the type hint in its signature did not reflect this.
* Updated docstring in make_atari_env
When modifying the type hint of the parameter 'env_id' (in this commit: fda6872f73c11075901ba88f2520f6316f818d1d), I forgot to update its description in the docstring.
Doing it now.
* Removed redundant type in env_id's type hint in make_vec_env and make_atari_env
Callable[..., gym.Env] already includes Type[gym.Env], as pointed out here: https://github.com/DLR-RM/stable-baselines3/pull/1085#issuecomment-1269685218
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Updated docstring from n_steps to n_rollout_steps
This must be a typo
* Fixed typo in a comment in ppo.py
* Update changelog
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Fix return type for load, learn in BaseAlgorithm
* Update changelog
* Add typing extensions to dependencies
* Import directly from typing for python >3.11
* Reorder changelog to reflect merge order
* Roll back to typevar solution
* Updated changelog
* Remove typing extensions requirement
* Update base_class.py
* Remove final point in changelog
* Additional type fixes across project
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix loading with new `n_envs`
* Update tests
* Update changelog
* Fix the fix
* Remove `self._setup_model()` from `set_env()`
* Raise `AssertionError` when setting env with a different `n_envs`
* Update unit tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix replay_buffer_class type annotation
* Update changelog
* Further replacement of same type annotation issue
* Formatting
* Rolled back formatting changes for consistency
* Added option to override or use existing CSVs
* Updated changelog for Monitor override
* Changed default value to override
* Simplify code and add test
* Update version
* Fix for pytype
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* fix NaN in advantages with batch size 1, for PPO
* changelog
* black
* Simplify test
* Bump version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
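To illustrate why a batch of size 1 is a problem for advantage normalization: the sample standard deviation of a single value is undefined, so dividing by it produces NaN. One possible guard, sketched here in plain Python with a hypothetical `normalize_advantages` helper (not the library's implementation), is to skip normalization when there is only one sample.

```python
import math
from typing import List


def normalize_advantages(advantages: List[float], eps: float = 1e-8) -> List[float]:
    # With one sample the sample standard deviation is 0/0,
    # so normalization is skipped instead of producing NaN.
    if len(advantages) <= 1:
        return list(advantages)
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / (len(advantages) - 1)
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advantages]


single = normalize_advantages([2.0])     # returned unchanged
pair = normalize_advantages([1.0, 3.0])  # zero mean after normalization
```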
* include `running_mean` and `running_var` when updating target networks in DQN, SAC, TD3.
* Update stable_baselines3/common/utils.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.
* Fix `DictReplayBuffer.next_observations` type (#1013)
* Fix DictReplayBuffer.next_observations type
* Update changelog
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed missing verbose parameter passing (#1011)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Support for `device=auto` buffers and set it as default value (#1009)
* Default device is "auto" for buffer + auto device support in BufferBaseClass
* Update docstring
* Update tests
* Unify tests
* Update changelog
* Fix tests on CUDA device
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Update test
* Add comments and update tests
* Bump version
* Remove one extra space to conform code style.
* Update docstrings
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Burak Demirbilek <BurakDmb@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* create Hparam class & support in all OutputFormats
* add hparams documentation & example
* add hparam tests
* remove unnecessary test & fix name
* format changes
* support hyperparameters logging to tensorboard
* fix HParams class docstring
* use more explicit variable names
* raise error instead of warning
* Unpin protobuf
* Add test for logging hparams
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Handle non 1D action shape
* Revert changes of observation (out of the scope of this PR)
* Apply changes to DictReplayBuffer
* Update tests
* Rollout buffer n-D actions space handling
* Remove error when non 1D action space
* ActorCriticPolicy return action with the proper shape
* remove useless reshape
* Update changelog
* Add tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add info on split tensorboard graphs.
* Change wording to make it look better.
* Update changelog.rst
* Rephrase and add link to issue
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Use higher resolution time and round up to eps
* Update changelog
* Add test case
* Fix formatting, time()->time_ns
* Bugfix: ns is integer not float
* Move test to better place
* Divide by 1e9 earlier
* `arr[0]` to `arr.squeeze(0)`
* `squeeze(axis=0)` to `squeeze(0)`
* Type testing
* Add type test for unvectorized observation
* `squeeze(0)` to `squeeze(axis=0)`
* Treatment of the laziness symptoms
* Update changelog
* Update changelog
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
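The timing change above (integer nanoseconds from `time.time_ns()`, divided by 1e9 early, then clamped to machine epsilon) can be sketched like this. `compute_fps` is a hypothetical helper written for illustration, not a function from the library.

```python
import sys
import time


def compute_fps(num_steps: int, start_time_ns: int) -> float:
    """Steps per second since start_time_ns (a time.time_ns() reading)."""
    # time_ns() returns an integer, so convert to seconds before further math.
    elapsed = (time.time_ns() - start_time_ns) / 1e9
    # Clamp to machine epsilon so a zero elapsed time cannot divide by zero.
    elapsed = max(elapsed, sys.float_info.epsilon)
    return num_steps / elapsed


fps = compute_fps(100, time.time_ns())
```

Even if the clock has not advanced between the two readings, the result is a large finite number rather than a `ZeroDivisionError`.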
* Prohibit simultaneous use of `optimize_memory_usage` and `handle_timeout_termination`
* Modify test to avoid unsupported buffer configuration
* Change from assertion to raising of ValueError
* Update changelog
* Update style for consistency
* Use handle_timeout_termination when possible
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fixed unchecked None value in SubprocVecEnv
* Fixed unchecked None value in DummyVecEnv
* Fix formatting
* Update test and changelog
* Improve test
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* escape tensorboard log name
Otherwise utils does not recognize the log.
* Added fix to changelog
* Modifications made by: make commit-checks .
* Revert "Modifications made by: make commit-checks ."
This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.
* Update changelog and add test
Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
* Goal sampled from `next_achieved_goal` instead of `achieved_goal`
* No need to have special case for future anymore
* Update changelog
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Replacing the policy registry with policy "aliases"
* Fixing import order and SAC
* Changing arg. order to be sure policy_aliases is a kwarg
* Import orders
* Removing pytype error check
* Reformat
* Fix alias import
* Not using mutable {} as default for policy_aliases
* Empty aliases initialization
* Using static attributes for policy_aliases
* Fixing isort
* Fixing back bad merge
* Running isort
* Fixing aliases for A2C and PPO
* Using f-string
* Moving policy_aliases definition position
* Adding change in the changelog
* Update version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
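The "mutable `{}` as default" pitfall mentioned in the commits above is a general Python issue: a default dict is created once and shared across calls. A small self-contained demonstration follows, with hypothetical `register_bad` and `register_good` functions that are not part of the library.

```python
from typing import Dict, Optional


def register_bad(name: str, aliases: Dict[str, str] = {}) -> Dict[str, str]:
    # The default dict is created once and shared by every call.
    aliases[name] = name.lower()
    return aliases


def register_good(name: str, aliases: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # A fresh dict is created per call when none is supplied.
    aliases = {} if aliases is None else aliases
    aliases[name] = name.lower()
    return aliases


register_bad("MlpPolicy")
leaked = register_bad("CnnPolicy")   # still holds the entry from the first call
register_good("MlpPolicy")
clean = register_good("CnnPolicy")   # holds only its own entry
```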
* Removing dead code for handling time limits (see #829)
* Mentioning remove_time_limit_termination in the changelog
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback
* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement
* Update the docs related to new StopTrainingOnNoModelImprovement callback
* Update doc
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Allow PPO to turn off advantage normalization
* update changelog
* Add a test case
* Update test and sanity check
* Fix tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Make HumanOutputFormat length configurable and bump to 36 by default
* Add test case
* Updated changelog
* Blacken
* Blacken code
* Fix GitLab CI: switch to Docker container with new black version
* Incorporate suggestion
* Add class docstring
* Dummy commit to retrigger GitLab
Co-authored-by: Anssi <kaneran21@hotmail.com>
* fix Atari in CI
* fix dtype and atari extra
* Update setup.py
* remove 3.6
* note about how to install Atari
* pendulum-v1
* atari v5
* black
* fix pendulum capitalization
* add minimum version
* moved things in changelog to breaking changes
* partial v5 fix
* env update to pass tests
* mismatch env version fixed
* Fix tests after merge
* Include autorom in setup.py
* Blacken code
* Fix dtype issue in more robust way
* Fix GitLab CI: switch to Docker container with new black version
* Remove workaround from GitLab. (May need to rebuild Docker for this though.)
* Revert to v4
* Update setup.py
* Apply suggestions from code review
* Remove unnecessary autorom
* Consistent gym versions
Co-authored-by: J K Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: modanesh <mohamad4danesh@gmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add Hugging Face to SB3 doc
* Update doc + fixes
* Use SB3 model from the hub
* Bump version
* Fixes
Co-authored-by: simoninithomas <simonini_thomas@outlook.fr>
* Writing the additional info_keywords into the episode infos that are passed to the results writer. Directly taken from the non-vec version of monitor.
* Added test for monitoring info_keywords.
* Removed unnecessary step of registering the env. Not using make_vec_env, because it applies a monitor wrapper to the env.
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614)
* add a note to explain the difference between `.load` and `.set_parameters` to the examples
* fix typos
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Added ``newline="\n"`` when opening CSV monitor files so that each line ends with ``\r\n`` instead of ``\r\r\n`` on Windows while Linux environments are not affected
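The effect of ``newline="\n"`` can be checked with a short script. This sketch uses a plain `csv.writer` and a temporary file as stand-ins for the monitor's own writer (an assumption). With ``newline="\n"`` Python performs no newline translation on write, so the ``\r\n`` emitted by the `csv` module is stored unchanged instead of being expanded to ``\r\r\n`` on Windows.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "monitor.csv")

# newline="\n" disables newline translation on write, so the "\r\n"
# line terminator produced by csv.writer reaches the file untouched.
with open(path, "w", newline="\n") as f:
    writer = csv.writer(f)
    writer.writerow(["r", "l", "t"])
    writer.writerow([1.0, 10, 0.5])

# Read back as bytes to inspect the actual line endings.
with open(path, "rb") as f:
    raw = f.read()
```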
* Add multi-env training support for SAC
* Fix for dict obs
* Pytype fixes
* Fix assert on number of envs
* Remove for loop
* Add support for Dict obs
* Start cleanup
* Update doc and bug fix
* Add support for vectorized action noise
and add multi env example for off-policy
* Update version
* Bug fix with VecNormalize
* Update README table
* Update variable names
* Update changelog and version
* Update doc and fix for `gradient_steps=-1`
* Add test for `gradient_steps=-1`
* Disable pytype pyi errors
* Fix for DQN
* Update comment on deepcopy
* Remove episode_reward field
* Fix RolloutReturn
* Avoid modification by reference
* Fix error message
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Fix evaluation script for RNN
* Add error message
* Revert "Add error message"
This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a.
* Fix for pytype
* Rename mask to `episode_start`
* Fix type hint
* Fix type hints
* Remove confusing part of sentence
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Add a section on exporting to TFLite/Coral with demonstration
* Changelog to reflect new export documentation
* Update docs/guide/export.rst
Fingers on autopilot make word wrong
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Update docs/guide/export.rst
Better wording clarity
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Update docs/guide/export.rst
Better wording clarity
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Clarify motivations and hardware
* Update docs/misc/changelog.rst
Make consistent with other changelog entries
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Sphinx wants the section underline to be at least this long
* Remove first-person voice
* Typos
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Update rl_tips.rst
indent fix to make if done and its following statement work
* Fix indentation and update changelog
* Skip type check for python 3.9
Co-authored-by: paulg <cove9988@gmail.com>
* Store number of timesteps at the beginning of each learn cycle
* Update changelog
* Set default _num_timesteps_at_start in the constructor
* Test case for FPS logger
* Adjust test to cover both on-policy and off-policy algorithms
* Fix formatting
* Update test and add comment
* Fix test
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add highway-env to the list of projects using SB3
Many thanks for this fantastic library, keep up the good work!
* Update changelog with added documentation
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore literal for more clarity
* Use a consistent key to log the total timesteps
This changes the timestep logging key of on-policy algorithms from
`time/total_timesteps` to `time/total timesteps` (note the
underscore/space). The off-policy algorithms and the eval callback
already use the latter, so this behavior is more consistent.
* Use underscores instead of spaces in logging keys
Most keys already followed this policy and consistent behavior is
friendlier to new users.
* Minor edit and bump version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Updated ONNX documentation
First draft on the documentation explaining how to export SB3 models in the ONNX format
* Updated changelog with ONNX documentation fix
* Address comments
* Update changelog.rst
* Update rtd env
* Fixes + add test example
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi Kanervisto <anssk@Anssis-MacBook-Air.local>
Co-authored-by: Anssi Kanervisto <kaneran21@hotmail.com>
* VecNormalize: allow non-continuous observations when norm_obs is False
* Update changelog, fix lint
* Switch to environment present in new and old versions of Gym
* Fix name
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: add method predict_values for ActorCriticPolicy
* Fixes for new gym version
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: get_distribution method for ActorCriticPolicy
New method get_distribution for class ActorCriticPolicy returning current action distribution given observations
* doc: updating changelog.rst
- adding block for Release 1.2.1a0
- adding cyprienc to contributors
* style: make format
* fix: updating version.txt
Changing version from 1.2.0 to 1.2.1a0
* Update changelog
* Add test for get distribution
Co-authored-by: Cyprien <courtot.c@gmail.com>
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training mode of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.Module
* Rename unit test
* unit test: use drop out probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Bump version and update doc
* Fix name
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docs/index.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording for RL zoo
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
* Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor.
* Add learning rate schedule example (#248)
* Add learning rate schedule example
* Update docs/guide/examples.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Address comments
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add supported action spaces checks (#254)
* Add supported action spaces checks
* Address comment
* Use `pass` in an abstractmethod instead of deleting the arguments.
* Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways.
* Rename _get_data to _get_data_to_reconstruct_model.
_get_data was too generic and could have meant anything.
* Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead.
* Fix docstring of `train_freq` parameter.
* Black fixes.
* Fix TD3 delayed update + rename `_get_data()`
* Fix TD3 test
* Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert.
* Make one step the default train frequency.
* Black fixes.
* Change np.bool to bool.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_rollouts of the off policy algorithm.
* Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_rollouts of HER.
* Use named tuple for train freq
* Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation.
* Black fixes.
* Revert to train_freq
* Fix terminal observation issues
* Typo
* Fix action noise bug in HER
* Add assert when loading HER models
* Update version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
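The `train_freq` handling described in the commits above (accept an integer or a tuple, normalize to a named tuple in the constructor) could be sketched as below. The names `Unit`, `TrainFreq` and `normalize_train_freq` are chosen for this illustration and are not guaranteed to match the library's definitions.

```python
from enum import Enum
from typing import NamedTuple, Tuple, Union


class Unit(Enum):
    STEP = "step"
    EPISODE = "episode"


class TrainFreq(NamedTuple):
    frequency: int
    unit: Unit


def normalize_train_freq(train_freq: Union[int, Tuple[int, str]]) -> TrainFreq:
    # A bare integer is interpreted as a number of steps.
    if isinstance(train_freq, int):
        train_freq = (train_freq, "step")
    frequency, unit = train_freq
    # Unit(...) raises ValueError for anything other than "step" or "episode".
    return TrainFreq(int(frequency), Unit(unit))


every_step = normalize_train_freq(1)
per_episode = normalize_train_freq((2, "episode"))
```

Normalizing once at construction time means the rollout-collection code only ever has to handle one representation.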
* add support for text records to logger
* add note on how to access summary writer directly
* escape unicode chars for HumanOutputFormat
* update changelog
* fix formatting
* fix docs
* add tests
* fix formatting
* fix example, link to pytorch docs, update changelog
* move unicode escaping to own function, properly escape quotechars in csv formatter
* switch from n_calls to num_timesteps in example
* make step coherent in example
* use n_calls to check when to log in example
* add small hint about log frequency
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add comment about str is scalar type, improve test input
* Update tests
* Update test_logger.py
* use repr to handle strings in logger
* remove repr from text log output
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed discrete obs support
* Suggest new edit, fix failed test
* Revert "Suggest new edit, fix failed test"
This reverts commit 6892bf05506bb5ad0e87016d8d382705ab72e6a4.
* Fix test
* Special case for discrete obs
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Add warning about total_env_steps not dividing neatly into batch size
* Stylistic cleanup
* Black reformatting
* Add clearer documentation and update changelog
* Update changelog.rst
* Use specific RolloutBuffer terminology
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change to minibatch language
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Cleaning up language describing rollout buffer requirements
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Switch to using env.num_envs
* Working tests
* Black and isort still fighting each other
* codestyle finally happy
* Basic test exists, possibly in the wrong file
* Update phrasing
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
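The rollout-buffer warning from the commits above comes down to a divisibility check: the buffer holds `n_steps * n_envs` transitions, and if that is not a multiple of the minibatch size, the last minibatch is truncated. A hedged sketch, where the function name and message wording are assumptions:

```python
import warnings


def check_minibatch_split(n_steps: int, n_envs: int, batch_size: int) -> int:
    """Return the size of the truncated final minibatch (0 if the split is even)."""
    buffer_size = n_steps * n_envs
    remainder = buffer_size % batch_size
    if remainder > 0:
        warnings.warn(
            f"Rollout buffer size {buffer_size} is not a multiple of the "
            f"minibatch size {batch_size}: the last minibatch will contain "
            f"only {remainder} samples."
        )
    return remainder


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    uneven = check_minibatch_split(n_steps=5, n_envs=2, batch_size=4)
    even = check_minibatch_split(n_steps=8, n_envs=2, batch_size=4)
```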
* Added Image and Figure classes to logger. For now, these objects can only be logged by TensorBoardOutputFormat
* Added documentation for figure and image logging into tensorboard
* Updated changelog
* Minor changes to documentation. Reviewed supported types for logging images and figures
* Fix type for np arrays
* Added more explicit example for logging figures in the documentation. Added docstrings for parameters in logging auxiliary classes
* Added tests for image and figure logging
* Applied autoformatting
* Update doc
* Fix documentation example
* Bump version
Co-authored-by: Carlos Casas <ccasascuadrado@guidewire.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix bug when saving/loading q-net alone
* Rename variables to match SB3-contrib
* Update docker image
* Set min version for tensorboard
* Add SB3-Contrib to doc
* Update DQN
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add SUMO-RL as example project in the docs
* Fixed docstring of AtariWrapper which was not inside of __init__
* Updated changelog regarding docs
* Fix docstring of classes in atari_wrappers.py which were inside the constructor
* Formatted docstring with black
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix for arguments order in explained_variance()
Fix for arguments order in explained_variance() in PPO
* Fix for arguments order in explained_variance()
Fix for arguments order in explained_variance() in a2c
* Fix for arguments order in explained_variance()
update changelog.rst
* Update evaluate_policy to use monitor data if available
* Update documentation
* Cleaning up
* Remove unnecessary typing trickery
* Update doc
* Rename is_wrapped to clarify it is for vecenvs
* Add is_wrapped for regular envs
* Add is_wrapped call for subprocvecenv and update code for circular imports
* Move new functions back to env_util and fix imports
* Update changelog
* Clarify evaluate_policy docs
* Add tests for wrapped modifying episode lengths
* Fix tests
* Update changelog
* Minor edits
* Add warn switch to evaluate_policy and update tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>