* Handle non 1D action shape
* Revert changes of observation (out of the scope of this PR)
* Apply changes to DictReplayBuffer
* Update tests
* Rollout buffer n-D actions space handling
* Remove error when non 1D action space
* ActorCriticPolicy return action with the proper shape
* remove useless reshape
* Update changelog
* Add tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
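The n-D action handling above can be illustrated with a minimal NumPy sketch. The helper names `allocate_action_buffer` and `store_action` are hypothetical and are not taken from the library; the only assumption is that the buffer keeps one slot per step and per environment and reshapes incoming actions to the stored shape.

```python
import numpy as np


def allocate_action_buffer(buffer_size, n_envs, action_shape):
    """Pre-allocate storage that keeps the full (possibly n-D) action shape."""
    return np.zeros((buffer_size, n_envs, *action_shape), dtype=np.float32)


def store_action(buffer, pos, action):
    """Write one batch of actions at index `pos`.

    The incoming batch is reshaped to (n_envs, *action_shape), so a flattened
    batch is accepted as well as an already-shaped one.
    """
    n_envs = buffer.shape[1]
    buffer[pos] = np.asarray(action).reshape((n_envs, *buffer.shape[2:]))


# Hypothetical setup: 2 envs, each with a 3x2 action space
actions = allocate_action_buffer(buffer_size=4, n_envs=2, action_shape=(3, 2))
store_action(actions, 0, np.ones((2, 6)))  # flattened batch
store_action(actions, 1, np.ones((2, 3, 2)))  # already shaped batch
```

Keeping the reshape in one place means the rest of the code never needs to special-case 1D action spaces.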
* Use higher resolution time and round up to eps
* Update changelog
* Add test case
* Fix formatting, time()->time_ns
* Bugfix: ns is integer not float
* Move test to better place
* Divide by 1e9 earlier
* `arr[0]` to `arr.squeeze(0)`
* `squeeze(axis=0)` to `squeeze(0)`
* Type testing
* Add type test for unvectorized observation
* `squeeze(0)` to `squeeze(axis=0)`
* Treatment of the laziness symptoms
* Update changelog
* Update changelog
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
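The back-and-forth between `arr[0]` and `arr.squeeze(axis=0)` above is easier to follow with a small NumPy example. This is only an illustration of the NumPy behavior, with made-up array shapes: both forms drop a leading axis of size 1, but only `squeeze(axis=0)` complains when that axis is larger.

```python
import numpy as np

# A single observation that was wrapped into a batch of size 1
batched = np.zeros((1, 4))
same_shape = batched[0].shape == batched.squeeze(axis=0).shape == (4,)

# A batch of size 2 passed by mistake
wrong = np.arange(8).reshape((2, 4))
silently_truncated = wrong[0]  # keeps the first row, drops the second without warning

try:
    wrong.squeeze(axis=0)  # refuses: axis 0 does not have size 1
    squeeze_raised = False
except ValueError:
    squeeze_raised = True
```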
* Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination
* Modify test to avoid unsupported buffer configuration
* Change from assertion to raising of ValueError
* Update changelog
* Update style for consistency
* Use handle_timeout_termination when possible
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
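A minimal sketch of the check described above, assuming the two options are plain boolean constructor arguments. `SketchReplayBuffer` is a hypothetical stand-in, not the library class, and the error text is illustrative.

```python
class SketchReplayBuffer:
    """Hypothetical buffer that rejects an unsupported option combination."""

    def __init__(self, buffer_size, optimize_memory_usage=False, handle_timeout_termination=True):
        # Raise instead of assert, so the check still runs under `python -O`
        if optimize_memory_usage and handle_timeout_termination:
            raise ValueError(
                "optimize_memory_usage and handle_timeout_termination "
                "cannot both be enabled"
            )
        self.buffer_size = buffer_size
        self.optimize_memory_usage = optimize_memory_usage
        self.handle_timeout_termination = handle_timeout_termination


# Supported: the memory-efficient variant with timeout handling turned off
buffer = SketchReplayBuffer(100, optimize_memory_usage=True, handle_timeout_termination=False)
```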
* Fixed unchecked None value in SubprocVecEnv
* Fixed unchecked None value in DummyVecEnv
* Fix formatting
* Update test and changelog
* Improve test
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* escape tensorboard log name
Otherwise utils does not recognize the log.
* Added fix to changelog
* Modifications made by: make commit-checks .
* Revert "Modifications made by: make commit-checks ."
This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.
* Update changelog and add test
Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
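Why the log name needs escaping can be shown with the standard library alone. The sketch below assumes run folders are named `<log_name>_<id>` and are found with `glob`; the function `latest_run_id` is a hypothetical helper written for this example.

```python
import glob
import os
import tempfile


def latest_run_id(log_path, log_name):
    """Return the highest numeric suffix among `<log_name>_<id>` folders, or 0."""
    max_id = 0
    # glob.escape neutralizes `*`, `?` and `[` so they are matched literally
    pattern = os.path.join(glob.escape(log_path), f"{glob.escape(log_name)}_[0-9]*")
    for path in glob.glob(pattern):
        suffix = os.path.basename(path).split("_")[-1]
        if suffix.isdigit():
            max_id = max(max_id, int(suffix))
    return max_id


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "run[v2]_1"))
    os.makedirs(os.path.join(tmp, "run[v2]_3"))
    escaped_result = latest_run_id(tmp, "run[v2]")
    # Without escaping, `[v2]` is read as a character class and nothing matches
    unescaped_matches = glob.glob(os.path.join(tmp, "run[v2]_[0-9]*"))
```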
* Replacing the policy registry with policy "aliases"
* Fixing import order and SAC
* Changing arg. order to be sure policy_aliases is a kwarg
* Import orders
* Removing pytype error check
* Reformat
* Fix alias import
* Not using mutable {} as default for policy_aliases
* Empty aliases initialization
* Using static attributes for policy_aliases
* Fixing isort
* Fixing back bad merge
* Running isort
* Fixing aliases for A2C and PPO
* Using f-string
* Moving policy_aliases definition position
* Adding change in the changelog
* Update version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
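The alias approach above (static class attribute, no mutable `{}` default argument) could look roughly like the following. All class names here are hypothetical placeholders, and the lookup method is an assumption about how such aliases might be resolved.

```python
class MlpPolicySketch:
    """Placeholder policy class."""


class CnnPolicySketch:
    """Placeholder policy class."""


class AlgorithmSketch:
    # Class-level mapping, overridden by each subclass; nothing is shared
    # through a mutable default argument
    policy_aliases = {}

    def __init__(self, policy):
        # Accept either an alias string or a policy class
        self.policy_class = self._get_policy_from_name(policy) if isinstance(policy, str) else policy

    def _get_policy_from_name(self, policy_name):
        if policy_name in self.policy_aliases:
            return self.policy_aliases[policy_name]
        raise ValueError(f"Policy {policy_name} unknown")


class PPOSketch(AlgorithmSketch):
    policy_aliases = {"MlpPolicy": MlpPolicySketch, "CnnPolicy": CnnPolicySketch}


model = PPOSketch("MlpPolicy")
```

Each algorithm declares only the aliases it supports, so an unknown name fails early with a clear error.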
* Removing dead code for handling time limits (see #829)
* Mentioning remove_time_limit_termination in the changelog
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback
* Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement
* Update the docs related to new StopTrainingOnNoModelImprovement callback
* Update doc
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
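The idea behind a stop-on-no-improvement callback can be sketched without the library. `NoImprovementStopper` and its parameters are hypothetical names, and the exact counting rule (stop once `max_no_improvement_evals` consecutive evaluations fail to beat the best mean reward) is an assumption made for this example.

```python
class NoImprovementStopper:
    """Hypothetical patience counter, meant to be called after each evaluation."""

    def __init__(self, max_no_improvement_evals, min_evals=0):
        self.max_no_improvement_evals = max_no_improvement_evals
        self.min_evals = min_evals
        self.best_mean_reward = float("-inf")
        self.no_improvement_evals = 0
        self.n_evals = 0

    def after_eval(self, mean_reward):
        """Return True to continue training, False to stop."""
        self.n_evals += 1
        if mean_reward > self.best_mean_reward:
            self.best_mean_reward = mean_reward
            self.no_improvement_evals = 0
        else:
            self.no_improvement_evals += 1
        # Never stop before `min_evals` evaluations have been seen
        if self.n_evals <= self.min_evals:
            return True
        return self.no_improvement_evals < self.max_no_improvement_evals


stopper = NoImprovementStopper(max_no_improvement_evals=2)
decisions = [stopper.after_eval(reward) for reward in (1.0, 2.0, 2.0, 1.5)]
```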
* Make HumanOutputFormat length configurable and bump to 36 by default
* Add test case
* Updated changelog
* Blacken
* Blacken code
* Fix GitLab CI: switch to Docker container with new black version
* Incorporate suggestion
* Add class docstring
* Dummy commit to retrigger GitLab
Co-authored-by: Anssi <kaneran21@hotmail.com>
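A configurable maximum length for printed keys and values might be implemented along these lines. The function `truncate` and the `"..."` suffix are assumptions for illustration; only the default of 36 comes from the commit above.

```python
def truncate(text, max_length=36):
    """Shorten `text` to at most `max_length` characters, marking the cut with '...'."""
    if len(text) > max_length:
        return text[: max_length - 3] + "..."
    return text


short_key = truncate("rollout/ep_rew_mean")
long_key = truncate("rollout/" + "x" * 50)
custom = truncate("rollout/" + "x" * 50, max_length=50)
```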
* Writing the additional info_keywords into the episode infos that are passed to the results writer. Directly taken from the non-vec version of monitor.
* Added test for monitoring info_keywords.
* Removed unnecessary step of registering the env. Not using make_vec_env, because it applies a monitor wrapper to the env.
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614)
* add a note to explain the difference between `.load` and `.set_parameters` to the examples
* fix typos
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Added ``newline="\n"`` when opening CSV monitor files so that each line ends with ``\r\n`` instead of ``\r\r\n`` on Windows while Linux environments are not affected
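The effect of ``newline="\n"`` can be checked with the standard `csv` module. The file name and column names below are made up for the example; the point is that the `csv` writer already terminates rows with ``\r\n``, and passing ``newline="\n"`` keeps text-mode newline translation from turning that into ``\r\r\n`` on Windows.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.monitor.csv")
    # newline="\n" disables the platform-specific translation of "\n" on write
    with open(path, "w", newline="\n") as file_handler:
        writer = csv.DictWriter(file_handler, fieldnames=("r", "l", "t"))
        writer.writeheader()
        writer.writerow({"r": 1.0, "l": 10, "t": 0.5})
    # Read back in binary mode to see the exact bytes on disk
    with open(path, "rb") as file_handler:
        raw = file_handler.read()
```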
* Add multi-env training support for SAC
* Fix for dict obs
* Pytype fixes
* Fix assert on number of envs
* Remove for loop
* Add support for Dict obs
* Start cleanup
* Update doc and bug fix
* Add support for vectorized action noise
and add multi env example for off-policy
* Update version
* Bug fix with VecNormalize
* Update README table
* Update variable names
* Update changelog and version
* Update doc and fix for `gradient_steps=-1`
* Add test for `gradient_steps=-1`
* Disable pytype pyi errors
* Fix for DQN
* Update comment on deepcopy
* Remove episode_reward field
* Fix RolloutReturn
* Avoid modification by reference
* Fix error message
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Fix evaluation script for RNN
* Add error message
* Revert "Add error message"
This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a.
* Fix for pytype
* Rename mask to `episode_start`
* Fix type hint
* Fix type hints
* Remove confusing part of sentence
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Store number of timesteps at the beginning of each learn cycle
* Update changelog
* Set default _num_timesteps_at_start in the constructor
* Test case for FPS logger
* Adjust test to cover both on-policy and off-policy algorithms
* Fix formatting
* Update test and add comment
* Fix test
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
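A rough sketch of the FPS fix above, combined with the earlier `time_ns` and eps commits in this log. `FpsTracker` is a hypothetical class; only the attribute name `_num_timesteps_at_start` comes from the commit messages. The assumption is that FPS should count only the steps taken since the current learn cycle began, and that elapsed time is clamped to a tiny positive value to avoid division by zero.

```python
import sys
import time


class FpsTracker:
    """Hypothetical tracker that measures steps per second per learn cycle."""

    def __init__(self):
        self.num_timesteps = 0
        self._num_timesteps_at_start = 0  # default set in the constructor
        self.start_time_ns = time.time_ns()

    def begin_learn(self):
        # Remember where this learn cycle started, so earlier steps are excluded
        self._num_timesteps_at_start = self.num_timesteps
        self.start_time_ns = time.time_ns()  # integer nanoseconds

    def fps(self, now_ns=None):
        now_ns = time.time_ns() if now_ns is None else now_ns
        # Convert to seconds first, then round up to machine epsilon
        elapsed = max((now_ns - self.start_time_ns) / 1e9, sys.float_info.epsilon)
        return int((self.num_timesteps - self._num_timesteps_at_start) / elapsed)


tracker = FpsTracker()
tracker.num_timesteps = 10_000  # steps from a previous learn cycle
tracker.begin_learn()
tracker.num_timesteps += 500
fps_after_one_second = tracker.fps(now_ns=tracker.start_time_ns + 1_000_000_000)
```

Without the stored starting count, the 10,000 earlier steps would be divided by the new cycle's elapsed time and inflate the reported FPS.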
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore literal for more clarity
* Use a consistent key to log the total timesteps
This changes the timestep logging key of on-policy algorithms from
`time/total_timesteps` to `time/total timesteps` (note the
underscore/space). The off-policy algorithms and the eval callback
already use the latter, so this behavior is more consistent.
* Use underscores instead of spaces in logging keys
Most keys already followed this policy and consistent behavior is
friendlier to new users.
* Minor edit and bump version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>