* Add progress bar callback and argument
* Update doc
* Update changelog
* Upgrade pytype in docker image
* Use tqdm.write in the logger to have cleaner output
* Fix logger test
* Fix when doing multiple calls to learn()
* Address comments from code-review
* Added option to override or use existing CSVs
* Updated changelog for Monitor override
* Changed default value to override
* Simplify code and add test
* Update version
* Fix for pytype
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fix NaN in advantages with batch size 1, for PPO
* changelog
* black
* Simplify test
* Bump version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
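
The NaN fix above concerns advantage normalization: with a single sample the unbiased standard deviation is undefined, so dividing by it yields NaN. A minimal sketch of such a guard in plain Python follows. The helper name `normalize_advantages` and the `eps` value are illustrative assumptions, not the library's actual code.

```python
import math
from typing import List


def normalize_advantages(advantages: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize advantages; skip normalization when there is only one sample."""
    n = len(advantages)
    if n <= 1:
        # The unbiased std divides by (n - 1), which is 0 for a single sample,
        # so normalizing here would be undefined. Return the values unchanged.
        return list(advantages)
    mean = sum(advantages) / n
    var = sum((a - mean) ** 2 for a in advantages) / (n - 1)
    std = math.sqrt(var)
    # eps (hypothetical value) avoids division by zero when all values are equal.
    return [(a - mean) / (std + eps) for a in advantages]
```

With this guard, a mini-batch of size 1 passes through untouched instead of producing NaN.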
* include `running_mean` and `running_var` when updating target networks in DQN, SAC, TD3.
* Update stable_baselines3/common/utils.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Precompute batch norm parameters in `_setup_model` and directly copy them in the target update.
* Fix `DictReplayBuffer.next_observations` type (#1013)
* Fix DictReplayBuffer.next_observations type
* Update changelog
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fixed missing verbose parameter passing (#1011)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Support for `device=auto` buffers and set it as default value (#1009)
* Default device is "auto" for buffer + auto device support in BufferBaseClass
* Update docstring
* Update tests
* Unify tests
* Update changelog
* Fix tests on CUDA device
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Update test
* Add comments and update tests
* Bump version
* Remove one extra space to conform code style.
* Update docstrings
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Burak Demirbilek <BurakDmb@users.noreply.github.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
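
The batch norm change in this group can be illustrated without PyTorch: learnable parameters are blended with a Polyak (soft) update, while batch norm running statistics are copied directly to the target. In this minimal sketch, networks are plain dicts of float lists, and the names `soft_update`, `copy_stats`, `params`, `stats`, and `tau` are illustrative assumptions rather than the library's API.

```python
from typing import Dict, List

Tensors = Dict[str, List[float]]


def soft_update(params: Tensors, target_params: Tensors, tau: float) -> None:
    """Polyak update in place: target <- (1 - tau) * target + tau * source."""
    for name, values in params.items():
        target = target_params[name]
        for i, value in enumerate(values):
            target[i] = (1.0 - tau) * target[i] + tau * value


def copy_stats(stats: Tensors, target_stats: Tensors) -> None:
    """Hard copy of running statistics (same result as a soft update with tau=1)."""
    for name, values in stats.items():
        target_stats[name] = list(values)


# Online network: one weight vector plus batch norm running statistics.
params = {"weight": [1.0, 2.0]}
stats = {"running_mean": [0.5], "running_var": [2.0]}

# Target network starts from different values.
target_params = {"weight": [0.0, 0.0]}
target_stats = {"running_mean": [0.0], "running_var": [1.0]}

soft_update(params, target_params, tau=0.5)
copy_stats(stats, target_stats)
```

Keeping the statistics in a separate collection, gathered once at setup time, means the target update does not need to search the network for batch norm buffers on every step.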
* create Hparam class & support in all OutputFormats
* add hparams documentation & example
* add hparam tests
* remove unnecessary test & fix name
* format changes
* support hyperparameters logging to tensorboard
* fix HParams class docstring
* use more explicit variable names
* raise error instead of warning
* Unpin protobuf
* Add test for logging hparams
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* escape tensorboard log name
Otherwise utils does not recognize the log.
* Added fix to changelog
* Modifications made by: make commit-checks .
* Revert "Modifications made by: make commit-checks ."
This reverts commit 529a275d9475f85ef031038a8f3565f7301e5371.
* Update changelog and add test
Co-authored-by: James Hirschorn <James.Hirschorn@quantitative-technologies.com>
* Goal sampled from next_achieved_goal instead of achieved_goal
* No need to have special case for future anymore
* Update changelog
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Replacing the policy registry with policy "aliases"
* Fixing import order and SAC
* Changing arg. order to be sure policy_aliases is a kwarg
* Import orders
* Removing pytype error check
* Reformat
* Fix alias import
* Not using mutable {} as default for policy_aliases
* Empty aliases initialization
* Using static attributes for policy_aliases
* Fixing isort
* Fixing back bad merge
* Running isort
* Fixing aliases for A2C and PPO
* Using f-string
* Moving policy_aliases definition position
* Adding change in the changelog
* Update version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Add Hugging Face to SB3 doc
* Update doc + fixes
* Use SB3 model from the hub
* Bump version
* Fixes
Co-authored-by: simoninithomas <simonini_thomas@outlook.fr>
* Add multi-env training support for SAC
* Fix for dict obs
* Pytype fixes
* Fix assert on number of envs
* Remove for loop
* Add support for Dict obs
* Start cleanup
* Update doc and bug fix
* Add support for vectorized action noise
and add multi env example for off-policy
* Update version
* Bug fix with VecNormalize
* Update README table
* Update variable names
* Update changelog and version
* Update doc and fix for `gradient_steps=-1`
* Add test for `gradient_steps=-1`
* Disable pytype pyi errors
* Fix for DQN
* Update comment on deepcopy
* Remove episode_reward field
* Fix RolloutReturn
* Avoid modification by reference
* Fix error message
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Fix evaluation script for RNN
* Add error message
* Revert "Add error message"
This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a.
* Fix for pytype
* Rename mask to `episode_start`
* Fix type hint
* Fix type hints
* Remove confusing part of sentence
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore literal for more clarity
* Use a consistent key to log the total timesteps
This changes the timestep logging key of on-policy algorithms from
`time/total_timesteps` to `time/total timesteps` (note the
underscore/space). The off-policy algorithms and the eval callback
already use the latter, so this behavior is more consistent.
* Use underscores instead of spaces in logging keys
Most keys already followed this policy and consistent behavior is
friendlier to new users.
* Minor edit and bump version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: add method predict_values for ActorCriticPolicy
* Fixes for new gym version
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* feat: get_distribution method for ActorCriticPolicy
New method get_distribution for class ActorCriticPolicy returning current action distribution given observations
* doc: updating changelog.rst
- adding block for Release 1.2.1a0
- adding cyprienc to contributors
* style: make format
* fix: updating version.txt
Changing version from 1.2.0 to 1.2.1a0
* Update changelog
* Add test for get distribution
Co-authored-by: Cyprien <courtot.c@gmail.com>
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training mode of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on-policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training, since the policy is a th.nn.Module
* Rename unit test
* unit test: use dropout probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
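
The train/eval changes above matter because layers such as dropout behave differently in the two modes. The following is a minimal sketch using a toy stand-in written in plain Python. `ToyDropout` is a hypothetical class for illustration, not a PyTorch or library layer.

```python
import random
from typing import List


class ToyDropout:
    """Toy stand-in for a dropout layer with a train/eval switch."""

    def __init__(self, p: float = 0.5, seed: int = 0) -> None:
        self.p = p
        self.training = True
        self._rng = random.Random(seed)

    def train(self) -> None:
        self.training = True

    def eval(self) -> None:
        self.training = False

    def __call__(self, inputs: List[float]) -> List[float]:
        if not self.training:
            # Eval mode: identity, so repeated predictions are deterministic.
            return list(inputs)
        # Train mode: zero each unit with probability p and rescale the rest.
        scale = 1.0 / (1.0 - self.p)
        return [0.0 if self._rng.random() < self.p else x * scale for x in inputs]
```

Calling `eval()` before collecting rollouts or predicting, and `train()` before gradient updates, keeps predictions reproducible while still applying dropout during training.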