stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-29 23:07:07 +00:00

Author	SHA1	Message	Date
Joe Ksiazek	0b06d8ab20	Fix error when loading a model that has net_arch manually set to None (#1937 ) * Fix loading a model with net_arch=None * Remove redundant get * Dummy commit * Add to contributors * Update test and version --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2024-06-05 17:27:40 +02:00
Ole Petersen	6c00565778	Fix memory leak in base_class.py (#1908 ) * Fix memory leak in base_class.py Loading the data return value is not necessary since it is unused. Loading the data causes a memory leak through the ep_info_buffer variable. I found this while loading a PPO learner from storage on a multi-GPU system since the ep_info_buffer is loaded to the memory location it was on while it was saved to disk, instead of the target loading location, and is then not cleaned up. * Update changelog.rst * Update changelog --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2024-05-15 15:59:32 +02:00
Chris Schindlbeck	4317c62598	Fix various typos (#1926 ) * Fix various typos * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2024-05-15 15:19:39 +02:00
Rushit Shah	f375cc3939	Fix docstring for ``log_interval`` to differentiate between on-policy/off-policy logging frequency (#1855 ) * Fix docstring for log_interval inside the learn method in the base class. * Updated changelog. * Update docstring --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2024-03-04 11:42:16 +01:00
Antonin RAFFIN	c6c660e51b	Fix type annotations of buffers (#1700 ) * Fix type annotation and replay buffer * Exclude pytype check * Remove some pytype specific annotaiton and update changelog * Fix HerReplayBuffer type hints * try remove # type: ignore[assignment] * revert change --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-09-28 18:52:46 +02:00
Antonin RAFFIN	1036c05680	Release v2.0.0 (#1571 ) * RUF012: Explicit ClassVar * Prepare v2.0.0 * Update docs/misc/changelog.rst --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-06-23 12:21:58 +02:00
Antonin RAFFIN	40e0b9d2c8	Add Gymnasium support (#1327 ) * Fix failing set_env test * Fix test failiing due to deprectation of env.seed * Adjust mean reward threshold in failing test * Fix her test failing due to rng * Change seed and revert reward threshold to 90 * Pin gym version * Make VecEnv compatible with gym seeding change * Revert change to VecEnv reset signature * Change subprocenv seed cmd to call reset instead * Fix type check * Add backward compat * Add `compat_gym_seed` helper * Add goal env checks in env_checker * Add docs on HER requirements for envs * Capture user warning in test with inverted box space * Update ale-py version * Fix randint * Allow noop_max to be zero * Update changelog * Update docker image * Update doc conda env and dockerfile * Custom envs should not have any warnings * Fix test for numpy >= 1.21 * Add check for vectorized compute reward * Bump to gym 0.24 * Fix gym default step docstring * Test downgrading gym * Revert "Test downgrading gym" This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb. 
* Fix protobuf error * Fix in dependencies * Fix protobuf dep * Use newest version of cartpole * Update gym * Fix warning * Loosen required scipy version * Scipy no longer needed * Try gym 0.25 * Silence warnings from gym * Filter warnings during tests * Update doc * Update requirements * Add gym 26 compat in vec env * Fixes in envs and tests for gym 0.26+ * Enforce gym 0.26 api * format * Fix formatting * Fix dependencies * Fix syntax * Cleanup doc and warnings * Faster tests * Higher budget for HER perf test (revert prev change) * Fixes and update doc * Fix doc build * Fix breaking change * Fixes for rendering * Rename variables in monitor * update render method for gym 0.26 API backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation) * update tests and docs to new gym render API * undo removal of render modes metatadata check * set rgb_array as default render mode for gym.make * undo changes & raise warning if not 'rgb_array' * Fix type check * Remove recursion and fix type checking * Remove hacks for protobuf and gym 0.24 * Fix type annotations * reuse existing render_mode attribute * return tiled images for 'human' render mode * Allow to use opencv for human render, fix typos * Add warning when using non-zero start with Discrete (fixes #1197) * Fix type checking * Bug fixes and handle more cases * Throw proper warnings * Update test * Fix new metadata name * Ignore numpy warnings * Fixes in vec recorder * Global ignore * Filter local warning too * Monkey patch not needed for gym 26 * Add doc of VecEnv vs Gym API * Add render test * Fix return type * Update VecEnv vs Gym API doc * Fix for custom render mode * Fix return type * Fix type checking * check test env test_buffer * skip render check * check env test_dict_env * test_env test_gae * check envs in remaining tests * Update tests * Add warning for Discrete action space with non-zero (#1295) * Fix atari annotation * ignore 
get_action_meanings [attr-defined] * Fix mypy issues * Add patch for gym/gymnasium transition * Switch to gymnasium * Rely on signature instead of version * More patches * Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39 * Fix doc build * Fix pytype errors * Fix atari requirement * Update env checker due to change in dtype for Discrete * Fix type hint * Convert spaces for saved models * Ignore pytype * Remove gitlab CI * Disable pytype for convert space * Fix undefined info * Fix undefined info * Upgrade shimmy * Fix wrappers type annotation (need PR from Gymnasium) * Fix gymnasium dependency * Fix dependency declaration * Cap pygame version for python 3.7 * Point to master branch (v0.28.0) * Fix: use main not master branch * Rename done to terminated * Fix pygame dependency for python 3.7 * Rename gym to gymnasium * Update Gymnasium * Fix test * Fix tests * Forks don't have access to private variables * Fix linter warnings * Update read the doc env * Fix env checker for GoalEnv * Fix import * Update env checker (more info) and fix dtype * Use micromamab for Docker * Update dependencies * Clarify VecEnv doc * Fix Gymnasium version * Copy file only after mamba install * [ci skip] Update docker doc * Polish code * Reformat * Remove deprecated features * Ignore warning * Update doc * Update examples and changelog * Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436) * Fix SAC type hints, improve DQN ones * Fix A2C and TD3 type hints * Fix PPO type hints * Fix on-policy type hints * Fix base class type annotation, do not use defaults * Update version * Disable mypy for python 3.7 * Rename Gym26StepReturn * Update continuous critic type annotation * Fix pytype complain --------- Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com> Co-authored-by: tlips 
<thomas.lips@ugent.be> Co-authored-by: tlpss <thomas17.lips@gmail.com> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2023-04-14 13:13:59 +02:00
Jonas Reiher	12250eb761	Add stats window argument (#1424 ) * added stats_window_size argument * updated changelog * docstring info updated * added missing tensorboard log docstring * added stats_window_size argument for all models * fixed stats_window_size test * Update version --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-04-05 11:33:26 +02:00
Antonin RAFFIN	5a70af8abd	Fix type hints for DQN (#1354 ) * Fix type hints for DQN * [ci skip] Remove commented line * Refine types * Fix vectorized obs detection * Fix for pytype * Fix check at load time to create replay buffer * One config file to rule them all * Delete unused config --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-03-30 11:31:47 +02:00
Antonin RAFFIN	10e83865ec	Switch to `pyproject.toml` and `ruff` (#1361 ) * Switch to `pyproject.toml` and `ruff` * Fix for Atari ROMs and mypy * Switch order in CI, lint first	2023-03-11 22:15:26 +01:00
Vikas Kumar	69b94dd6a8	Rename "timesteps" to "episodes" in `log_interval` documentation (#1325 ) * change timestamp to episode for logging * update changelog * minor format modif * minor format modif --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-02-10 21:15:09 +01:00
Alex Pasquali	b702884c23	Removed shared layers in mlp_extractor (#1292 ) * Modified actor-critic policies & MlpExtractor class ActorCriticPolicy: - changed type hint of net_arch param: now it's a dict - removed check that if features extractor is not shared: no shared layers are allowed in the mlp_extractor regardless of the features extractor ActorCriticCnnPolicy: - changed type hint of net_arch param: now it's a dict MultiInputActorcriticPolicy: - changed type hint of net_arch param: now it's a dict MlpExtractor: - changed type hint of net_arch param: now it's a dict - adapted networks creation - adapted methods: forward, forward_actor & forward_critic * Removed shared layers in mlp_extractor * Updated docs and changelog + reformat * Updated custom policy tests * Removed test on deprecation warning for share layers in mlp_extractor Now shared layers are removed * Update version * Update RL Zoo doc * Fix linter warnings * Add ruff to Makefile (experimental) * Add backward compat code and minor updates * Update tests * Add backward compatibility * Fix test * Improve compat code Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-01-23 14:55:19 +01:00
Alex Pasquali	30a19848ce	Deprecation of shared layers in `MlpExtractor` (#1252 ) * Deprecation warning for shared layers in Mlpextractor * Updated changelog * Updated custom policy doc * Update doc and deprecation * Fix doc build * Minor edits Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-01-05 09:59:36 +01:00
Quentin Gallouédec	4fa17dcf0f	Standardize the use of `from gym import spaces` (#1240 ) * generalize the use of `from gym import spaces` * command line get system info * Documentation line length for doc * update changelog * add space before os plateform to avoid ref to other issue * format * get_system_info update in changelog * fix type check error * fix get system info * add comment about regex * update version	2023-01-02 14:51:11 +01:00
Antonin RAFFIN	e78ba6ffa4	Hotfix to load policies saved with SB3 <= v1.6 (#1234 ) * Hotfix to load policies saved with SB3 <= v1.6 * Add warning and test * Update doc	2022-12-22 23:58:30 +01:00
Quentin Gallouédec	e3b24829a5	Drop `gym.GoalEnv` and other minor changes initally from #780 (#1184 ) * Various changes from #780 * Fix env_checker for goal_env detection	2022-11-28 18:22:31 +01:00
Antonin RAFFIN	cd630a3121	Fixes for flake8 6.0 (#1181 )	2022-11-25 15:14:55 +01:00
Quentin Gallouédec	f3abda5cbc	Fix `Self` return type (#1167 ) * Fix Self annotation * Update changelog * Define type var on top * ClassSelf to SelfClass * annotate self * Revert Running meanstd change * Revert vecnormalize change (static method rejected) Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-11-22 13:42:39 +01:00
Adam Gleave	4fb8aec215	Update evaluate_policy type annotation to support policies as well as RL algorithms (#1146 ) * Add PolicyPredictor protocol and use it in evaluate_policy * Update changelog * Move Protocol to type_aliases to avoid circular import * Add test for evaluate_policy on BasePolicy * Remove unused import * Use typing_extensions * Move typing_extensions to 3rd party * Add version range (typing_extensions uses SemVer) * Import Protocol from typing_extensions only on Python<3.8 Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Install typing_extensions only on Python<3.8 * Add missing sys import * Fix import ordering * Fix observation type hint in predict Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2022-11-03 15:36:19 +01:00
Quentin Gallouédec	5ef10c8e69	Fix type annotation of ``policy`` in ``BaseAlgorithm`` and ``OffPolicyAlgorithm`` (#1120 )	2022-10-17 10:16:20 +02:00
Antonin RAFFIN	508f8ffd59	Remove deprecated features and attributes (#1104 ) * Remove deprecated eval env * Remove deprecated ret attribute * Remove sde net arch * Remove unused code * Update test comment Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-10-11 10:55:16 +02:00
tobirohrer	d8a430e088	Deprecate `create_eval_env`, `eval_env` and `eval_freq` parameter (#1082 ) * Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925 * added changelog entry * added missing backtick * deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used * Updated tests to ignore DeprecationWarnings * Updated changelog entry * - Removed the `create_eval_env` parameter from the examples in the docs - Removed information about the `create_eval_env` parameter from the migration docs - Added information about deprecation of the `create_eval_env` parameter in the docs * Add alternative in docstring * Update docstrings * `eval_freq` warning in docstring * Add deprecation comments in tests Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2022-10-10 15:39:38 +02:00
Antonin RAFFIN	7c21b79188	Add progress bar callback and argument (#1095 ) * Add progress bar callback and argument * Update doc * Update changelog * Upgrade pytype in docker image * Use tqdm.write in the logger to have cleaner output * Fix logger test * Fix when doing multiple calls to learn() * Address comments from code-review	2022-10-06 18:17:31 +02:00
Juan Rocamonde	432b3f876d	Fix return type for load, learn in BaseAlgorithm (#1043 ) * Fix return type for load, learn in BaseAlgorithm * Update changelog * Add typing extensions to dependencies * Import directly from typing for python >3.11 * Reorder changelog to reflect merge order * Roll back to typevar solution * Updated changelog * Remove typing extensions requirement * Update base_class.py * Remove final point in changelog * Additional type fixes across project Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-09-26 12:13:56 +02:00
Quentin Gallouédec	440735cbd0	Fix loading a model with different number of environments (#1058 ) * Fix loading with new `n_envs` * Update tests * Update changelog * Fix the fix * Remove `self._setup_model()` from `set_env()` * Raise `AssertionError` when setting env with a different `n_envs` * Update unitests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-09-17 11:10:03 +02:00
Quentin Gallouédec	98e786f744	Clarify and standardize verbosity documentation (#1056 ) * Standardize the use of verbosity: > to >= * Make verbose docstring more specific * Update changelog	2022-09-09 16:46:28 +02:00
Burak Demirbilek	792e3bcc27	Fixed missing verbose parameter passing (#1011 ) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-08-16 13:32:32 +02:00
Adam Gleave	b1cc15970a	Use higher resolution time_ns() and avoid division by zero (#979 ) * Use higher resolution time and round up to eps * Update changelog * Add test case * Fix formatting, time()->time_ns * Bugfix: ns is integer not float * Move test to better place * Divide by 1e9 earlier	2022-07-25 23:02:53 +02:00
Ram Rachum	d64bcb401a	Fix exception cause in base_class.py (#940 )	2022-06-21 20:58:02 +01:00
Antonin RAFFIN	49813d8c68	Update doc and add check for unbounded action space (#918 )	2022-05-25 16:24:21 +02:00
TibiGG	2fcf8f91c1	Removed redundant double-check of nested Dict (#908 ) * Removed redundant double-check of nested Dict observation space from BaseAlgorithm * Update changelog Co-authored-by: tibigg <tg4018@ic.ac.uk>	2022-05-09 14:36:15 +03:00
Grégoire Passault	254bb10c42	Replacing the policy registry with policy "aliases" (#842 ) * Replacing the policy registry with policy "aliases" * Fixing import order and SAC * Changing arg. order to be sure policy_aliases is a kwarg * Import orders * Removing pytype error check * Reformat * Fix alias import * Not using mutable {} as default for policy_aliases * Empty aliases initialization * Using static attributes for policy_aliases * Fixing isort * Fixing back bad merge * Running isort * Fixing aliases for A2C and PPO * Using f-string * Moving policy_aliases definition position * Adding change in the changelog * Update version Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-04-08 21:21:53 +02:00
Antonin RAFFIN	bb16645c4e	Add `skip` option for `VecTransposeImage` and bug fix in frame stack (#700 ) * Update doc * Add comment * Add skip option to VecTransposeImage and fix bug in frame stack	2021-12-23 17:12:49 +02:00
Demetrio92	798b16aaf7	more verbose documentation regarding `.load` vs `.set_parameters` (#696 ) * more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614) * add a note to explain the difference between `.load` and `.set_parameters` to the examples * fix typos Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-18 17:28:37 +02:00
Antonin RAFFIN	52c29dc497	Fix evaluation script for recurrent policies (#678 ) * Fix evaluation script for RNN * Add error message * Revert "Add error message" This reverts commit 8d69b6cf4de2cd13aecfb425bd3145fad6a6c49a. * Fix for pytype * Rename mask to `episode_start` * Fix type hint * Fix type hints * Remove confusing part of sentence Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-11-30 13:49:06 +01:00
Antonin RAFFIN	2bb4500948	Fix `set_env` when using `VecNormalize` (#638 ) * Fix `set_env` when using `VecNormalize` * Update version	2021-11-02 13:52:26 +02:00
Oleksii Kachaiev	0c17fedfac	Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636 ) * Store number of timesteps at the beginning of each learn cycle * Update changelog * Set default _num_timesteps_at_start in the contructor * Test case for FPS logger * Adjust test to cover both on-policy and off-policy algorithms * Fix formatting * Update test and add comment * Fix test Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-10-31 18:19:03 +01:00
Antonin RAFFIN	e907eca18e	Fix `set_env` to keep the number of timesteps (#615 ) * Fix for `set_env` * Add test and update changelog * Use underscores and f-strings * Add PyPi info * Update comments	2021-10-23 16:36:40 +02:00
Antonin RAFFIN	1564a85081	System info helper (#613 ) * Add `system_env_info` * Add `print_system_info` to load and store system info at save time * Remove TODO * Rename to `get_system_info` * Import as sb3 for consistency * Update changelog * Add warning for old SB3 versions * Use underscore litteral for more clarity	2021-10-18 10:43:56 +02:00
Antonin RAFFIN	1881d904a0	Doc fix and improve error messages (#598 ) * Fix custom env doc * Catch common mistake * Improve `EvalCallback` error message * Lint test * Update docs/guide/custom_env.rst Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Adam Gleave <adam@gleave.me>	2021-10-08 18:08:31 +02:00
David Blom	3efab0d267	Training and evaluation: call model.train() and model.eval() (#537 ) * training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm * Add comment documentation * Fix train and eval for the Actor class * Run black * Add github handle to changelog * Add unit tests for PPO and DQN * Refactor unit test * Run black * unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic * documentation: add bugfix description to changelog * unit test: use learning_starts=0, decrease the size of the network and use more training steps * on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module * Rename unit test * unit test: use drop out probability of 0.5 * Call policy.train and policy.eval * Fixes + update tests * Remove unneeded eval Co-authored-by: David Blom <davidsblom@gmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-08-14 14:08:27 +02:00
Antonin RAFFIN	be86883f36	Fix type annotations (#522 ) * Fix type annotations * Add citation file * Update CITATION.cff * Add note about tb logging Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-07-29 13:02:09 +02:00
Antonin RAFFIN	b52c6fc18f	Fix logger setup (#469 ) * Make logger an attribute * Update doc * Fix logger reset when using multiple runs * Cleanup logger: remove `Logger.CURRENT` * Fix for PPO * Update tests and improve docstring * Add warning * Throw error when tensorboard not installed	2021-06-14 15:17:48 +02:00
Antonin RAFFIN	1ce911994b	Fix ent coef loading for SAC (#429 ) * Fix ent coef loading for SAC * Better fix and add comment	2021-05-12 12:21:54 +03:00
Jaden Travnik	75b6f3b3b0	Dictionary Observations (#243 ) * First commit * Fixing missing refs from a quick merge from master * Reformat * Adding DictBuffers * Reformat * Minor reformat * added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy * Ran black on buffers * Ran isort * Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow * Running isort :facepalm * Fixed typing issues * Adding docstrings and typing. Using util for moving data to device. * Fixed trailing commas * Fix types * Minor edits * Avoid duplicating code * Fix calls to parents * Adding assert to buffers. Updating changelong * Running format on buffers * Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env * Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type * Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder * Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation * Fixes * Running code style * Update docstrings on torch_layers * Decapitalize non-constant variables * Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test * Update doc * Update doc * Fix format * Removing NineRoom env. Using nested preprocess. Removing mutable default args * running code style * Passing channel check through to stacked dict observations. * Running black * Adding channel control to SimpleMultiObsEnv. 
Passing check_channels to CombinedExtractor * Remove optimize memory for dict buffers * Update doc * Move identity env * Minor edits + bump version * Update doc * Fix doc build * Bug fixes + add support for more type of dict env * Fixes + add multi env test * Add support for vectranspose * Fix stacked obs for dict and add tests * Add check for nested spaces. Fix dict-subprocvecenv test * Fix (single) pytype error * Simplify CombinedExtractor * Fix tests * Fix check * Merge branch 'master' into feat/dict_observations * Fix for net_arch with dict and vector obs * Fixes * Add consistency test * Update env checker * Add some docs on dict obs * Update default CNN feature vector size * Refactor HER (#351) * Start refactoring HER * Fixes * Additional fixes * Faster tests * WIP: HER as a custom replay buffer * New replay only version (working with DQN) * Add support for all off-policy algorithms * Fix saving/loading * Remove ObsDictWrapper and add VecNormalize tests with dict * Stable-Baselines3 v1.0 (#354) * Bump version and update doc * Fix name * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> * Update docs/index.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Update wording for RL zoo Co-authored-by: Adam Gleave <adam@gleave.me> * Add gym-pybullet-drones project (#358) * Update projects.rst Added gym-pybullet-drones * Update projects.rst Longer title underline * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org> * Include SuperSuit in projects (#359) * include supersuit * longer title underline * Update changelog.rst * Fix default arguments + add bugbear (#363) * Fix potential bug + add bug bear * Remove unused variables * Minor: version bump * Add code of conduct + update doc (#373) * Add code of conduct * Fix DQN doc example * Update doc (channel-last/first) * Apply suggestions from code review Co-authored-by: Anssi <kaneran21@hotmail.com> * Apply suggestions from code review Co-authored-by: Adam Gleave 
<adam@gleave.me> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> * Make installation command compatible with ZSH (#376) * Add quotes * Add Zsh bracket info * Add clarify pip installation line * Make note bold * Add Zsh pip installation note * Add handle timeouts param * Fixes * Fixes (buffer size, extend test) * Fix `max_episode_length` redefinition * Fix potential issue * Add some docs on dict obs * Fix performance bug * Fix slowdown * Add package to install (#378) * Add package to install * Update docs packages installation command Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Fix backward compat + add test * Fix VecEnv detection * Update doc * Fix vec env check * Support for `VecMonitor` for gym3-style environments (#311) * add vectorized monitor * auto format of the code * add documentation and VecExtractDictObs * refactor and add test cases * add test cases and format * avoid circular import and fix doc * fix type * fix type * oops * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * add test cases * update changelog * fix mutable argument * quick fix * Apply suggestions from code review * fix terminal observation for gym3 envs * delete comment * Update doc and bump version * Add warning when already using `Monitor` wrapper * Update vecmonitor tests * Fixes Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Reformat * Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392) * Fix ent coef loading bug * Add test * Add comment * Reuse save path * Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375) * Fix return computation + add test for GAE * Rename `last_dones` to `episode_starts` for clarification * Revert advantage * Cleanup test * Rename variable * Clarify return computation * Clarify 
docs * Add multi-episode rollout test * Reformat Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> * Fixed saving of `A2C` and `PPO` policy when using gSDE (#401) * Improve doc and replay buffer loading * Add support for images * Fix doc * Update Procgen doc * Update changelog * Update docstrings Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com> * Update doc and minor fixes * Update doc * Added note about MultiInputPolicy in error of NatureCNN * Merge branch 'master' into feat/dict_observations * Address comments * Naming clarifications * Actually saving the file would be nice * Fix edge case when doing online sampling with HER * Cleanup * Add sanity check Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com>	2021-05-11 12:29:30 +02:00
Rohan Tangri	35da0b59b9	Policy Base for On-policy Algorithms (#412 ) (#415 ) * add policy_base input to OnPolicyAlgorithms * update changelog * Fix pytype error Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-05-04 12:59:36 +03:00
Antonin RAFFIN	5d47296b8d	Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375 ) * Fix return computation + add test for GAE * Rename `last_dones` to `episode_starts` for clarification * Revert advantage * Cleanup test * Rename variable * Clarify return computation * Clarify docs * Add multi-episode rollout test * Reformat Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2021-04-16 15:52:55 +02:00
Antonin RAFFIN	c4304029a2	Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392 ) * Fix ent coef loading bug * Add test * Add comment * Reuse save path	2021-04-15 14:50:43 +02:00
Antonin RAFFIN	c62e9259db	Add custom objects support + bug fix (#336 ) * Add support for custom objects * Add python 3.8 to the CI * Bump version * PyType fixes * [ci skip] Fix typo * Add note about slow-down + fix typos * Minor edits to the doc * Bug fix for DQN * Update test * Add test for custom objects	2021-03-06 15:17:43 +02:00
M. Ernestus	0c50d75ecb	TD3 Code review (#245 ) * Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor. * Add learning rate schedule example (#248) * Add learning rate schedule example * Update docs/guide/examples.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Address comments Co-authored-by: Adam Gleave <adam@gleave.me> * Add supported action spaces checks (#254) * Add supported action spaces checks * Address comment * Use `pass` in an abstractmethod instead of deleting the arguments. * Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways. * Rename _get_data to _get_data_to_reconstruct_model. _get_data was too generic and could have meant anything. * Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead. * Fix docstring of `train_freq` parameter. * Black fixes. * Fix TD3 delayed update + rename `_get_data()` * Fix TD3 test * Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert. * Make one step the default train frequency. * Black fixes. * Change np.bool to bool. * Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of the off policy algorithm. * Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of HER. * Use named tuple for train freq * Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation. * Black fixes. * Revert to train_freq * Fix terminal observation issues * Typo * Fix action noise bug in HER * Add assert when loading HER models * Update version Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Adam Gleave <adam@gleave.me>	2021-02-27 17:33:50 +01:00
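The TD3 code review entry above (#245) removes `n_episodes_rollout` and instead lets `train_freq` be either an integer or a `(frequency, unit)` tuple, normalized in the constructor to a named tuple so that training can be scheduled every N steps or every N episodes. The following is a minimal sketch of that normalization. It assumes only the two units named in the commit, steps and episodes. `TrainFrequencyUnit`, `TrainFreq`, `convert_train_freq` and `should_collect_more` are illustrative names and do not describe SB3's exact internals.

```python
from enum import Enum
from typing import NamedTuple, Tuple, Union


class TrainFrequencyUnit(Enum):
    # Illustrative: the two units mentioned in the commit message.
    STEP = "step"
    EPISODE = "episode"


class TrainFreq(NamedTuple):
    frequency: int
    unit: TrainFrequencyUnit


def convert_train_freq(train_freq: Union[int, Tuple[int, str]]) -> TrainFreq:
    """Normalize an int or a (frequency, unit) tuple into a TrainFreq (hypothetical helper)."""
    if isinstance(train_freq, int):
        # A bare integer is read as "train every N environment steps".
        train_freq = (train_freq, "step")
    frequency, unit = train_freq
    # Enum lookup by value raises ValueError for anything but "step" or "episode".
    unit_enum = TrainFrequencyUnit(unit)
    assert isinstance(frequency, int), "The train frequency must be an integer"
    return TrainFreq(frequency, unit_enum)


def should_collect_more(train_freq: TrainFreq, num_steps: int, num_episodes: int) -> bool:
    """Return True while a rollout has not yet reached the requested duration."""
    if train_freq.unit == TrainFrequencyUnit.STEP:
        return num_steps < train_freq.frequency
    return num_episodes < train_freq.frequency
```

Normalizing once in the constructor means the rollout loop only ever handles one representation, instead of branching on "int or tuple" at every step.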


82 commits
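Two entries in this history change the FPS value reported in the training logs. #979 switches to the higher-resolution `time.time_ns()` and guards against division by zero. #636 counts only the timesteps collected since the current `learn()` call started, so that continuing training with `reset_num_timesteps=False` does not inflate the rate. The sketch below combines both ideas and assumes the caller records the start time and the timestep count at the start of `learn()`. `compute_fps` is a hypothetical helper and is not part of the SB3 API.

```python
import sys
import time


def compute_fps(num_timesteps: int, num_timesteps_at_start: int, start_time_ns: int, now_ns: int) -> int:
    """Timesteps per second since the current learn() call started (hypothetical helper)."""
    # Convert nanoseconds to seconds first, then clamp to machine epsilon so that
    # an elapsed time of zero cannot cause a division by zero.
    elapsed_s = max((now_ns - start_time_ns) / 1e9, sys.float_info.epsilon)
    # Subtracting the count recorded at the start of learn() keeps timesteps
    # from earlier calls (reset_num_timesteps=False) out of the rate.
    return int((num_timesteps - num_timesteps_at_start) / elapsed_s)


if __name__ == "__main__":
    start_ns = time.time_ns()
    steps_at_start = 1000  # e.g. timesteps carried over from a previous learn() call
    steps = steps_at_start
    for _ in range(10_000):
        steps += 1  # stand-in for one environment step
    print(compute_fps(steps, steps_at_start, start_ns, time.time_ns()))
```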