stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-16 21:10:08 +00:00

Author	SHA1	Message	Date
Antonin RAFFIN	daaebd0a52	Drop python 3.8 and add python 3.12 support (#2041 ) * Drop python 3.8 support, add python 3.12 support * Upgrade to python 3.9 syntax * Fixes for Numpy v2 * Fix doc warning	2024-11-18 15:40:36 +01:00
Antonin RAFFIN	b413f4c285	Fix `VecEnv` type hints (#1736 ) * Fix VecNormalize type hints * Fix VecEnv utils type annotations * Apply suggestions from code review Co-authored-by: M. Ernestus <maximilian@ernestus.de> * Remove PyType --------- Co-authored-by: M. Ernestus <maximilian@ernestus.de>	2023-11-08 09:46:40 +01:00
Antonin RAFFIN	a35c08c0d6	Fix offpolicy algo type hints (#1734 ) * Fix offpolicy algo type hints * Update PyTorch to have latest type hints * Fix pip argument * Try PyTorch 2.0.1 * Revert "Try PyTorch 2.0.1" This reverts commit 0e0ead442d524d26f1f7e1a0bb21e2bfc0245b69. * Update changelog	2023-11-06 11:17:36 +01:00
Jan-Hendrik Ewers	2ddf015cd9	fix: Follow PEP8 guidelines and evaluate falsy to truthy with `not` rather than `is False`. (#1707 ) * fix: Follow PEP8 guidelines and evaluate falsy to truth with `not` rather than `is False`. https://docs.python.org/2/library/stdtypes.html#truth-value-testing * chore: Update changelog inline with intent of changes in PR #1707 Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix: Change `is False` to `not` as per PEP8 * chore: Remove superfluous comment about `is False` * test: One On- and one Off-Policy algorithm (A2C and SAC respectively), with settings to speed up testing * Update changelog * chore: Remove EvalCallback as it's not actually required * Update changelog.rst * Rm duplicated "others" section in changelog.rst --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-10-09 12:21:12 +02:00
Patrick Helm	e071796549	Fixes replay buffer device after loading in OffPolicyAlgorithm (#1662 ) * sets replay buffer device after loading * update changelog * update changelog * correct changelog * add test for replay buffer device * Fix test to actually test the bug fix * [ci skip] Update version * [ci skip] Update docker images --------- Co-authored-by: PatrickHelm <patrick.helm@gmx.net> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-09-03 12:50:02 +02:00
PatrickHelm	c99d65c664	Fix `VectorizedActionNoise` in `OffPolicyAlgorithm` (#1657 ) * moves VectorizedActionNoise into _setup_learn() * update changelog --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-08-30 12:37:14 +02:00
Antonin RAFFIN	40e0b9d2c8	Add Gymnasium support (#1327 ) * Fix failing set_env test * Fix test failiing due to deprectation of env.seed * Adjust mean reward threshold in failing test * Fix her test failing due to rng * Change seed and revert reward threshold to 90 * Pin gym version * Make VecEnv compatible with gym seeding change * Revert change to VecEnv reset signature * Change subprocenv seed cmd to call reset instead * Fix type check * Add backward compat * Add `compat_gym_seed` helper * Add goal env checks in env_checker * Add docs on HER requirements for envs * Capture user warning in test with inverted box space * Update ale-py version * Fix randint * Allow noop_max to be zero * Update changelog * Update docker image * Update doc conda env and dockerfile * Custom envs should not have any warnings * Fix test for numpy >= 1.21 * Add check for vectorized compute reward * Bump to gym 0.24 * Fix gym default step docstring * Test downgrading gym * Revert "Test downgrading gym" This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb. * Fix protobuf error * Fix in dependencies * Fix protobuf dep * Use newest version of cartpole * Update gym * Fix warning * Loosen required scipy version * Scipy no longer needed * Try gym 0.25 * Silence warnings from gym * Filter warnings during tests * Update doc * Update requirements * Add gym 26 compat in vec env * Fixes in envs and tests for gym 0.26+ * Enforce gym 0.26 api * format * Fix formatting * Fix dependencies * Fix syntax * Cleanup doc and warnings * Faster tests * Higher budget for HER perf test (revert prev change) * Fixes and update doc * Fix doc build * Fix breaking change * Fixes for rendering * Rename variables in monitor * update render method for gym 0.26 API backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation) * update tests and docs to new gym render API * undo removal of render modes metatadata check * set rgb_array as default render mode for gym.make * undo changes & raise warning if not 'rgb_array' * Fix type check * Remove recursion and fix type checking * Remove hacks for protobuf and gym 0.24 * Fix type annotations * reuse existing render_mode attribute * return tiled images for 'human' render mode * Allow to use opencv for human render, fix typos * Add warning when using non-zero start with Discrete (fixes #1197) * Fix type checking * Bug fixes and handle more cases * Throw proper warnings * Update test * Fix new metadata name * Ignore numpy warnings * Fixes in vec recorder * Global ignore * Filter local warning too * Monkey patch not needed for gym 26 * Add doc of VecEnv vs Gym API * Add render test * Fix return type * Update VecEnv vs Gym API doc * Fix for custom render mode * Fix return type * Fix type checking * check test env test_buffer * skip render check * check env test_dict_env * test_env test_gae * check envs in remaining tests * Update tests * Add warning for Discrete action space with non-zero (#1295) * Fix atari annotation * ignore get_action_meanings [attr-defined] * Fix mypy issues * Add patch for gym/gymnasium transition * Switch to gymnasium * Rely on signature instead of version * More patches * Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39 * Fix doc build * Fix pytype errors * Fix atari requirement * Update env checker due to change in dtype for Discrete * Fix type hint * Convert spaces for saved models * Ignore pytype * Remove gitlab CI * Disable pytype for convert space * Fix undefined info * Fix undefined info * Upgrade shimmy * Fix wrappers type annotation (need PR from Gymnasium) * Fix gymnasium dependency * Fix dependency declaration * Cap pygame version for python 3.7 * Point to master branch (v0.28.0) * Fix: use main not master branch * Rename done to terminated * Fix pygame dependency for python 3.7 * Rename gym to gymnasium * Update Gymnasium * Fix test * Fix tests * Forks don't have access to private variables * Fix linter warnings * Update read the doc env * Fix env checker for GoalEnv * Fix import * Update env checker (more info) and fix dtype * Use micromamab for Docker * Update dependencies * Clarify VecEnv doc * Fix Gymnasium version * Copy file only after mamba install * [ci skip] Update docker doc * Polish code * Reformat * Remove deprecated features * Ignore warning * Update doc * Update examples and changelog * Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436) * Fix SAC type hints, improve DQN ones * Fix A2C and TD3 type hints * Fix PPO type hints * Fix on-policy type hints * Fix base class type annotation, do not use defaults * Update version * Disable mypy for python 3.7 * Rename Gym26StepReturn * Update continuous critic type annotation * Fix pytype complain --------- Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com> Co-authored-by: tlips <thomas.lips@ugent.be> Co-authored-by: tlpss <thomas17.lips@gmail.com> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2023-04-14 13:13:59 +02:00
Jonas Reiher	12250eb761	Add stats window argument (#1424 ) * added stats_window_size argument * updated changelog * docstring info updated * added missing tensorboard log docstring * added stats_window_size argument for all models * fixed stats_window_size test * Update version --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-04-05 11:33:26 +02:00
Antonin RAFFIN	5a70af8abd	Fix type hints for DQN (#1354 ) * Fix type hints for DQN * [ci skip] Remove commented line * Refine types * Fix vectorized obs detection * Fix for pytype * Fix check at load time to create replay buffer * One config file to rule them all * Delete unused config --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-03-30 11:31:47 +02:00
Quentin Gallouédec	c5adad82b2	Multiprocessing support for HerReplayBuffer (#704 ) * IM compat. modif from old fork * mp her working, without offline sampling * update readme and doc * fix discrete action/obs space case * handle offline sampling * fix pos to be consistent with the old version * improve typing and docstring * fix discrete obs special case * new her, using episode uid * deal with full buffer * offline not implemented * info storage; compute_reward as arg; offline sampling error * offline sampling; timeout_termination; fix last_trans detection * rm max_episode_length from tests * fix loading and loading test * Fix episode sampling strategy * Episode interrupted not valid * Typo * Fix infos sampling, next_obs desired goals, offline sampling * update tests for multienvs * speed up code * handle timeout sampling when samping * give up ep_uid for ep_start and ep_lenght * speed up sampling * Improve docstring * Typos and renaming * Fix typing * Fix linter warnings * Renaming + add note * fix reward type * Fix future sampling strategy * Fix future goal selection strategy * env_fn as lambda * Re-fix linter warnings * Formatting * Fix offline sampling * restore the initial performance budget * Remove max_episode_length for HerReplayBuffer kwargs * SubprcVecEnv compat test * Dedicated SubrocVecEnv test rm n_envs from parametrization * Back to using the env arg instead of compute_reward * Up VecEnv import * fix lint warnings * fix docstring * Fix device issue * actor_loss_modifier in SAV and TD3 * Merge RewardModifier and ActorLossModifier into Surgeon * update surgeon for rnd * fix uninteded merge * fix uninteded merge * fix unintended merge * Rm unintended merge * Fix KeyError * Remove useless `all_inds` * Minor docstring format * Fix hint * speedup! * Speedup again * speedup * np.nonzero * fix env normalization * flat sampling for speedup * typo * drop online * format * remove observation from env_cheker (see #1335) * update changelog * default device to "auto" * add comment for info storage * add comment for ep_start and ep_length attributes * a[b][c] to a[b, c] * comment flatnonzero and unravel_index * update _sample_goals docstring * Fix future gaol sampling for split episode * add informative error message for learning_starts too small * use keyword arg for env * try fix pytye * Update stable_baselines3/common/off_policy_algorithm.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Add `copy_info_dict` option * Ignore pytype * Update changelog * Rename variables and improve documentation * Ignore new bug bear rule * Add note about future strategy * Add deprecation warning * Fix bug trying to pickle buffer kwargs --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-03-20 12:03:57 +01:00
Quentin Gallouédec	82bc63fca4	Upgrade black formatting (#1310 ) * apply black * Reformat tests --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-02-02 11:58:41 +01:00
Quentin Gallouédec	4fa17dcf0f	Standardize the use of `from gym import spaces` (#1240 ) * generalize the use of `from gym import spaces` * command line get system info * Documentation line length for doc * update changelog * add space before os plateform to avoid ref to other issue * format * get_system_info update in changelog * fix type check error * fix get system info * add comment about regex * update version	2023-01-02 14:51:11 +01:00
Quentin Gallouédec	e3b24829a5	Drop `gym.GoalEnv` and other minor changes initally from #780 (#1184 ) * Various changes from #780 * Fix env_checker for goal_env detection	2022-11-28 18:22:31 +01:00
Quentin Gallouédec	f3abda5cbc	Fix `Self` return type (#1167 ) * Fix Self annotation * Update changelog * Define type var on top * ClassSelf to SelfClass * annotate self * Revert Running meanstd change * Revert vecnormalize change (static method rejected) Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-11-22 13:42:39 +01:00
Quentin Gallouédec	5ef10c8e69	Fix type annotation of ``policy` `in` `BaseAlgorithm` `and` `OffPolicyAlgorithm`` (#1120 )	2022-10-17 10:16:20 +02:00
Antonin RAFFIN	508f8ffd59	Remove deprecated features and attributes (#1104 ) * Remove deprecated eval env * Remove deprecated ret attribute * Remove sde net arch * Remove unused code * Update test comment Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-10-11 10:55:16 +02:00
tobirohrer	d8a430e088	Deprecate `create_eval_env`, `eval_env` and `eval_freq` parameter (#1082 ) * Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925 * added changelog entry * added missing backtick * deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used * Updated tests to ignore DeprecationWarnings * Updated changelog entry * - Removed the `create_eval_env` parameter from the examples in the docs - Removed information about the `create_eval_env` parameter from the migration docs - Added information about deprecation of the `create_eval_env` parameter in the docs * Add alternative in docstring * Update docstrings * `eval_freq` warning in docstring * Add deprecation comments in tests Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2022-10-10 15:39:38 +02:00
Antonin RAFFIN	7c21b79188	Add progress bar callback and argument (#1095 ) * Add progress bar callback and argument * Update doc * Update changelog * Upgrade pytype in docker image * Use tqdm.write in the logger to have cleaner output * Fix logger test * Fix when doing multiple calls to learn() * Address comments from code-review	2022-10-06 18:17:31 +02:00
Juan Rocamonde	432b3f876d	Fix return type for load, learn in BaseAlgorithm (#1043 ) * Fix return type for load, learn in BaseAlgorithm * Update changelog * Add typing extensions to dependencies * Import directly from typing for python >3.11 * Reorder changelog to reflect merge order * Roll back to typevar solution * Updated changelog * Remove typing extensions requirement * Update base_class.py * Remove final point in changelog * Additional type fixes across project Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-09-26 12:13:56 +02:00
Quentin Gallouédec	98e786f744	Clarify and standardize verbosity documentation (#1056 ) * Standardize the use of verbosity: > to >= * Make verbose docstring more specific * Update changelog	2022-09-09 16:46:28 +02:00
Juan Rocamonde	fdca786f09	Fix replay_buffer_class type annotation (#1042 ) * Fix replay_buffer_class type annotation * Update changelog * Further replacement of same type annotation issue * Formatting * Rolled back formatting changes for consistency	2022-09-01 20:10:01 -07:00
Adam Gleave	b1cc15970a	Use higher resolution time_ns() and avoid division by zero (#979 ) * Use higher resolution time and round up to eps * Update changelog * Add test case * Fix formatting, time()->time_ns * Bugfix: ns is integer not float * Move test to better place * Divide by 1e9 earlier	2022-07-25 23:02:53 +02:00
Ram Rachum	d64bcb401a	Fix exception cause in base_class.py (#940 )	2022-06-21 20:58:02 +01:00
Antonin RAFFIN	a6f5049a99	Upgrade code to Python 3.7+ syntax using `pyupgrade` (#887 ) * Upgrade code to Python 3.7+ syntax * Update changelog	2022-04-25 13:01:38 +03:00
Grégoire Passault	254bb10c42	Replacing the policy registry with policy "aliases" (#842 ) * Replacing the policy registry with policy "aliases" * Fixing import order and SAC * Changing arg. order to be sure policy_aliases is a kwarg * Import orders * Removing pytype error check * Reformat * Fix alias import * Not using mutable {} as default for policy_aliases * Empty aliases initialization * Using static attributes for policy_aliases * Fixing isort * Fixing back bad merge * Running isort * Fixing aliases for A2C and PPO * Using f-string * Moving policy_aliases definition position * Adding change in the changelog * Update version Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-04-08 21:21:53 +02:00
Grégoire Passault	00ac43b0a9	Removing dead code for handling time limits (#831 ) * Removing dead code for handling time limits (see #829) * Mentionning remove_time_limit_termination in the changelog * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-03-23 13:33:55 +01:00
Quentin Gallouédec	d496cd4d95	Consistent use of `device` as keyword argument (#702 ) * consistent device as keyword arg * Fixed ``device`` arg inconsistency in changelog	2021-12-22 11:43:59 +01:00
Antonin RAFFIN	507ed1762e	Multiprocessing support for off policy algorithms (#439 ) * Add multi-env training support for SAC * Fix for dict obs * Pytype fixes * Fix assert on number of envs * Remove for loop * Add support for Dict obs * Start cleanup * Update doc and bug fix * Add support for vectorized action noise and add multi env example for off-policy * Update version * Bug fix with VecNormalize * Update README table * Update variable names * Update changelog and version * Update doc and fix for `gradient_steps=-1` * Add test for `gradient_steps=-1` * Disable pytype pyi errors * Fix for DQN * Update comment on deepcopy * Remove episode_reward field * Fix RolloutReturn * Avoid modification by reference * Fix error message Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-01 22:30:09 +01:00
Oleksii Kachaiev	0c17fedfac	Adjust FPS calculation to accommodate for reset_num_timesteps=False (#636 ) * Store number of timesteps at the beginning of each learn cycle * Update changelog * Set default _num_timesteps_at_start in the contructor * Test case for FPS logger * Adjust test to cover both on-policy and off-policy algorithms * Fix formatting * Update test and add comment * Fix test Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-10-31 18:19:03 +01:00
Antonin RAFFIN	1564a85081	System info helper (#613 ) * Add `system_env_info` * Add `print_system_info` to load and store system info at save time * Remove TODO * Rename to `get_system_info` * Import as sb3 for consistency * Update changelog * Add warning for old SB3 versions * Use underscore litteral for more clarity	2021-10-18 10:43:56 +02:00
Timo Kaufmann	09e9fc42eb	Use consistent logging keys (#605 ) * Use a consistent key to log the total timesteps This changes the timestep logging key of on-policy algorithms from `time/total_timesteps` to `time/total timesteps` (note the underscore/space). The off-policy algorithms and the eval callback already use the latter, so this behavior is more consistent. * Use underscores instead of spaces in logging keys Most keys already followed this policy and consistent behavior is friendlier to new users. * Minor edit and bump version Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-10-12 13:17:30 +02:00
Scott Brownlie	1afc2f3abe	Avoid putting target networks into training mode (#553 ) * make sure DQN policy is always in correct mode - train or eval * make set_training_mode an abstract method of the base policy - safer * update docstring of _build method to note that the target network is put into eval mode * use set_training_mode to put the dqn target network into eval mode * use set_training_mode to set the training model of the q-network * move set_training_mode abstract method from BasePolicy to BaseModel * set train and eval mode for TD3 * make sure critic is always in correct mode during train * set train and eval mode for SAC * add comment re batch norm and dropout * set train and eval mode for A2C and PPO * add tests for collect rollouts with batch norm * fix formatting * update change log * update version * remove Optional typing for batch size - causing type check to fail * Fix scipy dependency for toy text envs * implement set_training_mode method in BaseModel * move all tests of train/eval mode to test_train_eval_mode * call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm * remove extra calls to set_training_mode in train method of TD3 and SAC * Allow gradient_steps=0 * Refactor tests * Add comment + use aliases * Typos Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-08-30 17:42:41 +02:00
David Blom	3efab0d267	Training and evaluation: call model.train() and model.eval() (#537 ) * training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm * Add comment documentation * Fix train and eval for the Actor class * Run black * Add github handle to changelog * Add unit tests for PPO and DQN * Refactor unit test * Run black * unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic * documentation: add bugfix description to changelog * unit test: use learning_starts=0, decrease the size of the network and use more training steps * on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module * Rename unit test * unit test: use drop out probability of 0.5 * Call policy.train and policy.eval * Fixes + update tests * Remove unneeded eval Co-authored-by: David Blom <davidsblom@gmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-08-14 14:08:27 +02:00
Antonin RAFFIN	be86883f36	Fix type annotations (#522 ) * Fix type annotations * Add citation file * Update CITATION.cff * Add note about tb logging Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-07-29 13:02:09 +02:00
Antonin RAFFIN	b52c6fc18f	Fix logger setup (#469 ) * Make logger an attribute * Update doc * Fix logger reset when using multiple runs * Cleanup logger: remove `Logger.CURRENT` * Fix for PPO * Update tests and improve docstring * Add warning * Throw error when tensorboard not installed	2021-06-14 15:17:48 +02:00
Jaden Travnik	75b6f3b3b0	Dictionary Observations (#243 ) * First commit * Fixing missing refs from a quick merge from master * Reformat * Adding DictBuffers * Reformat * Minor reformat * added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy * Ran black on buffers * Ran isort * Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow * Running isort :facepalm * Fixed typing issues * Adding docstrings and typing. Using util for moving data to device. * Fixed trailing commas * Fix types * Minor edits * Avoid duplicating code * Fix calls to parents * Adding assert to buffers. Updating changelong * Running format on buffers * Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env * Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type * Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder * Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation * Fixes * Running code style * Update docstrings on torch_layers * Decapitalize non-constant variables * Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test * Update doc * Update doc * Fix format * Removing NineRoom env. Using nested preprocess. Removing mutable default args * running code style * Passing channel check through to stacked dict observations. * Running black * Adding channel control to SimpleMultiObsEnv. Passing check_channels to CombinedExtractor * Remove optimize memory for dict buffers * Update doc * Move identity env * Minor edits + bump version * Update doc * Fix doc build * Bug fixes + add support for more type of dict env * Fixes + add multi env test * Add support for vectranspose * Fix stacked obs for dict and add tests * Add check for nested spaces. Fix dict-subprocvecenv test * Fix (single) pytype error * Simplify CombinedExtractor * Fix tests * Fix check * Merge branch 'master' into feat/dict_observations * Fix for net_arch with dict and vector obs * Fixes * Add consistency test * Update env checker * Add some docs on dict obs * Update default CNN feature vector size * Refactor HER (#351) * Start refactoring HER * Fixes * Additional fixes * Faster tests * WIP: HER as a custom replay buffer * New replay only version (working with DQN) * Add support for all off-policy algorithms * Fix saving/loading * Remove ObsDictWrapper and add VecNormalize tests with dict * Stable-Baselines3 v1.0 (#354) * Bump version and update doc * Fix name * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> * Update docs/index.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Update wording for RL zoo Co-authored-by: Adam Gleave <adam@gleave.me> * Add gym-pybullet-drones project (#358) * Update projects.rst Added gym-pybullet-drones * Update projects.rst Longer title underline * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org> * Include SuperSuit in projects (#359) * include supersuit * longer title underline * Update changelog.rst * Fix default arguments + add bugbear (#363) * Fix potential bug + add bug bear * Remove unused variables * Minor: version bump * Add code of conduct + update doc (#373) * Add code of conduct * Fix DQN doc example * Update doc (channel-last/first) * Apply suggestions from code review Co-authored-by: Anssi <kaneran21@hotmail.com> * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> * Make installation command compatible with ZSH (#376) * Add quotes * Add Zsh bracket info * Add clarify pip installation line * Make note bold * Add Zsh pip installation note * Add handle timeouts param * Fixes * Fixes (buffer size, extend test) * Fix `max_episode_length` redefinition * Fix potential issue * Add some docs on dict obs * Fix performance bug * Fix slowdown * Add package to install (#378) * Add package to install * Update docs packages installation command Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Fix backward compat + add test * Fix VecEnv detection * Update doc * Fix vec env check * Support for `VecMonitor` for gym3-style environments (#311) * add vectorized monitor * auto format of the code * add documentation and VecExtractDictObs * refactor and add test cases * add test cases and format * avoid circular import and fix doc * fix type * fix type * oops * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * add test cases * update changelog * fix mutable argument * quick fix * Apply suggestions from code review * fix terminal observation for gym3 envs * delete comment * Update doc and bump version * Add warning when already using `Monitor` wrapper * Update vecmonitor tests * Fixes Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Reformat * Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392) * Fix ent coef loading bug * Add test * Add comment * Reuse save path * Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375) * Fix return computation + add test for GAE * Rename `last_dones` to `episode_starts` for clarification * Revert advantage * Cleanup test * Rename variable * Clarify return computation * Clarify docs * Add multi-episode rollout test * Reformat Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> * Fixed saving of `A2C` and `PPO` policy when using gSDE (#401) * Improve doc and replay buffer loading * Add support for images * Fix doc * Update Procgen doc * Update changelog * Update docstrings Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com> * Update doc and minor fixes * Update doc * Added note about MultiInputPolicy in error of NatureCNN * Merge branch 'master' into feat/dict_observations * Address comments * Naming clarifications * Actually saving the file would be nice * Fix edge case when doing online sampling with HER * Cleanup * Add sanity check Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com>	2021-05-11 12:29:30 +02:00
Antonin RAFFIN	8a08078ea2	Fix default arguments + add bugbear (#363 ) * Fix potential bug + add bug bear * Remove unused variables * Minor: version bump	2021-03-25 11:35:21 +02:00
Antonin RAFFIN	c62e9259db	Add custom objects support + bug fix (#336 ) * Add support for custom objects * Add python 3.8 to the CI * Bump version * PyType fixes * [ci skip] Fix typo * Add note about slow-down + fix typos * Minor edits to the doc * Bug fix for DQN * Update test * Add test for custom objects	2021-03-06 15:17:43 +02:00
Antonin RAFFIN	b2c94a677d	Fix `train_freq` at load time (#332 ) * Fix train_freq loading * Update docker * Add sanity checks + tests for train freq	2021-02-27 19:53:13 +01:00
M. Ernestus	0c50d75ecb	TD3 Code review (#245 ) * Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor. * Add learning rate schedule example (#248) * Add learning rate schedule example * Update docs/guide/examples.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Address comments Co-authored-by: Adam Gleave <adam@gleave.me> * Add supported action spaces checks (#254) * Add supported action spaces checks * Address comment * Use `pass` in an abstractmethod instead of deleting the arguments. * Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways. * Rename _get_data to _get_data_to_reconstruct_model. _get_data was too generic and could have meant anything. * Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead. * Fix docstring of `train_freq` parameter. * Black fixes. * Fix TD3 delayed update + rename `_get_data()` * Fix TD3 test * Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert. * Make one step the default train frequency. * Black fixes. * Change np.bool to bool. * Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of the off policy algorithm. * Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of HER. * Use named tuple for train freq * Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation. * Black fixes. * Revert to train_freq * Fix terminal observation issues * Typo * Fix action noise bug in HER * Add assert when loading HER models * Update version Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Adam Gleave <adam@gleave.me>	2021-02-27 17:33:50 +01:00
Antonin RAFFIN	2b9fc1f923	Add supported action spaces checks (#254 ) * Add supported action spaces checks * Address comment	2020-12-06 14:05:10 +02:00
Antonin RAFFIN	d04aad2a20	Doc fixes and add `monitor_kwargs` parameter (#230 ) * Fix type annotation * Fix migration doc for A2C * Update version * Add `monitor_kwargs` argument * Update docs/guide/migration.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Fix make atari env * Fix docstring * Renamed LearningRateSchedule Co-authored-by: Adam Gleave <adam@gleave.me>	2020-11-20 10:28:54 +01:00
M. Ernestus	c74509ae9d	Add callable signatures to type annotations. (#215 ) * Add callback signature to the learning rate type annotations. * Add callback signature to the learning rate schedule type annotations. * Add missing type annotations for learning rate callbacks. * Add signature to old-style learning and evaluation callbacks. * Add signature to env wrapper callback. * Add type annotation to closure function. * Use MaybeCallback more consistently. * Update changelog. * Remove now unused List import. * Fix import order. * Add type alias for learning rate schedules. * Optimize imports. * Fix messed up import. * Remove resolved TODO. Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-15 17:50:28 +01:00
Stefan Heid	9d463bc476	Small docstring improvements related to the notion of Rollout (#206 ) * Small docstring improvements related to the notion of Rollout * documented changes in changelog.rst, added myself to contributers * Minor edits Co-authored-by: Stefan Heid <stefan.heid@upb.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-02 11:45:08 +01:00
Megan Klaiber	dd6e361204	Implement HER (#120 ) * Added working her version, Online sampling is missing. * Updated test_her. * Added first version of online her sampling. Still problems with tensor dimensions. * Reformat * Fixed tests * Added some comments. * Updated changelog. * Add missing init file * Fixed some small bugs. * Reduced arguments for HER, small changes. * Added getattr. Fixed bug for online sampling. * Updated save/load funtions. Small changes. * Added her to init. * Updated save method. * Updated her ratio. * Move obs_wrapper * Added DQN test. * Fix potential bug * Offline and online her share same sample_goal function. * Changed lists into arrays. * Updated her test. * Fix online sampling * Fixed action bug. Updated time limit for episodes. * Updated convert_dict method to take keys as arguments. * Renamed obs dict wrapper. * Seed bit flipping env * Remove get_episode_dict * Add fast online sampling version * Added documentation. * Vectorized reward computation * Vectorized goal sampling * Update time limit for episodes in online her sampling. * Fix max episode length inference * Bug fix for Fetch envs * Fix for HER + gSDE * Reformat (new black version) * Added info dict to compute new reward. Check her_replay_buffer again. * Fix info buffer * Updated done flag. * Fixes for gSDE * Offline her version uses now HerReplayBuffer as episode storage. * Fix num_timesteps computation * Fix get torch params * Vectorized version for offline sampling. * Modified offline her sampling to use sample method of her_replay_buffer * Updated HER tests. * Updated documentation * Cleanup docstrings * Updated to review comments * Fix pytype * Update according to review comments. * Removed random goal strategy. Updated sample transitions. * Updated migration. Removed time signal removal. * Update doc * Fix potential load issue * Add VecNormalize support for dict obs * Updated saving/loading replay buffer for HER. * Fix test memory usage * Fixed save/load replay buffer. * Fixed save/load replay buffer * Fixed transition index after loading replay buffer in online sampling * Better error handling * Add tests for get_time_limit * More tests for VecNormalize with dict obs * Update doc * Improve HER description * Add test for sde support * Add comments * Add comments * Remove check that was always valid * Fix for terminal observation * Updated buffer size in offline version and reset of HER buffer * Reformat * Update doc * Remove np.empty + add doc * Fix loading * Updated loading replay buffer * Separate online and offline sampling + bug fixes * Update tensorboard log name * Version bump * Bug fix for special case Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-22 11:56:43 +02:00
Antonin RAFFIN	55912576ed	Cleanup docstring types (#169 ) * Cleanup docstring types * Update style * Test with js hack * Revert "Test with js hack" This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69. * Fix types * Fix typo * Update CONTRIBUTING example	2020-10-02 20:05:55 +03:00
Francisco Caio	5fc90a7f7d	Add StopTrainingOnMaxEpisodes to callback collection (#147 ) * Add StopTrainingOnMaxEpisodes class to pre-made callback collection * Adjust instant when counters are incremented for both OnPolicy and OffPolicy algorithms * Improv to StopTrainingOnMaxEpisodes including output, tests and doc * Improv StopTrainingOnMaxEpisodes callback running _init_callback * Update callbacks.py * Update test_callbacks.py * Fix style * Update changelog.rst * Fix test Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2020-08-28 11:36:33 +02:00
Stelios Tymvios	9003a09d5b	Callbacks have access to locals (#115 ) * callbacks have access to locals * changeloc * doc * callbacks have access to locals * changeloc * doc * Added update function for child callbacks * Pre-Release 0.8.0 (#134) * Fix double reset and improve typing coverage (#136) * Fix double reset and improve typing coverage * Revert minor edit * Add doc about types * Update child callbacks * cleaned imports * format * import order * Simplify tests and add comments Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-08-23 14:34:01 +02:00
Sam Toyer	42ef6d4677	Remove "device" argument from policies (#141 ) * Remove device arg from policies * Clean up for PR * Update test and doc * Fix codestyle Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-08-23 13:27:52 +02:00
Antonin RAFFIN	23afedb254	Auto-formatting with black and isort (#97 ) * Add auto formatting with black and isort * Reformat code * Ignore typing errors * Add note about line length * Add minimum version for isort * Add commit-checks * Update docker image * Fixed lost import (during last merge) * Fix opencv dependency	2020-07-16 16:12:16 +02:00

1 2

56 commits