stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-18 21:30:19 +00:00

Author	SHA1	Message	Date
Quentin Gallouédec	c5adad82b2	Multiprocessing support for HerReplayBuffer (#704 ) * IM compat. modif from old fork * mp her working, without offline sampling * update readme and doc * fix discrete action/obs space case * handle offline sampling * fix pos to be consistent with the old version * improve typing and docstring * fix discrete obs special case * new her, using episode uid * deal with full buffer * offline not implemented * info storage; compute_reward as arg; offline sampling error * offline sampling; timeout_termination; fix last_trans detection * rm max_episode_length from tests * fix loading and loading test * Fix episode sampling strategy * Episode interrupted not valid * Typo * Fix infos sampling, next_obs desired goals, offline sampling * update tests for multienvs * speed up code * handle timeout sampling when samping * give up ep_uid for ep_start and ep_lenght * speed up sampling * Improve docstring * Typos and renaming * Fix typing * Fix linter warnings * Renaming + add note * fix reward type * Fix future sampling strategy * Fix future goal selection strategy * env_fn as lambda * Re-fix linter warnings * Formatting * Fix offline sampling * restore the initial performance budget * Remove max_episode_length for HerReplayBuffer kwargs * SubprcVecEnv compat test * Dedicated SubrocVecEnv test rm n_envs from parametrization * Back to using the env arg instead of compute_reward * Up VecEnv import * fix lint warnings * fix docstring * Fix device issue * actor_loss_modifier in SAV and TD3 * Merge RewardModifier and ActorLossModifier into Surgeon * update surgeon for rnd * fix uninteded merge * fix uninteded merge * fix unintended merge * Rm unintended merge * Fix KeyError * Remove useless `all_inds` * Minor docstring format * Fix hint * speedup! * Speedup again * speedup * np.nonzero * fix env normalization * flat sampling for speedup * typo * drop online * format * remove observation from env_cheker (see #1335) * update changelog * default device to "auto" * add comment for info storage * add comment for ep_start and ep_length attributes * a[b][c] to a[b, c] * comment flatnonzero and unravel_index * update _sample_goals docstring * Fix future gaol sampling for split episode * add informative error message for learning_starts too small * use keyword arg for env * try fix pytye * Update stable_baselines3/common/off_policy_algorithm.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Add `copy_info_dict` option * Ignore pytype * Update changelog * Rename variables and improve documentation * Ignore new bug bear rule * Add note about future strategy * Add deprecation warning * Fix bug trying to pickle buffer kwargs --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-03-20 12:03:57 +01:00
Antonin RAFFIN	e5deeed16e	Update doc about Gymnasium support (#1382 )	2023-03-14 12:43:19 +01:00
Antonin RAFFIN	f0382a25bd	Add documentation about default network architecture (#1353 ) * Add documentation about default network architecture * [ci skip] Rename custom policy section to Policy Networks	2023-03-02 14:14:57 +01:00
harveybellini	7a1e429702	Remove Note from examples - Code works (#1330 ) * Remove Note Gif creation works with Atari Environments using the script provided below. * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-02-15 13:14:02 +01:00
Quentin Gallouédec	2e4a45020e	Refactor observation stacking (#1238 ) * refactor stacking obs * Improve docstring * remove all StackedDictObservations * Update tests and make stacked obs clearer * Fix type check * fix stacked_observation_space * undo init change, deprecate StackedDictObservations * deprecate stack_observation_space * type hints * ignore pytype errors * undo vecenv doc change * Deprecation warning in StackedDictObs doctstring * Fix vec_env.rst * Fix __all__ sorting * fix pytype ignore statement * Update docstring * stack * Remove n_stack * Update changelog * Simplify code * Rename test file * Re-use variable for shift * Fix doc build * Remove pytype comment * Disable pytype error --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-02-06 22:41:59 +01:00
Alex Pasquali	b702884c23	Removed shared layers in mlp_extractor (#1292 ) * Modified actor-critic policies & MlpExtractor class ActorCriticPolicy: - changed type hint of net_arch param: now it's a dict - removed check that if features extractor is not shared: no shared layers are allowed in the mlp_extractor regardless of the features extractor ActorCriticCnnPolicy: - changed type hint of net_arch param: now it's a dict MultiInputActorcriticPolicy: - changed type hint of net_arch param: now it's a dict MlpExtractor: - changed type hint of net_arch param: now it's a dict - adapted networks creation - adapted methods: forward, forward_actor & forward_critic * Removed shared layers in mlp_extractor * Updated docs and changelog + reformat * Updated custom policy tests * Removed test on deprecation warning for share layers in mlp_extractor Now shared layers are removed * Update version * Update RL Zoo doc * Fix linter warnings * Add ruff to Makefile (experimental) * Add backward compat code and minor updates * Update tests * Add backward compatibility * Fix test * Improve compat code Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-01-23 14:55:19 +01:00
Quentin Gallouédec	92f7a6f23b	Fix `test_vec_normalize.py`, `test_tensorboard.py` and `common/monitor.py` type hint (#1194 ) * Remove from mypy exclude * type hint for metadata * Union[float, int] -> float * Remove useless __init__ * Type hint for model and logger in BaseCallback * Type hint for metric_dict * Update changelog * fix test_tensorboard * ignore gamma type checking * Fix monitor type hint * Update logger type hints * Fix type annotation and bump version * Fix circular import Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-01-13 18:28:22 +01:00
Yu Zheng	9bb1538b78	Fix outdated `load_parameters` to `set_parameters` (#1270 ) * Update examples.rst * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-01-11 14:13:21 +01:00
Alex Pasquali	30a19848ce	Deprecation of shared layers in `MlpExtractor` (#1252 ) * Deprecation warning for shared layers in Mlpextractor * Updated changelog * Updated custom policy doc * Update doc and deprecation * Fix doc build * Minor edits Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-01-05 09:59:36 +01:00
Quentin Gallouédec	4fa17dcf0f	Standardize the use of `from gym import spaces` (#1240 ) * generalize the use of `from gym import spaces` * command line get system info * Documentation line length for doc * update changelog * add space before os plateform to avoid ref to other issue * format * get_system_info update in changelog * fix type check error * fix get system info * add comment about regex * update version	2023-01-02 14:51:11 +01:00
Antonin RAFFIN	7202ece85b	Update tensorboard callback doc (#1221 ) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-12-21 12:51:28 +01:00
Quentin Gallouédec	96b1a7cf01	`env_id` consistency in tests (#1224 ) Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-12-20 16:01:26 +01:00
Alex Pasquali	2cfcec4f50	Modified ActorCriticPolicy to support non-shared features extractor (#1148 ) * Modified ActorCriticPolicy to support non-shared features extractor * Refactored features extraction with non-shared features extractor in ActorCriticPolicy and updated doc Doc update: added 'warning' on custom policy docs that says that, if the features extractor is non-shared, it's not possible to have shared layers in the mlp_extractor * Moved attrib share_features_extractor in class * Updated custom policy doc for non-shared features extractor * Updated changelog * Made some if-statements more readable if policies.py The if-statements are related to the shared/non-shared features extractor in ActorCritic policies * Simplify implementation and add run test * Keep order in module gain to keep previous results consistents * Fix test * Improved docstring in policies.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Added some tests * feature extractor -> features extractor * Fix test * Fix env_id in test * Make features extractor parameter explicit * Remove duplicate Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-12-20 15:12:05 +01:00
Antonin RAFFIN	8452106734	Fix support of image like normalized inputs (#1214 ) * Fix support of image like normalized inputs * Improve docstring and warning message. * Don't check if obs is image when normalize_images is False (lil opt) * Comment fix * Fix normalize_images not passed to parent * Check for subclasses too * Remove useless multiline * Update version and add comment * Fix some typos Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-12-20 13:18:28 +01:00
Alex Pasquali	6d55a09f81	Updated custom policy docs to better explain the ``mlp_extractor``'s dimensions (#1196 ) * Updated custom policy docs Better explained how the dimensions of the mlp_extractor work, including the action net and the value net after the layers specified in net_arch. * Improved custom policy doc Section: Custom Network Architecture. Explained with greater detail that an action net and a value net will be added on top of the net_arch. * Improved custom policy doc Section: Custom Network Architecture. Merged a comment into a note * Alignment Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2022-12-12 16:19:51 +01:00
Athanasios Theocharis	f7d7ed3fa7	Update custom_policy.rst (#1183 ) * Update custom_policy.rst * Update changelog Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-12-06 17:51:52 +01:00
Quentin Gallouédec	e3b24829a5	Drop `gym.GoalEnv` and other minor changes initally from #780 (#1184 ) * Various changes from #780 * Fix env_checker for goal_env detection	2022-11-28 18:22:31 +01:00
Franz Srambical	8641b05b09	Fix typo in documentation (#1177 )	2022-11-15 15:00:03 +01:00
Antonin RAFFIN	0532a5719c	Fix integration documentation (#1135 )	2022-10-24 13:20:58 +02:00
Antonin Raffin	37a942c8f9	Fixes	2022-10-24 12:53:48 +02:00
Thomas Simonini	0274aaf056	Update docs/guide/integrations.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-10-24 11:22:33 +02:00
Thomas Simonini	714737c986	Update Hugging Face Integration Documentation	2022-10-24 10:55:30 +02:00
Sam Toyer	5e8f06b3cb	Link to full imitation docs (#1106 )	2022-10-10 21:36:30 -07:00
Antonin RAFFIN	e2f81bb70b	Release v1.6.2 (#1103 ) * Release v1.6.2 * Remove Gitlab CI, no more minutes	2022-10-10 16:37:11 +02:00
tobirohrer	d8a430e088	Deprecate `create_eval_env`, `eval_env` and `eval_freq` parameter (#1082 ) * Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925 * added changelog entry * added missing backtick * deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used * Updated tests to ignore DeprecationWarnings * Updated changelog entry * - Removed the `create_eval_env` parameter from the examples in the docs - Removed information about the `create_eval_env` parameter from the migration docs - Added information about deprecation of the `create_eval_env` parameter in the docs * Add alternative in docstring * Update docstrings * `eval_freq` warning in docstring * Add deprecation comments in tests Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2022-10-10 15:39:38 +02:00
Antonin RAFFIN	7c21b79188	Add progress bar callback and argument (#1095 ) * Add progress bar callback and argument * Update doc * Update changelog * Upgrade pytype in docker image * Use tqdm.write in the logger to have cleaner output * Fix logger test * Fix when doing multiple calls to learn() * Address comments from code-review	2022-10-06 18:17:31 +02:00
Quentin Gallouédec	a697401e03	Standardized the use of ``"`` for string representation (#1086 ) * Replace ``'`` by ``" `` in python code * Update changelog * Rm whitespace	2022-10-03 15:15:39 +02:00
Antonin RAFFIN	537a82a7fd	Update export doc (fixes + add torch jit) (#1074 ) * Update export doc (fixes + add torch jit) * Fix conflicts * Update according to code review comments * fix torch -> th Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-09-30 14:30:40 +02:00
Alex Pasquali	d0b129ecc3	Updated custom policy docs (#1067 )	2022-09-18 09:17:57 +02:00
Quentin Gallouédec	98e786f744	Clarify and standardize verbosity documentation (#1056 ) * Standardize the use of verbosity: > to >= * Make verbose docstring more specific * Update changelog	2022-09-09 16:46:28 +02:00
Luke Fisher	a7f30b04e3	Updated minor grammar error (#1041 ) "an history" -> "a history"	2022-08-31 18:04:15 +02:00
Anand Balakrishnan	59af0c1b01	`CheckpointCallback` can now save replay buffer and `VecNormalize` (#1030 ) * CheckpointCallback now saves replay buffer (if present) * VecNormalize stats are saved at checkpoints * Make checkpointing replay buffer and VecNormalize opt-in * Edit changelog * Add documentation for new parameters * Update docs/misc/changelog.rst * Add documentation for new parameters * Implement suggested edits * Reformat code * Fix git conflict * Add .pkl suffix to VecNormalize checkpoints * Add tests for new CheckpointCallback params * Merge CheckpointCallback tests * Update test and add helper for checkpoint path Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-08-25 10:57:51 +02:00
Timothé	01cc127d32	Support hparams logging to tensorboard (#984 ) * create Hparam class & support in all OutputFormats * add hparams documentation & example * add hparam tests * remove unnecessary test & fix name * format changes * support hyperparameters logging to tensorboard * fix HParams class docstring * use more explicit variable names * raise error instead of warning * Unpin protobuf * Add test for logging hparams Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-08-22 22:06:54 +02:00
jlp-ue	6ce33f5bd2	Fix url in docs (#1000 ) * fixed URL in docs * Update changelog.rst	2022-08-05 17:54:48 +02:00
Marsel Khisamutdinov	d532362e94	Adds info on split tensorboard graphs (#989 ) * Add info on split tensorboard graphs. * Change wording to make it look better. * Update changelog.rst * Rephrase and add link to issue Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-07-30 12:44:25 +02:00
Antonin RAFFIN	a18b91e01a	Replace "nature" with "Nature" (magazine) to reduce confusion (#965 ) * Replace "nature" with "Nature" (magazine) to reduce confusion * Replace "nature" with "Nature" (magazine) to reduce confusion * Update changelog Co-authored-by: mel <callmesolis@gmail.com>	2022-07-15 22:48:27 +02:00
Antonin RAFFIN	c1f1c3d3d7	Release v1.6.0 (#958 ) * Release v1.6.0 + update doc + add copy button * Update read the doc conda env * Update year * Fix bug in kl divergence check * Rephrase requirement for envpool and isaac gym	2022-07-12 22:50:23 +02:00
Antonin RAFFIN	d68f0a2411	Update doc: SB3 Contrib RecurrentPPO (#927 ) * Update doc: contrib update * Update docs/misc/changelog.rst Co-authored-by: Anssi <kaneran21@hotmail.com> * Address Anssi comments Co-authored-by: Anssi <kaneran21@hotmail.com>	2022-05-31 18:11:16 +02:00
Antonin RAFFIN	49813d8c68	Update doc and add check for unbounded action space (#918 )	2022-05-25 16:24:21 +02:00
Thomas Rudolf	c2518dc160	Add doc to use mlflow logger (#889 ) * ADD feature for mlflow logger via MLflowOutputFormat. * Move MLFlow integration to doc Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-05-08 15:28:31 +02:00
Marsel Khisamutdinov	e98ae129de	Fix a grammatical mistake (#899 ) Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-05-03 16:27:48 +02:00
Antonin RAFFIN	c5f0aa5de0	Update doc: PPO blog post and remark on timeouts (#896 )	2022-05-01 16:26:34 +02:00
Antonin RAFFIN	248f082cdc	Bump min PyTorch version (#855 )	2022-04-11 18:34:15 +02:00
Yuan	009bb0549a	Update tensorboard.rst in SummaryWriterCallback (#822 ) * Update tensorboard.rst * update changelog.rst * update changelog.rst, add username	2022-03-15 21:48:52 +01:00
Antonin RAFFIN	e88eb1c9ca	Add explanation of logger output (#803 ) * Add explanation of logger output * Apply suggestions from code review Co-authored-by: Anssi <kaneran21@hotmail.com> * Add example output Co-authored-by: Anssi <kaneran21@hotmail.com>	2022-03-07 12:20:43 +01:00
Julio César Alves	cdaa9ab418	Callback to early stop the training if there is no model improvement after consecutive evaluations (#741 ) * Added StopTrainingOnNoModelImprovement callback and callback_after_eval parameter in EvalCallback * Correction in EvalCallback and tests for StopTrainingOnNoModelImprovement * Update the docs related to new StopTrainingOnNoModelImprovement callback * Update doc Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-02-25 11:56:47 +01:00
Gautam J	59bec30180	update docs fix indentation (#764 ) * update docs fix indentation Changed code block indentation from 2 spaces to 4 spaces for consistency. * update changelog * Update changelog.rst Co-authored-by: Anssi <kaneran21@hotmail.com>	2022-02-07 21:00:53 +02:00
Ashish Dutt	954daaac37	Custom environment page modified. Following fixes are committed in response to issue#755. (#758 ) * Page modified. Following fixes are committed in response to issue#755. - fixed the broken url on creating custom gym environment. Also added appropriate advice by citing official OpenAi gym documents. - SB3 text tweaked. * modified page - updated the in-line text hyperlinks to follow Sphinx restructured text format. * modified page - updated the in-line text hyperlinks to follow Sphinx restructured text format. - updated text grammar * Language Co-authored-by: Anssi <kaneran21@hotmail.com>	2022-02-05 13:36:36 +02:00
Carlos Luis	5143cd19f7	Gym fixes - Follow up from #705 (#734 ) * fix Atari in CI * fix dtype and atari extra * Update setup.py * remove 3.6 * note about how to install Atari * pendulum-v1 * atari v5 * black * fix pendulum capitalization * add minimum version * moved things in changelog to breaking changes * partial v5 fix * env update to pass tests * mismatch env version fixed * Fix tests after merge * Include autorom in setup.py * Blacken code * Fix dtype issue in more robust way * Fix GitLab CI: switch to Docker container with new black version * Remove workaround from GitLab. (May need to rebuild Docker for this though.) * Revert to v4 * Update setup.py * Apply suggestions from code review * Remove unnecessary autorom * Consistent gym versions Co-authored-by: J K Terry <justinkterry@gmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: modanesh <mohamad4danesh@gmail.com> Co-authored-by: Adam Gleave <adam@gleave.me>	2022-02-04 15:13:57 -08:00
Antonin RAFFIN	54bcfa4544	Add Hugging Face integration to SB3 doc (#733 ) * Add Hugging Face to SB3 doc * Update doc + fixes * Use SB3 model from the hub * Bump version * Fixes Co-authored-by: simoninithomas <simonini_thomas@outlook.fr>	2022-01-20 10:04:12 +01:00

129 commits