stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-07-01 03:45:11 +00:00

Author	SHA1	Message	Date
Paul Stahlhofen	c5c29a32d9	Clarify the use of Gym wrappers with `make_vec_env` (#2079 ) * Added a note to the documentation of Vectorized Environments to show the possibility of wrapping sub-environments with `make_vec_env` (See #2075 ) * Add example --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2025-02-07 12:04:48 +01:00
Antonin RAFFIN	f8ea2995cb	Doc update: custom envs, IsaacLab, Brax and dm_control (#2072 ) * Add note about start!=0 for Discrete spaces * Update doc for IsaacLab and dm_control * Fix test due to rounding error	2025-01-26 11:42:57 +01:00
Yufeng Gao	d055a2e2af	fix docs atari example by import ale_py (#2071 ) * fix docs atari example by import ale_py * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2025-01-21 10:53:42 +01:00
Antonin RAFFIN	897d01d225	Update PyBullet example (#2049 )	2024-11-29 14:58:09 +01:00
James MacGlashan	8a3e3ccb4e	Add Decisions and Dragons site to resources (#2044 ) * add dnd site to resources . add info * add to changelog	2024-11-22 23:02:13 +01:00
Antonin RAFFIN	daaebd0a52	Drop python 3.8 and add python 3.12 support (#2041 ) * Drop python 3.8 support, add python 3.12 support * Upgrade to python 3.9 syntax * Fixes for Numpy v2 * Fix doc warning	2024-11-18 15:40:36 +01:00
Antonin RAFFIN	dd3d0acf15	Update readme and clarify planned features (#2030 ) * Update readme and clarify planned features * Fix rtd python version * Fix pip version for rtd * Update rtd ubuntu and mambaforge * Add upper bound for gymnasium * [ci skip] Update readme	2024-10-29 12:23:13 +01:00
Antonin RAFFIN	3d59b5c86b	Use uv on GitHub CI for faster download and update changelog (#2026 ) * Use uv on GitHub CI for faster download and update changelog * Fix new mypy issues	2024-10-24 15:20:05 +02:00
Quentin Gallouédec	1a69fc8314	Update examples.rst (#1969 )	2024-07-15 23:57:24 +02:00
Sahit Chintalapudi	0eebde7ca1	Fix typo in examples.rst (#1962 ) The variable `env` is not defined. The gym env we want to change is `vec_env`	2024-07-05 15:00:48 +02:00
Chris Schindlbeck	4317c62598	Fix various typos (#1926 ) * Fix various typos * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2024-05-15 15:19:39 +02:00
Nicolò Lucchesi	35eccaf04f	Fix tensorboad video slow numpy->torch conversion (#1910 ) * fixed tb video docs * updated changelog * add comment on expected render() output * Update changelog.rst --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2024-04-26 12:12:04 +02:00
Corentin	e93175084f	Adding ER-MRL to community project (#1904 ) * Add ER_MRL * Update changelog * Move ER-MRL at the end of the file * Improve project description * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2024-04-25 14:31:15 +02:00
Antonin Raffin	4af4a32d1b	Update RL Tips and Tricks section	2024-04-22 10:25:32 +02:00
Mark Smith	9a749389d3	Cast learning_rate to float lambda for pickle safety when doing model.load (#1901 ) * create failing test for unpickle error * Fix learning_rate argument causing failure in weights_only=True if passed a function with non-float types * Updated with feedback from araffin on PR#1901 * Update test and version * Update changelog and SBX doc --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2024-04-22 10:04:01 +02:00
Antonin RAFFIN	8b3723c6d8	Update ruff and documentation for hf sb3 (#1866 ) * Update ruff * Only load weights with `torch.load()` to avoid security issues * Update doc about HF integration and remote code execution * Fix doc build * Revert weight_only=True for policies	2024-03-11 13:53:06 +01:00
Antonin RAFFIN	620e58e61f	Update SB3 ONNX export documentation (#1816 )	2024-01-30 15:53:25 +01:00
Francesco Capuano	a653aec10d	Docs: Env attributes should be modified using env setters (#1789 ) * add: paragraph on how to modify vec envs attributes via setters (solves DLR-RM#1573) * Update vec env doc * Update callback doc and SB3 version * Fix indentation --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2024-01-10 14:46:40 +01:00
Quentin Gallouédec	373166d6ac	Fix doc: Gym to Gymnasium Atari install command in `examples.rst` (#1773 ) * Update examples.rst * Update changelog.rst	2023-12-05 11:31:11 +01:00
Antonin RAFFIN	294f2b4309	Documentation update (#1732 ) * Update RL Tips * Fix grammar * Update SBX doc * Fix various typos and grammar mistakes	2023-11-03 17:17:46 +01:00
M. Ernestus	69afefc91d	Add rollout_buffer_class parameter to on-policy algorithms (#1720 ) * Add rollout_buffer_class and rollout_buffer_kwargs parameters to OnPolicyAlgorithm * Add rollout_buffer_class and rollout_buffer_kwargs to PPO. * Add rollout_buffer_class and rollout_buffer_kwargs to A2C. * Make use of the rollout buffer kwargs. * Update version * Add test and update doc --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-10-27 17:36:24 +02:00
Hosseinkhan Rémy	aab545901f	Add support for setting `options` at reset with `VecEnv` (#1606 ) * Update signatures, and test with options * Update changelog and black formatting * Finish implementation (fixes, doc, tests) * Use deepcopy to avoid side effects (modif by reference) * Fix for mypy --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-10-23 13:38:48 +02:00
Antonin RAFFIN	e9f0f23ce4	Fix type hints for callbacks, utils and `VecTranspose` (#1648 ) * Fix type hints in `common/utils.py` * Fix `VecTranspose` type annotations * Fix types for callbacks * Update changelog * Fix video recorder type hints * Fix save utils type hints * Allow BytesIO * Improve error message * Make logger and training env properties * Clarify which open_path fn is called	2023-08-29 16:04:08 +02:00
Kyle He	d43400b464	Fix typo in the documentation for Custom Policy Networks (#1620 ) * Update custom_policy.rst * Update changelog.rst --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-08-01 13:20:29 +02:00
BertrandDecoster	fa7a3168f3	Update the Callbacks: Evaluate Agent Performance section of the Examples (#1604 ) * Update examples.rst section "Callbacks: Evaluate Agent Performance" Two typos fixed * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-07-18 13:02:47 +02:00
Antonin RAFFIN	d68ff2e17f	Drop python 3.7, add 3.11 and update github templates (#1587 ) * Add missing word in patch error message * Add changelog * Drop python 3.7, add 3.11 and update github templates * [ci skip] Update version in doc * Update minimum PyTorch version * Update conda env and fix mypy --------- Co-authored-by: Lukas Hass <lukas@slucky.de>	2023-07-03 12:44:18 +02:00
Antonin Raffin	cc103ff725	Update doc before 2.0 release	2023-06-23 12:31:14 +02:00
Antonin RAFFIN	4fdb65ecf3	Doc fix and add Stable-Baselines3 Jax (SBX) page (#1566 ) * Fix custom policy example * Add RL Zoo doc link * Add changelog to pypi * Add SBX doc page * Fix small mistake in docstring --------- Co-authored-by: Peter Elmers <peter.elmers@yahoo.com>	2023-06-21 18:54:16 +02:00
Thomas Simonini	4fcda6b2d4	Add Deep Reinforcement Learning Course Link (#1531 ) * Add Deep RL Course link * Update changelog.rst	2023-06-05 10:36:09 +02:00
Antonin RAFFIN	c8210ddf50	Update env checker goal env check and links (#1517 ) * Update env checker goal env check and links * Update stable_baselines3/common/env_checker.py Co-authored-by: coin15 <j.andregon15@gmail.com> --------- Co-authored-by: coin15 <j.andregon15@gmail.com>	2023-05-24 11:16:47 +02:00
Antonin RAFFIN	fd0cd82339	Update outdated custom env doc (#1490 ) * Update outdated custom env doc * fix render_mode and term/trunc/reset_info * gym -> gymnasium --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-05-08 13:48:26 +02:00
Sidney Tio	d6ddee9366	Add evalcallback example (#1468 ) * Moved 'Monitoring Training' to subsubsection of 'Using callbacks' * Added EvalCallback example * Updated Changelogs * Edited the language * Moved subsection headers up one level * added make_vec_env into Evalcallback example * Added parameters to the top for readability * Added note on multiple training environments * Added more clarity to eval_freq note * Apply suggestions from code review --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-05-02 18:02:36 +02:00
Antonin RAFFIN	40e0b9d2c8	Add Gymnasium support (#1327 ) * Fix failing set_env test * Fix test failiing due to deprectation of env.seed * Adjust mean reward threshold in failing test * Fix her test failing due to rng * Change seed and revert reward threshold to 90 * Pin gym version * Make VecEnv compatible with gym seeding change * Revert change to VecEnv reset signature * Change subprocenv seed cmd to call reset instead * Fix type check * Add backward compat * Add `compat_gym_seed` helper * Add goal env checks in env_checker * Add docs on HER requirements for envs * Capture user warning in test with inverted box space * Update ale-py version * Fix randint * Allow noop_max to be zero * Update changelog * Update docker image * Update doc conda env and dockerfile * Custom envs should not have any warnings * Fix test for numpy >= 1.21 * Add check for vectorized compute reward * Bump to gym 0.24 * Fix gym default step docstring * Test downgrading gym * Revert "Test downgrading gym" This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb. 
* Fix protobuf error * Fix in dependencies * Fix protobuf dep * Use newest version of cartpole * Update gym * Fix warning * Loosen required scipy version * Scipy no longer needed * Try gym 0.25 * Silence warnings from gym * Filter warnings during tests * Update doc * Update requirements * Add gym 26 compat in vec env * Fixes in envs and tests for gym 0.26+ * Enforce gym 0.26 api * format * Fix formatting * Fix dependencies * Fix syntax * Cleanup doc and warnings * Faster tests * Higher budget for HER perf test (revert prev change) * Fixes and update doc * Fix doc build * Fix breaking change * Fixes for rendering * Rename variables in monitor * update render method for gym 0.26 API backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation) * update tests and docs to new gym render API * undo removal of render modes metatadata check * set rgb_array as default render mode for gym.make * undo changes & raise warning if not 'rgb_array' * Fix type check * Remove recursion and fix type checking * Remove hacks for protobuf and gym 0.24 * Fix type annotations * reuse existing render_mode attribute * return tiled images for 'human' render mode * Allow to use opencv for human render, fix typos * Add warning when using non-zero start with Discrete (fixes #1197) * Fix type checking * Bug fixes and handle more cases * Throw proper warnings * Update test * Fix new metadata name * Ignore numpy warnings * Fixes in vec recorder * Global ignore * Filter local warning too * Monkey patch not needed for gym 26 * Add doc of VecEnv vs Gym API * Add render test * Fix return type * Update VecEnv vs Gym API doc * Fix for custom render mode * Fix return type * Fix type checking * check test env test_buffer * skip render check * check env test_dict_env * test_env test_gae * check envs in remaining tests * Update tests * Add warning for Discrete action space with non-zero (#1295) * Fix atari annotation * ignore 
get_action_meanings [attr-defined] * Fix mypy issues * Add patch for gym/gymnasium transition * Switch to gymnasium * Rely on signature instead of version * More patches * Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39 * Fix doc build * Fix pytype errors * Fix atari requirement * Update env checker due to change in dtype for Discrete * Fix type hint * Convert spaces for saved models * Ignore pytype * Remove gitlab CI * Disable pytype for convert space * Fix undefined info * Fix undefined info * Upgrade shimmy * Fix wrappers type annotation (need PR from Gymnasium) * Fix gymnasium dependency * Fix dependency declaration * Cap pygame version for python 3.7 * Point to master branch (v0.28.0) * Fix: use main not master branch * Rename done to terminated * Fix pygame dependency for python 3.7 * Rename gym to gymnasium * Update Gymnasium * Fix test * Fix tests * Forks don't have access to private variables * Fix linter warnings * Update read the doc env * Fix env checker for GoalEnv * Fix import * Update env checker (more info) and fix dtype * Use micromamab for Docker * Update dependencies * Clarify VecEnv doc * Fix Gymnasium version * Copy file only after mamba install * [ci skip] Update docker doc * Polish code * Reformat * Remove deprecated features * Ignore warning * Update doc * Update examples and changelog * Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436) * Fix SAC type hints, improve DQN ones * Fix A2C and TD3 type hints * Fix PPO type hints * Fix on-policy type hints * Fix base class type annotation, do not use defaults * Update version * Disable mypy for python 3.7 * Rename Gym26StepReturn * Update continuous critic type annotation * Fix pytype complain --------- Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com> Co-authored-by: tlips 
<thomas.lips@ugent.be> Co-authored-by: tlpss <thomas17.lips@gmail.com> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2023-04-14 13:13:59 +02:00
Quentin Gallouédec	c5adad82b2	Multiprocessing support for HerReplayBuffer (#704 ) * IM compat. modif from old fork * mp her working, without offline sampling * update readme and doc * fix discrete action/obs space case * handle offline sampling * fix pos to be consistent with the old version * improve typing and docstring * fix discrete obs special case * new her, using episode uid * deal with full buffer * offline not implemented * info storage; compute_reward as arg; offline sampling error * offline sampling; timeout_termination; fix last_trans detection * rm max_episode_length from tests * fix loading and loading test * Fix episode sampling strategy * Episode interrupted not valid * Typo * Fix infos sampling, next_obs desired goals, offline sampling * update tests for multienvs * speed up code * handle timeout sampling when samping * give up ep_uid for ep_start and ep_lenght * speed up sampling * Improve docstring * Typos and renaming * Fix typing * Fix linter warnings * Renaming + add note * fix reward type * Fix future sampling strategy * Fix future goal selection strategy * env_fn as lambda * Re-fix linter warnings * Formatting * Fix offline sampling * restore the initial performance budget * Remove max_episode_length for HerReplayBuffer kwargs * SubprcVecEnv compat test * Dedicated SubrocVecEnv test rm n_envs from parametrization * Back to using the env arg instead of compute_reward * Up VecEnv import * fix lint warnings * fix docstring * Fix device issue * actor_loss_modifier in SAV and TD3 * Merge RewardModifier and ActorLossModifier into Surgeon * update surgeon for rnd * fix uninteded merge * fix uninteded merge * fix unintended merge * Rm unintended merge * Fix KeyError * Remove useless `all_inds` * Minor docstring format * Fix hint * speedup! 
* Speedup again * speedup * np.nonzero * fix env normalization * flat sampling for speedup * typo * drop online * format * remove observation from env_cheker (see #1335) * update changelog * default device to "auto" * add comment for info storage * add comment for ep_start and ep_length attributes * a[b][c] to a[b, c] * comment flatnonzero and unravel_index * update _sample_goals docstring * Fix future gaol sampling for split episode * add informative error message for learning_starts too small * use keyword arg for env * try fix pytye * Update stable_baselines3/common/off_policy_algorithm.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Add `copy_info_dict` option * Ignore pytype * Update changelog * Rename variables and improve documentation * Ignore new bug bear rule * Add note about future strategy * Add deprecation warning * Fix bug trying to pickle buffer kwargs --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-03-20 12:03:57 +01:00
Antonin RAFFIN	e5deeed16e	Update doc about Gymnasium support (#1382 )	2023-03-14 12:43:19 +01:00
Antonin RAFFIN	f0382a25bd	Add documentation about default network architecture (#1353 ) * Add documentation about default network architecture * [ci skip] Rename custom policy section to Policy Networks	2023-03-02 14:14:57 +01:00
harveybellini	7a1e429702	Remove Note from examples - Code works (#1330 ) * Remove Note Gif creation works with Atari Environments using the script provided below. * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-02-15 13:14:02 +01:00
Quentin Gallouédec	2e4a45020e	Refactor observation stacking (#1238 ) * refactor stacking obs * Improve docstring * remove all StackedDictObservations * Update tests and make stacked obs clearer * Fix type check * fix stacked_observation_space * undo init change, deprecate StackedDictObservations * deprecate stack_observation_space * type hints * ignore pytype errors * undo vecenv doc change * Deprecation warning in StackedDictObs doctstring * Fix vec_env.rst * Fix __all__ sorting * fix pytype ignore statement * Update docstring * stack * Remove n_stack * Update changelog * Simplify code * Rename test file * Re-use variable for shift * Fix doc build * Remove pytype comment * Disable pytype error --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-02-06 22:41:59 +01:00
Alex Pasquali	b702884c23	Removed shared layers in mlp_extractor (#1292 ) * Modified actor-critic policies & MlpExtractor class ActorCriticPolicy: - changed type hint of net_arch param: now it's a dict - removed check that if features extractor is not shared: no shared layers are allowed in the mlp_extractor regardless of the features extractor ActorCriticCnnPolicy: - changed type hint of net_arch param: now it's a dict MultiInputActorcriticPolicy: - changed type hint of net_arch param: now it's a dict MlpExtractor: - changed type hint of net_arch param: now it's a dict - adapted networks creation - adapted methods: forward, forward_actor & forward_critic * Removed shared layers in mlp_extractor * Updated docs and changelog + reformat * Updated custom policy tests * Removed test on deprecation warning for share layers in mlp_extractor Now shared layers are removed * Update version * Update RL Zoo doc * Fix linter warnings * Add ruff to Makefile (experimental) * Add backward compat code and minor updates * Update tests * Add backward compatibility * Fix test * Improve compat code Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-01-23 14:55:19 +01:00
Quentin Gallouédec	92f7a6f23b	Fix `test_vec_normalize.py`, `test_tensorboard.py` and `common/monitor.py` type hint (#1194 ) * Remove from mypy exclude * type hint for metadata * Union[float, int] -> float * Remove useless __init__ * Type hint for model and logger in BaseCallback * Type hint for metric_dict * Update changelog * fix test_tensorboard * ignore gamma type checking * Fix monitor type hint * Update logger type hints * Fix type annotation and bump version * Fix circular import Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-01-13 18:28:22 +01:00
Yu Zheng	9bb1538b78	Fix outdated `load_parameters` to `set_parameters` (#1270 ) * Update examples.rst * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-01-11 14:13:21 +01:00
Alex Pasquali	30a19848ce	Deprecation of shared layers in `MlpExtractor` (#1252 ) * Deprecation warning for shared layers in Mlpextractor * Updated changelog * Updated custom policy doc * Update doc and deprecation * Fix doc build * Minor edits Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-01-05 09:59:36 +01:00
Quentin Gallouédec	4fa17dcf0f	Standardize the use of `from gym import spaces` (#1240 ) * generalize the use of `from gym import spaces` * command line get system info * Documentation line length for doc * update changelog * add space before os plateform to avoid ref to other issue * format * get_system_info update in changelog * fix type check error * fix get system info * add comment about regex * update version	2023-01-02 14:51:11 +01:00
Antonin RAFFIN	7202ece85b	Update tensorboard callback doc (#1221 ) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-12-21 12:51:28 +01:00
Quentin Gallouédec	96b1a7cf01	`env_id` consistency in tests (#1224 ) Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-12-20 16:01:26 +01:00
Alex Pasquali	2cfcec4f50	Modified ActorCriticPolicy to support non-shared features extractor (#1148 ) * Modified ActorCriticPolicy to support non-shared features extractor * Refactored features extraction with non-shared features extractor in ActorCriticPolicy and updated doc Doc update: added 'warning' on custom policy docs that says that, if the features extractor is non-shared, it's not possible to have shared layers in the mlp_extractor * Moved attrib share_features_extractor in class * Updated custom policy doc for non-shared features extractor * Updated changelog * Made some if-statements more readable if policies.py The if-statements are related to the shared/non-shared features extractor in ActorCritic policies * Simplify implementation and add run test * Keep order in module gain to keep previous results consistents * Fix test * Improved docstring in policies.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Added some tests * feature extractor -> features extractor * Fix test * Fix env_id in test * Make features extractor parameter explicit * Remove duplicate Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-12-20 15:12:05 +01:00
Antonin RAFFIN	8452106734	Fix support of image like normalized inputs (#1214 ) * Fix support of image like normalized inputs * Improve docstring and warning message. * Don't check if obs is image when normalize_images is False (lil opt) * Comment fix * Fix normalize_images not passed to parent * Check for subclasses too * Remove useless multiline * Update version and add comment * Fix some typos Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-12-20 13:18:28 +01:00
Alex Pasquali	6d55a09f81	Updated custom policy docs to better explain the ``mlp_extractor``'s dimensions (#1196 ) * Updated custom policy docs Better explained how the dimensions of the mlp_extractor work, including the action net and the value net after the layers specified in net_arch. * Improved custom policy doc Section: Custom Network Architecture. Explained with greater detail that an action net and a value net will be added on top of the net_arch. * Improved custom policy doc Section: Custom Network Architecture. Merged a comment into a note * Alignment Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2022-12-12 16:19:51 +01:00
Athanasios Theocharis	f7d7ed3fa7	Update custom_policy.rst (#1183 ) * Update custom_policy.rst * Update changelog Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-12-06 17:51:52 +01:00
Quentin Gallouédec	e3b24829a5	Drop `gym.GoalEnv` and other minor changes initally from #780 (#1184 ) * Various changes from #780 * Fix env_checker for goal_env detection	2022-11-28 18:22:31 +01:00


162 commits