stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-16 21:10:08 +00:00


Quentin Gallouédec c5adad82b2 Multiprocessing support for HerReplayBuffer (#704)	2023-03-20 12:03:57 +01:00
* IM compat. modif from old fork
* mp her working, without offline sampling
* update readme and doc
* fix discrete action/obs space case
* handle offline sampling
* fix pos to be consistent with the old version
* improve typing and docstring
* fix discrete obs special case
* new her, using episode uid
* deal with full buffer
* offline not implemented
* info storage; compute_reward as arg; offline sampling error
* offline sampling; timeout_termination; fix last_trans detection
* rm max_episode_length from tests
* fix loading and loading test
* Fix episode sampling strategy
* Episode interrupted not valid
* Typo
* Fix infos sampling, next_obs desired goals, offline sampling
* update tests for multienvs
* speed up code
* handle timeout sampling when sampling
* give up ep_uid for ep_start and ep_length
* speed up sampling
* Improve docstring
* Typos and renaming
* Fix typing
* Fix linter warnings
* Renaming + add note
* fix reward type
* Fix future sampling strategy
* Fix future goal selection strategy
* env_fn as lambda
* Re-fix linter warnings
* Formatting
* Fix offline sampling
* restore the initial performance budget
* Remove max_episode_length for HerReplayBuffer kwargs
* SubprocVecEnv compat test
* Dedicated SubprocVecEnv test; rm n_envs from parametrization
* Back to using the env arg instead of compute_reward
* Up VecEnv import
* fix lint warnings
* fix docstring
* Fix device issue
* actor_loss_modifier in SAC and TD3
* Merge RewardModifier and ActorLossModifier into Surgeon
* update surgeon for rnd
* fix unintended merge
* fix unintended merge
* fix unintended merge
* Rm unintended merge
* Fix KeyError
* Remove useless `all_inds`
* Minor docstring format
* Fix hint
* speedup!
* Speedup again
* speedup
* np.nonzero
* fix env normalization
* flat sampling for speedup
* typo
* drop online
* format
* remove observation from env_checker (see #1335)
* update changelog
* default device to "auto"
* add comment for info storage
* add comment for ep_start and ep_length attributes
* a[b][c] to a[b, c]
* comment flatnonzero and unravel_index
* update _sample_goals docstring
* Fix future goal sampling for split episode
* add informative error message for learning_starts too small
* use keyword arg for env
* try fix pytype
* Update stable_baselines3/common/off_policy_algorithm.py
  Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add `copy_info_dict` option
* Ignore pytype
* Update changelog
* Rename variables and improve documentation
* Ignore new bugbear rule
* Add note about future strategy
* Add deprecation warning
* Fix bug trying to pickle buffer kwargs
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
a2c.rst	Add scaling section to A2C documentation (#1250)	2023-02-02 12:34:38 +01:00
base.rst	Review of code (A2C, PPO and refactoring) (#35)	2020-06-09 13:54:18 +02:00
ddpg.rst	Gym fixes - Follow up from #705 (#734)	2022-02-04 15:13:57 -08:00
dqn.rst	Update changelog for #1184 (#1185)	2022-11-28 19:36:26 +01:00
her.rst	Multiprocessing support for HerReplayBuffer (#704)	2023-03-20 12:03:57 +01:00
ppo.rst	Fixed typo in PPO doc (#983)	2022-07-30 12:52:35 +02:00
sac.rst	Gym fixes - Follow up from #705 (#734)	2022-02-04 15:13:57 -08:00
td3.rst	Gym fixes - Follow up from #705 (#734)	2022-02-04 15:13:57 -08:00