stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-14 20:58:03 +00:00

Author	SHA1	Message	Date
Antonin RAFFIN	f8ea2995cb	Doc update: custom envs, IsaacLab, Brax and dm_control (#2072 ) * Add note about start!=0 for Discrete spaces * Update doc for IsaacLab and dm_control * Fix test due to rounding error	2025-01-26 11:42:57 +01:00
Yufeng Gao	d055a2e2af	fix docs atari example by import ale_py (#2071 ) * fix docs atari example by import ale_py * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2025-01-21 10:53:42 +01:00
Antonin RAFFIN	897d01d225	Update PyBullet example (#2049 )	2024-11-29 14:58:09 +01:00
Quentin Gallouédec	1a69fc8314	Update examples.rst (#1969 )	2024-07-15 23:57:24 +02:00
Sahit Chintalapudi	0eebde7ca1	Fix typo in examples.rst (#1962 ) The variable `env` is not defined. The gym env we want to change is `vec_env`	2024-07-05 15:00:48 +02:00
Chris Schindlbeck	4317c62598	Fix various typos (#1926 ) * Fix various typos * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2024-05-15 15:19:39 +02:00
Quentin Gallouédec	373166d6ac	Fix doc: Gym to Gymnasium Atari install command in `examples.rst` (#1773 ) * Update examples.rst * Update changelog.rst	2023-12-05 11:31:11 +01:00
Antonin RAFFIN	294f2b4309	Documentation update (#1732 ) * Update RL Tips * Fix grammar * Update SBX doc * Fix various typos and grammar mistakes	2023-11-03 17:17:46 +01:00
BertrandDecoster	fa7a3168f3	Update the Callbacks: Evaluate Agent Performance section of the Examples (#1604 ) * Update examples.rst section "Callbacks: Evaluate Agent Performance" Two typos fixed * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-07-18 13:02:47 +02:00
Antonin Raffin	cc103ff725	Update doc before 2.0 release	2023-06-23 12:31:14 +02:00
Antonin RAFFIN	fd0cd82339	Update outdated custom env doc (#1490 ) * Update outdated custom env doc * fix render_mode and term/trunc/reset_info * gym -> gymnasium --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-05-08 13:48:26 +02:00
Sidney Tio	d6ddee9366	Add evalcallback example (#1468 ) * Moved 'Monitoring Training' to subsubsection of 'Using callbacks' * Added EvalCallback example * Updated Changelogs * Edited the language * Moved subsection headers up one level * added make_vec_env into Evalcallback example * Added parameters to the top for readability * Added note on multiple training environments * Added more clarity to eval_freq note * Apply suggestions from code review --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-05-02 18:02:36 +02:00
Antonin RAFFIN	40e0b9d2c8	Add Gymnasium support (#1327 ) * Fix failing set_env test * Fix test failiing due to deprectation of env.seed * Adjust mean reward threshold in failing test * Fix her test failing due to rng * Change seed and revert reward threshold to 90 * Pin gym version * Make VecEnv compatible with gym seeding change * Revert change to VecEnv reset signature * Change subprocenv seed cmd to call reset instead * Fix type check * Add backward compat * Add `compat_gym_seed` helper * Add goal env checks in env_checker * Add docs on HER requirements for envs * Capture user warning in test with inverted box space * Update ale-py version * Fix randint * Allow noop_max to be zero * Update changelog * Update docker image * Update doc conda env and dockerfile * Custom envs should not have any warnings * Fix test for numpy >= 1.21 * Add check for vectorized compute reward * Bump to gym 0.24 * Fix gym default step docstring * Test downgrading gym * Revert "Test downgrading gym" This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb. * Fix protobuf error * Fix in dependencies * Fix protobuf dep * Use newest version of cartpole * Update gym * Fix warning * Loosen required scipy version * Scipy no longer needed * Try gym 0.25 * Silence warnings from gym * Filter warnings during tests * Update doc * Update requirements * Add gym 26 compat in vec env * Fixes in envs and tests for gym 0.26+ * Enforce gym 0.26 api * format * Fix formatting * Fix dependencies * Fix syntax * Cleanup doc and warnings * Faster tests * Higher budget for HER perf test (revert prev change) * Fixes and update doc * Fix doc build * Fix breaking change * Fixes for rendering * Rename variables in monitor * update render method for gym 0.26 API backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation) * update tests and docs to new gym render API * undo removal of render modes metatadata check * set rgb_array as default render mode for gym.make * undo changes & raise warning if not 'rgb_array' * Fix type check * Remove recursion and fix type checking * Remove hacks for protobuf and gym 0.24 * Fix type annotations * reuse existing render_mode attribute * return tiled images for 'human' render mode * Allow to use opencv for human render, fix typos * Add warning when using non-zero start with Discrete (fixes #1197) * Fix type checking * Bug fixes and handle more cases * Throw proper warnings * Update test * Fix new metadata name * Ignore numpy warnings * Fixes in vec recorder * Global ignore * Filter local warning too * Monkey patch not needed for gym 26 * Add doc of VecEnv vs Gym API * Add render test * Fix return type * Update VecEnv vs Gym API doc * Fix for custom render mode * Fix return type * Fix type checking * check test env test_buffer * skip render check * check env test_dict_env * test_env test_gae * check envs in remaining tests * Update tests * Add warning for Discrete action space with non-zero (#1295) * Fix atari annotation * ignore get_action_meanings [attr-defined] * Fix mypy issues * Add patch for gym/gymnasium transition * Switch to gymnasium * Rely on signature instead of version * More patches * Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39 * Fix doc build * Fix pytype errors * Fix atari requirement * Update env checker due to change in dtype for Discrete * Fix type hint * Convert spaces for saved models * Ignore pytype * Remove gitlab CI * Disable pytype for convert space * Fix undefined info * Fix undefined info * Upgrade shimmy * Fix wrappers type annotation (need PR from Gymnasium) * Fix gymnasium dependency * Fix dependency declaration * Cap pygame version for python 3.7 * Point to master branch (v0.28.0) * Fix: use main not master branch * Rename done to terminated * Fix pygame dependency for python 3.7 * Rename gym to gymnasium * Update Gymnasium * Fix test * Fix tests * Forks don't have access to private variables * Fix linter warnings * Update read the doc env * Fix env checker for GoalEnv * Fix import * Update env checker (more info) and fix dtype * Use micromamab for Docker * Update dependencies * Clarify VecEnv doc * Fix Gymnasium version * Copy file only after mamba install * [ci skip] Update docker doc * Polish code * Reformat * Remove deprecated features * Ignore warning * Update doc * Update examples and changelog * Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436) * Fix SAC type hints, improve DQN ones * Fix A2C and TD3 type hints * Fix PPO type hints * Fix on-policy type hints * Fix base class type annotation, do not use defaults * Update version * Disable mypy for python 3.7 * Rename Gym26StepReturn * Update continuous critic type annotation * Fix pytype complain --------- Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com> Co-authored-by: tlips <thomas.lips@ugent.be> Co-authored-by: tlpss <thomas17.lips@gmail.com> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2023-04-14 13:13:59 +02:00
Quentin Gallouédec	c5adad82b2	Multiprocessing support for HerReplayBuffer (#704 ) * IM compat. modif from old fork * mp her working, without offline sampling * update readme and doc * fix discrete action/obs space case * handle offline sampling * fix pos to be consistent with the old version * improve typing and docstring * fix discrete obs special case * new her, using episode uid * deal with full buffer * offline not implemented * info storage; compute_reward as arg; offline sampling error * offline sampling; timeout_termination; fix last_trans detection * rm max_episode_length from tests * fix loading and loading test * Fix episode sampling strategy * Episode interrupted not valid * Typo * Fix infos sampling, next_obs desired goals, offline sampling * update tests for multienvs * speed up code * handle timeout sampling when samping * give up ep_uid for ep_start and ep_lenght * speed up sampling * Improve docstring * Typos and renaming * Fix typing * Fix linter warnings * Renaming + add note * fix reward type * Fix future sampling strategy * Fix future goal selection strategy * env_fn as lambda * Re-fix linter warnings * Formatting * Fix offline sampling * restore the initial performance budget * Remove max_episode_length for HerReplayBuffer kwargs * SubprcVecEnv compat test * Dedicated SubrocVecEnv test rm n_envs from parametrization * Back to using the env arg instead of compute_reward * Up VecEnv import * fix lint warnings * fix docstring * Fix device issue * actor_loss_modifier in SAV and TD3 * Merge RewardModifier and ActorLossModifier into Surgeon * update surgeon for rnd * fix uninteded merge * fix uninteded merge * fix unintended merge * Rm unintended merge * Fix KeyError * Remove useless `all_inds` * Minor docstring format * Fix hint * speedup! * Speedup again * speedup * np.nonzero * fix env normalization * flat sampling for speedup * typo * drop online * format * remove observation from env_cheker (see #1335) * update changelog * default device to "auto" * add comment for info storage * add comment for ep_start and ep_length attributes * a[b][c] to a[b, c] * comment flatnonzero and unravel_index * update _sample_goals docstring * Fix future gaol sampling for split episode * add informative error message for learning_starts too small * use keyword arg for env * try fix pytye * Update stable_baselines3/common/off_policy_algorithm.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Add `copy_info_dict` option * Ignore pytype * Update changelog * Rename variables and improve documentation * Ignore new bug bear rule * Add note about future strategy * Add deprecation warning * Fix bug trying to pickle buffer kwargs --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-03-20 12:03:57 +01:00
harveybellini	7a1e429702	Remove Note from examples - Code works (#1330 ) * Remove Note Gif creation works with Atari Environments using the script provided below. * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-02-15 13:14:02 +01:00
Yu Zheng	9bb1538b78	Fix outdated `load_parameters` to `set_parameters` (#1270 ) * Update examples.rst * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-01-11 14:13:21 +01:00
Antonin RAFFIN	8452106734	Fix support of image like normalized inputs (#1214 ) * Fix support of image like normalized inputs * Improve docstring and warning message. * Don't check if obs is image when normalize_images is False (lil opt) * Comment fix * Fix normalize_images not passed to parent * Check for subclasses too * Remove useless multiline * Update version and add comment * Fix some typos Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2022-12-20 13:18:28 +01:00
Quentin Gallouédec	e3b24829a5	Drop `gym.GoalEnv` and other minor changes initally from #780 (#1184 ) * Various changes from #780 * Fix env_checker for goal_env detection	2022-11-28 18:22:31 +01:00
tobirohrer	d8a430e088	Deprecate `create_eval_env`, `eval_env` and `eval_freq` parameter (#1082 ) * Adds deprecation warning if `eval_env` or `eval_freq` parameters are used. See #925 * added changelog entry * added missing backtick * deprecating `create_eval_env` parameter as well and adding comments to explain the `stacklevel` parameter used * Updated tests to ignore DeprecationWarnings * Updated changelog entry * - Removed the `create_eval_env` parameter from the examples in the docs - Removed information about the `create_eval_env` parameter from the migration docs - Added information about deprecation of the `create_eval_env` parameter in the docs * Add alternative in docstring * Update docstrings * `eval_freq` warning in docstring * Add deprecation comments in tests Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2022-10-10 15:39:38 +02:00
Antonin RAFFIN	7c21b79188	Add progress bar callback and argument (#1095 ) * Add progress bar callback and argument * Update doc * Update changelog * Upgrade pytype in docker image * Use tqdm.write in the logger to have cleaner output * Fix logger test * Fix when doing multiple calls to learn() * Address comments from code-review	2022-10-06 18:17:31 +02:00
Quentin Gallouédec	a697401e03	Standardized the use of ``"`` for string representation (#1086 ) * Replace ``'`` by ``" `` in python code * Update changelog * Rm whitespace	2022-10-03 15:15:39 +02:00
Quentin Gallouédec	98e786f744	Clarify and standardize verbosity documentation (#1056 ) * Standardize the use of verbosity: > to >= * Make verbose docstring more specific * Update changelog	2022-09-09 16:46:28 +02:00
Antonin RAFFIN	c1f1c3d3d7	Release v1.6.0 (#958 ) * Release v1.6.0 + update doc + add copy button * Update read the doc conda env * Update year * Fix bug in kl divergence check * Rephrase requirement for envpool and isaac gym	2022-07-12 22:50:23 +02:00
Carlos Luis	5143cd19f7	Gym fixes - Follow up from #705 (#734 ) * fix Atari in CI * fix dtype and atari extra * Update setup.py * remove 3.6 * note about how to install Atari * pendulum-v1 * atari v5 * black * fix pendulum capitalization * add minimum version * moved things in changelog to breaking changes * partial v5 fix * env update to pass tests * mismatch env version fixed * Fix tests after merge * Include autorom in setup.py * Blacken code * Fix dtype issue in more robust way * Fix GitLab CI: switch to Docker container with new black version * Remove workaround from GitLab. (May need to rebuild Docker for this though.) * Revert to v4 * Update setup.py * Apply suggestions from code review * Remove unnecessary autorom * Consistent gym versions Co-authored-by: J K Terry <justinkterry@gmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: modanesh <mohamad4danesh@gmail.com> Co-authored-by: Adam Gleave <adam@gleave.me>	2022-02-04 15:13:57 -08:00
Antonin RAFFIN	bb16645c4e	Add `skip` option for `VecTransposeImage` and bug fix in frame stack (#700 ) * Update doc * Add comment * Add skip option to VecTransposeImage and fix bug in frame stack	2021-12-23 17:12:49 +02:00
Demetrio92	798b16aaf7	more verbose documentation regarding `.load` vs `.set_parameters` (#696 ) * more verbose documentation regarding `.load` vs `.set_parameters` (#683, #614) * add a note to explain the difference between `.load` and `.set_parameters` to the examples * fix typos Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-18 17:28:37 +02:00
Antonin RAFFIN	507ed1762e	Multiprocessing support for off policy algorithms (#439 ) * Add multi-env training support for SAC * Fix for dict obs * Pytype fixes * Fix assert on number of envs * Remove for loop * Add support for Dict obs * Start cleanup * Update doc and bug fix * Add support for vectorized action noise and add multi env example for off-policy * Update version * Bug fix with VecNormalize * Update README table * Update variable names * Update changelog and version * Update doc and fix for `gradient_steps=-1` * Add test for `gradient_steps=-1` * Disable pytype pyi errors * Fix for DQN * Update comment on deepcopy * Remove episode_reward field * Fix RolloutReturn * Avoid modification by reference * Fix error message Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-01 22:30:09 +01:00
Antonin RAFFIN	e907eca18e	Fix `set_env` to keep the number of timesteps (#615 ) * Fix for `set_env` * Add test and update changelog * Use underscores and f-strings * Add PyPi info * Update comments	2021-10-23 16:36:40 +02:00
Antonin RAFFIN	1564a85081	System info helper (#613 ) * Add `system_env_info` * Add `print_system_info` to load and store system info at save time * Remove TODO * Rename to `get_system_info` * Import as sb3 for consistency * Update changelog * Add warning for old SB3 versions * Use underscore litteral for more clarity	2021-10-18 10:43:56 +02:00
MihaiAnca13	c41368f2ea	Docs examples warning - issue #526 (#530 ) * Update a2c.rst * Update ddpg.rst * Update dqn.rst * Update her.rst * Update ppo.rst * Update sac.rst * Update td3.rst * Update changelog.rst * modified message * Update examples.rst Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-08-09 16:23:25 +03:00
Antonin RAFFIN	503425932f	Documentation fixes (#514 ) * Update multiprocessing example * Add VecEnvWrapper example * Update docs/guide/vec_envs.rst Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-07-18 20:51:41 +02:00
Antonin RAFFIN	378d197b00	Doc update (custom policy + fix her example) (#436 )	2021-05-16 18:21:07 +02:00
Jaden Travnik	75b6f3b3b0	Dictionary Observations (#243 ) * First commit * Fixing missing refs from a quick merge from master * Reformat * Adding DictBuffers * Reformat * Minor reformat * added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy * Ran black on buffers * Ran isort * Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow * Running isort :facepalm * Fixed typing issues * Adding docstrings and typing. Using util for moving data to device. * Fixed trailing commas * Fix types * Minor edits * Avoid duplicating code * Fix calls to parents * Adding assert to buffers. Updating changelong * Running format on buffers * Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env * Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type * Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder * Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation * Fixes * Running code style * Update docstrings on torch_layers * Decapitalize non-constant variables * Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test * Update doc * Update doc * Fix format * Removing NineRoom env. Using nested preprocess. Removing mutable default args * running code style * Passing channel check through to stacked dict observations. * Running black * Adding channel control to SimpleMultiObsEnv. Passing check_channels to CombinedExtractor * Remove optimize memory for dict buffers * Update doc * Move identity env * Minor edits + bump version * Update doc * Fix doc build * Bug fixes + add support for more type of dict env * Fixes + add multi env test * Add support for vectranspose * Fix stacked obs for dict and add tests * Add check for nested spaces. Fix dict-subprocvecenv test * Fix (single) pytype error * Simplify CombinedExtractor * Fix tests * Fix check * Merge branch 'master' into feat/dict_observations * Fix for net_arch with dict and vector obs * Fixes * Add consistency test * Update env checker * Add some docs on dict obs * Update default CNN feature vector size * Refactor HER (#351) * Start refactoring HER * Fixes * Additional fixes * Faster tests * WIP: HER as a custom replay buffer * New replay only version (working with DQN) * Add support for all off-policy algorithms * Fix saving/loading * Remove ObsDictWrapper and add VecNormalize tests with dict * Stable-Baselines3 v1.0 (#354) * Bump version and update doc * Fix name * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> * Update docs/index.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Update wording for RL zoo Co-authored-by: Adam Gleave <adam@gleave.me> * Add gym-pybullet-drones project (#358) * Update projects.rst Added gym-pybullet-drones * Update projects.rst Longer title underline * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org> * Include SuperSuit in projects (#359) * include supersuit * longer title underline * Update changelog.rst * Fix default arguments + add bugbear (#363) * Fix potential bug + add bug bear * Remove unused variables * Minor: version bump * Add code of conduct + update doc (#373) * Add code of conduct * Fix DQN doc example * Update doc (channel-last/first) * Apply suggestions from code review Co-authored-by: Anssi <kaneran21@hotmail.com> * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> * Make installation command compatible with ZSH (#376) * Add quotes * Add Zsh bracket info * Add clarify pip installation line * Make note bold * Add Zsh pip installation note * Add handle timeouts param * Fixes * Fixes (buffer size, extend test) * Fix `max_episode_length` redefinition * Fix potential issue * Add some docs on dict obs * Fix performance bug * Fix slowdown * Add package to install (#378) * Add package to install * Update docs packages installation command Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Fix backward compat + add test * Fix VecEnv detection * Update doc * Fix vec env check * Support for `VecMonitor` for gym3-style environments (#311) * add vectorized monitor * auto format of the code * add documentation and VecExtractDictObs * refactor and add test cases * add test cases and format * avoid circular import and fix doc * fix type * fix type * oops * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * add test cases * update changelog * fix mutable argument * quick fix * Apply suggestions from code review * fix terminal observation for gym3 envs * delete comment * Update doc and bump version * Add warning when already using `Monitor` wrapper * Update vecmonitor tests * Fixes Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Reformat * Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392) * Fix ent coef loading bug * Add test * Add comment * Reuse save path * Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375) * Fix return computation + add test for GAE * Rename `last_dones` to `episode_starts` for clarification * Revert advantage * Cleanup test * Rename variable * Clarify return computation * Clarify docs * Add multi-episode rollout test * Reformat Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> * Fixed saving of `A2C` and `PPO` policy when using gSDE (#401) * Improve doc and replay buffer loading * Add support for images * Fix doc * Update Procgen doc * Update changelog * Update docstrings Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com> * Update doc and minor fixes * Update doc * Added note about MultiInputPolicy in error of NatureCNN * Merge branch 'master' into feat/dict_observations * Address comments * Naming clarifications * Actually saving the file would be nice * Fix edge case when doing online sampling with HER * Cleanup * Add sanity check Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com>	2021-05-11 12:29:30 +02:00
Costa Huang	ddbe0e93f9	Support for `VecMonitor` for gym3-style environments (#311 ) * add vectorized monitor * auto format of the code * add documentation and VecExtractDictObs * refactor and add test cases * add test cases and format * avoid circular import and fix doc * fix type * fix type * oops * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * add test cases * update changelog * fix mutable argument * quick fix * Apply suggestions from code review * fix terminal observation for gym3 envs * delete comment * Update doc and bump version * Add warning when already using `Monitor` wrapper * Update vecmonitor tests * Fixes Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-04-13 18:09:31 +02:00
Anssi	f13de5b57b	Add note on loading and resetting environments (#340 ) * Update documentation and changelog * Fix docs * Update examples.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-03-05 17:05:14 +01:00
Antonin RAFFIN	d0d55f3767	Beta is over =)! V1.0rc0 (#334 ) * Fix doc + bump version * Removed cmd util * Remove test	2021-03-01 13:35:21 +01:00
M. Ernestus	0c50d75ecb	TD3 Code review (#245 ) * Removed unneeded overrides of feature_extractor and normalize_images in the TD3 Actor. * Add learning rate schedule example (#248) * Add learning rate schedule example * Update docs/guide/examples.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Address comments Co-authored-by: Adam Gleave <adam@gleave.me> * Add supported action spaces checks (#254) * Add supported action spaces checks * Address comment * Use `pass` in an abstractmethod instead of deleting the arguments. * Remove the "deterministic" keyword from the forward method of the TD3 Actor since it always is deterministic anyways. * Rename _get_data to _get_data_to_reconstruct_model. _get_data was too generic and could have meant anything. * Remove the n_episodes_rollout parameter and allow passing tuples as train_freq instead. * Fix docstring of `train_freq` parameter. * Black fixes. * Fix TD3 delayed update + rename `_get_data()` * Fix TD3 test * Normalize `train_freq` to a tuple in the constructor and turn the warning into an assert. * Make one step the default train frequency. * Black fixes. * Change np.bool to bool. * Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of the off policy algorithm. * Use the tuple format to specify an amount of steps in terms of steps or episodes in the collect_collouts of HER. * Use named tuple for train freq * Rename train_freq to train_every and TrainFreq to ExperienceDuration. Also add some type annotations and documentation. * Black fixes. * Revert to train_freq * Fix terminal observation issues * Typo * Fix action noise bug in HER * Add assert when loading HER models * Update version Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Adam Gleave <adam@gleave.me>	2021-02-27 17:33:50 +01:00
lorepieri8	ce3f9e3302	Update docs, Provide the env when loading the model (#327 ) (#330 ) * Provide the env when loading the model (#327) * Update docs/misc/changelog.rst Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-02-27 16:24:39 +01:00
Antonin RAFFIN	e747e7e2b3	Add learning rate schedule example (#248 ) * Add learning rate schedule example * Update docs/guide/examples.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Address comments Co-authored-by: Adam Gleave <adam@gleave.me>	2020-12-02 14:54:18 +01:00
Anssi	18d10dbf42	Use Monitor episode reward/length for `evaluate_policy` (#220 ) * Update evaluate_policy to use monitor data if available * Update documentation * Cleaning up * Remove unnecessary typing trickery * Update doc * Rename is_wrapped to clarify it is for vecenvs * Add is_wrapped for regular envs * Add is_wrapped call for subprocvecenv and update code for circular imports * Move new functions back to env_util and fix imports * Update changelog * Clarify evaluate_policy docs * Add tests for wrapped modifying episode lengths * Fix tests * Update changelog * Minor edits * Add warn switch to evaluate_policy and update tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-16 11:52:28 +01:00
Antonin RAFFIN	897e98c4e2	Update documentation (#199 ) * Update doc and add new example * Add save/load replay buffer example * Add save format + export doc * Add example for get/set parameters * Typos and minor edits * Add results sections * Add note about performance * Add DDPG results * Address comments * Fix grammar/wording Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2020-10-28 09:55:16 +01:00
Megan Klaiber	dd6e361204	Implement HER (#120 ) * Added working her version, Online sampling is missing. * Updated test_her. * Added first version of online her sampling. Still problems with tensor dimensions. * Reformat * Fixed tests * Added some comments. * Updated changelog. * Add missing init file * Fixed some small bugs. * Reduced arguments for HER, small changes. * Added getattr. Fixed bug for online sampling. * Updated save/load funtions. Small changes. * Added her to init. * Updated save method. * Updated her ratio. * Move obs_wrapper * Added DQN test. * Fix potential bug * Offline and online her share same sample_goal function. * Changed lists into arrays. * Updated her test. * Fix online sampling * Fixed action bug. Updated time limit for episodes. * Updated convert_dict method to take keys as arguments. * Renamed obs dict wrapper. * Seed bit flipping env * Remove get_episode_dict * Add fast online sampling version * Added documentation. * Vectorized reward computation * Vectorized goal sampling * Update time limit for episodes in online her sampling. * Fix max episode length inference * Bug fix for Fetch envs * Fix for HER + gSDE * Reformat (new black version) * Added info dict to compute new reward. Check her_replay_buffer again. * Fix info buffer * Updated done flag. * Fixes for gSDE * Offline her version uses now HerReplayBuffer as episode storage. * Fix num_timesteps computation * Fix get torch params * Vectorized version for offline sampling. * Modified offline her sampling to use sample method of her_replay_buffer * Updated HER tests. * Updated documentation * Cleanup docstrings * Updated to review comments * Fix pytype * Update according to review comments. * Removed random goal strategy. Updated sample transitions. * Updated migration. Removed time signal removal. * Update doc * Fix potential load issue * Add VecNormalize support for dict obs * Updated saving/loading replay buffer for HER. * Fix test memory usage * Fixed save/load replay buffer. * Fixed save/load replay buffer * Fixed transition index after loading replay buffer in online sampling * Better error handling * Add tests for get_time_limit * More tests for VecNormalize with dict obs * Update doc * Improve HER description * Add test for sde support * Add comments * Add comments * Remove check that was always valid * Fix for terminal observation * Updated buffer size in offline version and reset of HER buffer * Reformat * Update doc * Remove np.empty + add doc * Fix loading * Updated loading replay buffer * Separate online and offline sampling + bug fixes * Update tensorboard log name * Version bump * Bug fix for special case Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-22 11:56:43 +02:00
Anssi	19c1a89a3a	Rename cmd_util to env_util (#197 ) * Rename cmd_util to env_util * Fix docs and add missing newline * Address comments	2020-10-22 11:05:52 +02:00
Antonin RAFFIN	3756d05f72	Refactored ContinuousCritic for SAC/TD3 (#78 ) * Refactored ContinuousCritic for SAC/TD3 * Address comments * Add pybullet notebook	2020-07-07 01:02:51 +03:00
Noah	96b771f24e	Implement DQN (#28 ) * Created DQN template according to the paper. Next steps: - Create Policy - Complete Training - Debug * Changed Base Class * refactor save, to be consistence with overriding the excluded_save_params function. Do not try to exclude the parameters twice. * Added simple DQN policy * Finished learn and train function - missing correct loss computation * changed collect_rollouts to work with discrete space * moved discrete space collect_rollouts to dqn * basic dqn working * deleted SDE related code * added gradient clipping and moved greedy policy to policy * changed policy to implement target network and added soft update(in fact standart tau is 1 so hard update) * fixed policy setup * rebase target_update_intervall on _n_updates * adapted all tests all tests passing * Move to stable-baseline3 * Fixes for DQN * Fix tests + add CNNPolicy * Allow any optimizer for DQN * added some util functions to create a arbitrary linear schedule, fixed pickle problem with old exploration schedule * more documentation * changed buffer dtype * refactor and document * Added Sphinx Documentation Updated changelog.rst * removed custom collect_rollouts as it is no longer necessary * Implemented suggestions to clean code and documentation. * extracted some functions on tests to reduce duplicated code * added support for exploration_fraction * Fixed exploration_fraction * Added documentation * Fixed get_linear_fn -> proper progress scaling * Merged master * Added nature reference * Changed default parameters to https://www.nature.com/articles/nature14236/tables/1 * Fixed n_updates to be incremented correctly * Correct train_freq * Doc update * added special parameter for DQN in tests * different fix for test_discrete * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Added RMSProp in optimizer_kwargs, as described in nature paper * Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper * Changelog update for buffer dtype * standard exlude parameters should be always excluded to assure proper saving only if intentionally included by ``include`` parameter * slightly more iterations on test_discrete to pass the test * added param use_rms_prop instead of mutable default argument * forgot alpha * using huber loss, adam and learning rate 1e-4 * account for train_freq in update_target_network * Added memory check for both buffers * Doc updated for buffer allocation * Added psutil Requirement * Adapted test_identity.py * Fixes with new SB3 version * Fix for tensorboard name * Convert assert to warning and fix tests * Refactor off-policy algorithms * Fixes * test: remove next_obs in replay buffer * Update changelog * Fix tests and use tmp_path where possible * Fix sampling bug in buffer * Do not store next obs on episode termination * Fix replay buffer sampling * Update comment * moved epsilon from policy to model * Update predict method * Update atari wrappers to match SB2 * Minor edit in the buffers * Update changelog * Merge branch 'master' into dqn * Update DQN to new structure * Fix tests and remove hardcoded path * Fix for DQN * Disable memory efficient replay buffer by default * Fix docstring * Add tests for memory efficient buffer * Update changelog * Split collect rollout * Move target update outside `train()` for DQN * Update changelog * Update linear schedule doc * Cleanup DQN code * Minor edit * Update version and docker images Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-29 11:16:54 +02:00
Antonin RAFFIN	a861f33107	Update notebooks (#65 )	2020-06-17 12:47:09 +02:00
Antonin RAFFIN	3a5b743760	Update doc	2020-05-19 10:40:52 +02:00
Antonin RAFFIN	a06c4a7859	Update PyBullet example	2020-05-09 14:38:57 +02:00
Antonin RAFFIN	623f821571	Update examples	2020-05-08 12:14:33 +02:00
Antonin RAFFIN	e76181f9b5	Update doc (add rl zoo)	2020-05-08 11:58:43 +02:00

1 2

52 commits