stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-05-31 23:28:05 +00:00

Author	SHA1	Message	Date
Antonin RAFFIN	daaebd0a52	Drop python 3.8 and add python 3.12 support (#2041 ) * Drop python 3.8 support, add python 3.12 support * Upgrade to python 3.9 syntax * Fixes for Numpy v2 * Fix doc warning	2024-11-18 15:40:36 +01:00
will-maclean	4efee92fba	Set CallbackList children's parent correctly (#1939 ) * Fixing #1791 * Update test and version * Add test for callback after eval * Fix mypy error * Remove tqdm warnings --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2024-06-07 14:07:28 +02:00
Andrew James	766b9e9f7d	Avoid torch type-error under torch.compile (#1922 ) * Avoid torch type-error under torch.compile * Update changelog and version * Update stable_baselines3/common/buffers.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2024-05-13 17:28:23 +02:00
Jan-Hendrik Ewers	2ddf015cd9	fix: Follow PEP8 guidelines and evaluate falsy to truthy with `not` rather than `is False`. (#1707 ) * fix: Follow PEP8 guidelines and evaluate falsy to truthy with `not` rather than `is False`. https://docs.python.org/2/library/stdtypes.html#truth-value-testing * chore: Update changelog in line with intent of changes in PR #1707 Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix: Change `is False` to `not` as per PEP8 * chore: Remove superfluous comment about `is False` * test: One On- and one Off-Policy algorithm (A2C and SAC respectively), with settings to speed up testing * Update changelog * chore: Remove EvalCallback as it's not actually required * Update changelog.rst * Rm duplicated "others" section in changelog.rst --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-10-09 12:21:12 +02:00
Antonin RAFFIN	c6c660e51b	Fix type annotations of buffers (#1700 ) * Fix type annotation and replay buffer * Exclude pytype check * Remove some pytype specific annotation and update changelog * Fix HerReplayBuffer type hints * try remove # type: ignore[assignment] * revert change --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-09-28 18:52:46 +02:00
Tobias Rohrer	ba77dd7c61	Fix to use float64 actions for off policy algorithms (#1572 ) * Added test cases where off policy algorithms fail with float64 actionspace * casting observations and actions to `np.float32` to unify behaviour between `ReplayBuffer` and `RolloutBuffer`. Fixing issue #1145 * reformatted using black * making test more restrictive by checking models action is float64 * added changelog entry * undo cast of observations as `preprocessing.preprocess_obs()` casts them to float32 anyways. * - Casting to float32 only, if action.dtype is float64 - Added cast to `DictReplayBuffer` as well * Added tests for multiple variations of continuous action types and observation spaces * applied reformatting by `make commit-checks` * Added typing and comment referring to description in merge request * Apply linter for single element slice * Rename helper and refactor tests * Update changelog and docstring --------- Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-07-24 16:38:03 +02:00
Antonin RAFFIN	63a0bb9da1	Type annotation bundle (logger, vec env, custom envs) (#1479 ) * Switch from List to Sequence for `seed()` type hint * Fix logger type hints * Improve replay buffer type hints * Fix custom envs type annotations * Fix VecMonitor type hints * Fix RMSprop type hint * Fix vec extract dict obs type hints * Fix vec frame stack type annotations * Fix base vec env type hints * Fix dummy vec env type hints * Fix for mypy * Fixes for the tests * mypy doesn't like when we overwrite type * fix step of SimpleMultiObsEnv * remove useless type specification * Rm useless type hint * Improve logger type hint * format * rm useless type hint * Re-add variables in constructor, remove unused import --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-05-04 20:27:15 +02:00
Antonin RAFFIN	40e0b9d2c8	Add Gymnasium support (#1327 ) * Fix failing set_env test * Fix test failing due to deprecation of env.seed * Adjust mean reward threshold in failing test * Fix her test failing due to rng * Change seed and revert reward threshold to 90 * Pin gym version * Make VecEnv compatible with gym seeding change * Revert change to VecEnv reset signature * Change subprocenv seed cmd to call reset instead * Fix type check * Add backward compat * Add `compat_gym_seed` helper * Add goal env checks in env_checker * Add docs on HER requirements for envs * Capture user warning in test with inverted box space * Update ale-py version * Fix randint * Allow noop_max to be zero * Update changelog * Update docker image * Update doc conda env and dockerfile * Custom envs should not have any warnings * Fix test for numpy >= 1.21 * Add check for vectorized compute reward * Bump to gym 0.24 * Fix gym default step docstring * Test downgrading gym * Revert "Test downgrading gym" This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb. 
* Fix protobuf error * Fix in dependencies * Fix protobuf dep * Use newest version of cartpole * Update gym * Fix warning * Loosen required scipy version * Scipy no longer needed * Try gym 0.25 * Silence warnings from gym * Filter warnings during tests * Update doc * Update requirements * Add gym 26 compat in vec env * Fixes in envs and tests for gym 0.26+ * Enforce gym 0.26 api * format * Fix formatting * Fix dependencies * Fix syntax * Cleanup doc and warnings * Faster tests * Higher budget for HER perf test (revert prev change) * Fixes and update doc * Fix doc build * Fix breaking change * Fixes for rendering * Rename variables in monitor * update render method for gym 0.26 API backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation) * update tests and docs to new gym render API * undo removal of render modes metatadata check * set rgb_array as default render mode for gym.make * undo changes & raise warning if not 'rgb_array' * Fix type check * Remove recursion and fix type checking * Remove hacks for protobuf and gym 0.24 * Fix type annotations * reuse existing render_mode attribute * return tiled images for 'human' render mode * Allow to use opencv for human render, fix typos * Add warning when using non-zero start with Discrete (fixes #1197) * Fix type checking * Bug fixes and handle more cases * Throw proper warnings * Update test * Fix new metadata name * Ignore numpy warnings * Fixes in vec recorder * Global ignore * Filter local warning too * Monkey patch not needed for gym 26 * Add doc of VecEnv vs Gym API * Add render test * Fix return type * Update VecEnv vs Gym API doc * Fix for custom render mode * Fix return type * Fix type checking * check test env test_buffer * skip render check * check env test_dict_env * test_env test_gae * check envs in remaining tests * Update tests * Add warning for Discrete action space with non-zero (#1295) * Fix atari annotation * ignore 
get_action_meanings [attr-defined] * Fix mypy issues * Add patch for gym/gymnasium transition * Switch to gymnasium * Rely on signature instead of version * More patches * Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39 * Fix doc build * Fix pytype errors * Fix atari requirement * Update env checker due to change in dtype for Discrete * Fix type hint * Convert spaces for saved models * Ignore pytype * Remove gitlab CI * Disable pytype for convert space * Fix undefined info * Fix undefined info * Upgrade shimmy * Fix wrappers type annotation (need PR from Gymnasium) * Fix gymnasium dependency * Fix dependency declaration * Cap pygame version for python 3.7 * Point to master branch (v0.28.0) * Fix: use main not master branch * Rename done to terminated * Fix pygame dependency for python 3.7 * Rename gym to gymnasium * Update Gymnasium * Fix test * Fix tests * Forks don't have access to private variables * Fix linter warnings * Update read the doc env * Fix env checker for GoalEnv * Fix import * Update env checker (more info) and fix dtype * Use micromamab for Docker * Update dependencies * Clarify VecEnv doc * Fix Gymnasium version * Copy file only after mamba install * [ci skip] Update docker doc * Polish code * Reformat * Remove deprecated features * Ignore warning * Update doc * Update examples and changelog * Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436) * Fix SAC type hints, improve DQN ones * Fix A2C and TD3 type hints * Fix PPO type hints * Fix on-policy type hints * Fix base class type annotation, do not use defaults * Update version * Disable mypy for python 3.7 * Rename Gym26StepReturn * Update continuous critic type annotation * Fix pytype complain --------- Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com> Co-authored-by: tlips 
<thomas.lips@ugent.be> Co-authored-by: tlpss <thomas17.lips@gmail.com> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>	2023-04-14 13:13:59 +02:00
Omar Younis	a60b0179e0	Fix: Reshape action in DictRolloutBuffer (#1395 ) * reshape action in DictRolloutBuffer * improve buffer test * update changelog * add comment * Update comments and version --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-03-29 16:25:05 +02:00
Antonin RAFFIN	470771b5c2	Fix Atari Roms download, enable RUF linting (#1379 ) * Add extra no Atari and fix CI for forks * Enable ruff rules * Change to no roms	2023-03-12 18:47:52 +01:00
Quentin Gallouédec	82bc63fca4	Upgrade black formatting (#1310 ) * apply black * Reformat tests --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-02-02 11:58:41 +01:00
Alex Pasquali	b702884c23	Removed shared layers in mlp_extractor (#1292 ) * Modified actor-critic policies & MlpExtractor class ActorCriticPolicy: - changed type hint of net_arch param: now it's a dict - removed check that if features extractor is not shared: no shared layers are allowed in the mlp_extractor regardless of the features extractor ActorCriticCnnPolicy: - changed type hint of net_arch param: now it's a dict MultiInputActorcriticPolicy: - changed type hint of net_arch param: now it's a dict MlpExtractor: - changed type hint of net_arch param: now it's a dict - adapted networks creation - adapted methods: forward, forward_actor & forward_critic * Removed shared layers in mlp_extractor * Updated docs and changelog + reformat * Updated custom policy tests * Removed test on deprecation warning for share layers in mlp_extractor Now shared layers are removed * Update version * Update RL Zoo doc * Fix linter warnings * Add ruff to Makefile (experimental) * Add backward compat code and minor updates * Update tests * Add backward compatibility * Fix test * Improve compat code Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2023-01-23 14:55:19 +01:00
Quentin Gallouédec	69fdf155e1	Downgrade `sphinx-autodoc-typehints` (#1291 ) * Update setup.py * black * hotfix pytype	2023-01-23 10:56:45 +01:00
Quentin Gallouédec	68a40e0940	Construct tensors directly on GPU (#1218 ) * Replace .to(device) when possible * fix numpy dep * black * Add warning for device != cpu and copy=False * Update changelog * Remove warning * Update buffers.py	2022-12-19 12:50:22 +01:00
Dominic Kerr	899eee6bd4	Automatically create missing directories of ``filename``s passed to ``ResultsWriter`` (#1072 ) * Create (if any) missing filename directories, passed into ResultsWriter * Fixed incorrect ``filename`` docstring (if ``filename`` were ``None``, the string method ``filename.endswith(Monitor.EXT)`` would raise an ``AttributeError``), and renamed ``reset_keywords`` docstring. * Added description of #1068 * Ignore pytype errors * Update changelog.rst Co-authored-by: dominicgkerr <dominicgkerr1@gmail.co> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-09-21 13:14:38 +02:00
Quentin Gallouédec	73822c34da	Support for `device=auto` buffers and set it as default value (#1009 ) * Default device is "auto" for buffer + auto device support in BufferBaseClass * Update docstring * Update tests * Unify tests * Update changelog * Fix tests on CUDA device Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2022-08-16 17:54:55 +02:00
Quentin Gallouédec	c4f54fcf04	Handling multi-dimensional action spaces (#971 ) * Handle non 1D action shape * Revert changes of observation (out of the scope of this PR) * Apply changes to DictReplayBuffer * Update tests * Rollout buffer n-D actions space handling * Remove error when non 1D action space * ActorCriticPolicy return action with the proper shape * remove useless reshape * Update changelog * Add tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-08-06 14:19:20 +02:00
Antonin RAFFIN	c1f1c3d3d7	Release v1.6.0 (#958 ) * Release v1.6.0 + update doc + add copy button * Update read the doc conda env * Update year * Fix bug in kl divergence check * Rephrase requirement for envpool and isaac gym	2022-07-12 22:50:23 +02:00
Max Weltevrede	ef10189d80	Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination (#948 ) * Prohibit simultaneous use of optimize_memory_buffer and handle_timeout_termination * Modify test to avoid unsupported buffer configuration * Change from assertion to raising of ValueError * Update changelog * Update style for consistency * Use handle_timeout_termination when possible Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-07-04 15:08:54 +02:00
Antonin RAFFIN	a6f5049a99	Upgrade code to Python 3.7+ syntax using `pyupgrade` (#887 ) * Upgrade code to Python 3.7+ syntax * Update changelog	2022-04-25 13:01:38 +03:00
Quentin Gallouédec	13fcb12471	Fix normalization for `DictReplayBuffer` (#744 ) * Normalize samples DictReplayBuffer (#743) * Fixed sample normalization in ``DictReplayBuffer`` (#743) * Test buffer normalization * Rename test replay buffer * Bump version Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-02-23 13:04:57 +01:00
Antonin RAFFIN	507ed1762e	Multiprocessing support for off policy algorithms (#439 ) * Add multi-env training support for SAC * Fix for dict obs * Pytype fixes * Fix assert on number of envs * Remove for loop * Add support for Dict obs * Start cleanup * Update doc and bug fix * Add support for vectorized action noise and add multi env example for off-policy * Update version * Bug fix with VecNormalize * Update README table * Update variable names * Update changelog and version * Update doc and fix for `gradient_steps=-1` * Add test for `gradient_steps=-1` * Disable pytype pyi errors * Fix for DQN * Update comment on deepcopy * Remove episode_reward field * Fix RolloutReturn * Avoid modification by reference * Fix error message Co-authored-by: Anssi <kaneran21@hotmail.com>	2021-12-01 22:30:09 +01:00
Shyamal H Anadkat	3b68dc7312	Update GAE computation docstring (#655 ) * Fix typo in buffers.py * Revert "Fix typo in buffers.py" This reverts commit ca643d5e3a509ae1b8a65bf0de98f4609ca9d8da. * Ignore pytype errors * Update GAE computation docstring Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-11-25 10:53:42 +01:00
Carlo Rizzardo	066e1409d9	Corrected DictReplayBuffer observation dtype #484 (#486 ) * Fix observation buffer dtype in DictReplayBuffer * Formatting fix (line length) * Changelog update, bugfix DictReplaybuffer observations dtype	2021-06-22 13:41:26 +02:00
Jaden Travnik	75b6f3b3b0	Dictionary Observations (#243 ) * First commit * Fixing missing refs from a quick merge from master * Reformat * Adding DictBuffers * Reformat * Minor reformat * added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy * Ran black on buffers * Ran isort * Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow * Running isort :facepalm * Fixed typing issues * Adding docstrings and typing. Using util for moving data to device. * Fixed trailing commas * Fix types * Minor edits * Avoid duplicating code * Fix calls to parents * Adding assert to buffers. Updating changelong * Running format on buffers * Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env * Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type * Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder * Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation * Fixes * Running code style * Update docstrings on torch_layers * Decapitalize non-constant variables * Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test * Update doc * Update doc * Fix format * Removing NineRoom env. Using nested preprocess. Removing mutable default args * running code style * Passing channel check through to stacked dict observations. * Running black * Adding channel control to SimpleMultiObsEnv. 
Passing check_channels to CombinedExtractor * Remove optimize memory for dict buffers * Update doc * Move identity env * Minor edits + bump version * Update doc * Fix doc build * Bug fixes + add support for more type of dict env * Fixes + add multi env test * Add support for vectranspose * Fix stacked obs for dict and add tests * Add check for nested spaces. Fix dict-subprocvecenv test * Fix (single) pytype error * Simplify CombinedExtractor * Fix tests * Fix check * Merge branch 'master' into feat/dict_observations * Fix for net_arch with dict and vector obs * Fixes * Add consistency test * Update env checker * Add some docs on dict obs * Update default CNN feature vector size * Refactor HER (#351) * Start refactoring HER * Fixes * Additional fixes * Faster tests * WIP: HER as a custom replay buffer * New replay only version (working with DQN) * Add support for all off-policy algorithms * Fix saving/loading * Remove ObsDictWrapper and add VecNormalize tests with dict * Stable-Baselines3 v1.0 (#354) * Bump version and update doc * Fix name * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> * Update docs/index.rst Co-authored-by: Adam Gleave <adam@gleave.me> * Update wording for RL zoo Co-authored-by: Adam Gleave <adam@gleave.me> * Add gym-pybullet-drones project (#358) * Update projects.rst Added gym-pybullet-drones * Update projects.rst Longer title underline * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org> * Include SuperSuit in projects (#359) * include supersuit * longer title underline * Update changelog.rst * Fix default arguments + add bugbear (#363) * Fix potential bug + add bug bear * Remove unused variables * Minor: version bump * Add code of conduct + update doc (#373) * Add code of conduct * Fix DQN doc example * Update doc (channel-last/first) * Apply suggestions from code review Co-authored-by: Anssi <kaneran21@hotmail.com> * Apply suggestions from code review Co-authored-by: Adam Gleave 
<adam@gleave.me> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> * Make installation command compatible with ZSH (#376) * Add quotes * Add Zsh bracket info * Add clarify pip installation line * Make note bold * Add Zsh pip installation note * Add handle timeouts param * Fixes * Fixes (buffer size, extend test) * Fix `max_episode_length` redefinition * Fix potential issue * Add some docs on dict obs * Fix performance bug * Fix slowdown * Add package to install (#378) * Add package to install * Update docs packages installation command Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Fix backward compat + add test * Fix VecEnv detection * Update doc * Fix vec env check * Support for `VecMonitor` for gym3-style environments (#311) * add vectorized monitor * auto format of the code * add documentation and VecExtractDictObs * refactor and add test cases * add test cases and format * avoid circular import and fix doc * fix type * fix type * oops * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update stable_baselines3/common/monitor.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * add test cases * update changelog * fix mutable argument * quick fix * Apply suggestions from code review * fix terminal observation for gym3 envs * delete comment * Update doc and bump version * Add warning when already using `Monitor` wrapper * Update vecmonitor tests * Fixes Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Reformat * Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392) * Fix ent coef loading bug * Add test * Add comment * Reuse save path * Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375) * Fix return computation + add test for GAE * Rename `last_dones` to `episode_starts` for clarification * Revert advantage * Cleanup test * Rename variable * Clarify return computation * Clarify 
docs * Add multi-episode rollout test * Reformat Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> * Fixed saving of `A2C` and `PPO` policy when using gSDE (#401) * Improve doc and replay buffer loading * Add support for images * Fix doc * Update Procgen doc * Update changelog * Update docstrings Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Anssi <kaneran21@hotmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com> * Update doc and minor fixes * Update doc * Added note about MultiInputPolicy in error of NatureCNN * Merge branch 'master' into feat/dict_observations * Address comments * Naming clarifications * Actually saving the file would be nice * Fix edge case when doing online sampling with HER * Cleanup * Add sanity check Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com> Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca> Co-authored-by: Justin Terry <justinkterry@gmail.com> Co-authored-by: Tom Dörr <tomdoerr96@gmail.com> Co-authored-by: Tom Dörr <tom.doerr@tum.de> Co-authored-by: Costa Huang <costa.huang@outlook.com>	2021-05-11 12:29:30 +02:00
Antonin RAFFIN	5d47296b8d	Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375 ) * Fix return computation + add test for GAE * Rename `last_dones` to `episode_starts` for clarification * Revert advantage * Cleanup test * Rename variable * Clarify return computation * Clarify docs * Add multi-episode rollout test * Reformat Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2021-04-16 15:52:55 +02:00
Antonin RAFFIN	d7c6aff252	Fix discrete obs support (#296 ) * Fixed discrete obs support * Suggest new edit, fix failed test * Revert "Suggest new edit, fix failed test" This reverts commit 6892bf05506bb5ad0e87016d8d382705ab72e6a4. * Fix test * Special case for discrete obs Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2021-01-21 02:42:33 +02:00
Stefan Heid	9d463bc476	Small docstring improvements related to the notion of Rollout (#206 ) * Small docstring improvements related to the notion of Rollout * documented changes in changelog.rst, added myself to contributors * Minor edits Co-authored-by: Stefan Heid <stefan.heid@upb.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-11-02 11:45:08 +01:00
Megan Klaiber	dd6e361204	Implement HER (#120 ) * Added working her version, Online sampling is missing. * Updated test_her. * Added first version of online her sampling. Still problems with tensor dimensions. * Reformat * Fixed tests * Added some comments. * Updated changelog. * Add missing init file * Fixed some small bugs. * Reduced arguments for HER, small changes. * Added getattr. Fixed bug for online sampling. * Updated save/load funtions. Small changes. * Added her to init. * Updated save method. * Updated her ratio. * Move obs_wrapper * Added DQN test. * Fix potential bug * Offline and online her share same sample_goal function. * Changed lists into arrays. * Updated her test. * Fix online sampling * Fixed action bug. Updated time limit for episodes. * Updated convert_dict method to take keys as arguments. * Renamed obs dict wrapper. * Seed bit flipping env * Remove get_episode_dict * Add fast online sampling version * Added documentation. * Vectorized reward computation * Vectorized goal sampling * Update time limit for episodes in online her sampling. * Fix max episode length inference * Bug fix for Fetch envs * Fix for HER + gSDE * Reformat (new black version) * Added info dict to compute new reward. Check her_replay_buffer again. * Fix info buffer * Updated done flag. * Fixes for gSDE * Offline her version uses now HerReplayBuffer as episode storage. * Fix num_timesteps computation * Fix get torch params * Vectorized version for offline sampling. * Modified offline her sampling to use sample method of her_replay_buffer * Updated HER tests. * Updated documentation * Cleanup docstrings * Updated to review comments * Fix pytype * Update according to review comments. * Removed random goal strategy. Updated sample transitions. * Updated migration. Removed time signal removal. * Update doc * Fix potential load issue * Add VecNormalize support for dict obs * Updated saving/loading replay buffer for HER. * Fix test memory usage * Fixed save/load replay buffer. 
* Fixed save/load replay buffer * Fixed transition index after loading replay buffer in online sampling * Better error handling * Add tests for get_time_limit * More tests for VecNormalize with dict obs * Update doc * Improve HER description * Add test for sde support * Add comments * Add comments * Remove check that was always valid * Fix for terminal observation * Updated buffer size in offline version and reset of HER buffer * Reformat * Update doc * Remove np.empty + add doc * Fix loading * Updated loading replay buffer * Separate online and offline sampling + bug fixes * Update tensorboard log name * Version bump * Bug fix for special case Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-22 11:56:43 +02:00
Antonin RAFFIN	fc9527157a	Fix off-by-one GAE computation (#185 ) * Fix off-by-one GAE computation * Fix identity test * Revert gae loop	2020-10-13 00:10:54 +03:00
Antonin RAFFIN	a1e055695c	Improve typing coverage (#175 ) * Improve typing coverage * Even more types * Fixes * Update changelog * Unified docstrings * Improve error messages for unsupported spaces	2020-10-07 10:51:49 +02:00
Antonin RAFFIN	55912576ed	Cleanup docstring types (#169 ) * Cleanup docstring types * Update style * Test with js hack * Revert "Test with js hack" This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69. * Fix types * Fix typo * Update CONTRIBUTING example	2020-10-02 20:05:55 +03:00
Antonin RAFFIN	15d32c6a4a	Update black version + update docker image (#151 ) * Update docker image * Update black and reformat	2020-08-27 23:02:59 +02:00
Antonin RAFFIN	23afedb254	Auto-formatting with black and isort (#97 ) * Add auto formatting with black and isort * Reformat code * Ignore typing errors * Add note about line length * Add minimum version for isort * Add commit-checks * Update docker image * Fixed lost import (during last merge) * Fix opencv dependency	2020-07-16 16:12:16 +02:00
Noah	96b771f24e	Implement DQN (#28 ) * Created DQN template according to the paper. Next steps: - Create Policy - Complete Training - Debug * Changed Base Class * refactor save, to be consistence with overriding the excluded_save_params function. Do not try to exclude the parameters twice. * Added simple DQN policy * Finished learn and train function - missing correct loss computation * changed collect_rollouts to work with discrete space * moved discrete space collect_rollouts to dqn * basic dqn working * deleted SDE related code * added gradient clipping and moved greedy policy to policy * changed policy to implement target network and added soft update(in fact standart tau is 1 so hard update) * fixed policy setup * rebase target_update_intervall on _n_updates * adapted all tests all tests passing * Move to stable-baseline3 * Fixes for DQN * Fix tests + add CNNPolicy * Allow any optimizer for DQN * added some util functions to create a arbitrary linear schedule, fixed pickle problem with old exploration schedule * more documentation * changed buffer dtype * refactor and document * Added Sphinx Documentation Updated changelog.rst * removed custom collect_rollouts as it is no longer necessary * Implemented suggestions to clean code and documentation. 
* extracted some functions on tests to reduce duplicated code * added support for exploration_fraction * Fixed exploration_fraction * Added documentation * Fixed get_linear_fn -> proper progress scaling * Merged master * Added nature reference * Changed default parameters to https://www.nature.com/articles/nature14236/tables/1 * Fixed n_updates to be incremented correctly * Correct train_freq * Doc update * added special parameter for DQN in tests * different fix for test_discrete * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Added RMSProp in optimizer_kwargs, as described in nature paper * Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper * Changelog update for buffer dtype * standard exlude parameters should be always excluded to assure proper saving only if intentionally included by ``include`` parameter * slightly more iterations on test_discrete to pass the test * added param use_rms_prop instead of mutable default argument * forgot alpha * using huber loss, adam and learning rate 1e-4 * account for train_freq in update_target_network * Added memory check for both buffers * Doc updated for buffer allocation * Added psutil Requirement * Adapted test_identity.py * Fixes with new SB3 version * Fix for tensorboard name * Convert assert to warning and fix tests * Refactor off-policy algorithms * Fixes * test: remove next_obs in replay buffer * Update changelog * Fix tests and use tmp_path where possible * Fix sampling bug in buffer * Do not store next obs on episode termination * Fix replay buffer sampling * Update comment * moved epsilon from policy to model * Update predict method * Update atari wrappers to match SB2 * Minor edit in the buffers * Update changelog * 
Merge branch 'master' into dqn * Update DQN to new structure * Fix tests and remove hardcoded path * Fix for DQN * Disable memory efficient replay buffer by default * Fix docstring * Add tests for memory efficient buffer * Update changelog * Split collect rollout * Move target update outside `train()` for DQN * Update changelog * Update linear schedule doc * Cleanup DQN code * Minor edit * Update version and docker images Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-29 11:16:54 +02:00
Anssi	b833207142	Add some missing tests, update VecNormalize and RolloutBuffer (#50 ) * Change saving/loading normalization parameters to use single pickle file * Remove 'use_gae' from RolloutBuffer compute_returns function * Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument * Update changelog * Fix typo * Use proper pytest.raises for catching errors in tests * Add comment on GAE and how to obtain non-GAE behaviour * Remove save/load_running_average from VecNormalize in favor of load/save * Update changelog * Update docstring * Add accidentally removed tests for VecNormalize Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-10 12:09:04 +02:00
Roland Gavrilescu	91adefdb4b	Support for MultiBinary / MultiDiscrete spaces (#13 ) * multicategorical dist and test * fixed List annotation * bernoulli dist and test * added distributions to preprocessing (needs testing) * fixed and tested distributions * added changelog and fixed ppo policy * minor fix * dist fixes, added test_spaces * clean up * modified changelog * additional fixes * minor changelog mod * hot encoding fix, flake8 clean up * lint tests * preprocessing fix * fixed bernoulli bug * removed commented prints * Update changelog.rst * included suggested modifications * linting fix * increased space dim * Update doc and tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-05-18 14:42:13 +02:00
Antonin RAFFIN	d542732c8d	Rename to stable-baselines3	2020-05-05 15:02:35 +02:00

38 commits