stable-baselines3

mirror of https://github.com/saymrwulf/stable-baselines3.git synced 2026-06-28 03:21:16 +00:00

Author	SHA1	Message	Date
Anssi	37f48aa979	Fix initializing CUDA even when `device="cpu"` is used. (#194 ) * Fall back to 'cpu' device in policies instead of 'auto' * Update changelog	2020-10-18 20:51:56 +02:00
Bernhard Raml	97b81f9e9e	Fix ignoring the exclude in the logger's record function for json, csv and log logging formats (#190 ) * Fix ignoring the exclude in logger record For the logging formats json, csv, and log the exclude parameter of the logger's record function has been ignored. The necessary checks were missing from some of the format writer classes. Regression tests have been added to prevent this error in the future. * Fix docstring for filter_excluded_keys Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Added missing type hints to local functions * Update stable_baselines3/common/logger.py Co-authored-by: Bernhard Raml <raml.bernhard@gmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-16 17:34:49 +02:00
Wilson	fe6ade3089	Allow env_kwargs in make_vec_env when env ID string supplied (#189 ) * Allow env_kwargs in make_vec_env when env ID string supplied Resolves #188 * Update docs/misc/changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Add test for env kargs in make_vec_env * remove unnecessary args in test_vec_env_kwargs function * Fixes and reformat * Doc fix Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-10-16 11:09:19 +02:00
Antonin RAFFIN	2599f04940	Add custom arch for off-policy actor/critic networks (#182 ) * Add custom arch for off-policy actor/critic networks * Fix type hints * Address comments * Make sure number of updated parameters match in polyak * Add zip_strict for strict-length zipping * Fix building docs * Add test for zip strict * Faster tests Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2020-10-13 12:01:33 +02:00
Antonin RAFFIN	fc9527157a	Fix off-by-one GAE computation (#185 ) * Fix off-by-one GAE computation * Fix identity test * Revert gae loop	2020-10-13 00:10:54 +03:00
Antonin RAFFIN	fc6c5d3daa	Migration Guide (#123 ) * Start migration guide * Update guide * Add comment on RMSpropTFLike plus PPO/A2C migrations * Add note about set/get-parameters * Update migration guide * Update changelog and readme * Update doc + clean changelog * Address comments Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>	2020-10-11 23:22:12 +02:00
Antonin RAFFIN	a1e055695c	Improve typing coverage (#175 ) * Improve typing coverage * Even more types * Fixes * Update changelog * Unified docstrings * Improve error messages for unsupported spaces	2020-10-07 10:51:49 +02:00
Antonin RAFFIN	a10e3ae587	Release v0.9.0 (#174 )	2020-10-04 17:12:35 +02:00
Antonin RAFFIN	55912576ed	Cleanup docstring types (#169 ) * Cleanup docstring types * Update style * Test with js hack * Revert "Test with js hack" This reverts commit d091f438e8851ab8d01b66628e06a104f5e5ec69. * Fix types * Fix typo * Update CONTRIBUTING example	2020-10-02 20:05:55 +03:00
Antonin RAFFIN	2c924f52f5	Update docs (custom policy, type hints) (#167 ) * Change import * Update custom policy doc * Re-enable sphinx_autodoc_typehints * Update docker image * Attempt to fix read the doc build error * Add sphinx_autodoc_typehints to read the doc env * Fix pip version * Add full custom policy example * Fix	2020-09-29 20:41:14 +03:00
Antonin RAFFIN	44a723eecb	Fix loading of old versions and update changelog (#165 )	2020-09-24 16:05:36 +02:00
Anssi	9855486488	Get/set parameters and review of saving and loading (#138 ) * Update comments and docstrings * Rename get_torch_variables to private and update docs * Clarify documentation on data, params and tensors * Make excluded_save_params private and update docs * Update get_torch_variable_names to get_torch_save_params for description * Simplify saving code and update docs on params vs tensors * Rename saved item tensors to pytorch_variables for clarity * Reformat * Fix a typo * Add get/set_parameters, update tests accordingly * Use f-strings for formatting * Fix load docstring * Reorganize functions in BaseClass * Update changelog * Add library version to the stored models * Actually run isort this time * Fix flake8 complaints and also fix testing code * Fix isort * ...and black * Fix set_random_seed Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2020-09-24 14:28:27 +02:00
mloo3	00595b09d8	Add actor/critic loss logging to td3 (#164 ) * add actor/critic loss logging to td3 * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-09-23 22:40:41 +02:00
Wilson	e908583e2a	Fix type annotation in make_vec_env (#162 ) * Fix type annotation in make_vec_env The variable `vec_env_cls` is a type and not an instance of either DummyVecEnv or SubprocVecEnv * Update changelog.rst	2020-09-23 10:34:35 +02:00
liorcohen5	f5104a5efc	Allow to set a device when loading a model (#154 ) * Added a 'device' keyword argument to BaseAlgorithm.load(). Edited the save and load test to also test the load method with all possible devices. Added the changes to the changelog * improved the load test to ensure that the model loads to the correct device. * improved the test: now the correctness is improved. If the get_device policy would change, it wouldn't break the test. * Update tests/test_save_load.py @araffin's suggestion during the PR process Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update tests/test_save_load.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Bug fixes: when comparing devices, comparing only device type since get_device() doesn't provide device index. Now the code loads all of the model parameters from the saved state dict straight into the required device. (fixed load_from_zip_file). * PR fixes: bug fix - a non-related test failed when running on GPU. updated the assertion to consider only types of devices. Also corrected a related bug in 'get_device()' method. * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-09-20 19:13:18 +02:00
Antonin RAFFIN	583d4b8e41	Minor: fix changelog	2020-09-10 16:56:27 +02:00
Vsevolod Kompantsev	4fd408bec2	Fix PPO logging of clip_fractions (#150 ) * bugfix for PPO logging of clip_fractions * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-09-01 09:52:31 +02:00
Francisco Caio	5fc90a7f7d	Add StopTrainingOnMaxEpisodes to callback collection (#147 ) * Add StopTrainingOnMaxEpisodes class to pre-made callback collection * Adjust instant when counters are incremented for both OnPolicy and OffPolicy algorithms * Improv to StopTrainingOnMaxEpisodes including output, tests and doc * Improv StopTrainingOnMaxEpisodes callback running _init_callback * Update callbacks.py * Update test_callbacks.py * Fix style * Update changelog.rst * Fix test Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2020-08-28 11:36:33 +02:00
Antonin RAFFIN	a1afc5e42f	Fix typos in SAC and TD3 (#145 )	2020-08-23 17:44:35 +02:00
Stelios Tymvios	9003a09d5b	Callbacks have access to locals (#115 ) * callbacks have access to locals * changeloc * doc * callbacks have access to locals * changeloc * doc * Added update function for child callbacks * Pre-Release 0.8.0 (#134) * Fix double reset and improve typing coverage (#136) * Fix double reset and improve typing coverage * Revert minor edit * Add doc about types * Update child callbacks * cleaned imports * format * import order * Simplify tests and add comments Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-08-23 14:34:01 +02:00
Sam Toyer	42ef6d4677	Remove "device" argument from policies (#141 ) * Remove device arg from policies * Clean up for PR * Update test and doc * Fix codestyle Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-08-23 13:27:52 +02:00
Antonin RAFFIN	21e9994ff9	Fix double reset and improve typing coverage (#136 ) * Fix double reset and improve typing coverage * Revert minor edit * Add doc about types	2020-08-05 13:12:02 +03:00
Antonin RAFFIN	cceffd5ab2	Pre-Release 0.8.0 (#134 )	2020-08-03 22:38:54 +02:00
Anssi	2cd6a4f93b	Match performance with stable-baselines (discrete case) (#110 ) * Fix storing correct episode dones * Fix number of filters in NatureCNN network * Add TF-like RMSprop for matching performance with sb2 * Remove stuff that was accidentally included * Reformat * Clarify variable naming * Update changelog * Add comment on RMSprop implementations to A2C * Add test for RMSpropTFLike Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-08-03 22:22:51 +02:00
RaphaelWag	3253ee11e7	Update custom_policy.rst (#125 ) * Update custom_policy.rst Fixed Typo * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-07-31 11:10:48 +02:00
Anssi	77cb3dd0ab	Separate feature extractor networks for DQN networks (#132 ) * Separate feature extractor networks for DQN networks * [ci skip] Bump version Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-07-30 20:48:30 +02:00
Andy Shih	8f9aaaebe9	fix approximate entropy calculation in PPO and A2C (#130 )	2020-07-29 21:19:41 +02:00
rk37	bd2aae0c27	Fix ortho init when `bias=False` with custom policy (#126 ) * Update policies.py fix AttributeError occurred when use "bias=False" linear layer in custom FeaturesExtractor #124 * Update changelog.rst update the changelog accordingly * Update changelog.rst Co-authored-by: Kong Lingchao <konglingchao@gmail.com> Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-07-25 22:35:48 +02:00
Steven H. Wang	83530560b5	Fix CloudpickleWrapper load (#118 ) * CloudpickleWrapper: Load using cloudpickle * Update changelog	2020-07-21 10:12:39 +02:00
Stelios Tymvios	dbe8cfceb6	Optimized polyak updates (#106 ) * quick polyak updates * changelog * typing * reverted autoformatting * rerverted autofmt * Update stable_baselines3/common/utils.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * parameter names in test * cleanup * Merge branch 'master' into polyak * Update changelog * Apply suggestions from code review Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update stable_baselines3/common/utils.py * Update utils.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-07-17 15:53:28 +02:00
Antonin RAFFIN	23afedb254	Auto-formatting with black and isort (#97 ) * Add auto formatting with black and isort * Reformat code * Ignore typing errors * Add note about line length * Add minimum version for isort * Add commit-checks * Update docker image * Fixed lost import (during last merge) * Fix opencv dependency	2020-07-16 16:12:16 +02:00
Antonin RAFFIN	5ff176b2f1	Implement DDPG (#92 ) * Add DDPG + TD3 with any number of critics * Allow any number of critics for SAC * Update doc * [ci skip] Update DDPG example * Remove unused parameter * Add DDPG to identity test * Fix computation with n_critics=1,3 * Update doc * Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me> * Update docstrings for off-policy algos * Add check for sde Co-authored-by: Adam Gleave <adam@gleave.me>	2020-07-16 14:14:22 +02:00
Antonin RAFFIN	208890dfc8	Ignore errors from new pytype version (#107 )	2020-07-16 11:54:37 +02:00
Joel Joseph	3cf6e9714b	Update ppo.rst (#94 ) * Update ppo.rst minor correction from A2C to PPO * Update changelog.rst * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-07-10 10:38:35 +02:00
Adam Gleave	e7130344de	Add changelog entry	2020-07-07 19:03:46 -07:00
Antonin RAFFIN	3756d05f72	Refactored ContinuousCritic for SAC/TD3 (#78 ) * Refactored ContinuousCritic for SAC/TD3 * Address comments * Add pybullet notebook	2020-07-07 01:02:51 +03:00
Stelios Tymvios	4aa66ed34a	Automatically create paths for saved objects (#80 ) * automatically create paths for saved objects * Minor Corrections, more tests * linting * typing * Correct mode checking * corrected tests to reflect new verbose functionality	2020-07-03 01:14:21 +03:00
Marios Koulakis	7d8ebb9e98	Udacity Reacher Project with Unity (#79 ) * Add the reacher project to the sample projects * Update the change log * Remove github incompatible link notation * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-30 15:03:02 +02:00
Antonin RAFFIN	08e7519381	Fix q-target in SAC (#77 ) * Fix q-target in SAC * [ci skip] Update version	2020-06-29 17:58:55 +02:00
Noah	96b771f24e	Implement DQN (#28 ) * Created DQN template according to the paper. Next steps: - Create Policy - Complete Training - Debug * Changed Base Class * refactor save, to be consistence with overriding the excluded_save_params function. Do not try to exclude the parameters twice. * Added simple DQN policy * Finished learn and train function - missing correct loss computation * changed collect_rollouts to work with discrete space * moved discrete space collect_rollouts to dqn * basic dqn working * deleted SDE related code * added gradient clipping and moved greedy policy to policy * changed policy to implement target network and added soft update(in fact standart tau is 1 so hard update) * fixed policy setup * rebase target_update_intervall on _n_updates * adapted all tests all tests passing * Move to stable-baseline3 * Fixes for DQN * Fix tests + add CNNPolicy * Allow any optimizer for DQN * added some util functions to create a arbitrary linear schedule, fixed pickle problem with old exploration schedule * more documentation * changed buffer dtype * refactor and document * Added Sphinx Documentation Updated changelog.rst * removed custom collect_rollouts as it is no longer necessary * Implemented suggestions to clean code and documentation. 
* extracted some functions on tests to reduce duplicated code * added support for exploration_fraction * Fixed exploration_fraction * Added documentation * Fixed get_linear_fn -> proper progress scaling * Merged master * Added nature reference * Changed default parameters to https://www.nature.com/articles/nature14236/tables/1 * Fixed n_updates to be incremented correctly * Correct train_freq * Doc update * added special parameter for DQN in tests * different fix for test_discrete * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Update docs/modules/dqn.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Added RMSProp in optimizer_kwargs, as described in nature paper * Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper * Changelog update for buffer dtype * standard exlude parameters should be always excluded to assure proper saving only if intentionally included by ``include`` parameter * slightly more iterations on test_discrete to pass the test * added param use_rms_prop instead of mutable default argument * forgot alpha * using huber loss, adam and learning rate 1e-4 * account for train_freq in update_target_network * Added memory check for both buffers * Doc updated for buffer allocation * Added psutil Requirement * Adapted test_identity.py * Fixes with new SB3 version * Fix for tensorboard name * Convert assert to warning and fix tests * Refactor off-policy algorithms * Fixes * test: remove next_obs in replay buffer * Update changelog * Fix tests and use tmp_path where possible * Fix sampling bug in buffer * Do not store next obs on episode termination * Fix replay buffer sampling * Update comment * moved epsilon from policy to model * Update predict method * Update atari wrappers to match SB2 * Minor edit in the buffers * Update changelog * 
Merge branch 'master' into dqn * Update DQN to new structure * Fix tests and remove hardcoded path * Fix for DQN * Disable memory efficient replay buffer by default * Fix docstring * Add tests for memory efficient buffer * Update changelog * Split collect rollout * Move target update outside `train()` for DQN * Update changelog * Update linear schedule doc * Cleanup DQN code * Minor edit * Update version and docker images Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-29 11:16:54 +02:00
JieQiang (Jay) Wei	e47da426c1	Update rl_zoo.rst (#72 ) * Update rl_zoo.rst a typo fixed. * Update changelog.rst Fixed a typo in zoo readme. * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-25 12:14:56 +02:00
Matthias K	977a615c82	Fixed SubprocVecEnv close. (#68 ) Updated changelog. Co-authored-by: Matthias K <wirspielen@web.de>	2020-06-20 18:01:37 +02:00
Tirafesi	644d2c17ac	save_replay_buffer now receives as argument the file path instead of the folder path (#63 ) * save_replay_buffer now receives as argument the file path instead of the folder path * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-17 14:00:49 +02:00
Antonin RAFFIN	a861f33107	Update notebooks (#65 )	2020-06-17 12:47:09 +02:00
Antonin RAFFIN	494ebfd20a	Hotfix PPO + gSDE (#53 ) * Fix variable being passed with gradients * Update changelog * Bump version * Fixes #54	2020-06-10 18:58:35 +02:00
Anssi	b833207142	Add some missing tests, update VecNormalize and RolloutBuffer (#50 ) * Change saving/loading normalization parameters to use single pickle file * Remove 'use_gae' from RolloutBuffer compute_returns function * Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument * Update changelog * Fix typo * Use proper pytest.raises for catching errors in tests * Add comment on GAE and how to obtain non-GAE behaviour * Remove save/load_running_average from VecNormalize in favor of load/save * Update changelog * Update docstring * Add accidentally removed tests for VecNormalize Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-10 12:09:04 +02:00
Anssi	44f8218df0	Review of code (A2C, PPO and refactoring) (#35 ) * Split torch module code into torch_layers file * Updated reference to CNN * Change 'CxWxH' to 'CxHxW', as per common notion * Fix missing import in policies.py * Move PPOPolicy to OnlineActorCriticPolicy * Create OnPolicyRLModel from PPO, and make A2C and PPO inherit * Update A2C optimizer comment * Clean weight init scales for clarity * Fix A2C log_interval default parameter * Rename 'progress' to 'progress_remaining * Rename 'Models' to 'Algorithms' * Rename 'OnlineActorCriticPolicy' to 'ActorCriticPolicy' * Move static functions out from BaseAlgorithm * Move on/off_policy base algorithms to their own files * Add files for A2C/PPO * Fix docs * Fix pytype * Update documentation on OnPolicyAlgorithm * Add proper doctstring for on_policy rollout gathering * Add bit clarification on the mlppolicy/cnnpolicy naming * Move static function is_vectorized_policies to utils.py * Checking docstrings, pep8 fixes * Update changelog * Clean changelog * Remove policy warnings for sac/td3 * Add monitor_wrapper for OnPolicyAlgorithm. Clean tb logging variables. Add parameter keywords to OffPolicyAlgorithm super init Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-09 13:54:18 +02:00
Antonin RAFFIN	11d33eb4ae	Fix gSDE loading issue in test mode (#45 ) * Fix gSDE loading issue in test mode * Forward `reset_noise` method * Re-add `make_actor` * Reformat	2020-06-08 11:15:10 +02:00
Antonin RAFFIN	353ea81080	Fix several VecEnv issues, add `fork` start method to tests (#43 ) * Fix several VecEnv issues, add `fork` start method to tests * Fix signature	2020-06-04 11:22:12 +02:00
Antonin RAFFIN	403fff5d50	Pre-Release v0.6.0 (#39 ) * Prepare release * Update docker images	2020-06-01 13:09:47 +02:00
Roland Gavrilescu	bb01253261	Tensorboard integration (#30 ) * init commit tensorboard-integration * Added tb logger to ppo (with output exclusions) * fixed truncated stdout * categorize stdout outputs by tag * separated exclusions from values, added missing logs * saving exclusions as dict instead of list * reformatting, auto run indexing * included renaming suggestions, fixed tests * tb support for sac * linting * moved logging to base class * tb support for td3 * removed histograms, non-verbose output working * modifed changelog * linting * fixed type error * moved logger config to utils * removed episode_rewards log from ppo * Enable tensorboard in tests * Remove unused import * Update logger sub titles * Minor edit for PPO * Update logger and tb log folder * Pass correct logger to Callbacks * updated docs * added tb example image to docs * add support for continuing training in tensorboard * added tensorboard to docs index * added tb test * moved logger config to _setup_learn, updated tests * accessing verbose from base class * Update doc and tests * Rename session -> time * Update version * Update logger truncate * Update types * Remove duplicated code Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-06-01 11:55:44 +02:00
mloo3	42f432c79c	Fix TD3 Example Code Documentation (#38 ) Fix TD3's example code	2020-06-01 11:37:42 +03:00
Stelios Tymvios	78e8d405d7	Implemented Vectorized Action Noise (#34 ) * Implemented Vectorized Action Noise Vectorized Action Noise allows for multiple instances of ActionNoiseProcesses to run in parallel. This makes it easier to run TD3/SAC/DDPG with VecEnv. * fixed linting issues * make test function name consistent Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * sanity checks and more detailed test * Update stable_baselines3/common/noise.py Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org> * Added assertion error message in noises setter * Corrected tests to reflect change to AssertionError from ValueError Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-05-27 09:53:01 +02:00
Antonin RAFFIN	9b42b9717a	Fix `sde_sample_freq` for SAC (#32 ) * Fix `sde_sample_freq` for SAC * [ci skip] Add Acknowledgments	2020-05-24 16:44:44 +02:00
Tarik Kelestemur	b1322ff5d6	Fix cmd_util.py imports (#24 ) * fix cmd_util.py imports * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-05-19 10:19:16 +02:00
Roland Gavrilescu	91adefdb4b	Support for MultiBinary / MultiDiscrete spaces (#13 ) * multicategorical dist and test * fixed List annotation * bernoulli dist and test * added distributions to preprocessing (needs testing) * fixed and tested distributions * added changelog and fixed ppo policy * minor fix * dist fixes, added test_spaces * clean up * modified changelog * additional fixes * minor changelog mod * hot encoding fix, flake8 clean up * lint tests * preprocessing fix * fixed bernoulli bug * removed commented prints * Update changelog.rst * included suggested modifications * linting fix * increased space dim * Update doc and tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-05-18 14:42:13 +02:00
Antonin RAFFIN	15ff6d47ee	Documentation update and style fixes (#21 ) * Update doc: add gSDE * Fix codestyle * Remove travis script * Add lint check to gitlab	2020-05-15 13:54:06 +02:00
Antonin RAFFIN	54f6f5b6fb	Add flake8 linter and Github CI (#19 ) * Cleanup code * Add flake8 lint and github workflow * Update build matrix * Relax precision for python3.7	2020-05-12 17:55:01 +02:00
Antonin RAFFIN	b02afd6ee3	Doc update (#15 )	2020-05-11 12:28:43 +02:00
Antonin RAFFIN	257a40ef4b	Add Gitlab CI (#12 ) * Test gitlab-ci * Try different image * Add pytest and doc build * Fix command * Fix image used for CI * Seperate pytest builds * Fix weird seg fault in docker image due to FakeImageEnv * Fix make command * [ci skip] Add space in the badges * Fix CI failures * Re-install opencv * Use opencv-headless * Test with new docker image	2020-05-09 23:10:49 +02:00
Kinal Mehta	b1f5db1bb2	Add CONTRIBUTION.md link in README (#2 ) * Fix CONTRIBUTION.md link in README * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-05-09 13:01:15 +02:00
Antonin RAFFIN	c20af230f7	Remove SDE support for TD3	2020-05-08 15:00:34 +02:00
Antonin RAFFIN	97aea21349	Update minimum gym version	2020-05-08 12:43:42 +02:00
Antonin RAFFIN	e6ff4bbd6c	Update setup	2020-05-07 16:24:19 +02:00
Antonin RAFFIN	aa66012764	Update requirements	2020-05-07 16:21:33 +02:00
Antonin RAFFIN	8046a24719	More doc + sync VecEnvs + atari	2020-05-07 16:08:23 +02:00
Antonin RAFFIN	73afaf157c	Add version.txt to package	2020-05-07 12:19:29 +02:00
Antonin RAFFIN	d17f29c8ad	Add base doc	2020-05-07 10:10:51 +02:00
Antonin RAFFIN	580317158b	Update changelog	2020-05-05 17:21:56 +02:00
Antonin RAFFIN	0481fbe727	Update changelog	2020-05-05 16:54:33 +02:00
Antonin RAFFIN	2c34a4d694	Sync with Stable-Baselines	2020-05-05 16:28:38 +02:00
Antonin RAFFIN	d542732c8d	Rename to stable-baselines3	2020-05-05 15:02:35 +02:00
Antonin RAFFIN	88cee2ba55	Add type hints and f-strings to logger	2020-05-05 14:49:32 +02:00
Antonin RAFFIN	041f2bc59a	Cleanup, bug fixes + more tests	2020-04-22 13:14:22 +02:00
Antonin RAFFIN	8aac9e819d	Add `VecTransposeImage` and fix for SAC	2020-04-21 20:41:58 +02:00
Antonin RAFFIN	93c2a01f91	Start CNN support (failing for SAC)	2020-04-21 16:22:46 +02:00
Antonin RAFFIN	f347474e6a	Independent save/load for policies	2020-04-20 15:59:44 +02:00
Antonin RAFFIN	17f9246257	Add `get_device` util and fix `squash_output`	2020-04-20 15:43:11 +02:00
Antonin RAFFIN	aa1026ee87	Added ``optimizer`` and ``optimizer_kwargs`` to ``policy_kwargs``	2020-04-17 15:13:45 +02:00
Antonin RAFFIN	0e44cdce44	Fixed ``reset_num_timesteps`` behavior	2020-04-17 12:36:27 +02:00
Antonin RAFFIN	08a22c4834	Release 0.4.0	2020-04-14 18:13:51 +02:00
Antonin RAFFIN	fdecd512db	Add save/load weights for policies and refactor action distributions	2020-03-31 16:29:13 +02:00
Antonin RAFFIN	fa599c65a6	Add support for Discrete observation spaces	2020-03-25 16:42:05 +01:00
Antonin RAFFIN	72a88a8d92	Fix type hint for activation fn	2020-03-24 10:10:37 +01:00
Antonin RAFFIN	ba18258af6	Add proper preprocessing	2020-03-23 17:15:30 +01:00
Antonin RAFFIN	dcb54b5301	Remove CEMRL	2020-03-23 14:48:38 +01:00
Antonin RAFFIN	57b37513b6	Refactor handling of obs and action space + remove duplicated code	2020-03-20 10:09:09 +01:00
Antonin RAFFIN	7251b9d2c2	Release v0.3.0	2020-03-19 11:11:36 +01:00
Antonin RAFFIN	fd9e73cfb8	Fix entropy computation	2020-03-19 10:19:48 +01:00
Antonin RAFFIN	9485b90a41	Sync predict with SB and add version file	2020-03-18 15:11:19 +01:00
Antonin RAFFIN	c3187604bc	Code cleanup: rename lr to lr_schedule + typing	2020-03-16 14:01:32 +01:00
Antonin Raffin	29d7018265	Add better logging for SAC and PPO	2020-03-13 11:43:12 +01:00
Antonin Raffin	c39421fa64	Fix colors in results plotter	2020-03-13 10:59:16 +01:00
Antonin Raffin	b64873ffff	Sync callbacks	2020-03-12 12:34:25 +01:00
Antonin Raffin	037986a91d	Add test for `expln`	2020-03-11 16:35:13 +01:00
Antonin Raffin	6ebad92e1b	Remove default seed and bump dependencies	2020-03-10 17:43:54 +01:00
Antonin Raffin	20ee8cb68d	Update changelog and add more namedtuples	2020-03-10 16:55:13 +01:00
Antonin Raffin	1e81f38d66	Update changelog	2020-03-09 19:05:22 +01:00
Antonin Raffin	26ccf499b3	Use normal sampling for SAC	2020-02-21 14:50:28 +01:00
Antonin Raffin	809a3d3d38	Release 0.2.0	2020-02-14 14:39:24 +01:00
Antonin Raffin	8b559d71ab	Remove deprecated monitor format and improve tests	2020-02-14 13:42:16 +01:00
Antonin Raffin	f1a4fa2d3f	Improve predict method	2020-02-12 15:25:05 +01:00
Antonin Raffin	9caea35a11	Add results plotter	2020-02-12 14:31:15 +01:00
Antonin Raffin	7bafdb3a67	Add `get_vec_normalize_env()`	2020-02-12 11:34:29 +01:00
Antonin Raffin	2ce31c1e21	Fix entropy loss for squashed Gaussian and VecEnv seeding	2020-02-11 17:22:03 +01:00
Antonin Raffin	b7dcc8d58e	Add extend method	2020-02-11 16:40:44 +01:00
Antonin Raffin	75a86881b3	Add save/load for replay buffer	2020-02-05 13:10:02 +01:00
Antonin Raffin	c2318149dd	Update changelog and version	2020-02-03 15:50:40 +01:00
Antonin Raffin	5d4e73544c	Fix `reset_num_timesteps`	2020-01-31 13:16:28 +01:00
Antonin Raffin	6d59bfd4a0	Merge branch 'master' into feat/callbacks	2020-01-31 13:09:55 +01:00
Dormann, Noah	1f0dd60b97	Fix saving on GPU - Loading on CPU (#45 ) * removed policy from save, changed th.loads to map to device * found hack: catch pickle exception and trying th.load with mapping instead, otherwise raise exception with more information -> loading cuda on cpu raises exception -> leads to th.load with map being called * deleted todo * updated changelog * start of saving refactor * first working c * all tests pass, save refactored * - backwards compatibilty not always - make pytest all passing - make typing all passing * Fixes and simplify the save method * Remove unused param * Fix backward compat * Fix docstring	2020-01-31 13:06:55 +01:00
Antonin Raffin	98037352f5	Update changelog	2020-01-27 15:57:34 +01:00
Antonin Raffin	b66003cfb3	Add callback support	2020-01-27 14:32:31 +01:00
Antonin Raffin	0328a39d1b	Update changelog	2020-01-22 17:25:08 +01:00
Antonin Raffin	9345b85cfc	Update changelog and README	2020-01-22 17:23:42 +01:00
Antonin Raffin	9e250b6818	Build doc	2020-01-20 16:19:35 +01:00
Antonin Raffin	b4dc9d4e4d	Add doc	2019-09-26 11:46:40 +02:00

317 commits