Shyamal H Anadkat
3b68dc7312
Update GAE computation docstring ( #655 )
...
* Fix typo in buffers.py
* Revert "Fix typo in buffers.py"
This reverts commit ca643d5e3a509ae1b8a65bf0de98f4609ca9d8da.
* Ignore pytype errors
* Update GAE computation docstring
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-11-25 10:53:42 +01:00
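For readers unfamiliar with the computation that this docstring describes, here is a minimal pure-Python sketch of generalized advantage estimation (GAE). It is an illustration under stated assumptions, not the library's code: the function name and argument names are hypothetical, and `episode_starts` simply borrows the naming used by the rename mentioned elsewhere in this log.

```python
def compute_gae(rewards, values, episode_starts, last_value, last_done,
                gamma=0.99, gae_lambda=0.95):
    """Hypothetical helper: GAE over one rollout from a single environment.

    rewards, values: per-step floats collected during the rollout.
    episode_starts[t]: True if step t is the first step of a new episode.
    last_value, last_done: value estimate and done flag for the step
    that follows the final rollout step.
    """
    n_steps = len(rewards)
    advantages = [0.0] * n_steps
    last_gae = 0.0
    for step in reversed(range(n_steps)):
        if step == n_steps - 1:
            next_non_terminal = 1.0 - float(last_done)
            next_value = last_value
        else:
            # If the next step starts a new episode, do not bootstrap across it.
            next_non_terminal = 1.0 - float(episode_starts[step + 1])
            next_value = values[step + 1]
        # One-step TD error.
        delta = rewards[step] + gamma * next_value * next_non_terminal - values[step]
        # Exponentially weighted sum of TD errors, reset at episode boundaries.
        last_gae = delta + gamma * gae_lambda * next_non_terminal * last_gae
        advantages[step] = last_gae
    # Returns used as value targets: advantage plus the value baseline.
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns
```

With `gae_lambda=1.0` the advantages reduce to discounted Monte Carlo returns minus the value baseline; with `gae_lambda=0.0` they reduce to one-step TD errors.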
Parth Kothari
58e5506385
Edited authors of DriverGym project ( #669 )
2021-11-18 10:18:18 +01:00
Parth Kothari
1ac35eaef2
Add DriverGym project to SB3 project documentation ( #665 )
...
* Added DriverGym project
* Updated changelog
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-11-17 11:13:43 +01:00
Antonin RAFFIN
d228364ccf
Add timeout handling for on-policy algorithms ( #658 )
...
* Add timeout handling for on-policy algorithms
* Fixes
* Fix infinite loop in eval
* Skip type check for python 3.9
* Fix for discrete obs + add docstring
* Fix A2C test
* Removed unused helper
* Add test for infinite horizon
* typed ast should be fixed
* Apply suggestions from code review
Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-11-16 17:19:16 +01:00
Antonin RAFFIN
e75e1de4c1
Fix indentation in RL tips doc ( #657 )
...
* Update rl_tips.rst
indent fix to make if done and its following statement work
* Fix indentation and update changelog
* Skip type check for python 3.9
Co-authored-by: paulg <cove9988@gmail.com>
2021-11-10 16:54:20 +00:00
Antonin RAFFIN
2bb4500948
Fix set_env when using VecNormalize ( #638 )
...
* Fix `set_env` when using `VecNormalize`
* Update version
2021-11-02 13:52:26 +02:00
ac-93
98c1a637cf
add tactile-gym to the list of projects using SB3 ( #640 )
...
* Update projects.rst
* Update changelog.rst
* Update projects.rst
* Fix doc build
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:26:06 +01:00
Oleksii Kachaiev
0c17fedfac
Adjust FPS calculation to accommodate for reset_num_timesteps=False ( #636 )
...
* Store number of timesteps at the beginning of each learn cycle
* Update changelog
* Set default _num_timesteps_at_start in the constructor
* Test case for FPS logger
* Adjust test to cover both on-policy and off-policy algorithms
* Fix formatting
* Update test and add comment
* Fix test
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-10-31 18:19:03 +01:00
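The idea behind this change can be sketched as follows: when training resumes with reset_num_timesteps=False, the total step counter keeps growing across learn cycles while the timer restarts, so the FPS has to be computed from the steps taken in the current cycle only. The function below is a hypothetical stand-alone helper written for illustration; only the `_num_timesteps_at_start` idea comes from the commit messages above.

```python
def compute_fps(num_timesteps, num_timesteps_at_start, elapsed_seconds):
    """Hypothetical helper: frames per second for the current learn cycle."""
    # Count only the steps collected since this learn cycle began.
    steps_this_cycle = num_timesteps - num_timesteps_at_start
    # Guard against a zero elapsed time right after the cycle starts.
    return int(steps_this_cycle / max(elapsed_seconds, 1e-9))


# Second learn cycle: 10000 steps came from an earlier cycle,
# and 5000 new steps took 10 seconds.
fps = compute_fps(num_timesteps=15000, num_timesteps_at_start=10000, elapsed_seconds=10.0)
```

Dividing the full counter by the elapsed time of the current cycle would report 1500 here instead of 500, which is the kind of overestimate the adjustment avoids.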
Edouard Leurent
a2e3001598
Add highway-env to the list of projects using SB3 ( #639 )
...
* Add highway-env to the list of projects using SB3
Many thanks for this fantastic library, keep up the good work!
* Update changelog with added documentation
2021-10-30 13:53:36 +02:00
Oleksii Kachaiev
0503e694b2
Introduce norm_obs_keys param for VecNormalize environment wrapper ( #631 )
...
* Implement new norm_obs_keys param for VecNormalize environment wrapper
* Simplified doc string to avoid issues with lint and doc
* Updated changelog
* Update changelog.rst
* Update test_vec_normalize.py
* Update sanity checks
* Fix backward compat
* Update doc
* Update changelog
* Fix lint warnings
* Fix tests
* Minor edit
* observation_space sanity check was applied twice
Co-authored-by: Oleksii Kachaiev <okachaiev@riotgames.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-10-28 19:18:39 +02:00
Antonin RAFFIN
7b977d7b03
Release 1.3.0 ( #625 )
2021-10-23 17:07:00 +02:00
Antonin RAFFIN
e907eca18e
Fix set_env to keep the number of timesteps ( #615 )
...
* Fix for `set_env`
* Add test and update changelog
* Use underscores and f-strings
* Add PyPi info
* Update comments
2021-10-23 16:36:40 +02:00
Antonin RAFFIN
1564a85081
System info helper ( #613 )
...
* Add `system_env_info`
* Add `print_system_info` to load
and store system info at save time
* Remove TODO
* Rename to `get_system_info`
* Import as sb3 for consistency
* Update changelog
* Add warning for old SB3 versions
* Use underscore literal for more clarity
2021-10-18 10:43:56 +02:00
Timo Kaufmann
09e9fc42eb
Use consistent logging keys ( #605 )
...
* Use a consistent key to log the total timesteps
This changes the timestep logging key of on-policy algorithms from
`time/total_timesteps` to `time/total timesteps` (note the
underscore/space). The off-policy algorithms and the eval callback
already use the latter, so this behavior is more consistent.
* Use underscores instead of spaces in logging keys
Most keys already followed this policy and consistent behavior is
friendlier to new users.
* Minor edit and bump version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-10-12 13:17:30 +02:00
Antonin RAFFIN
75aa31dcfb
Update SB3 contrib algorithms ( #604 )
2021-10-10 15:41:39 +02:00
Antonin RAFFIN
1881d904a0
Doc fix and improve error messages ( #598 )
...
* Fix custom env doc
* Catch common mistake
* Improve `EvalCallback` error message
* Lint test
* Update docs/guide/custom_env.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
2021-10-08 18:08:31 +02:00
Ilja Avadiev
740d61ada3
Doc fix environment mixup ( #588 )
2021-09-29 10:16:59 +02:00
Antonin RAFFIN
306e49fda6
Fixes in is_vectorized_observation ( #587 )
...
* Fix is vectorized bug in DQN
* Fix sub-classed obs
2021-09-28 21:57:49 +02:00
Antonin RAFFIN
201fbffa8c
Remove sde_net_arch + Simplify policy ( #584 )
...
* Remove `sde_net_arch` + Simplify policy
* Add warning at load time
2021-09-28 22:32:54 +03:00
batu
89af49ca91
ONNX Documentation Update ( #464 )
...
* Updated ONNX documentation
First draft of the documentation explaining how to export SB3 models to the ONNX format
* Updated changelog with ONNX documentation fix
* Address comments
* Update changelog.rst
* Update rtd env
* Fixes + add test example
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi Kanervisto <anssk@Anssis-MacBook-Air.local>
Co-authored-by: Anssi Kanervisto <kaneran21@hotmail.com>
2021-09-26 17:40:35 +02:00
Baek Junyeob
914bc10a0d
Add policy-distillation-baselines to project page ( #578 )
...
* Update projects.rst
* Update docs/misc/projects.rst
* Apply suggestions from code review
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-20 16:30:16 +02:00
Adam Gleave
e825fbdd33
VecNormalize: allow non-continuous observations when norm_obs is False ( #575 )
...
* VecNormalize: allow non-continuous observations when norm_obs is False
* Update changelog, fix lint
* Switch to environment present in new and old versions of Gym
* Fix name
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-18 12:11:01 +02:00
Matthew Allen
76c212a854
Add RLGym to project page ( #576 )
...
* Add RLGym to projects list.
Per the request in this issue on our repo: https://github.com/lucas-emery/rocket-league-gym/issues/24
* Update changelog documentation section
* Update changelog.rst
* Update docs/misc/projects.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:47:22 +02:00
Wilhelm Kirchgässner
303df08a80
Add GEM project to project section of doc ( #574 )
...
* add GEM project to project section of doc
* Update docs/misc/projects.rst
* Update changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-18 11:10:04 +02:00
Cyprien
f3a35aa786
Add method predict_values for ActorCriticPolicy ( #569 )
...
* feat: add method predict_values for ActorCriticPolicy
* Fixes for new gym version
* Reformat
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-15 14:03:04 +02:00
Antonin RAFFIN
16f8b21d9b
Add get_distribution for on-policy algorithms ( #566 )
...
* feat: get_distribution method for ActorCriticPolicy
New method get_distribution for class ActorCriticPolicy returning current action distribution given observations
* doc: updating changelog.rst
- adding block for Release 1.2.1a0
- adding cyprienc to contributors
* style: make format
* fix: updating version.txt
Changing version from 1.2.0 to 1.2.1a0
* Update changelog
* Add test for get distribution
Co-authored-by: Cyprien <courtot.c@gmail.com>
2021-09-13 10:25:42 +02:00
Antonin RAFFIN
f8a0869073
Hotfix for Vecnormalize ( #558 )
...
* Hotfix for Vecnormalize
* Rename `ret` to `returns`
2021-09-08 12:30:20 +02:00
Antonin RAFFIN
f9e5753acd
Refactor BasePolicy predict ( #559 )
2021-09-05 02:27:45 +03:00
Scott Brownlie
1afc2f3abe
Avoid putting target networks into training mode ( #553 )
...
* make sure DQN policy is always in correct mode - train or eval
* make set_training_mode an abstract method of the base policy - safer
* update docstring of _build method to note that the target network is put into eval mode
* use set_training_mode to put the dqn target network into eval mode
* use set_training_mode to set the training mode of the q-network
* move set_training_mode abstract method from BasePolicy to BaseModel
* set train and eval mode for TD3
* make sure critic is always in correct mode during train
* set train and eval mode for SAC
* add comment re batch norm and dropout
* set train and eval mode for A2C and PPO
* add tests for collect rollouts with batch norm
* fix formatting
* update change log
* update version
* remove Optional typing for batch size - causing type check to fail
* Fix scipy dependency for toy text envs
* implement set_training_mode method in BaseModel
* move all tests of train/eval mode to test_train_eval_mode
* call learn with learning_starts = total_timesteps to test that collect_rollouts does not update batch norm
* remove extra calls to set_training_mode in train method of TD3 and SAC
* Allow gradient_steps=0
* Refactor tests
* Add comment + use aliases
* Typos
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-30 17:42:41 +02:00
David Blom
3efab0d267
Training and evaluation: call model.train() and model.eval() ( #537 )
...
* training and evaluation: call model.train() and model.eval() to enable and disable dropout and batchnorm
* Add comment documentation
* Fix train and eval for the Actor class
* Run black
* Add github handle to changelog
* Add unit tests for PPO and DQN
* Refactor unit test
* Run black
* unit test: add a dropout layer and check that calling predict with deterministic=True is deterministic
* documentation: add bugfix description to changelog
* unit test: use learning_starts=0, decrease the size of the network and use more training steps
* on policy algorithms: call policy.train() and policy.eval() instead of disable_training and enable_training as it is a th.nn.module
* Rename unit test
* unit test: use drop out probability of 0.5
* Call policy.train and policy.eval
* Fixes + update tests
* Remove unneeded eval
Co-authored-by: David Blom <davidsblom@gmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-08-14 14:08:27 +02:00
MihaiAnca13
c41368f2ea
Docs examples warning - issue #526 ( #530 )
...
* Update a2c.rst
* Update ddpg.rst
* Update dqn.rst
* Update her.rst
* Update ppo.rst
* Update sac.rst
* Update td3.rst
* Update changelog.rst
* modified message
* Update examples.rst
Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-08-09 16:23:25 +03:00
Antonin RAFFIN
be86883f36
Fix type annotations ( #522 )
...
* Fix type annotations
* Add citation file
* Update CITATION.cff
* Add note about tb logging
Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-07-29 13:02:09 +02:00
Antonin RAFFIN
503425932f
Documentation fixes ( #514 )
...
* Update multiprocessing example
* Add VecEnvWrapper example
* Update docs/guide/vec_envs.rst
Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-07-18 20:51:41 +02:00
Antonin RAFFIN
2fa06ae8d2
Add Python3.9 CI + upgrade min PyTorch version ( #503 )
...
* Add Python3.9 CI + upgrade min PyTorch version
* Upgrade min PyTorch version
2021-07-06 09:32:03 +02:00
Antonin RAFFIN
5af35fa2cc
Release v1.1.0 ( #497 )
2021-07-02 11:21:09 +02:00
Skander Moalla
abbf48e93e
Fix Inconsistencies with EvalCallback tensorboard logs ( #492 )
...
* Make EvalCallback dump the evaluation logs it records #457 .
* Make test deterministic
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-07-01 15:43:08 +02:00
Carlo Rizzardo
066e1409d9
Corrected DictReplayBuffer observation dtype #484 ( #486 )
...
* Fix observation buffer dtype in DictReplayBuffer
* Formatting fix (line length)
* Changelog update, bugfix DictReplaybuffer observations dtype
2021-06-22 13:41:26 +02:00
Antonin RAFFIN
b52c6fc18f
Fix logger setup ( #469 )
...
* Make logger an attribute
* Update doc
* Fix logger reset when using multiple runs
* Cleanup logger: remove `Logger.CURRENT`
* Fix for PPO
* Update tests and improve docstring
* Add warning
* Throw error when tensorboard not installed
2021-06-14 15:17:48 +02:00
Benjamin Steenhoek
180a2e3832
Remove recurrent policies from A2C docs ( #470 )
...
* Remove recurrent policies from A2C docs
Recurrent policies are not supported yet (see https://github.com/DLR-RM/stable-baselines3/issues/160#issuecomment-694756355), but the docs say that A2C supports them. Changing the docs to avoid misleading readers.
* Update changelog
Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>
2021-06-07 19:39:49 +02:00
Benjamin Black
a038044d11
Added support for vector envs in evaluation ( #447 )
...
* added vector env support to evaluate_policy
* fixed linting and documentation
* updated changelog
* fixed code style issue
* added tests for vec env
* fixed formatting
* renamed observations
* added comments for vector evaluation
* fixed issues
* Cleanup + bump version
* Add comment
* Fix wrong count of episodes
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-05-28 12:40:29 +02:00
Antonin RAFFIN
88e1be9ff5
Documentation update ( #450 )
...
* Update migration guide
* Add sanity check
* Removed parameter ``channels_last`` from ``is_image_space``
* Pin docutils
* Clarify callback `save_freq` definition
* Update docs/misc/changelog.rst
* Update docs/misc/changelog.rst
* Fix typos
Co-authored-by: Anssi <kaneran21@hotmail.com>
2021-05-23 13:13:11 +02:00
Amanda Dsouza
18f4e3ace0
Added wrapper_kwargs argument to make_vec_env ( #448 )
...
* Added wrapper_kwargs to make_vec_env
* code black format
* Tmp fix for atari-py
* Update changelog
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-23 11:33:34 +02:00
Rohan Tangri
df6f9de8f4
KL Divergence Helper Function ( #431 )
...
* add kl divergence wrapper
* add test
* update changelog
* black lint
* remove unused import
* Fix ent coef loading for SAC (#429 )
* Fix ent coef loading for SAC
* Better fix and add comment
* add 'distribution' to base Distribution class
* add sample test
* revert to plain pytorch implementation
* black reformat
* Update docs/misc/changelog.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Doc update (custom policy + fix her example) (#436 )
* isort and black reformat
* float -> bool tensor
* add sanity test
* more concise kl code
* remove outdated comment
* all -> allclose assertion
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix PyTorch warning
* Update gSDE entropy test
* Update entropy test
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2021-05-20 19:01:07 +02:00
Antonin RAFFIN
378d197b00
Doc update (custom policy + fix her example) ( #436 )
2021-05-16 18:21:07 +02:00
Antonin RAFFIN
1ce911994b
Fix ent coef loading for SAC ( #429 )
...
* Fix ent coef loading for SAC
* Better fix and add comment
2021-05-12 12:21:54 +03:00
Jaden Travnik
75b6f3b3b0
Dictionary Observations ( #243 )
...
* First commit
* Fixing missing refs from a quick merge from master
* Reformat
* Adding DictBuffers
* Reformat
* Minor reformat
* added slow dict test. Added SACMultiInputPolicy for future. Added private static image transpose helper to common policy
* Ran black on buffers
* Ran isort
* Adding StackedObservations classes used within VecStackEnvs wrappers. Made test_dict_env shorter and removed slow
* Running isort :facepalm
* Fixed typing issues
* Adding docstrings and typing. Using util for moving data to device.
* Fixed trailing commas
* Fix types
* Minor edits
* Avoid duplicating code
* Fix calls to parents
* Adding assert to buffers. Updating changelog
* Running format on buffers
* Adding multi-input policies to dqn,td3,a2c. Fixing warnings. Fixed bug with DictReplayBuffer as Replay buffers use only 1 env
* Fixing warnings, splitting is_vectorized_observation into multiple functions based on space type
* Created envs folder in common. Updated imports. Moved stacked_obs to vec_env folder
* Moved envs to envs directory. Moved stacked obs to vec_envs. Started update on documentation
* Fixes
* Running code style
* Update docstrings on torch_layers
* Decapitalize non-constant variables
* Using NatureCNN architecture in combined extractor. Increasing img size in multi input env. Adding memory reduction in test
* Update doc
* Update doc
* Fix format
* Removing NineRoom env. Using nested preprocess. Removing mutable default args
* running code style
* Passing channel check through to stacked dict observations.
* Running black
* Adding channel control to SimpleMultiObsEnv. Passing check_channels to CombinedExtractor
* Remove optimize memory for dict buffers
* Update doc
* Move identity env
* Minor edits + bump version
* Update doc
* Fix doc build
* Bug fixes + add support for more types of dict env
* Fixes + add multi env test
* Add support for vectranspose
* Fix stacked obs for dict and add tests
* Add check for nested spaces. Fix dict-subprocvecenv test
* Fix (single) pytype error
* Simplify CombinedExtractor
* Fix tests
* Fix check
* Merge branch 'master' into feat/dict_observations
* Fix for net_arch with dict and vector obs
* Fixes
* Add consistency test
* Update env checker
* Add some docs on dict obs
* Update default CNN feature vector size
* Refactor HER (#351 )
* Start refactoring HER
* Fixes
* Additional fixes
* Faster tests
* WIP: HER as a custom replay buffer
* New replay only version (working with DQN)
* Add support for all off-policy algorithms
* Fix saving/loading
* Remove ObsDictWrapper and add VecNormalize tests with dict
* Stable-Baselines3 v1.0 (#354 )
* Bump version and update doc
* Fix name
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update docs/index.rst
Co-authored-by: Adam Gleave <adam@gleave.me>
* Update wording for RL zoo
Co-authored-by: Adam Gleave <adam@gleave.me>
* Add gym-pybullet-drones project (#358 )
* Update projects.rst
Added gym-pybullet-drones
* Update projects.rst
Longer title underline
* Update changelog
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Include SuperSuit in projects (#359 )
* include supersuit
* longer title underline
* Update changelog.rst
* Fix default arguments + add bugbear (#363 )
* Fix potential bug + add bug bear
* Remove unused variables
* Minor: version bump
* Add code of conduct + update doc (#373 )
* Add code of conduct
* Fix DQN doc example
* Update doc (channel-last/first)
* Apply suggestions from code review
Co-authored-by: Anssi <kaneran21@hotmail.com>
* Apply suggestions from code review
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
* Make installation command compatible with ZSH (#376 )
* Add quotes
* Add Zsh bracket info
* Add clarify pip installation line
* Make note bold
* Add Zsh pip installation note
* Add handle timeouts param
* Fixes
* Fixes (buffer size, extend test)
* Fix `max_episode_length` redefinition
* Fix potential issue
* Add some docs on dict obs
* Fix performance bug
* Fix slowdown
* Add package to install (#378 )
* Add package to install
* Update docs packages installation command
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Fix backward compat + add test
* Fix VecEnv detection
* Update doc
* Fix vec env check
* Support for `VecMonitor` for gym3-style environments (#311 )
* add vectorized monitor
* auto format of the code
* add documentation and VecExtractDictObs
* refactor and add test cases
* add test cases and format
* avoid circular import and fix doc
* fix type
* fix type
* oops
* Update stable_baselines3/common/monitor.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update stable_baselines3/common/monitor.py
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* add test cases
* update changelog
* fix mutable argument
* quick fix
* Apply suggestions from code review
* fix terminal observation for gym3 envs
* delete comment
* Update doc and bump version
* Add warning when already using `Monitor` wrapper
* Update vecmonitor tests
* Fixes
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Reformat
* Fixed loading of ``ent_coef`` for ``SAC`` and ``TQC``, it was not optimized anymore (#392 )
* Fix ent coef loading bug
* Add test
* Add comment
* Reuse save path
* Add test for GAE + rename `RolloutBuffer.dones` for clarification (#375 )
* Fix return computation + add test for GAE
* Rename `last_dones` to `episode_starts` for clarification
* Revert advantage
* Cleanup test
* Rename variable
* Clarify return computation
* Clarify docs
* Add multi-episode rollout test
* Reformat
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
* Fixed saving of `A2C` and `PPO` policy when using gSDE (#401 )
* Improve doc and replay buffer loading
* Add support for images
* Fix doc
* Update Procgen doc
* Update changelog
* Update docstrings
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
* Update doc and minor fixes
* Update doc
* Added note about MultiInputPolicy in error of NatureCNN
* Merge branch 'master' into feat/dict_observations
* Address comments
* Naming clarifications
* Actually saving the file would be nice
* Fix edge case when doing online sampling with HER
* Cleanup
* Add sanity check
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Anssi "Miffyli" Kanervisto <kaneran21@hotmail.com>
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Jacopo Panerati <jacopo.panerati@utoronto.ca>
Co-authored-by: Justin Terry <justinkterry@gmail.com>
Co-authored-by: Tom Dörr <tomdoerr96@gmail.com>
Co-authored-by: Tom Dörr <tom.doerr@tum.de>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
2021-05-11 12:29:30 +02:00
Rohan Tangri
2ada2dd0b2
Update PPO KL Divergence Estimator ( #419 )
...
* remove unused all_kl_divs memory
* new kl approximate equation
* move kl check before update step
* update changelog
* add continue_training flag update to kl check
* add verbose check
* update changelog
* lint with black
* r -> log_ratio
* Add link to PR
* invert ratio
* Fix for Sphinx v4.0
Co-authored-by: Anssi <kaneran21@hotmail.com>
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-10 13:21:00 +03:00
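The commit messages above do not spell out the new equation. As an assumption, the sketch below uses a widely known low-variance estimator of the KL divergence built from the log-ratio of new to old action probabilities, mean((exp(log_ratio) - 1) - log_ratio). The function name and its list-based inputs are hypothetical and chosen only to keep the example dependency-free.

```python
import math


def approx_kl(new_log_probs, old_log_probs):
    """Hypothetical helper: sample-based KL estimate between old and new policy.

    Both arguments are log-probabilities of the same sampled actions,
    evaluated under the new and the old policy respectively.
    """
    log_ratios = [new - old for new, old in zip(new_log_probs, old_log_probs)]
    # Each term (exp(x) - 1) - x is non-negative and zero only at x == 0,
    # so the estimate is never negative, unlike the naive mean(-log_ratio).
    terms = [(math.exp(lr) - 1.0) - lr for lr in log_ratios]
    return sum(terms) / len(terms)
```

An estimate like this can be compared against a threshold before the gradient step, so that training on the current batch stops early once the policy has moved too far, which matches the "move kl check before update step" item above.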
Rohan Tangri
35da0b59b9
Policy Base for On-policy Algorithms ( #412 ) ( #415 )
...
* add policy_base input to OnPolicyAlgorithms
* update changelog
* Fix pytype error
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-05-04 12:59:36 +03:00
Antonin RAFFIN
6f822b9ed7
Reformat with new black version ( #408 )
...
* Reformat
* Update changelog
2021-04-26 15:58:19 +02:00
Antonin RAFFIN
c69f7cd5e6
Fixed saving of A2C and PPO policy when using gSDE ( #401 )
2021-04-19 12:23:02 +02:00