* VecNormalize: allow non-continuous observations when norm_obs is False
* Update changelog, fix lint
* Switch to environment present in new and old versions of Gym
* Fix name
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Update evaluate_policy to use monitor data if available
* Update documentation
* Cleaning up
* Remove unnecessary typing trickery
* Update doc
* Rename is_wrapped to clarify it is for vecenvs
* Add is_wrapped for regular envs
* Add is_wrapped call for subprocvecenv and update code for circular imports
* Move new functions back to env_util and fix imports
* Update changelog
* Clarify evaluate_policy docs
* Add tests for wrapped modifying episode lengths
* Fix tests
* Update changelog
* Minor edits
* Add warn switch to evaluate_policy and update tests
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added working her version, Online sampling is missing.
* Updated test_her.
* Added first version of online her sampling. Still problems with tensor dimensions.
* Reformat
* Fixed tests
* Added some comments.
* Updated changelog.
* Add missing init file
* Fixed some small bugs.
* Reduced arguments for HER, small changes.
* Added getattr. Fixed bug for online sampling.
* Updated save/load funtions. Small changes.
* Added her to init.
* Updated save method.
* Updated her ratio.
* Move obs_wrapper
* Added DQN test.
* Fix potential bug
* Offline and online her share same sample_goal function.
* Changed lists into arrays.
* Updated her test.
* Fix online sampling
* Fixed action bug. Updated time limit for episodes.
* Updated convert_dict method to take keys as arguments.
* Renamed obs dict wrapper.
* Seed bit flipping env
* Remove get_episode_dict
* Add fast online sampling version
* Added documentation.
* Vectorized reward computation
* Vectorized goal sampling
* Update time limit for episodes in online her sampling.
* Fix max episode length inference
* Bug fix for Fetch envs
* Fix for HER + gSDE
* Reformat (new black version)
* Added info dict to compute new reward. Check her_replay_buffer again.
* Fix info buffer
* Updated done flag.
* Fixes for gSDE
* Offline her version uses now HerReplayBuffer as episode storage.
* Fix num_timesteps computation
* Fix get torch params
* Vectorized version for offline sampling.
* Modified offline her sampling to use sample method of her_replay_buffer
* Updated HER tests.
* Updated documentation
* Cleanup docstrings
* Updated to review comments
* Fix pytype
* Update according to review comments.
* Removed random goal strategy. Updated sample transitions.
* Updated migration. Removed time signal removal.
* Update doc
* Fix potential load issue
* Add VecNormalize support for dict obs
* Updated saving/loading replay buffer for HER.
* Fix test memory usage
* Fixed save/load replay buffer.
* Fixed save/load replay buffer
* Fixed transition index after loading replay buffer in online sampling
* Better error handling
* Add tests for get_time_limit
* More tests for VecNormalize with dict obs
* Update doc
* Improve HER description
* Add test for sde support
* Add comments
* Add comments
* Remove check that was always valid
* Fix for terminal observation
* Updated buffer size in offline version and reset of HER buffer
* Reformat
* Update doc
* Remove np.empty + add doc
* Fix loading
* Updated loading replay buffer
* Separate online and offline sampling + bug fixes
* Update tensorboard log name
* Version bump
* Bug fix for special case
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Add auto formatting with black and isort
* Reformat code
* Ignore typing errors
* Add note about line length
* Add minimum version for isort
* Add commit-checks
* Update docker image
* Fixed lost import (during last merge)
* Fix opencv dependency
* Created DQN template according to the paper.
Next steps:
- Create Policy
- Complete Training
- Debug
* Changed Base Class
* refactor save, to be consistence with overriding the excluded_save_params function. Do not try to exclude the parameters twice.
* Added simple DQN policy
* Finished learn and train function
- missing correct loss computation
* changed collect_rollouts to work with discrete space
* moved discrete space collect_rollouts to dqn
* basic dqn working
* deleted SDE related code
* added gradient clipping and moved greedy policy to policy
* changed policy to implement target network
and added soft update(in fact standart tau is 1 so hard update)
* fixed policy setup
* rebase target_update_intervall on _n_updates
* adapted all tests
all tests passing
* Move to stable-baseline3
* Fixes for DQN
* Fix tests + add CNNPolicy
* Allow any optimizer for DQN
* added some util functions to create a arbitrary linear schedule, fixed pickle problem with old exploration schedule
* more documentation
* changed buffer dtype
* refactor and document
* Added Sphinx Documentation
Updated changelog.rst
* removed custom collect_rollouts as it is no longer necessary
* Implemented suggestions to clean code and documentation.
* extracted some functions on tests to reduce duplicated code
* added support for exploration_fraction
* Fixed exploration_fraction
* Added documentation
* Fixed get_linear_fn -> proper progress scaling
* Merged master
* Added nature reference
* Changed default parameters to https://www.nature.com/articles/nature14236/tables/1
* Fixed n_updates to be incremented correctly
* Correct train_freq
* Doc update
* added special parameter for DQN in tests
* different fix for test_discrete
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Update docs/modules/dqn.rst
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Added RMSProp in optimizer_kwargs, as described in nature paper
* Exploration fraction is inverse of 50.000.000 (total frames) / 1.000.000 (frames with linear schedule) according to nature paper
* Changelog update for buffer dtype
* standard exlude parameters should be always excluded to assure proper saving only if intentionally included by ``include`` parameter
* slightly more iterations on test_discrete to pass the test
* added param use_rms_prop instead of mutable default argument
* forgot alpha
* using huber loss, adam and learning rate 1e-4
* account for train_freq in update_target_network
* Added memory check for both buffers
* Doc updated for buffer allocation
* Added psutil Requirement
* Adapted test_identity.py
* Fixes with new SB3 version
* Fix for tensorboard name
* Convert assert to warning and fix tests
* Refactor off-policy algorithms
* Fixes
* test: remove next_obs in replay buffer
* Update changelog
* Fix tests and use tmp_path where possible
* Fix sampling bug in buffer
* Do not store next obs on episode termination
* Fix replay buffer sampling
* Update comment
* moved epsilon from policy to model
* Update predict method
* Update atari wrappers to match SB2
* Minor edit in the buffers
* Update changelog
* Merge branch 'master' into dqn
* Update DQN to new structure
* Fix tests and remove hardcoded path
* Fix for DQN
* Disable memory efficient replay buffer by default
* Fix docstring
* Add tests for memory efficient buffer
* Update changelog
* Split collect rollout
* Move target update outside `train()` for DQN
* Update changelog
* Update linear schedule doc
* Cleanup DQN code
* Minor edit
* Update version and docker images
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Change saving/loading normalization parameters to use single pickle file
* Remove 'use_gae' from RolloutBuffer compute_returns function
* Add some missing tests for normalizer, nan-checker and PPO clip_value_fn argument
* Update changelog
* Fix typo
* Use proper pytest.raises for catching errors in tests
* Add comment on GAE and how to obtain non-GAE behaviour
* Remove save/load_running_average from VecNormalize in favor of load/save
* Update changelog
* Update docstring
* Add accidentally removed tests for VecNormalize
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>