
.. Stable Baselines3 documentation master file, created by
   sphinx-quickstart on Thu Sep 26 11:06:54 2019.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Stable-Baselines3 Docs - Reliable Reinforcement Learning Implementations
========================================================================

`Stable Baselines3 (SB3) <https://github.com/DLR-RM/stable-baselines3>`_ is a set of reliable implementations of reinforcement learning algorithms in PyTorch.
It is the next major version of `Stable Baselines <https://github.com/hill-a/stable-baselines>`_.
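
To give a first feel for the API, the following sketch trains and evaluates a PPO agent. It assumes SB3 2.x, which relies on Gymnasium (1.x releases used ``gym``, whose ``reset()`` returns only the observation), and the classic ``CartPole-v1`` task. The tiny timestep budget and the seed are illustrative choices, not recommended settings.

.. code-block:: python

    import gymnasium as gym

    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # Create the environment (Gymnasium API, assumed SB3 >= 2.0)
    env = gym.make("CartPole-v1")

    # Instantiate PPO with the default multi-layer perceptron policy
    model = PPO("MlpPolicy", env, seed=0, verbose=0)

    # Train briefly; meaningful results need many more timesteps
    model.learn(total_timesteps=2_000)

    # Estimate performance over a few evaluation episodes
    mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
    print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

    # Query the trained policy for a single greedy action
    obs, _info = env.reset(seed=0)
    action, _states = model.predict(obs, deterministic=True)

``evaluate_policy`` returns the mean and standard deviation of the episode reward. After such a short run, expect the agent to be far from solving the task.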


GitHub repository: https://github.com/DLR-RM/stable-baselines3

RL Baselines3 Zoo (training framework for SB3): https://github.com/DLR-RM/rl-baselines3-zoo

RL Baselines3 Zoo provides a collection of pre-trained agents and scripts for training and evaluating agents, tuning hyperparameters, plotting results and recording videos.

SB3 Contrib (experimental RL code, latest algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib


Main Features
--------------

- Unified structure for all algorithms
- PEP8 compliant (unified code style)
- Documented functions and classes
- Tests, high code coverage and type hints
- Clean code
- TensorBoard support
- **The performance of each algorithm was tested** (see the *Results* section on each algorithm's page)


.. toctree::
   :maxdepth: 2
   :caption: User Guide

   guide/install
   guide/quickstart
   guide/rl_tips
   guide/rl
   guide/algos
   guide/examples
   guide/vec_envs
   guide/custom_env
   guide/custom_policy
   guide/callbacks
   guide/tensorboard
   guide/rl_zoo
   guide/sb3_contrib
   guide/imitation
   guide/migration
   guide/checking_nan
   guide/developer
   guide/save_format
   guide/export


.. toctree::
  :maxdepth: 1
  :caption: RL Algorithms

  modules/base
  modules/a2c
  modules/ddpg
  modules/dqn
  modules/her
  modules/ppo
  modules/sac
  modules/td3

.. toctree::
  :maxdepth: 1
  :caption: Common

  common/atari_wrappers
  common/env_util
  common/distributions
  common/evaluation
  common/env_checker
  common/monitor
  common/logger
  common/noise
  common/utils

.. toctree::
  :maxdepth: 1
  :caption: Misc

  misc/changelog
  misc/projects


Citing Stable Baselines3
------------------------
To cite this project in publications:

.. code-block:: bibtex

    @misc{stable-baselines3,
      author = {Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
      title = {Stable Baselines3},
      year = {2019},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/DLR-RM/stable-baselines3}},
    }

Contributing
------------

To anyone interested in making the RL baselines better: there are still some improvements
that need to be done.
You can check issues in the `repo <https://github.com/DLR-RM/stable-baselines3/issues>`_.

If you want to contribute, please read `CONTRIBUTING.md <https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md>`_ first.

Indices and tables
-------------------

* :ref:`genindex`
* :ref:`search`
* :ref:`modindex`