diff --git a/docs/guide/imitation.rst b/docs/guide/imitation.rst
new file mode 100644
index 0000000..df7895c
--- /dev/null
+++ b/docs/guide/imitation.rst
@@ -0,0 +1,55 @@
+.. _imitation:
+
+Imitation Learning
+==================
+
+The `imitation `__ library implements
+imitation learning algorithms on top of Stable-Baselines3, including:
+
+  - Behavioral Cloning
+  - `DAgger `_ with synthetic examples
+  - `Adversarial Inverse Reinforcement Learning `_ (AIRL)
+  - `Generative Adversarial Imitation Learning `_ (GAIL)
+
+
+It also provides `CLI scripts <#cli-quickstart>`_ for training and saving
+demonstrations from RL experts, and for training imitation learners on these demonstrations.
+
+
+Installation
+------------
+
+Installation requires Python 3.7+:
+
+::
+
+   pip install imitation
+
+
+CLI Quickstart
+---------------------
+
+::
+
+   # Train PPO agent on cartpole and collect expert demonstrations
+   python -m imitation.scripts.expert_demos with fast cartpole log_dir=quickstart
+
+   # Train GAIL from demonstrations
+   python -m imitation.scripts.train_adversarial with fast gail cartpole rollout_path=quickstart/rollouts/final.pkl
+
+   # Train AIRL from demonstrations
+   python -m imitation.scripts.train_adversarial with fast airl cartpole rollout_path=quickstart/rollouts/final.pkl
+
+
+.. note::
+
+  You can remove the ``fast`` option to run training to completion. For more CLI options
+  and information on reading Tensorboard plots, see the
+  `README `_.
+
+
+Python Interface Quickstart
+---------------------------
+
+This `example script `_
+uses the Python API to train BC, GAIL, and AIRL models on CartPole data.
diff --git a/docs/guide/migration.rst b/docs/guide/migration.rst
index 1f7f5ce..0d54ad4 100644
--- a/docs/guide/migration.rst
+++ b/docs/guide/migration.rst
@@ -46,8 +46,8 @@ Breaking Changes
 - The features extractor (CNN extractor) is shared between policy and q-networks for DDPG/SAC/TD3 and only the policy loss used to update it (much faster)
 - Tensorboard legacy logging was dropped in favor of having one logger for the terminal and Tensorboard (cf :ref:`Tensorboard integration `)
 - We dropped ACKTR/ACER support because of their complexity compared to simpler alternatives (PPO, SAC, TD3) performing as good.
-- We dropped GAIL support as we are focusing on model-free RL only, you can however take a look at the `Imitation Learning Baseline Implementations `_
-  which are based on SB3.
+- We dropped GAIL support as we are focusing on model-free RL only, you can however take a look at the :ref:`imitation project ` which implements
+  GAIL and other imitation learning algorithms on top of SB3.
 - ``action_probability`` is currently not implemented in the base class
 
 You can take a look at the `issue about SB3 implementation design `_ for more details.
diff --git a/docs/index.rst b/docs/index.rst
index 9b7edea..cfbb6fe 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -44,6 +44,7 @@ Main Features
   guide/callbacks
   guide/tensorboard
   guide/rl_zoo
+  guide/imitation
   guide/migration
   guide/checking_nan
   guide/developer
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
index 1474856..09957f4 100644
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -40,6 +40,7 @@ Others:
 Documentation:
 ^^^^^^^^^^^^^^
 - Added first draft of migration guide
+- Added intro to `imitation `_ library (@shwang)
 - Enabled doc for ``CnnPolicies``