Add imitation library docs (#200)

* docs: Add imitation library docs

* Fix doc syntax errors

* Fix internal link; PDF->abstract for DAgger for consistency

* Grammar

* Update migration guide

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
Co-authored-by: Adam Gleave <adam@gleave.me>
Steven H. Wang 2020-10-24 09:33:26 -07:00 committed by GitHub
parent dd6e361204
commit b252f4212c
4 changed files with 59 additions and 2 deletions

docs/guide/imitation.rst

@@ -0,0 +1,55 @@
.. _imitation:

Imitation Learning
==================

The `imitation <https://github.com/HumanCompatibleAI/imitation>`__ library implements
imitation learning algorithms on top of Stable-Baselines3, including:

- Behavioral Cloning
- `DAgger <https://arxiv.org/abs/1011.0686>`_ with synthetic examples
- `Adversarial Inverse Reinforcement Learning <https://arxiv.org/abs/1710.11248>`_ (AIRL)
- `Generative Adversarial Imitation Learning <https://arxiv.org/abs/1606.03476>`_ (GAIL)

It also provides `CLI scripts <#cli-quickstart>`_ for training and saving
demonstrations from RL experts, and for training imitation learners on these demonstrations.
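
To make the first algorithm in the list above concrete: behavioral cloning treats imitation as supervised learning on (observation, action) pairs recorded from an expert. The block below is a minimal standard-library sketch of that idea and does not use the ``imitation`` API. The toy observations, the hand-coded ``expert_policy``, and the logistic-regression learner are all assumptions made for illustration.

```python
import math
import random


def expert_policy(obs):
    """Hypothetical expert: push right (1) when the pole leans or moves right."""
    angle, velocity = obs
    return 1 if angle + 0.5 * velocity > 0 else 0


def collect_demonstrations(n, rng):
    """Sample toy observations and label each with the expert's action."""
    demos = []
    for _ in range(n):
        obs = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        demos.append((obs, expert_policy(obs)))
    return demos


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bc(demos, epochs=200, lr=0.5):
    """Fit a logistic-regression policy with full-batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(demos)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x0, x1), action in demos:
            # Gradient of the cross-entropy loss w.r.t. the logit.
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - action
            grad_w[0] += err * x0
            grad_w[1] += err * x1
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def cloned_policy(params, obs):
    """Act greedily with the learned policy."""
    w, b = params
    return 1 if w[0] * obs[0] + w[1] * obs[1] + b > 0 else 0


def accuracy(params, demos):
    """Fraction of demonstrations where the clone matches the expert."""
    hits = sum(cloned_policy(params, obs) == action for obs, action in demos)
    return hits / len(demos)


rng = random.Random(0)
train_demos = collect_demonstrations(500, rng)
test_demos = collect_demonstrations(200, rng)
params = train_bc(train_demos)
print("held-out agreement with expert:", accuracy(params, test_demos))
```

The library's own Behavioral Cloning implementation trains Stable-Baselines3 policies on real environment rollouts; this sketch only shows the supervised-learning core of the method.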

Installation
------------

Installation requires Python 3.7+:

::

   pip install imitation

CLI Quickstart
--------------

::

   # Train PPO agent on cartpole and collect expert demonstrations
   python -m imitation.scripts.expert_demos with fast cartpole log_dir=quickstart

   # Train GAIL from demonstrations
   python -m imitation.scripts.train_adversarial with fast gail cartpole rollout_path=quickstart/rollouts/final.pkl

   # Train AIRL from demonstrations
   python -m imitation.scripts.train_adversarial with fast airl cartpole rollout_path=quickstart/rollouts/final.pkl

.. note::

  You can remove the ``fast`` option to run training to completion. For more CLI options
  and information on reading Tensorboard plots, see the
  `README <https://github.com/HumanCompatibleAI/imitation#cli-quickstart>`_.
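
The three CLI commands above share one shape: a script module, the keyword ``with``, a list of named configs (``fast``, ``gail``, ``cartpole``), and ``key=value`` overrides. If you want to drive them from Python, for example to drop ``fast`` in one place, a small helper can assemble the argument lists. This is a sketch only: ``build_command`` is a hypothetical helper, not part of ``imitation``, and it only reproduces the command strings shown above.

```python
import shlex


def build_command(script, named_configs=(), config_updates=None):
    """Build an argv list for ``python -m imitation.scripts.<script> with ...``.

    Hypothetical helper for illustration; it mirrors the CLI commands
    shown in the quickstart and adds no new options.
    """
    argv = ["python", "-m", f"imitation.scripts.{script}"]
    updates = [f"{key}={value}" for key, value in (config_updates or {}).items()]
    if named_configs or updates:
        argv.append("with")
        argv.extend(named_configs)
        argv.extend(updates)
    return argv


rollout_path = "quickstart/rollouts/final.pkl"
commands = [
    build_command("expert_demos", ["fast", "cartpole"], {"log_dir": "quickstart"}),
    build_command("train_adversarial", ["fast", "gail", "cartpole"],
                  {"rollout_path": rollout_path}),
    build_command("train_adversarial", ["fast", "airl", "cartpole"],
                  {"rollout_path": rollout_path}),
]

for argv in commands:
    # Print a shell-safe version of each command.
    print(" ".join(shlex.quote(arg) for arg in argv))
    # To execute instead (requires imitation to be installed):
    # subprocess.run(argv, check=True)
```

Passing the list form to ``subprocess.run`` avoids shell quoting issues; omitting ``"fast"`` from the named configs corresponds to running training to completion, as the note above describes.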

Python Interface Quickstart
---------------------------

This `example script <https://github.com/HumanCompatibleAI/imitation/blob/master/examples/quickstart.py>`_
uses the Python API to train BC, GAIL, and AIRL models on CartPole data.


@@ -46,8 +46,8 @@ Breaking Changes
- The features extractor (CNN extractor) is shared between policy and q-networks for DDPG/SAC/TD3 and only the policy loss used to update it (much faster)
- Tensorboard legacy logging was dropped in favor of having one logger for the terminal and Tensorboard (cf :ref:`Tensorboard integration <tensorboard>`)
- We dropped ACKTR/ACER support because of their complexity compared to simpler alternatives (PPO, SAC, TD3) performing as good.
- We dropped GAIL support as we are focusing on model-free RL only, you can however take a look at the `Imitation Learning Baseline Implementations <https://github.com/HumanCompatibleAI/imitation>`_
which are based on SB3.
- We dropped GAIL support as we are focusing on model-free RL only, you can however take a look at the :ref:`imitation project <imitation>` which implements
GAIL and other imitation learning algorithms on top of SB3.
- ``action_probability`` is currently not implemented in the base class
You can take a look at the `issue about SB3 implementation design <https://github.com/hill-a/stable-baselines/issues/576>`_ for more details.


@@ -44,6 +44,7 @@ Main Features
guide/callbacks
guide/tensorboard
guide/rl_zoo
guide/imitation
guide/migration
guide/checking_nan
guide/developer


@@ -40,6 +40,7 @@ Others:
Documentation:
^^^^^^^^^^^^^^
- Added first draft of migration guide
- Added intro to `imitation <https://github.com/HumanCompatibleAI/imitation>`_ library (@shwang)
- Enabled doc for ``CnnPolicies``