.. _quickstart:

===============
Getting Started
===============

Most of the library follows a scikit-learn-like syntax for its reinforcement learning algorithms.

Here is a quick example of how to train and run A2C on a CartPole environment:

.. code-block:: python

    import gym

    from stable_baselines3 import A2C

    # Create the environment
    env = gym.make("CartPole-v1")

    # Train an A2C agent with an MLP policy for 10,000 timesteps
    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10000)

    # Run the trained agent for 1,000 steps
    obs = env.reset()
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        env.render()
        # Start a new episode once the current one ends
        if done:
            obs = env.reset()

.. note::

    You can find explanations about the logger output and names in the :ref:`Logger <logger>` section.

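
The run loop in the example resets the environment whenever an episode ends. Below is a minimal sketch of the same pattern that also records the return of each completed episode. ``StubEnv``, ``StubModel`` and ``run`` are hypothetical stand-ins, not part of the library, used here so that the snippet runs without any dependencies. They only mimic the call signatures used in the example above.

.. code-block:: python

    class StubEnv:
        """Hypothetical stand-in for an environment with reset() and a 4-tuple step()."""

        def __init__(self, episode_length=5):
            self.episode_length = episode_length
            self.t = 0

        def reset(self):
            self.t = 0
            return 0.0  # dummy observation

        def step(self, action):
            self.t += 1
            done = self.t >= self.episode_length
            return float(self.t), 1.0, done, {}  # obs, reward, done, info


    class StubModel:
        """Hypothetical stand-in for a trained model; mimics the predict() return shape."""

        def predict(self, obs, deterministic=True):
            return 0, None  # action, state


    def run(model, env, n_steps):
        """Step the environment n_steps times and return the completed episode returns."""
        episode_returns = []
        current_return = 0.0
        obs = env.reset()
        for _ in range(n_steps):
            action, _state = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            current_return += reward
            if done:
                # Episode finished: store its return and start a new one
                episode_returns.append(current_return)
                current_return = 0.0
                obs = env.reset()
        return episode_returns


    returns = run(StubModel(), StubEnv(episode_length=5), n_steps=12)
    print(returns)

In this sketch, only episodes that finish within ``n_steps`` are recorded. The partial episode at the end is discarded.
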
Alternatively, train a model with a one-liner, provided that
`the environment is registered in Gym <https://github.com/openai/gym/wiki/Environments>`_ and
the policy is registered:

.. code-block:: python

    from stable_baselines3 import A2C

    model = A2C("MlpPolicy", "CartPole-v1").learn(10000)
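
The one-liner assigns the result of ``learn()`` to ``model``, which works because ``learn()`` returns the model, so construction and training can be chained. Below is a minimal sketch of that chaining pattern. ``ToyAgent`` is a hypothetical class, not part of the library, and performs no actual training.

.. code-block:: python

    class ToyAgent:
        """Hypothetical toy class that mirrors the constructor/learn() calling pattern."""

        def __init__(self, policy, env):
            self.policy = policy
            self.env = env
            self.num_timesteps = 0

        def learn(self, total_timesteps):
            # A real algorithm would collect experience and update its policy here.
            self.num_timesteps += total_timesteps
            # Returning self is what makes the chained one-liner possible.
            return self


    agent = ToyAgent("MlpPolicy", "CartPole-v1").learn(10000)
    print(agent.num_timesteps)
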