.. _dqn:
.. automodule:: stable_baselines3.dqn
DQN
===
`Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`_ builds on `Fitted Q-Iteration (FQI) <http://ml.informatik.uni-freiburg.de/former/_media/publications/rieecml05.pdf>`_
and makes use of different tricks to stabilize the learning with neural networks: it uses a replay buffer, a target network and gradient clipping.
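
At its core, each update samples a batch of transitions from the replay buffer, computes a 1-step TD target with the frozen *target* network, and regresses the online network towards it with a clipped gradient step. Below is a minimal sketch of that update (illustrative only, not SB3's actual code), assuming PyTorch tensors ``obs``, ``actions`` (shape ``(batch_size, 1)``), ``rewards``, ``next_obs``, ``dones`` sampled from a replay buffer, and two networks ``q_net`` / ``q_net_target``:

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def dqn_loss(q_net, q_net_target, batch, gamma=0.99):
        """Illustrative DQN loss for one replay-buffer batch."""
        obs, actions, rewards, next_obs, dones = batch
        with torch.no_grad():
            # TD target using the target network:
            # r + gamma * (1 - done) * max_a' Q_target(s', a')
            next_q, _ = q_net_target(next_obs).max(dim=1, keepdim=True)
            target_q = rewards + gamma * (1.0 - dones) * next_q
        # Q-values of the actions that were actually taken
        current_q = q_net(obs).gather(1, actions.long())
        # Huber (smooth L1) loss is commonly used instead of MSE for stability
        return F.smooth_l1_loss(current_q, target_q)

    # One gradient step with gradient clipping (hypothetical usage):
    # loss = dqn_loss(q_net, q_net_target, batch)
    # loss.backward()
    # torch.nn.utils.clip_grad_norm_(q_net.parameters(), max_norm=10.0)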
.. rubric:: Available Policies

.. autosummary::
    :nosignatures:

    MlpPolicy
    CnnPolicy
    MultiInputPolicy
Notes
-----
- Original paper: https://arxiv.org/abs/1312.5602
- Further reference: https://www.nature.com/articles/nature14236
- Tutorial "From Tabular Q-Learning to DQN": https://github.com/araffin/rlss23-dqn-tutorial
.. note::

    This implementation provides only vanilla Deep Q-Learning and has no extensions such as Double-DQN, Dueling-DQN or Prioritized Experience Replay.
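
For reference, Double-DQN would only change how the TD target is computed: the online network selects the greedy next action while the target network evaluates it, which reduces overestimation. A hedged sketch of the difference, reusing the names from the snippet above (not part of this implementation):

.. code-block:: python

    with torch.no_grad():
        # Vanilla DQN: the target network both selects and evaluates the next action
        vanilla_target = q_net_target(next_obs).max(dim=1, keepdim=True).values
        # Double-DQN (not implemented here): online net selects, target net evaluates
        greedy_actions = q_net(next_obs).argmax(dim=1, keepdim=True)
        double_target = q_net_target(next_obs).gather(1, greedy_actions)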
Can I use?
----------
- Recurrent policies: ❌
- Multi processing: ✔️
- Gym spaces:

============= ====== ===========
Space         Action Observation
============= ====== ===========
Discrete      ✔️      ✔️
Box           ❌      ✔️
MultiDiscrete ❌      ✔️
MultiBinary   ❌      ✔️
Dict          ❌      ✔️
============= ====== ===========
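
Since ``Dict`` observation spaces are supported (see the table above), DQN can be trained on environments with dictionary observations via ``MultiInputPolicy``. A minimal sketch, assuming SB3's ``SimpleMultiObsEnv`` example environment (``Dict`` observations, discrete actions):

.. code-block:: python

    from stable_baselines3 import DQN
    from stable_baselines3.common.envs import SimpleMultiObsEnv

    # SimpleMultiObsEnv ships with SB3 and exposes a Dict observation space
    env = SimpleMultiObsEnv(random_start=False)

    model = DQN("MultiInputPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)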
Example
-------
This example only demonstrates how to use the library and its functions; the trained agents may not solve the environments. Optimized hyperparameters can be found in the RL Zoo `repository <https://github.com/DLR-RM/rl-baselines3-zoo>`_.
.. code-block:: python

    import gymnasium as gym

    from stable_baselines3 import DQN

    env = gym.make("CartPole-v1", render_mode="human")

    model = DQN("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10000, log_interval=4)
    model.save("dqn_cartpole")

    del model  # remove to demonstrate saving and loading

    model = DQN.load("dqn_cartpole")

    obs, info = env.reset()
    while True:
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
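
DQN is often sensitive to its hyperparameters, and the defaults will usually need adjusting for a new environment. A sketch of passing some of the main ``DQN`` keyword arguments (the parameter names are real, the values below are illustrative rather than tuned):

.. code-block:: python

    import gymnasium as gym

    from stable_baselines3 import DQN

    env = gym.make("CartPole-v1")

    model = DQN(
        "MlpPolicy",
        env,
        learning_rate=1e-3,
        buffer_size=50_000,            # replay buffer size
        learning_starts=1_000,         # steps collected before learning starts
        batch_size=64,
        train_freq=4,                  # update the model every 4 env steps
        target_update_interval=1_000,  # steps between target network updates
        exploration_fraction=0.1,      # fraction of training spent annealing epsilon
        exploration_final_eps=0.05,
        max_grad_norm=10,              # gradient clipping
        verbose=1,
    )
    model.learn(total_timesteps=50_000)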
Results
-------
Atari Games
^^^^^^^^^^^
The complete learning curves are available in the `associated PR #110 <https://github.com/DLR-RM/stable-baselines3/pull/110>`_.
How to replicate the results?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Clone the `rl-zoo repo <https://github.com/DLR-RM/rl-baselines3-zoo>`_:
.. code-block:: bash

    git clone https://github.com/DLR-RM/rl-baselines3-zoo
    cd rl-baselines3-zoo/
Run the benchmark (replace ``$ENV_ID`` with the env id, for instance ``BreakoutNoFrameskip-v4``):
.. code-block:: bash

    python train.py --algo dqn --env $ENV_ID --eval-episodes 10 --eval-freq 10000
Plot the results:
.. code-block:: bash

    python scripts/all_plots.py -a dqn -e Pong Breakout -f logs/ -o logs/dqn_results
    python scripts/plot_from_file.py -i logs/dqn_results.pkl -latex -l DQN
Parameters
----------
.. autoclass:: DQN
    :members:
    :inherited-members:
.. _dqn_policies:
DQN Policies
-------------
.. autoclass:: MlpPolicy
    :members:
    :inherited-members:
.. autoclass:: stable_baselines3.dqn.policies.DQNPolicy
    :members:
    :noindex:
.. autoclass:: CnnPolicy
    :members:
.. autoclass:: MultiInputPolicy
    :members: