mirror of
https://github.com/saymrwulf/stable-baselines3.git
synced 2026-05-17 21:20:11 +00:00
.. _tensorboard:

Tensorboard Integration
=======================

Basic Usage
-----------

To use TensorBoard with Stable Baselines3, you simply need to pass the location of the log folder to the RL agent:

.. code-block:: python

    from stable_baselines3 import A2C

    model = A2C('MlpPolicy', 'CartPole-v1', verbose=1, tensorboard_log="./a2c_cartpole_tensorboard/")
    model.learn(total_timesteps=10000)

You can also define a custom logging name when training (by default it is the algorithm name):

.. code-block:: python

    from stable_baselines3 import A2C

    model = A2C('MlpPolicy', 'CartPole-v1', verbose=1, tensorboard_log="./a2c_cartpole_tensorboard/")
    model.learn(total_timesteps=10000, tb_log_name="first_run")
    # Pass reset_num_timesteps=False to continue the training curve in tensorboard
    # By default, it will create a new curve
    model.learn(total_timesteps=10000, tb_log_name="second_run", reset_num_timesteps=False)
    model.learn(total_timesteps=10000, tb_log_name="third_run", reset_num_timesteps=False)

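Each run with a given ``tb_log_name`` is written to a numbered subfolder inside ``tensorboard_log`` (for example ``first_run_1``, then ``first_run_2`` for a later run with the same name). A minimal stdlib sketch of this run-indexing scheme (a hypothetical helper for illustration, not part of the SB3 API):

```python
import os
import re


def next_run_dir(log_dir: str, log_name: str) -> str:
    """Return the path of the next numbered run folder, e.g. 'first_run_2'."""
    latest = 0
    if os.path.isdir(log_dir):
        for entry in os.listdir(log_dir):
            # Existing run folders look like "<log_name>_<run_id>"
            match = re.fullmatch(rf"{re.escape(log_name)}_(\d+)", entry)
            if match:
                latest = max(latest, int(match.group(1)))
    return os.path.join(log_dir, f"{log_name}_{latest + 1}")
```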
Once the learn function is called, you can monitor the RL agent during or after training with the following bash command:

.. code-block:: bash

    tensorboard --logdir ./a2c_cartpole_tensorboard/

You can also add past logging folders:

.. code-block:: bash

    tensorboard --logdir ./a2c_cartpole_tensorboard/;./ppo2_cartpole_tensorboard/

It will display information such as the episode reward (when using a ``Monitor`` wrapper), the model losses and other parameters unique to some models.

.. image:: ../_static/img/Tensorboard_example.png
  :width: 600
  :alt: plotting

Logging More Values
-------------------

Using a callback, you can easily log more values with TensorBoard.
Here is a simple example of how to log an additional, arbitrary scalar value:
.. code-block:: python
|
|
|
|
import numpy as np
|
|
|
|
from stable_baselines3 import SAC
|
|
from stable_baselines3.common.callbacks import BaseCallback
|
|
|
|
model = SAC("MlpPolicy", "Pendulum-v0", tensorboard_log="/tmp/sac/", verbose=1)
|
|
|
|
|
|
class TensorboardCallback(BaseCallback):
|
|
"""
|
|
Custom callback for plotting additional values in tensorboard.
|
|
"""
|
|
|
|
def __init__(self, verbose=0):
|
|
super(TensorboardCallback, self).__init__(verbose)
|
|
|
|
def _on_step(self) -> bool:
|
|
# Log scalar value (here a random variable)
|
|
value = np.random.random()
|
|
self.logger.record('random_value', value)
|
|
return True
|
|
|
|
|
|
model.learn(50000, callback=TensorboardCallback())
|
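In TensorBoard (and in the stdout tables), logged values are grouped into sections by the prefix before the first ``/`` in the tag, e.g. ``train/loss`` appears under a ``train`` section. The grouping behaviour can be sketched in plain Python (a toy stand-in for illustration, not the SB3 ``Logger`` API itself):

```python
from collections import defaultdict


def group_by_tag(records: dict) -> dict:
    """Group logged values into sections by the prefix before the first '/'."""
    sections = defaultdict(dict)
    for tag, value in records.items():
        section, _, name = tag.partition("/")
        if not name:  # no '/' in the tag: place it in an unnamed default section
            section, name = "", tag
        sections[section][name] = value
    return dict(sections)
```

For example, recording under a tag such as ``'custom/random_value'`` instead of ``'random_value'`` would gather your custom values under their own section.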