stable-baselines3/docs/common/distributions.rst

2020-05-07 08:10:51 +00:00
.. _distributions:

Probability Distributions
=========================

Probability distributions used for the different action spaces:

- ``CategoricalDistribution`` -> Discrete
- ``DiagGaussianDistribution`` -> Box (continuous actions)
- ``StateDependentNoiseDistribution`` -> Box (continuous actions) when ``use_sde=True``

.. - ``MultiCategoricalDistribution`` -> MultiDiscrete
.. - ``BernoulliDistribution`` -> MultiBinary

The policy networks output parameters for the distributions (named ``flat`` in the methods).
Actions are then sampled from those distributions.

For instance, in the case of discrete actions, the policy network outputs the probability
of taking each action. The ``CategoricalDistribution`` allows sampling from it,
computing the entropy and the log probability (``log_prob``), and backpropagating the gradient.
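
The three operations named above can be sketched in plain Python. This is only an
illustration of the underlying math, not the library's implementation: the helper names
(``log_softmax``, ``categorical_sample``, ``categorical_log_prob``, ``categorical_entropy``)
are hypothetical, the example assumes the network emits unnormalized scores (logits) that
are normalized into probabilities, and it leaves out gradient backpropagation entirely.

.. code-block:: python

    import math
    import random


    def log_softmax(logits):
        """Turn unnormalized scores into log-probabilities (numerically stable)."""
        max_logit = max(logits)
        log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        return [x - log_norm for x in logits]


    def categorical_sample(logits, rng=random):
        """Draw one action index with probability softmax(logits)."""
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


    def categorical_log_prob(logits, action):
        """Log-probability of the given action index."""
        return log_softmax(logits)[action]


    def categorical_entropy(logits):
        """Entropy H = -sum_i p_i * log(p_i)."""
        log_probs = log_softmax(logits)
        return -sum(math.exp(lp) * lp for lp in log_probs)


    # Hypothetical network output for a Discrete(3) action space (assumed values)
    logits = [2.0, 0.5, -1.0]
    action = categorical_sample(logits)
    print(action, categorical_log_prob(logits, action), categorical_entropy(logits))

The entropy is largest when all actions are equally likely, which is why it is commonly
used as an exploration bonus in policy-gradient losses.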

In the case of continuous actions, a Gaussian distribution is used. The policy network outputs
the mean and (log) standard deviation of the distribution (assumed to be a ``DiagGaussianDistribution``).
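
A diagonal Gaussian treats each action dimension as an independent one-dimensional
Gaussian, so the joint log probability is the sum of the per-dimension log densities.
The sketch below illustrates this in plain Python; the function names and the example
values of ``mean`` and ``log_std`` are hypothetical and do not reflect the library's code.
Working with the log of the standard deviation keeps the standard deviation positive
after exponentiation, whatever value the parameter takes.

.. code-block:: python

    import math
    import random


    def gaussian_sample(mean, log_std, rng=random):
        """Sample each dimension independently: a_i = mean_i + std_i * eps_i."""
        return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]


    def gaussian_log_prob(mean, log_std, action):
        """Sum of per-dimension Gaussian log-densities (diagonal covariance)."""
        total = 0.0
        for m, ls, a in zip(mean, log_std, action):
            z = (a - m) / math.exp(ls)  # standardized distance from the mean
            total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        return total


    def gaussian_entropy(log_std):
        """Entropy of a diagonal Gaussian: sum_i (0.5 * log(2 * pi * e) + log_std_i)."""
        return sum(0.5 * math.log(2.0 * math.pi * math.e) + ls for ls in log_std)


    # Hypothetical parameters for a 2-dimensional Box action space (assumed values)
    mean = [0.1, -0.3]
    log_std = [0.0, -0.5]
    action = gaussian_sample(mean, log_std)
    print(action, gaussian_log_prob(mean, log_std, action), gaussian_entropy(log_std))

Note that the entropy depends only on ``log_std``, not on the mean.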

.. automodule:: stable_baselines3.common.distributions
  :members: