.. _distributions:

Probability Distributions
=========================

Probability distributions used for the different action spaces:

- ``CategoricalDistribution`` -> Discrete
- ``DiagGaussianDistribution`` -> Box (continuous actions)
- ``StateDependentNoiseDistribution`` -> Box (continuous actions) when ``use_sde=True``

.. - ``MultiCategoricalDistribution`` -> MultiDiscrete
.. - ``BernoulliDistribution`` -> MultiBinary

The policy networks output parameters for the distributions (named ``flat`` in the methods).
Actions are then sampled from those distributions.

For instance, in the case of discrete actions, the policy network outputs the probability
of taking each action. The ``CategoricalDistribution`` allows sampling from it,
computing the entropy and the log probability (``log_prob``), and backpropagating the gradient.
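
The quantities mentioned above can be sketched in plain Python. This is only an
illustration of the math, not the code used by ``CategoricalDistribution``: the helper
names and the example ``logits`` are hypothetical, and it assumes the network outputs
unnormalized scores (logits) that a softmax turns into probabilities. Gradient
backpropagation is left out, since it requires an autodiff library.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_sample(probs, rng):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def categorical_log_prob(probs, action):
    """Log probability of a single discrete action."""
    return math.log(probs[action])


def categorical_entropy(probs):
    """Shannon entropy (in nats) of the categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical network output for an environment with 3 discrete actions
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

rng = random.Random(0)  # seeded for reproducibility
action = categorical_sample(probs, rng)
log_prob = categorical_log_prob(probs, action)
entropy = categorical_entropy(probs)
```

The entropy is largest when all actions are equally likely and shrinks as the policy
becomes more deterministic.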

In the case of continuous actions, a Gaussian distribution is used. The policy network outputs
the mean and the (log) standard deviation of the distribution (assumed to be a ``DiagGaussianDistribution``).
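
As a rough sketch of what a diagonal Gaussian computes from a mean and a log standard
deviation, the plain-Python example below samples an action and evaluates its log
probability and the entropy. The function names and the example values are hypothetical
and this is not the implementation of ``DiagGaussianDistribution``. It assumes independent
action dimensions, so the per-dimension terms are summed.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def diag_gaussian_sample(mean, log_std, rng):
    """Sample each action dimension independently from N(mean, std^2)."""
    return [rng.gauss(m, math.exp(ls)) for m, ls in zip(mean, log_std)]


def diag_gaussian_log_prob(mean, log_std, action):
    """Log density of an action, summed over the action dimensions."""
    total = 0.0
    for m, ls, a in zip(mean, log_std, action):
        z = (a - m) / math.exp(ls)  # standardized distance to the mean
        total += -0.5 * z * z - ls - LOG_SQRT_2PI
    return total


def diag_gaussian_entropy(log_std):
    """Differential entropy (in nats), summed over the action dimensions."""
    return sum(0.5 + LOG_SQRT_2PI + ls for ls in log_std)


# Hypothetical policy output for a 2-dimensional Box action space
mean = [0.0, 1.0]
log_std = [0.0, -0.5]

rng = random.Random(0)  # seeded for reproducibility
action = diag_gaussian_sample(mean, log_std, rng)
log_prob = diag_gaussian_log_prob(mean, log_std, action)
entropy = diag_gaussian_entropy(log_std)
```

Working with the log of the standard deviation keeps the standard deviation positive
for any real-valued parameter, which is one reason to parameterize the distribution this way.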

.. automodule:: stable_baselines3.common.distributions
  :members: