mirror of
https://github.com/saymrwulf/stable-baselines3.git
synced 2026-05-17 21:20:11 +00:00
* Fix failing set_env test * Fix test failiing due to deprectation of env.seed * Adjust mean reward threshold in failing test * Fix her test failing due to rng * Change seed and revert reward threshold to 90 * Pin gym version * Make VecEnv compatible with gym seeding change * Revert change to VecEnv reset signature * Change subprocenv seed cmd to call reset instead * Fix type check * Add backward compat * Add `compat_gym_seed` helper * Add goal env checks in env_checker * Add docs on HER requirements for envs * Capture user warning in test with inverted box space * Update ale-py version * Fix randint * Allow noop_max to be zero * Update changelog * Update docker image * Update doc conda env and dockerfile * Custom envs should not have any warnings * Fix test for numpy >= 1.21 * Add check for vectorized compute reward * Bump to gym 0.24 * Fix gym default step docstring * Test downgrading gym * Revert "Test downgrading gym" This reverts commit 0072b77156c006ada8a1d6e26ce347ed85a83eeb. * Fix protobuf error * Fix in dependencies * Fix protobuf dep * Use newest version of cartpole * Update gym * Fix warning * Loosen required scipy version * Scipy no longer needed * Try gym 0.25 * Silence warnings from gym * Filter warnings during tests * Update doc * Update requirements * Add gym 26 compat in vec env * Fixes in envs and tests for gym 0.26+ * Enforce gym 0.26 api * format * Fix formatting * Fix dependencies * Fix syntax * Cleanup doc and warnings * Faster tests * Higher budget for HER perf test (revert prev change) * Fixes and update doc * Fix doc build * Fix breaking change * Fixes for rendering * Rename variables in monitor * update render method for gym 0.26 API backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation) * update tests and docs to new gym render API * undo removal of render modes metatadata check * set rgb_array as default render mode for gym.make * undo changes & raise warning if 
not 'rgb_array' * Fix type check * Remove recursion and fix type checking * Remove hacks for protobuf and gym 0.24 * Fix type annotations * reuse existing render_mode attribute * return tiled images for 'human' render mode * Allow to use opencv for human render, fix typos * Add warning when using non-zero start with Discrete (fixes #1197) * Fix type checking * Bug fixes and handle more cases * Throw proper warnings * Update test * Fix new metadata name * Ignore numpy warnings * Fixes in vec recorder * Global ignore * Filter local warning too * Monkey patch not needed for gym 26 * Add doc of VecEnv vs Gym API * Add render test * Fix return type * Update VecEnv vs Gym API doc * Fix for custom render mode * Fix return type * Fix type checking * check test env test_buffer * skip render check * check env test_dict_env * test_env test_gae * check envs in remaining tests * Update tests * Add warning for Discrete action space with non-zero (#1295) * Fix atari annotation * ignore get_action_meanings [attr-defined] * Fix mypy issues * Add patch for gym/gymnasium transition * Switch to gymnasium * Rely on signature instead of version * More patches * Type ignore because of https://github.com/Farama-Foundation/Gymnasium/pull/39 * Fix doc build * Fix pytype errors * Fix atari requirement * Update env checker due to change in dtype for Discrete * Fix type hint * Convert spaces for saved models * Ignore pytype * Remove gitlab CI * Disable pytype for convert space * Fix undefined info * Fix undefined info * Upgrade shimmy * Fix wrappers type annotation (need PR from Gymnasium) * Fix gymnasium dependency * Fix dependency declaration * Cap pygame version for python 3.7 * Point to master branch (v0.28.0) * Fix: use main not master branch * Rename done to terminated * Fix pygame dependency for python 3.7 * Rename gym to gymnasium * Update Gymnasium * Fix test * Fix tests * Forks don't have access to private variables * Fix linter warnings * Update read the doc env * Fix env checker 
for GoalEnv * Fix import * Update env checker (more info) and fix dtype * Use micromamab for Docker * Update dependencies * Clarify VecEnv doc * Fix Gymnasium version * Copy file only after mamba install * [ci skip] Update docker doc * Polish code * Reformat * Remove deprecated features * Ignore warning * Update doc * Update examples and changelog * Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436) * Fix SAC type hints, improve DQN ones * Fix A2C and TD3 type hints * Fix PPO type hints * Fix on-policy type hints * Fix base class type annotation, do not use defaults * Update version * Disable mypy for python 3.7 * Rename Gym26StepReturn * Update continuous critic type annotation * Fix pytype complain --------- Co-authored-by: Carlos Luis <carlos.luisgonc@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Thomas Lips <37955681+tlpss@users.noreply.github.com> Co-authored-by: tlips <thomas.lips@ugent.be> Co-authored-by: tlpss <thomas17.lips@gmail.com> Co-authored-by: Quentin GALLOUÉDEC <gallouedec.quentin@gmail.com>
214 lines
8.3 KiB
ReStructuredText
.. _vec_env:

.. automodule:: stable_baselines3.common.vec_env

Vectorized Environments
=======================

Vectorized Environments are a method for stacking multiple independent environments into a single environment.
Instead of training an RL agent on 1 environment per step, it allows us to train it on ``n`` environments per step.
Because of this, ``actions`` passed to the environment are now a vector (of dimension ``n``).
It is the same for ``observations``, ``rewards`` and end of episode signals (``dones``).
In the case of non-array observation spaces such as ``Dict`` or ``Tuple``, where different sub-spaces
may have different shapes, the sub-observations are vectors (of dimension ``n``).
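
To make the batching concrete, here is a minimal, dependency-free sketch of the idea. It is not SB3's implementation: ``CounterEnv`` and ``ToyVecEnv`` are hypothetical names invented for this illustration, plain Python lists stand in for the numpy arrays a real ``VecEnv`` returns, and the automatic reset at the end of an episode is left out.

```python
from typing import Any, Callable, Dict, List, Tuple


class CounterEnv:
    """Hypothetical single environment: the observation is a step counter."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        reward = float(action)
        done = self.t >= self.horizon
        return self.t, reward, done, {}


class ToyVecEnv:
    """Stack n independent environments behind one batched interface."""

    def __init__(self, env_fns: List[Callable[[], CounterEnv]]) -> None:
        self.envs = [fn() for fn in env_fns]
        self.num_envs = len(self.envs)

    def reset(self) -> List[int]:
        # One observation per environment.
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]):
        # One action per environment in, one result per environment out.
        assert len(actions) == self.num_envs
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones, infos = (list(x) for x in zip(*results))
        return obs, rewards, dones, infos


vec_env = ToyVecEnv([lambda: CounterEnv(horizon=3) for _ in range(4)])
obs = vec_env.reset()
obs, rewards, dones, infos = vec_env.step([0, 1, 0, 1])
```

Every returned value has a leading dimension of ``n`` (here 4), which is the property the rest of this page relies on.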

============= ======= ============ ======== ========= ================
Name          ``Box`` ``Discrete`` ``Dict`` ``Tuple`` Multi Processing
============= ======= ============ ======== ========= ================
DummyVecEnv   ✔️      ✔️           ✔️       ✔️        ❌️
SubprocVecEnv ✔️      ✔️           ✔️       ✔️        ✔️
============= ======= ============ ======== ========= ================


.. note::

    Vectorized environments are required when using wrappers for frame-stacking or normalization.

.. note::

    When using vectorized environments, the environments are automatically reset at the end of each episode.
    Thus, the observation returned for the i-th environment when ``dones[i]`` is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated.
    You can access the "real" final observation of the terminated episode (that is, the one that accompanied the ``done`` event provided by the underlying environment) using the ``terminal_observation`` keys in the info dicts returned by the ``VecEnv``.
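
The automatic reset described in the note can be sketched as follows. This is a simplified illustration under stated assumptions, not the SB3 source: ``ShortEpisodeEnv`` and ``step_with_auto_reset`` are hypothetical names, and lists are used instead of numpy arrays.

```python
from typing import Any, Dict, List, Tuple


class ShortEpisodeEnv:
    """Hypothetical env: the observation counts steps, the episode ends at `horizon`."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


def step_with_auto_reset(envs: List[ShortEpisodeEnv], actions: List[int]):
    """Step every env; when one is done, stash its final obs and return the reset obs."""
    obs_batch, rewards, dones, infos = [], [], [], []
    for env, action in zip(envs, actions):
        obs, reward, done, info = env.step(action)
        if done:
            # Keep the true last observation before it is replaced.
            info["terminal_observation"] = obs
            obs = env.reset()
        obs_batch.append(obs)
        rewards.append(reward)
        dones.append(done)
        infos.append(info)
    return obs_batch, rewards, dones, infos


envs = [ShortEpisodeEnv(horizon=2), ShortEpisodeEnv(horizon=3)]
for env in envs:
    env.reset()

step_with_auto_reset(envs, [0, 0])  # first step: no episode has ended yet
obs, rewards, dones, infos = step_with_auto_reset(envs, [0, 0])
final_obs_env0 = infos[0]["terminal_observation"]
```

After the second call, the first environment has finished: ``obs[0]`` is already the first observation of its next episode, while ``final_obs_env0`` holds the observation that ended the previous one.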

.. warning::

    When defining a custom ``VecEnv`` (for instance, using gym3 ``ProcgenEnv``), you should provide ``terminal_observation`` keys in the info dicts returned by the ``VecEnv``
    (cf. note above).

.. warning::

    When using ``SubprocVecEnv``, users must wrap the code in an ``if __name__ == "__main__":`` block when using the ``forkserver`` or ``spawn`` start method (default on Windows).
    On Linux, the default start method is ``fork``, which is not thread safe and can create deadlocks.

    For more information, see Python's `multiprocessing guidelines <https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods>`_.

VecEnv API vs Gym API
---------------------

For consistency across Stable-Baselines3 (SB3) versions and because of its special requirements and features,
the SB3 VecEnv API is not the same as the Gym API.
The SB3 VecEnv API is close to the Gym 0.21 API but differs from the Gym 0.26+ API:

- the ``reset()`` method only returns the observation (``obs = vec_env.reset()``) and not a tuple; the reset info is stored in ``vec_env.reset_infos``.

- only the initial call to ``vec_env.reset()`` is required; environments are reset automatically afterward (and ``reset_infos`` is updated automatically).

- the ``vec_env.step(actions)`` method expects an array as input
  (with a batch size corresponding to the number of environments) and returns a 4-tuple (and not a 5-tuple): ``obs, rewards, dones, infos`` instead of ``obs, reward, terminated, truncated, info``
  where ``dones = terminated or truncated`` (for each env).
  ``obs, rewards, dones`` are numpy arrays with shape ``(n_envs, shape_for_single_env)`` (so with a batch dimension).
  Additional information is passed via the ``infos`` value which is a list of dictionaries.

- at the end of an episode, ``infos[env_idx]["TimeLimit.truncated"] = truncated and not terminated``
  tells the user if an episode was truncated or not:
  you should bootstrap if ``infos[env_idx]["TimeLimit.truncated"] is True`` (episode over due to a timeout/truncation)
  or ``dones[env_idx] is False`` (episode not finished).
  Note: compared to Gym 0.26+, ``infos[env_idx]["TimeLimit.truncated"]`` and ``terminated`` `are mutually exclusive <https://github.com/openai/gym/issues/3102>`_.
  The conversion from SB3 to Gym API is

  .. code-block:: python

    # dones[env_idx] is True at the end of an episode
    # dones[env_idx] = terminated[env_idx] or truncated[env_idx]
    # In SB3, truncated and terminated are mutually exclusive
    # infos[env_idx]["TimeLimit.truncated"] = truncated and not terminated
    # terminated[env_idx] tells you whether you should bootstrap or not:
    # bootstrap when the episode has not ended or when it ended because of a timeout/truncation
    terminated[env_idx] = dones[env_idx] and not infos[env_idx]["TimeLimit.truncated"]
    should_bootstrap[env_idx] = not terminated[env_idx]

- at the end of an episode, because the environment resets automatically,
  we provide ``infos[env_idx]["terminal_observation"]`` which contains the last observation
  of an episode (and can be used when bootstrapping, see note in the previous section)

- to overcome the current Gymnasium limitation (only one render mode allowed per env instance, see `issue #100 <https://github.com/Farama-Foundation/Gymnasium/issues/100>`_),
  we recommend using ``render_mode="rgb_array"`` since we can both have the image as a numpy array and display it with OpenCV.
  If no mode is passed or ``mode="rgb_array"`` is passed when calling ``vec_env.render``, then we use the default mode; otherwise, we use the OpenCV display.
  Note that if ``render_mode != "rgb_array"``, you can only call ``vec_env.render()`` (without argument or with ``mode=env.render_mode``).

- the ``reset()`` method doesn't take any parameter. If you want to seed the pseudo-random generator,
  you should call ``vec_env.seed(seed=seed)`` and ``obs = vec_env.reset()`` afterward.

- methods and attributes of the underlying Gym envs can be accessed, called and set using ``vec_env.get_attr("attribute_name")``,
  ``vec_env.env_method("method_name", args1, args2, kwargs1=kwargs1)`` and ``vec_env.set_attr("attribute_name", new_value)``.
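
The relationship between the two sets of end-of-episode signals listed above can be checked with a small, dependency-free sketch. The helper names ``gym_to_vec_signals``, ``vec_to_gym_terminated`` and ``should_bootstrap`` are hypothetical and exist only for this illustration; they operate on a single environment's signals rather than on batched arrays.

```python
from typing import Any, Dict, Tuple


def gym_to_vec_signals(
    terminated: bool, truncated: bool, info: Dict[str, Any]
) -> Tuple[bool, Dict[str, Any]]:
    """Collapse Gym 0.26+ (terminated, truncated) into a VecEnv-style done + info key."""
    done = terminated or truncated
    info = dict(info)  # copy so the caller's dict is left untouched
    info["TimeLimit.truncated"] = truncated and not terminated
    return done, info


def vec_to_gym_terminated(done: bool, info: Dict[str, Any]) -> bool:
    """Recover `terminated` from the VecEnv-style signals."""
    return done and not info.get("TimeLimit.truncated", False)


def should_bootstrap(done: bool, info: Dict[str, Any]) -> bool:
    """Bootstrap unless the episode truly terminated."""
    return not vec_to_gym_terminated(done, info)


# Enumerate the four possible (terminated, truncated) combinations.
table = {}
for terminated in (False, True):
    for truncated in (False, True):
        done, info = gym_to_vec_signals(terminated, truncated, {})
        table[(terminated, truncated)] = (done, should_bootstrap(done, info))
```

Each entry of ``table`` maps ``(terminated, truncated)`` to ``(done, should_bootstrap)``. Note the case where both flags are true: termination takes precedence, so no bootstrapping happens.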

Vectorized Environments Wrappers
--------------------------------

If you want to alter or augment a ``VecEnv`` without redefining it completely (e.g. stack multiple frames, monitor the ``VecEnv``, normalize the observation, ...), you can use a ``VecEnvWrapper``.
These wrappers are the vectorized equivalents (i.e., they act on multiple environments at the same time) of ``gym.Wrapper``.

You can find below an example for extracting one key from the observation:

.. code-block:: python

    import gymnasium as gym
    import numpy as np

    from stable_baselines3.common.vec_env import DummyVecEnv
    from stable_baselines3.common.vec_env.base_vec_env import VecEnv, VecEnvStepReturn, VecEnvWrapper


    class VecExtractDictObs(VecEnvWrapper):
        """
        A vectorized wrapper for filtering a specific key from dictionary observations.
        Similar to Gym's FilterObservation wrapper:
            https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py

        :param venv: The vectorized environment
        :param key: The key of the dictionary observation
        """

        def __init__(self, venv: VecEnv, key: str):
            self.key = key
            # Expose only the selected sub-space as the observation space
            super().__init__(venv=venv, observation_space=venv.observation_space.spaces[self.key])

        def reset(self) -> np.ndarray:
            obs = self.venv.reset()
            return obs[self.key]

        def step_async(self, actions: np.ndarray) -> None:
            self.venv.step_async(actions)

        def step_wait(self) -> VecEnvStepReturn:
            obs, reward, done, info = self.venv.step_wait()
            return obs[self.key], reward, done, info

    # Assumes "FetchReach-v1" is registered in your installation
    # and has a Dict observation space with an "observation" key
    env = DummyVecEnv([lambda: gym.make("FetchReach-v1")])
    # Wrap the VecEnv
    env = VecExtractDictObs(env, key="observation")
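
The delegation pattern used by such a wrapper can also be exercised without SB3 or a robotics environment installed. The sketch below is an assumption-laden stand-in: ``StubVecEnv`` and ``ExtractKey`` are hypothetical classes written for this illustration, they do not inherit from ``VecEnvWrapper``, and lists replace numpy arrays.

```python
from typing import Any, Dict, List, Tuple


class StubVecEnv:
    """Hypothetical stand-in for a VecEnv with dict observations (2 envs)."""

    num_envs = 2

    def reset(self) -> Dict[str, List[List[float]]]:
        return {"observation": [[0.0], [0.0]], "desired_goal": [[1.0], [1.0]]}

    def step_async(self, actions: List[int]) -> None:
        # Store the actions; the work happens in step_wait().
        self._actions = actions

    def step_wait(self):
        obs = {
            "observation": [[float(a)] for a in self._actions],
            "desired_goal": [[1.0], [1.0]],
        }
        rewards = [0.0] * self.num_envs
        dones = [False] * self.num_envs
        infos: List[Dict[str, Any]] = [{} for _ in range(self.num_envs)]
        return obs, rewards, dones, infos


class ExtractKey:
    """Forward every call to the wrapped env and keep one key of the observation."""

    def __init__(self, venv: StubVecEnv, key: str) -> None:
        self.venv = venv
        self.key = key

    def reset(self) -> List[List[float]]:
        return self.venv.reset()[self.key]

    def step_async(self, actions: List[int]) -> None:
        self.venv.step_async(actions)

    def step_wait(self) -> Tuple[List[List[float]], List[float], List[bool], List[Dict[str, Any]]]:
        obs, rewards, dones, infos = self.venv.step_wait()
        return obs[self.key], rewards, dones, infos

    def step(self, actions: List[int]):
        self.step_async(actions)
        return self.step_wait()


wrapped = ExtractKey(StubVecEnv(), key="observation")
first_obs = wrapped.reset()
obs, rewards, dones, infos = wrapped.step([3, 5])
```

Only the observation is transformed; rewards, dones and infos pass through unchanged, which is why a wrapper of this kind only needs to override ``reset`` and ``step_wait``.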

VecEnv
------

.. autoclass:: VecEnv
  :members:

DummyVecEnv
-----------

.. autoclass:: DummyVecEnv
  :members:

SubprocVecEnv
-------------

.. autoclass:: SubprocVecEnv
  :members:

Wrappers
--------

VecFrameStack
~~~~~~~~~~~~~

.. autoclass:: VecFrameStack
  :members:

StackedObservations
~~~~~~~~~~~~~~~~~~~

.. autoclass:: stable_baselines3.common.vec_env.stacked_observations.StackedObservations
  :members:

VecNormalize
~~~~~~~~~~~~

.. autoclass:: VecNormalize
  :members:

VecVideoRecorder
~~~~~~~~~~~~~~~~

.. autoclass:: VecVideoRecorder
  :members:

VecCheckNan
~~~~~~~~~~~

.. autoclass:: VecCheckNan
  :members:

VecTransposeImage
~~~~~~~~~~~~~~~~~

.. autoclass:: VecTransposeImage
  :members:

VecMonitor
~~~~~~~~~~

.. autoclass:: VecMonitor
  :members:

VecExtractDictObs
~~~~~~~~~~~~~~~~~

.. autoclass:: VecExtractDictObs
  :members: