.. _custom_env:

Using Custom Environments
==========================

To use the RL baselines with a custom environment, the environment just needs to follow the *gym* interface.
That is to say, it must inherit from the ``gym.Env`` class and implement the methods shown in the template below:


.. note::

  If you are using images as input, the observation must be of type ``np.uint8`` and be contained in [0, 255].
  By default, the observation is normalized by SB3 pre-processing (dividing by 255 to have values in [0, 1]) when using CNN policies.
  Images can be either channel-first or channel-last.

  If you want to use ``CnnPolicy`` or ``MultiInputPolicy`` with image-like observations (3D tensors) that are already normalized, you must pass ``normalize_images=False``
  to the policy (using the ``policy_kwargs`` parameter, ``policy_kwargs=dict(normalize_images=False)``)
  and make sure your image is in the **channel-first** format.


.. note::

  Although SB3 supports both channel-last and channel-first images as input, we recommend using the channel-first convention when possible.
  Under the hood, when a channel-last image is passed, SB3 uses a ``VecTransposeImage`` wrapper to re-order the channels.
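
The two image operations mentioned in the notes above (re-ordering channel-last to channel-first, and dividing by 255) can be illustrated with plain NumPy.
This is only a sketch of the array manipulations, not SB3's actual implementation; the image size and variable names are arbitrary.

.. code-block:: python

  import numpy as np

  # Hypothetical channel-last image with shape (HEIGHT, WIDTH, N_CHANNELS)
  height, width, n_channels = 36, 36, 3
  image_hwc = np.random.randint(0, 256, size=(height, width, n_channels), dtype=np.uint8)

  # Re-order the axes to channel-first: (N_CHANNELS, HEIGHT, WIDTH)
  image_chw = np.transpose(image_hwc, (2, 0, 1))

  # Scale to [0, 1], mirroring the default "divide by 255" pre-processing
  normalized = image_chw.astype(np.float32) / 255.0

  print(image_chw.shape)  # (3, 36, 36)
  print(normalized.dtype)  # float32
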


.. code-block:: python

  import gym
  import numpy as np
  from gym import spaces


  class CustomEnv(gym.Env):
      """Custom Environment that follows gym interface"""

      metadata = {"render.modes": ["human"]}

      def __init__(self, arg1, arg2, ...):
          super().__init__()
          # Define action and observation space
          # They must be gym.spaces objects
          # Example when using discrete actions
          # (N_DISCRETE_ACTIONS is a placeholder to define yourself):
          self.action_space = spaces.Discrete(N_DISCRETE_ACTIONS)
          # Example for using image as input (channel-first; channel-last also works);
          # N_CHANNELS, HEIGHT and WIDTH are placeholders as well:
          self.observation_space = spaces.Box(low=0, high=255,
                                              shape=(N_CHANNELS, HEIGHT, WIDTH), dtype=np.uint8)

      def step(self, action):
          ...
          return observation, reward, done, info

      def reset(self):
          ...
          return observation  # reward, done, info can't be included

      def render(self, mode="human"):
          ...

      def close(self):
          ...


Then you can define and train an RL agent with:

.. code-block:: python

  from stable_baselines3 import A2C

  # Instantiate the env
  env = CustomEnv(arg1, ...)
  # Define and Train the agent
  model = A2C("CnnPolicy", env).learn(total_timesteps=1000)
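
As a concrete instance of the template above, here is a minimal sketch of a toy environment.
``GoLeftEnv``, its ``grid_size`` parameter and its reward scheme are hypothetical and chosen only for illustration.
It follows the same Gym API as the template: ``reset()`` returns only the observation and ``step()`` returns a 4-tuple.

.. code-block:: python

  import gym
  import numpy as np
  from gym import spaces


  class GoLeftEnv(gym.Env):
      """Hypothetical toy env: the agent walks on a 1D grid and must reach cell 0."""

      metadata = {"render.modes": ["human"]}
      LEFT = 0
      RIGHT = 1

      def __init__(self, grid_size=10):
          super().__init__()
          self.grid_size = grid_size
          self.agent_pos = grid_size - 1
          # Two discrete actions: go left or go right
          self.action_space = spaces.Discrete(2)
          # The observation is the agent position, stored as a 1D float array
          self.observation_space = spaces.Box(low=0, high=grid_size, shape=(1,), dtype=np.float32)

      def reset(self):
          # Start each episode at the right end of the grid
          self.agent_pos = self.grid_size - 1
          return np.array([self.agent_pos], dtype=np.float32)

      def step(self, action):
          if action == self.LEFT:
              self.agent_pos -= 1
          elif action == self.RIGHT:
              self.agent_pos += 1
          else:
              raise ValueError(f"Invalid action: {action}")
          # Keep the agent inside the grid
          self.agent_pos = int(np.clip(self.agent_pos, 0, self.grid_size - 1))
          done = self.agent_pos == 0
          reward = 1.0 if done else 0.0
          observation = np.array([self.agent_pos], dtype=np.float32)
          return observation, reward, done, {}

      def render(self, mode="human"):
          # The agent is drawn as "x", the other cells as "."
          print("." * self.agent_pos + "x" + "." * (self.grid_size - 1 - self.agent_pos))

      def close(self):
          pass


  # Manual rollout: always going left reaches the goal
  env = GoLeftEnv(grid_size=5)
  obs = env.reset()
  done = False
  n_steps = 0
  while not done:
      obs, reward, done, info = env.step(GoLeftEnv.LEFT)
      n_steps += 1
  print(n_steps, reward)  # 4 1.0

Stepping through the environment by hand like this is a quick way to catch shape or dtype mistakes before training.
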


To check that your environment follows the Gym interface that SB3 supports, please use:

.. code-block:: python

	from stable_baselines3.common.env_checker import check_env

	env = CustomEnv(arg1, ...)
	# It will check your custom environment and output additional warnings if needed
	check_env(env)

Gym also has its own `env checker <https://www.gymlibrary.ml/content/api/#checking-api-conformity>`_, but it checks a superset of what SB3 supports (SB3 does not support all Gym features).

We have created a `colab notebook <https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/master/5_custom_gym_env.ipynb>`_ with a concrete example of creating a custom environment and using it with the Stable-Baselines3 interface.

Alternatively, you may look at OpenAI Gym `built-in environments <https://www.gymlibrary.ml/>`_. However, as per the OpenAI Gym `official wiki <https://github.com/openai/gym/wiki/FAQ>`_, it is advised not to customize the built-in environments. It is better to copy them and create new ones if you need to modify them.

Optionally, you can also register the environment with gym, which will allow you to create the RL agent in one line (and use ``gym.make()`` to instantiate the env):

.. code-block:: python

	from gym.envs.registration import register
	# Example for the CartPole environment
	register(
	    # unique identifier for the env `name-version`
	    id="CartPole-v1",
	    # path to the class for creating the env
	    # Note: entry_point also accepts a class as input (and not only a string)
	    entry_point="gym.envs.classic_control:CartPoleEnv",
	    # Max number of steps per episode, using a `TimeLimit` wrapper
	    max_episode_steps=500,
	)


In the project, for testing purposes, we use a custom environment named ``IdentityEnv``
defined `in this file <https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/envs/identity_env.py>`_.
An example of how to use it can be found `here <https://github.com/DLR-RM/stable-baselines3/blob/master/tests/test_identity.py>`_.