RL Algorithms
=============

This table displays the RL algorithms that are implemented in the Stable Baselines3 project,
along with some useful characteristics: support for discrete/continuous actions and multiprocessing.

=================== =========== ============ ================= =============== ================
Name                ``Box``     ``Discrete`` ``MultiDiscrete`` ``MultiBinary`` Multi Processing
=================== =========== ============ ================= =============== ================
ARS [#f1]_          ✔️          ✔️           ❌                ❌              ✔️
A2C                 ✔️          ✔️           ✔️                ✔️              ✔️
DDPG                ✔️          ❌           ❌                ❌              ✔️
DQN                 ❌          ✔️           ❌                ❌              ✔️
HER                 ✔️          ✔️           ❌                ❌              ✔️
PPO                 ✔️          ✔️           ✔️                ✔️              ✔️
QR-DQN [#f1]_       ❌          ✔️           ❌                ❌              ✔️
RecurrentPPO [#f1]_ ✔️          ✔️           ✔️                ✔️              ✔️
SAC                 ✔️          ❌           ❌                ❌              ✔️
TD3                 ✔️          ❌           ❌                ❌              ✔️
TQC [#f1]_          ✔️          ❌           ❌                ❌              ✔️
TRPO [#f1]_         ✔️          ✔️           ✔️                ✔️              ✔️
Maskable PPO [#f1]_ ❌          ✔️           ✔️                ✔️              ✔️
=================== =========== ============ ================= =============== ================

.. [#f1] Implemented in `SB3 Contrib <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib>`_
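
In practice, the action-space columns above determine which environments an algorithm accepts.
A minimal sketch with two classic control tasks:

.. code-block:: python

    from stable_baselines3 import DQN, SAC

    # DQN only supports Discrete action spaces (see the table above)
    dqn = DQN("MlpPolicy", "CartPole-v1")

    # SAC only supports Box (continuous) action spaces
    sac = SAC("MlpPolicy", "Pendulum-v1")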

.. note::

  ``Tuple`` observation spaces are not supported by any environment;
  however, single-level ``Dict`` spaces are (cf. :ref:`Examples <examples>`).
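
For instance, a single-level ``Dict`` observation space can be handled with the ``MultiInputPolicy``.
A minimal sketch using the ``SimpleMultiObsEnv`` toy environment that ships with SB3:

.. code-block:: python

    from stable_baselines3 import PPO
    from stable_baselines3.common.envs import SimpleMultiObsEnv

    # SimpleMultiObsEnv exposes a Dict observation space
    # with vector and image entries
    env = SimpleMultiObsEnv(random_start=False)
    model = PPO("MultiInputPolicy", env, verbose=1)
    model.learn(total_timesteps=1_000)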

Actions ``gym.spaces`` (see the sketch after this list):

- ``Box``: An N-dimensional box that contains every point in the action space.
- ``Discrete``: A list of possible actions, where only one of the actions can be used at each timestep.
- ``MultiDiscrete``: A list of possible actions, where only one action from each discrete set can be used at each timestep.
- ``MultiBinary``: A list of possible actions, where any of the actions can be used in any combination at each timestep.
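
A minimal sketch constructing each of these spaces with Gymnasium (the shapes and sizes are arbitrary examples):

.. code-block:: python

    import numpy as np
    from gymnasium import spaces

    # 2-dimensional continuous actions, each component in [-1, 1]
    box = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
    # exactly one of 4 actions per timestep
    discrete = spaces.Discrete(4)
    # one action from each set: 3 choices, then 2, then 2
    multi_discrete = spaces.MultiDiscrete([3, 2, 2])
    # any combination of 5 on/off switches
    multi_binary = spaces.MultiBinary(5)

    for space in (box, discrete, multi_discrete, multi_binary):
        print(space, space.sample())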

.. note::

  More algorithms (like QR-DQN or TQC) are implemented in our :ref:`contrib repo <sb3_contrib>`.

.. note::

  Some logging values (like ``ep_rew_mean``, ``ep_len_mean``) are only available when using a ``Monitor`` wrapper.
  See `Issue #339 <https://github.com/hill-a/stable-baselines/issues/339>`_ for more info.
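
A minimal sketch wrapping an environment with ``Monitor`` so that those values are logged:

.. code-block:: python

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.monitor import Monitor

    # Monitor records episode returns and lengths,
    # which feed the ep_rew_mean / ep_len_mean logs
    env = Monitor(gym.make("CartPole-v1"))
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)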

.. note::

  When using off-policy algorithms, `Time Limits <https://arxiv.org/abs/1712.00378>`_ (aka timeouts) are handled
  properly (cf. `issue #284 <https://github.com/DLR-RM/stable-baselines3/issues/284>`_).
  You can revert to the SB3 < 2.1.0 behavior by passing ``handle_timeout_termination=False``
  via the ``replay_buffer_kwargs`` argument.
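
For instance, to revert to the old behavior (a minimal sketch with SAC):

.. code-block:: python

    from stable_baselines3 import SAC

    # Treat timeout transitions like true terminations,
    # as SB3 did before version 2.1.0
    model = SAC(
        "MlpPolicy",
        "Pendulum-v1",
        replay_buffer_kwargs=dict(handle_timeout_termination=False),
    )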

Reproducibility
---------------

Completely reproducible results are not guaranteed across PyTorch releases or different platforms.
Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.

In order to make computations deterministic on your specific problem and platform,
you need to pass a ``seed`` argument when creating the model.
If you pass an environment to the model using ``set_env()``, then you also need to seed the environment first.
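
A minimal sketch of both seeding paths:

.. code-block:: python

    import gymnasium as gym
    from stable_baselines3 import PPO

    # Seeding at model creation makes training deterministic
    # on a fixed setup (same platform, same PyTorch version)
    model = PPO("MlpPolicy", "CartPole-v1", seed=42)

    # When attaching an env later via set_env(), seed the env first
    env = gym.make("CartPole-v1")
    env.reset(seed=42)
    model.set_env(env)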

Credit: part of the *Reproducibility* section comes from the `PyTorch documentation <https://pytorch.org/docs/stable/notes/randomness.html>`_.