diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000..137c957 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,128 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +We as members, contributors, and leaders pledge to make participation in our +community a harassment-free experience for everyone, regardless of age, body +size, visible or invisible disability, ethnicity, sex characteristics, gender +identity and expression, level of experience, education, socio-economic status, +nationality, personal appearance, race, religion, or sexual identity +and orientation. + +We pledge to act and interact in ways that contribute to an open, welcoming, +diverse, inclusive, and healthy community. + +## Our Standards + +Examples of behavior that contributes to a positive environment for our +community include: + +* Demonstrating empathy and kindness toward other people +* Being respectful of differing opinions, viewpoints, and experiences +* Giving and gracefully accepting constructive feedback +* Accepting responsibility and apologizing to those affected by our mistakes, + and learning from the experience +* Focusing on what is best not just for us as individuals, but for the + overall community + +Examples of unacceptable behavior include: + +* The use of sexualized language or imagery, and sexual attention or + advances of any kind +* Trolling, insulting or derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or email + address, without their explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Enforcement Responsibilities + +Community leaders are responsible for clarifying and enforcing our standards of +acceptable behavior and will take appropriate and fair corrective action in +response to any behavior that they deem inappropriate, threatening, offensive, +or 
harmful. + +Community leaders have the right and responsibility to remove, edit, or reject +comments, commits, code, wiki edits, issues, and other contributions that are +not aligned to this Code of Conduct, and will communicate reasons for moderation +decisions when appropriate. + +## Scope + +This Code of Conduct applies within all community spaces, and also applies when +an individual is officially representing the community in public spaces. +Examples of representing our community include using an official e-mail address, +posting via an official social media account, or acting as an appointed +representative at an online or offline event. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported to the community leaders responsible for enforcement at +antonin [dot] raffin [at] dlr [dot] de. +All complaints will be reviewed and investigated promptly and fairly. + +All community leaders are obligated to respect the privacy and security of the +reporter of any incident. + +## Enforcement Guidelines + +Community leaders will follow these Community Impact Guidelines in determining +the consequences for any action they deem in violation of this Code of Conduct: + +### 1. Correction + +**Community Impact**: Use of inappropriate language or other behavior deemed +unprofessional or unwelcome in the community. + +**Consequence**: A private, written warning from community leaders, providing +clarity around the nature of the violation and an explanation of why the +behavior was inappropriate. A public apology may be requested. + +### 2. Warning + +**Community Impact**: A violation through a single incident or series +of actions. + +**Consequence**: A warning with consequences for continued behavior. No +interaction with the people involved, including unsolicited interaction with +those enforcing the Code of Conduct, for a specified period of time. 
This +includes avoiding interactions in community spaces as well as external channels +like social media. Violating these terms may lead to a temporary or +permanent ban. + +### 3. Temporary Ban + +**Community Impact**: A serious violation of community standards, including +sustained inappropriate behavior. + +**Consequence**: A temporary ban from any sort of interaction or public +communication with the community for a specified period of time. No public or +private interaction with the people involved, including unsolicited interaction +with those enforcing the Code of Conduct, is allowed during this period. +Violating these terms may lead to a permanent ban. + +### 4. Permanent Ban + +**Community Impact**: Demonstrating a pattern of violation of community +standards, including sustained inappropriate behavior, harassment of an +individual, or aggression toward or disparagement of classes of individuals. + +**Consequence**: A permanent ban from any sort of public interaction within +the community. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], +version 2.0, available at +https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. + +Community Impact Guidelines were inspired by [Mozilla's code of conduct +enforcement ladder](https://github.com/mozilla/diversity). + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see the FAQ at +https://www.contributor-covenant.org/faq. Translations are available at +https://www.contributor-covenant.org/translations. diff --git a/docs/guide/callbacks.rst b/docs/guide/callbacks.rst index 6588f90..f5d9d02 100644 --- a/docs/guide/callbacks.rst +++ b/docs/guide/callbacks.rst @@ -185,6 +185,11 @@ It will save the best model if ``best_model_save_path`` folder is specified and You can pass a child callback via the ``callback_on_new_best`` argument. It will be triggered each time there is a new best model. +.. 
warning:: + + You need to make sure that ``eval_env`` is wrapped the same way as the training environment, for instance using the ``VecTransposeImage`` wrapper if you have a channel-last image as input. + The ``EvalCallback`` class outputs a warning if it is not the case. + .. code-block:: python diff --git a/docs/guide/custom_env.rst b/docs/guide/custom_env.rst index 6adf55d..cbcad96 100644 --- a/docs/guide/custom_env.rst +++ b/docs/guide/custom_env.rst @@ -13,6 +13,12 @@ That is to say, your environment must implement the following methods (and inher channel-first or channel-last. +.. note:: + + Although SB3 supports both channel-last and channel-first images as input, we recommend using the channel-first convention when possible. + Under the hood, when a channel-last image is passed, SB3 uses a ``VecTransposeImage`` wrapper to re-order the channels. + + .. code-block:: python @@ -29,9 +35,9 @@ That is to say, your environment must implement the following methods (and inher # They must be gym.spaces objects # Example when using discrete actions: self.action_space = spaces.Discrete(N_DISCRETE_ACTIONS) - # Example for using image as input (can be channel-first or channel-last): + # Example for using image as input (channel-first; channel-last also works): self.observation_space = spaces.Box(low=0, high=255, - shape=(HEIGHT, WIDTH, N_CHANNELS), dtype=np.uint8) + shape=(N_CHANNELS, HEIGHT, WIDTH), dtype=np.uint8) def step(self, action): ... 
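Note on the channel-first recommendation in the `custom_env.rst` hunk above: the sketch below shows what re-ordering a channel-last image into channel-first layout amounts to. It uses plain NumPy and is only an illustration, not SB3's `VecTransposeImage` implementation. The `to_channel_first` helper and the concrete values of `HEIGHT`, `WIDTH`, and `N_CHANNELS` are assumptions made for this example.

```python
import numpy as np

# Hypothetical dimensions, chosen only for this illustration.
HEIGHT, WIDTH, N_CHANNELS = 4, 6, 3


def to_channel_first(image: np.ndarray) -> np.ndarray:
    """Re-order a single (H, W, C) image to (C, H, W).

    Illustrative helper, not part of SB3.
    """
    return np.transpose(image, (2, 0, 1))


# Channel-last observation, shaped like the old example in the docs.
channel_last = np.zeros((HEIGHT, WIDTH, N_CHANNELS), dtype=np.uint8)
channel_last[1, 2, 0] = 255  # mark one pixel in channel 0

# Channel-first observation, shaped like the new example in the docs.
channel_first = to_channel_first(channel_last)
print(channel_first.shape)  # (3, 4, 6)
```

The marked pixel at `[row=1, col=2, channel=0]` ends up at `[channel=0, row=1, col=2]`. If the environment declares its observation space as `(N_CHANNELS, HEIGHT, WIDTH)` from the start, no such re-ordering is needed, and per the warning added to `callbacks.rst`, the training and evaluation environments then do not need a transpose wrapper to match.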
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst index a0737f2..99effa6 100644 --- a/docs/misc/changelog.rst +++ b/docs/misc/changelog.rst @@ -23,11 +23,14 @@ Deprecations: Others: ^^^^^^^ - Added ``flake8-bugbear`` to tests dependencies to find likely bugs +- Added Code of Conduct Documentation: ^^^^^^^^^^^^^^ - Added gym pybullet drones project (@JacopoPan) - Added link to SuperSuit in projects (@justinkterry) +- Fixed DQN example (thanks @ltbd78) +- Clarify channel-first/channel-last recommendation Release 1.0 (2021-03-15) @@ -637,4 +640,4 @@ And all the contributors: @tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37 @andyshih12 @RaphaelWag @xicocaio @diditforlulz273 @liorcohen5 @ManifoldFR @mloo3 @SwamyDev @wmmc88 @megan-klaiber @thisray @tfederico @hn2 @LucasAlegre @AptX395 @zampanteymedio @decodyng @ardabbour @lorenz-h @mschweizer @lorepieri8 -@ShangqunYu @PierreExeter @JacopoPan +@ShangqunYu @PierreExeter @JacopoPan @ltbd78 diff --git a/docs/modules/dqn.rst b/docs/modules/dqn.rst index 76490dd..f35788f 100644 --- a/docs/modules/dqn.rst +++ b/docs/modules/dqn.rst @@ -61,11 +61,11 @@ Example model = DQN("MlpPolicy", env, verbose=1) model.learn(total_timesteps=10000, log_interval=4) - model.save("dqn_pendulum") + model.save("dqn_cartpole") del model # remove to demonstrate saving and loading - model = DQN.load("dqn_pendulum") + model = DQN.load("dqn_cartpole") obs = env.reset() while True: