pytorch/tools/extract_scripts.py

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

106 lines
3 KiB
Python
Raw Normal View History

#!/usr/bin/env python3
from __future__ import annotations
import argparse
import re
import sys
from pathlib import Path
from typing import Any
from typing_extensions import TypedDict # Python 3.11+
import yaml
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
Step = dict[str, Any]
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
class Script(TypedDict):
extension: str
script: str
def extract(step: Step) -> Script | None:
run = step.get("run")
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
# https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#using-a-specific-shell
shell = step.get("shell", "bash")
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
extension = {
"bash": ".sh",
"pwsh": ".ps1",
"python": ".py",
"sh": ".sh",
"cmd": ".cmd",
"powershell": ".ps1",
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
}.get(shell)
is_gh_script = step.get("uses", "").startswith("actions/github-script@")
gh_script = step.get("with", {}).get("script")
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
if run is not None and extension is not None:
script = {
"bash": f"#!/usr/bin/env bash\nset -eo pipefail\n{run}",
"sh": f"#!/usr/bin/env sh\nset -e\n{run}",
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
}.get(shell, run)
return {"extension": extension, "script": script}
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
elif is_gh_script and gh_script is not None:
return {"extension": ".js", "script": gh_script}
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
else:
return None
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--out", required=True)
args = parser.parse_args()
out = Path(args.out)
if out.exists():
sys.exit(f"{out} already exists; aborting to avoid overwriting")
gha_expressions_found = False
for p in Path(".github/workflows").iterdir():
with open(p, "rb") as f:
workflow = yaml.safe_load(f)
for job_name, job in workflow["jobs"].items():
job_dir = out / p / job_name
if "steps" not in job:
continue
steps = job["steps"]
index_chars = len(str(len(steps) - 1))
for i, step in enumerate(steps, start=1):
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
extracted = extract(step)
if extracted:
script = extracted["script"]
step_name = step.get("name", "")
if "${{" in script:
gha_expressions_found = True
print(
f"{p} job `{job_name}` step {i}: {step_name}",
file=sys.stderr,
)
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
job_dir.mkdir(parents=True, exist_ok=True)
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
sanitized = re.sub(
"[^a-zA-Z_]+",
"_",
f"_{step_name}",
).rstrip("_")
extension = extracted["extension"]
filename = f"{i:0{index_chars}}{sanitized}{extension}"
Harden "Add annotations" workflow (#56071) Summary: Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact. - [x] flag and remove GitHub Actions expressions in JS scripts - [x] don't fail the workflow if the artifact doesn't look as expected - [x] write unit tests for `tools/extract_scripts.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071 Test Plan: I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases: - well-formed artifact - missing artifact - artifact containing a file named `linter-output.zip` (name clash) - artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string - artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo - in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch To run the new unit tests added in this PR: ``` python tools/test/test_extract_scripts.py ``` Reviewed By: seemethere Differential Revision: D27807074 Pulled By: samestep fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
2021-04-16 14:44:56 +00:00
(job_dir / filename).write_text(script)
if gha_expressions_found:
sys.exit(
"Each of the above scripts contains a GitHub Actions "
"${{ <expression> }} which must be replaced with an `env` variable"
" for security reasons."
)
if __name__ == "__main__":
main()