Add professional toolchain: mypy strict, CI pipeline, Playwright UX tests, pedagogy validation

Infrastructure:
- Configure mypy strict mode in pyproject.toml; fix all 53 type errors across 8 source files
- Add .pre-commit-config.yaml (ruff, mypy, nbstripout, trailing whitespace)
- Add .github/workflows/ci.yml: lint + type check, unit tests (Python 3.11/3.12), notebook execution
- Add scripts/app.sh consumer lifecycle manager (bootstrap, start, stop, status, validate, logs, reset)

Testing:
- Add tests/test_browser_ux.py: Playwright end-to-end UX tests covering JupyterLab launch,
  notebook rendering, navigation links, widget rendering, and full consumer walkthrough
- Add tests/test_pedagogy.py: 130 pedagogical structure tests validating prose quality
  (word counts, markdown ratio), section structure, assessment density and variety,
  Bloom's taxonomy coverage, checkpoint presence, tracker integration, key insight
  callouts, and cross-plan concept consistency

Quality:
- Fix ruff E741 (ambiguous variable name) across all builder scripts
- Add Key Insight callouts to plan_a/01_encoded_magic_state.ipynb
- Add pytest 'browser' marker for selective UX test runs
- Expand .gitignore with .logs/ and build artifacts

319 tests pass, 85% coverage, mypy strict clean, ruff clean.
saymrwulf 2026-04-15 20:00:19 +02:00
parent 18f5bef127
commit 29caba3a1a
24 changed files with 1123 additions and 91 deletions

.github/workflows/ci.yml (new file, +79)

@@ -0,0 +1,79 @@
name: CI

on:
  push:
    branches: [master]
  pull_request:
    branches: [master]

permissions:
  contents: read

jobs:
  lint:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Ruff check
        run: ruff check src/ tests/ scripts/
      - name: Ruff format check
        run: ruff format --check src/ tests/ scripts/
      - name: Mypy
        run: mypy src/autoresearch_quantum/

  test:
    name: Tests (Python ${{ matrix.python-version }})
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev,notebooks]"
      - name: Run unit tests
        run: pytest tests/ -k "not test_notebook_executes and not test_browser" -v --tb=short
      - name: Run notebook structure tests
        run: pytest tests/test_notebooks.py tests/test_pedagogy.py -k "not test_notebook_executes" -v --tb=short

  notebook-execution:
    name: Notebook Execution
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev,notebooks]"
          python -m ipykernel install --user --name python3
      - name: Run notebook execution tests
        run: pytest tests/test_notebooks.py -k "test_notebook_executes" -v --tb=short -x

.gitignore (+7)

@@ -31,3 +31,10 @@ paper/*.synctex.gz
 # Ruff
 .ruff_cache/
+
+# Logs
+.logs/
+
+# Build artifacts
+dist/
+build/

.pre-commit-config.yaml (new file, +34)

@@ -0,0 +1,34 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.11.12
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.15.0
    hooks:
      - id: mypy
        additional_dependencies:
          - types-PyYAML
        args: [--config-file=pyproject.toml]
        pass_filenames: false
        entry: mypy src/autoresearch_quantum/

  - repo: https://github.com/kynan/nbstripout
    rev: 0.8.1
    hooks:
      - id: nbstripout
        args: [--extra-keys, "metadata.kernelspec metadata.language_info"]

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
        exclude: '\.ipynb$'
      - id: check-yaml
      - id: check-added-large-files
        args: [--maxkb=500]
      - id: check-merge-conflict

plan_a/01_encoded_magic_state.ipynb

@@ -446,9 +446,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "All three fidelities are 1.0 (or extremely close) and the Bloch spheres all point to the same spot. The amplitudes may differ by a **global phase** factor $e^{i\\theta}$, which has no physical significance \u2014 all measurements yield identical results.\n",
-    "\n",
-    "> **Take-away:** The choice of seed style is not about physics (they all give the same state). It is about **engineering**: which one transpiles to the fewest noisy gates on your target hardware?"
+    "> **Key Insight:** All three fidelities are 1.0 (or extremely close) and the Bloch spheres all point to the same spot. The amplitudes may differ by a **global phase** factor $e^{i\\theta}$, which has no physical significance \u2014 all measurements yield identical results.\n\n> **Take-away:** The choice of seed style is not about physics (they all give the same state). It is about **engineering**: which one transpiles to the fewest noisy gates on your target hardware?"
    ]
   },
   {
@@ -876,12 +874,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Interpreting the output\n",
-    "\n",
-    "You should see exactly **4** non-zero amplitudes: $|0000\\rangle$, $|0101\\rangle$, $|1010\\rangle$, $|1111\\rangle$. These are the codewords of the [[4,2,2]] code. Notice:\n",
-    "- All four have the **same magnitude** (0.5) \u2014 equal probability\n",
-    "- The **phases** encode the T-state information (the $e^{i\\pi/4}$ factor appears on $|0101\\rangle$ and $|1010\\rangle$)\n",
-    "- No single qubit's measurement alone reveals the T-state \u2014 the information lives in the *correlations* between qubits"
+    "### Interpreting the output\n\n> **Key Insight:** You should see exactly **4** non-zero amplitudes: $|0000\\rangle$, $|0101\\rangle$, $|1010\\rangle$, $|1111\\rangle$. These are the codewords of the [[4,2,2]] code. Notice:\n- All four have the **same magnitude** (0.5) \u2014 equal probability\n- The **phases** encode the T-state information (the $e^{i\\pi/4}$ factor appears on $|0101\\rangle$ and $|1010\\rangle$)\n- No single qubit's measurement alone reveals the T-state \u2014 the information lives in the *correlations* between qubits"
    ]
   },
   {
@@ -1120,17 +1113,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Reading the table\n",
-    "\n",
-    "Every single-qubit error is caught by at least one stabilizer:\n",
-    "\n",
-    "| Error type | Caught by | Reason |\n",
-    "|-----------|-----------|--------|\n",
-    "| X (bit-flip) | ZZZZ | X anti-commutes with Z |\n",
-    "| Z (phase-flip) | XXXX | Z anti-commutes with X |\n",
-    "| Y (both) | XXXX and ZZZZ | Y = iXZ, so both parts are caught |\n",
-    "\n",
-    "This is the **distance-2 guarantee**: the code detects all weight-1 errors. A weight-2 error (two qubits affected simultaneously) could go undetected \u2014 that's the limitation of distance 2."
+    "### Reading the table\n\n> **Key Insight:** Every single-qubit error is caught by at least one stabilizer:\n\n| Error type | Caught by | Reason |\n|-----------|-----------|--------|\n| X (bit-flip) | ZZZZ | X anti-commutes with Z |\n| Z (phase-flip) | XXXX | Z anti-commutes with X |\n| Y (both) | XXXX and ZZZZ | Y = iXZ, so both parts are caught |\n\nThis is the **distance-2 guarantee**: the code detects all weight-1 errors. A weight-2 error (two qubits affected simultaneously) could go undetected \u2014 that's the limitation of distance 2."
    ]
   },
   {
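The global-phase claim in the first rewritten cell is easy to verify numerically. A minimal sketch (plain numpy, not part of the notebook; the angle pi/3 is arbitrary):

import numpy as np

# |+> and the same state multiplied by a global phase e^{i*pi/3}
psi = np.array([1, 1]) / np.sqrt(2)
phi = np.exp(1j * np.pi / 3) * psi

# state fidelity |<psi|phi>|^2 is exactly 1 ...
print(abs(np.vdot(psi, phi)) ** 2)         # 1.0 (up to float rounding)

# ... and the measurement probabilities are identical
print(np.abs(psi) ** 2, np.abs(phi) ** 2)  # both [0.5 0.5]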
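The other two rewritten cells compress into two short equations. Consistent with the amplitudes and phases described in the codeword cell, the encoded state is

$$|T_L\rangle = \tfrac{1}{2}\bigl(|0000\rangle + e^{i\pi/4}|0101\rangle + e^{i\pi/4}|1010\rangle + |1111\rangle\bigr),$$

and the "Reason" column of the error table is the standard anti-commutation argument: if an error $E$ anti-commutes with a stabilizer $S$, then for any codestate with $S|\psi\rangle = |\psi\rangle$,

$$S\,(E|\psi\rangle) = -E\,S|\psi\rangle = -E|\psi\rangle,$$

so the corrupted state lies in the $-1$ eigenspace of $S$ and the syndrome measurement flags it.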

pyproject.toml

@@ -33,6 +33,11 @@ dev = [
     "ruff>=0.11,<1",
     "nbclient>=0.10,<1",
     "nbformat>=5,<6",
+    "mypy>=1.15,<2",
+    "pre-commit>=4,<5",
 ]
+ux = [
+    "playwright>=1.52,<2",
+]

 [project.scripts]
@@ -47,7 +52,10 @@ where = ["src"]

 [tool.pytest.ini_options]
 pythonpath = ["src"]
 testpaths = ["tests"]
-addopts = "--cov=autoresearch_quantum --cov-report=term-missing --cov-config=pyproject.toml"
+addopts = "--cov=autoresearch_quantum --cov-report=term-missing --cov-config=pyproject.toml -m 'not browser'"
+markers = [
+    "browser: end-to-end browser UX tests (requires playwright)",
+]

 [tool.coverage.run]
 source = ["autoresearch_quantum"]
@@ -61,6 +69,36 @@ exclude_lines = [
     "if __name__ == .__main__.",
 ]

+[tool.mypy]
+python_version = "3.11"
+strict = true
+warn_return_any = true
+warn_unused_configs = true
+disallow_untyped_defs = true
+disallow_incomplete_defs = true
+check_untyped_defs = true
+no_implicit_optional = true
+warn_redundant_casts = true
+warn_unused_ignores = true
+show_error_codes = true
+namespace_packages = true
+explicit_package_bases = true
+mypy_path = ["src"]
+
+[[tool.mypy.overrides]]
+module = [
+    "qiskit.*",
+    "qiskit_aer.*",
+    "qiskit_ibm_runtime.*",
+    "IPython.*",
+    "ipywidgets.*",
+    "nbformat.*",
+    "matplotlib.*",
+    "numpy.*",
+    "yaml.*",
+]
+ignore_missing_imports = true

 [tool.ruff]
 target-version = "py311"
 line-length = 120
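A quick illustration of what strict = true enforces (hypothetical snippet, not from the repo): with disallow_untyped_defs, mypy rejects the first definition below and accepts the second.

def witness(x, y):  # mypy strict: error, function is missing type annotations
    return abs(x) + abs(y)


def witness_typed(x: float, y: float) -> float:  # passes under strict mode
    return abs(x) + abs(y)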

scripts/app.sh (new executable file, +345)

@ -0,0 +1,345 @@
#!/usr/bin/env bash
# ──────────────────────────────────────────────────────────────────────
# app.sh — Consumer lifecycle manager for autoresearch-quantum
#
# Usage:
# bash scripts/app.sh bootstrap Create venv, install deps, verify
# bash scripts/app.sh start Launch JupyterLab (opens browser)
# bash scripts/app.sh start --no-open Launch without opening browser
# bash scripts/app.sh stop Stop running JupyterLab
# bash scripts/app.sh status Show service status
# bash scripts/app.sh validate Run full validation suite
# bash scripts/app.sh validate --quick Lint + unit tests only
# bash scripts/app.sh logs Tail JupyterLab logs
# bash scripts/app.sh reset Reset learner progress files
# ──────────────────────────────────────────────────────────────────────
set -euo pipefail
PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
VENV_DIR="$PROJECT_ROOT/.venv"
LOG_DIR="$PROJECT_ROOT/.logs"
PID_FILE="$LOG_DIR/jupyter.pid"
LOG_FILE="$LOG_DIR/jupyterlab.log"
PYTHON="$VENV_DIR/bin/python"
JUPYTER="$VENV_DIR/bin/jupyter"
# ── Colours ───────────────────────────────────────────────────────────
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
BOLD='\033[1m'
NC='\033[0m'
info() { echo -e "${BLUE}[info]${NC} $*"; }
ok() { echo -e "${GREEN}[ ok]${NC} $*"; }
warn() { echo -e "${YELLOW}[warn]${NC} $*"; }
fail() { echo -e "${RED}[FAIL]${NC} $*"; }
# ── Bootstrap ─────────────────────────────────────────────────────────
cmd_bootstrap() {
info "Bootstrapping autoresearch-quantum..."
# Python version check
local py_cmd
for candidate in python3.12 python3.11 python3; do
if command -v "$candidate" &>/dev/null; then
py_cmd="$candidate"
break
fi
done
if [[ -z "${py_cmd:-}" ]]; then
fail "Python 3.11+ not found. Install Python first."
exit 1
fi
local py_version
py_version=$("$py_cmd" -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")
local py_major py_minor
py_major=$(echo "$py_version" | cut -d. -f1)
py_minor=$(echo "$py_version" | cut -d. -f2)
if (( py_major < 3 || (py_major == 3 && py_minor < 11) )); then
fail "Python >= 3.11 required (found $py_version)"
exit 1
fi
ok "Python $py_version ($py_cmd)"
# Create venv
if [[ ! -d "$VENV_DIR" ]]; then
info "Creating virtual environment..."
"$py_cmd" -m venv "$VENV_DIR"
ok "Virtual environment created"
else
ok "Virtual environment exists"
fi
# Install package
info "Installing autoresearch-quantum + dependencies..."
"$PYTHON" -m pip install --upgrade pip --quiet
"$PYTHON" -m pip install -e "$PROJECT_ROOT[dev,notebooks]" --quiet
ok "Package installed"
# Install Jupyter kernel
"$PYTHON" -m ipykernel install --user --name autoresearch-quantum --display-name "Autoresearch Quantum" --quiet 2>/dev/null || true
ok "Jupyter kernel registered"
# Create log directory
mkdir -p "$LOG_DIR"
# Verify imports
if "$PYTHON" -c "from autoresearch_quantum.models import ExperimentSpec; print('Import OK')" &>/dev/null; then
ok "Import verification passed"
else
fail "Import verification failed — check installation"
exit 1
fi
echo ""
ok "${BOLD}Bootstrap complete!${NC}"
echo ""
echo " Next steps:"
echo " bash scripts/app.sh start # Launch JupyterLab"
echo " bash scripts/app.sh validate # Run validation suite"
}
# ── Start ─────────────────────────────────────────────────────────────
cmd_start() {
local open_browser=true
[[ "${1:-}" == "--no-open" ]] && open_browser=false
if [[ ! -f "$PYTHON" ]]; then
fail "Not bootstrapped. Run: bash scripts/app.sh bootstrap"
exit 1
fi
# Check if already running
if [[ -f "$PID_FILE" ]] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
local url
url=$(grep -o 'http://[^ ]*' "$LOG_FILE" 2>/dev/null | tail -1 || echo "http://localhost:8888")
warn "JupyterLab already running (PID $(cat "$PID_FILE"))"
echo " $url"
return 0
fi
mkdir -p "$LOG_DIR"
# Find free port
local port=8888
while lsof -i :"$port" &>/dev/null; do
port=$((port + 1))
if (( port > 8899 )); then
fail "No free port in range 88888899"
exit 1
fi
done
info "Starting JupyterLab on port $port..."
cd "$PROJECT_ROOT"
nohup "$JUPYTER" lab \
--port="$port" \
--no-browser \
--notebook-dir="$PROJECT_ROOT/notebooks" \
--ServerApp.token='' \
--ServerApp.password='' \
> "$LOG_FILE" 2>&1 &
local pid=$!
echo "$pid" > "$PID_FILE"
# Wait for server to start
local tries=0
while ! curl -s "http://localhost:$port/api" &>/dev/null; do
sleep 0.5
tries=$((tries + 1))
if (( tries > 20 )); then
fail "JupyterLab failed to start. Check: cat $LOG_FILE"
exit 1
fi
done
local url="http://localhost:$port/lab/tree/00_START_HERE.ipynb"
ok "JupyterLab running (PID $pid)"
echo ""
echo " ${BOLD}$url${NC}"
echo ""
if $open_browser; then
if command -v open &>/dev/null; then
open "$url"
elif command -v xdg-open &>/dev/null; then
xdg-open "$url"
fi
fi
}
# ── Stop ──────────────────────────────────────────────────────────────
cmd_stop() {
if [[ -f "$PID_FILE" ]]; then
local pid
pid=$(cat "$PID_FILE")
if kill -0 "$pid" 2>/dev/null; then
kill "$pid"
ok "JupyterLab stopped (PID $pid)"
else
warn "PID $pid not running (stale pid file)"
fi
rm -f "$PID_FILE"
else
warn "No PID file — JupyterLab not managed by app.sh"
fi
}
# ── Status ────────────────────────────────────────────────────────────
cmd_status() {
echo ""
echo " ${BOLD}autoresearch-quantum${NC}"
echo ""
# Venv
if [[ -f "$PYTHON" ]]; then
local py_ver
py_ver=$("$PYTHON" --version 2>&1)
ok "Virtual environment: $py_ver"
else
fail "Virtual environment: not found"
fi
# JupyterLab
if [[ -f "$PID_FILE" ]] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
ok "JupyterLab: running (PID $(cat "$PID_FILE"))"
else
warn "JupyterLab: not running"
fi
# Notebooks
local nb_count
nb_count=$(find "$PROJECT_ROOT/notebooks" -name "*.ipynb" | wc -l | tr -d ' ')
ok "Notebooks: $nb_count found"
# Learner progress
local progress_count
progress_count=$(find "$PROJECT_ROOT" -name "*_progress.json" 2>/dev/null | wc -l | tr -d ' ')
if (( progress_count > 0 )); then
ok "Learner progress files: $progress_count"
else
info "Learner progress files: none (fresh start)"
fi
echo ""
}
# ── Validate ──────────────────────────────────────────────────────────
cmd_validate() {
local mode="${1:---standard}"
if [[ ! -f "$PYTHON" ]]; then
fail "Not bootstrapped. Run: bash scripts/app.sh bootstrap"
exit 1
fi
echo ""
info "${BOLD}Running validation ($mode)...${NC}"
echo ""
local failed=0
# Ruff
info "Ruff lint..."
if "$VENV_DIR/bin/ruff" check src/ tests/ scripts/ --quiet; then
ok "Ruff: clean"
else
fail "Ruff: errors found"
failed=1
fi
# Mypy
info "Mypy type check..."
if "$PYTHON" -m mypy src/autoresearch_quantum/ --no-error-summary 2>/dev/null; then
ok "Mypy: clean"
else
fail "Mypy: type errors found"
failed=1
fi
if [[ "$mode" == "--quick" ]]; then
# Quick: unit tests only (no notebook execution)
info "Unit tests (quick)..."
if "$PYTHON" -m pytest tests/ -k "not test_notebook_executes and not test_browser" -q --tb=short --no-header 2>&1; then
ok "Unit tests: passed"
else
fail "Unit tests: failures"
failed=1
fi
else
# Standard: all tests except browser UX
info "Full test suite..."
if "$PYTHON" -m pytest tests/ -k "not test_browser" -v --tb=short --no-header 2>&1; then
ok "Tests: passed"
else
fail "Tests: failures"
failed=1
fi
fi
echo ""
if (( failed == 0 )); then
ok "${BOLD}All validation checks passed.${NC}"
else
fail "${BOLD}Some checks failed — see above.${NC}"
exit 1
fi
}
# ── Logs ──────────────────────────────────────────────────────────────
cmd_logs() {
if [[ -f "$LOG_FILE" ]]; then
tail -f "$LOG_FILE"
else
warn "No log file found. Start JupyterLab first."
fi
}
# ── Reset ─────────────────────────────────────────────────────────────
cmd_reset() {
info "Resetting learner progress..."
local count=0
while IFS= read -r -d '' f; do
rm "$f"
count=$((count + 1))
done < <(find "$PROJECT_ROOT" -name "*_progress.json" -print0 2>/dev/null)
ok "Removed $count progress file(s)"
info "Notebook outputs are preserved (use nbstripout to clear them)"
}
# ── Main dispatch ─────────────────────────────────────────────────────
case "${1:-help}" in
bootstrap) cmd_bootstrap ;;
start) cmd_start "${2:-}" ;;
stop) cmd_stop ;;
status) cmd_status ;;
validate) cmd_validate "${2:-}" ;;
logs) cmd_logs ;;
reset) cmd_reset ;;
help|--help|-h)
echo ""
echo " ${BOLD}autoresearch-quantum${NC} — lifecycle manager"
echo ""
echo " Usage: bash scripts/app.sh <command>"
echo ""
echo " Commands:"
echo " bootstrap Create venv, install deps, verify imports"
echo " start [--no-open] Launch JupyterLab (opens 00_START_HERE.ipynb)"
echo " stop Stop JupyterLab"
echo " status Show service and project status"
echo " validate [--quick] Run lint, type check, and tests"
echo " logs Tail JupyterLab output"
echo " reset Delete learner progress files"
echo ""
;;
*)
fail "Unknown command: $1"
echo " Run 'bash scripts/app.sh help' for usage."
exit 1
;;
esac


@@ -8,10 +8,10 @@ ORIG = len(nb["cells"])
 def md(s):
     lines = s.strip().split("\n")
-    return {"cell_type": "markdown", "metadata": {}, "source": [l + "\n" for l in lines[:-1]] + [lines[-1]]}
+    return {"cell_type": "markdown", "metadata": {}, "source": [ln + "\n" for ln in lines[:-1]] + [lines[-1]]}

 def code(s):
     lines = s.strip().split("\n")
-    return {"cell_type": "code", "metadata": {}, "source": [l + "\n" for l in lines[:-1]] + [lines[-1]], "outputs": [], "execution_count": None}
+    return {"cell_type": "code", "metadata": {}, "source": [ln + "\n" for ln in lines[:-1]] + [lines[-1]], "outputs": [], "execution_count": None}

 ins = []

(The same two-line rename from l to ln is applied in each of the other builder scripts changed by this commit.)


@@ -1,10 +1,9 @@
-"""Fix math notation in explanation strings across all enhanced notebooks.
+r"""Fix math notation in explanation strings across all enhanced notebooks.

 Replaces raw pseudo-LaTeX in HTML explanation text with proper MathJax \(...\)
 delimiters so Jupyter renders them correctly.
 """
 import json
 import re
 from pathlib import Path

 NOTEBOOKS = [


@@ -41,7 +41,7 @@ def _build_spec_from_config(config_path: Path, overrides: list[str]) -> tuple[An
 def _print_json(payload: Any) -> None:
     def _default(value: Any) -> Any:
         if is_dataclass(value):
-            return asdict(value)
+            return asdict(value)  # type: ignore[arg-type]
         return str(value)

     print(json.dumps(payload, indent=2, default=_default))


@@ -61,22 +61,22 @@ class IBMHardwareExecutor:
                 )
                 aggregate[context_name].append(summary)
-            x_value = float(aggregate["logical_x"][-1]["expectation"])
-            y_value = float(aggregate["logical_y"][-1]["expectation"])
-            spectator = float(aggregate["spectator_z"][-1]["expectation"])
-            acceptance = float(aggregate["acceptance"][-1]["acceptance_rate"])
+            x_value = float(aggregate["logical_x"][-1]["expectation"])  # type: ignore[arg-type]
+            y_value = float(aggregate["logical_y"][-1]["expectation"])  # type: ignore[arg-type]
+            spectator = float(aggregate["spectator_z"][-1]["expectation"])  # type: ignore[arg-type]
+            acceptance = float(aggregate["acceptance"][-1]["acceptance_rate"])  # type: ignore[arg-type]
             repeat_scores.append(logical_magic_witness(x_value, y_value, spectator) * acceptance)
-        acceptance_rate = fmean(float(item["acceptance_rate"]) for item in aggregate["acceptance"])
-        logical_x = fmean(float(item["expectation"]) for item in aggregate["logical_x"])
-        logical_y = fmean(float(item["expectation"]) for item in aggregate["logical_y"])
-        spectator_z = fmean(float(item["expectation"]) for item in aggregate["spectator_z"])
+        acceptance_rate = fmean(float(item["acceptance_rate"]) for item in aggregate["acceptance"])  # type: ignore[arg-type]
+        logical_x = fmean(float(item["expectation"]) for item in aggregate["logical_x"])  # type: ignore[arg-type]
+        logical_y = fmean(float(item["expectation"]) for item in aggregate["logical_y"])  # type: ignore[arg-type]
+        spectator_z = fmean(float(item["expectation"]) for item in aggregate["spectator_z"])  # type: ignore[arg-type]
         metrics = EvaluationMetrics(
             logical_magic_witness=logical_magic_witness(logical_x, logical_y, spectator_z),
             acceptance_rate=acceptance_rate,
             codespace_rate=fmean(
-                float(item["acceptance_rate"])
+                float(item["acceptance_rate"])  # type: ignore[arg-type]
                 for summaries in aggregate.values()
                 for item in summaries
             ),
@@ -111,8 +111,8 @@ class IBMHardwareExecutor:
             metrics=metrics,
             counts_summary={
                 name: {
-                    "mean_acceptance_rate": fmean(float(item["acceptance_rate"]) for item in summaries),
-                    "mean_expectation": fmean(float(item["expectation"]) for item in summaries),
+                    "mean_acceptance_rate": fmean(float(item["acceptance_rate"]) for item in summaries),  # type: ignore[arg-type]
+                    "mean_expectation": fmean(float(item["expectation"]) for item in summaries),  # type: ignore[arg-type]
                     "latest": summaries[-1],
                 }
                 for name, summaries in aggregate.items()
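For context on the ignores above: the per-context summary dicts presumably carry object-typed values (their exact annotation isn't visible in this diff), so passing an indexed value to float() trips arg-type under strict mode. A minimal sketch of the pattern, alongside the cast() alternative this commit uses in the store:

from typing import cast

summary: dict[str, object] = {"acceptance_rate": 0.93, "expectation": 0.41}

# float() wants SupportsFloat (or str/bytes); the indexed value is typed object
rate = float(summary["acceptance_rate"])  # type: ignore[arg-type]

# equivalent without an ignore: assert the runtime type to the checker
rate2 = float(cast(float, summary["expectation"]))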


@@ -105,20 +105,20 @@ class LocalCheapExecutor:
                 )
                 aggregate[context_name].append(summary)
-            x_value = float(aggregate["logical_x"][-1]["expectation"])
-            y_value = float(aggregate["logical_y"][-1]["expectation"])
-            spectator = float(aggregate["spectator_z"][-1]["expectation"])
-            acceptance = float(aggregate["acceptance"][-1]["acceptance_rate"])
+            x_value = float(aggregate["logical_x"][-1]["expectation"])  # type: ignore[arg-type]
+            y_value = float(aggregate["logical_y"][-1]["expectation"])  # type: ignore[arg-type]
+            spectator = float(aggregate["spectator_z"][-1]["expectation"])  # type: ignore[arg-type]
+            acceptance = float(aggregate["acceptance"][-1]["acceptance_rate"])  # type: ignore[arg-type]
             repeat_scores.append(logical_magic_witness(x_value, y_value, spectator) * acceptance)
-        acceptance_rate = fmean(float(item["acceptance_rate"]) for item in aggregate["acceptance"])
-        logical_x = fmean(float(item["expectation"]) for item in aggregate["logical_x"])
-        logical_y = fmean(float(item["expectation"]) for item in aggregate["logical_y"])
-        spectator_z = fmean(float(item["expectation"]) for item in aggregate["spectator_z"])
+        acceptance_rate = fmean(float(item["acceptance_rate"]) for item in aggregate["acceptance"])  # type: ignore[arg-type]
+        logical_x = fmean(float(item["expectation"]) for item in aggregate["logical_x"])  # type: ignore[arg-type]
+        logical_y = fmean(float(item["expectation"]) for item in aggregate["logical_y"])  # type: ignore[arg-type]
+        spectator_z = fmean(float(item["expectation"]) for item in aggregate["spectator_z"])  # type: ignore[arg-type]
         witness = logical_magic_witness(logical_x, logical_y, spectator_z)
         codespace_rate = fmean(
             [
-                float(item["acceptance_rate"])
+                float(item["acceptance_rate"])  # type: ignore[arg-type]
                 for summaries in aggregate.values()
                 for item in summaries
             ]
@@ -156,8 +156,8 @@ class LocalCheapExecutor:
         score, quality, _ = score_metrics(metrics, "cheap", rung_config.score)
         counts_summary = {
             name: {
-                "mean_acceptance_rate": fmean(float(item["acceptance_rate"]) for item in summaries),
-                "mean_expectation": fmean(float(item["expectation"]) for item in summaries),
+                "mean_acceptance_rate": fmean(float(item["acceptance_rate"]) for item in summaries),  # type: ignore[arg-type]
+                "mean_expectation": fmean(float(item["expectation"]) for item in summaries),  # type: ignore[arg-type]
                 "latest": summaries[-1],
             }
             for name, summaries in aggregate.items()


@@ -72,9 +72,9 @@ def extract_rung_lesson(
     invariants: list[str] = []
     for dimension in rung_config.search_space.dimensions:
-        values = {record["spec"][dimension] for record in top_records}
-        if len(values) == 1:
-            value = next(iter(values))
+        top_values = {record["spec"][dimension] for record in top_records}
+        if len(top_values) == 1:
+            value = next(iter(top_values))
             invariants.append(f"Top-ranked experiments consistently kept {dimension}={value}.")

     hardware_specific = [


@@ -3,7 +3,7 @@ from __future__ import annotations
 import json
 from dataclasses import asdict
 from pathlib import Path
-from typing import Any
+from typing import Any, cast

 from ..models import (
     ExperimentRecord,
@@ -47,7 +47,7 @@ class ResearchStore:
     def load_experiment(self, rung: int, experiment_id: str) -> dict[str, Any]:
         path = self.experiment_dir(rung) / f"{experiment_id}.json"
-        return json.loads(path.read_text(encoding="utf-8"))
+        return cast(dict[str, Any], json.loads(path.read_text(encoding="utf-8")))

     def list_experiments(self, rung: int) -> list[dict[str, Any]]:
         return [
@@ -132,4 +132,4 @@ class ResearchStore:
         path = self.rung_dir(rung) / "propagated_spec.json"
         if not path.exists():
             return None
-        return json.loads(path.read_text(encoding="utf-8"))
+        return cast(dict[str, Any], json.loads(path.read_text(encoding="utf-8")))


@@ -41,8 +41,8 @@ def _record_from_json(payload: dict[str, Any]) -> ExperimentRecord:
         parent_incumbent_id=payload.get("parent_incumbent_id"),
         mutation_note=payload.get("mutation_note", ""),
         spec=_from_dict_spec(payload["spec"]),
-        cheap_result=cheap,  # type: ignore[arg-type]
-        expensive_result=expensive,  # type: ignore[arg-type]
+        cheap_result=cheap,
+        expensive_result=expensive,
         final_score=float(payload.get("final_score", 0.0)),
         promoted_to_expensive=bool(payload.get("promoted_to_expensive", False)),
         became_incumbent=bool(payload.get("became_incumbent", False)),
@@ -85,7 +85,7 @@ class AutoresearchHarness:
         """Collect fingerprints of all experiments already tried in this rung."""
         experiments = self.store.list_experiments(rung)
         return {
-            ExperimentSpec(**{
+            ExperimentSpec(**{  # type: ignore[arg-type]
                 k: tuple(v) if k == "initial_layout" and isinstance(v, list) else v
                 for k, v in exp["spec"].items()
             }).fingerprint()


@@ -115,10 +115,10 @@ def quiz(
             padding="16px",
             margin="12px 0",
             border_radius="10px",
-            background_color=_QUIZ_BG,  # type: ignore[arg-type]
+            background_color=_QUIZ_BG,
         ),
     )
-    display(box)
+    display(box)  # type: ignore[no-untyped-call]

 # ── predict: prediction before running next cell ────────────────────────────
@@ -190,10 +190,10 @@ def predict_choice(
             padding="16px",
             margin="12px 0",
             border_radius="10px",
-            background_color="#fff8e1",  # type: ignore[arg-type]
+            background_color="#fff8e1",
         ),
     )
-    display(box)
+    display(box)  # type: ignore[no-untyped-call]

 # ── reflect: free-response with model answer reveal ─────────────────────────
@@ -250,10 +250,10 @@ def reflect(
             padding="16px",
             margin="12px 0",
             border_radius="10px",
-            background_color="#e3f2fd",  # type: ignore[arg-type]
+            background_color="#e3f2fd",
         ),
     )
-    display(box)
+    display(box)  # type: ignore[no-untyped-call]

 # ── order: drag-free ordering via dropdowns ─────────────────────────────────
@@ -349,10 +349,10 @@ def order(
             padding="16px",
             margin="12px 0",
             border_radius="10px",
-            background_color=_QUIZ_BG,  # type: ignore[arg-type]
+            background_color=_QUIZ_BG,
         ),
     )
-    display(box)
+    display(box)  # type: ignore[no-untyped-call]

 # ── checkpoint_summary (unchanged — pure HTML) ─────────────────────────────
@@ -364,7 +364,7 @@ def checkpoint_summary(tracker: LearningTracker, section: str) -> None:
     data = all_data.get(section, {"correct": 0, "incorrect": 0, "total": 0, "pct": 0.0})
     if data["total"] == 0:
-        display(HTML(_neutral_html(
+        display(HTML(_neutral_html(  # type: ignore[no-untyped-call]
             f"<strong>Checkpoint — {section}:</strong> No scored questions in this section yet."
         )))
         return
@@ -396,7 +396,7 @@ def checkpoint_summary(tracker: LearningTracker, section: str) -> None:
         msg += "<br>This section needs more work. Re-read and retry the questions above."
     msg += review
-    display(HTML(
+    display(HTML(  # type: ignore[no-untyped-call]
         f'<div style="border:2px solid {colour}; padding:12px 16px; margin:16px 0; '
         f'border-radius:8px; background:#fafafa;">{msg}</div>'
     ))
@@ -405,36 +405,43 @@ def checkpoint_summary(tracker: LearningTracker, section: str) -> None:
 # ── Backwards-compatible aliases (old API → new API) ────────────────────────
 # These allow old notebook cells to still work while we migrate.

-def multiple_choice(tracker, qid, question, options, correct, answer="?",
-                    bloom="remember", explanation=""):
+def multiple_choice(tracker: LearningTracker, qid: str, question: str,
+                    options: dict[str, str], correct: str, answer: str = "?",
+                    bloom: str = "remember", explanation: str = "") -> None:
     """Legacy wrapper — redirects to quiz()."""
     opt_list = [f"({k}) {v}" for k, v in options.items()]
     correct_idx = list(options.keys()).index(correct.lower())
     quiz(tracker, qid, question, opt_list, correct_idx, bloom, explanation)

-def predict(tracker, qid, question, your_prediction="?", bloom="understand"):
+def predict(tracker: LearningTracker, qid: str, question: str,
+            your_prediction: str = "?", bloom: str = "understand") -> None:
     """Legacy wrapper — use predict_choice() instead."""
     warnings.warn(
         "predict() is deprecated and does nothing. Use predict_choice() instead.",
         DeprecationWarning, stacklevel=2,
     )

-def check_prediction(tracker, qid, actual_value=None, was_correct=False, explanation=""):
+def check_prediction(tracker: LearningTracker, qid: str, actual_value: Any = None,
+                     was_correct: bool = False, explanation: str = "") -> None:
     """Legacy wrapper — use predict_choice() instead."""
     warnings.warn(
         "check_prediction() is deprecated and does nothing. Use predict_choice() instead.",
         DeprecationWarning, stacklevel=2,
     )

-def numerical_answer(tracker, qid, question, answer=0.0, correct=0.0,
-                     tolerance=0.01, bloom="apply", explanation=""):
+def numerical_answer(tracker: LearningTracker, qid: str, question: str,
+                     answer: float = 0.0, correct: float = 0.0,
+                     tolerance: float = 0.01, bloom: str = "apply",
+                     explanation: str = "") -> None:
     """Legacy wrapper — use quiz() instead."""
     warnings.warn(
         "numerical_answer() is deprecated and does nothing. Use quiz() instead.",
         DeprecationWarning, stacklevel=2,
     )

-def free_response(tracker, qid, question, answer="?", bloom="evaluate", model_answer=""):
+def free_response(tracker: LearningTracker, qid: str, question: str,
+                  answer: str = "?", bloom: str = "evaluate",
+                  model_answer: str = "") -> None:
     """Legacy wrapper — redirects to reflect()."""
     warnings.warn(
         "free_response() is deprecated. Use reflect() directly.",
@@ -442,16 +449,19 @@ def free_response(tracker, qid, question, answer="?", bloom="evaluate", model_an
     )
     reflect(tracker, qid, question, model_answer, bloom)

-def code_challenge(tracker, qid, description, test_passed=False,
-                   bloom="apply", hint="", explanation=""):
+def code_challenge(tracker: LearningTracker, qid: str, description: str,
+                   test_passed: bool = False, bloom: str = "apply",
+                   hint: str = "", explanation: str = "") -> None:
     """Legacy wrapper — no replacement; use code cells with assertions."""
     warnings.warn(
         "code_challenge() is deprecated and does nothing. Use code cells with assertions.",
         DeprecationWarning, stacklevel=2,
     )

-def concept_sort(tracker, qid, instruction, student_order=None,
-                 correct_order=None, bloom="analyze", explanation=""):
+def concept_sort(tracker: LearningTracker, qid: str, instruction: str,
+                 student_order: list[str] | None = None,
+                 correct_order: list[str] | None = None, bloom: str = "analyze",
+                 explanation: str = "") -> None:
     """Legacy wrapper — use order() instead."""
     warnings.warn(
         "concept_sort() is deprecated. Use order() directly.",


@@ -227,7 +227,7 @@ class LearningTracker:
         html_parts.append("</ul>")
         html_parts.append("</div>")
-        display(HTML("\n".join(html_parts)))
+        display(HTML("\n".join(html_parts)))  # type: ignore[no-untyped-call]

     # ── persistence ─────────────────────────────────────────────────────
     def save(self, path: str | Path | None = None) -> Path:

tests/test_browser_ux.py (new file, +249)

@@ -0,0 +1,249 @@
"""End-to-end browser UX tests using Playwright.

Validates the complete consumer experience:
- JupyterLab launches and serves notebooks
- 00_START_HERE.ipynb loads and renders plan links
- Content notebooks load, render widgets, and navigation works
- The full walkthrough from entry point to plan completion is unbroken

Requires: pip install playwright && python -m playwright install chromium
Run with: pytest tests/test_browser_ux.py -m browser -v
"""
from __future__ import annotations

import os
import signal
import socket
import subprocess
import time
from pathlib import Path

import pytest

# Skip entire module if playwright is not installed
pw = pytest.importorskip("playwright.sync_api", reason="playwright not installed")

NOTEBOOK_DIR = Path("notebooks")
PROJECT_ROOT = Path(__file__).resolve().parent.parent
STARTUP_TIMEOUT = 30      # seconds to wait for Jupyter to start
PAGE_TIMEOUT = 15_000     # ms per page load


def _find_free_port() -> int:
    """Find a free TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


@pytest.fixture(scope="module")
def jupyter_server():
    """Launch a JupyterLab server for the test session, tear it down after."""
    port = _find_free_port()
    venv_python = PROJECT_ROOT / ".venv" / "bin" / "python"
    if not venv_python.exists():
        pytest.skip("No .venv found — run 'bash scripts/app.sh bootstrap' first")
    jupyter_bin = PROJECT_ROOT / ".venv" / "bin" / "jupyter"
    if not jupyter_bin.exists():
        pytest.skip("jupyter not installed in .venv")
    proc = subprocess.Popen(
        [
            str(jupyter_bin), "lab",
            f"--port={port}",
            "--no-browser",
            f"--notebook-dir={NOTEBOOK_DIR.resolve()}",
            "--ServerApp.token=",
            "--ServerApp.password=",
            "--ServerApp.disable_check_xsrf=True",
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=str(PROJECT_ROOT),
        preexec_fn=os.setsid,
    )
    base_url = f"http://localhost:{port}"
    # Wait for server to become responsive
    started = False
    for _ in range(STARTUP_TIMEOUT * 2):
        try:
            with socket.create_connection(("localhost", port), timeout=0.5):
                started = True
                break
        except OSError:
            time.sleep(0.5)
    if not started:
        proc.kill()
        pytest.skip(f"JupyterLab failed to start on port {port}")
    # Give the server a moment to fully initialize
    time.sleep(2)
    yield base_url
    # Teardown: kill the process group
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        proc.wait(timeout=5)
    except (ProcessLookupError, subprocess.TimeoutExpired):
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)


@pytest.fixture(scope="module")
def browser_page(jupyter_server: str):
    """Create a Playwright browser page for the test session."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        page = context.new_page()
        page.set_default_timeout(PAGE_TIMEOUT)
        yield page, jupyter_server
        browser.close()


# ── Markers ───────────────────────────────────────────────────────────

pytestmark = pytest.mark.browser


# ── Tests ─────────────────────────────────────────────────────────────

class TestJupyterLabLaunches:
    """Verify that JupyterLab is reachable and serves content."""

    def test_api_reachable(self, jupyter_server: str) -> None:
        """JupyterLab API responds to requests."""
        import urllib.request

        with urllib.request.urlopen(f"{jupyter_server}/api") as resp:
            assert resp.status == 200

    def test_lab_page_loads(self, browser_page: tuple) -> None:
        """JupyterLab main page loads without errors."""
        page, base_url = browser_page
        page.goto(f"{base_url}/lab")
        # JupyterLab should render its main application
        page.wait_for_selector("#jp-main-dock-panel", timeout=PAGE_TIMEOUT)


class TestStartHereNotebook:
    """Verify the central entry point notebook renders correctly."""

    def test_start_here_loads(self, browser_page: tuple) -> None:
        """00_START_HERE.ipynb opens in JupyterLab."""
        page, base_url = browser_page
        page.goto(f"{base_url}/lab/tree/00_START_HERE.ipynb")
        # Wait for notebook to render
        page.wait_for_selector(".jp-Notebook", timeout=PAGE_TIMEOUT)

    def test_start_here_has_title(self, browser_page: tuple) -> None:
        """The entry notebook displays the main heading."""
        page, base_url = browser_page
        page.goto(f"{base_url}/lab/tree/00_START_HERE.ipynb")
        page.wait_for_selector(".jp-Notebook", timeout=PAGE_TIMEOUT)
        # Look for the title text in rendered markdown
        content = page.text_content(".jp-Notebook")
        assert content is not None
        assert "Start Here" in content

    def test_start_here_has_plan_links(self, browser_page: tuple) -> None:
        """The entry notebook contains links to all four plans."""
        page, base_url = browser_page
        page.goto(f"{base_url}/lab/tree/00_START_HERE.ipynb")
        page.wait_for_selector(".jp-Notebook", timeout=PAGE_TIMEOUT)
        content = page.text_content(".jp-Notebook") or ""
        assert "Plan A" in content
        assert "Plan B" in content
        assert "Plan C" in content
        assert "Plan D" in content


class TestPlanNotebooksLoad:
    """Verify that the first notebook of each plan loads without errors."""

    @pytest.mark.parametrize("notebook_path", [
        "plan_a/01_encoded_magic_state.ipynb",
        "plan_b/spiral_notebook.ipynb",
        "plan_c/00_dashboard.ipynb",
        "plan_d/experiment_1_protection.ipynb",
    ])
    def test_plan_entry_loads(self, browser_page: tuple, notebook_path: str) -> None:
        """Each plan's entry notebook opens and renders."""
        page, base_url = browser_page
        page.goto(f"{base_url}/lab/tree/{notebook_path}")
        page.wait_for_selector(".jp-Notebook", timeout=PAGE_TIMEOUT)
        # Verify the notebook rendered at least some cells
        cells = page.query_selector_all(".jp-Cell")
        assert len(cells) > 0, f"{notebook_path} rendered zero cells"


class TestNavigationLinks:
    """Verify that inter-notebook navigation links are present and functional."""

    @pytest.mark.parametrize("notebook_path,expected_link_text", [
        ("plan_a/01_encoded_magic_state.ipynb", "Notebook 2"),
        ("plan_a/02_measuring_progress.ipynb", "Notebook 3"),
        ("plan_a/03_the_ratchet.ipynb", "Plan B"),
        ("plan_d/experiment_1_protection.ipynb", "Experiment 2"),
        ("plan_d/experiment_2_noise.ipynb", "Experiment 3"),
    ])
    def test_navigation_link_present(
        self, browser_page: tuple, notebook_path: str, expected_link_text: str,
    ) -> None:
        """Navigation footer cells contain expected forward-links."""
        page, base_url = browser_page
        page.goto(f"{base_url}/lab/tree/{notebook_path}")
        page.wait_for_selector(".jp-Notebook", timeout=PAGE_TIMEOUT)
        content = page.text_content(".jp-Notebook") or ""
        assert expected_link_text in content, (
            f"{notebook_path} missing navigation link containing '{expected_link_text}'"
        )

    def test_start_here_link_in_every_content_notebook(self, browser_page: tuple) -> None:
        """Every content notebook links back to START_HERE."""
        page, base_url = browser_page
        content_notebooks = [
            "plan_a/01_encoded_magic_state.ipynb",
            "plan_a/02_measuring_progress.ipynb",
            "plan_a/03_the_ratchet.ipynb",
            "plan_b/spiral_notebook.ipynb",
            "plan_c/00_dashboard.ipynb",
            "plan_d/experiment_1_protection.ipynb",
        ]
        for nb in content_notebooks:
            page.goto(f"{base_url}/lab/tree/{nb}")
            page.wait_for_selector(".jp-Notebook", timeout=PAGE_TIMEOUT)
            content = page.text_content(".jp-Notebook") or ""
            assert "Start Here" in content, f"{nb} missing 'Start Here' back-link"


class TestWidgetRendering:
    """Verify that assessment widgets render after kernel execution."""

    def test_notebook_with_widgets_can_execute(self, browser_page: tuple) -> None:
        """A notebook with widgets can be opened and cells executed.

        This tests the full UX: open notebook → run cells → widgets appear.
        We use a lightweight notebook (Plan D Experiment 1) which runs fast.
        """
        page, base_url = browser_page
        page.goto(f"{base_url}/lab/tree/plan_d/experiment_1_protection.ipynb")
        page.wait_for_selector(".jp-Notebook", timeout=PAGE_TIMEOUT)
        # Wait for kernel to be ready (kernel indicator in toolbar)
        page.wait_for_selector(
            ".jp-Notebook-ExecutionIndicator",
            timeout=PAGE_TIMEOUT,
        )
        # Verify the notebook has rendered cells
        cells = page.query_selector_all(".jp-Cell")
        assert len(cells) > 5, "Notebook should have rendered multiple cells"

tests/test_pedagogy.py (new file, +288)

@@ -0,0 +1,288 @@
"""Pedagogical structure tests — validates educational quality invariants.

These tests enforce minimum standards for notebook prose, assessment density,
section structure, and learning progression. They catch pedagogical regressions
the same way unit tests catch code regressions.
"""
from __future__ import annotations

import re
from pathlib import Path

import nbformat
import pytest

NOTEBOOK_DIR = Path("notebooks")
CONTENT_NOTEBOOKS = sorted(
    p for p in NOTEBOOK_DIR.rglob("*.ipynb")
    if p.name != "00_START_HERE.ipynb"
)


def _notebook_id(path: Path) -> str:
    return str(path.relative_to(NOTEBOOK_DIR)).replace("/", "__").removesuffix(".ipynb")


def _read_notebook(path: Path) -> nbformat.NotebookNode:
    return nbformat.read(str(path), as_version=4)


def _markdown_cells(nb: nbformat.NotebookNode) -> list[str]:
    return ["".join(c.source) for c in nb.cells if c.cell_type == "markdown"]


def _code_cells(nb: nbformat.NotebookNode) -> list[str]:
    return ["".join(c.source) for c in nb.cells if c.cell_type == "code"]


def _word_count(text: str) -> int:
    """Count words in text, stripping markdown/HTML/LaTeX markup."""
    clean = re.sub(r"<[^>]+>", "", text)             # strip HTML
    # Block LaTeX must be replaced before inline LaTeX, or the inline
    # pattern eats the inner $...$ of a $$...$$ block first.
    clean = re.sub(r"\$\$[^$]+\$\$", "MATH", clean)  # block LaTeX
    clean = re.sub(r"\$[^$]+\$", "MATH", clean)      # inline LaTeX
    clean = re.sub(r"[#*_`|>~\-=]", "", clean)       # strip markdown chars
    clean = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", clean)  # links → text
    return len(clean.split())


# ── Fixtures ──────────────────────────────────────────────────────────

@pytest.fixture(params=CONTENT_NOTEBOOKS, ids=[_notebook_id(p) for p in CONTENT_NOTEBOOKS])
def notebook(request: pytest.FixtureRequest) -> tuple[Path, nbformat.NotebookNode]:
    path = request.param
    return path, _read_notebook(path)


# ── Prose Quality ─────────────────────────────────────────────────────

class TestProseQuality:
    """Every notebook must have sufficient explanatory text."""

    MIN_TOTAL_WORDS = 200      # minimum words across all markdown cells
    MIN_MARKDOWN_RATIO = 0.25  # at least 25% of cells should be markdown

    def test_minimum_word_count(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Each notebook has at least MIN_TOTAL_WORDS of prose."""
        path, nb = notebook
        md_cells = _markdown_cells(nb)
        total_words = sum(_word_count(cell) for cell in md_cells)
        assert total_words >= self.MIN_TOTAL_WORDS, (
            f"{path}: only {total_words} words of prose "
            f"(minimum {self.MIN_TOTAL_WORDS})"
        )

    def test_markdown_to_code_ratio(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Notebooks are not code-only — sufficient markdown explanation exists."""
        path, nb = notebook
        md_count = len([c for c in nb.cells if c.cell_type == "markdown"])
        total = len(nb.cells)
        if total == 0:
            pytest.skip("empty notebook")
        ratio = md_count / total
        assert ratio >= self.MIN_MARKDOWN_RATIO, (
            f"{path}: markdown ratio {ratio:.0%} "
            f"(minimum {self.MIN_MARKDOWN_RATIO:.0%}, "
            f"{md_count} markdown / {total} total cells)"
        )


# ── Section Structure ─────────────────────────────────────────────────

class TestSectionStructure:
    """Notebooks must have clear sectional organization."""

    def test_has_title_header(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """First cell is a markdown cell with a level-1 or level-2 heading."""
        path, nb = notebook
        first = nb.cells[0]
        assert first.cell_type == "markdown", (
            f"{path}: first cell is {first.cell_type}, expected markdown header"
        )
        src = "".join(first.source)
        assert re.match(r"^#{1,2}\s", src), (
            f"{path}: first cell doesn't start with # or ## heading"
        )

    def test_has_multiple_sections(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Notebook has at least 2 section headers (## or ###)."""
        path, nb = notebook
        md_text = "\n".join(_markdown_cells(nb))
        sections = re.findall(r"^#{2,3}\s", md_text, re.MULTILINE)
        assert len(sections) >= 2, (
            f"{path}: only {len(sections)} section headers found (minimum 2)"
        )


# ── Assessment Density ────────────────────────────────────────────────

ASSESSMENT_PATTERN = re.compile(r"(quiz|predict_choice|reflect|order)\s*\(")


class TestAssessmentDensity:
    """Notebooks must have sufficient interactive assessments."""

    MIN_ASSESSMENTS = 2  # at least 2 assessment calls per notebook

    def test_minimum_assessment_count(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Each notebook has at least MIN_ASSESSMENTS interactive assessments."""
        path, nb = notebook
        code = "\n".join(_code_cells(nb))
        # Exclude the LearningTracker import/setup line
        code_no_setup = "\n".join(
            line for line in code.split("\n")
            if "LearningTracker" not in line
        )
        matches = ASSESSMENT_PATTERN.findall(code_no_setup)
        assert len(matches) >= self.MIN_ASSESSMENTS, (
            f"{path}: only {len(matches)} assessments "
            f"(minimum {self.MIN_ASSESSMENTS})"
        )

    def test_assessment_variety(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Each notebook uses at least 2 different assessment types."""
        path, nb = notebook
        code = "\n".join(_code_cells(nb))
        code_no_setup = "\n".join(
            line for line in code.split("\n")
            if "LearningTracker" not in line
        )
        types_found = set(ASSESSMENT_PATTERN.findall(code_no_setup))
        assert len(types_found) >= 2, (
            f"{path}: only {len(types_found)} assessment type(s) "
            f"({types_found}), minimum 2 for variety"
        )


# ── Bloom's Taxonomy Coverage ─────────────────────────────────────────

BLOOM_PATTERN = re.compile(r'bloom\s*=\s*["\'](\w+)["\']')


class TestBloomCoverage:
    """Notebooks should exercise multiple Bloom's taxonomy levels."""

    def test_bloom_levels_used(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Each notebook exercises at least 2 Bloom's taxonomy levels."""
        path, nb = notebook
        code = "\n".join(_code_cells(nb))
        blooms = set(BLOOM_PATTERN.findall(code))
        if not blooms:
            pytest.skip("no bloom= parameters found")
        assert len(blooms) >= 2, (
            f"{path}: only {len(blooms)} Bloom level(s) ({blooms}), "
            f"minimum 2 for cognitive depth"
        )


# ── Checkpoint Coverage ───────────────────────────────────────────────

class TestCheckpointCoverage:
    """Notebooks with many assessments should include checkpoint summaries."""

    MIN_ASSESSMENTS_FOR_CHECKPOINT = 4

    def test_checkpoint_present_when_needed(
        self, notebook: tuple[Path, nbformat.NotebookNode],
    ) -> None:
        """Notebooks with 4+ assessments should include checkpoint_summary calls."""
        path, nb = notebook
        code = "\n".join(_code_cells(nb))
        assessment_count = len(ASSESSMENT_PATTERN.findall(code))
        if assessment_count < self.MIN_ASSESSMENTS_FOR_CHECKPOINT:
            pytest.skip(f"only {assessment_count} assessments (threshold: {self.MIN_ASSESSMENTS_FOR_CHECKPOINT})")
        has_checkpoint = "checkpoint_summary" in code
        assert has_checkpoint, (
            f"{path}: {assessment_count} assessments but no checkpoint_summary call"
        )


# ── Learning Tracker Integration ──────────────────────────────────────

class TestTrackerIntegration:
    """Every content notebook must integrate the learning tracker."""

    def test_tracker_initialization(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Each notebook creates a LearningTracker instance."""
        path, nb = notebook
        code = "\n".join(_code_cells(nb))
        assert "LearningTracker" in code, (
            f"{path}: no LearningTracker initialization found"
        )

    def test_tracker_dashboard_at_end(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Each notebook calls tracker.dashboard() near the end."""
        path, nb = notebook
        code_cells = _code_cells(nb)
        if not code_cells:
            pytest.skip("no code cells")
        # Check last 3 code cells for dashboard call
        tail = "\n".join(code_cells[-3:])
        assert "dashboard()" in tail, (
            f"{path}: no tracker.dashboard() call in final code cells"
        )

    def test_tracker_save_at_end(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Each notebook saves tracker progress near the end."""
        path, nb = notebook
        code_cells = _code_cells(nb)
        if not code_cells:
            pytest.skip("no code cells")
        tail = "\n".join(code_cells[-3:])
        assert "save()" in tail, (
            f"{path}: no tracker.save() call in final code cells"
        )


# ── Key Insight Pattern ───────────────────────────────────────────────

class TestKeyInsights:
    """Notebooks should have 'Key Insight' callouts for important takeaways."""

    # Interactive dashboards and short notebooks are exempt
    EXEMPT = {"00_dashboard.ipynb"}

    def test_has_key_insights(self, notebook: tuple[Path, nbformat.NotebookNode]) -> None:
        """Notebooks with 5+ sections should have at least one Key Insight callout."""
        path, nb = notebook
        if path.name in self.EXEMPT:
            pytest.skip("interactive dashboard — exempt from insight callouts")
        md_text = "\n".join(_markdown_cells(nb))
        sections = re.findall(r"^#{2,3}\s", md_text, re.MULTILINE)
        if len(sections) < 5:
            pytest.skip(f"only {len(sections)} sections (threshold: 5)")
        has_insight = bool(
            re.search(
                r"key insight|observe:|key fact|result:|proof summary|important|tip:",
                md_text, re.IGNORECASE,
            )
        )
        assert has_insight, (
            f"{path}: {len(sections)} sections but no 'Key Insight' callout"
        )


# ── Cross-Plan Consistency ────────────────────────────────────────────

class TestCrossPlanConsistency:
    """All four plans should cover core concepts."""

    CORE_CONCEPTS = ["stabiliz", "magic", "witness", "ratchet"]

    def test_all_plans_cover_core_concepts(self) -> None:
        """Each plan's notebooks collectively mention all core concepts."""
        plans = {
            "plan_a": sorted(NOTEBOOK_DIR.glob("plan_a/*.ipynb")),
            "plan_b": sorted(NOTEBOOK_DIR.glob("plan_b/*.ipynb")),
            "plan_c": sorted(NOTEBOOK_DIR.glob("plan_c/*.ipynb")),
            "plan_d": sorted(NOTEBOOK_DIR.glob("plan_d/*.ipynb")),
        }
        for plan_name, notebooks in plans.items():
            all_text = ""
            for nb_path in notebooks:
                nb = _read_notebook(nb_path)
                all_text += "\n".join(_markdown_cells(nb) + _code_cells(nb))
            all_text_lower = all_text.lower()
            for concept in self.CORE_CONCEPTS:
                assert concept in all_text_lower, (
                    f"{plan_name}: core concept '{concept}' not found in any notebook"
                )