Sync all documentation with current project ground truth

README: rewrite with Quick Start (app.sh), 335-test count, teaching layer
narrative, testing/validation section, CI/CD docs, pre-commit hooks.
THE_STORY: add Part 4 (teaching layer), Part 5 (app.sh consumer experience),
update file map with all 13 test files and teaching/notebook/paper entries.
compendium.tex: update notebook count (8→12), add Plan D cross-references.
autoresearch_quantum.tex: update test counts (21→335), add app.sh validate.
learning_objectives.md: add entry point reference and assessment type glossary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
saymrwulf 2026-04-15 20:55:02 +02:00
parent 29caba3a1a
commit 55237d5f73
5 changed files with 455 additions and 212 deletions

README.md

@ -1,93 +1,205 @@
# Autoresearch Quantum
`autoresearch-quantum` is a Python research harness for a Karpathy-style autoresearch ratchet in quantum experiments, combined with a four-plan interactive course built on Jupyter notebooks:
- keep an incumbent experiment
- generate challenger experiments
- screen challengers on a cheap tier
- promote only justified challengers to an expensive tier
- replace the incumbent only when the challenger wins on the final criterion
- log every ratchet step
- extract a transferable lesson at the end of each rung
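The loop above can be sketched in miniature. This is an illustrative reduction, not the real `AutoresearchHarness` API: the spec is just a number, and `propose`, `cheap_score`, and `expensive_score` are toy stand-ins.

```python
import random

def ratchet_step(incumbent, propose, cheap_score, expensive_score, cheap_margin=0.0):
    """One ratchet step: screen challengers cheaply, promote, compare expensively."""
    challengers = propose(incumbent)
    baseline = cheap_score(incumbent)
    # Cheap tier: keep only challengers that beat the incumbent by the margin.
    promoted = [c for c in challengers if cheap_score(c) > baseline + cheap_margin]
    if not promoted:
        return incumbent  # nothing justified the expensive tier
    # Expensive tier: replace the incumbent only if a challenger wins outright.
    best = max(promoted, key=expensive_score)
    return best if expensive_score(best) > expensive_score(incumbent) else incumbent

# Toy usage: specs are numbers in [0, 1]; the score rewards closeness to 0.7.
score = lambda x: 1 - abs(x - 0.7)
propose = lambda x: [min(1.0, max(0.0, x + random.uniform(-0.1, 0.1))) for _ in range(8)]
random.seed(0)
spec = 0.2
for _ in range(20):
    spec = ratchet_step(spec, propose, score, score, cheap_margin=0.01)
```

Because replacement requires a strict win on the final criterion, the incumbent's score never decreases: that is the ratchet.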
The system has two layers:
1. **Research engine** --- an automated loop that discovers the best way to prepare encoded magic states on the [[4,2,2]] quantum error-detecting code. It proposes, evaluates, compares, learns, and repeats without human intervention.
2. **Teaching layer** --- 12 Jupyter notebooks across 4 learning plans, each teaching the same core material through a different pedagogical lens: sequential (Plan A), spiral (Plan B), parallel tracks (Plan C), and hypothesis-driven experiments (Plan D). Every notebook includes interactive widget-based assessments, per-student progress tracking, and Bloom's taxonomy-aligned exercises.
The first built-in experiment family targets encoded magic-state preparation in the `[[4,2,2]]` code with Qiskit. The framework is designed so the `[[4,2,2]]` rung is not the destination. It is the first rung in a ladder that shifts from best-circuit hunting toward reusable design rules for larger encoded workflows.
No IBM account or API key is needed --- everything runs locally with the Aer simulator.
## Project Tree
```text
autoresearch-quantum/
├── configs/rungs/
│ ├── rung1.yaml Baseline: what recipe works?
│ ├── rung2.yaml Stability under noise variation
│ ├── rung3.yaml Transfer across backends
│ ├── rung4.yaml Factory throughput / cost
│ └── rung5.yaml Rosenfeld direction
├── src/autoresearch_quantum/
│ ├── cli.py CLI entry point
│ ├── config.py YAML config loader
│ ├── models.py All data structures
│ ├── codes/
│ │ └── four_two_two.py [[4,2,2]] stabilisers, encoder, seed gates
│ ├── experiments/
│ │ └── encoded_magic_state.py Circuit bundle builder
│ ├── execution/
│ │ ├── analysis.py Postselection, witness, stability
│ │ ├── backends.py Backend resolution
│ │ ├── hardware.py IBM hardware executor
│ │ ├── local.py Aer noise simulation executor
│ │ ├── transfer.py Cross-backend transfer evaluator
│ │ └── transpile.py Transpilation utilities
│ ├── lessons/
│ │ ├── extractor.py Human-readable lesson extraction
│ │ └── feedback.py Machine-readable rules + search narrowing
│ ├── persistence/
│ │ └── store.py JSON file store with resumability
│ ├── ratchet/
│ │ └── runner.py AutoresearchHarness orchestrator
│ ├── scoring/
│ │ └── score.py WAC + factory throughput scorers
│ ├── search/
│ │ ├── challengers.py Neighbour generation with dedup
│ │ └── strategies.py NeighborWalk, RandomCombo, LessonGuided
│ └── teaching/
│ ├── assess.py Widget-based quizzes, predictions, reflections
│ └── tracker.py LearningTracker --- per-student progress tracking
├── paper/
│ ├── autoresearch_quantum.tex Full technical paper (LaTeX)
│ ├── autoresearch_quantum.pdf Compiled PDF (19 pages)
│ ├── compendium.tex Companion textbook (LaTeX)
│ └── compendium.pdf Compiled PDF (36 pages)
├── notebooks/
│ ├── 00_START_HERE.ipynb Central entry point --- plan selector
│ ├── learning_objectives.md Per-notebook, per-section learning objectives
│ ├── plan_a/ Bottom-up: 3 sequential notebooks
│ │ ├── 01_encoded_magic_state.ipynb
│ │ ├── 02_measuring_progress.ipynb
│ │ └── 03_the_ratchet.ipynb
│ ├── plan_b/ Spiral: 1 notebook, three passes
│ │ └── spiral_notebook.ipynb
│ ├── plan_c/ Parallel tracks + dashboard
│ │ ├── 00_dashboard.ipynb
│ │ ├── track_a_physics.ipynb
│ │ ├── track_b_engineering.ipynb
│ │ └── track_c_search.ipynb
│ └── plan_d/ Three claim-driven experiments
│ ├── experiment_1_protection.ipynb
│ ├── experiment_2_noise.ipynb
│ └── experiment_3_optimisation.ipynb
├── scripts/
│ └── app.sh Consumer lifecycle manager
├── tests/ 335 tests across 13 files
│ ├── test_analysis.py Postselection & witness tests
│ ├── test_browser_ux.py Playwright end-to-end UX tests
│ ├── test_cli.py CLI subcommand tests
│ ├── test_codes.py [[4,2,2]] code correctness
│ ├── test_config.py YAML config loading
│ ├── test_experiments.py Circuit bundle construction
│ ├── test_feedback.py Lesson extraction & search rules
│ ├── test_harness.py Full ratchet integration tests
│ ├── test_notebooks.py Notebook execution & structure
│ ├── test_pedagogy.py Pedagogical quality invariants
│ ├── test_persistence.py JSON store round-trips
│ ├── test_scoring.py Score function correctness
│ └── test_teaching.py Assessment widget & tracker tests
├── .github/workflows/ci.yml CI: lint, type check, test matrix, notebook execution
├── .pre-commit-config.yaml Ruff, mypy, nbstripout, hygiene hooks
├── THE_STORY.md Narrative documentation (system design)
├── pyproject.toml Build config, dependencies, tool settings
└── README.md
```
## Quick Start
The fastest way to get running:
```bash
# Clone and bootstrap (creates venv, installs everything, registers Jupyter kernel)
git clone https://github.com/saymrwulf/autoresearch-quantum.git
cd autoresearch-quantum
bash scripts/app.sh bootstrap
# Launch JupyterLab (opens 00_START_HERE.ipynb in your browser)
bash scripts/app.sh start
```
The `app.sh` lifecycle manager handles the entire consumer experience:
| Command | What it does |
|---------|-------------|
| `bash scripts/app.sh bootstrap` | Create venv, install deps, register Jupyter kernel, verify imports |
| `bash scripts/app.sh start` | Launch JupyterLab (auto-opens `00_START_HERE.ipynb`) |
| `bash scripts/app.sh start --no-open` | Launch without opening browser |
| `bash scripts/app.sh stop` | Stop JupyterLab |
| `bash scripts/app.sh status` | Show venv, server, notebook, and progress status |
| `bash scripts/app.sh validate` | Run full validation: ruff + mypy + pytest |
| `bash scripts/app.sh validate --quick` | Lint + type check + unit tests only |
| `bash scripts/app.sh logs` | Tail JupyterLab output |
| `bash scripts/app.sh reset` | Delete learner progress files |
### Manual installation
If you prefer manual setup:
```bash
python3 -m venv .venv
. .venv/bin/activate
pip install -e '.[dev,notebooks]'
```
For the optional IBM hardware path:
```bash
pip install -e '.[hardware,dev,notebooks]'
```
## Jupyter Notebooks --- Learning Plans
The `notebooks/` folder contains **12 notebooks across 4 independent learning plans**, all accessible from a central entry point: **`00_START_HERE.ipynb`**.
Each plan teaches the same core material (encoded magic-state preparation, measurement, and the ratchet optimiser) through a different didactic lens. Every content notebook includes:
- **Interactive assessments** --- multiple-choice quizzes, predictions, reflections, and ordering exercises (ipywidgets)
- **Per-student progress tracking** --- `LearningTracker` records scores, Bloom's levels, and time per assessment
- **Navigation links** --- forward/backward links between notebooks, cross-plan suggestions, and back-links to Start Here
- **Key Insight callouts** --- highlighted takeaways for important concepts
- **Checkpoint summaries** --- mid-notebook progress reviews in longer notebooks
### Plan A --- Bottom-Up (3 sequential notebooks)
| # | File | What you learn |
|---|------|----------------|
| 1 | `plan_a/01_encoded_magic_state.ipynb` | T-state, [[4,2,2]] encoder, stabilisers, error detection, postselection |
| 2 | `plan_a/02_measuring_progress.ipynb` | Noise, logical operators, magic witness, scoring formula, parameter sweeps |
| 3 | `plan_a/03_the_ratchet.ipynb` | Incumbent/challenger model, ratchet steps, lessons, cross-rung propagation |
Start with notebook 01 and work through in order. Run each cell top-to-bottom (Shift+Enter).
### Plan B --- Spiral (1 notebook, three passes)
| File | What you learn |
|------|----------------|
| `plan_b/spiral_notebook.ipynb` | **Pass 1:** 5-min demo (black-box). **Pass 2:** Open the box (circuits, stabilisers, scoring). **Pass 3:** Make it your own (modify parameters, run experiments). |
One notebook, 78 cells. Each pass revisits the same system at a deeper level.
### Plan C --- Parallel Tracks (4 notebooks)
| File | Focus |
|------|-------|
| `plan_c/00_dashboard.ipynb` | Interactive dashboard (ipywidgets) --- run experiments from dropdowns |
| `plan_c/track_a_physics.ipynb` | Pure quantum mechanics: Eastin-Knill, Bloch sphere, stabiliser algebra |
| `plan_c/track_b_engineering.ipynb` | Noise models, transpilation, cost model, failure modes |
| `plan_c/track_c_search.ipynb` | Parameter space, search strategies, lesson extraction, cross-rung transfer |
Start with the dashboard for an overview, then dive into whichever track interests you. The three tracks are independent and can be read in any order.
### Plan D --- Three Claim-Driven Experiments
| # | File | Hypothesis |
|---|------|-----------|
| 1 | `plan_d/experiment_1_protection.ipynb` | The [[4,2,2]] code can protect a magic state: W=1.0, all errors detected |
| 2 | `plan_d/experiment_2_noise.ipynb` | Noise degrades quality but parameter choice matters >2x |
| 3 | `plan_d/experiment_3_optimisation.ipynb` | A ratchet can learn to optimise and its knowledge transfers |
Each notebook follows: **Hypothesis -> Claim -> Experiment -> Proof -> Next Hypothesis**.
### Troubleshooting
| Problem | Fix |
|---------|-----|
| `ModuleNotFoundError: autoresearch_quantum` | Run `bash scripts/app.sh bootstrap` or `pip install -e '.[notebooks]'` |
| `ModuleNotFoundError: ipywidgets` | Run `pip install ipywidgets` --- needed for interactive assessments |
| Plots don't render | Make sure `%matplotlib inline` is in the first code cell (it already is) |
| Kernel not found | In JupyterLab, select **Kernel > Change Kernel** and pick the `.venv` Python |
## Scientific Framing
### What is optimized
@ -144,7 +256,7 @@ Expensive tier:
## Built-In `[[4,2,2]]` Experiment
The built-in experiment prepares an encoded logical T-state on one logical qubit of the `[[4,2,2]]` code while keeping the spectator logical qubit in `|0>`. The code utilities live in [`four_two_two.py`](src/autoresearch_quantum/codes/four_two_two.py).
The harness evaluates:
@ -158,108 +270,12 @@ This keeps the core scientific distinction explicit:
- a circuit can be locally good for `[[4,2,2]]`
- a rule is only valuable if it keeps helping across new backends or new rungs
## How To Run (CLI)
### 1. Run a single local experiment
Use the bootstrap incumbent from the rung config as-is:
```bash
autoresearch-quantum run-experiment \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
```
@ -267,7 +283,7 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
Override individual experiment fields:
```bash
autoresearch-quantum run-experiment \
--config configs/rungs/rung1.yaml \
--store-dir data/demo \
--set verification=z_only \
@ -278,7 +294,7 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
### 2. Run one ratchet step
```bash
autoresearch-quantum run-step \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
```
@ -294,7 +310,7 @@ This will:
### 3. Run one full rung
```bash
autoresearch-quantum run-rung \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
```
@ -310,7 +326,7 @@ Artifacts are persisted under `data/demo/rung_<n>/`:
### 4. Run a multi-rung ratchet campaign
```bash
autoresearch-quantum run-ratchet \
--config configs/rungs/rung1.yaml \
--config configs/rungs/rung2.yaml \
--config configs/rungs/rung3.yaml \
@ -320,18 +336,17 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-ratchet \
### 5. Run an optional hardware-backed confirmation
First install the hardware extra and make IBM credentials available:
```bash
pip install -e '.[hardware]'
export QISKIT_IBM_TOKEN=...
```
Then enable the hardware tier in the rung config by setting `tier_policy.enable_hardware: true` and optionally `hardware.backend_name: ibm_brisbane`.
Run:
```bash
autoresearch-quantum run-step \
--config configs/rungs/rung1.yaml \
--store-dir data/hardware \
--hardware
@ -339,18 +354,71 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-step \
Only challengers that beat the incumbent cheap-tier score by `tier_policy.cheap_margin` are promoted.
## Testing & Validation
The project has **335 tests** across 13 test files covering every layer:
| Test file | What it validates |
|-----------|-------------------|
| `test_codes.py` | [[4,2,2]] stabilisers, encoder, seed gates |
| `test_experiments.py` | Circuit bundle construction |
| `test_analysis.py` | Postselection, witness, stability metrics |
| `test_scoring.py` | WAC and factory throughput score functions |
| `test_feedback.py` | Lesson extraction, search rules, space narrowing |
| `test_harness.py` | Full ratchet integration (rung, multi-rung, resumability) |
| `test_persistence.py` | JSON store round-trips |
| `test_cli.py` | CLI subcommands |
| `test_config.py` | YAML config loading |
| `test_teaching.py` | Assessment widgets, LearningTracker |
| `test_notebooks.py` | Notebook execution via nbclient, structure validation |
| `test_pedagogy.py` | Pedagogical quality: prose density, assessment density, Bloom's coverage, section structure, tracker integration, key insights, cross-plan consistency |
| `test_browser_ux.py` | Playwright end-to-end: JupyterLab launch, notebook rendering, navigation links, widget rendering |
### Running tests
```bash
# Standard: all tests except browser UX (default)
bash scripts/app.sh validate
# Quick: lint + type check + unit tests only
bash scripts/app.sh validate --quick
# Direct pytest (browser tests excluded by default via marker)
.venv/bin/python -m pytest tests/ -v
# Browser UX tests (requires playwright)
pip install playwright && python -m playwright install chromium
.venv/bin/python -m pytest tests/test_browser_ux.py -m browser -v
```
### Static analysis
- **Ruff** --- linting and formatting (E, F, W, I, UP, B, SIM rule sets)
- **mypy** --- strict mode type checking across all source files
- **nbstripout** --- strips notebook outputs before commit
All three run automatically as **pre-commit hooks** (`.pre-commit-config.yaml`). Install with:
```bash
.venv/bin/pre-commit install
```
### CI/CD
The GitHub Actions pipeline (`.github/workflows/ci.yml`) runs on every push and PR:
1. **Lint job** --- ruff check, ruff format --check, mypy strict (Python 3.11)
2. **Test job** --- full test suite on Python 3.11 and 3.12 matrix
3. **Notebook execution job** --- runs all 12 notebooks end-to-end via nbclient
## Extending The Ladder
The intended progression is:
1. `rung1.yaml` --- baseline `[[4,2,2]]` encoded magic-state preparation
2. `rung2.yaml` --- same code with stronger stability and backend-awareness
3. `rung3.yaml` --- transfer across backend families
4. `rung4.yaml` --- factory-style cost pressure
To add a new rung:

THE_STORY.md

@ -398,7 +398,124 @@ and checks that their computed seeds are different.
---
## Part 4: The teaching layer
The system is not only a research engine. It is also a course. Twelve Jupyter
notebooks, organised into four independent learning plans, teach the same
material through different pedagogical lenses. The teaching layer sits on top
of the research engine and uses its real components (circuits, simulators,
scorers, ratchet) as the substrate for interactive learning.
### 4.1 Entry point: 00_START_HERE.ipynb
Every learner begins at `notebooks/00_START_HERE.ipynb`. This notebook
contains no code --- it is a plan selector. It describes the four plans, their
target audiences, and links directly to each plan's first notebook. All
content notebooks link back to Start Here.
### 4.2 The four plans
| Plan | Style | Notebooks | Target learner |
|------|-------|-----------|----------------|
| **A** | Bottom-up, sequential | 3 | Methodical learners who want foundations first |
| **B** | Spiral, three passes | 1 (78 cells) | Time-pressed learners who want a demo first, theory later |
| **C** | Parallel tracks + dashboard | 4 | Learners who want to choose their own path |
| **D** | Hypothesis-driven experiments | 3 | Research-oriented learners who want to test claims |
All four plans cover the same core concepts: T-state preparation, [[4,2,2]]
encoding, stabiliser verification, postselection, scoring, the ratchet
optimiser, lesson extraction, and cross-rung transfer.
### 4.3 Interactive assessments (teaching/assess.py)
Every content notebook includes interactive assessments built with ipywidgets:
- **quiz()** --- multiple-choice questions with immediate feedback
- **predict_choice()** --- "What do you think will happen?" before running code
- **reflect()** --- open-ended reflections graded by keyword matching
- **order()** --- drag-and-drop ordering exercises (e.g., rank error types)
Each assessment is tagged with a Bloom's taxonomy level (remember, understand,
apply, analyse, evaluate) and a topic. The full mapping of learning objectives
to assessments is documented in `notebooks/learning_objectives.md`.
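A minimal sketch of what such tagging might look like. The real `assess.py` API is not reproduced here; the class and field names below are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    """Illustrative stand-in for a Bloom's-tagged assessment (names hypothetical)."""
    prompt: str
    options: list
    answer: int   # index of the correct option
    bloom: str    # one of: remember, understand, apply, analyse, evaluate
    topic: str

    def grade(self, choice: int) -> bool:
        return choice == self.answer

q = Assessment(
    prompt="Which syndrome outcome does postselection discard?",
    options=["all stabilisers +1", "any stabiliser -1"],
    answer=0,
    bloom="understand",
    topic="postselection",
)
```

Tagging each question with a level and a topic is what lets the tracker report a Bloom's distribution rather than a flat score.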
### 4.4 Progress tracking (teaching/tracker.py)
Each notebook creates a `LearningTracker` instance that records:
- scores per assessment (correct/incorrect, attempt count)
- Bloom's level distribution (how many of each level attempted/passed)
- time spent per assessment
- checkpoint summaries at natural breakpoints
At the end of each notebook, `tracker.dashboard()` displays a visual summary,
and `tracker.save()` persists progress to a JSON file. Progress files can be
reset with `bash scripts/app.sh reset`.
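The save/load round-trip can be sketched as follows. `LearningTracker`'s real schema is not shown in this document, so the JSON layout below is an assumption.

```python
import json, os, tempfile

class MiniTracker:
    """Toy tracker: records per-assessment results and persists them as JSON."""
    def __init__(self, student="anon"):
        self.student = student
        self.records = []  # one dict per assessment attempt

    def record(self, topic, bloom, correct):
        self.records.append({"topic": topic, "bloom": bloom, "correct": correct})

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"student": self.student, "records": self.records}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            data = json.load(f)
        t = cls(data["student"])
        t.records = data["records"]
        return t

t = MiniTracker("alice")
t.record("postselection", "understand", True)
path = os.path.join(tempfile.mkdtemp(), "progress.json")
t.save(path)
t2 = MiniTracker.load(path)
```

Because progress lives in plain JSON files, `app.sh reset` only has to delete them.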
### 4.5 Navigation
Every content notebook has a navigation footer with:
- **Forward link** to the next notebook in the plan
- **Back-link** to 00_START_HERE.ipynb
- **Cross-plan suggestions** at terminal notebooks (e.g., "Finished Plan A?
Try Plan D for a different perspective.")
### 4.6 Pedagogical quality enforcement
The test suite includes `tests/test_pedagogy.py`, which enforces educational
quality invariants across all content notebooks:
- Minimum 200 words of prose per notebook
- At least 25% of cells are markdown (not code-only)
- Every notebook has a title header and multiple sections
- At least 2 interactive assessments per notebook
- At least 2 different assessment types per notebook (variety)
- Bloom's taxonomy coverage: at least 2 levels per notebook
- Checkpoint summaries present when a notebook has 4+ assessments
- LearningTracker initialisation, dashboard(), and save() in every notebook
- Key Insight callouts in longer notebooks (5+ sections)
- All four plans collectively cover core concepts (stabiliser, magic, witness, ratchet)
These tests catch pedagogical regressions the same way unit tests catch code
regressions. Adding a new notebook or modifying an existing one will fail CI
if it violates these invariants.
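One such invariant, the markdown-cell fraction, can be checked with a few lines against the raw notebook JSON. The 25% threshold mirrors the figure quoted above; the helper name is made up for this sketch.

```python
def markdown_fraction(nb: dict) -> float:
    """Fraction of cells that are markdown in an nbformat-style notebook dict."""
    cells = nb.get("cells", [])
    if not cells:
        return 0.0
    md = sum(1 for c in cells if c.get("cell_type") == "markdown")
    return md / len(cells)

# A toy notebook dict in nbformat-like shape: 2 markdown cells out of 4.
toy_nb = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "print('hi')"},
    {"cell_type": "markdown", "source": "Some prose."},
    {"cell_type": "code", "source": "1 + 1"},
]}
assert markdown_fraction(toy_nb) >= 0.25  # the invariant enforced in CI
```

The other invariants (assessment counts, Bloom's coverage, tracker calls) are checked the same way: parse the notebook JSON, count, assert.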
---
## Part 5: The consumer experience (app.sh)
The project includes a lifecycle manager (`scripts/app.sh`) that handles the
entire consumer experience from first clone to running notebooks:
```bash
bash scripts/app.sh bootstrap # venv, pip install, kernel registration, import check
bash scripts/app.sh start # launch JupyterLab, open 00_START_HERE.ipynb
bash scripts/app.sh stop # graceful shutdown
bash scripts/app.sh status # venv, server, notebook, progress summary
bash scripts/app.sh validate # ruff + mypy + full test suite
bash scripts/app.sh validate --quick # lint + type check + unit tests only
bash scripts/app.sh logs # tail JupyterLab output
bash scripts/app.sh reset # delete learner progress files
```
Bootstrap checks Python >= 3.11, creates the venv, installs the package with
dev and notebook dependencies, registers a Jupyter kernel, and verifies that
core imports succeed. Start finds a free port (8888-8899), launches JupyterLab
in the background with PID tracking, and opens the browser directly to
`00_START_HERE.ipynb`.
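The port scan in `start` can be approximated in Python (the real script is bash; this is just an equivalent sketch of the idea):

```python
import socket

def find_free_port(start=8888, end=8899):
    """Return the first TCP port in [start, end] that accepts a local bind."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # port busy, try the next one
            return port
    raise RuntimeError(f"no free port in {start}-{end}")

port = find_free_port()
```

Binding and immediately closing is a common (if slightly racy) way to probe availability; the script then hands the chosen port to JupyterLab.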
Validation runs the full quality pipeline: ruff linting, mypy strict type
checking, and the pytest suite (335 tests, excluding browser UX by default).
The `--quick` flag runs only lint, type check, and unit tests.
---
## Part 6: The file map
```
autoresearch-quantum/
@ -450,8 +567,42 @@ autoresearch-quantum/
store.py JSON file store: experiments, steps, progress,
lessons, feedback, propagated specs
teaching/
assess.py Widget-based quizzes, predictions, reflections
tracker.py LearningTracker: per-student progress tracking
notebooks/
00_START_HERE.ipynb Central entry point: plan selector
learning_objectives.md Per-notebook, per-section learning objectives
plan_a/ Bottom-up: 3 sequential notebooks
plan_b/ Spiral: 1 notebook, 3 passes
plan_c/ Parallel tracks + dashboard: 4 notebooks
plan_d/ Hypothesis-driven: 3 experiments
paper/
autoresearch_quantum.tex Technical paper (LaTeX, 19 pages)
compendium.tex Companion textbook (LaTeX, 36 pages)
scripts/
app.sh Consumer lifecycle manager (bootstrap/start/stop/validate)
tests/ 335 tests across 13 files
test_analysis.py Postselection & witness
test_browser_ux.py Playwright end-to-end UX
test_cli.py CLI subcommands
test_codes.py [[4,2,2]] code correctness
test_config.py YAML config loading
test_experiments.py Circuit bundle construction
test_feedback.py Lesson extraction & search rules
test_harness.py Full ratchet integration
test_notebooks.py Notebook execution & structure
test_pedagogy.py Pedagogical quality invariants (130 tests)
test_persistence.py JSON store round-trips
test_scoring.py Score functions
test_teaching.py Assessment widgets & tracker
.github/workflows/ci.yml CI: lint, type check, test matrix, notebook execution
.pre-commit-config.yaml Ruff, mypy, nbstripout, hygiene hooks
data/ Output directory (created at runtime)
default/
@ -472,12 +623,12 @@ autoresearch-quantum/
---
## Part 7: How to use it without Claude
You do not need an AI to run this system or to make progress with its
output. Everything below runs in your terminal.
### 7.1 Setup
```bash
cd autoresearch-quantum
@ -486,7 +637,7 @@ source .venv/bin/activate
pip install -e ".[dev]"
```
### 7.2 Run a single experiment
```bash
python -m autoresearch_quantum run-experiment \
@ -498,7 +649,7 @@ python -m autoresearch_quantum run-experiment \
This prints a JSON result with the score, failure mode, and experiment ID.
The full record is saved to `data/default/rung_1/experiments/`.
### 7.3 Run one ratchet step
```bash
python -m autoresearch_quantum run-step \
@ -510,7 +661,7 @@ them, promotes the best, and saves the step record. Run it again and it
generates *new* challengers (never repeating), with a new incumbent if one was
found.
### 7.4 Run a full rung
```bash
python -m autoresearch_quantum run-rung \
@ -521,7 +672,7 @@ Runs up to `step_budget` steps (default 3), stopping early if patience runs
out. Produces `data/default/rung_1/lesson.md` -- read this file. It tells you
what helped, what hurt, what seems invariant, and what to test next.
### 7.5 Run the full five-rung ratchet
```bash
python -m autoresearch_quantum run-ratchet \
@ -536,7 +687,7 @@ This is the full pipeline. Each rung's winner is automatically propagated to
the next rung. Each rung's lessons narrow the search space for the next.
When it finishes, you have five lesson files and a final optimised recipe.
### 7.6 Run a transfer evaluation
```bash
python -m autoresearch_quantum run-transfer \
@ -547,7 +698,7 @@ python -m autoresearch_quantum run-transfer \
Tests a single spec across multiple backend noise models. The output tells you
the per-backend scores and the pessimistic transfer score.
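"Pessimistic transfer score" suggests a worst-case aggregate across backends; a sketch under that assumption (the real aggregation in `transfer.py` may differ, and the backend names below are invented):

```python
def transfer_scores(spec, backends, evaluate):
    """Score one spec on every backend; the pessimistic score is the minimum."""
    per_backend = {name: evaluate(spec, name) for name in backends}
    return per_backend, min(per_backend.values())

# Toy evaluation: pretend each backend degrades the spec's ideal score differently.
degradation = {"aer_ideal": 0.00, "fake_backend_a": 0.12, "fake_backend_b": 0.20}
evaluate = lambda spec, name: spec - degradation[name]
per_backend, pessimistic = transfer_scores(0.9, degradation, evaluate)
```

Taking the minimum means a spec only scores well if it holds up on its worst backend, which is the point of a transfer test.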
### 7.7 Reading the output
After a ratchet run, the most valuable artefacts are:
@ -559,7 +710,7 @@ After a ratchet run, the most valuable artefacts are:
| `rung_N/propagated_spec.json` | The spec that was carried forward from the previous rung. Compare it with the YAML bootstrap to see what the system changed. |
| `rung_N/progress.json` | If the run was interrupted, this tells you where it left off. Just re-run the same command to resume. |
### 7.8 Making manual progress with the artefacts
The system is designed so that you can interleave human intuition with
automated search:
@ -591,22 +742,27 @@ automated search:
You are now doing what the system does in `run_ratchet` -- but with human
judgement about what to explore next.
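The interleaving above boils down to: read a spec artefact, change one field by hand, run again. A minimal sketch of the edit step, with a toy spec standing in for `data/default/rung_N/propagated_spec.json` (the field names and values here are illustrative; a real edit must use a dimension value implemented in `four_two_two.py`):

```python
import json
import pathlib
import tempfile

# Toy stand-in for data/default/rung_N/propagated_spec.json; a real
# session would open the file the ratchet actually wrote.
workdir = pathlib.Path(tempfile.mkdtemp())
spec_path = workdir / "propagated_spec.json"
spec_path.write_text(json.dumps({"encoder_style": "baseline", "shots": 4096}))

spec = json.loads(spec_path.read_text())
spec["shots"] = 8192          # hand-tuned change to test an intuition
spec_path.write_text(json.dumps(spec, indent=2))

edited = json.loads(spec_path.read_text())
```

After writing the file back, re-running the same `run-rung` command evaluates your hand-tuned spec alongside the generated challengers.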
### 5.9 Running the tests
### 7.9 Running the tests
```bash
# Full validation (recommended)
bash scripts/app.sh validate
# Or directly with pytest
python -m pytest tests/ -v
```
All 21 tests should pass. They take about 13 seconds. If a test fails after
you edit a YAML config, the most likely cause is that you introduced a
dimension value that does not correspond to an implemented code path (e.g.,
`encoder_style: "rzz_lattice"` does not exist in `four_two_two.py`).
All 335 tests should pass (browser UX tests excluded by default). If a test
fails after you edit a YAML config, the most likely cause is that you
introduced a dimension value that does not correspond to an implemented code
path (e.g., `encoder_style: "rzz_lattice"` does not exist in
`four_two_two.py`).
---
## Part 6: What this system does NOT do (yet)
## Part 8: What this system does NOT do (yet)
- **It does not run on real quantum hardware by default.** The
`IBMHardwareExecutor` exists and is wired up, but `enable_hardware: false`
@@ -623,8 +779,10 @@ dimension value that does not correspond to an implemented code path (e.g.,
`SearchRule` extraction, the `CompositeGenerator` budget allocation, and
the cross-rung propagation logic.
- **It does not visualise results.** There is no dashboard. The output is
JSON and Markdown. You read it, or you write a script to plot it.
- **CLI output is JSON and Markdown.** The CLI ratchet produces JSON files
and Markdown lessons. For interactive exploration, use the Plan C dashboard
notebook (`plan_c/00_dashboard.ipynb`), which provides a widget-based
interface for running experiments and viewing results.
- **It does not parallelise evaluations.** Each experiment runs sequentially.
On a machine with multiple cores, you could shard the challenger set across
@@ -634,7 +792,7 @@ dimension value that does not correspond to an implemented code path (e.g.,
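The sharding idea can be sketched with the standard library alone. `evaluate` below is a stand-in for whatever per-challenger evaluation the executor actually performs, not the project's real API:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(spec):
    # Stand-in for one cheap-tier evaluation of a challenger spec; the
    # dummy score keyed on circuit depth just keeps the sketch runnable.
    return spec["id"], spec["depth"] * 0.1

challengers = [{"id": i, "depth": d} for i, d in enumerate([4, 7, 3, 9])]

# Shard the challenger set across workers. Threads are enough for the
# sketch; CPU-bound simulator evaluations would want a process pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = dict(pool.map(evaluate, challengers))

best = min(scores, key=scores.get)   # lowest dummy score wins here
```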
---
## Part 7: Architecture diagram
## Part 9: Architecture diagram
```
configs/rungs/rung1-5.yaml
@@ -677,6 +835,6 @@ ratchet runs multiple rungs. The lessons tighten the circle with every pass.
---
*This document was written on 2026-04-04 to describe the system as built.
The code is the ground truth. If this document contradicts the code, the
code is correct.*
*This document was last updated on 2026-04-15 to describe the system as
built. The code is the ground truth. If this document contradicts the code,
the code is correct.*
@@ -1,8 +1,19 @@
# Learning Objectives Per Notebook, Per Section
# Learning Objectives --- Per Notebook, Per Section
Each objective has a Bloom level and a matched assessment type.
All four plans teach the same core material; the pedagogical approach differs.
**Entry point:** Open `00_START_HERE.ipynb` to choose your plan. Every content
notebook links back to Start Here and forward to the next notebook in the plan.
**Assessment types:**
- **MCQ** (`quiz()`) --- multiple-choice with immediate feedback
- **Predict** (`predict_choice()`) --- predict an outcome before running code
- **Reflect** (`reflect()`) --- open-ended reflection graded by keywords
- **Order** (`order()`) --- rank or sequence items
All assessments are tracked by `LearningTracker` with Bloom's taxonomy levels.
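The helper signatures are not shown in this file, so the following is only an illustrative stand-in for the MCQ flow; every argument name below is an assumption, not the teaching layer's real API:

```python
def quiz(question, options, answer_index, bloom="remember"):
    # Illustrative stand-in: return a grader that records one attempt
    # in the shape a tracker could store.
    def grade(choice_index):
        return {
            "question": question,
            "bloom": bloom,
            "correct": choice_index == answer_index,
        }
    return grade

grade = quiz(
    "Which code does the harness target?",
    ["[[5,1,3]]", "[[4,2,2]]", "[[7,1,3]]"],
    answer_index=1,
)
attempt = grade(1)
```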
---
## Plan A — Bottom-Up (3 Sequential Notebooks)
@@ -913,9 +913,10 @@ re-evaluated, and the patience counter is preserved.
\label{sec:verification_claims}
% ============================================================================
The test suite contains 21 tests, each anchored to a specific architectural
claim. We present them grouped by subsystem, with the falsification condition
for each.
The full test suite contains 335 tests across 13 files, covering the research
engine, teaching layer, notebook structure, and pedagogical quality. Below we
present the 21 core research-engine tests, grouped by subsystem, with the
falsification condition for each.
\subsection{Quantum Correctness (3 tests)}
@@ -1108,7 +1109,8 @@ with different \code{verification} values. The seeds must differ.
cd autoresearch-quantum
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python -m pytest tests/ -v # 21 tests, ~13 seconds
python -m pytest tests/ -v # 335 tests
bash scripts/app.sh validate # full validation (lint + types + tests)
\end{lstlisting}
Requires Python $\geq$ 3.11 and Qiskit $\geq$ 2.3. No GPU needed.
@@ -116,13 +116,14 @@
\begin{center}
\begin{minipage}{0.85\textwidth}
\small\itshape
This compendium is the ``course textbook'' for the eight Jupyter notebooks
in the \textsc{autoresearch-quantum} project. It is designed to be read
before, during, or after working through the notebooks. Every concept
exercised in the notebooks is explained here with the depth and context
that a tutorial session cannot provide. No prior knowledge of quantum
error correction is assumed; familiarity with linear algebra and
complex numbers is helpful.
This compendium is the ``course textbook'' for the twelve Jupyter notebooks
(across four learning plans) in the \textsc{autoresearch-quantum} project.
Start at \texttt{00\_START\_HERE.ipynb} to choose your plan. This document
is designed to be read before, during, or after working through the
notebooks. Every concept exercised in the notebooks is explained here
with the depth and context that a tutorial session cannot provide. No
prior knowledge of quantum error correction is assumed; familiarity with
linear algebra and complex numbers is helpful.
\end{minipage}
\end{center}
\vspace{2cm}
@@ -1419,13 +1420,13 @@ expectation value is the average over many measurements.
\textbf{Notebook Topic} & \textbf{Notebooks} & \textbf{Compendium} \\
\midrule
T-state definition \& Bloch sphere &
A/01~\S1--2, B~\S2.1, C/A~\S1--3 &
A/01~\S1--2, B~\S2.1, C/A~\S1--3, D/1~\S1 &
\cref{ch:magic} \\
Why encode (no-cloning, distance) &
A/01~\S3, C/A~\S1 &
A/01~\S3, C/A~\S1, D/1~\S2 &
\cref{ch:code}~\S1--2 \\
Stabilisers \& codespace &
A/01~\S6, B~\S2.3, C/A~\S4 &
A/01~\S6, B~\S2.3, C/A~\S4, D/1~\S3 &
\cref{ch:code}~\S3 \\
Logical operators &
A/01~\S6, C/A~\S5 &
@@ -1434,22 +1435,22 @@ Encoder circuits &
A/01~\S4--5, C/A~\S6 &
\cref{sec:encoder} \\
Error detection &
A/01~\S7, C/A~\S8 &
A/01~\S7, C/A~\S8, D/1~\S4 &
\cref{sec:errors} \\
Ancilla \& syndrome extraction &
A/01~\S9, C/A~\S7 &
\cref{ch:measurement}~\S2 \\
Postselection &
A/01~\S11, A/02~\S3, B~\S2.5 &
A/01~\S11, A/02~\S3, B~\S2.5, D/1~\S6 &
\cref{sec:postselection} \\
Noise models \& transpilation &
A/02~\S2, C/B~\S1--3 &
A/02~\S2, C/B~\S1--3, D/2~\S1 &
\cref{ch:noise} \\
Magic witness formula &
A/02~\S5, B~\S2.7, C/A~\S9 &
A/02~\S5, B~\S2.7, C/A~\S9, D/1~\S5 &
\cref{ch:witness} \\
Scoring formula &
A/02~\S7, B~\S2.9, C/B~\S8 &
A/02~\S7, B~\S2.9, C/B~\S8, D/2~\S2 &
\cref{ch:scoring} \\
Factory throughput &
A/02~\S10, C/B~\S9 &
@@ -1458,20 +1459,23 @@ Failure modes &
A/02~\S9, C/B~\S7 &
\cref{sec:failures} \\
Ratchet mechanism &
A/03~\S1--4, B~\S2.10--12, C/C~\S1--7 &
A/03~\S1--4, B~\S2.10--12, C/C~\S1--7, D/3~\S1--2 &
\cref{ch:ratchet}~\S1--3 \\
Search strategies &
A/03~\S7, B~\S3.5, C/C~\S3--4 &
A/03~\S7, B~\S3.5, C/C~\S3--4, D/3~\S2 &
\cref{sec:strategies} \\
Lesson extraction \& rules &
A/03~\S8, B~\S3.6, C/C~\S8--9 &
A/03~\S8, B~\S3.6, C/C~\S8--9, D/3~\S4 &
\cref{sec:lessons} \\
Narrowing \& propagation &
B~\S3.7, C/C~\S10--11 &
\cref{ch:ratchet}~\S5--6 \\
Transfer evaluation &
A/03~\S10, B~\S3.8, C/C~\S12 &
A/03~\S10, B~\S3.8, C/C~\S12, D/3~\S5 &
\cref{ch:ratchet}~\S7 \\
Parameter sweep \& optimisation &
A/02~\S8, D/2~\S3, D/3~\S3 &
\cref{ch:scoring} \\
\bottomrule
\end{tabular}
\end{center}