Sync all documentation with current project ground truth

README: rewrite with Quick Start (app.sh), 335-test count, teaching layer narrative, testing/validation section, CI/CD docs, pre-commit hooks.
THE_STORY: add Part 4 (teaching layer), Part 5 (app.sh consumer experience), update file map with all 13 test files and teaching/notebook/paper entries.
compendium.tex: update notebook count (8→12), add Plan D cross-references.
autoresearch_quantum.tex: update test counts (21→335), add app.sh validate.
learning_objectives.md: add entry point reference and assessment type glossary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-14 20:37:51 +00:00 · 2026-04-15 20:55:02 +02:00 · 2026-04-15 20:55:02 +02:00 · 55237d5f73
commit 55237d5f73
parent 29caba3a1a
5 changed files with 455 additions and 212 deletions
--- a/README.md
+++ b/README.md
@@ -1,93 +1,205 @@
 # Autoresearch Quantum
-`autoresearch-quantum` is a Python research harness for a Karpathy-style autoresearch ratchet in quantum experiments:
+`autoresearch-quantum` is a Python research harness for a Karpathy-style autoresearch ratchet in quantum experiments, combined with a four-plan interactive coursework built on Jupyter notebooks.
- keep an incumbent experiment
+The system has two layers:
 - generate challenger experiments
 - screen challengers on a cheap tier
 - promote only justified challengers to an expensive tier
 - replace the incumbent only when the challenger wins on the final criterion
 - log every ratchet step
 - extract a transferable lesson at the end of each rung
-The first built-in experiment family targets encoded magic-state preparation in the `[[4,2,2]]` code with Qiskit. The framework is designed so the `[[4,2,2]]` rung is not the destination. It is the first rung in a ladder that shifts from best-circuit hunting toward reusable design rules for larger encoded workflows.
+1. **Research engine** --- an automated loop that discovers the best way to prepare encoded magic states on the [[4,2,2]] quantum error-detecting code. It proposes, evaluates, compares, learns, and repeats without human intervention.
 2. **Teaching layer** --- 12 Jupyter notebooks across 4 learning plans, each teaching the same core material through a different pedagogical lens: sequential (Plan A), spiral (Plan B), parallel tracks (Plan C), and hypothesis-driven experiments (Plan D). Every notebook includes interactive widget-based assessments, per-student progress tracking, and Bloom's taxonomy-aligned exercises.
 No IBM account or API key is needed --- everything runs locally with the Aer simulator.
 ## Project Tree
 ```text
 autoresearch-quantum/
 ├── configs/rungs/
-│   ├── rung1.yaml          Baseline: what recipe works?
+│   ├── rung1.yaml              Baseline: what recipe works?
-│   ├── rung2.yaml          Stability under noise variation
+│   ├── rung2.yaml              Stability under noise variation
-│   ├── rung3.yaml          Transfer across backends
+│   ├── rung3.yaml              Transfer across backends
-│   ├── rung4.yaml          Factory throughput / cost
+│   ├── rung4.yaml              Factory throughput / cost
-│   └── rung5.yaml          Rosenfeld direction
+│   └── rung5.yaml              Rosenfeld direction
 ├── src/autoresearch_quantum/
-│   ├── cli.py              CLI entry point
+│   ├── cli.py                  CLI entry point
-│   ├── config.py           YAML config loader
+│   ├── config.py               YAML config loader
-│   ├── models.py           All data structures
+│   ├── models.py               All data structures
 │   ├── codes/
-│   │   └── four_two_two.py [[4,2,2]] stabilisers, encoder, seed gates
+│   │   └── four_two_two.py     [[4,2,2]] stabilisers, encoder, seed gates
 │   ├── experiments/
 │   │   └── encoded_magic_state.py  Circuit bundle builder
 │   ├── execution/
-│   │   ├── analysis.py     Postselection, witness, stability
+│   │   ├── analysis.py         Postselection, witness, stability
-│   │   ├── backends.py     Backend resolution
+│   │   ├── backends.py         Backend resolution
-│   │   ├── hardware.py     IBM hardware executor
+│   │   ├── hardware.py         IBM hardware executor
-│   │   ├── local.py        Aer noise simulation executor
+│   │   ├── local.py            Aer noise simulation executor
-│   │   ├── transfer.py     Cross-backend transfer evaluator
+│   │   ├── transfer.py         Cross-backend transfer evaluator
-│   │   └── transpile.py    Transpilation utilities
+│   │   └── transpile.py        Transpilation utilities
 │   ├── lessons/
-│   │   ├── extractor.py    Human-readable lesson extraction
+│   │   ├── extractor.py        Human-readable lesson extraction
-│   │   └── feedback.py     Machine-readable rules + search narrowing
+│   │   └── feedback.py         Machine-readable rules + search narrowing
 │   ├── persistence/
-│   │   └── store.py        JSON file store with resumability
+│   │   └── store.py            JSON file store with resumability
 │   ├── ratchet/
-│   │   └── runner.py       AutoresearchHarness orchestrator
+│   │   └── runner.py           AutoresearchHarness orchestrator
 │   ├── scoring/
-│   │   └── score.py        WAC + factory throughput scorers
+│   │   └── score.py            WAC + factory throughput scorers
 │   ├── search/
-│   │   ├── challengers.py  Neighbour generation with dedup
+│   │   ├── challengers.py      Neighbour generation with dedup
-│   │   └── strategies.py   NeighborWalk, RandomCombo, LessonGuided
+│   │   └── strategies.py       NeighborWalk, RandomCombo, LessonGuided
 │   └── teaching/
-│       ├── assess.py       Widget-based quizzes, predictions, reflections
+│       ├── assess.py           Widget-based quizzes, predictions, reflections
-│       └── tracker.py      LearningTracker — per-student progress tracking
+│       └── tracker.py          LearningTracker --- per-student progress tracking
 ├── paper/
 │   ├── autoresearch_quantum.tex   Full technical paper (LaTeX)
 │   ├── autoresearch_quantum.pdf   Compiled PDF (19 pages)
 │   ├── compendium.tex             Companion textbook (LaTeX)
 │   └── compendium.pdf             Compiled PDF (36 pages)
 ├── notebooks/
-│   ├── plan_a/              Bottom-up: 3 sequential notebooks
+│   ├── 00_START_HERE.ipynb     Central entry point --- plan selector
 │   ├── learning_objectives.md  Per-notebook, per-section learning objectives
 │   ├── plan_a/                 Bottom-up: 3 sequential notebooks
 │   │   ├── 01_encoded_magic_state.ipynb
 │   │   ├── 02_measuring_progress.ipynb
 │   │   └── 03_the_ratchet.ipynb
-│   ├── plan_b/              Spiral: 1 notebook, three passes
+│   ├── plan_b/                 Spiral: 1 notebook, three passes
 │   │   └── spiral_notebook.ipynb
-│   ├── plan_c/              Parallel tracks + dashboard
+│   ├── plan_c/                 Parallel tracks + dashboard
 │   │   ├── 00_dashboard.ipynb
 │   │   ├── track_a_physics.ipynb
 │   │   ├── track_b_engineering.ipynb
 │   │   └── track_c_search.ipynb
-│   └── plan_d/              Three claim-driven experiments
+│   └── plan_d/                 Three claim-driven experiments
 │       ├── experiment_1_protection.ipynb
 │       ├── experiment_2_noise.ipynb
 │       └── experiment_3_optimisation.ipynb
-├── tests/                   107 tests
+├── scripts/
-│   ├── test_analysis.py
+│   └── app.sh                  Consumer lifecycle manager
-│   ├── test_cli.py
+├── tests/                      335 tests across 13 files
-│   ├── test_codes.py
+│   ├── test_analysis.py        Postselection & witness tests
-│   ├── test_config.py
+│   ├── test_browser_ux.py      Playwright end-to-end UX tests
-│   ├── test_experiments.py
+│   ├── test_cli.py             CLI subcommand tests
-│   ├── test_feedback.py
+│   ├── test_codes.py           [[4,2,2]] code correctness
-│   ├── test_harness.py
+│   ├── test_config.py          YAML config loading
-│   ├── test_persistence.py
+│   ├── test_experiments.py     Circuit bundle construction
-│   └── test_scoring.py
+│   ├── test_feedback.py        Lesson extraction & search rules
-├── THE_STORY.md             Narrative documentation
+│   ├── test_harness.py         Full ratchet integration tests
-├── pyproject.toml
+│   ├── test_notebooks.py       Notebook execution & structure
 │   ├── test_pedagogy.py        Pedagogical quality invariants
 │   ├── test_persistence.py     JSON store round-trips
 │   ├── test_scoring.py         Score function correctness
 │   └── test_teaching.py        Assessment widget & tracker tests
 ├── .github/workflows/ci.yml    CI: lint, type check, test matrix, notebook execution
 ├── .pre-commit-config.yaml     Ruff, mypy, nbstripout, hygiene hooks
 ├── THE_STORY.md                Narrative documentation (system design)
 ├── pyproject.toml              Build config, dependencies, tool settings
 └── README.md
 ```
 ## Quick Start
 The fastest way to get running:
 ```bash
 # Clone and bootstrap (creates venv, installs everything, registers Jupyter kernel)
 git clone https://github.com/saymrwulf/autoresearch-quantum.git
 cd autoresearch-quantum
 bash scripts/app.sh bootstrap
 # Launch JupyterLab (opens 00_START_HERE.ipynb in your browser)
 bash scripts/app.sh start
 ```
 The `app.sh` lifecycle manager handles the entire consumer experience:
 | Command | What it does |
 |---------|-------------|
 | `bash scripts/app.sh bootstrap` | Create venv, install deps, register Jupyter kernel, verify imports |
 | `bash scripts/app.sh start` | Launch JupyterLab (auto-opens `00_START_HERE.ipynb`) |
 | `bash scripts/app.sh start --no-open` | Launch without opening browser |
 | `bash scripts/app.sh stop` | Stop JupyterLab |
 | `bash scripts/app.sh status` | Show venv, server, notebook, and progress status |
 | `bash scripts/app.sh validate` | Run full validation: ruff + mypy + pytest |
 | `bash scripts/app.sh validate --quick` | Lint + type check + unit tests only |
 | `bash scripts/app.sh logs` | Tail JupyterLab output |
 | `bash scripts/app.sh reset` | Delete learner progress files |
 ### Manual installation
 If you prefer manual setup:
 ```bash
 python3 -m venv .venv
 . .venv/bin/activate
 pip install -e '.[dev,notebooks]'
 ```
 For the optional IBM hardware path:
 ```bash
 pip install -e '.[hardware,dev,notebooks]'
 ```
 ## Jupyter Notebooks --- Learning Plans
 The `notebooks/` folder contains **12 notebooks across 4 independent learning plans**, all accessible from a central entry point: **`00_START_HERE.ipynb`**.
 Each plan teaches the same core material (encoded magic-state preparation, measurement, and the ratchet optimiser) through a different didactic lens. Every content notebook includes:
 - **Interactive assessments** --- multiple-choice quizzes, predictions, reflections, and ordering exercises (ipywidgets)
 - **Per-student progress tracking** --- `LearningTracker` records scores, Bloom's levels, and time per assessment
 - **Navigation links** --- forward/backward links between notebooks, cross-plan suggestions, and back-links to Start Here
 - **Key Insight callouts** --- highlighted takeaways for important concepts
 - **Checkpoint summaries** --- mid-notebook progress reviews in longer notebooks
 ### Plan A --- Bottom-Up (3 sequential notebooks)
 | # | File | What you learn |
 |---|------|----------------|
 | 1 | `plan_a/01_encoded_magic_state.ipynb` | T-state, [[4,2,2]] encoder, stabilisers, error detection, postselection |
 | 2 | `plan_a/02_measuring_progress.ipynb` | Noise, logical operators, magic witness, scoring formula, parameter sweeps |
 | 3 | `plan_a/03_the_ratchet.ipynb` | Incumbent/challenger model, ratchet steps, lessons, cross-rung propagation |
 Start with notebook 01 and work through in order. Run each cell top-to-bottom (Shift+Enter).
 ### Plan B --- Spiral (1 notebook, three passes)
 | File | What you learn |
 |------|----------------|
 | `plan_b/spiral_notebook.ipynb` | **Pass 1:** 5-min demo (black-box). **Pass 2:** Open the box (circuits, stabilisers, scoring). **Pass 3:** Make it your own (modify parameters, run experiments). |
 One notebook, 78 cells. Each pass revisits the same system at a deeper level.
 ### Plan C --- Parallel Tracks (4 notebooks)
 | File | Focus |
 |------|-------|
 | `plan_c/00_dashboard.ipynb` | Interactive dashboard (ipywidgets) --- run experiments from dropdowns |
 | `plan_c/track_a_physics.ipynb` | Pure quantum mechanics: Eastin-Knill, Bloch sphere, stabiliser algebra |
 | `plan_c/track_b_engineering.ipynb` | Noise models, transpilation, cost model, failure modes |
 | `plan_c/track_c_search.ipynb` | Parameter space, search strategies, lesson extraction, cross-rung transfer |
 Start with the dashboard for an overview, then dive into whichever track interests you. The three tracks are independent and can be read in any order.
 ### Plan D --- Three Claim-Driven Experiments
 | # | File | Hypothesis |
 |---|------|-----------|
 | 1 | `plan_d/experiment_1_protection.ipynb` | The [[4,2,2]] code can protect a magic state: W=1.0, all errors detected |
 | 2 | `plan_d/experiment_2_noise.ipynb` | Noise degrades quality but parameter choice matters >2x |
 | 3 | `plan_d/experiment_3_optimisation.ipynb` | A ratchet can learn to optimise and its knowledge transfers |
 Each notebook follows: **Hypothesis -> Claim -> Experiment -> Proof -> Next Hypothesis**.
 ### Troubleshooting
 | Problem | Fix |
 |---------|-----|
 | `ModuleNotFoundError: autoresearch_quantum` | Run `bash scripts/app.sh bootstrap` or `pip install -e '.[notebooks]'` |
 | `ModuleNotFoundError: ipywidgets` | Run `pip install ipywidgets` --- needed for interactive assessments |
 | Plots don't render | Make sure `%matplotlib inline` is in the first code cell (it already is) |
 | Kernel not found | In JupyterLab, select **Kernel > Change Kernel** and pick the `.venv` Python |
 ## Scientific Framing
 ### What is optimized
@@ -144,7 +256,7 @@ Expensive tier:
 ## Built-In `[[4,2,2]]` Experiment
-The built-in experiment prepares an encoded logical T-state on one logical qubit of the `[[4,2,2]]` code while keeping the spectator logical qubit in `|0⟩`. The code utilities live in [`four_two_two.py`](src/autoresearch_quantum/codes/four_two_two.py).
+The built-in experiment prepares an encoded logical T-state on one logical qubit of the `[[4,2,2]]` code while keeping the spectator logical qubit in `|0>`. The code utilities live in [`four_two_two.py`](src/autoresearch_quantum/codes/four_two_two.py).
 The harness evaluates:
@@ -158,108 +270,12 @@ This keeps the core scientific distinction explicit:
 - a circuit can be locally good for `[[4,2,2]]`
 - a rule is only valuable if it keeps helping across new backends or new rungs
-## Installation
+## How To Run (CLI)
 Create an isolated environment in the project root and install the package:
 ```bash
 python3 -m venv .venv
 . .venv/bin/activate
 pip install -e '.[dev,notebooks]'
 ```
 For the optional IBM hardware path:
 ```bash
 pip install -e '.[hardware,dev,notebooks]'
 ```
 If you want the CLI without installing editable mode, use `PYTHONPATH=src`.
 ## Jupyter Notebooks --- Learning Plans
 The `notebooks/` folder contains four independent learning experiences.
 Each plan teaches the same material (encoded magic-state preparation, measurement, and the ratchet optimiser) through a different didactic lens.
 **No IBM account or API key is needed** --- everything runs locally with the Aer simulator.
 ### Quick start
 ```bash
 # 1. Activate the virtual environment (if not already active)
 . .venv/bin/activate
 # 2. Install the project with notebook dependencies
 pip install -e '.[notebooks]'
 # 3. Start the Jupyter server
 jupyter lab --notebook-dir=notebooks
 ```
 This opens JupyterLab in your browser (usually at http://localhost:8888).
 Navigate into any plan folder and open the first notebook.
 > **Alternative:** If you prefer the classic notebook interface, run
 > `jupyter notebook --notebook-dir=notebooks` instead.
 ### Plan A --- Bottom-Up (3 sequential notebooks)
 | # | File | What you learn |
 |---|------|----------------|
 | 1 | `plan_a/01_encoded_magic_state.ipynb` | T-state, [[4,2,2]] encoder, stabilisers, error detection, postselection |
 | 2 | `plan_a/02_measuring_progress.ipynb` | Noise, logical operators, magic witness, scoring formula, parameter sweeps |
 | 3 | `plan_a/03_the_ratchet.ipynb` | Incumbent/challenger model, ratchet steps, lessons, cross-rung propagation |
 Start with notebook 01 and work through in order.
 Run each cell top-to-bottom (Shift+Enter).
 ### Plan B --- Spiral (1 notebook, three passes)
 | File | What you learn |
 |------|----------------|
 | `plan_b/spiral_notebook.ipynb` | **Pass 1:** 5-min demo (black-box). **Pass 2:** Open the box (circuits, stabilisers, scoring). **Pass 3:** Make it your own (modify parameters, run experiments). |
 One notebook, 78 cells. Each pass revisits the same system at a deeper level.
 ### Plan C --- Parallel Tracks (4 notebooks)
 | File | Focus |
 |------|-------|
 | `plan_c/00_dashboard.ipynb` | Interactive dashboard (ipywidgets) --- run experiments from dropdowns |
 | `plan_c/track_a_physics.ipynb` | Pure quantum mechanics: Eastin-Knill, Bloch sphere, stabiliser algebra |
 | `plan_c/track_b_engineering.ipynb` | Noise models, transpilation, cost model, failure modes |
 | `plan_c/track_c_search.ipynb` | Parameter space, search strategies, lesson extraction, cross-rung transfer |
 Start with the dashboard for an overview, then dive into whichever track interests you.
 The three tracks are independent and can be read in any order.
 ### Plan D --- Three Claim-Driven Experiments
 | # | File | Hypothesis |
 |---|------|-----------|
 | 1 | `plan_d/experiment_1_protection.ipynb` | The [[4,2,2]] code can protect a magic state: W=1.0, all errors detected |
 | 2 | `plan_d/experiment_2_noise.ipynb` | Noise degrades quality but parameter choice matters >2× |
 | 3 | `plan_d/experiment_3_optimisation.ipynb` | A ratchet can learn to optimise and its knowledge transfers |
 Each notebook follows: **Hypothesis → Claim → Experiment → Proof → Next Hypothesis**.
 The output of each experiment motivates the next.
 ### Troubleshooting
 | Problem | Fix |
 |---------|-----|
 | `ModuleNotFoundError: autoresearch_quantum` | Run `pip install -e '.[notebooks]'` inside the activated `.venv` |
 | `ModuleNotFoundError: ipywidgets` | Run `pip install ipywidgets` --- needed for the Plan C dashboard |
 | Plots don't render | Make sure `%matplotlib inline` is in the first code cell (it already is) |
 | Kernel not found | In JupyterLab, select **Kernel > Change Kernel** and pick the `.venv` Python |
 ## How To Run
 ### 1. Run a single local experiment
 Use the rung config bootstrap incumbent as-is:
 ```bash
-PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
+autoresearch-quantum run-experiment \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo
 ```
@@ -267,7 +283,7 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
 Override individual experiment fields:
 ```bash
-PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
+autoresearch-quantum run-experiment \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo \
  --set verification=z_only \
@@ -278,7 +294,7 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
 ### 2. Run one ratchet step
 ```bash
-PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-step \
+autoresearch-quantum run-step \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo
 ```
@@ -294,7 +310,7 @@ This will:
 ### 3. Run one full rung
 ```bash
-PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-rung \
+autoresearch-quantum run-rung \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo
 ```
@@ -310,7 +326,7 @@ Artifacts are persisted under `data/demo/rung_<n>/`:
 ### 4. Run a multi-rung ratchet campaign
 ```bash
-PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-ratchet \
+autoresearch-quantum run-ratchet \
  --config configs/rungs/rung1.yaml \
  --config configs/rungs/rung2.yaml \
  --config configs/rungs/rung3.yaml \
@@ -320,18 +336,17 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-ratchet \
 ### 5. Run an optional hardware-backed confirmation
-First install the hardware extra and make IBM credentials available in the usual `qiskit-ibm-runtime` way. The simplest path is to export:
+First install the hardware extra and make IBM credentials available:
 ```bash
 pip install -e '.[hardware]'
 export QISKIT_IBM_TOKEN=...
 ```
 Then enable the hardware tier in the rung config by setting `tier_policy.enable_hardware: true` and optionally `hardware.backend_name: ibm_brisbane`.
 Run:
 ```bash
-PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-step \
+autoresearch-quantum run-step \
  --config configs/rungs/rung1.yaml \
  --store-dir data/hardware \
  --hardware
@@ -339,18 +354,71 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-step \
 Only challengers that beat the incumbent cheap-tier score by `tier_policy.cheap_margin` are promoted.
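
The promotion rule above can be illustrated with a minimal Python sketch. It is not the harness's implementation: `Candidate`, `ratchet_step`, and the precomputed `expensive_score` field are hypothetical names, and reading "beats by `cheap_margin`" as "scores at least the incumbent plus the margin" is an assumption. In a real run the expensive tier would be evaluated only for promoted challengers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for an experiment spec plus its two tier scores."""
    name: str
    cheap_score: float
    expensive_score: float


def ratchet_step(incumbent, challengers, cheap_margin):
    """Return the incumbent that survives one step of a two-tier ratchet."""
    # Cheap tier: promote only challengers that clear the margin (assumed >=).
    promoted = [
        c for c in challengers
        if c.cheap_score >= incumbent.cheap_score + cheap_margin
    ]
    # Expensive tier: the final criterion alone decides replacement.
    best = incumbent
    for challenger in promoted:
        if challenger.expensive_score > best.expensive_score:
            best = challenger
    return best


incumbent = Candidate("incumbent", cheap_score=0.50, expensive_score=0.50)
challengers = [
    Candidate("a", cheap_score=0.52, expensive_score=0.90),  # below the margin
    Candidate("b", cheap_score=0.60, expensive_score=0.55),  # promoted, wins
    Candidate("c", cheap_score=0.70, expensive_score=0.40),  # promoted, loses
]
winner = ratchet_step(incumbent, challengers, cheap_margin=0.05)
print(winner.name)  # prints "b"
```

Challenger `a` shows why the margin matters: it would have won on the expensive score, but it never gets there because its cheap-tier gain is too small to justify promotion.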
 ## Testing & Validation
 The project has **335 tests** across 13 test files covering every layer:
 | Test file | What it validates |
 |-----------|-------------------|
 | `test_codes.py` | [[4,2,2]] stabilisers, encoder, seed gates |
 | `test_experiments.py` | Circuit bundle construction |
 | `test_analysis.py` | Postselection, witness, stability metrics |
 | `test_scoring.py` | WAC and factory throughput score functions |
 | `test_feedback.py` | Lesson extraction, search rules, space narrowing |
 | `test_harness.py` | Full ratchet integration (rung, multi-rung, resumability) |
 | `test_persistence.py` | JSON store round-trips |
 | `test_cli.py` | CLI subcommands |
 | `test_config.py` | YAML config loading |
 | `test_teaching.py` | Assessment widgets, LearningTracker |
 | `test_notebooks.py` | Notebook execution via nbclient, structure validation |
 | `test_pedagogy.py` | Pedagogical quality: prose density, assessment density, Bloom's coverage, section structure, tracker integration, key insights, cross-plan consistency |
 | `test_browser_ux.py` | Playwright end-to-end: JupyterLab launch, notebook rendering, navigation links, widget rendering |
 ### Running tests
 ```bash
 # Standard: all tests except browser UX (default)
 bash scripts/app.sh validate
 # Quick: lint + type check + unit tests only
 bash scripts/app.sh validate --quick
 # Direct pytest (browser tests excluded by default via marker)
 .venv/bin/python -m pytest tests/ -v
 # Browser UX tests (requires playwright)
 .venv/bin/python -m pip install playwright && .venv/bin/python -m playwright install chromium
 .venv/bin/python -m pytest tests/test_browser_ux.py -m browser -v
 ```
 ### Static analysis
 - **Ruff** --- linting and formatting (E, F, W, I, UP, B, SIM rule sets)
 - **mypy** --- strict mode type checking across all source files
 - **nbstripout** --- strips notebook outputs before commit
 All three run automatically as **pre-commit hooks** (`.pre-commit-config.yaml`). Install with:
 ```bash
 .venv/bin/pre-commit install
 ```
 ### CI/CD
 The GitHub Actions pipeline (`.github/workflows/ci.yml`) runs on every push and PR:
 1. **Lint job** --- ruff check, ruff format --check, mypy strict (Python 3.11)
 2. **Test job** --- full test suite on a Python 3.11 and 3.12 matrix
 3. **Notebook execution job** --- runs all 12 notebooks end-to-end via nbclient
 ## Extending The Ladder
 The intended progression is:
-1. `rung1.yaml`
+1. `rung1.yaml` --- baseline `[[4,2,2]]` encoded magic-state preparation
-   baseline `[[4,2,2]]` encoded magic-state preparation
+2. `rung2.yaml` --- same code with stronger stability and backend-awareness
-2. `rung2.yaml`
+3. `rung3.yaml` --- transfer across backend families
-   same code with stronger stability and backend-awareness
+4. `rung4.yaml` --- factory-style cost pressure
 3. `rung3.yaml`
   transfer across backend families
 4. `rung4.yaml`
   factory-style cost pressure
 To add a new rung:
--- a/THE_STORY.md
+++ b/THE_STORY.md
@@ -398,7 +398,124 @@ and checks that their computed seeds are different.
 ---
-## Part 4: The file map
+## Part 4: The teaching layer
 The system is not only a research engine. It is also a course. Twelve Jupyter
 notebooks, organised into four independent learning plans, teach the same
 material through different pedagogical lenses. The teaching layer sits on top
 of the research engine and uses its real components (circuits, simulators,
 scorers, ratchet) as the substrate for interactive learning.
 ### 4.1 Entry point: 00_START_HERE.ipynb
 Every learner begins at `notebooks/00_START_HERE.ipynb`. This notebook
 contains no code --- it is a plan selector. It describes the four plans, their
 target audiences, and links directly to each plan's first notebook. All
 content notebooks link back to Start Here.
 ### 4.2 The four plans
 | Plan | Style | Notebooks | Target learner |
 |------|-------|-----------|----------------|
 | **A** | Bottom-up, sequential | 3 | Methodical learners who want foundations first |
 | **B** | Spiral, three passes | 1 (78 cells) | Time-pressed learners who want a demo first, theory later |
 | **C** | Parallel tracks + dashboard | 4 | Learners who want to choose their own path |
 | **D** | Hypothesis-driven experiments | 3 | Research-oriented learners who want to test claims |
 All four plans cover the same core concepts: T-state preparation, [[4,2,2]]
 encoding, stabiliser verification, postselection, scoring, the ratchet
 optimiser, lesson extraction, and cross-rung transfer.
 ### 4.3 Interactive assessments (teaching/assess.py)
 Every content notebook includes interactive assessments built with ipywidgets:
 - **quiz()** --- multiple-choice questions with immediate feedback
 - **predict_choice()** --- "What do you think will happen?" before running code
 - **reflect()** --- open-ended reflections graded by keyword matching
 - **order()** --- drag-and-drop ordering exercises (e.g., rank error types)
 Each assessment is tagged with a Bloom's taxonomy level (remember, understand,
 apply, analyse, evaluate) and a topic. The full mapping of learning objectives
 to assessments is documented in `notebooks/learning_objectives.md`.
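
As an illustration of grading a reflection by keyword matching, here is a minimal sketch. It does not reproduce `reflect()` from `teaching/assess.py`: the function name `grade_reflection`, the `min_hits` threshold, and the case-insensitive substring rule are all assumptions made for this example.

```python
def grade_reflection(answer, keywords, min_hits=2):
    """Hypothetical grader: pass when enough expected keywords appear.

    Matching is a case-insensitive substring test, which is an assumption;
    a real grader might tokenise or stem the answer instead.
    """
    text = answer.lower()
    hits = [kw for kw in keywords if kw.lower() in text]
    return len(hits) >= min_hits, hits


passed, hits = grade_reflection(
    "Postselection discards shots where a stabiliser flags an error.",
    keywords=["postselection", "stabiliser", "syndrome"],
)
print(passed, hits)  # prints: True ['postselection', 'stabiliser']
```

Substring matching is forgiving about phrasing but cannot tell a correct statement from an incorrect one that happens to use the right vocabulary, so it suits low-stakes reflection prompts.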
 ### 4.4 Progress tracking (teaching/tracker.py)
 Each notebook creates a `LearningTracker` instance that records:
 - scores per assessment (correct/incorrect, attempt count)
 - Bloom's level distribution (how many of each level attempted/passed)
 - time spent per assessment
 - checkpoint summaries at natural breakpoints
 At the end of each notebook, `tracker.dashboard()` displays a visual summary,
 and `tracker.save()` persists progress to a JSON file. Progress files can be
 reset with `bash scripts/app.sh reset`.
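
The recording and persistence described above can be sketched as follows. `MiniTracker` is a hypothetical, simplified stand-in, not the `LearningTracker` API; its method names, record fields, and JSON layout are assumptions chosen for illustration.

```python
import json
import time
from collections import Counter
from pathlib import Path


class MiniTracker:
    """Hypothetical, simplified stand-in for a per-notebook progress tracker."""

    def __init__(self, notebook):
        self.notebook = notebook
        self.records = []

    def record(self, assessment_id, bloom, correct, seconds):
        """Store one assessment attempt with its Bloom's level and duration."""
        self.records.append({
            "id": assessment_id,
            "bloom": bloom,
            "correct": bool(correct),
            "seconds": float(seconds),
        })

    def bloom_summary(self):
        """Count attempted and passed assessments per Bloom's level."""
        attempted = Counter(r["bloom"] for r in self.records)
        passed = Counter(r["bloom"] for r in self.records if r["correct"])
        return {
            level: {"attempted": attempted[level], "passed": passed[level]}
            for level in attempted
        }

    def save(self, path):
        """Persist all records and the summary to a JSON file."""
        payload = {
            "notebook": self.notebook,
            "saved_at": time.time(),
            "records": self.records,
            "bloom": self.bloom_summary(),
        }
        target = Path(path)
        target.write_text(json.dumps(payload, indent=2))
        return target
```

Keeping the raw attempt records alongside the derived summary means the summary can be recomputed later if the aggregation changes, at the cost of a slightly larger file.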
 ### 4.5 Navigation
 Every content notebook has a navigation footer with:
 - **Forward link** to the next notebook in the plan
 - **Back-link** to 00_START_HERE.ipynb
 - **Cross-plan suggestions** at terminal notebooks (e.g., "Finished Plan A?
  Try Plan D for a different perspective.")
 ### 4.6 Pedagogical quality enforcement
 The test suite includes `tests/test_pedagogy.py`, which enforces educational
 quality invariants across all content notebooks:
 - Minimum 200 words of prose per notebook
 - At least 25% of cells are markdown (not code-only)
 - Every notebook has a title header and multiple sections
 - At least 2 interactive assessments per notebook
 - At least 2 different assessment types per notebook (variety)
 - Bloom's taxonomy coverage: at least 2 levels per notebook
 - Checkpoint summaries present when a notebook has 4+ assessments
 - LearningTracker initialisation, dashboard(), and save() in every notebook
 - Key Insight callouts in longer notebooks (5+ sections)
 - All four plans collectively cover core concepts (stabiliser, magic, witness, ratchet)
 These tests catch pedagogical regressions the same way unit tests catch code
 regressions. Adding a new notebook or modifying an existing one will fail CI
 if it violates these invariants.
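
Two of the invariants above (minimum prose length and markdown cell fraction) can be sketched against the plain notebook JSON layout, where a notebook has a `cells` list and each cell has a `cell_type` and a `source`. This is an illustrative sketch, not the contents of `tests/test_pedagogy.py`; `check_notebook` and its counting rule (whitespace-separated tokens in markdown cells) are assumptions.

```python
def check_notebook(nb, min_words=200, min_markdown_fraction=0.25):
    """Return a list of violated invariants for a notebook given as a dict."""
    cells = nb.get("cells", [])
    markdown = [c for c in cells if c.get("cell_type") == "markdown"]

    def cell_text(cell):
        # A cell source may be a single string or a list of line strings.
        source = cell.get("source", "")
        return "".join(source) if isinstance(source, list) else source

    # Assumption: "prose" means whitespace-separated tokens in markdown cells.
    words = sum(len(cell_text(c).split()) for c in markdown)

    problems = []
    if words < min_words:
        problems.append(f"prose too short: {words} < {min_words} words")
    if not cells or len(markdown) / len(cells) < min_markdown_fraction:
        problems.append("markdown fraction below threshold")
    return problems
```

Returning a list of problems rather than raising on the first one lets a test report every violation in a notebook at once, for example with `assert check_notebook(nb) == []`.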
 ---
 ## Part 5: The consumer experience (app.sh)
 The project includes a lifecycle manager (`scripts/app.sh`) that handles the
 entire consumer experience from first clone to running notebooks:
 ```bash
 bash scripts/app.sh bootstrap     # venv, pip install, kernel registration, import check
 bash scripts/app.sh start         # launch JupyterLab, open 00_START_HERE.ipynb
 bash scripts/app.sh stop          # graceful shutdown
 bash scripts/app.sh status        # venv, server, notebook, progress summary
 bash scripts/app.sh validate      # ruff + mypy + full test suite
 bash scripts/app.sh validate --quick  # lint + type check + unit tests only
 bash scripts/app.sh logs          # tail JupyterLab output
 bash scripts/app.sh reset         # delete learner progress files
 ```
 Bootstrap checks Python >= 3.11, creates the venv, installs the package with
 dev and notebook dependencies, registers a Jupyter kernel, and verifies that
 core imports succeed. Start finds a free port (8888-8899), launches JupyterLab
 in the background with PID tracking, and opens the browser directly to
 `00_START_HERE.ipynb`.
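
The free-port search can be illustrated in Python, although `app.sh` itself is a shell script and its actual probing method is not shown here. The sketch below assumes a simple strategy: try to bind each port in the range on the loopback interface and return the first one that succeeds. `find_free_port` is a hypothetical helper.

```python
import socket


def find_free_port(start=8888, end=8899, host="127.0.0.1"):
    """Return the first port in [start, end] that can be bound, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # Binding fails with OSError if the port is already in use.
                sock.bind((host, port))
            except OSError:
                continue
            # The socket is closed on exit, leaving the port for the caller.
            return port
    return None
```

There is an unavoidable gap between probing a port and the server binding it, so a launcher using this approach should still handle a bind failure from the server itself.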
 Validation runs the full quality pipeline: ruff linting, mypy strict type
 checking, and the pytest suite (335 tests, excluding browser UX by default).
 The `--quick` flag runs only lint, type check, and unit tests.
 ---
 ## Part 6: The file map
 ```
 autoresearch-quantum/
@@ -450,8 +567,42 @@ autoresearch-quantum/
      store.py             JSON file store: experiments, steps, progress,
                           lessons, feedback, propagated specs
-  tests/
+    teaching/
-    test_harness.py        21 tests covering every subsystem
+      assess.py            Widget-based quizzes, predictions, reflections
      tracker.py           LearningTracker: per-student progress tracking
  notebooks/
    00_START_HERE.ipynb    Central entry point: plan selector
    learning_objectives.md Per-notebook, per-section learning objectives
    plan_a/                Bottom-up: 3 sequential notebooks
    plan_b/                Spiral: 1 notebook, 3 passes
    plan_c/                Parallel tracks + dashboard: 4 notebooks
    plan_d/                Hypothesis-driven: 3 experiments
  paper/
    autoresearch_quantum.tex   Technical paper (LaTeX, 19 pages)
    compendium.tex             Companion textbook (LaTeX, 36 pages)
  scripts/
    app.sh                 Consumer lifecycle manager (bootstrap/start/stop/validate)
  tests/                   335 tests across 13 files
    test_analysis.py       Postselection & witness
    test_browser_ux.py     Playwright end-to-end UX
    test_cli.py            CLI subcommands
    test_codes.py          [[4,2,2]] code correctness
    test_config.py         YAML config loading
    test_experiments.py    Circuit bundle construction
    test_feedback.py       Lesson extraction & search rules
    test_harness.py        Full ratchet integration
    test_notebooks.py      Notebook execution & structure
    test_pedagogy.py       Pedagogical quality invariants (130 tests)
    test_persistence.py    JSON store round-trips
    test_scoring.py        Score functions
    test_teaching.py       Assessment widgets & tracker
  .github/workflows/ci.yml  CI: lint, type check, test matrix, notebook execution
  .pre-commit-config.yaml   Ruff, mypy, nbstripout, hygiene hooks
  data/                    Output directory (created at runtime)
    default/
@@ -472,12 +623,12 @@ autoresearch-quantum/
 ---
-## Part 5: How to use it without Claude
+## Part 7: How to use it without Claude
 You do not need an AI to run this system or to make progress with its
 output. Everything below runs in your terminal.
-### 5.1 Setup
+### 7.1 Setup
 ```bash
 cd autoresearch-quantum
@@ -486,7 +637,7 @@ source .venv/bin/activate
 pip install -e ".[dev]"
 ```
-### 5.2 Run a single experiment
+### 7.2 Run a single experiment
 ```bash
 python -m autoresearch_quantum run-experiment \
@@ -498,7 +649,7 @@ python -m autoresearch_quantum run-experiment \
 This prints a JSON result with the score, failure mode, and experiment ID.
 The full record is saved to `data/default/rung_1/experiments/`.
-### 5.3 Run one ratchet step
+### 7.3 Run one ratchet step
 ```bash
 python -m autoresearch_quantum run-step \
@@ -510,7 +661,7 @@ them, promotes the best, and saves the step record. Run it again and it
 generates *new* challengers (never repeating), with a new incumbent if one was
 found.
-### 5.4 Run a full rung
+### 7.4 Run a full rung
 ```bash
 python -m autoresearch_quantum run-rung \
@@ -521,7 +672,7 @@ Runs up to `step_budget` steps (default 3), stopping early if patience runs
 out. Produces `data/default/rung_1/lesson.md` -- read this file. It tells you
 what helped, what hurt, what seems invariant, and what to test next.
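The stopping behaviour described above can be sketched in a few lines. This is a hypothetical illustration, not the project's implementation: `run_rung_sketch`, the `step_fn` callback, and the `patience=2` default are assumptions made for the example. Only the `step_budget` default of 3 comes from the text.

```python
from typing import Callable, List


def run_rung_sketch(
    step_fn: Callable[[int], float],
    step_budget: int = 3,  # default stated in the text
    patience: int = 2,     # assumed value, for illustration only
) -> List[float]:
    """Run up to step_budget steps; stop after `patience` steps without improvement."""
    best = float("-inf")
    stale = 0  # consecutive steps that failed to beat the best score
    history: List[float] = []
    for step in range(step_budget):
        score = step_fn(step)
        history.append(score)
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # patience ran out before the budget did
    return history
```

With these assumed defaults, a rung that keeps improving uses its full budget, while one that stalls for two consecutive steps ends early.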
-### 5.5 Run the full five-rung ratchet
+### 7.5 Run the full five-rung ratchet
 ```bash
 python -m autoresearch_quantum run-ratchet \
@@ -536,7 +687,7 @@ This is the full pipeline. Each rung's winner is automatically propagated to
 the next rung. Each rung's lessons narrow the search space for the next.
 When it finishes, you have five lesson files and a final optimised recipe.
-### 5.6 Run a transfer evaluation
+### 7.6 Run a transfer evaluation
 ```bash
 python -m autoresearch_quantum run-transfer \
@@ -547,7 +698,7 @@ python -m autoresearch_quantum run-transfer \
 Tests a single spec across multiple backend noise models. The output tells you
 the per-backend scores and the pessimistic transfer score.
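One plausible reading of "pessimistic" is the worst score across backends. The sketch below assumes that reading. The function name, backend names, and numbers are all made up for illustration, and the project may aggregate differently.

```python
from typing import Dict


def pessimistic_transfer_score(per_backend: Dict[str, float]) -> float:
    """Return the worst per-backend score (assumed meaning of 'pessimistic')."""
    if not per_backend:
        raise ValueError("need at least one backend score")
    return min(per_backend.values())


# Hypothetical scores for three hypothetical noise models.
scores = {"backend_a": 0.82, "backend_b": 0.74, "backend_c": 0.79}
worst = pessimistic_transfer_score(scores)
```

Under this reading, a spec that scores well on one backend but poorly on another is judged by its weakest result.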
-### 5.7 Reading the output
+### 7.7 Reading the output
 After a ratchet run, the most valuable artefacts are:
@@ -559,7 +710,7 @@ After a ratchet run, the most valuable artefacts are:
 | `rung_N/propagated_spec.json` | The spec that was carried forward from the previous rung. Compare it with the YAML bootstrap to see what the system changed. |
 | `rung_N/progress.json` | If the run was interrupted, this tells you where it left off. Just re-run the same command to resume. |
-### 5.8 Making manual progress with the artefacts
+### 7.8 Making manual progress with the artefacts
 The system is designed so that you can interleave human intuition with
 automated search:
@@ -591,22 +742,27 @@ automated search:
   You are now doing what the system does in `run_ratchet` -- but with human
   judgement about what to explore next.
-### 5.9 Running the tests
+### 7.9 Running the tests
 ```bash
 # Full validation (recommended)
 bash scripts/app.sh validate
 # Or directly with pytest
 python -m pytest tests/ -v
 ```
-All 21 tests should pass. They take about 13 seconds. If a test fails after
+All 335 tests should pass (browser UX tests excluded by default). If a test
-you edit a YAML config, the most likely cause is that you introduced a
+fails after you edit a YAML config, the most likely cause is that you
-dimension value that does not correspond to an implemented code path (e.g.,
+introduced a dimension value that does not correspond to an implemented code
-`encoder_style: "rzz_lattice"` does not exist in `four_two_two.py`).
+path (e.g., `encoder_style: "rzz_lattice"` does not exist in
 `four_two_two.py`).
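A quick pre-flight check can catch this kind of mistake before the test suite does. The sketch below is hypothetical: the `IMPLEMENTED` allow-list, its placeholder style names, and the `unimplemented_values` helper are assumptions for illustration and are not part of the project.

```python
from typing import Dict, List, Set

# Hypothetical allow-list. The real set of implemented values lives in the code
# (for encoder_style, in four_two_two.py) and is not reproduced here.
IMPLEMENTED: Dict[str, Set[str]] = {
    "encoder_style": {"style_a", "style_b"},  # placeholder names
}


def unimplemented_values(dimensions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per dimension, the configured values with no known implementation."""
    problems: Dict[str, List[str]] = {}
    for name, values in dimensions.items():
        allowed = IMPLEMENTED.get(name)
        if allowed is None:
            continue  # unknown dimension: this sketch does not judge it
        bad = [v for v in values if v not in allowed]
        if bad:
            problems[name] = bad
    return problems
```

Calling it on the dimension values loaded from an edited YAML file would flag `"rzz_lattice"` under `encoder_style`, provided the allow-list mirrors what the code actually implements.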
 ---
-## Part 6: What this system does NOT do (yet)
+## Part 8: What this system does NOT do (yet)
 - **It does not run on real quantum hardware by default.** The
  `IBMHardwareExecutor` exists and is wired up, but `enable_hardware: false`
@@ -623,8 +779,10 @@ dimension value that does not correspond to an implemented code path (e.g.,
  `SearchRule` extraction, the `CompositeGenerator` budget allocation, and
  the cross-rung propagation logic.
-- **It does not visualise results.** There is no dashboard. The output is
+- **CLI output is JSON and Markdown.** The CLI ratchet produces JSON files
-  JSON and Markdown. You read it, or you write a script to plot it.
+  and Markdown lessons. For interactive exploration, use the Plan C dashboard
  notebook (`plan_c/00_dashboard.ipynb`), which provides a widget-based
  interface for running experiments and viewing results.
 - **It does not parallelise evaluations.** Each experiment runs sequentially.
  On a machine with multiple cores, you could shard the challenger set across
@@ -634,7 +792,7 @@ dimension value that does not correspond to an implemented code path (e.g.,
 ---
-## Part 7: Architecture diagram
+## Part 9: Architecture diagram
 ```
                          configs/rungs/rung1-5.yaml
@@ -677,6 +835,6 @@ ratchet runs multiple rungs. The lessons tighten the circle with every pass.
 ---
-*This document was written on 2026-04-04 to describe the system as built.
+*This document was last updated on 2026-04-15 to describe the system as
-The code is the ground truth. If this document contradicts the code, the
+built. The code is the ground truth. If this document contradicts the code,
-code is correct.*
+the code is correct.*
--- a/notebooks/learning_objectives.md
+++ b/notebooks/learning_objectives.md
@@ -1,8 +1,19 @@
-# Learning Objectives — Per Notebook, Per Section
+# Learning Objectives --- Per Notebook, Per Section
 Each objective has a Bloom level and a matched assessment type.
 All four plans teach the same core material; the pedagogical approach differs.
 **Entry point:** Open `00_START_HERE.ipynb` to choose your plan. Every content
 notebook links back to Start Here and forward to the next notebook in the plan.
 **Assessment types:**
 - **MCQ** (`quiz()`) --- multiple-choice with immediate feedback
 - **Predict** (`predict_choice()`) --- predict an outcome before running code
 - **Reflect** (`reflect()`) --- open-ended reflection graded by keywords
 - **Order** (`order()`) --- rank or sequence items
 All assessments are tracked by `LearningTracker` with Bloom's taxonomy levels.
 ---
 ## Plan A — Bottom-Up (3 Sequential Notebooks)
--- a/paper/autoresearch_quantum.tex
+++ b/paper/autoresearch_quantum.tex
@@ -913,9 +913,10 @@ re-evaluated, and the patience counter is preserved.
 \label{sec:verification_claims}
 % ============================================================================
-The test suite contains 21 tests, each anchored to a specific architectural
+The full test suite contains 335 tests across 13 files, covering the research
-claim. We present them grouped by subsystem, with the falsification condition
+engine, teaching layer, notebook structure, and pedagogical quality. Below we
-for each.
+present the 21 core research-engine tests, grouped by subsystem, with the
 falsification condition for each.
 \subsection{Quantum Correctness (3 tests)}
@@ -1108,7 +1109,8 @@ with different \code{verification} values. The seeds must differ.
 cd autoresearch-quantum
 python -m venv .venv && source .venv/bin/activate
 pip install -e ".[dev]"
-python -m pytest tests/ -v          # 21 tests, ~13 seconds
+python -m pytest tests/ -v          # 335 tests
 bash scripts/app.sh validate       # full validation (lint + types + tests)
 \end{lstlisting}
 Requires Python $\geq$ 3.11 and Qiskit $\geq$ 2.3. No GPU needed.
--- a/paper/compendium.tex
+++ b/paper/compendium.tex
@@ -116,13 +116,14 @@
 \begin{center}
 \begin{minipage}{0.85\textwidth}
 \small\itshape
-This compendium is the ``course textbook'' for the eight Jupyter notebooks
+This compendium is the ``course textbook'' for the twelve Jupyter notebooks
-in the \textsc{autoresearch-quantum} project. It is designed to be read
+(across four learning plans) in the \textsc{autoresearch-quantum} project.
-before, during, or after working through the notebooks. Every concept
+Start at \texttt{00\_START\_HERE.ipynb} to choose your plan. This document
-exercised in the notebooks is explained here with the depth and context
+is designed to be read before, during, or after working through the
-that a tutorial session cannot provide. No prior knowledge of quantum
+notebooks. Every concept exercised in the notebooks is explained here
-error correction is assumed; familiarity with linear algebra and
+with the depth and context that a tutorial session cannot provide. No
-complex numbers is helpful.
+prior knowledge of quantum error correction is assumed; familiarity with
 linear algebra and complex numbers is helpful.
 \end{minipage}
 \end{center}
 \vspace{2cm}
@@ -1419,13 +1420,13 @@ expectation value is the average over many measurements.
 \textbf{Notebook Topic} & \textbf{Notebooks} & \textbf{Compendium} \\
 \midrule
 T-state definition \& Bloch sphere &
-  A/01~\S1--2, B~\S2.1, C/A~\S1--3 &
+  A/01~\S1--2, B~\S2.1, C/A~\S1--3, D/1~\S1 &
  \cref{ch:magic} \\
 Why encode (no-cloning, distance) &
-  A/01~\S3, C/A~\S1 &
+  A/01~\S3, C/A~\S1, D/1~\S2 &
  \cref{ch:code}~\S1--2 \\
 Stabilisers \& codespace &
-  A/01~\S6, B~\S2.3, C/A~\S4 &
+  A/01~\S6, B~\S2.3, C/A~\S4, D/1~\S3 &
  \cref{ch:code}~\S3 \\
 Logical operators &
  A/01~\S6, C/A~\S5 &
@@ -1434,22 +1435,22 @@ Encoder circuits &
  A/01~\S4--5, C/A~\S6 &
  \cref{sec:encoder} \\
 Error detection &
-  A/01~\S7, C/A~\S8 &
+  A/01~\S7, C/A~\S8, D/1~\S4 &
  \cref{sec:errors} \\
 Ancilla \& syndrome extraction &
  A/01~\S9, C/A~\S7 &
  \cref{ch:measurement}~\S2 \\
 Postselection &
-  A/01~\S11, A/02~\S3, B~\S2.5 &
+  A/01~\S11, A/02~\S3, B~\S2.5, D/1~\S6 &
  \cref{sec:postselection} \\
 Noise models \& transpilation &
-  A/02~\S2, C/B~\S1--3 &
+  A/02~\S2, C/B~\S1--3, D/2~\S1 &
  \cref{ch:noise} \\
 Magic witness formula &
-  A/02~\S5, B~\S2.7, C/A~\S9 &
+  A/02~\S5, B~\S2.7, C/A~\S9, D/1~\S5 &
  \cref{ch:witness} \\
 Scoring formula &
-  A/02~\S7, B~\S2.9, C/B~\S8 &
+  A/02~\S7, B~\S2.9, C/B~\S8, D/2~\S2 &
  \cref{ch:scoring} \\
 Factory throughput &
  A/02~\S10, C/B~\S9 &
@@ -1458,20 +1459,23 @@ Failure modes &
  A/02~\S9, C/B~\S7 &
  \cref{sec:failures} \\
 Ratchet mechanism &
-  A/03~\S1--4, B~\S2.10--12, C/C~\S1--7 &
+  A/03~\S1--4, B~\S2.10--12, C/C~\S1--7, D/3~\S1--2 &
  \cref{ch:ratchet}~\S1--3 \\
 Search strategies &
-  A/03~\S7, B~\S3.5, C/C~\S3--4 &
+  A/03~\S7, B~\S3.5, C/C~\S3--4, D/3~\S2 &
  \cref{sec:strategies} \\
 Lesson extraction \& rules &
-  A/03~\S8, B~\S3.6, C/C~\S8--9 &
+  A/03~\S8, B~\S3.6, C/C~\S8--9, D/3~\S4 &
  \cref{sec:lessons} \\
 Narrowing \& propagation &
  B~\S3.7, C/C~\S10--11 &
  \cref{ch:ratchet}~\S5--6 \\
 Transfer evaluation &
-  A/03~\S10, B~\S3.8, C/C~\S12 &
+  A/03~\S10, B~\S3.8, C/C~\S12, D/3~\S5 &
  \cref{ch:ratchet}~\S7 \\
 Parameter sweep \& optimisation &
  A/02~\S8, D/2~\S3, D/3~\S3 &
  \cref{ch:scoring} \\
 \bottomrule
 \end{tabular}
 \end{center}