Sync all documentation with current project ground truth

README: rewrite with Quick Start (app.sh), 335-test count, teaching layer
narrative, testing/validation section, CI/CD docs, pre-commit hooks.
THE_STORY: add Part 4 (teaching layer), Part 5 (app.sh consumer experience),
update file map with all 13 test files and teaching/notebook/paper entries.
compendium.tex: update notebook count (8→12), add Plan D cross-references.
autoresearch_quantum.tex: update test counts (21→335), add app.sh validate.
learning_objectives.md: add entry point reference and assessment type glossary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
saymrwulf 2026-04-15 20:55:02 +02:00
parent 29caba3a1a
commit 55237d5f73
5 changed files with 455 additions and 212 deletions

README.md

@ -1,93 +1,205 @@
# Autoresearch Quantum
`autoresearch-quantum` is a Python research harness for a Karpathy-style autoresearch ratchet in quantum experiments, combined with a four-plan interactive course built on Jupyter notebooks:
- keep an incumbent experiment
- generate challenger experiments
- screen challengers on a cheap tier
- promote only justified challengers to an expensive tier
- replace the incumbent only when the challenger wins on the final criterion
- log every ratchet step
- extract a transferable lesson at the end of each rung
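The loop above can be sketched in miniature. This is an illustrative reduction, not the real `AutoresearchHarness` API: the spec is just a number, and `propose`, `cheap_score`, and `expensive_score` are toy stand-ins.

```python
import random

def ratchet_step(incumbent, propose, cheap_score, expensive_score, cheap_margin=0.0):
    """One ratchet step: screen challengers cheaply, promote, compare expensively."""
    challengers = propose(incumbent)
    baseline = cheap_score(incumbent)
    # Cheap tier: keep only challengers that beat the incumbent by the margin.
    promoted = [c for c in challengers if cheap_score(c) > baseline + cheap_margin]
    if not promoted:
        return incumbent  # nothing justified the expensive tier
    # Expensive tier: replace the incumbent only if a challenger wins outright.
    best = max(promoted, key=expensive_score)
    return best if expensive_score(best) > expensive_score(incumbent) else incumbent

# Toy usage: specs are numbers in [0, 1]; the score rewards closeness to 0.7.
score = lambda x: 1 - abs(x - 0.7)
propose = lambda x: [min(1.0, max(0.0, x + random.uniform(-0.1, 0.1))) for _ in range(8)]
random.seed(0)
spec = 0.2
for _ in range(20):
    spec = ratchet_step(spec, propose, score, score, cheap_margin=0.01)
```

Because replacement requires a strict win on the final criterion, the incumbent's score never decreases: that is the ratchet.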
The system has two layers:
1. **Research engine** --- an automated loop that discovers the best way to prepare encoded magic states on the [[4,2,2]] quantum error-detecting code. It proposes, evaluates, compares, learns, and repeats without human intervention.
2. **Teaching layer** --- 12 Jupyter notebooks across 4 learning plans, each teaching the same core material through a different pedagogical lens: sequential (Plan A), spiral (Plan B), parallel tracks (Plan C), and hypothesis-driven experiments (Plan D). Every notebook includes interactive widget-based assessments, per-student progress tracking, and Bloom's taxonomy-aligned exercises.
The first built-in experiment family targets encoded magic-state preparation in the `[[4,2,2]]` code with Qiskit. The framework is designed so the `[[4,2,2]]` rung is not the destination. It is the first rung in a ladder that shifts from best-circuit hunting toward reusable design rules for larger encoded workflows.
No IBM account or API key is needed --- everything runs locally with the Aer simulator.
## Project Tree
```text
autoresearch-quantum/
├── configs/rungs/
│ ├── rung1.yaml Baseline: what recipe works?
│ ├── rung2.yaml Stability under noise variation
│ ├── rung3.yaml Transfer across backends
│ ├── rung4.yaml Factory throughput / cost
│ └── rung5.yaml Rosenfeld direction
├── src/autoresearch_quantum/
│ ├── cli.py CLI entry point
│ ├── config.py YAML config loader
│ ├── models.py All data structures
│ ├── codes/
│ │ └── four_two_two.py [[4,2,2]] stabilisers, encoder, seed gates
│ ├── experiments/
│ │ └── encoded_magic_state.py Circuit bundle builder
│ ├── execution/
│ │ ├── analysis.py Postselection, witness, stability
│ │ ├── backends.py Backend resolution
│ │ ├── hardware.py IBM hardware executor
│ │ ├── local.py Aer noise simulation executor
│ │ ├── transfer.py Cross-backend transfer evaluator
│ │ └── transpile.py Transpilation utilities
│ ├── lessons/
│ │ ├── extractor.py Human-readable lesson extraction
│ │ └── feedback.py Machine-readable rules + search narrowing
│ ├── persistence/
│ │ └── store.py JSON file store with resumability
│ ├── ratchet/
│ │ └── runner.py AutoresearchHarness orchestrator
│ ├── scoring/
│ │ └── score.py WAC + factory throughput scorers
│ ├── search/
│ │ ├── challengers.py Neighbour generation with dedup
│ │ └── strategies.py NeighborWalk, RandomCombo, LessonGuided
│ └── teaching/
│ ├── assess.py Widget-based quizzes, predictions, reflections
│ └── tracker.py LearningTracker --- per-student progress tracking
├── paper/
│ ├── autoresearch_quantum.tex Full technical paper (LaTeX)
│ ├── autoresearch_quantum.pdf Compiled PDF (19 pages)
│ ├── compendium.tex Companion textbook (LaTeX)
│ └── compendium.pdf Compiled PDF (36 pages)
├── notebooks/
│ ├── 00_START_HERE.ipynb Central entry point --- plan selector
│ ├── learning_objectives.md Per-notebook, per-section learning objectives
│ ├── plan_a/ Bottom-up: 3 sequential notebooks
│ │ ├── 01_encoded_magic_state.ipynb
│ │ ├── 02_measuring_progress.ipynb
│ │ └── 03_the_ratchet.ipynb
│ ├── plan_b/ Spiral: 1 notebook, three passes
│ │ └── spiral_notebook.ipynb
│ ├── plan_c/ Parallel tracks + dashboard
│ │ ├── 00_dashboard.ipynb
│ │ ├── track_a_physics.ipynb
│ │ ├── track_b_engineering.ipynb
│ │ └── track_c_search.ipynb
│ └── plan_d/ Three claim-driven experiments
│ ├── experiment_1_protection.ipynb
│ ├── experiment_2_noise.ipynb
│ └── experiment_3_optimisation.ipynb
├── scripts/
│ └── app.sh Consumer lifecycle manager
├── tests/ 335 tests across 13 files
│ ├── test_analysis.py Postselection & witness tests
│ ├── test_browser_ux.py Playwright end-to-end UX tests
│ ├── test_cli.py CLI subcommand tests
│ ├── test_codes.py [[4,2,2]] code correctness
│ ├── test_config.py YAML config loading
│ ├── test_experiments.py Circuit bundle construction
│ ├── test_feedback.py Lesson extraction & search rules
│ ├── test_harness.py Full ratchet integration tests
│ ├── test_notebooks.py Notebook execution & structure
│ ├── test_pedagogy.py Pedagogical quality invariants
│ ├── test_persistence.py JSON store round-trips
│ ├── test_scoring.py Score function correctness
│ └── test_teaching.py Assessment widget & tracker tests
├── .github/workflows/ci.yml CI: lint, type check, test matrix, notebook execution
├── .pre-commit-config.yaml Ruff, mypy, nbstripout, hygiene hooks
├── THE_STORY.md Narrative documentation (system design)
├── pyproject.toml Build config, dependencies, tool settings
└── README.md
```
## Quick Start
The fastest way to get running:
```bash
# Clone and bootstrap (creates venv, installs everything, registers Jupyter kernel)
git clone https://github.com/saymrwulf/autoresearch-quantum.git
cd autoresearch-quantum
bash scripts/app.sh bootstrap
# Launch JupyterLab (opens 00_START_HERE.ipynb in your browser)
bash scripts/app.sh start
```
The `app.sh` lifecycle manager handles the entire consumer experience:
| Command | What it does |
|---------|-------------|
| `bash scripts/app.sh bootstrap` | Create venv, install deps, register Jupyter kernel, verify imports |
| `bash scripts/app.sh start` | Launch JupyterLab (auto-opens `00_START_HERE.ipynb`) |
| `bash scripts/app.sh start --no-open` | Launch without opening browser |
| `bash scripts/app.sh stop` | Stop JupyterLab |
| `bash scripts/app.sh status` | Show venv, server, notebook, and progress status |
| `bash scripts/app.sh validate` | Run full validation: ruff + mypy + pytest |
| `bash scripts/app.sh validate --quick` | Lint + type check + unit tests only |
| `bash scripts/app.sh logs` | Tail JupyterLab output |
| `bash scripts/app.sh reset` | Delete learner progress files |
### Manual installation
If you prefer manual setup:
```bash
python3 -m venv .venv
. .venv/bin/activate
pip install -e '.[dev,notebooks]'
```
For the optional IBM hardware path:
```bash
pip install -e '.[hardware,dev,notebooks]'
```
## Jupyter Notebooks --- Learning Plans
The `notebooks/` folder contains **12 notebooks across 4 independent learning plans**, all accessible from a central entry point: **`00_START_HERE.ipynb`**.
Each plan teaches the same core material (encoded magic-state preparation, measurement, and the ratchet optimiser) through a different didactic lens. Every content notebook includes:
- **Interactive assessments** --- multiple-choice quizzes, predictions, reflections, and ordering exercises (ipywidgets)
- **Per-student progress tracking** --- `LearningTracker` records scores, Bloom's levels, and time per assessment
- **Navigation links** --- forward/backward links between notebooks, cross-plan suggestions, and back-links to Start Here
- **Key Insight callouts** --- highlighted takeaways for important concepts
- **Checkpoint summaries** --- mid-notebook progress reviews in longer notebooks
### Plan A --- Bottom-Up (3 sequential notebooks)
| # | File | What you learn |
|---|------|----------------|
| 1 | `plan_a/01_encoded_magic_state.ipynb` | T-state, [[4,2,2]] encoder, stabilisers, error detection, postselection |
| 2 | `plan_a/02_measuring_progress.ipynb` | Noise, logical operators, magic witness, scoring formula, parameter sweeps |
| 3 | `plan_a/03_the_ratchet.ipynb` | Incumbent/challenger model, ratchet steps, lessons, cross-rung propagation |
Start with notebook 01 and work through in order. Run each cell top-to-bottom (Shift+Enter).
### Plan B --- Spiral (1 notebook, three passes)
| File | What you learn |
|------|----------------|
| `plan_b/spiral_notebook.ipynb` | **Pass 1:** 5-min demo (black-box). **Pass 2:** Open the box (circuits, stabilisers, scoring). **Pass 3:** Make it your own (modify parameters, run experiments). |
One notebook, 78 cells. Each pass revisits the same system at a deeper level.
### Plan C --- Parallel Tracks (4 notebooks)
| File | Focus |
|------|-------|
| `plan_c/00_dashboard.ipynb` | Interactive dashboard (ipywidgets) --- run experiments from dropdowns |
| `plan_c/track_a_physics.ipynb` | Pure quantum mechanics: Eastin-Knill, Bloch sphere, stabiliser algebra |
| `plan_c/track_b_engineering.ipynb` | Noise models, transpilation, cost model, failure modes |
| `plan_c/track_c_search.ipynb` | Parameter space, search strategies, lesson extraction, cross-rung transfer |
Start with the dashboard for an overview, then dive into whichever track interests you. The three tracks are independent and can be read in any order.
### Plan D --- Three Claim-Driven Experiments
| # | File | Hypothesis |
|---|------|-----------|
| 1 | `plan_d/experiment_1_protection.ipynb` | The [[4,2,2]] code can protect a magic state: W=1.0, all errors detected |
| 2 | `plan_d/experiment_2_noise.ipynb` | Noise degrades quality but parameter choice matters >2x |
| 3 | `plan_d/experiment_3_optimisation.ipynb` | A ratchet can learn to optimise and its knowledge transfers |
Each notebook follows: **Hypothesis -> Claim -> Experiment -> Proof -> Next Hypothesis**.
### Troubleshooting
| Problem | Fix |
|---------|-----|
| `ModuleNotFoundError: autoresearch_quantum` | Run `bash scripts/app.sh bootstrap` or `pip install -e '.[notebooks]'` |
| `ModuleNotFoundError: ipywidgets` | Run `pip install ipywidgets` --- needed for interactive assessments |
| Plots don't render | Make sure `%matplotlib inline` is in the first code cell (it already is) |
| Kernel not found | In JupyterLab, select **Kernel > Change Kernel** and pick the `.venv` Python |
## Scientific Framing
### What is optimized
@ -144,7 +256,7 @@ Expensive tier:
## Built-In `[[4,2,2]]` Experiment
The built-in experiment prepares an encoded logical T-state on one logical qubit of the `[[4,2,2]]` code while keeping the spectator logical qubit in `|0>`. The code utilities live in [`four_two_two.py`](src/autoresearch_quantum/codes/four_two_two.py).
The harness evaluates:
@ -158,108 +270,12 @@ This keeps the core scientific distinction explicit:
- a circuit can be locally good for `[[4,2,2]]`
- a rule is only valuable if it keeps helping across new backends or new rungs
## How To Run (CLI)
### 1. Run a single local experiment
Use the bootstrap incumbent from the rung config as-is:
```bash
autoresearch-quantum run-experiment \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
```
@ -267,7 +283,7 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
Override individual experiment fields:
```bash
autoresearch-quantum run-experiment \
--config configs/rungs/rung1.yaml \
--store-dir data/demo \
--set verification=z_only \
@ -278,7 +294,7 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-experiment \
### 2. Run one ratchet step
```bash
autoresearch-quantum run-step \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
```
@ -294,7 +310,7 @@ This will:
### 3. Run one full rung
```bash
autoresearch-quantum run-rung \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
```
@ -310,7 +326,7 @@ Artifacts are persisted under `data/demo/rung_<n>/`:
### 4. Run a multi-rung ratchet campaign
```bash
autoresearch-quantum run-ratchet \
--config configs/rungs/rung1.yaml \
--config configs/rungs/rung2.yaml \
--config configs/rungs/rung3.yaml \
@ -320,18 +336,17 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-ratchet \
### 5. Run an optional hardware-backed confirmation
First install the hardware extra and make IBM credentials available:
```bash
pip install -e '.[hardware]'
export QISKIT_IBM_TOKEN=...
```
Then enable the hardware tier in the rung config by setting `tier_policy.enable_hardware: true` and optionally `hardware.backend_name: ibm_brisbane`.
Run:
```bash
autoresearch-quantum run-step \
--config configs/rungs/rung1.yaml \
--store-dir data/hardware \
--hardware
@ -339,18 +354,71 @@ PYTHONPATH=src .venv/bin/python -m autoresearch_quantum run-step \
Only challengers that beat the incumbent cheap-tier score by `tier_policy.cheap_margin` are promoted.
## Testing & Validation
The project has **335 tests** across 13 test files covering every layer:
| Test file | What it validates |
|-----------|-------------------|
| `test_codes.py` | [[4,2,2]] stabilisers, encoder, seed gates |
| `test_experiments.py` | Circuit bundle construction |
| `test_analysis.py` | Postselection, witness, stability metrics |
| `test_scoring.py` | WAC and factory throughput score functions |
| `test_feedback.py` | Lesson extraction, search rules, space narrowing |
| `test_harness.py` | Full ratchet integration (rung, multi-rung, resumability) |
| `test_persistence.py` | JSON store round-trips |
| `test_cli.py` | CLI subcommands |
| `test_config.py` | YAML config loading |
| `test_teaching.py` | Assessment widgets, LearningTracker |
| `test_notebooks.py` | Notebook execution via nbclient, structure validation |
| `test_pedagogy.py` | Pedagogical quality: prose density, assessment density, Bloom's coverage, section structure, tracker integration, key insights, cross-plan consistency |
| `test_browser_ux.py` | Playwright end-to-end: JupyterLab launch, notebook rendering, navigation links, widget rendering |
### Running tests
```bash
# Standard: all tests except browser UX (default)
bash scripts/app.sh validate
# Quick: lint + type check + unit tests only
bash scripts/app.sh validate --quick
# Direct pytest (browser tests excluded by default via marker)
.venv/bin/python -m pytest tests/ -v
# Browser UX tests (requires playwright)
pip install playwright && python -m playwright install chromium
.venv/bin/python -m pytest tests/test_browser_ux.py -m browser -v
```
### Static analysis
- **Ruff** --- linting and formatting (E, F, W, I, UP, B, SIM rule sets)
- **mypy** --- strict mode type checking across all source files
- **nbstripout** --- strips notebook outputs before commit
All three run automatically as **pre-commit hooks** (`.pre-commit-config.yaml`). Install with:
```bash
.venv/bin/pre-commit install
```
### CI/CD
The GitHub Actions pipeline (`.github/workflows/ci.yml`) runs on every push and PR:
1. **Lint job** --- ruff check, ruff format --check, mypy strict (Python 3.11)
2. **Test job** --- full test suite on Python 3.11 and 3.12 matrix
3. **Notebook execution job** --- runs all 12 notebooks end-to-end via nbclient
## Extending The Ladder
The intended progression is:
1. `rung1.yaml` --- baseline `[[4,2,2]]` encoded magic-state preparation
2. `rung2.yaml` --- same code with stronger stability and backend-awareness
3. `rung3.yaml` --- transfer across backend families
4. `rung4.yaml` --- factory-style cost pressure
To add a new rung:

THE_STORY.md

@ -398,7 +398,124 @@ and checks that their computed seeds are different.
---
## Part 4: The teaching layer
The system is not only a research engine. It is also a course. Twelve Jupyter
notebooks, organised into four independent learning plans, teach the same
material through different pedagogical lenses. The teaching layer sits on top
of the research engine and uses its real components (circuits, simulators,
scorers, ratchet) as the substrate for interactive learning.
### 4.1 Entry point: 00_START_HERE.ipynb
Every learner begins at `notebooks/00_START_HERE.ipynb`. This notebook
contains no code --- it is a plan selector. It describes the four plans, their
target audiences, and links directly to each plan's first notebook. All
content notebooks link back to Start Here.
### 4.2 The four plans
| Plan | Style | Notebooks | Target learner |
|------|-------|-----------|----------------|
| **A** | Bottom-up, sequential | 3 | Methodical learners who want foundations first |
| **B** | Spiral, three passes | 1 (78 cells) | Time-pressed learners who want a demo first, theory later |
| **C** | Parallel tracks + dashboard | 4 | Learners who want to choose their own path |
| **D** | Hypothesis-driven experiments | 3 | Research-oriented learners who want to test claims |
All four plans cover the same core concepts: T-state preparation, [[4,2,2]]
encoding, stabiliser verification, postselection, scoring, the ratchet
optimiser, lesson extraction, and cross-rung transfer.
### 4.3 Interactive assessments (teaching/assess.py)
Every content notebook includes interactive assessments built with ipywidgets:
- **quiz()** --- multiple-choice questions with immediate feedback
- **predict_choice()** --- "What do you think will happen?" before running code
- **reflect()** --- open-ended reflections graded by keyword matching
- **order()** --- drag-and-drop ordering exercises (e.g., rank error types)
Each assessment is tagged with a Bloom's taxonomy level (remember, understand,
apply, analyse, evaluate) and a topic. The full mapping of learning objectives
to assessments is documented in `notebooks/learning_objectives.md`.
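A minimal sketch of what such tagging might look like. The real `assess.py` API is not reproduced here; the class and field names below are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    """Illustrative stand-in for a Bloom's-tagged assessment (names hypothetical)."""
    prompt: str
    options: list
    answer: int   # index of the correct option
    bloom: str    # one of: remember, understand, apply, analyse, evaluate
    topic: str

    def grade(self, choice: int) -> bool:
        return choice == self.answer

q = Assessment(
    prompt="Which syndrome outcome does postselection discard?",
    options=["all stabilisers +1", "any stabiliser -1"],
    answer=0,
    bloom="understand",
    topic="postselection",
)
```

Tagging each question with a level and a topic is what lets the tracker report a Bloom's distribution rather than a flat score.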
### 4.4 Progress tracking (teaching/tracker.py)
Each notebook creates a `LearningTracker` instance that records:
- scores per assessment (correct/incorrect, attempt count)
- Bloom's level distribution (how many of each level attempted/passed)
- time spent per assessment
- checkpoint summaries at natural breakpoints
At the end of each notebook, `tracker.dashboard()` displays a visual summary,
and `tracker.save()` persists progress to a JSON file. Progress files can be
reset with `bash scripts/app.sh reset`.
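The save/load round-trip can be sketched as follows. `LearningTracker`'s real schema is not shown in this document, so the JSON layout below is an assumption.

```python
import json, os, tempfile

class MiniTracker:
    """Toy tracker: records per-assessment results and persists them as JSON."""
    def __init__(self, student="anon"):
        self.student = student
        self.records = []  # one dict per assessment attempt

    def record(self, topic, bloom, correct):
        self.records.append({"topic": topic, "bloom": bloom, "correct": correct})

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"student": self.student, "records": self.records}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            data = json.load(f)
        t = cls(data["student"])
        t.records = data["records"]
        return t

t = MiniTracker("alice")
t.record("postselection", "understand", True)
path = os.path.join(tempfile.mkdtemp(), "progress.json")
t.save(path)
t2 = MiniTracker.load(path)
```

Because progress lives in plain JSON files, `app.sh reset` only has to delete them.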
### 4.5 Navigation
Every content notebook has a navigation footer with:
- **Forward link** to the next notebook in the plan
- **Back-link** to 00_START_HERE.ipynb
- **Cross-plan suggestions** at terminal notebooks (e.g., "Finished Plan A?
Try Plan D for a different perspective.")
### 4.6 Pedagogical quality enforcement
The test suite includes `tests/test_pedagogy.py`, which enforces educational
quality invariants across all content notebooks:
- Minimum 200 words of prose per notebook
- At least 25% of cells are markdown (not code-only)
- Every notebook has a title header and multiple sections
- At least 2 interactive assessments per notebook
- At least 2 different assessment types per notebook (variety)
- Bloom's taxonomy coverage: at least 2 levels per notebook
- Checkpoint summaries present when a notebook has 4+ assessments
- LearningTracker initialisation, dashboard(), and save() in every notebook
- Key Insight callouts in longer notebooks (5+ sections)
- All four plans collectively cover core concepts (stabiliser, magic, witness, ratchet)
These tests catch pedagogical regressions the same way unit tests catch code
regressions. Adding a new notebook or modifying an existing one will fail CI
if it violates these invariants.
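One such invariant, the markdown-cell fraction, can be checked with a few lines against the raw notebook JSON. The 25% threshold mirrors the figure quoted above; the helper name is made up for this sketch.

```python
def markdown_fraction(nb: dict) -> float:
    """Fraction of cells that are markdown in an nbformat-style notebook dict."""
    cells = nb.get("cells", [])
    if not cells:
        return 0.0
    md = sum(1 for c in cells if c.get("cell_type") == "markdown")
    return md / len(cells)

# A toy notebook dict in nbformat-like shape: 2 markdown cells out of 4.
toy_nb = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "print('hi')"},
    {"cell_type": "markdown", "source": "Some prose."},
    {"cell_type": "code", "source": "1 + 1"},
]}
assert markdown_fraction(toy_nb) >= 0.25  # the invariant enforced in CI
```

The other invariants (assessment counts, Bloom's coverage, tracker calls) are checked the same way: parse the notebook JSON, count, assert.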
---
## Part 5: The consumer experience (app.sh)
The project includes a lifecycle manager (`scripts/app.sh`) that handles the
entire consumer experience from first clone to running notebooks:
```bash
bash scripts/app.sh bootstrap # venv, pip install, kernel registration, import check
bash scripts/app.sh start # launch JupyterLab, open 00_START_HERE.ipynb
bash scripts/app.sh stop # graceful shutdown
bash scripts/app.sh status # venv, server, notebook, progress summary
bash scripts/app.sh validate # ruff + mypy + full test suite
bash scripts/app.sh validate --quick # lint + type check + unit tests only
bash scripts/app.sh logs # tail JupyterLab output
bash scripts/app.sh reset # delete learner progress files
```
Bootstrap checks Python >= 3.11, creates the venv, installs the package with
dev and notebook dependencies, registers a Jupyter kernel, and verifies that
core imports succeed. Start finds a free port (8888-8899), launches JupyterLab
in the background with PID tracking, and opens the browser directly to
`00_START_HERE.ipynb`.
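The port scan in `start` can be approximated in Python (the real script is bash; this is just an equivalent sketch of the idea):

```python
import socket

def find_free_port(start=8888, end=8899):
    """Return the first TCP port in [start, end] that accepts a local bind."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # port busy, try the next one
            return port
    raise RuntimeError(f"no free port in {start}-{end}")

port = find_free_port()
```

Binding and immediately closing is a common (if slightly racy) way to probe availability; the script then hands the chosen port to JupyterLab.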
Validation runs the full quality pipeline: ruff linting, mypy strict type
checking, and the pytest suite (335 tests, excluding browser UX by default).
The `--quick` flag runs only lint, type check, and unit tests.
---
## Part 6: The file map
```
autoresearch-quantum/
@ -450,8 +567,42 @@ autoresearch-quantum/
store.py JSON file store: experiments, steps, progress,
lessons, feedback, propagated specs
teaching/
assess.py Widget-based quizzes, predictions, reflections
tracker.py LearningTracker: per-student progress tracking
notebooks/
00_START_HERE.ipynb Central entry point: plan selector
learning_objectives.md Per-notebook, per-section learning objectives
plan_a/ Bottom-up: 3 sequential notebooks
plan_b/ Spiral: 1 notebook, 3 passes
plan_c/ Parallel tracks + dashboard: 4 notebooks
plan_d/ Hypothesis-driven: 3 experiments
paper/
autoresearch_quantum.tex Technical paper (LaTeX, 19 pages)
compendium.tex Companion textbook (LaTeX, 36 pages)
scripts/
app.sh Consumer lifecycle manager (bootstrap/start/stop/validate)
tests/ 335 tests across 13 files
test_analysis.py Postselection & witness
test_browser_ux.py Playwright end-to-end UX
test_cli.py CLI subcommands
test_codes.py [[4,2,2]] code correctness
test_config.py YAML config loading
test_experiments.py Circuit bundle construction
test_feedback.py Lesson extraction & search rules
test_harness.py Full ratchet integration
test_notebooks.py Notebook execution & structure
test_pedagogy.py Pedagogical quality invariants (130 tests)
test_persistence.py JSON store round-trips
test_scoring.py Score functions
test_teaching.py Assessment widgets & tracker
.github/workflows/ci.yml CI: lint, type check, test matrix, notebook execution
.pre-commit-config.yaml Ruff, mypy, nbstripout, hygiene hooks
data/ Output directory (created at runtime)
default/
@ -472,12 +623,12 @@ autoresearch-quantum/
---
## Part 7: How to use it without Claude
You do not need an AI to run this system or to make progress with its
output. Everything below runs in your terminal.
### 7.1 Setup
```bash
cd autoresearch-quantum
@ -486,7 +637,7 @@ source .venv/bin/activate
pip install -e ".[dev]"
```
### 7.2 Run a single experiment
```bash
python -m autoresearch_quantum run-experiment \
@ -498,7 +649,7 @@ python -m autoresearch_quantum run-experiment \
This prints a JSON result with the score, failure mode, and experiment ID.
The full record is saved to `data/default/rung_1/experiments/`.
### 7.3 Run one ratchet step
```bash
python -m autoresearch_quantum run-step \
@ -510,7 +661,7 @@ them, promotes the best, and saves the step record. Run it again and it
generates *new* challengers (never repeating), with a new incumbent if one was
found.
### 7.4 Run a full rung
```bash
python -m autoresearch_quantum run-rung \
@ -521,7 +672,7 @@ Runs up to `step_budget` steps (default 3), stopping early if patience runs
out. Produces `data/default/rung_1/lesson.md` -- read this file. It tells you
what helped, what hurt, what seems invariant, and what to test next.
### 7.5 Run the full five-rung ratchet
```bash
python -m autoresearch_quantum run-ratchet \
@ -536,7 +687,7 @@ This is the full pipeline. Each rung's winner is automatically propagated to
the next rung. Each rung's lessons narrow the search space for the next.
When it finishes, you have five lesson files and a final optimised recipe.
### 7.6 Run a transfer evaluation
```bash
python -m autoresearch_quantum run-transfer \
@ -547,7 +698,7 @@ python -m autoresearch_quantum run-transfer \
Tests a single spec across multiple backend noise models. The output tells you
the per-backend scores and the pessimistic transfer score.
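"Pessimistic transfer score" suggests a worst-case aggregate across backends; a sketch under that assumption (the real aggregation in `transfer.py` may differ, and the backend names below are invented):

```python
def transfer_scores(spec, backends, evaluate):
    """Score one spec on every backend; the pessimistic score is the minimum."""
    per_backend = {name: evaluate(spec, name) for name in backends}
    return per_backend, min(per_backend.values())

# Toy evaluation: pretend each backend degrades the spec's ideal score differently.
degradation = {"aer_ideal": 0.00, "fake_backend_a": 0.12, "fake_backend_b": 0.20}
evaluate = lambda spec, name: spec - degradation[name]
per_backend, pessimistic = transfer_scores(0.9, degradation, evaluate)
```

Taking the minimum means a spec only scores well if it holds up on its worst backend, which is the point of a transfer test.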
### 7.7 Reading the output
After a ratchet run, the most valuable artefacts are:
@ -559,7 +710,7 @@ After a ratchet run, the most valuable artefacts are:
| `rung_N/propagated_spec.json` | The spec that was carried forward from the previous rung. Compare it with the YAML bootstrap to see what the system changed. |
| `rung_N/progress.json` | If the run was interrupted, this tells you where it left off. Just re-run the same command to resume. |
### 7.8 Making manual progress with the artefacts
The system is designed so that you can interleave human intuition with
automated search:
@ -591,22 +742,27 @@ automated search:
You are now doing what the system does in `run_ratchet` -- but with human
judgement about what to explore next.
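The interleaving above boils down to: read a spec artefact, change one field by hand, run again. A minimal sketch of the edit step, with a toy spec standing in for `data/default/rung_N/propagated_spec.json` (the field names and values here are illustrative; a real edit must use a dimension value implemented in `four_two_two.py`):

```python
import json
import pathlib
import tempfile

# Toy stand-in for data/default/rung_N/propagated_spec.json; a real
# session would open the file the ratchet actually wrote.
workdir = pathlib.Path(tempfile.mkdtemp())
spec_path = workdir / "propagated_spec.json"
spec_path.write_text(json.dumps({"encoder_style": "baseline", "shots": 4096}))

spec = json.loads(spec_path.read_text())
spec["shots"] = 8192          # hand-tuned change to test an intuition
spec_path.write_text(json.dumps(spec, indent=2))

edited = json.loads(spec_path.read_text())
```

After writing the file back, re-running the same `run-rung` command evaluates your hand-tuned spec alongside the generated challengers.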
### 5.9 Running the tests
### 7.9 Running the tests
```bash
# Full validation (recommended)
bash scripts/app.sh validate
# Or directly with pytest
python -m pytest tests/ -v
```
All 21 tests should pass. They take about 13 seconds. If a test fails after
you edit a YAML config, the most likely cause is that you introduced a
dimension value that does not correspond to an implemented code path (e.g.,
`encoder_style: "rzz_lattice"` does not exist in `four_two_two.py`).
All 335 tests should pass (browser UX tests excluded by default). If a test
fails after you edit a YAML config, the most likely cause is that you
introduced a dimension value that does not correspond to an implemented code
path (e.g., `encoder_style: "rzz_lattice"` does not exist in
`four_two_two.py`).
---
## Part 6: What this system does NOT do (yet)
## Part 8: What this system does NOT do (yet)
- **It does not run on real quantum hardware by default.** The
`IBMHardwareExecutor` exists and is wired up, but `enable_hardware: false`
@@ -623,8 +779,10 @@ dimension value that does not correspond to an implemented code path (e.g.,
`SearchRule` extraction, the `CompositeGenerator` budget allocation, and
the cross-rung propagation logic.
- **It does not visualise results.** There is no dashboard. The output is
JSON and Markdown. You read it, or you write a script to plot it.
- **CLI output is JSON and Markdown.** The CLI ratchet produces JSON files
and Markdown lessons. For interactive exploration, use the Plan C dashboard
notebook (`plan_c/00_dashboard.ipynb`), which provides a widget-based
interface for running experiments and viewing results.
- **It does not parallelise evaluations.** Each experiment runs sequentially.
On a machine with multiple cores, you could shard the challenger set across
@@ -634,7 +792,7 @@ dimension value that does not correspond to an implemented code path (e.g.,
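The sharding idea can be sketched with the standard library alone. `evaluate` below is a stand-in for whatever per-challenger evaluation the executor actually performs, not the project's real API:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(spec):
    # Stand-in for one cheap-tier evaluation of a challenger spec; the
    # dummy score keyed on circuit depth just keeps the sketch runnable.
    return spec["id"], spec["depth"] * 0.1

challengers = [{"id": i, "depth": d} for i, d in enumerate([4, 7, 3, 9])]

# Shard the challenger set across workers. Threads are enough for the
# sketch; CPU-bound simulator evaluations would want a process pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = dict(pool.map(evaluate, challengers))

best = min(scores, key=scores.get)   # lowest dummy score wins here
```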
---
## Part 7: Architecture diagram
## Part 9: Architecture diagram
```
configs/rungs/rung1-5.yaml
@@ -677,6 +835,6 @@ ratchet runs multiple rungs. The lessons tighten the circle with every pass.
---
*This document was written on 2026-04-04 to describe the system as built.
The code is the ground truth. If this document contradicts the code, the
code is correct.*
*This document was last updated on 2026-04-15 to describe the system as
built. The code is the ground truth. If this document contradicts the code,
the code is correct.*
@@ -1,8 +1,19 @@
# Learning Objectives Per Notebook, Per Section
# Learning Objectives --- Per Notebook, Per Section
Each objective has a Bloom level and a matched assessment type.
All four plans teach the same core material; the pedagogical approach differs.
**Entry point:** Open `00_START_HERE.ipynb` to choose your plan. Every content
notebook links back to Start Here and forward to the next notebook in the plan.
**Assessment types:**
- **MCQ** (`quiz()`) --- multiple-choice with immediate feedback
- **Predict** (`predict_choice()`) --- predict an outcome before running code
- **Reflect** (`reflect()`) --- open-ended reflection graded by keywords
- **Order** (`order()`) --- rank or sequence items
All assessments are tracked by `LearningTracker` with Bloom's taxonomy levels.
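The helper signatures are not shown in this file, so the following is only an illustrative stand-in for the MCQ flow; every argument name below is an assumption, not the teaching layer's real API:

```python
def quiz(question, options, answer_index, bloom="remember"):
    # Illustrative stand-in: return a grader that records one attempt
    # in the shape a tracker could store.
    def grade(choice_index):
        return {
            "question": question,
            "bloom": bloom,
            "correct": choice_index == answer_index,
        }
    return grade

grade = quiz(
    "Which code does the harness target?",
    ["[[5,1,3]]", "[[4,2,2]]", "[[7,1,3]]"],
    answer_index=1,
)
attempt = grade(1)
```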
---
## Plan A — Bottom-Up (3 Sequential Notebooks)
@@ -913,9 +913,10 @@ re-evaluated, and the patience counter is preserved.
\label{sec:verification_claims}
% ============================================================================
The test suite contains 21 tests, each anchored to a specific architectural
claim. We present them grouped by subsystem, with the falsification condition
for each.
The full test suite contains 335 tests across 13 files, covering the research
engine, teaching layer, notebook structure, and pedagogical quality. Below we
present the 21 core research-engine tests, grouped by subsystem, with the
falsification condition for each.
\subsection{Quantum Correctness (3 tests)}
@@ -1108,7 +1109,8 @@ with different \code{verification} values. The seeds must differ.
cd autoresearch-quantum
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python -m pytest tests/ -v # 21 tests, ~13 seconds
python -m pytest tests/ -v # 335 tests
bash scripts/app.sh validate # full validation (lint + types + tests)
\end{lstlisting}
Requires Python $\geq$ 3.11 and Qiskit $\geq$ 2.3. No GPU needed.
@@ -116,13 +116,14 @@
\begin{center}
\begin{minipage}{0.85\textwidth}
\small\itshape
This compendium is the ``course textbook'' for the eight Jupyter notebooks
in the \textsc{autoresearch-quantum} project. It is designed to be read
before, during, or after working through the notebooks. Every concept
exercised in the notebooks is explained here with the depth and context
that a tutorial session cannot provide. No prior knowledge of quantum
error correction is assumed; familiarity with linear algebra and
complex numbers is helpful.
This compendium is the ``course textbook'' for the twelve Jupyter notebooks
(across four learning plans) in the \textsc{autoresearch-quantum} project.
Start at \texttt{00\_START\_HERE.ipynb} to choose your plan. This document
is designed to be read before, during, or after working through the
notebooks. Every concept exercised in the notebooks is explained here
with the depth and context that a tutorial session cannot provide. No
prior knowledge of quantum error correction is assumed; familiarity with
linear algebra and complex numbers is helpful.
\end{minipage}
\end{center}
\vspace{2cm}
@@ -1419,13 +1420,13 @@ expectation value is the average over many measurements.
\textbf{Notebook Topic} & \textbf{Notebooks} & \textbf{Compendium} \\
\midrule
T-state definition \& Bloch sphere &
A/01~\S1--2, B~\S2.1, C/A~\S1--3 &
A/01~\S1--2, B~\S2.1, C/A~\S1--3, D/1~\S1 &
\cref{ch:magic} \\
Why encode (no-cloning, distance) &
A/01~\S3, C/A~\S1 &
A/01~\S3, C/A~\S1, D/1~\S2 &
\cref{ch:code}~\S1--2 \\
Stabilisers \& codespace &
A/01~\S6, B~\S2.3, C/A~\S4 &
A/01~\S6, B~\S2.3, C/A~\S4, D/1~\S3 &
\cref{ch:code}~\S3 \\
Logical operators &
A/01~\S6, C/A~\S5 &
@@ -1434,22 +1435,22 @@ Encoder circuits &
A/01~\S4--5, C/A~\S6 &
\cref{sec:encoder} \\
Error detection &
A/01~\S7, C/A~\S8 &
A/01~\S7, C/A~\S8, D/1~\S4 &
\cref{sec:errors} \\
Ancilla \& syndrome extraction &
A/01~\S9, C/A~\S7 &
\cref{ch:measurement}~\S2 \\
Postselection &
A/01~\S11, A/02~\S3, B~\S2.5 &
A/01~\S11, A/02~\S3, B~\S2.5, D/1~\S6 &
\cref{sec:postselection} \\
Noise models \& transpilation &
A/02~\S2, C/B~\S1--3 &
A/02~\S2, C/B~\S1--3, D/2~\S1 &
\cref{ch:noise} \\
Magic witness formula &
A/02~\S5, B~\S2.7, C/A~\S9 &
A/02~\S5, B~\S2.7, C/A~\S9, D/1~\S5 &
\cref{ch:witness} \\
Scoring formula &
A/02~\S7, B~\S2.9, C/B~\S8 &
A/02~\S7, B~\S2.9, C/B~\S8, D/2~\S2 &
\cref{ch:scoring} \\
Factory throughput &
A/02~\S10, C/B~\S9 &
@@ -1458,20 +1459,23 @@ Failure modes &
A/02~\S9, C/B~\S7 &
\cref{sec:failures} \\
Ratchet mechanism &
A/03~\S1--4, B~\S2.10--12, C/C~\S1--7 &
A/03~\S1--4, B~\S2.10--12, C/C~\S1--7, D/3~\S1--2 &
\cref{ch:ratchet}~\S1--3 \\
Search strategies &
A/03~\S7, B~\S3.5, C/C~\S3--4 &
A/03~\S7, B~\S3.5, C/C~\S3--4, D/3~\S2 &
\cref{sec:strategies} \\
Lesson extraction \& rules &
A/03~\S8, B~\S3.6, C/C~\S8--9 &
A/03~\S8, B~\S3.6, C/C~\S8--9, D/3~\S4 &
\cref{sec:lessons} \\
Narrowing \& propagation &
B~\S3.7, C/C~\S10--11 &
\cref{ch:ratchet}~\S5--6 \\
Transfer evaluation &
A/03~\S10, B~\S3.8, C/C~\S12 &
A/03~\S10, B~\S3.8, C/C~\S12, D/3~\S5 &
\cref{ch:ratchet}~\S7 \\
Parameter sweep \& optimisation &
A/02~\S8, D/2~\S3, D/3~\S3 &
\cref{ch:scoring} \\
\bottomrule
\end{tabular}
\end{center}