Autoresearch Quantum

autoresearch-quantum is a Python research harness implementing a Karpathy-style autoresearch ratchet for quantum experiments, combined with four-plan interactive coursework built on Jupyter notebooks.

The system has two layers:

  1. Research engine --- an automated loop that discovers the best way to prepare encoded magic states on the [[4,2,2]] quantum error-detecting code. It proposes, evaluates, compares, learns, and repeats without human intervention.

  2. Teaching layer --- 12 Jupyter notebooks across 4 learning plans, each teaching the same core material through a different pedagogical lens: sequential (Plan A), spiral (Plan B), parallel tracks (Plan C), and hypothesis-driven experiments (Plan D). Every notebook includes interactive widget-based assessments, per-student progress tracking, and Bloom's taxonomy-aligned exercises.

No IBM account or API key is needed --- everything runs locally with the Aer simulator.

Project Tree

autoresearch-quantum/
├── configs/rungs/
│   ├── rung1.yaml              Baseline: what recipe works?
│   ├── rung2.yaml              Stability under noise variation
│   ├── rung3.yaml              Transfer across backends
│   ├── rung4.yaml              Factory throughput / cost
│   └── rung5.yaml              Rosenfeld direction
├── src/autoresearch_quantum/
│   ├── cli.py                  CLI entry point
│   ├── config.py               YAML config loader
│   ├── models.py               All data structures
│   ├── codes/
│   │   └── four_two_two.py     [[4,2,2]] stabilisers, encoder, seed gates
│   ├── experiments/
│   │   └── encoded_magic_state.py  Circuit bundle builder
│   ├── execution/
│   │   ├── analysis.py         Postselection, witness, stability
│   │   ├── backends.py         Backend resolution
│   │   ├── hardware.py         IBM hardware executor
│   │   ├── local.py            Aer noise simulation executor
│   │   ├── transfer.py         Cross-backend transfer evaluator
│   │   └── transpile.py        Transpilation utilities
│   ├── lessons/
│   │   ├── extractor.py        Human-readable lesson extraction
│   │   └── feedback.py         Machine-readable rules + search narrowing
│   ├── persistence/
│   │   └── store.py            JSON file store with resumability
│   ├── ratchet/
│   │   └── runner.py           AutoresearchHarness orchestrator
│   ├── scoring/
│   │   └── score.py            WAC + factory throughput scorers
│   ├── search/
│   │   ├── challengers.py      Neighbour generation with dedup
│   │   └── strategies.py       NeighborWalk, RandomCombo, LessonGuided
│   └── teaching/
│       ├── assess.py           Widget-based quizzes, predictions, reflections
│       └── tracker.py          LearningTracker --- per-student progress tracking
├── paper/
│   ├── autoresearch_quantum.tex   Full technical paper (LaTeX)
│   ├── autoresearch_quantum.pdf   Compiled PDF (19 pages)
│   ├── compendium.tex             Companion textbook (LaTeX)
│   └── compendium.pdf             Compiled PDF (36 pages)
├── notebooks/
│   ├── 00_START_HERE.ipynb     Central entry point --- plan selector
│   ├── learning_objectives.md  Per-notebook, per-section learning objectives
│   ├── plan_a/                 Bottom-up: 3 sequential notebooks
│   │   ├── 01_encoded_magic_state.ipynb
│   │   ├── 02_measuring_progress.ipynb
│   │   └── 03_the_ratchet.ipynb
│   ├── plan_b/                 Spiral: 1 notebook, three passes
│   │   └── spiral_notebook.ipynb
│   ├── plan_c/                 Parallel tracks + dashboard
│   │   ├── 00_dashboard.ipynb
│   │   ├── track_a_physics.ipynb
│   │   ├── track_b_engineering.ipynb
│   │   └── track_c_search.ipynb
│   └── plan_d/                 Three claim-driven experiments
│       ├── experiment_1_protection.ipynb
│       ├── experiment_2_noise.ipynb
│       └── experiment_3_optimisation.ipynb
├── scripts/
│   └── app.sh                  Consumer lifecycle manager
├── tests/                      335 tests across 13 files
│   ├── test_analysis.py        Postselection & witness tests
│   ├── test_browser_ux.py      Playwright end-to-end UX tests
│   ├── test_cli.py             CLI subcommand tests
│   ├── test_codes.py           [[4,2,2]] code correctness
│   ├── test_config.py          YAML config loading
│   ├── test_experiments.py     Circuit bundle construction
│   ├── test_feedback.py        Lesson extraction & search rules
│   ├── test_harness.py         Full ratchet integration tests
│   ├── test_notebooks.py       Notebook execution & structure
│   ├── test_pedagogy.py        Pedagogical quality invariants
│   ├── test_persistence.py     JSON store round-trips
│   ├── test_scoring.py         Score function correctness
│   └── test_teaching.py        Assessment widget & tracker tests
├── THE_STORY.md                Narrative documentation (system design)
├── pyproject.toml              Build config, dependencies, tool settings
└── README.md

Jupyter Lifecycle

This project follows the JupyterManager lifecycle specification. scripts/app.sh provides isolated Jupyter directories, auto port allocation (8888--8899), PID tracking, orphan detection, and graceful stop. The cross-project jupyter-hub CLI can discover and manage this project alongside other Jupyter-enabled projects on the same machine.

Quick Start

The fastest way to get running:

# Clone and bootstrap (creates venv, installs everything, registers Jupyter kernel)
git clone https://github.com/saymrwulf/autoresearch-quantum.git
cd autoresearch-quantum
bash scripts/app.sh bootstrap

# Launch JupyterLab (opens 00_START_HERE.ipynb in your browser)
bash scripts/app.sh start

The app.sh lifecycle manager handles the entire consumer experience:

Command                                  What it does
bash scripts/app.sh bootstrap            Create venv, install deps, register Jupyter kernel, verify imports
bash scripts/app.sh start                Launch JupyterLab in background (survives terminal close; stop with app.sh stop)
bash scripts/app.sh start --no-open      Launch in background without opening a browser
bash scripts/app.sh start --foreground   Run in foreground (Ctrl-C or closing the terminal stops it)
bash scripts/app.sh start --port 9999    Use a specific port
bash scripts/app.sh stop                 Stop JupyterLab (graceful SIGTERM, SIGKILL fallback)
bash scripts/app.sh restart              Stop + start
bash scripts/app.sh status               Show venv, server, ports, orphan detection
bash scripts/app.sh validate             Run full validation: ruff + mypy + pytest
bash scripts/app.sh validate --quick     Lint + type check + unit tests only
bash scripts/app.sh logs [-f]            Show or follow JupyterLab output
bash scripts/app.sh reset                Delete learner progress files
bash scripts/app.sh reset-state          Reset Jupyter runtime + UI state

Manual installation

If you prefer manual setup:

python3 -m venv .venv
. .venv/bin/activate
pip install -e '.[dev,notebooks]'

For the optional IBM hardware path:

pip install -e '.[hardware,dev,notebooks]'

Jupyter Notebooks --- Learning Plans

The notebooks/ folder contains 12 notebooks across 4 independent learning plans, all accessible from a central entry point: 00_START_HERE.ipynb.

Each plan teaches the same core material (encoded magic-state preparation, measurement, and the ratchet optimiser) through a different didactic lens. Every content notebook includes:

  • Interactive assessments --- multiple-choice quizzes, predictions, reflections, and ordering exercises (ipywidgets)
  • Per-student progress tracking --- LearningTracker records scores, Bloom's levels, and time per assessment
  • Navigation links --- forward/backward links between notebooks, cross-plan suggestions, and back-links to Start Here
  • Key Insight callouts --- highlighted takeaways for important concepts
  • Checkpoint summaries --- mid-notebook progress reviews in longer notebooks

Plan A --- Bottom-Up (3 sequential notebooks)

#  File                                 What you learn
1  plan_a/01_encoded_magic_state.ipynb  T-state, [[4,2,2]] encoder, stabilisers, error detection, postselection
2  plan_a/02_measuring_progress.ipynb   Noise, logical operators, magic witness, scoring formula, parameter sweeps
3  plan_a/03_the_ratchet.ipynb          Incumbent/challenger model, ratchet steps, lessons, cross-rung propagation

Start with notebook 01 and work through in order. Run each cell top-to-bottom (Shift+Enter).

Plan B --- Spiral (1 notebook, three passes)

File                          What you learn
plan_b/spiral_notebook.ipynb  Pass 1: 5-minute demo (black box). Pass 2: Open the box (circuits, stabilisers, scoring). Pass 3: Make it your own (modify parameters, run experiments).

One notebook, 78 cells. Each pass revisits the same system at a deeper level.

Plan C --- Parallel Tracks (4 notebooks)

File                              Focus
plan_c/00_dashboard.ipynb         Interactive dashboard (ipywidgets) --- run experiments from dropdowns
plan_c/track_a_physics.ipynb      Pure quantum mechanics: Eastin-Knill, Bloch sphere, stabiliser algebra
plan_c/track_b_engineering.ipynb  Noise models, transpilation, cost model, failure modes
plan_c/track_c_search.ipynb       Parameter space, search strategies, lesson extraction, cross-rung transfer

Start with the dashboard for an overview, then dive into whichever track interests you. The three tracks are independent and can be read in any order.

Plan D --- Three Claim-Driven Experiments

#  File                                    Hypothesis
1  plan_d/experiment_1_protection.ipynb    The [[4,2,2]] code can protect a magic state: W = 1.0, all errors detected
2  plan_d/experiment_2_noise.ipynb         Noise degrades quality, but parameter choice matters by more than 2x
3  plan_d/experiment_3_optimisation.ipynb  A ratchet can learn to optimise, and its knowledge transfers

Each notebook follows: Hypothesis -> Claim -> Experiment -> Proof -> Next Hypothesis.

Troubleshooting

Problem                                     Fix
ModuleNotFoundError: autoresearch_quantum   Run bash scripts/app.sh bootstrap or pip install -e '.[notebooks]'
ModuleNotFoundError: ipywidgets             Run pip install ipywidgets --- needed for interactive assessments
Plots don't render                          Make sure %matplotlib inline is in the first code cell (it already is)
Kernel not found                            In JupyterLab, select Kernel > Change Kernel and pick the .venv Python

Scientific Framing

What is optimized

The harness optimizes an experiment, not just a circuit. A spec includes:

  • logical magic-seed construction
  • encoder realization
  • verification strategy
  • postselection rule
  • ancilla strategy
  • transpilation choices
  • backend target and noise proxy
  • shot and repeat allocation
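
To make the shape of a spec concrete, here is a minimal illustrative sketch. The field names are hypothetical stand-ins for the bullets above; the authoritative data structures live in models.py.

from dataclasses import dataclass

# Illustrative only --- the real spec structure is defined in models.py.
@dataclass(frozen=True)
class SpecSketch:
    magic_seed: str        # logical magic-seed construction
    encoder: str           # encoder realization
    verification: str      # e.g. "z_only", a value used in the CLI examples below
    postselection: str     # postselection rule
    ancilla_strategy: str  # e.g. "reused_single", also from the CLI examples
    transpile_level: int   # transpilation choice
    backend: str           # backend target and noise proxy
    shots: int             # shot allocation
    repeats: int           # repeat allocation for stability scoring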

What is measured

The default score is:

score = (usable_magic_quality * acceptance_rate) / total_cost

with a configurable usable_magic_quality assembled from:

  • noisy encoded fidelity proxy
  • logical magic witness
  • codespace survival / postselection success
  • stability under repeated noisy evaluation
  • spectator logical alignment

and a configurable total_cost assembled from:

  • two-qubit gate count
  • transpiled depth
  • total shots consumed
  • runtime proxy
  • hardware queue proxy
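
As a rough illustration of how such a score can be assembled (the term names and weighting scheme below are invented for the example; the real WAC and factory throughput scorers live in scoring/score.py):

# Hedged sketch: weighted quality times acceptance, divided by weighted cost.
def score_sketch(quality: dict[str, float], cost: dict[str, float],
                 acceptance_rate: float,
                 q_weights: dict[str, float], c_weights: dict[str, float]) -> float:
    usable_magic_quality = sum(q_weights[k] * quality[k] for k in quality)
    total_cost = sum(c_weights[k] * cost[k] for k in cost)
    return usable_magic_quality * acceptance_rate / total_cost

The point of the split is tunability: raising a single cost weight (say, on two-qubit gate count) pushes the search toward shallower recipes without touching the quality terms.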

Cheap tier vs expensive tier

Cheap tier:

  • backend-aware transpilation
  • noisy Aer evaluation
  • density-matrix fidelity when a backend-derived noise model is available
  • repeated local runs for stability scoring

Expensive tier:

  • IBM Runtime execution through SamplerV2
  • only used when enabled and when cheap-tier promotion thresholds are met
  • isolated behind hardware.py

Built-In [[4,2,2]] Experiment

The built-in experiment prepares an encoded logical T-state on one logical qubit of the [[4,2,2]] code while keeping the spectator logical qubit in |0>. The code utilities live in four_two_two.py.

The harness evaluates:

  • acceptance under optional ZZZZ and XXXX stabilizer checks
  • logical X and Y witnesses for the encoded magic state
  • spectator logical Z
  • compiled cost after transpilation to a chosen backend target

This keeps the core scientific distinction explicit:

  • a circuit can be locally good for [[4,2,2]]
  • a rule is only valuable if it keeps helping across new backends or new rungs
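
For intuition, here is a self-contained Qiskit sketch, deliberately independent of the project's four_two_two.py, that prepares the logical |00> of the [[4,2,2]] code --- (|0000> + |1111>)/sqrt(2) in the usual convention --- and confirms both stabilizer expectation values are +1:

from qiskit import QuantumCircuit
from qiskit.quantum_info import Pauli, Statevector

# H plus a CNOT fan-out produces (|0000> + |1111>)/sqrt(2), the logical |00>.
enc = QuantumCircuit(4)
enc.h(0)
for target in (1, 2, 3):
    enc.cx(0, target)

psi = Statevector.from_instruction(enc)
for label in ("XXXX", "ZZZZ"):
    print(label, psi.expectation_value(Pauli(label)).real)  # both print 1.0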

How To Run (CLI)

1. Run a single local experiment

autoresearch-quantum run-experiment \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo

Override individual experiment fields:

autoresearch-quantum run-experiment \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo \
  --set verification=z_only \
  --set postselection=z_only \
  --set ancilla_strategy=reused_single

2. Run one ratchet step

autoresearch-quantum run-step \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo

This will:

  • load or bootstrap the incumbent
  • generate neighbor challengers from the rung search space
  • evaluate every challenger on the cheap tier
  • promote only margin-beating challengers if hardware is enabled
  • log the step and update the incumbent pointer if a challenger wins
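
The mechanism is easiest to see on a toy landscape. A runnable caricature of the incumbent/challenger loop (the real step additionally dedups, logs, and gates hardware promotion; see ratchet/runner.py):

# Toy ratchet: greedy incumbent/challenger steps on a 2-parameter landscape.
def evaluate(spec):
    x, y = spec
    return -(x - 3) ** 2 - (y - 1) ** 2  # stand-in for the cheap-tier score

def neighbors(spec):
    x, y = spec
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

incumbent = (0, 0)                       # bootstrap
for _ in range(10):                      # ten ratchet steps
    best = max(neighbors(incumbent), key=evaluate)
    if evaluate(best) > evaluate(incumbent):
        incumbent = best                 # the ratchet only clicks forward
print(incumbent)                         # (3, 1), the landscape optimum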

3. Run one full rung

autoresearch-quantum run-rung \
  --config configs/rungs/rung1.yaml \
  --store-dir data/demo

Artifacts are persisted under data/demo/rung_<n>/:

  • experiments/*.json
  • ratchet_steps/*.json
  • incumbent.json
  • lesson.json
  • lesson.md
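
Because everything is plain JSON, artifacts can be inspected directly. A reading pattern (the path assumes the rung-1 run above; treat the keys as opaque, since the schema is whatever the persistence layer serializes):

import json
from pathlib import Path

# Inspect the incumbent left behind by the run above.
incumbent = json.loads((Path("data/demo") / "rung_1" / "incumbent.json").read_text())
print(json.dumps(incumbent, indent=2))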

4. Run a multi-rung ratchet campaign

autoresearch-quantum run-ratchet \
  --config configs/rungs/rung1.yaml \
  --config configs/rungs/rung2.yaml \
  --config configs/rungs/rung3.yaml \
  --config configs/rungs/rung4.yaml \
  --store-dir data/campaign

5. Run an optional hardware-backed confirmation

First install the hardware extra and make IBM credentials available:

pip install -e '.[hardware]'
export QISKIT_IBM_TOKEN=...

Then enable the hardware tier in the rung config by setting tier_policy.enable_hardware: true and optionally hardware.backend_name: ibm_brisbane.

autoresearch-quantum run-step \
  --config configs/rungs/rung1.yaml \
  --store-dir data/hardware \
  --hardware

Only challengers that beat the incumbent cheap-tier score by tier_policy.cheap_margin are promoted.
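
In spirit the gate is a one-liner. A sketch, assuming cheap_margin is a relative margin (the actual semantics are whatever tier_policy in the rung YAML defines):

# Illustrative promotion gate; the real logic lives behind tier_policy.
def promote_to_hardware(challenger: float, incumbent: float, cheap_margin: float) -> bool:
    return challenger >= incumbent * (1.0 + cheap_margin)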

Testing & Validation

The project has 335 tests across 13 test files covering every layer:

Test file            What it validates
test_codes.py        [[4,2,2]] stabilisers, encoder, seed gates
test_experiments.py  Circuit bundle construction
test_analysis.py     Postselection, witness, stability metrics
test_scoring.py      WAC and factory throughput score functions
test_feedback.py     Lesson extraction, search rules, space narrowing
test_harness.py      Full ratchet integration (rung, multi-rung, resumability)
test_persistence.py  JSON store round-trips
test_cli.py          CLI subcommands
test_config.py       YAML config loading
test_teaching.py     Assessment widgets, LearningTracker
test_notebooks.py    Notebook execution via nbclient, structure validation
test_pedagogy.py     Pedagogical quality: prose density, assessment density, Bloom's coverage, section structure, tracker integration, key insights, cross-plan consistency
test_browser_ux.py   Playwright end-to-end: JupyterLab launch, notebook rendering, navigation links, widget rendering

Running tests

# Standard: all tests except browser UX (default)
bash scripts/app.sh validate

# Quick: lint + type check + unit tests only
bash scripts/app.sh validate --quick

# Direct pytest (browser tests excluded by default via marker)
.venv/bin/python -m pytest tests/ -v

# Browser UX tests (requires playwright)
pip install playwright && python -m playwright install chromium
.venv/bin/python -m pytest tests/test_browser_ux.py -m browser -v

Static analysis

app.sh validate runs all three automatically:

  • Ruff --- linting and formatting (E, F, W, I, UP, B, SIM rule sets)
  • mypy --- strict mode type checking across all source files
  • nbstripout --- strips notebook outputs (can also be run manually: .venv/bin/nbstripout notebooks/**/*.ipynb)

Extending The Ladder

The intended progression is:

  1. rung1.yaml --- baseline [[4,2,2]] encoded magic-state preparation
  2. rung2.yaml --- same code with stronger stability and backend-awareness
  3. rung3.yaml --- transfer across backend families
  4. rung4.yaml --- factory-style cost pressure

To add a new rung:

  • create a new YAML in configs/rungs/
  • narrow the challenger space to the specific next question
  • tune cheap and expensive score weights for that rung
  • keep the lesson document as the real product

To add a new experiment family:

  • implement a new builder under src/autoresearch_quantum/experiments/
  • define the target state, witness operators, verification flow, and logging metadata
  • route the ratchet to that experiment family through config or a new CLI selector
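
A hypothetical skeleton of such a builder (every name below is invented; mirror the actual bundle interface in experiments/encoded_magic_state.py before wiring it in):

from dataclasses import dataclass, field

# Hypothetical shape only --- align with the project's real circuit bundle type.
@dataclass
class BundleSketch:
    circuits: list = field(default_factory=list)   # circuits the executors run
    witnesses: list = field(default_factory=list)  # Pauli labels to estimate
    metadata: dict = field(default_factory=dict)   # logged with every result

def build_my_family(spec) -> BundleSketch:
    """Build state-prep, verification, and witness circuits for a new target."""
    bundle = BundleSketch()
    # append circuits derived from spec here
    return bundle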

Notes On Interpretation

This harness is explicit about proxy vs confirmation:

  • cheap-tier fidelity and witness numbers are local proxies
  • hardware runs are scarce and should be treated as confirmation
  • the most important artifact of each rung is the lesson, not just the incumbent ID

That is the intended ratchet: better experiment plus better search rule.