All quality checks (ruff, mypy, tests) run through app.sh validate. No external CI or hook infrastructure needed.
Autoresearch Quantum
autoresearch-quantum is a Python research harness for a Karpathy-style autoresearch ratchet in quantum experiments, combined with a four-plan interactive coursework built on Jupyter notebooks.
The system has two layers:
- Research engine --- an automated loop that discovers the best way to prepare encoded magic states on the [[4,2,2]] quantum error-detecting code. It proposes, evaluates, compares, learns, and repeats without human intervention.
- Teaching layer --- 12 Jupyter notebooks across 4 learning plans, each teaching the same core material through a different pedagogical lens: sequential (Plan A), spiral (Plan B), parallel tracks (Plan C), and hypothesis-driven experiments (Plan D). Every notebook includes interactive widget-based assessments, per-student progress tracking, and Bloom's taxonomy-aligned exercises.
No IBM account or API key is needed --- everything runs locally with the Aer simulator.
Project Tree
autoresearch-quantum/
├── configs/rungs/
│ ├── rung1.yaml Baseline: what recipe works?
│ ├── rung2.yaml Stability under noise variation
│ ├── rung3.yaml Transfer across backends
│ ├── rung4.yaml Factory throughput / cost
│ └── rung5.yaml Rosenfeld direction
├── src/autoresearch_quantum/
│ ├── cli.py CLI entry point
│ ├── config.py YAML config loader
│ ├── models.py All data structures
│ ├── codes/
│ │ └── four_two_two.py [[4,2,2]] stabilisers, encoder, seed gates
│ ├── experiments/
│ │ └── encoded_magic_state.py Circuit bundle builder
│ ├── execution/
│ │ ├── analysis.py Postselection, witness, stability
│ │ ├── backends.py Backend resolution
│ │ ├── hardware.py IBM hardware executor
│ │ ├── local.py Aer noise simulation executor
│ │ ├── transfer.py Cross-backend transfer evaluator
│ │ └── transpile.py Transpilation utilities
│ ├── lessons/
│ │ ├── extractor.py Human-readable lesson extraction
│ │ └── feedback.py Machine-readable rules + search narrowing
│ ├── persistence/
│ │ └── store.py JSON file store with resumability
│ ├── ratchet/
│ │ └── runner.py AutoresearchHarness orchestrator
│ ├── scoring/
│ │ └── score.py WAC + factory throughput scorers
│ ├── search/
│ │ ├── challengers.py Neighbour generation with dedup
│ │ └── strategies.py NeighborWalk, RandomCombo, LessonGuided
│ └── teaching/
│ ├── assess.py Widget-based quizzes, predictions, reflections
│ └── tracker.py LearningTracker --- per-student progress tracking
├── paper/
│ ├── autoresearch_quantum.tex Full technical paper (LaTeX)
│ ├── autoresearch_quantum.pdf Compiled PDF (19 pages)
│ ├── compendium.tex Companion textbook (LaTeX)
│ └── compendium.pdf Compiled PDF (36 pages)
├── notebooks/
│ ├── 00_START_HERE.ipynb Central entry point --- plan selector
│ ├── learning_objectives.md Per-notebook, per-section learning objectives
│ ├── plan_a/ Bottom-up: 3 sequential notebooks
│ │ ├── 01_encoded_magic_state.ipynb
│ │ ├── 02_measuring_progress.ipynb
│ │ └── 03_the_ratchet.ipynb
│ ├── plan_b/ Spiral: 1 notebook, three passes
│ │ └── spiral_notebook.ipynb
│ ├── plan_c/ Parallel tracks + dashboard
│ │ ├── 00_dashboard.ipynb
│ │ ├── track_a_physics.ipynb
│ │ ├── track_b_engineering.ipynb
│ │ └── track_c_search.ipynb
│ └── plan_d/ Three claim-driven experiments
│ ├── experiment_1_protection.ipynb
│ ├── experiment_2_noise.ipynb
│ └── experiment_3_optimisation.ipynb
├── scripts/
│ └── app.sh Consumer lifecycle manager
├── tests/ 335 tests across 13 files
│ ├── test_analysis.py Postselection & witness tests
│ ├── test_browser_ux.py Playwright end-to-end UX tests
│ ├── test_cli.py CLI subcommand tests
│ ├── test_codes.py [[4,2,2]] code correctness
│ ├── test_config.py YAML config loading
│ ├── test_experiments.py Circuit bundle construction
│ ├── test_feedback.py Lesson extraction & search rules
│ ├── test_harness.py Full ratchet integration tests
│ ├── test_notebooks.py Notebook execution & structure
│ ├── test_pedagogy.py Pedagogical quality invariants
│ ├── test_persistence.py JSON store round-trips
│ ├── test_scoring.py Score function correctness
│ └── test_teaching.py Assessment widget & tracker tests
├── THE_STORY.md Narrative documentation (system design)
├── pyproject.toml Build config, dependencies, tool settings
└── README.md
Jupyter Lifecycle
This project follows the JupyterManager lifecycle specification. scripts/app.sh provides isolated Jupyter directories, auto port allocation (8888--8899), PID tracking, orphan detection, and graceful stop. The cross-project jupyter-hub CLI can discover and manage this project alongside other Jupyter-enabled projects on the same machine.
Quick Start
The fastest way to get running:
# Clone and bootstrap (creates venv, installs everything, registers Jupyter kernel)
git clone https://github.com/saymrwulf/autoresearch-quantum.git
cd autoresearch-quantum
bash scripts/app.sh bootstrap
# Launch JupyterLab (opens 00_START_HERE.ipynb in your browser)
bash scripts/app.sh start
The app.sh lifecycle manager handles the entire consumer experience:
| Command | What it does |
|---|---|
| bash scripts/app.sh bootstrap | Create venv, install deps, register Jupyter kernel, verify imports |
| bash scripts/app.sh start | Launch JupyterLab in background (survives terminal close; stop with app.sh stop) |
| bash scripts/app.sh start --no-open | Launch in background without opening browser |
| bash scripts/app.sh start --foreground | Run in foreground (Ctrl-C or closing terminal stops it) |
| bash scripts/app.sh start --port 9999 | Use a specific port |
| bash scripts/app.sh stop | Stop JupyterLab (graceful SIGTERM, SIGKILL fallback) |
| bash scripts/app.sh restart | Stop + start |
| bash scripts/app.sh status | Show venv, server, ports, orphan detection |
| bash scripts/app.sh validate | Run full validation: ruff + mypy + pytest |
| bash scripts/app.sh validate --quick | Lint + type check + unit tests only |
| bash scripts/app.sh logs [-f] | Show or follow JupyterLab output |
| bash scripts/app.sh reset | Delete learner progress files |
| bash scripts/app.sh reset-state | Reset Jupyter runtime + UI state |
Manual installation
If you prefer manual setup:
python3 -m venv .venv
. .venv/bin/activate
pip install -e '.[dev,notebooks]'
For the optional IBM hardware path:
pip install -e '.[hardware,dev,notebooks]'
Jupyter Notebooks --- Learning Plans
The notebooks/ folder contains 12 notebooks across 4 independent learning plans, all accessible from a central entry point: 00_START_HERE.ipynb.
Each plan teaches the same core material (encoded magic-state preparation, measurement, and the ratchet optimiser) through a different didactic lens. Every content notebook includes:
- Interactive assessments --- multiple-choice quizzes, predictions, reflections, and ordering exercises (ipywidgets)
- Per-student progress tracking --- LearningTracker records scores, Bloom's levels, and time per assessment
- Navigation links --- forward/backward links between notebooks, cross-plan suggestions, and back-links to Start Here
- Key Insight callouts --- highlighted takeaways for important concepts
- Checkpoint summaries --- mid-notebook progress reviews in longer notebooks
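To make the tracking idea concrete, here is a minimal sketch of a per-student tracker. This is an illustrative toy, not the project's actual LearningTracker API (which lives in src/autoresearch_quantum/teaching/tracker.py); the class name, field names, and file layout below are assumptions.

```python
import json
import tempfile
import time
from pathlib import Path


class MiniTracker:
    """Toy per-student progress tracker (illustrative only, not the real API)."""

    def __init__(self, student: str, store: Path):
        self.student = student
        self.path = store / f"{student}_progress.json"
        self.records: list[dict] = []

    def record(self, assessment: str, score: float, bloom_level: str) -> None:
        # One entry per completed assessment: score, Bloom's level, timestamp.
        self.records.append({
            "assessment": assessment,
            "score": score,
            "bloom_level": bloom_level,
            "timestamp": time.time(),
        })

    def save(self) -> None:
        self.path.write_text(json.dumps(self.records, indent=2))

    def mean_score(self) -> float:
        return sum(r["score"] for r in self.records) / max(len(self.records), 1)


# Example usage with a throwaway directory:
tracker = MiniTracker("alice", Path(tempfile.mkdtemp()))
tracker.record("quiz_1", 0.8, "apply")
tracker.record("quiz_2", 1.0, "analyze")
tracker.save()
```

The real tracker persists per-student JSON so that progress survives kernel restarts; app.sh reset deletes these files.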
Plan A --- Bottom-Up (3 sequential notebooks)
| # | File | What you learn |
|---|---|---|
| 1 | plan_a/01_encoded_magic_state.ipynb | T-state, [[4,2,2]] encoder, stabilisers, error detection, postselection |
| 2 | plan_a/02_measuring_progress.ipynb | Noise, logical operators, magic witness, scoring formula, parameter sweeps |
| 3 | plan_a/03_the_ratchet.ipynb | Incumbent/challenger model, ratchet steps, lessons, cross-rung propagation |
Start with notebook 01 and work through in order. Run each cell top-to-bottom (Shift+Enter).
Plan B --- Spiral (1 notebook, three passes)
| File | What you learn |
|---|---|
| plan_b/spiral_notebook.ipynb | Pass 1: 5-min demo (black-box). Pass 2: Open the box (circuits, stabilisers, scoring). Pass 3: Make it your own (modify parameters, run experiments). |
One notebook, 78 cells. Each pass revisits the same system at a deeper level.
Plan C --- Parallel Tracks (4 notebooks)
| File | Focus |
|---|---|
| plan_c/00_dashboard.ipynb | Interactive dashboard (ipywidgets) --- run experiments from dropdowns |
| plan_c/track_a_physics.ipynb | Pure quantum mechanics: Eastin-Knill, Bloch sphere, stabiliser algebra |
| plan_c/track_b_engineering.ipynb | Noise models, transpilation, cost model, failure modes |
| plan_c/track_c_search.ipynb | Parameter space, search strategies, lesson extraction, cross-rung transfer |
Start with the dashboard for an overview, then dive into whichever track interests you. The three tracks are independent and can be read in any order.
Plan D --- Three Claim-Driven Experiments
| # | File | Hypothesis |
|---|---|---|
| 1 | plan_d/experiment_1_protection.ipynb | The [[4,2,2]] code can protect a magic state: W=1.0, all errors detected |
| 2 | plan_d/experiment_2_noise.ipynb | Noise degrades quality but parameter choice matters >2x |
| 3 | plan_d/experiment_3_optimisation.ipynb | A ratchet can learn to optimise and its knowledge transfers |
Each notebook follows: Hypothesis -> Claim -> Experiment -> Proof -> Next Hypothesis.
Troubleshooting
| Problem | Fix |
|---|---|
| ModuleNotFoundError: autoresearch_quantum | Run bash scripts/app.sh bootstrap or pip install -e '.[notebooks]' |
| ModuleNotFoundError: ipywidgets | Run pip install ipywidgets --- needed for interactive assessments |
| Plots don't render | Make sure %matplotlib inline is in the first code cell (it already is) |
| Kernel not found | In JupyterLab, select Kernel > Change Kernel and pick the .venv Python |
Scientific Framing
What is optimized
The harness optimizes an experiment, not just a circuit. A spec includes:
- logical magic-seed construction
- encoder realization
- verification strategy
- postselection rule
- ancilla strategy
- transpilation choices
- backend target and noise proxy
- shot and repeat allocation
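The components above can be pictured as one YAML fragment. The verification, postselection, and ancilla_strategy names appear in the CLI examples in this README; every other key below is an assumption about the config layout, not the exact schema used by configs/rungs/.

```yaml
# Illustrative experiment spec -- field layout is assumed, not the real schema
experiment:
  seed: t_state              # logical magic-seed construction
  encoder: standard          # encoder realization
  verification: z_only       # verification strategy
  postselection: z_only      # postselection rule
  ancilla_strategy: reused_single
  transpile:
    optimization_level: 2    # transpilation choices
  backend: aer_simulator     # backend target and noise proxy
  shots: 4096                # shot allocation
  repeats: 5                 # repeat allocation for stability scoring
```

The point is that a single spec fixes the entire experiment, so the search space is the cross-product of these fields rather than circuit parameters alone.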
What is measured
The default score is:
score = (usable_magic_quality * acceptance_rate) / total_cost
with a configurable usable_magic_quality assembled from:
- noisy encoded fidelity proxy
- logical magic witness
- codespace survival / postselection success
- stability under repeated noisy evaluation
- spectator logical alignment
and a configurable total_cost assembled from:
- two-qubit gate count
- transpiled depth
- total shots consumed
- runtime proxy
- hardware queue proxy
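As a rough sketch, the default score can be computed as weighted sums of the quality and cost terms listed above. The weights and term names below are illustrative assumptions; the real weights live in the rung configs and the scorers in src/autoresearch_quantum/scoring/score.py.

```python
def score(quality_terms: dict[str, float],
          cost_terms: dict[str, float],
          acceptance_rate: float,
          quality_weights: dict[str, float],
          cost_weights: dict[str, float]) -> float:
    """score = (usable_magic_quality * acceptance_rate) / total_cost."""
    # Weighted combination of quality proxies (fidelity, witness, stability, ...)
    usable_magic_quality = sum(
        quality_weights[k] * v for k, v in quality_terms.items()
    )
    # Weighted combination of cost proxies (gates, depth, shots, ...)
    total_cost = sum(cost_weights[k] * v for k, v in cost_terms.items())
    return usable_magic_quality * acceptance_rate / total_cost


# Example: a challenger with decent quality but a heavy two-qubit gate bill.
s = score(
    quality_terms={"fidelity_proxy": 0.92, "witness": 0.85, "stability": 0.9},
    cost_terms={"two_qubit_gates": 12, "depth": 30, "shots": 4096},
    acceptance_rate=0.7,
    quality_weights={"fidelity_proxy": 0.5, "witness": 0.3, "stability": 0.2},
    cost_weights={"two_qubit_gates": 1.0, "depth": 0.1, "shots": 0.001},
)
```

Because acceptance_rate multiplies quality, a recipe that detects errors aggressively but discards most shots is penalised unless the surviving states are much better.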
Cheap tier vs expensive tier
Cheap tier:
- backend-aware transpilation
- noisy Aer evaluation
- density-matrix fidelity when a backend-derived noise model is available
- repeated local runs for stability scoring
Expensive tier:
- IBM Runtime execution through SamplerV2
- only used when enabled and when cheap-tier promotion thresholds are met
- isolated behind hardware.py
Built-In [[4,2,2]] Experiment
The built-in experiment prepares an encoded logical T-state on one logical qubit of the [[4,2,2]] code while keeping the spectator logical qubit in |0>. The code utilities live in four_two_two.py.
The harness evaluates:
- acceptance under optional ZZZZ and XXXX stabilizer checks
- logical X and Y witnesses for the encoded magic state
- spectator logical Z
- compiled cost after transpilation to a chosen backend target
This keeps the core scientific distinction explicit:
- a circuit can be locally good for [[4,2,2]]
- a rule is only valuable if it keeps helping across new backends or new rungs
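The stabilizer algebra above can be checked directly: XXXX and ZZZZ commute because X and Z anticommute on each of the four qubits, an even number of times, while any single-qubit X error anticommutes with ZZZZ and so flips its syndrome. A minimal numpy check, independent of the project's own four_two_two.py:

```python
import numpy as np

# Single-qubit Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out


XXXX = kron_all([X, X, X, X])
ZZZZ = kron_all([Z, Z, Z, Z])

# The two stabilizers commute and square to the identity, as required.
assert np.allclose(XXXX @ ZZZZ, ZZZZ @ XXXX)
assert np.allclose(XXXX @ XXXX, np.eye(16))

# A single-qubit X error anticommutes with ZZZZ, so it flips the
# ZZZZ syndrome measurement and gets caught by postselection.
XIII = kron_all([X, I2, I2, I2])
assert np.allclose(XIII @ ZZZZ, -ZZZZ @ XIII)
```

This is exactly why the harness's postselection rule works: accepted shots are those whose stabilizer outcomes are consistent with the codespace, so any single Pauli error is detected.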
How To Run (CLI)
1. Run a single local experiment
autoresearch-quantum run-experiment \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
Override individual experiment fields:
autoresearch-quantum run-experiment \
--config configs/rungs/rung1.yaml \
--store-dir data/demo \
--set verification=z_only \
--set postselection=z_only \
--set ancilla_strategy=reused_single
2. Run one ratchet step
autoresearch-quantum run-step \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
This will:
- load or bootstrap the incumbent
- generate neighbor challengers from the rung search space
- evaluate every challenger on the cheap tier
- promote only margin-beating challengers if hardware is enabled
- log the step and update the incumbent pointer if a challenger wins
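The steps above amount to a greedy loop with a promotion margin. A stripped-down sketch follows; the real orchestration lives in src/autoresearch_quantum/ratchet/runner.py, and propose_challengers / evaluate here stand in for the search strategies and the cheap-tier executor.

```python
from typing import Callable


def ratchet_step(incumbent: dict,
                 propose_challengers: Callable[[dict], list[dict]],
                 evaluate: Callable[[dict], float],
                 margin: float = 0.0) -> tuple[dict, float]:
    """One ratchet step: keep the incumbent unless a challenger beats it by `margin`."""
    best_spec, best_score = incumbent, evaluate(incumbent)
    for challenger in propose_challengers(incumbent):
        s = evaluate(challenger)
        if s > best_score + margin:
            # Update the incumbent pointer only on a margin-beating win.
            best_spec, best_score = challenger, s
    return best_spec, best_score


# Toy example: one tunable field, cheap score peaks at shots == 2048.
inc = {"shots": 1024}
neighbours = lambda spec: [{"shots": spec["shots"] * 2},
                           {"shots": spec["shots"] // 2}]
cheap_score = lambda spec: -abs(spec["shots"] - 2048)
winner, _ = ratchet_step(inc, neighbours, cheap_score)
```

The margin is what makes the ratchet monotone: a challenger that merely ties the incumbent never replaces it, so the incumbent's score can only move up.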
3. Run one full rung
autoresearch-quantum run-rung \
--config configs/rungs/rung1.yaml \
--store-dir data/demo
Artifacts are persisted under data/demo/rung_<n>/:
- experiments/*.json
- ratchet_steps/*.json
- incumbent.json
- lesson.json
- lesson.md
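Since every artifact is plain JSON, a rung's output can be inspected with nothing but the standard library. A sketch follows; the incumbent.json contents written here are made up for the demo, and the real schema is defined by src/autoresearch_quantum/persistence/store.py.

```python
import json
import tempfile
from pathlib import Path


def summarize_rung(rung_dir: Path) -> dict:
    """Count persisted artifacts and load the incumbent, if one exists."""
    summary = {
        "experiments": len(list((rung_dir / "experiments").glob("*.json"))),
        "ratchet_steps": len(list((rung_dir / "ratchet_steps").glob("*.json"))),
        "incumbent": None,
    }
    incumbent_path = rung_dir / "incumbent.json"
    if incumbent_path.exists():
        summary["incumbent"] = json.loads(incumbent_path.read_text())
    return summary


# Demo against a fake store (the real one is written by run-rung).
root = Path(tempfile.mkdtemp()) / "rung_1"
(root / "experiments").mkdir(parents=True)
(root / "ratchet_steps").mkdir()
(root / "experiments" / "exp_001.json").write_text("{}")
(root / "incumbent.json").write_text('{"score": 0.42}')  # hypothetical fields
print(summarize_rung(root))
```

The same flat-file layout is what gives the harness resumability: a restarted campaign just re-reads the store and continues from the persisted incumbent.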
4. Run a multi-rung ratchet campaign
autoresearch-quantum run-ratchet \
--config configs/rungs/rung1.yaml \
--config configs/rungs/rung2.yaml \
--config configs/rungs/rung3.yaml \
--config configs/rungs/rung4.yaml \
--store-dir data/campaign
5. Run an optional hardware-backed confirmation
First install the hardware extra and make IBM credentials available:
pip install -e '.[hardware]'
export QISKIT_IBM_TOKEN=...
Then enable the hardware tier in the rung config by setting tier_policy.enable_hardware: true and optionally hardware.backend_name: ibm_brisbane.
autoresearch-quantum run-step \
--config configs/rungs/rung1.yaml \
--store-dir data/hardware \
--hardware
Only challengers that beat the incumbent cheap-tier score by tier_policy.cheap_margin are promoted.
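Putting those keys together, the hardware section of a rung config might look like the fragment below. Only tier_policy.enable_hardware, tier_policy.cheap_margin, and hardware.backend_name are named in this README; the surrounding layout and the margin value are illustrative assumptions.

```yaml
tier_policy:
  enable_hardware: true
  cheap_margin: 0.05       # illustrative value: required cheap-tier win margin
hardware:
  backend_name: ibm_brisbane
```

With enable_hardware left false (the default), the same config runs entirely on the cheap tier.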
Testing & Validation
The project has 335 tests across 13 test files covering every layer:
| Test file | What it validates |
|---|---|
| test_codes.py | [[4,2,2]] stabilisers, encoder, seed gates |
| test_experiments.py | Circuit bundle construction |
| test_analysis.py | Postselection, witness, stability metrics |
| test_scoring.py | WAC and factory throughput score functions |
| test_feedback.py | Lesson extraction, search rules, space narrowing |
| test_harness.py | Full ratchet integration (rung, multi-rung, resumability) |
| test_persistence.py | JSON store round-trips |
| test_cli.py | CLI subcommands |
| test_config.py | YAML config loading |
| test_teaching.py | Assessment widgets, LearningTracker |
| test_notebooks.py | Notebook execution via nbclient, structure validation |
| test_pedagogy.py | Pedagogical quality: prose density, assessment density, Bloom's coverage, section structure, tracker integration, key insights, cross-plan consistency |
| test_browser_ux.py | Playwright end-to-end: JupyterLab launch, notebook rendering, navigation links, widget rendering |
Running tests
# Standard: all tests except browser UX (default)
bash scripts/app.sh validate
# Quick: lint + type check + unit tests only
bash scripts/app.sh validate --quick
# Direct pytest (browser tests excluded by default via marker)
.venv/bin/python -m pytest tests/ -v
# Browser UX tests (requires playwright)
pip install playwright && python -m playwright install chromium
.venv/bin/python -m pytest tests/test_browser_ux.py -m browser -v
Static analysis
app.sh validate runs all three automatically:
- Ruff --- linting and formatting (E, F, W, I, UP, B, SIM rule sets)
- mypy --- strict mode type checking across all source files
- nbstripout --- strips notebook outputs (run manually: .venv/bin/nbstripout notebooks/**/*.ipynb)
Extending The Ladder
The intended progression is:
- rung1.yaml --- baseline [[4,2,2]] encoded magic-state preparation
- rung2.yaml --- same code with stronger stability and backend-awareness
- rung3.yaml --- transfer across backend families
- rung4.yaml --- factory-style cost pressure
To add a new rung:
- create a new YAML in configs/rungs/
- narrow the challenger space to the specific next question
- tune cheap and expensive score weights for that rung
- keep the lesson document as the real product
To add a new experiment family:
- implement a new builder under src/autoresearch_quantum/experiments/
- define the target state, witness operators, verification flow, and logging metadata
- route the ratchet to that experiment family through config or a new CLI selector
Notes On Interpretation
This harness is explicit about proxy vs confirmation:
- cheap-tier fidelity and witness numbers are local proxies
- hardware runs are scarce and should be treated as confirmation
- the most important artifact of each rung is the lesson, not just the incumbent ID
That is the intended ratchet: better experiment plus better search rule.