The previous driver imposed a synchronous turn-counted clock that the
Crisis paper explicitly forbids. Crisis is supposed to work in
asynchronous P2P networks, with any synchronicity being virtual and
derived inside the consensus algorithm from the DAG structure, not
imposed externally by a coordinator. This commit removes that external
clock.
What changed in the engine:
- `Mothership.run_crisis_phase(num_turns, gossip_rounds_per_turn)`
is replaced by `run_until_quiescent(max_steps=200)`. The loop
interleaves three concerns on each iteration — emissions, gossip,
and alarm emissions — until none make progress. Termination is by
quiescence, not by a fixed turn count. `max_steps` is a safety
bound (loop-iteration cap), not an exposed clock.
- `Mothership.run_closed_phase(num_turns)` becomes
`run_closed_phase(max_steps=50)`. Same quiescence model — the
closed-phase conversation runs until no agent has more to say.
- Agents grew `pending_alarm_claims()`: each agent checks its own
graph for un-alarmed mutations and produces AlarmClaims directly.
The driver loop calls this every iteration, so alarms emit and
propagate in the same loop as regular emissions and gossip — no
separate "alarm phase."
- `Mothership.emit_alarms_from_detectors()` and the explicit
`run_gossip_round()` step are no longer needed by callers; both
are subsumed by the async loop. `run_gossip_round()` stays as a
helper but tests no longer call it externally.
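A minimal sketch of the quiescence model described above, not the actual
`Mothership` code. `LoopStats`, `make_toy_concerns`, and the idea of passing
the three concerns as callables that return a progress count are hypothetical
stand-ins chosen for illustration:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LoopStats:
    """Hypothetical result type: how many iterations ran and why the loop stopped."""
    steps: int = 0
    reached_quiescence: bool = False


def run_until_quiescent(concerns: List[Callable[[], int]], max_steps: int = 200) -> LoopStats:
    """Run every concern once per iteration; stop when none of them makes progress.

    max_steps is only a safety cap on loop iterations. It is never shown to agents.
    """
    stats = LoopStats()
    for _ in range(max_steps):
        stats.steps += 1
        progress = sum(concern() for concern in concerns)
        if progress == 0:
            stats.reached_quiescence = True
            break
    return stats


def make_toy_concerns(n_messages: int) -> Tuple[List[Callable[[], int]], List[str]]:
    """Build toy emission / gossip / alarm concerns over in-memory lists."""
    to_emit = [f"m{i}" for i in range(n_messages)]
    in_flight: List[str] = []
    delivered: List[str] = []

    def emit() -> int:
        # Emit at most one pending message per iteration.
        if not to_emit:
            return 0
        in_flight.append(to_emit.pop(0))
        return 1

    def gossip() -> int:
        # Deliver everything currently in flight.
        moved = len(in_flight)
        delivered.extend(in_flight)
        in_flight.clear()
        return moved

    def alarms() -> int:
        # Nothing to alarm on in this toy run.
        return 0

    return [emit, gossip, alarms], delivered
```

With three toy messages the loop needs one extra iteration to observe that
nothing moved, which is what "termination by quiescence" means here; passing
`max_steps=1` instead stops at the cap with `reached_quiescence` left `False`.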
What changed in the agent interface:
- `CrisisAgent.next_turn(turn, received_claims)` becomes
`try_emit()` — no arguments. Agents in an async network don't see
a global tick. They decide based on their own internal state.
- `CrisisAgent.observe(claim)` is the new optional callback the
closed-phase loop uses to feed context into agents that care
(overridden by LiveClaudeAgent to populate its prompt buffer).
- `pending_alarm_claims()` is idempotent: an internal
`_already_alarmed` set tracks what this agent has already alarmed
on, so the loop can call it every step without flooding the
network with duplicate alarms.
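The interface above could look roughly like the following. This is a sketch
under assumptions, not the real `CrisisAgent`: the class name `SketchAgent`,
the `mutations` set standing in for the agent's local graph, and the string
form of the alarm are all hypothetical. Only the three method names and
`_already_alarmed` come from this commit.

```python
from typing import List, Optional, Set


class SketchAgent:
    """Toy agent exposing the argument-free interface described above."""

    def __init__(self, name: str, outbox: List[str]) -> None:
        self.name = name
        self._outbox = list(outbox)             # claims this agent still wants to emit
        self.seen: List[str] = []               # context fed in through observe()
        self.mutations: Set[str] = set()        # stand-in for mutations found in the local graph
        self._already_alarmed: Set[str] = set()

    def try_emit(self) -> Optional[str]:
        """No turn argument: the decision depends only on internal state."""
        return self._outbox.pop(0) if self._outbox else None

    def observe(self, claim: str) -> None:
        """Optional callback; agents that do not care can leave it as a no-op."""
        self.seen.append(claim)

    def pending_alarm_claims(self) -> List[str]:
        """Return alarms only for mutations not alarmed on before."""
        fresh = sorted(self.mutations - self._already_alarmed)
        self._already_alarmed.update(fresh)
        return [f"ALARM({self.name}, {m})" for m in fresh]
```

Calling `pending_alarm_claims()` twice in a row yields the alarm once and then
an empty list, which is the property that lets the driver poll it on every
iteration.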
What changed in the dataclass schema:
- `AlarmClaim.detected_at_turn` -> `emitted_at_step`. The word
"turn" implies a global clock; "step" is a per-agent sequence
number used only for log ordering — local, not networked.
- `ClosedPhaseEntry.turn` and `CrisisPhaseEntry.turn` -> `step`.
Same rename, same reasoning.
- `Scenario.closed_phase_turns` and `Scenario.crisis_phase_turns`
are gone. The scenario no longer prescribes how many turns; it
just provides agents and lets the async loop run them out.
What changed in the CLI:
- Phase 3 reports "drove to quiescence in N step(s)" with a
breakdown of regular emissions / gossip transfers / alarm
emissions, instead of "ran N turns".
- `QuiescenceReport` (new dataclass) carries the run statistics
back from `run_until_quiescent`/`run_closed_phase` — steps taken,
emissions made, gossip transfers, alarm claims emitted, plus
whether termination was via quiescence or max-step cap.
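For orientation, a report of this shape might be sketched as below. Only the
class name and `reached_quiescence` are taken from this commit; the other
field names, the `summarize` helper, and the wording used when the cap is hit
are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuiescenceReport:
    steps: int                 # loop iterations taken (assumed field name)
    emissions: int             # regular emissions (assumed field name)
    gossip_transfers: int      # gossip transfers (assumed field name)
    alarm_emissions: int       # alarm claims emitted (assumed field name)
    reached_quiescence: bool   # False when the max-step cap ended the run


def summarize(report: QuiescenceReport) -> str:
    """Format a one-line summary in the spirit of the Phase 3 CLI output."""
    if report.reached_quiescence:
        head = f"drove to quiescence in {report.steps} step(s)"
    else:
        # Hypothetical wording; the real CLI text for this case is not shown here.
        head = f"stopped at the max-step cap after {report.steps} step(s)"
    breakdown = (
        f"{report.emissions} emission(s), "
        f"{report.gossip_transfers} gossip transfer(s), "
        f"{report.alarm_emissions} alarm emission(s)"
    )
    return f"{head}: {breakdown}"
```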
New regression tests (`test_async_quiescence.py`):
- `test_run_until_quiescent_terminates`: the loop must exit.
- `test_two_runs_produce_identical_final_state`: determinism check —
if anything in the loop depended on real wall time, this would
fail.
- `test_max_steps_bound_caps_runtime`: with `max_steps=1` the loop
exits after a single iteration, and
`QuiescenceReport.reached_quiescence` reports whether the run
actually quiesced or was cut off by the cap.
- `test_no_turn_argument_exposed_to_agents`: introspects
`CrisisAgent.try_emit` signature; fails if anyone re-adds a
`turn` parameter.
- `test_no_turn_field_on_alarmclaim`: introspects the dataclass
fields; fails if `detected_at_turn` reappears.
- `test_alarms_propagate_through_async_loop_alone`: the loop alone
(no manual emit_alarms / run_gossip_round) ratifies an alarm.
- `test_quiescence_report_counts_match_logs`: sanity check that
the report's emission count equals the crisis log length.
Suite: 163 -> 170 tests, all green in 0.79s.
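The two introspection guards can be sketched as follows. The `CrisisAgent` and
`AlarmClaim` definitions here are minimal stand-ins so the snippet runs on its
own; the real tests would import the project's classes instead.

```python
import dataclasses
import inspect
from dataclasses import dataclass


class CrisisAgent:
    """Stand-in for the real agent base class."""

    def try_emit(self):
        return None


@dataclass
class AlarmClaim:
    """Stand-in carrying only the renamed field."""
    emitted_at_step: int


def test_no_turn_argument_exposed_to_agents():
    # Fails if anyone re-adds a `turn` parameter to try_emit.
    params = inspect.signature(CrisisAgent.try_emit).parameters
    assert "turn" not in params


def test_no_turn_field_on_alarmclaim():
    # Fails if the old globally-clocked field name reappears.
    names = {f.name for f in dataclasses.fields(AlarmClaim)}
    assert "detected_at_turn" not in names
    assert "emitted_at_step" in names
```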
Behavioral end-state is identical to the previous (synchronous)
version: same fact-check scenario, same byzantine equivocation, same
proof JSON shape, same three signers, same quorum-met outcome. The
difference is structural: the protocol now matches the paper's async
shape, and a future port to actual TCP gossip + concurrent agents
needs no change to this engine.
CrisisViz: still untouched. The `crisis_data.json` pipeline that
drives the visualizer is orthogonal.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
crisis
A proof-of-concept and educational artifact for Mirco Richter's Crisis paper — a DAG-based BFT consensus protocol that achieves total order on messages in fully open, unstructured peer-to-peer networks through virtual voting: votes are never sent explicitly but are deduced from the causal relationships encoded in Lamport graphs.
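To make "deduced from causal relationships" concrete, here is a deliberately simplified toy, not the algorithm in voting.py or the paper's BBA procedure. It assumes a message "implicitly supports" another whenever the latter lies in its causal past; `ToyGraph`, `digest`, and `implicit_vote` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, Set, Tuple


def digest(payload: str, parents: Tuple[str, ...]) -> str:
    """Hash a payload together with the ids of the messages it references."""
    h = hashlib.sha256()
    h.update(payload.encode("utf-8"))
    for parent in parents:
        h.update(bytes.fromhex(parent))
    return h.hexdigest()


class ToyGraph:
    """Messages point at earlier messages by hash, forming a DAG."""

    def __init__(self) -> None:
        self.parents: Dict[str, Tuple[str, ...]] = {}

    def add(self, payload: str, parents: Iterable[str] = ()) -> str:
        parents = tuple(parents)
        for parent in parents:
            if parent not in self.parents:
                raise KeyError(f"unknown parent {parent}")
        message_id = digest(payload, parents)
        self.parents[message_id] = parents
        return message_id

    def past(self, message_id: str) -> Set[str]:
        """All messages reachable by following parent references."""
        seen: Set[str] = set()
        stack = [message_id]
        while stack:
            current = stack.pop()
            for parent in self.parents[current]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def implicit_vote(self, voter: str, candidate: str) -> bool:
        """Toy rule: the voter supports whatever is in its causal past."""
        return candidate in self.past(voter)


g = ToyGraph()
a = g.add("a")
b = g.add("b", [a])
c = g.add("c")
d = g.add("d", [b, c])
print(g.implicit_vote(d, a), g.implicit_vote(c, a))
```

No vote message is ever created here: whether `d` supports `a` is read off the graph structure alone, which is the idea the real protocol builds on.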
This repository contains:
- a Python implementation of the protocol (src/, tests/),
- an event recorder that exports a deterministic simulation run to JSON,
- CrisisViz — a native macOS / SwiftUI curriculum visualizer that walks the protocol end-to-end across ten chapters: cast intro, gossip mechanics, partition, round derivation, virtual voting, leader election, total order, the data-availability problem, erasure-coded recovery, and Byzantine fork detection.
Everything in the visualizer is in extreme slow motion and serialized for didactic clarity. A signed speed slider scrubs the chapter forward and backward at any rate from −16× to +16×; narration is bound to whichever beat the playhead is on.
Architecture at a glance
flowchart TD
Paper["📄 <b>Paper — the spec</b><br/>Crisis.mirco-richter-2019.pdf"]
Paper --> Algos
subgraph Algos["🧮 Pure protocol algorithms — <code>src/crisis/</code>"]
direction LR
Crypto["crypto.py"]
Msg["message.py"]
Graph["graph.py"]
Weight["weight.py"]
Rounds["rounds.py"]
Voting["voting.py"]
Order["order.py"]
end
Algos --> RealRT
Algos --> SimRT
subgraph RealRT["🌐 <b>Real runtime — <code>node.py</code> + <code>gossip.py</code></b><br/><i>scalable, deployable</i>"]
Node["CrisisNode<br/>asyncio · TCP push/pull gossip<br/>3 concurrent loops<br/>CLI: <code>crisis-node</code>"]
end
subgraph SimRT["🧪 <b>In-process toy runtime — <code>demo.py</code></b><br/><i>deterministic, recordable</i>"]
SimNode["SimulatedNode<br/>direct in-memory message passing<br/>NetworkParams: delays / drops / silences"]
SimCtl["Simulation controller<br/>spins up N honest + K byzantine<br/>CLI: <code>crisis-demo</code>"]
SimNode --- SimCtl
end
SimRT --> Rec
Rec["📼 <b>Recorder — <code>recorder.py</code></b><br/>instruments every algorithm call<br/>captures events + per-step snapshots"]
Rec --> Export
Export["📦 <b>JSON exporter — <code>export_json.py</code></b><br/>writes <code>crisis_data.json</code>"]
Export --> Viz
subgraph Viz["🎬 <b>CrisisViz — native macOS / SwiftUI</b>"]
Player["Keynote-style player<br/>10 chapters · ~18 min @ 1×<br/>scrubbable −16× to +16×"]
Testbed["Testbed harness<br/>invariants · source audit<br/>PNG sweep · 36 MP4 clips"]
end
classDef paper fill:#fdf6e3,stroke:#586e75,color:#073642
classDef pure fill:#eee8d5,stroke:#586e75,color:#073642
classDef real fill:#fce5cd,stroke:#cc4125,color:#660000
classDef sim fill:#d9ead3,stroke:#38761d,color:#0b3d0b
classDef rec fill:#cfe2f3,stroke:#2c5f8f,color:#062b4d
classDef viz fill:#ead1dc,stroke:#741b47,color:#3d0a26
class Paper paper
class Algos pure
class RealRT real
class SimRT sim
class Rec,Export rec
class Viz viz
Key architectural fact — the recording pipeline that feeds CrisisViz only exercises the SimulatedNode path (in-process, deterministic, in-memory message passing). The CrisisNode TCP runtime is a separately developed PoC of how a real network deployment would look; it is not what produces crisis_data.json. The two runtimes are siblings, not layers.
Repository layout
crisis/ ← git root
├── Crisis.mirco-richter-2019.pdf the paper
├── README.md this file
├── INSTALL.md fresh-macOS install guide
├── LICENSE MIT (code only; paper is CC-BY-4.0)
├── pyproject.toml Python ≥3.11, networkx, pytest
├── crisis_data.json simulation export (source of truth)
│
├── src/crisis/ ── PROTOCOL PoC (Python) ──
│ ├── crypto.py, message.py random-oracle hash + Message/Vertex
│ ├── graph.py, weight.py, rounds.py Lamport DAG + PoW weight + round derivation
│ ├── voting.py, order.py BBA virtual voting + total order
│ ├── gossip.py, node.py real TCP runtime (CrisisNode)
│ ├── demo.py in-process simulation harness
│ ├── recorder.py event instrumentation
│ └── export_json.py JSON exporter for CrisisViz
├── tests/ pytest suite
│
└── CrisisViz/ ── VISUALIZER (Swift / macOS 26) ──
├── Package.swift, bundle.sh, package-dmg.sh
├── Sources/CrisisViz/ App, Engine, Model, Chapters, Views, Glass, Testbed, Canvas
├── README.md Swift-side human guide
└── HANDOFF.md agent-to-agent engineering log
Quick start
There are three audiences. Pick the one that matches what you want to do.
🧮 Verify the protocol — pytest
cd crisis
source .venv/bin/activate # set up per INSTALL.md if first time
pytest -q
Runs the algorithm unit tests (crypto, graph, rounds, weight, message, order, voting, recorder, simulation). Should be green in under a second.
🧪 Run a deterministic simulation — Python CLI
python -m crisis.demo --nodes 4 --byzantine 1 --rounds 10
Spins up four honest + one byzantine SimulatedNode, runs ten consensus rounds in-process with a deterministic seed, prints the resulting total order. To export a fresh crisis_data.json for CrisisViz:
python -m crisis.export_json --steps 80 -o crisis_data.json
cp crisis_data.json CrisisViz/Sources/CrisisViz/crisis_data.json
🎬 Watch the visualizer — Swift / macOS
cd CrisisViz
./bundle.sh # builds CrisisViz.app and opens it
# or:
./package-dmg.sh # builds CrisisViz.dmg for distribution
Then use the arrow keys ←/→ to navigate, Space to play/pause, and the bottom slider to scrub at any signed speed from −16× to +16×.
Where to read next
- INSTALL.md — clone-to-running on a fresh macOS box. Prerequisites, Python venv setup, Swift toolchain, regenerating sim data, troubleshooting.
- CrisisViz/README.md — Swift-side guide: serial-timeline pattern, testbed outputs, controls, cast convention.
- CrisisViz/HANDOFF.md — engineering log for the next coding agent: current state, architecture pointers, hard-won rules.
License
- Code (src/, tests/, CrisisViz/) is licensed under the MIT License.
- Paper (Crisis.mirco-richter-2019.pdf) by Mirco Richter is a separately licensed artifact under CC-BY-4.0.