From 54aae1a4dd9d4064c78dc3cbdda525f33e9346bf Mon Sep 17 00:00:00 2001
From: saymrwulf
Date: Thu, 14 May 2026 22:13:00 +0200
Subject: [PATCH] Update all documentation for the crisis_agents layer + async refactor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three sweeping additions and one new file, reflecting how the project has grown:

* Parent `README.md` rewritten. The architecture mermaid now shows `crisis_agents` as a third sibling layer on top of the pure protocol algorithms, alongside the CrisisNode TCP runtime and the SimulatedNode in-process recorder. A fourth audience-shaped quick start (🤖 "run the AI-agent coordination demo") joins the protocol-pytest, simulation-CLI, and visualizer entries. The repository-layout tree expands to enumerate `src/crisis_agents/`'s modules. Test count corrected (~170).

* New `src/crisis_agents/README.md`. Comprehensive package documentation:
  - threat model + what's out of scope
  - the two principles enforced by tests: no chokepoint, no clock
  - mental-model mermaid (closed phase → boundary opens → async loop → quorum vote → multi-signer proof)
  - six-phase walkthrough matching the CLI output
  - module-by-module reference table
  - reuse map from `src/crisis/` (Message, LamportGraph, find_mutations, ProofOfWorkWeight, etc.)
  - build/run/test instructions including the `--live` Claude path
  - quorum-threshold formula in LaTeX: ⌈2N/3⌉
  - test taxonomy with the two sentinel files (test_no_chokepoint, test_async_quiescence) highlighted

* `INSTALL.md` extended. New Section 4 covers running the `crisis-agents demo`, both mocked-deterministic and `--live` with real Claude sub-agents. Anthropic SDK shown as optional `[live]` extras. Old sections renumbered (Section 5 → Section 6 for Swift, 6 → 7 for Troubleshooting). Two new troubleshooting entries for live-mode failures.

* `CrisisViz/HANDOFF.md` gets a new Section 0. Brief notice that a sibling Python sub-project (`crisis_agents`) now exists, what it does, and — most importantly — that it doesn't share code with CrisisViz: refactoring one cannot break the other. Cross-link to the crisis_agents README so a future Swift-side agent has the pointer without having to discover it via grep.

Source-of-truth corrections in the parent README:
  - the "three audiences" framing becomes four
  - the layout tree now lists `src/crisis_agents/`
  - the architecture diagram explicitly marks the agent layer as "decentralized, asynchronous" (the two principles the recent refactors enforce)

CrisisViz code: still untouched by all this. Only its HANDOFF doc gets a heads-up paragraph.

Co-Authored-By: Claude Opus 4.7
---
 CrisisViz/HANDOFF.md        |  13 +-
 INSTALL.md                  |  56 ++++++--
 README.md                   | 111 +++++++++++----
 src/crisis_agents/README.md | 272 ++++++++++++++++++++++++++++++++++++
 4 files changed, 411 insertions(+), 41 deletions(-)
 create mode 100644 src/crisis_agents/README.md

diff --git a/CrisisViz/HANDOFF.md b/CrisisViz/HANDOFF.md
index 552ed30..aa862a8 100644
--- a/CrisisViz/HANDOFF.md
+++ b/CrisisViz/HANDOFF.md
@@ -6,11 +6,22 @@ Last updated: **2026-05-14**.

 ---

+## 0. Sibling project notice — `crisis_agents` exists
+
+Since this file was last meaningfully updated, a sibling Python sub-project has landed: **`src/crisis_agents/`** — a coordination layer that uses the same `crisis` protocol substrate for a fundamentally different consumer (AI agent teams, not visualization). It produces `proof_*.json` documents instead of `crisis_data.json`.
+
+**Important for CrisisViz work:** the two sub-projects don't share code. `crisis_agents` does not produce data CrisisViz reads, and CrisisViz does not consume anything from `crisis_agents`. Refactoring either one cannot break the other.
+
+If a future curriculum chapter wants to visualize agent coordination (decentralized detection, gossip propagation, multi-detector alarm convergence), that's a substantial new effort — see the parent README's "future CrisisViz story" note. For now, **focus on the chapter and testbed work and treat `crisis_agents` as an unrelated package living in the same repo**.
+
+Reference: **[`../src/crisis_agents/README.md`](../src/crisis_agents/README.md)**.
+
+---
+
 ## 1. Current state — what's shipped

 - **All 10 chapters migrated** to the serial-beat timeline pattern (pure `state(at: t) -> WorldState`, scrubbable −16× to +16×, beat-bound narration).
 - **Testbed green** at the last clean run: 38/38 invariants pass, 0 source-audit errors, 36/36 MP4 clips written, 279 PNGs sane, 12/12 resize cases pass.
-- **`origin/master` at `fb9bc9c`** — working tree was clean before this documentation/testing pass. After this pass: README.md/INSTALL.md/LICENSE/CrisisViz README&HANDOFF/package-dmg.sh/Python tests landed.
 - **Bundle pipeline works.** `./bundle.sh` produces a working `CrisisViz.app`. `./package-dmg.sh` produces a working `CrisisViz.dmg` (ad-hoc signed; first-open Gatekeeper warning, right-click → Open).

 If you can't run the testbed and confirm it's green, **stop and fix that first** before making curriculum changes.
diff --git a/INSTALL.md b/INSTALL.md
index 662ebaa..79a974e 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -1,6 +1,6 @@
-# INSTALL — Crisis & CrisisViz
+# INSTALL — Crisis, CrisisViz, and crisis_agents

-End-to-end setup on a fresh macOS box, from a blank checkout to a running visualizer. Follow top-to-bottom.
+End-to-end setup on a fresh macOS box: from blank checkout to running protocol tests, the agent-coordination demo, and the SwiftUI visualizer. Follow top-to-bottom.

 ---

@@ -52,7 +52,7 @@ Run the unit tests to verify the algorithm implementations:
 pytest -q
 ```

-Expected: all tests pass in under a second. If any fail, stop and investigate before continuing — the visualizer's data pipeline depends on these.
+Expected: ~170 tests, all green in under a second. If any fail, stop and investigate before continuing — both the visualizer's data pipeline and the agent-coordination layer depend on these.

 Try a deterministic in-process simulation:

@@ -64,7 +64,39 @@ You should see consensus rounds advance and a total order emerge.

 ---

-## 4. Regenerate `crisis_data.json` (optional)
+## 4. Run the AI-agent coordination demo
+
+The `crisis-agents` CLI walks a six-phase scenario end-to-end: a closed honest team, a byzantine joiner who equivocates on a fact-check statement, an asynchronous gossip + detection event loop, quorum-ratified alarm, and a multi-signer proof JSON.
+
+### 4a. Mocked agents (deterministic, no API costs)
+
+```sh
+crisis-agents demo --out-dir /tmp/crisis_demo
+```
+
+Output ends with `proof__.json` in `--out-dir`. To self-verify a proof:
+
+```sh
+crisis-agents verify /tmp/crisis_demo/proof_*.json
+```
+
+### 4b. Real Claude sub-agents (`--live`)
+
+Install the optional Anthropic SDK extras:
+
+```sh
+pip install -e ".[live]"
+export ANTHROPIC_API_KEY=sk-ant-...
+crisis-agents demo --live --model claude-haiku-4-5-20251001
+```
+
+This swaps the three scripted honest agents for `LiveClaudeAgent` instances backed by real Anthropic Messages API calls. The byzantine stays scripted so the equivocation is reliably reproducible. Costs API credits; output is non-deterministic.
+
+Architecture reference: **[src/crisis_agents/README.md](src/crisis_agents/README.md)**.
+
+---
+
+## 5. Regenerate `crisis_data.json` (optional)

 The repo ships with a pre-recorded `crisis_data.json` at the root and a bundled copy in `CrisisViz/Sources/CrisisViz/`. Regenerate when you change the protocol code or want a different simulation:

@@ -77,9 +109,9 @@ The defaults (6 honest + 1 byzantine, 80 steps) produce full convergence from st

 ---

-## 5. Swift side — the visualizer
+## 6. Swift side — the visualizer

-### 5a. Quick dev loop
+### 6a. Quick dev loop

 ```sh
 cd CrisisViz
@@ -89,7 +121,7 @@ swift run CrisisViz # launches the dev binary

 Note: the dev binary does not have a Dock icon and lives in `.build/`. For a real `.app` use `bundle.sh`.

-### 5b. Build the `.app` bundle
+### 6b. Build the `.app` bundle

 ```sh
 ./bundle.sh # build + assemble CrisisViz.app + open
@@ -98,7 +130,7 @@ Note: the dev binary does not have a Dock icon and lives in `.build/`. For a rea

 `CrisisViz.app` is created in the current directory. Open it from Finder or the Dock to get the full activation-policy behavior.

-### 5c. Build a DMG installer
+### 6c. Build a DMG installer

 ```sh
 ./package-dmg.sh # produces CrisisViz.dmg in the current directory
@@ -112,7 +144,7 @@ Distribution flow for a new machine:
 3. Drag `CrisisViz` onto the `Applications` symlink.
 4. Eject the DMG; launch from `/Applications` (right-click → Open the first time).

-### 5d. Run the QA testbed
+### 6d. Run the QA testbed

 ```sh
 swift run CrisisViz --testbed
@@ -130,7 +162,7 @@ All five should be green before shipping changes.

 ---

-## 6. Troubleshooting
+## 7. Troubleshooting

 **`swift build` fails with “unsupported deployment target”.** Your Xcode does not provide the macOS 26 SDK. Update Xcode to ≥17, or downgrade `Package.swift` to your installed SDK (not recommended — visual features depend on macOS 26 Liquid Glass APIs).

@@ -141,3 +173,7 @@ All five should be green before shipping changes.
 **`pytest` fails on `ModuleNotFoundError: crisis`.** Activate the venv (`source .venv/bin/activate`) and reinstall with `pip install -e ".[dev]"`. The `-e` (editable) flag is what makes `import crisis` resolve to `src/crisis/`.

 **The visualizer freezes mid-chapter / animations are stuck.** You're running the unbundled `swift-run` binary while the Dock icon launches `CrisisViz.app`. Rebuild the bundle: `./bundle.sh --no-launch && open CrisisViz.app`.
+
+**`crisis-agents --live` fails with `live mode requires the anthropic SDK`.** Install the optional extras: `pip install -e ".[live]"`. The mocked path doesn't need this dependency.
+
+**`crisis-agents --live` fails with `ANTHROPIC_API_KEY`.** Export the key before running: `export ANTHROPIC_API_KEY=sk-ant-...`. The SDK reads it from the environment.
diff --git a/README.md b/README.md
index 3d63a47..05810a6 100644
--- a/README.md
+++ b/README.md
@@ -4,11 +4,12 @@ A proof-of-concept and educational artifact for Mirco Richter's [_Crisis_ paper]

 This repository contains:

-- a **Python implementation** of the protocol (`src/`, `tests/`),
+- a **Python implementation** of the protocol (`src/crisis/`, `tests/`),
 - an **event recorder** that exports a deterministic simulation run to JSON,
-- **CrisisViz** — a native macOS / SwiftUI curriculum visualizer that walks the protocol end-to-end across ten chapters: cast intro, gossip mechanics, partition, round derivation, virtual voting, leader election, total order, the data-availability problem, erasure-coded recovery, and Byzantine fork detection.
+- **CrisisViz** — a native macOS / SwiftUI curriculum visualizer that walks the protocol end-to-end across ten chapters,
+- **crisis_agents** — a coordination layer that lifts the protocol from "consensus between machines" to "consensus between AI agents," with a decentralized async event-driven engine and quorum-ratified byzantine alarms.

-Everything in the visualizer is in extreme slow motion and serialized for didactic clarity. A signed speed slider scrubs the chapter forward and backward at any rate from $-16\times$ to $+16\times$; narration is bound to whichever beat the playhead is on.
+Everything in the visualizer is in extreme slow motion and serialized for didactic clarity. A signed speed slider scrubs each chapter forward and backward at any rate from $-16\times$ to $+16\times$; narration is bound to whichever beat the playhead is on.

 ---

@@ -32,6 +33,7 @@ flowchart TD

     Algos --> RealRT
     Algos --> SimRT
+    Algos --> AgentLayer

     subgraph RealRT["🌐 Real runtime — node.py + gossip.py<br/>scalable, deployable"]
         Node["CrisisNode<br/>asyncio · TCP push/pull gossip<br/>3 concurrent loops<br/>CLI: crisis-node"]
@@ -43,12 +45,20 @@ flowchart TD
         SimNode --- SimCtl
     end

+    subgraph AgentLayer["🤖 Crisis-Agents — src/crisis_agents/<br/>decentralized, asynchronous"]
+        Agent["CrisisAgent ×N<br/>owns own LamportGraph<br/>emit · receive · gossip · detect"]
+        Mom["Mothership<br/>bootstrap + event-loop driver<br/>no clock · no privileged state<br/>CLI: crisis-agents"]
+        Agent --- Mom
+    end
+
     SimRT --> Rec
     Rec["📼 Recorder — recorder.py<br/>instruments every algorithm call<br/>captures events + per-step snapshots"]
     Rec --> Export
     Export["📦 JSON exporter — export_json.py<br/>writes crisis_data.json"]
     Export --> Viz

+    AgentLayer --> ProofJSON["🧾 proof_*.json<br/>multi-signer byzantine proof<br/>schema_version=2"]
+
     subgraph Viz["🎬 CrisisViz — native macOS / SwiftUI"]
         Player["Keynote-style player<br/>10 chapters · ~18 min @ 1×<br/>scrubbable −16× to +16×"]
         Testbed["Testbed harness<br/>invariants · source audit<br/>PNG sweep · 36 MP4 clips"]
@@ -58,53 +68,77 @@ flowchart TD
     classDef pure fill:#eee8d5,stroke:#586e75,color:#073642
     classDef real fill:#fce5cd,stroke:#cc4125,color:#660000
     classDef sim fill:#d9ead3,stroke:#38761d,color:#0b3d0b
+    classDef agents fill:#fff2cc,stroke:#bf9000,color:#3d2e00
     classDef rec fill:#cfe2f3,stroke:#2c5f8f,color:#062b4d
     classDef viz fill:#ead1dc,stroke:#741b47,color:#3d0a26
+    classDef proof fill:#fce5e8,stroke:#a64d59,color:#3d0014
     class Paper paper
     class Algos pure
     class RealRT real
     class SimRT sim
+    class AgentLayer agents
     class Rec,Export rec
     class Viz viz
+    class ProofJSON proof
 ```

-**Key architectural fact** — the recording pipeline that feeds CrisisViz only exercises the **`SimulatedNode`** path (in-process, deterministic, in-memory message passing). The **`CrisisNode`** TCP runtime is a separately developed PoC of how a real network deployment would look; it is _not_ what produces `crisis_data.json`. The two runtimes are siblings, not layers.
+**Three independent consumers of the protocol.** `src/crisis/` provides the pure algorithms (Lamport graphs, virtual voting, total order, mutation detection). Three sibling layers sit on top:
+
+- **`CrisisNode`** — a deployable distributed runtime (TCP gossip, three concurrent asyncio loops). Has no consumers in this repo; meant as a reference for how a real network deployment would look.
+- **`SimulatedNode`** — an in-process deterministic simulator whose recording becomes `crisis_data.json`, the file CrisisViz visualizes.
+- **`crisis_agents`** — agent-coordination layer. Each AI agent participates as a Crisis node; the network catches byzantine equivocation through decentralized detection + quorum voting. The engine is asynchronous and event-driven — no global clock, no privileged observer.
+
+The three are **siblings, not layers**: refactoring one doesn't break the others. CrisisViz and crisis_agents don't know each other exists.

 ---

 ## Repository layout

 ```
-crisis/                                  ← git root
-├── Crisis.mirco-richter-2019.pdf        the paper
-├── README.md                            this file
-├── INSTALL.md                           fresh-macOS install guide
-├── LICENSE                              MIT (code only; paper is CC-BY-4.0)
-├── pyproject.toml                       Python ≥3.11, networkx, pytest
-├── crisis_data.json                     simulation export (source of truth)
+crisis/                                      ← git root
+├── Crisis.mirco-richter-2019.pdf            the paper
+├── README.md                                this file
+├── INSTALL.md                               fresh-macOS install guide
+├── LICENSE                                  MIT (code only; paper is CC-BY-4.0)
+├── pyproject.toml                           Python ≥3.11, networkx, pytest
+├── crisis_data.json                         simulation export (source of truth)
 │
-├── src/crisis/                          ── PROTOCOL PoC (Python) ──
-│   ├── crypto.py, message.py            random-oracle hash + Message/Vertex
-│   ├── graph.py, weight.py, rounds.py   Lamport DAG + PoW weight + round derivation
-│   ├── voting.py, order.py              BBA virtual voting + total order
-│   ├── gossip.py, node.py               real TCP runtime (CrisisNode)
-│   ├── demo.py                          in-process simulation harness
-│   ├── recorder.py                      event instrumentation
-│   └── export_json.py                   JSON exporter for CrisisViz
-├── tests/                               pytest suite
+├── src/crisis/                              ── PROTOCOL PoC (Python) ──
+│   ├── crypto.py, message.py                random-oracle hash + Message/Vertex
+│   ├── graph.py, weight.py, rounds.py       Lamport DAG + PoW weight + round derivation
+│   ├── voting.py, order.py                  BBA virtual voting + total order
+│   ├── gossip.py, node.py                   real TCP runtime (CrisisNode)
+│   ├── demo.py                              in-process simulation harness
+│   ├── recorder.py                          event instrumentation
+│   └── export_json.py                       JSON exporter for CrisisViz
 │
-└── CrisisViz/                           ── VISUALIZER (Swift / macOS 26) ──
+├── src/crisis_agents/                       ── AGENT COORDINATION (Python) ──
+│   ├── README.md                            architecture & walkthrough
+│   ├── agent.py                             CrisisAgent + MockAgent + MockByzantineAgent
+│   ├── live_agent.py                        LiveClaudeAgent (Anthropic SDK)
+│   ├── boundary.py                          trust-set + open() trigger
+│   ├── mothership.py                        bootstrap + async event-loop driver
+│   ├── claim.py                             ClaimMessage payload
+│   ├── alarm.py                             decentralized detection
+│   ├── vote.py                              AlarmClaim + quorum tally
+│   ├── proof.py                             multi-signer ProofDocument
+│   ├── cli.py                               crisis-agents CLI entry point
+│   └── scenarios/fact_check.py              the canonical demo
+│
+├── tests/                                   pytest suite (170 tests, ~0.8s)
+│
+└── CrisisViz/                               ── VISUALIZER (Swift / macOS 26) ──
     ├── Package.swift, bundle.sh, package-dmg.sh
-    ├── Sources/CrisisViz/               App, Engine, Model, Chapters, Views, Glass, Testbed, Canvas
-    ├── README.md                        Swift-side human guide
-    └── HANDOFF.md                       agent-to-agent engineering log
+    ├── Sources/CrisisViz/                   App, Engine, Model, Chapters, Views, Glass, Testbed, Canvas
+    ├── README.md                            Swift-side human guide
+    └── HANDOFF.md                           agent-to-agent engineering log
 ```

 ---

 ## Quick start

-There are three audiences. Pick the one that matches what you want to do.
+Four audiences. Pick the one that matches what you want to do.

 ### 🧮 Verify the protocol — pytest

@@ -114,9 +148,9 @@ source .venv/bin/activate # set up per INSTALL.md if first time
 pytest -q
 ```

-Runs the algorithm unit tests (crypto, graph, rounds, weight, message, order, voting, recorder, simulation). Should be green in under a second.
+Runs all 170 tests across the protocol algorithms and the crisis_agents layer. Should be green in under a second.

-### 🧪 Run a deterministic simulation — Python CLI
+### 🧪 Run a deterministic protocol simulation — Python CLI

 ```sh
 python -m crisis.demo --nodes 4 --byzantine 1 --rounds 10
@@ -129,7 +163,23 @@ python -m crisis.export_json --steps 80 -o crisis_data.json
 cp crisis_data.json CrisisViz/Sources/CrisisViz/crisis_data.json
 ```

-### 🎬 Watch the visualizer — Swift / macOS
+### 🤖 Run the AI-agent coordination demo — Python CLI
+
+```sh
+crisis-agents demo
+```
+
+Walks a six-phase scenario: a closed honest team, a byzantine joiner who equivocates on a fact-check statement, an asynchronous gossip + detection event loop, and a quorum-ratified proof. Output ends with a `proof_*.json` document that any third party can self-verify. See **[src/crisis_agents/README.md](src/crisis_agents/README.md)** for the architecture.
+
+For real Claude sub-agents instead of scripted mocks:
+
+```sh
+pip install -e ".[live]" # adds anthropic SDK
+export ANTHROPIC_API_KEY=...
+crisis-agents demo --live
+```
+
+### 🎬 Watch the protocol visualizer — Swift / macOS

 ```sh
 cd CrisisViz
@@ -144,9 +194,10 @@ Then arrow keys ←/→ to navigate, **Space** to play/pause, the bottom slider

 ## Where to read next

-- **[INSTALL.md](INSTALL.md)** — clone-to-running on a fresh macOS box. Prerequisites, Python venv setup, Swift toolchain, regenerating sim data, troubleshooting.
+- **[INSTALL.md](INSTALL.md)** — clone-to-running on a fresh macOS box. Prerequisites, Python venv setup, Swift toolchain, regenerating sim data, running the agents demo, troubleshooting.
+- **[src/crisis_agents/README.md](src/crisis_agents/README.md)** — the AI-agent coordination layer: architecture, six-phase walkthrough, decentralization principles, async event loop, quorum formula, live Claude mode, proof JSON shape.
 - **[CrisisViz/README.md](CrisisViz/README.md)** — Swift-side guide: serial-timeline pattern, testbed outputs, controls, cast convention.
-- **[CrisisViz/HANDOFF.md](CrisisViz/HANDOFF.md)** — engineering log for the next coding agent: current state, architecture pointers, hard-won rules.
+- **[CrisisViz/HANDOFF.md](CrisisViz/HANDOFF.md)** — engineering log for the next coding agent.

 ---

diff --git a/src/crisis_agents/README.md b/src/crisis_agents/README.md
new file mode 100644
index 0000000..266fe0c
--- /dev/null
+++ b/src/crisis_agents/README.md
@@ -0,0 +1,272 @@
+# crisis_agents — coordination layer for AI agent teams
+
+A Python package that lifts the Crisis consensus protocol from "consensus between machines" to "consensus between AI agents." Each participant is a Crisis node with its own Lamport graph; the network catches byzantine equivocation via decentralized detection and quorum-ratified alarms. The engine is **asynchronous** and **event-driven** — no global clock, no privileged observer.
+
+> If you're new to this repo, start at the [parent README](../../README.md). This document is the architectural reference for the agent layer.
+
+---
+
+## Threat model
+
+The setting is a small team of AI agents (Claude sub-agents, in our live demo) coordinated by an orchestrator we call the **mothership**.
+
+- **Normal life** — the team is closed. Agents talk freely with each other and the mothership. No Crisis layer; the conversation is the medium.
+- **Boundary opens** — an external agent of unknown trust joins. Its internal motivation may diverge from the team's task. It may equivocate — telling one peer one thing while telling another peer the opposite — to mislead the network.
+- **Crisis to the rescue** — from the moment the boundary opens, every claim is wrapped into a Crisis Message with the emitting agent's stable process_id and a PoW nonce. The per-agent Lamport DAG is the immutable, replayable ledger. Mutation detection (built on `LamportGraph.find_mutations` from the protocol layer) catches equivocation. Every honest agent who has gossiped enough to see both contradictory variants raises an alarm. A quorum of independent alarms produces a network-ratified proof of malfeasance.
+
+What's deliberately **not** in scope (this is a PoC):
+
+- Visualization. The CrisisViz application is a separate effort that visualizes the protocol PoC; visualizing an agent-coordination run would require a substantial new chapter set there.
+- Real TCP gossip. Agents talk via in-process function calls in the mothership process. The existing `crisis.gossip.GossipServer` shows how it would look across sockets.
+- Detection of *false claims that aren't equivocations*. An agent who consistently lies but never equivocates is out-voted, not "caught." Catching it would require a ground-truth oracle, which is application-layer, not protocol-layer.
+
+---
+
+## Two architectural principles, enforced by tests
+
+### 1. No chokepoint
+
+Every honest agent maintains **its own** `LamportGraph`. The mothership does NOT hold a privileged graph of the whole network. Detection runs on each agent independently; alarms are emitted by each detector independently; proofs are signed by a quorum of detectors.
+
+The regression-test file `tests/test_no_chokepoint.py` asserts:
+
+- After the full lifecycle, every honest agent's *ratified-alarms set* is byte-identical to every other honest agent's.
+- The mothership does not expose `all_graphs`, `graph_of`, `_graphs`, or any other privileged collection.
+- A single byzantine accuser alone cannot ratify an alarm.
+
+### 2. No clock
+
+Crisis is supposed to work in asynchronous P2P networks. Any synchronicity in the protocol is *virtual* — derived inside the consensus algorithm from the causal structure of the Lamport graph — not imposed from outside by a coordinator.
+
+The driver loop is **event-driven and quiescence-terminated**, not turn-counted:
+
+```python
+def run_until_quiescent(max_steps=200):
+    while progress:
+        progress = False
+        # 1. Any agent has something to emit? Let them speak.
+        # 2. Any gossip pair has new info? Exchange.
+        # 3. Any agent has detected a new mutation? Emit AlarmClaim.
+```
+
+`tests/test_async_quiescence.py` asserts:
+
+- `CrisisAgent.try_emit()` takes no `turn` argument.
+- `AlarmClaim` has no `detected_at_turn` field (the wall-clock-implying name); only `emitted_at_step`, which is a per-agent local sequence number.
+- Two runs of the same scenario produce identical end states (determinism — no hidden wall-time dependence).
+- The loop alone (no manual phase orchestration) ratifies an alarm.
+
+---
+
+## The mental model
+
+```mermaid
+flowchart TB
+    subgraph Closed["🟢 Phase 1 — closed team (no Crisis)"]
+        Mom1["mothership"]
+        A1["agent_α"]
+        B1["agent_β"]
+        C1["agent_γ"]
+        Mom1 <--> A1
+        Mom1 <--> B1
+        Mom1 <--> C1
+        A1 <--> B1
+        B1 <--> C1
+    end
+
+    Boundary{"BOUNDARY OPENS<br/>agent_δ joins;<br/>trust unknown"}
+    Closed --> Boundary --> Open
+
+    subgraph Open["🟡 Phase 2 — Crisis active, async event loop"]
+        Mom2["mothership<br/>(bootstrap + driver only)"]
+        A2["agent_α<br/>own LamportGraph<br/>detect · alarm"]
+        B2["agent_β<br/>own LamportGraph<br/>detect · alarm"]
+        C2["agent_γ<br/>own LamportGraph<br/>detect · alarm"]
+        D2["agent_δ ⚠<br/>own LamportGraph<br/>byzantine"]
+
+        D2 -. variant_A .-> A2
+        D2 -. variant_A .-> C2
+        D2 -. variant_B .-> B2
+        A2 <-. gossip .-> B2
+        B2 <-. gossip .-> C2
+        A2 <-. gossip .-> C2
+        A2 -- alarm --> Mom2
+        B2 -- alarm --> Mom2
+        C2 -- alarm --> Mom2
+    end
+
+    Open --> Quorum
+    Quorum{"QUORUM VOTE<br/>≥ ⌈2N/3⌉ honest signers<br/>independently agree"}
+    Quorum --> Proof
+    Proof["📜 Multi-signer proof<br/>signed JSON; replayable<br/>schema_version=2"]
+
+    classDef closed fill:#d9ead3,stroke:#38761d
+    classDef boundary fill:#fff2cc,stroke:#bf9000
+    classDef open fill:#fce5cd,stroke:#cc4125
+    classDef quorum fill:#cfe2f3,stroke:#2c5f8f
+    classDef proof fill:#ead1dc,stroke:#741b47
+    class Closed closed
+    class Boundary,Quorum boundary
+    class Open open
+    class Proof proof
+```
+
+---
+
+## Six-phase walkthrough (the `crisis-agents demo`)
+
+The canonical scenario is `scenarios/fact_check.py`: three honest agents and one byzantine adjudicate six factual statements about a small reference document.
+
+### Phase 1 — closed team, no Crisis
+The mothership drives `run_closed_phase()` until quiescent. Each honest agent emits its six fact-check claims via plain function calls — appended to a flat log. Per-agent LamportGraphs aren't yet allocated. **No Crisis overhead.**
+
+### Phase 2 — boundary opens
+`mothership.open_boundary(agent_delta)`. Atomically: δ is added to the trust set, a fresh `LamportGraph` is created on every agent (including δ), and `boundary.is_open` flips to `True`.
+
+### Phase 3 — asynchronous event loop
+`mothership.run_until_quiescent()`. The driver cycles through:
+
+1. **Emission** — `agent.try_emit()` is called on each agent. Returned `AgentTurn`s are first-hop routed to their target subset (or broadcast). The byzantine emits an intro (broadcast), then a pair of contradictory variants (split delivery).
+2. **Gossip** — every ordered pair `(sender, receiver)` exchanges what `sender` has that `receiver` doesn't. Eventually-consistent propagation.
+3. **Alarm emission** — `agent.pending_alarm_claims()` runs `LamportGraph.find_mutations(...)` on each agent's own graph and produces `AlarmClaim`s for any newly observed equivocation. AlarmClaims are wrapped as Crisis Messages and broadcast.
+
+The loop exits when none of these three concerns make progress. `QuiescenceReport` (returned) carries: `steps`, `emissions`, `gossip_transfers`, `alarm_claims_emitted`, `reached_quiescence`.
+
+### Phase 4 — decentralized detection
+Each agent independently runs `detect_mutations()` on its own graph. In our scenario, every honest agent observes the byzantine's same-id spacelike pair and reports it. The byzantine doesn't accuse itself.
+
+### Phase 5 — ratification by quorum
+The quorum threshold is
+
+$$\text{quorum}(N) = \left\lceil \frac{2N}{3} \right\rceil$$
+
+where $N$ is the boundary size at ratification. For our scenario $N=4$ (3 honest + 1 byzantine), so the threshold is $\left\lceil 2 \cdot 4 / 3 \right\rceil = 3$ — every honest agent must concur. `tally_alarms(graph, threshold)` groups AlarmClaim vertices by `(accused, statement_id, witness_pair)`, counts unique signer process_ids per group, and ratifies groups meeting the threshold. **All honest agents produce identical `RatifiedAlarm` lists** (this is the no-chokepoint property in action).
+
+### Phase 6 — proof emission
+`build_proof(ratified_alarm)` produces a self-contained JSON document. Schema:
+
+```json
+{
+  "schema_version": 2,
+  "accused_process_id_hex": "...",
+  "statement_id": "s03",
+  "witness_digests": ["...", "..."],
+  "signer_process_id_hexes": ["...", "...", "..."],
+  "quorum_threshold": 3,
+  "summary": "agent id=... emitted contradictory Crisis vertices about ..."
+}
+```
+
+`verify_proof_self_consistent(proof)` checks distinct witnesses, distinct signers, signer count ≥ threshold. Future Phase-6+ work: full replay verification that re-derives the alarm from a recorded simulation log.
+
+---
+
+## Module reference
+
+| File | What it owns |
+|---|---|
+| `claim.py` | `Claim` dataclass — the application-layer payload (verdict + evidence) |
+| `boundary.py` | `Boundary` — trust set, `open()` trigger |
+| `agent.py` | `CrisisAgent` (abstract) + `MockAgent` + `MockByzantineAgent`. Each agent owns its `LamportGraph`, `emit_claim`, `receive`, `gossip_to`, `detect_mutations`, `pending_alarm_claims` |
+| `live_agent.py` | `LiveClaudeAgent` — same interface, backed by real Anthropic API calls |
+| `mothership.py` | `Mothership` — bootstrap + async event-loop driver. No privileged graph state. `run_closed_phase()`, `run_until_quiescent()`, `ratified_alarms_from(name)` |
+| `alarm.py` | `LocalAlarm` + `detect_mutations_in_graph(graph, ...)` — pure function, runs on one agent's graph |
+| `vote.py` | `AlarmClaim` payload, `RatifiedAlarm`, `quorum_for(n)`, `tally_alarms(graph, threshold)` |
+| `proof.py` | `ProofDocument` (schema v2), `build_proof`, `verify_proof_self_consistent` |
+| `cli.py` | `crisis-agents demo` + `crisis-agents verify` |
+| `scenarios/fact_check.py` | The canonical demo scenario: reference doc, six statements, scripted agents |
+| `scenarios/reference_doc.txt` | The factual paragraph the demo adjudicates |
+
+---
+
+## Reuse map from `src/crisis/`
+
+Almost all the heavy lifting comes from the protocol layer; `crisis_agents` is a thin adapter.
+
+| `src/crisis/` primitive | How `crisis_agents` uses it |
+|---|---|
+| `Message`, `Vertex` | Claims and AlarmClaims become `Message.payload`. Agent's stable id → `Message.id`. |
+| `LamportGraph` | One per agent. `extend()`, `find_mutations()`, `are_spacelike()` all reused. |
+| `LamportGraph.find_mutations(pid)` | The core of decentralized detection. Returns same-id spacelike groups. |
+| `ProofOfWorkWeight` + `mine_nonce()` | Each emission's PoW comes from here, with a shared weight system across the network so PoW is verifiable across graphs. |
+| `digest(name)[:ID_LENGTH]` | Agent process_id derivation. Same convention as `crisis.demo.Simulation` so agents could coexist with simulated nodes in a future mixed scenario. |
+
+---
+
+## Build · run · test
+
+```sh
+# From repo root, after setup per INSTALL.md
+cd /path/to/crisis
+source .venv/bin/activate
+pip install -e ".[dev]" # editable install with pytest
+
+# All tests, including crisis_agents
+pytest -q # ~170 tests in 0.8s
+
+# Just the agent layer
+pytest tests/test_claim.py tests/test_boundary.py tests/test_agent*.py \
+    tests/test_mothership.py tests/test_alarm.py tests/test_vote.py \
+    tests/test_proof.py tests/test_demo_fact_check.py \
+    tests/test_no_chokepoint.py tests/test_async_quiescence.py -v
+
+# Run the demo (mocked, deterministic)
+crisis-agents demo --out-dir /tmp/crisis_demo
+
+# Run with real Claude sub-agents (requires API key + extras)
+pip install -e ".[live]"
+export ANTHROPIC_API_KEY=sk-ant-...
+crisis-agents demo --live --model claude-haiku-4-5-20251001
+
+# Verify a proof
+crisis-agents verify /tmp/crisis_demo/proof_*.json
+```
+
+---
+
+## The live-Claude path
+
+`LiveClaudeAgent` (in `live_agent.py`) makes one Anthropic Messages API call per `try_emit()` invocation, asking Claude to fact-check the scenario's statements against the reference document. The response is parsed as a JSON array of `Claim`-shaped objects; malformed responses degrade gracefully (the agent emits nothing rather than crashing).
+
+The byzantine joiner stays **mocked** even in `--live` mode: producing deterministic equivocation from an LLM requires multiple API calls per turn (one per peer subset) for unreliable yields, and the demo's narrative is cleaner with a scripted byzantine. The honest agents are the real LLM participants.
+
+Default model: `claude-haiku-4-5-20251001` (fast, cheap, plenty of capability for structured-output adjudication). Override with `--model`.
+
+The live path is intentionally not in CI — it costs API credits and has nondeterministic outputs.
+
+---
+
+## Test taxonomy
+
+| Test file | What it asserts |
+|---|---|
+| `tests/test_claim.py` | Claim dataclass validation + JSON round-trip |
+| `tests/test_boundary.py` | Boundary state machine (closed → open) |
+| `tests/test_mothership.py` | Per-agent graph ownership; broadcast vs. targeted delivery; gossip propagation; no privileged attribute |
+| `tests/test_alarm.py` | Decentralized detection; every honest agent finds the same mutation; canonical witness pairs |
+| `tests/test_vote.py` | AlarmClaim round-trip; quorum formulas; tally determinism |
+| `tests/test_proof.py` | ProofDocument schema; JSON round-trip; tampered-witness/below-quorum rejection |
+| `tests/test_demo_fact_check.py` | End-to-end scenario produces one ratified alarm; CLI output contains all six phases |
+| `tests/test_live_agent.py` | LiveClaudeAgent parsing (fake Anthropic client; no real API calls) |
+| **`tests/test_no_chokepoint.py`** | **Centerpiece: every honest agent's ratified set is byte-identical; no privileged attributes exist** |
+| **`tests/test_async_quiescence.py`** | **Centerpiece: no clock; `try_emit()` takes no arg; `AlarmClaim.detected_at_turn` doesn't exist; two runs converge identically** |
+
+The two centerpiece files are sentinels: if you ever re-introduce a chokepoint or a wall clock, one of those tests should fail.
+
+---
+
+## What's deliberately out of scope
+
+- **CrisisViz integration.** The visualizer's data file (`crisis_data.json`) is produced by `crisis.demo.Simulation`, not by `crisis_agents`. A future CrisisViz upgrade could absorb agent-coordination runs (multi-DAG rendering, gossip arrows, alarm-vote convergence), but that's a separate effort, sketched in the parent README.
+- **Real TCP gossip.** In-process function calls only. Lifting to multi-process requires plugging into `crisis.gossip.GossipServer`, which is independent work.
+- **Cryptographic signatures beyond what Crisis already provides.** Crisis already provides nonces + message-digest chaining + PoW. Agent identity is `digest(name)[:32]`. We don't add a separate identity-PKI.
+- **Sybil resistance.** Threat model is "a few byzantine joiners with valid PoW", not "an attacker spawning unlimited identities." Sybil defense is what the PoW weight in Crisis is *for*; it's not the agent layer's concern.
+- **Byzantine false-accusations.** A byzantine could emit a false AlarmClaim against an honest agent. The quorum mechanism prevents ratification (honest agents won't second the false claim, so it stays at 1-of-N). Second-order detection of false accusers isn't in this PoC.
+
+---
+
+## Pointers
+
+- Parent README: [`../../README.md`](../../README.md)
+- Install guide: [`../../INSTALL.md`](../../INSTALL.md)
+- The paper this is all based on: [`../../Crisis.mirco-richter-2019.pdf`](../../Crisis.mirco-richter-2019.pdf)
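
As a footnote to the false-accusation item under "What's deliberately out of scope", here is a rough sketch of why a lone false AlarmClaim stays below quorum. `quorum_for` is named in the module table, but its body here is only inferred from the ⌈2N/3⌉ threshold quoted in the commit message; `ratified` and the `(accuser, accused)` pair format are hypothetical simplifications, whereas the package's `tally_alarms(graph, threshold)` works on an agent's graph.

```python
from collections import defaultdict


def quorum_for(n: int) -> int:
    """Smallest integer >= 2n/3, computed without floating point (assumed formula)."""
    return (2 * n + 2) // 3


def ratified(accusations: list[tuple[str, str]], n: int) -> set[str]:
    """Return the accused ids whose distinct accusers reach quorum_for(n).

    `accusations` is a list of (accuser, accused) pairs: a hypothetical,
    flattened stand-in for AlarmClaims read out of one agent's graph.
    """
    accusers = defaultdict(set)
    for accuser, accused in accusations:
        accusers[accused].add(accuser)  # repeats from one accuser count once
    threshold = quorum_for(n)
    return {who for who, by in accusers.items() if len(by) >= threshold}


# Four agents: a, b, c are honest; z equivocates and also falsely accuses a.
votes = [("a", "z"), ("b", "z"), ("c", "z"), ("z", "a"), ("z", "a")]
print(ratified(votes, 4))
```

With four agents the threshold is 3, so the three honest accusations against `z` ratify, while `z`'s repeated accusation of `a` still counts as a single accuser and does not.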