Update all documentation for the crisis_agents layer + async refactor

Three sweeping additions and one new file, reflecting how the project
has grown:

* Parent `README.md` rewritten. The architecture mermaid now shows
  `crisis_agents` as a third sibling layer on top of the pure
  protocol algorithms, alongside the CrisisNode TCP runtime and the
  SimulatedNode in-process recorder. A fourth audience-shaped quick
  start (🤖 "run the AI-agent coordination demo") joins the
  protocol-pytest, simulation-CLI, and visualizer entries. The
  repository-layout tree expands to enumerate `src/crisis_agents/`'s
  modules. Test count corrected (~170).

* New `src/crisis_agents/README.md`. Comprehensive package
  documentation:
    - threat model + what's out of scope
    - the two principles enforced by tests: no chokepoint, no clock
    - mental-model mermaid (closed phase → boundary opens → async
      loop → quorum vote → multi-signer proof)
    - six-phase walkthrough matching the CLI output
    - module-by-module reference table
    - reuse map from `src/crisis/` (Message, LamportGraph,
      find_mutations, ProofOfWorkWeight, etc.)
    - build/run/test instructions including the `--live` Claude path
    - quorum-threshold formula in LaTeX: ⌈2N/3⌉
    - test taxonomy with the two sentinel files
      (test_no_chokepoint, test_async_quiescence) highlighted

* `INSTALL.md` extended. New Section 4 covers running the
  `crisis-agents demo`, both mocked-deterministic and `--live` with
  real Claude sub-agents. Anthropic SDK shown as optional `[live]`
  extras. Old sections renumbered (4 → 5 for regenerating
  `crisis_data.json`, 5 → 6 for Swift, 6 → 7 for Troubleshooting).
  Two new troubleshooting entries for live-mode failures.

* `CrisisViz/HANDOFF.md` gets a new Section 0. Brief notice that a
  sibling Python sub-project (`crisis_agents`) now exists, what it
  does, and — most importantly — that it doesn't share code with
  CrisisViz: refactoring one cannot break the other. Cross-link to
  the crisis_agents README so a future Swift-side agent has the
  pointer without having to discover it via grep.

Source-of-truth corrections in the parent README:
  - the "three audiences" framing becomes four
  - the layout tree now lists `src/crisis_agents/`
  - the architecture diagram explicitly marks the agent layer as
    "decentralized, asynchronous" (the two principles the recent
    refactors enforce)

CrisisViz code: still untouched by all this. Only its HANDOFF doc
gets a heads-up paragraph.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
saymrwulf 2026-05-14 22:13:00 +02:00
parent 0976239ebd
commit 54aae1a4dd
4 changed files with 411 additions and 41 deletions


@ -6,11 +6,22 @@ Last updated: **2026-05-14**.
--- ---
## 0. Sibling project notice — `crisis_agents` exists
Since this file was last meaningfully updated, a sibling Python sub-project has landed: **`src/crisis_agents/`** — a coordination layer that uses the same `crisis` protocol substrate for a fundamentally different consumer (AI agent teams, not visualization). It produces `proof_*.json` documents instead of `crisis_data.json`.
**Important for CrisisViz work:** the two sub-projects don't share code. `crisis_agents` does not produce data CrisisViz reads, and CrisisViz does not consume anything from `crisis_agents`. Refactoring either one cannot break the other.
If a future curriculum chapter wants to visualize agent coordination (decentralized detection, gossip propagation, multi-detector alarm convergence), that's a substantial new effort — see the parent README's "future CrisisViz story" note. For now, **focus on the chapter and testbed work and treat `crisis_agents` as an unrelated package living in the same repo**.
Reference: **[`../src/crisis_agents/README.md`](../src/crisis_agents/README.md)**.
---
## 1. Current state — what's shipped ## 1. Current state — what's shipped
- **All 10 chapters migrated** to the serial-beat timeline pattern (pure `state(at: t) -> WorldState`, scrubbable -16× to +16×, beat-bound narration). - **All 10 chapters migrated** to the serial-beat timeline pattern (pure `state(at: t) -> WorldState`, scrubbable -16× to +16×, beat-bound narration).
- **Testbed green** at the last clean run: 38/38 invariants pass, 0 source-audit errors, 36/36 MP4 clips written, 279 PNGs sane, 12/12 resize cases pass. - **Testbed green** at the last clean run: 38/38 invariants pass, 0 source-audit errors, 36/36 MP4 clips written, 279 PNGs sane, 12/12 resize cases pass.
- **`origin/master` at `fb9bc9c`** — working tree was clean before this documentation/testing pass. After this pass: README.md/INSTALL.md/LICENSE/CrisisViz README&HANDOFF/package-dmg.sh/Python tests landed.
- **Bundle pipeline works.** `./bundle.sh` produces a working `CrisisViz.app`. `./package-dmg.sh` produces a working `CrisisViz.dmg` (ad-hoc signed; first-open Gatekeeper warning, right-click → Open). - **Bundle pipeline works.** `./bundle.sh` produces a working `CrisisViz.app`. `./package-dmg.sh` produces a working `CrisisViz.dmg` (ad-hoc signed; first-open Gatekeeper warning, right-click → Open).
If you can't run the testbed and confirm it's green, **stop and fix that first** before making curriculum changes. If you can't run the testbed and confirm it's green, **stop and fix that first** before making curriculum changes.


@ -1,6 +1,6 @@
# INSTALL — Crisis & CrisisViz # INSTALL — Crisis, CrisisViz, and crisis_agents
End-to-end setup on a fresh macOS box, from a blank checkout to a running visualizer. Follow top-to-bottom. End-to-end setup on a fresh macOS box: from blank checkout to running protocol tests, the agent-coordination demo, and the SwiftUI visualizer. Follow top-to-bottom.
--- ---
@ -52,7 +52,7 @@ Run the unit tests to verify the algorithm implementations:
pytest -q pytest -q
``` ```
Expected: all tests pass in under a second. If any fail, stop and investigate before continuing — the visualizer's data pipeline depends on these. Expected: ~170 tests, all green in under a second. If any fail, stop and investigate before continuing — both the visualizer's data pipeline and the agent-coordination layer depend on these.
Try a deterministic in-process simulation: Try a deterministic in-process simulation:
@ -64,7 +64,39 @@ You should see consensus rounds advance and a total order emerge.
--- ---
## 4. Regenerate `crisis_data.json` (optional) ## 4. Run the AI-agent coordination demo
The `crisis-agents` CLI walks a six-phase scenario end-to-end: a closed honest team, a byzantine joiner who equivocates on a fact-check statement, an asynchronous gossip + detection event loop, a quorum-ratified alarm, and a multi-signer proof JSON.
### 4a. Mocked agents (deterministic, no API costs)
```sh
crisis-agents demo --out-dir /tmp/crisis_demo
```
Output ends with `proof_<accused>_<statement>.json` in `--out-dir`. To self-verify a proof:
```sh
crisis-agents verify /tmp/crisis_demo/proof_*.json
```
### 4b. Real Claude sub-agents (`--live`)
Install the optional Anthropic SDK extras:
```sh
pip install -e ".[live]"
export ANTHROPIC_API_KEY=sk-ant-...
crisis-agents demo --live --model claude-haiku-4-5-20251001
```
This swaps the three scripted honest agents for `LiveClaudeAgent` instances backed by real Anthropic Messages API calls. The byzantine stays scripted so the equivocation is reliably reproducible. Costs API credits; output is non-deterministic.
Architecture reference: **[src/crisis_agents/README.md](src/crisis_agents/README.md)**.
---
## 5. Regenerate `crisis_data.json` (optional)
The repo ships with a pre-recorded `crisis_data.json` at the root and a bundled copy in `CrisisViz/Sources/CrisisViz/`. Regenerate when you change the protocol code or want a different simulation: The repo ships with a pre-recorded `crisis_data.json` at the root and a bundled copy in `CrisisViz/Sources/CrisisViz/`. Regenerate when you change the protocol code or want a different simulation:
@ -77,9 +109,9 @@ The defaults (6 honest + 1 byzantine, 80 steps) produce full convergence from st
--- ---
## 5. Swift side — the visualizer ## 6. Swift side — the visualizer
### 5a. Quick dev loop ### 6a. Quick dev loop
```sh ```sh
cd CrisisViz cd CrisisViz
@ -89,7 +121,7 @@ swift run CrisisViz # launches the dev binary
Note: the dev binary does not have a Dock icon and lives in `.build/`. For a real `.app` use `bundle.sh`. Note: the dev binary does not have a Dock icon and lives in `.build/`. For a real `.app` use `bundle.sh`.
### 5b. Build the `.app` bundle ### 6b. Build the `.app` bundle
```sh ```sh
./bundle.sh # build + assemble CrisisViz.app + open ./bundle.sh # build + assemble CrisisViz.app + open
@ -98,7 +130,7 @@ Note: the dev binary does not have a Dock icon and lives in `.build/`. For a rea
`CrisisViz.app` is created in the current directory. Open it from Finder or the Dock to get the full activation-policy behavior. `CrisisViz.app` is created in the current directory. Open it from Finder or the Dock to get the full activation-policy behavior.
### 5c. Build a DMG installer ### 6c. Build a DMG installer
```sh ```sh
./package-dmg.sh # produces CrisisViz.dmg in the current directory ./package-dmg.sh # produces CrisisViz.dmg in the current directory
@ -112,7 +144,7 @@ Distribution flow for a new machine:
3. Drag `CrisisViz` onto the `Applications` symlink. 3. Drag `CrisisViz` onto the `Applications` symlink.
4. Eject the DMG; launch from `/Applications` (right-click → Open the first time). 4. Eject the DMG; launch from `/Applications` (right-click → Open the first time).
### 5d. Run the QA testbed ### 6d. Run the QA testbed
```sh ```sh
swift run CrisisViz --testbed swift run CrisisViz --testbed
@ -130,7 +162,7 @@ All five should be green before shipping changes.
--- ---
## 6. Troubleshooting ## 7. Troubleshooting
**`swift build` fails with “unsupported deployment target”.** Your Xcode does not provide the macOS 26 SDK. Update Xcode to ≥17, or downgrade `Package.swift` to your installed SDK (not recommended — visual features depend on macOS 26 Liquid Glass APIs). **`swift build` fails with “unsupported deployment target”.** Your Xcode does not provide the macOS 26 SDK. Update Xcode to ≥17, or downgrade `Package.swift` to your installed SDK (not recommended — visual features depend on macOS 26 Liquid Glass APIs).
@ -141,3 +173,7 @@ All five should be green before shipping changes.
**`pytest` fails on `ModuleNotFoundError: crisis`.** Activate the venv (`source .venv/bin/activate`) and reinstall with `pip install -e ".[dev]"`. The `-e` (editable) flag is what makes `import crisis` resolve to `src/crisis/`. **`pytest` fails on `ModuleNotFoundError: crisis`.** Activate the venv (`source .venv/bin/activate`) and reinstall with `pip install -e ".[dev]"`. The `-e` (editable) flag is what makes `import crisis` resolve to `src/crisis/`.
**The visualizer freezes mid-chapter / animations are stuck.** You're running the unbundled `swift-run` binary while the Dock icon launches `CrisisViz.app`. Rebuild the bundle: `./bundle.sh --no-launch && open CrisisViz.app`. **The visualizer freezes mid-chapter / animations are stuck.** You're running the unbundled `swift-run` binary while the Dock icon launches `CrisisViz.app`. Rebuild the bundle: `./bundle.sh --no-launch && open CrisisViz.app`.
**`crisis-agents demo --live` fails with `live mode requires the anthropic SDK`.** Install the optional extras: `pip install -e ".[live]"`. The mocked path doesn't need this dependency.
**`crisis-agents demo --live` fails with an error mentioning `ANTHROPIC_API_KEY`.** Export the key before running: `export ANTHROPIC_API_KEY=sk-ant-...`. The SDK reads it from the environment.

111
README.md

@ -4,11 +4,12 @@ A proof-of-concept and educational artifact for Mirco Richter's [_Crisis_ paper]
This repository contains: This repository contains:
- a **Python implementation** of the protocol (`src/`, `tests/`), - a **Python implementation** of the protocol (`src/crisis/`, `tests/`),
- an **event recorder** that exports a deterministic simulation run to JSON, - an **event recorder** that exports a deterministic simulation run to JSON,
- **CrisisViz** — a native macOS / SwiftUI curriculum visualizer that walks the protocol end-to-end across ten chapters: cast intro, gossip mechanics, partition, round derivation, virtual voting, leader election, total order, the data-availability problem, erasure-coded recovery, and Byzantine fork detection. - **CrisisViz** — a native macOS / SwiftUI curriculum visualizer that walks the protocol end-to-end across ten chapters,
- **crisis_agents** — a coordination layer that lifts the protocol from "consensus between machines" to "consensus between AI agents," with a decentralized async event-driven engine and quorum-ratified byzantine alarms.
Everything in the visualizer is in extreme slow motion and serialized for didactic clarity. A signed speed slider scrubs the chapter forward and backward at any rate from $-16\times$ to $+16\times$; narration is bound to whichever beat the playhead is on. Everything in the visualizer is in extreme slow motion and serialized for didactic clarity. A signed speed slider scrubs each chapter forward and backward at any rate from $-16\times$ to $+16\times$; narration is bound to whichever beat the playhead is on.
--- ---
@ -32,6 +33,7 @@ flowchart TD
Algos --> RealRT Algos --> RealRT
Algos --> SimRT Algos --> SimRT
Algos --> AgentLayer
subgraph RealRT["🌐 <b>Real runtime — <code>node.py</code> + <code>gossip.py</code></b><br/><i>scalable, deployable</i>"] subgraph RealRT["🌐 <b>Real runtime — <code>node.py</code> + <code>gossip.py</code></b><br/><i>scalable, deployable</i>"]
Node["CrisisNode<br/>asyncio · TCP push/pull gossip<br/>3 concurrent loops<br/>CLI: <code>crisis-node</code>"] Node["CrisisNode<br/>asyncio · TCP push/pull gossip<br/>3 concurrent loops<br/>CLI: <code>crisis-node</code>"]
@ -43,12 +45,20 @@ flowchart TD
SimNode --- SimCtl SimNode --- SimCtl
end end
subgraph AgentLayer["🤖 <b>Crisis-Agents — <code>src/crisis_agents/</code></b><br/><i>decentralized, asynchronous</i>"]
Agent["CrisisAgent ×N<br/>owns own LamportGraph<br/>emit · receive · gossip · detect"]
Mom["Mothership<br/>bootstrap + event-loop driver<br/>no clock · no privileged state<br/>CLI: <code>crisis-agents</code>"]
Agent --- Mom
end
SimRT --> Rec SimRT --> Rec
Rec["📼 <b>Recorder — <code>recorder.py</code></b><br/>instruments every algorithm call<br/>captures events + per-step snapshots"] Rec["📼 <b>Recorder — <code>recorder.py</code></b><br/>instruments every algorithm call<br/>captures events + per-step snapshots"]
Rec --> Export Rec --> Export
Export["📦 <b>JSON exporter — <code>export_json.py</code></b><br/>writes <code>crisis_data.json</code>"] Export["📦 <b>JSON exporter — <code>export_json.py</code></b><br/>writes <code>crisis_data.json</code>"]
Export --> Viz Export --> Viz
AgentLayer --> ProofJSON["🧾 <b>proof_*.json</b><br/>multi-signer byzantine proof<br/>schema_version=2"]
subgraph Viz["🎬 <b>CrisisViz — native macOS / SwiftUI</b>"] subgraph Viz["🎬 <b>CrisisViz — native macOS / SwiftUI</b>"]
Player["Keynote-style player<br/>10 chapters · ~18 min @ 1×<br/>scrubbable -16× to +16×"] Player["Keynote-style player<br/>10 chapters · ~18 min @ 1×<br/>scrubbable -16× to +16×"]
Testbed["Testbed harness<br/>invariants · source audit<br/>PNG sweep · 36 MP4 clips"] Testbed["Testbed harness<br/>invariants · source audit<br/>PNG sweep · 36 MP4 clips"]
@ -58,53 +68,77 @@ flowchart TD
classDef pure fill:#eee8d5,stroke:#586e75,color:#073642 classDef pure fill:#eee8d5,stroke:#586e75,color:#073642
classDef real fill:#fce5cd,stroke:#cc4125,color:#660000 classDef real fill:#fce5cd,stroke:#cc4125,color:#660000
classDef sim fill:#d9ead3,stroke:#38761d,color:#0b3d0b classDef sim fill:#d9ead3,stroke:#38761d,color:#0b3d0b
classDef agents fill:#fff2cc,stroke:#bf9000,color:#3d2e00
classDef rec fill:#cfe2f3,stroke:#2c5f8f,color:#062b4d classDef rec fill:#cfe2f3,stroke:#2c5f8f,color:#062b4d
classDef viz fill:#ead1dc,stroke:#741b47,color:#3d0a26 classDef viz fill:#ead1dc,stroke:#741b47,color:#3d0a26
classDef proof fill:#fce5e8,stroke:#a64d59,color:#3d0014
class Paper paper class Paper paper
class Algos pure class Algos pure
class RealRT real class RealRT real
class SimRT sim class SimRT sim
class AgentLayer agents
class Rec,Export rec class Rec,Export rec
class Viz viz class Viz viz
class ProofJSON proof
``` ```
**Key architectural fact** — the recording pipeline that feeds CrisisViz only exercises the **`SimulatedNode`** path (in-process, deterministic, in-memory message passing). The **`CrisisNode`** TCP runtime is a separately developed PoC of how a real network deployment would look; it is _not_ what produces `crisis_data.json`. The two runtimes are siblings, not layers. **Three independent consumers of the protocol.** `src/crisis/` provides the pure algorithms (Lamport graphs, virtual voting, total order, mutation detection). Three sibling layers sit on top:
- **`CrisisNode`** — a deployable distributed runtime (TCP gossip, three concurrent asyncio loops). Has no consumers in this repo; meant as a reference for how a real network deployment would look.
- **`SimulatedNode`** — an in-process deterministic simulator whose recording becomes `crisis_data.json`, the file CrisisViz visualizes.
- **`crisis_agents`** — agent-coordination layer. Each AI agent participates as a Crisis node; the network catches byzantine equivocation through decentralized detection + quorum voting. The engine is asynchronous and event-driven — no global clock, no privileged observer.
The three are **siblings, not layers**: refactoring one doesn't break the others. CrisisViz and crisis_agents don't know each other exists.
--- ---
## Repository layout ## Repository layout
``` ```
crisis/ ← git root crisis/ ← git root
├── Crisis.mirco-richter-2019.pdf the paper ├── Crisis.mirco-richter-2019.pdf the paper
├── README.md this file ├── README.md this file
├── INSTALL.md fresh-macOS install guide ├── INSTALL.md fresh-macOS install guide
├── LICENSE MIT (code only; paper is CC-BY-4.0) ├── LICENSE MIT (code only; paper is CC-BY-4.0)
├── pyproject.toml Python ≥3.11, networkx, pytest ├── pyproject.toml Python ≥3.11, networkx, pytest
├── crisis_data.json simulation export (source of truth) ├── crisis_data.json simulation export (source of truth)
├── src/crisis/ ── PROTOCOL PoC (Python) ── ├── src/crisis/ ── PROTOCOL PoC (Python) ──
│ ├── crypto.py, message.py random-oracle hash + Message/Vertex │ ├── crypto.py, message.py random-oracle hash + Message/Vertex
│ ├── graph.py, weight.py, rounds.py Lamport DAG + PoW weight + round derivation │ ├── graph.py, weight.py, rounds.py Lamport DAG + PoW weight + round derivation
│ ├── voting.py, order.py BBA virtual voting + total order │ ├── voting.py, order.py BBA virtual voting + total order
│ ├── gossip.py, node.py real TCP runtime (CrisisNode) │ ├── gossip.py, node.py real TCP runtime (CrisisNode)
│ ├── demo.py in-process simulation harness │ ├── demo.py in-process simulation harness
│ ├── recorder.py event instrumentation │ ├── recorder.py event instrumentation
│ └── export_json.py JSON exporter for CrisisViz │ └── export_json.py JSON exporter for CrisisViz
├── tests/ pytest suite
└── CrisisViz/ ── VISUALIZER (Swift / macOS 26) ── ├── src/crisis_agents/ ── AGENT COORDINATION (Python) ──
│ ├── README.md architecture & walkthrough
│ ├── agent.py CrisisAgent + MockAgent + MockByzantineAgent
│ ├── live_agent.py LiveClaudeAgent (Anthropic SDK)
│ ├── boundary.py trust-set + open() trigger
│ ├── mothership.py bootstrap + async event-loop driver
│ ├── claim.py ClaimMessage payload
│ ├── alarm.py decentralized detection
│ ├── vote.py AlarmClaim + quorum tally
│ ├── proof.py multi-signer ProofDocument
│ ├── cli.py crisis-agents CLI entry point
│ └── scenarios/fact_check.py the canonical demo
├── tests/ pytest suite (170 tests, ~0.8s)
└── CrisisViz/ ── VISUALIZER (Swift / macOS 26) ──
├── Package.swift, bundle.sh, package-dmg.sh ├── Package.swift, bundle.sh, package-dmg.sh
├── Sources/CrisisViz/ App, Engine, Model, Chapters, Views, Glass, Testbed, Canvas ├── Sources/CrisisViz/ App, Engine, Model, Chapters, Views, Glass, Testbed, Canvas
├── README.md Swift-side human guide ├── README.md Swift-side human guide
└── HANDOFF.md agent-to-agent engineering log └── HANDOFF.md agent-to-agent engineering log
``` ```
--- ---
## Quick start ## Quick start
There are three audiences. Pick the one that matches what you want to do. Four audiences. Pick the one that matches what you want to do.
### 🧮 Verify the protocol — pytest ### 🧮 Verify the protocol — pytest
@ -114,9 +148,9 @@ source .venv/bin/activate # set up per INSTALL.md if first time
pytest -q pytest -q
``` ```
Runs the algorithm unit tests (crypto, graph, rounds, weight, message, order, voting, recorder, simulation). Should be green in under a second. Runs all 170 tests across the protocol algorithms and the crisis_agents layer. Should be green in under a second.
### 🧪 Run a deterministic simulation — Python CLI ### 🧪 Run a deterministic protocol simulation — Python CLI
```sh ```sh
python -m crisis.demo --nodes 4 --byzantine 1 --rounds 10 python -m crisis.demo --nodes 4 --byzantine 1 --rounds 10
@ -129,7 +163,23 @@ python -m crisis.export_json --steps 80 -o crisis_data.json
cp crisis_data.json CrisisViz/Sources/CrisisViz/crisis_data.json cp crisis_data.json CrisisViz/Sources/CrisisViz/crisis_data.json
``` ```
### 🎬 Watch the visualizer — Swift / macOS ### 🤖 Run the AI-agent coordination demo — Python CLI
```sh
crisis-agents demo
```
Walks a six-phase scenario: a closed honest team, a byzantine joiner who equivocates on a fact-check statement, an asynchronous gossip + detection event loop, and a quorum-ratified proof. Output ends with a `proof_*.json` document that any third party can self-verify. See **[src/crisis_agents/README.md](src/crisis_agents/README.md)** for the architecture.
For real Claude sub-agents instead of scripted mocks:
```sh
pip install -e ".[live]" # adds anthropic SDK
export ANTHROPIC_API_KEY=...
crisis-agents demo --live
```
### 🎬 Watch the protocol visualizer — Swift / macOS
```sh ```sh
cd CrisisViz cd CrisisViz
@ -144,9 +194,10 @@ Then arrow keys ←/→ to navigate, **Space** to play/pause, the bottom slider
## Where to read next ## Where to read next
- **[INSTALL.md](INSTALL.md)** — clone-to-running on a fresh macOS box. Prerequisites, Python venv setup, Swift toolchain, regenerating sim data, troubleshooting. - **[INSTALL.md](INSTALL.md)** — clone-to-running on a fresh macOS box. Prerequisites, Python venv setup, Swift toolchain, regenerating sim data, running the agents demo, troubleshooting.
- **[src/crisis_agents/README.md](src/crisis_agents/README.md)** — the AI-agent coordination layer: architecture, six-phase walkthrough, decentralization principles, async event loop, quorum formula, live Claude mode, proof JSON shape.
- **[CrisisViz/README.md](CrisisViz/README.md)** — Swift-side guide: serial-timeline pattern, testbed outputs, controls, cast convention. - **[CrisisViz/README.md](CrisisViz/README.md)** — Swift-side guide: serial-timeline pattern, testbed outputs, controls, cast convention.
- **[CrisisViz/HANDOFF.md](CrisisViz/HANDOFF.md)** — engineering log for the next coding agent: current state, architecture pointers, hard-won rules. - **[CrisisViz/HANDOFF.md](CrisisViz/HANDOFF.md)** — engineering log for the next coding agent.
--- ---

272
src/crisis_agents/README.md Normal file

@ -0,0 +1,272 @@
# crisis_agents — coordination layer for AI agent teams
A Python package that lifts the Crisis consensus protocol from "consensus between machines" to "consensus between AI agents." Each participant is a Crisis node with its own Lamport graph; the network catches byzantine equivocation via decentralized detection and quorum-ratified alarms. The engine is **asynchronous** and **event-driven** — no global clock, no privileged observer.
> If you're new to this repo, start at the [parent README](../../README.md). This document is the architectural reference for the agent layer.
---
## Threat model
The setting is a small team of AI agents (Claude sub-agents, in our live demo) coordinated by an orchestrator we call the **mothership**.
- **Normal life** — the team is closed. Agents talk freely with each other and the mothership. No Crisis layer; the conversation is the medium.
- **Boundary opens** — an external agent of unknown trust joins. Its internal motivation may diverge from the team's task. It may equivocate — telling one peer one thing while telling another peer the opposite — to mislead the network.
- **Crisis to the rescue** — from the moment the boundary opens, every claim is wrapped into a Crisis Message with the emitting agent's stable process_id and a PoW nonce. The per-agent Lamport DAG is the immutable, replayable ledger. Mutation detection (built on `LamportGraph.find_mutations` from the protocol layer) catches equivocation. Every honest agent who has gossiped enough to see both contradictory variants raises an alarm. A quorum of independent alarms produces a network-ratified proof of malfeasance.
What's deliberately **not** in scope (this is a PoC):
- Visualization. The CrisisViz application is a separate effort that visualizes the protocol PoC; visualizing an agent-coordination run would require a substantial new chapter set there.
- Real TCP gossip. Agents talk via in-process function calls in the mothership process. The existing `crisis.gossip.GossipServer` shows how it would look across sockets.
- Detection of *false claims that aren't equivocations*. An agent who consistently lies but never equivocates is out-voted, not "caught." Catching it would require a ground-truth oracle, which is application-layer, not protocol-layer.
---
## Two architectural principles, enforced by tests
### 1. No chokepoint
Every honest agent maintains **its own** `LamportGraph`. The mothership does NOT hold a privileged graph of the whole network. Detection runs on each agent independently; alarms are emitted by each detector independently; proofs are signed by a quorum of detectors.
The regression-test file `tests/test_no_chokepoint.py` asserts:
- After the full lifecycle, every honest agent's *ratified-alarms set* is byte-identical to every other honest agent's.
- The mothership does not expose `all_graphs`, `graph_of`, `_graphs`, or any other privileged collection.
- A single byzantine accuser alone cannot ratify an alarm.
### 2. No clock
Crisis is supposed to work in asynchronous P2P networks. Any synchronicity in the protocol is *virtual* — derived inside the consensus algorithm from the causal structure of the Lamport graph — not imposed from outside by a coordinator.
The driver loop is **event-driven and quiescence-terminated**, not turn-counted:
```python
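def run_until_quiescent(max_steps=200):
    """Sketch of the driver loop: repeat until no concern makes progress."""
    progress = True
    steps = 0
    # max_steps is only a safety bound; the normal exit is quiescence.
    while progress and steps < max_steps:
        progress = False
        steps += 1
        # 1. Any agent has something to emit? Let them speak.
        # 2. Any gossip pair has new info? Exchange.
        # 3. Any agent has detected a new mutation? Emit AlarmClaim.
        # Each concern sets progress = True whenever it does any work.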
```
`tests/test_async_quiescence.py` asserts:
- `CrisisAgent.try_emit()` takes no `turn` argument.
- `AlarmClaim` has no `detected_at_turn` field (the wall-clock-implying name); only `emitted_at_step`, which is a per-agent local sequence number.
- Two runs of the same scenario produce identical end states (determinism — no hidden wall-time dependence).
- The loop alone (no manual phase orchestration) ratifies an alarm.
---
## The mental model
```mermaid
flowchart TB
subgraph Closed["🟢 Phase 1 — closed team (no Crisis)"]
Mom1["mothership"]
A1["agent_α"]
B1["agent_β"]
C1["agent_γ"]
Mom1 <--> A1
Mom1 <--> B1
Mom1 <--> C1
A1 <--> B1
B1 <--> C1
end
Boundary{"<b>BOUNDARY OPENS</b><br/>agent_δ joins;<br/>trust unknown"}
Closed --> Boundary --> Open
subgraph Open["🟡 Phase 2 — Crisis active, async event loop"]
Mom2["mothership<br/>(bootstrap + driver only)"]
A2["agent_α<br/>own LamportGraph<br/>detect · alarm"]
B2["agent_β<br/>own LamportGraph<br/>detect · alarm"]
C2["agent_γ<br/>own LamportGraph<br/>detect · alarm"]
D2["agent_δ ⚠<br/>own LamportGraph<br/><i>byzantine</i>"]
D2 -. variant_A .-> A2
D2 -. variant_A .-> C2
D2 -. variant_B .-> B2
A2 <-. gossip .-> B2
B2 <-. gossip .-> C2
A2 <-. gossip .-> C2
A2 -- alarm --> Mom2
B2 -- alarm --> Mom2
C2 -- alarm --> Mom2
end
Open --> Quorum
Quorum{"<b>QUORUM VOTE</b><br/>≥ ⌈2N/3⌉ honest signers<br/>independently agree"}
Quorum --> Proof
Proof["<b>📜 Multi-signer proof</b><br/>signed JSON; replayable<br/>schema_version=2"]
classDef closed fill:#d9ead3,stroke:#38761d
classDef boundary fill:#fff2cc,stroke:#bf9000
classDef open fill:#fce5cd,stroke:#cc4125
classDef quorum fill:#cfe2f3,stroke:#2c5f8f
classDef proof fill:#ead1dc,stroke:#741b47
class Closed closed
class Boundary,Quorum boundary
class Open open
class Proof proof
```
---
## Six-phase walkthrough (the `crisis-agents demo`)
The canonical scenario is `scenarios/fact_check.py`: three honest agents and one byzantine adjudicate six factual statements about a small reference document.
### Phase 1 — closed team, no Crisis
The mothership drives `run_closed_phase()` until quiescent. Each honest agent emits its six fact-check claims via plain function calls — appended to a flat log. Per-agent LamportGraphs aren't yet allocated. **No Crisis overhead.**
### Phase 2 — boundary opens
`mothership.open_boundary(agent_delta)`. Atomically: δ is added to the trust set, a fresh `LamportGraph` is created on every agent (including δ), and `boundary.is_open` flips to `True`.
### Phase 3 — asynchronous event loop
`mothership.run_until_quiescent()`. The driver cycles through:
1. **Emission**: `agent.try_emit()` is called on each agent. Returned `AgentTurn`s are first-hop routed to their target subset (or broadcast). The byzantine emits an intro (broadcast), then a pair of contradictory variants (split delivery).
2. **Gossip**: every ordered pair `(sender, receiver)` exchanges what `sender` has that `receiver` doesn't. Eventually-consistent propagation.
3. **Alarm emission**: `agent.pending_alarm_claims()` runs `LamportGraph.find_mutations(...)` on each agent's own graph and produces `AlarmClaim`s for any newly observed equivocation. AlarmClaims are wrapped as Crisis Messages and broadcast.
The loop exits when none of these three concerns make progress. `QuiescenceReport` (returned) carries: `steps`, `emissions`, `gossip_transfers`, `alarm_claims_emitted`, `reached_quiescence`.
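The quiescence-terminated loop described above can be sketched as a small runnable toy. This is an illustration under stated assumptions, not the package's `Mothership`: `ToyAgent`, its `outbox` and `known` fields, and the dictionary-shaped report are hypothetical stand-ins, messages are plain strings, first-hop routing is skipped, and the alarm-emission concern is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent (not the real CrisisAgent)."""
    name: str
    outbox: list = field(default_factory=list)  # messages still to emit
    known: set = field(default_factory=set)     # messages seen so far

    def try_emit(self):
        # No turn argument: the agent speaks only if something is pending.
        return self.outbox.pop(0) if self.outbox else None


def run_until_quiescent(agents, max_steps=200):
    report = {"steps": 0, "emissions": 0, "gossip_transfers": 0,
              "reached_quiescence": False}
    while report["steps"] < max_steps:
        report["steps"] += 1
        progress = False
        # 1. Emission: each agent records what it has to say.
        for agent in agents:
            msg = agent.try_emit()
            if msg is not None:
                agent.known.add(msg)
                report["emissions"] += 1
                progress = True
        # 2. Gossip: every ordered pair transfers what the receiver lacks.
        for sender in agents:
            for receiver in agents:
                if sender is receiver:
                    continue
                missing = sender.known - receiver.known
                if missing:
                    receiver.known |= missing
                    report["gossip_transfers"] += len(missing)
                    progress = True
        # Exit on quiescence: a full pass in which nothing happened.
        if not progress:
            report["reached_quiescence"] = True
            break
    return report


agents = [ToyAgent("alpha", outbox=["a1", "a2"]),
          ToyAgent("beta", outbox=["b1"]),
          ToyAgent("gamma")]
report = run_until_quiescent(agents)
print(report)
```

The loop never consults a clock or a turn counter. `max_steps` is only a safety bound, and the normal exit is a full pass in which no agent emitted and no pair exchanged anything, at which point every toy agent holds the same set of messages.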
### Phase 4 — decentralized detection
Each agent independently runs `detect_mutations()` on its own graph. In our scenario, every honest agent observes the byzantine's same-id spacelike pair and reports it. The byzantine doesn't accuse itself.
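To make "same-id spacelike pair" concrete, here is a minimal sketch on a toy DAG. It is not the implementation of `LamportGraph.find_mutations`; the dictionaries `author_of` and `parents`, the vertex names, and both helper functions are assumptions made for illustration. Two vertices by the same author count as spacelike here when neither is an ancestor of the other.

```python
from itertools import combinations


def ancestors(vertex_id, parents):
    """All vertices reachable from vertex_id by following parent edges."""
    seen, stack = set(), list(parents.get(vertex_id, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def find_spacelike_same_author(author_of, parents):
    """Pairs of vertices by one author where neither precedes the other."""
    pairs = []
    for a, b in combinations(sorted(author_of), 2):
        if author_of[a] != author_of[b]:
            continue
        if a in ancestors(b, parents) or b in ancestors(a, parents):
            continue  # causally ordered: an honest chain, not a fork
        pairs.append((a, b))
    return pairs


# Toy graph (hypothetical): delta's intro d0, then two variants dA and dB
# that both extend d0 but do not reference each other. Alpha's two
# vertices form a proper chain, so they are never reported.
author_of = {"d0": "delta", "dA": "delta", "dB": "delta",
             "a0": "alpha", "a1": "alpha"}
parents = {"dA": ["d0"], "dB": ["d0"], "a1": ["a0", "dA"]}

forks = find_spacelike_same_author(author_of, parents)
print(forks)
```

An agent that has only seen `dA` cannot report anything yet; the pair becomes visible only after gossip delivers `dB` as well, which is why detection depends on propagation rather than on a central observer.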
### Phase 5 — ratification by quorum
The quorum threshold is
$$\text{quorum}(N) = \left\lceil \frac{2N}{3} \right\rceil$$
where $N$ is the boundary size at ratification. For our scenario $N=4$ (3 honest + 1 byzantine), so the threshold is $\left\lceil 2 \cdot 4 / 3 \right\rceil = 3$ — every honest agent must concur. `tally_alarms(graph, threshold)` groups AlarmClaim vertices by `(accused, statement_id, witness_pair)`, counts unique signer process_ids per group, and ratifies groups meeting the threshold. **All honest agents produce identical `RatifiedAlarm` lists** (this is the no-chokepoint property in action).
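The threshold arithmetic and the grouping rule can be checked with a short sketch. It assumes alarm claims are plain tuples of `(signer, accused, statement_id, witness_pair)`; the real `tally_alarms(graph, threshold)` reads AlarmClaim vertices from a graph, so the `tally` helper and the tuple layout below are hypothetical simplifications.

```python
from collections import defaultdict


def quorum_for(n):
    """ceil(2n/3), computed with integer arithmetic only."""
    return (2 * n + 2) // 3


def tally(alarm_claims, threshold):
    """Group claims by (accused, statement_id, witness_pair) and keep the
    groups whose number of distinct signers meets the threshold."""
    signers = defaultdict(set)
    for signer, accused, statement_id, witness_pair in alarm_claims:
        # frozenset: the witness pair is unordered.
        key = (accused, statement_id, frozenset(witness_pair))
        signers[key].add(signer)
    return {key: sorted(s) for key, s in signers.items()
            if len(s) >= threshold}


threshold = quorum_for(4)  # 3 honest + 1 byzantine
claims = [
    ("alpha", "delta", "s03", ("wA", "wB")),
    ("beta", "delta", "s03", ("wB", "wA")),   # same pair, other order
    ("gamma", "delta", "s03", ("wA", "wB")),
    ("gamma", "delta", "s03", ("wA", "wB")),  # repeat signer counts once
    ("delta", "alpha", "s01", ("wX", "wY")),  # lone accuser, not ratified
]
ratified = tally(claims, threshold)
print(threshold, ratified)
```

Counting distinct signers rather than claims is what keeps a single accuser from ratifying an alarm by repeating itself.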
### Phase 6 — proof emission
`build_proof(ratified_alarm)` produces a self-contained JSON document. Schema:
```json
{
"schema_version": 2,
"accused_process_id_hex": "...",
"statement_id": "s03",
"witness_digests": ["...", "..."],
"signer_process_id_hexes": ["...", "...", "..."],
"quorum_threshold": 3,
"summary": "agent id=... emitted contradictory Crisis vertices about ..."
}
```
`verify_proof_self_consistent(proof)` checks distinct witnesses, distinct signers, signer count ≥ threshold. Future Phase-6+ work: full replay verification that re-derives the alarm from a recorded simulation log.
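The three self-consistency checks named above are simple enough to sketch against the JSON shape shown in the schema. This is an illustrative stand-in, not the package's `verify_proof_self_consistent`: the function name `check_proof` is hypothetical, the proof is treated as a plain `dict`, and the hex strings in the sample are made-up placeholders.

```python
def check_proof(proof):
    """Self-consistency only: does not replay or re-derive the alarm."""
    witnesses = proof["witness_digests"]
    signers = proof["signer_process_id_hexes"]
    return (
        len(set(witnesses)) == len(witnesses)         # distinct witnesses
        and len(set(signers)) == len(signers)         # distinct signers
        and len(signers) >= proof["quorum_threshold"]  # quorum reached
    )


# Placeholder values for illustration; not taken from a real run.
sample = {
    "schema_version": 2,
    "accused_process_id_hex": "dd01",
    "statement_id": "s03",
    "witness_digests": ["aa10", "bb20"],
    "signer_process_id_hexes": ["0a", "0b", "0c"],
    "quorum_threshold": 3,
    "summary": "placeholder summary",
}

print(check_proof(sample))
```

A document that passes these checks is internally coherent, but nothing here proves the witnesses exist in any graph; that is the replay verification listed as future work.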
---
## Module reference
| File | What it owns |
|---|---|
| `claim.py` | `Claim` dataclass — the application-layer payload (verdict + evidence) |
| `boundary.py` | `Boundary` — trust set, `open()` trigger |
| `agent.py` | `CrisisAgent` (abstract) + `MockAgent` + `MockByzantineAgent`. Each agent owns its `LamportGraph`, `emit_claim`, `receive`, `gossip_to`, `detect_mutations`, `pending_alarm_claims` |
| `live_agent.py` | `LiveClaudeAgent` — same interface, backed by real Anthropic API calls |
| `mothership.py` | `Mothership` — bootstrap + async event-loop driver. No privileged graph state. `run_closed_phase()`, `run_until_quiescent()`, `ratified_alarms_from(name)` |
| `alarm.py` | `LocalAlarm` + `detect_mutations_in_graph(graph, ...)` — pure function, runs on one agent's graph |
| `vote.py` | `AlarmClaim` payload, `RatifiedAlarm`, `quorum_for(n)`, `tally_alarms(graph, threshold)` |
| `proof.py` | `ProofDocument` (schema v2), `build_proof`, `verify_proof_self_consistent` |
| `cli.py` | `crisis-agents demo` + `crisis-agents verify` |
| `scenarios/fact_check.py` | The canonical demo scenario: reference doc, six statements, scripted agents |
| `scenarios/reference_doc.txt` | The factual paragraph the demo adjudicates |
---
## Reuse map from `src/crisis/`
Almost all the heavy lifting comes from the protocol layer; `crisis_agents` is a thin adapter.
| `src/crisis/` primitive | How `crisis_agents` uses it |
|---|---|
| `Message`, `Vertex` | Claims and AlarmClaims become `Message.payload`. Agent's stable id → `Message.id`. |
| `LamportGraph` | One per agent. `extend()`, `find_mutations()`, `are_spacelike()` all reused. |
| `LamportGraph.find_mutations(pid)` | The core of decentralized detection. Returns same-id spacelike groups. |
| `ProofOfWorkWeight` + `mine_nonce()` | Each emission's PoW comes from here, with a shared weight system across the network so PoW is verifiable across graphs. |
| `digest(name)[:ID_LENGTH]` | Agent process_id derivation. Same convention as `crisis.demo.Simulation` so agents could coexist with simulated nodes in a future mixed scenario. |
---
## Build · run · test
```sh
# From repo root, after setup per INSTALL.md
cd /path/to/crisis
source .venv/bin/activate
pip install -e ".[dev]" # editable install with pytest
# All tests, including crisis_agents
pytest -q # ~170 tests in 0.8s
# Just the agent layer
pytest tests/test_claim.py tests/test_boundary.py tests/test_agent*.py \
tests/test_mothership.py tests/test_alarm.py tests/test_vote.py \
tests/test_proof.py tests/test_demo_fact_check.py \
tests/test_no_chokepoint.py tests/test_async_quiescence.py -v
# Run the demo (mocked, deterministic)
crisis-agents demo --out-dir /tmp/crisis_demo
# Run with real Claude sub-agents (requires API key + extras)
pip install -e ".[live]"
export ANTHROPIC_API_KEY=sk-ant-...
crisis-agents demo --live --model claude-haiku-4-5-20251001
# Verify a proof
crisis-agents verify /tmp/crisis_demo/proof_*.json
```
---
## The live-Claude path
`LiveClaudeAgent` (in `live_agent.py`) makes one Anthropic Messages API call per `try_emit()` invocation, asking Claude to fact-check the scenario's statements against the reference document. The response is parsed as a JSON array of `Claim`-shaped objects; malformed responses degrade gracefully (the agent emits nothing rather than crashing).
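The "degrade gracefully" behaviour might look roughly like the parser below. It is a sketch under assumptions: `parse_claims` is a hypothetical name, the required keys (`statement_id`, `verdict`, `evidence`) are guessed from the `Claim` description rather than taken from the code, and rejecting the whole response when any item is malformed is one possible policy.

```python
import json

# Assumed Claim-shaped keys; the real dataclass may differ.
REQUIRED_KEYS = {"statement_id", "verdict", "evidence"}


def parse_claims(raw: str) -> list:
    """Return Claim-shaped dicts, or an empty list if the response is malformed."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []  # not JSON at all: emit nothing
    if not isinstance(data, list):
        return []  # JSON, but not an array
    for item in data:
        if not isinstance(item, dict) or not REQUIRED_KEYS <= item.keys():
            return []  # one bad item invalidates the whole response
    return data


ok = parse_claims('[{"statement_id": "s01", "verdict": "true", "evidence": "line 2"}]')
prose = parse_claims("Sure! Here are my verdicts...")
wrong_shape = parse_claims('{"statement_id": "s01"}')
missing_key = parse_claims('[{"statement_id": "s01", "verdict": "true"}]')
print(len(ok), prose, wrong_shape, missing_key)
```

Returning an empty list rather than raising keeps a single bad model response from stalling the event loop; the agent simply makes no progress on that step.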
The byzantine joiner stays **mocked** even in `--live` mode. Producing deterministic equivocation from an LLM would take multiple API calls per turn (one per peer subset) and would still yield unreliable results, and the demo's narrative is cleaner with a scripted byzantine. The honest agents are the real LLM participants.
Default model: `claude-haiku-4-5-20251001` (fast, cheap, plenty of capability for structured-output adjudication). Override with `--model`.
The live path is intentionally not in CI — it costs API credits and has nondeterministic outputs.
---
## Test taxonomy
| Test file | What it asserts |
|---|---|
| `tests/test_claim.py` | Claim dataclass validation + JSON round-trip |
| `tests/test_boundary.py` | Boundary state machine (closed → open) |
| `tests/test_mothership.py` | Per-agent graph ownership; broadcast vs. targeted delivery; gossip propagation; no privileged attribute |
| `tests/test_alarm.py` | Decentralized detection; every honest agent finds the same mutation; canonical witness pairs |
| `tests/test_vote.py` | AlarmClaim round-trip; quorum formulas; tally determinism |
| `tests/test_proof.py` | ProofDocument schema; JSON round-trip; tampered-witness/below-quorum rejection |
| `tests/test_demo_fact_check.py` | End-to-end scenario produces one ratified alarm; CLI output contains all six phases |
| `tests/test_live_agent.py` | LiveClaudeAgent parsing (fake Anthropic client; no real API calls) |
| **`tests/test_no_chokepoint.py`** | **Centerpiece: every honest agent's ratified set is byte-identical; no privileged attributes exist** |
| **`tests/test_async_quiescence.py`** | **Centerpiece: no clock; `try_emit()` takes no arg; `AlarmClaim.detected_at_turn` doesn't exist; two runs converge identically** |
The two centerpiece files are sentinels — if you ever re-introduce a chokepoint or a wall clock, one of those tests should fail.
---
## What's deliberately out of scope
- **CrisisViz integration.** The visualizer's data file (`crisis_data.json`) is produced by `crisis.demo.Simulation`, not by `crisis_agents`. A future CrisisViz upgrade could absorb agent-coordination runs (multi-DAG rendering, gossip arrows, alarm-vote convergence) — but that's a separate effort, sketched in the parent README.
- **Real TCP gossip.** In-process function calls only. Lifting to multi-process requires plugging into `crisis.gossip.GossipServer` — independent work.
- **Cryptographic signatures beyond what Crisis already provides.** Crisis already provides nonces + message-digest chaining + PoW. Agent identity is `digest(name)[:32]`. We don't add a separate identity-PKI.
- **Sybil resistance.** Threat model is "a few byzantine joiners with valid PoW", not "an attacker spawning unlimited identities." Sybil defense is what the PoW weight in Crisis is *for*; it's not the agent layer's concern.
- **Byzantine false accusations.** A byzantine could emit a false AlarmClaim against an honest agent. The quorum mechanism prevents ratification (honest agents won't second the false claim, so it stays at 1-of-N). Second-order detection of false accusers isn't in this PoC.
---
## Pointers
- Parent README: [`../../README.md`](../../README.md)
- Install guide: [`../../INSTALL.md`](../../INSTALL.md)
- The paper this is all based on: [`../../Crisis.mirco-richter-2019.pdf`](../../Crisis.mirco-richter-2019.pdf)