mirror of
https://github.com/saymrwulf/crisis.git
synced 2026-05-14 20:37:54 +00:00
Update all documentation for the crisis_agents layer + async refactor
Three sweeping additions and one new file, reflecting how the project
has grown:
* Parent `README.md` rewritten. The architecture mermaid now shows
`crisis_agents` as a third sibling layer on top of the pure
protocol algorithms, alongside the CrisisNode TCP runtime and the
SimulatedNode in-process recorder. A fourth audience-shaped quick
start (🤖 "run the AI-agent coordination demo") joins the
protocol-pytest, simulation-CLI, and visualizer entries. The
repository-layout tree expands to enumerate `src/crisis_agents/`'s
modules. Test count corrected (~170).
* New `src/crisis_agents/README.md`. Comprehensive package
documentation:
- threat model + what's out of scope
- the two principles enforced by tests: no chokepoint, no clock
- mental-model mermaid (closed phase → boundary opens → async
loop → quorum vote → multi-signer proof)
- six-phase walkthrough matching the CLI output
- module-by-module reference table
- reuse map from `src/crisis/` (Message, LamportGraph,
find_mutations, ProofOfWorkWeight, etc.)
- build/run/test instructions including the `--live` Claude path
- quorum-threshold formula in LaTeX: ⌈2N/3⌉
- test taxonomy with the two sentinel files
(test_no_chokepoint, test_async_quiescence) highlighted
* `INSTALL.md` extended. New Section 4 covers running the
`crisis-agents demo`, both mocked-deterministic and `--live` with
real Claude sub-agents. Anthropic SDK shown as optional `[live]`
extras. Old sections renumbered (Section 5 → Section 6 for Swift,
6 → 7 for Troubleshooting). Two new troubleshooting entries for
live-mode failures.
* `CrisisViz/HANDOFF.md` gets a new Section 0. Brief notice that a
sibling Python sub-project (`crisis_agents`) now exists, what it
does, and — most importantly — that it doesn't share code with
CrisisViz: refactoring one cannot break the other. Cross-link to
the crisis_agents README so a future Swift-side agent has the
pointer without having to discover it via grep.
Source-of-truth corrections in the parent README:
- the "three audiences" framing becomes four
- the layout tree now lists `src/crisis_agents/`
- the architecture diagram explicitly marks the agent layer as
"decentralized, asynchronous" (the two principles the recent
refactors enforce)
CrisisViz code: still untouched by all this. Only its HANDOFF doc
gets a heads-up paragraph.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 0976239ebd
commit 54aae1a4dd
4 changed files with 411 additions and 41 deletions
|
|
@ -6,11 +6,22 @@ Last updated: **2026-05-14**.
|
|||
|
||||
---
|
||||
|
||||
## 0. Sibling project notice — `crisis_agents` exists
|
||||
|
||||
Since this file was last meaningfully updated, a sibling Python sub-project has landed: **`src/crisis_agents/`** — a coordination layer that uses the same `crisis` protocol substrate for a fundamentally different consumer (AI agent teams, not visualization). It produces `proof_*.json` documents instead of `crisis_data.json`.
|
||||
|
||||
**Important for CrisisViz work:** the two sub-projects don't share code. `crisis_agents` does not produce data CrisisViz reads, and CrisisViz does not consume anything from `crisis_agents`. Refactoring either one cannot break the other.
|
||||
|
||||
If a future curriculum chapter wants to visualize agent coordination (decentralized detection, gossip propagation, multi-detector alarm convergence), that's a substantial new effort — see the parent README's "future CrisisViz story" note. For now, **focus on the chapter and testbed work and treat `crisis_agents` as an unrelated package living in the same repo**.
|
||||
|
||||
Reference: **[`../src/crisis_agents/README.md`](../src/crisis_agents/README.md)**.
|
||||
|
||||
---
|
||||
|
||||
## 1. Current state — what's shipped
|
||||
|
||||
- **All 10 chapters migrated** to the serial-beat timeline pattern (pure `state(at: t) -> WorldState`, scrubbable −16× to +16×, beat-bound narration).
|
||||
- **Testbed green** at the last clean run: 38/38 invariants pass, 0 source-audit errors, 36/36 MP4 clips written, 279 PNGs sane, 12/12 resize cases pass.
|
||||
- **`origin/master` at `fb9bc9c`** — working tree was clean before this documentation/testing pass. After this pass: README.md/INSTALL.md/LICENSE/CrisisViz README&HANDOFF/package-dmg.sh/Python tests landed.
|
||||
- **Bundle pipeline works.** `./bundle.sh` produces a working `CrisisViz.app`. `./package-dmg.sh` produces a working `CrisisViz.dmg` (ad-hoc signed; first-open Gatekeeper warning, right-click → Open).
|
||||
|
||||
If you can't run the testbed and confirm it's green, **stop and fix that first** before making curriculum changes.
|
||||
|
|
|
|||
56
INSTALL.md
|
|
@ -1,6 +1,6 @@
|
|||
# INSTALL — Crisis & CrisisViz
|
||||
# INSTALL — Crisis, CrisisViz, and crisis_agents
|
||||
|
||||
End-to-end setup on a fresh macOS box, from a blank checkout to a running visualizer. Follow top-to-bottom.
|
||||
End-to-end setup on a fresh macOS box: from blank checkout to running protocol tests, the agent-coordination demo, and the SwiftUI visualizer. Follow top-to-bottom.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -52,7 +52,7 @@ Run the unit tests to verify the algorithm implementations:
|
|||
pytest -q
|
||||
```
|
||||
|
||||
Expected: all tests pass in under a second. If any fail, stop and investigate before continuing — the visualizer's data pipeline depends on these.
|
||||
Expected: ~170 tests, all green in under a second. If any fail, stop and investigate before continuing — both the visualizer's data pipeline and the agent-coordination layer depend on these.
|
||||
|
||||
Try a deterministic in-process simulation:
|
||||
|
||||
|
|
@ -64,7 +64,39 @@ You should see consensus rounds advance and a total order emerge.
|
|||
|
||||
---
|
||||
|
||||
## 4. Regenerate `crisis_data.json` (optional)
|
||||
## 4. Run the AI-agent coordination demo
|
||||
|
||||
The `crisis-agents` CLI walks a six-phase scenario end-to-end: a closed honest team, a byzantine joiner who equivocates on a fact-check statement, an asynchronous gossip + detection event loop, quorum-ratified alarm, and a multi-signer proof JSON.
|
||||
|
||||
### 4a. Mocked agents (deterministic, no API costs)
|
||||
|
||||
```sh
crisis-agents demo --out-dir /tmp/crisis_demo
```
|
||||
|
||||
Output ends with `proof_<accused>_<statement>.json` in `--out-dir`. To self-verify a proof:
|
||||
|
||||
```sh
crisis-agents verify /tmp/crisis_demo/proof_*.json
```
|
||||
|
||||
### 4b. Real Claude sub-agents (`--live`)
|
||||
|
||||
Install the optional Anthropic SDK extras:
|
||||
|
||||
```sh
pip install -e ".[live]"
export ANTHROPIC_API_KEY=sk-ant-...
crisis-agents demo --live --model claude-haiku-4-5-20251001
```
|
||||
|
||||
This swaps the three scripted honest agents for `LiveClaudeAgent` instances backed by real Anthropic Messages API calls. The byzantine stays scripted so the equivocation is reliably reproducible. Costs API credits; output is non-deterministic.
|
||||
|
||||
Architecture reference: **[src/crisis_agents/README.md](src/crisis_agents/README.md)**.
|
||||
|
||||
---
|
||||
|
||||
## 5. Regenerate `crisis_data.json` (optional)
|
||||
|
||||
The repo ships with a pre-recorded `crisis_data.json` at the root and a bundled copy in `CrisisViz/Sources/CrisisViz/`. Regenerate when you change the protocol code or want a different simulation:
|
||||
|
||||
|
|
@ -77,9 +109,9 @@ The defaults (6 honest + 1 byzantine, 80 steps) produce full convergence from st
|
|||
|
||||
---
|
||||
|
||||
## 5. Swift side — the visualizer
|
||||
## 6. Swift side — the visualizer
|
||||
|
||||
### 5a. Quick dev loop
|
||||
### 6a. Quick dev loop
|
||||
|
||||
```sh
|
||||
cd CrisisViz
|
||||
|
|
@ -89,7 +121,7 @@ swift run CrisisViz # launches the dev binary
|
|||
|
||||
Note: the dev binary does not have a Dock icon and lives in `.build/`. For a real `.app` use `bundle.sh`.
|
||||
|
||||
### 5b. Build the `.app` bundle
|
||||
### 6b. Build the `.app` bundle
|
||||
|
||||
```sh
|
||||
./bundle.sh # build + assemble CrisisViz.app + open
|
||||
|
|
@ -98,7 +130,7 @@ Note: the dev binary does not have a Dock icon and lives in `.build/`. For a rea
|
|||
|
||||
`CrisisViz.app` is created in the current directory. Open it from Finder or the Dock to get the full activation-policy behavior.
|
||||
|
||||
### 5c. Build a DMG installer
|
||||
### 6c. Build a DMG installer
|
||||
|
||||
```sh
|
||||
./package-dmg.sh # produces CrisisViz.dmg in the current directory
|
||||
|
|
@ -112,7 +144,7 @@ Distribution flow for a new machine:
|
|||
3. Drag `CrisisViz` onto the `Applications` symlink.
|
||||
4. Eject the DMG; launch from `/Applications` (right-click → Open the first time).
|
||||
|
||||
### 5d. Run the QA testbed
|
||||
### 6d. Run the QA testbed
|
||||
|
||||
```sh
|
||||
swift run CrisisViz --testbed
|
||||
|
|
@ -130,7 +162,7 @@ All five should be green before shipping changes.
|
|||
|
||||
---
|
||||
|
||||
## 6. Troubleshooting
|
||||
## 7. Troubleshooting
|
||||
|
||||
**`swift build` fails with “unsupported deployment target”.** Your Xcode does not provide the macOS 26 SDK. Update Xcode to ≥17, or downgrade `Package.swift` to your installed SDK (not recommended — visual features depend on macOS 26 Liquid Glass APIs).
|
||||
|
||||
|
|
@ -141,3 +173,7 @@ All five should be green before shipping changes.
|
|||
**`pytest` fails on `ModuleNotFoundError: crisis`.** Activate the venv (`source .venv/bin/activate`) and reinstall with `pip install -e ".[dev]"`. The `-e` (editable) flag is what makes `import crisis` resolve to `src/crisis/`.
|
||||
|
||||
**The visualizer freezes mid-chapter / animations are stuck.** You're running the unbundled `swift-run` binary while the Dock icon launches `CrisisViz.app`. Rebuild the bundle: `./bundle.sh --no-launch && open CrisisViz.app`.
|
||||
|
||||
**`crisis-agents demo --live` fails with `live mode requires the anthropic SDK`.** Install the optional extras: `pip install -e ".[live]"`. The mocked path doesn't need this dependency.
|
||||
|
||||
**`crisis-agents demo --live` fails with an error mentioning `ANTHROPIC_API_KEY`.** Export the key before running: `export ANTHROPIC_API_KEY=sk-ant-...`. The SDK reads it from the environment.
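
A small preflight check can surface both live-mode failures above before any API call is made. This is only a sketch and not part of the `crisis-agents` CLI: `live_mode_problems` is a hypothetical helper, and it assumes nothing beyond what is stated above (the SDK is importable as `anthropic`, and the key is read from `ANTHROPIC_API_KEY`).

```python
import importlib.util
import os


def live_mode_problems(env=None):
    """Return human-readable reasons why live mode would likely fail."""
    env = os.environ if env is None else env
    problems = []
    # The optional [live] extras install the SDK under the module name `anthropic`.
    if importlib.util.find_spec("anthropic") is None:
        problems.append('anthropic SDK missing: pip install -e ".[live]"')
    # The SDK reads the key from the environment.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in live_mode_problems():
        print("live mode blocked:", problem)
```

An empty list means neither of the two known failure modes applies; it does not guarantee that the key itself is valid.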
|
||||
|
|
|
|||
111
README.md
|
|
@ -4,11 +4,12 @@ A proof-of-concept and educational artifact for Mirco Richter's [_Crisis_ paper]
|
|||
|
||||
This repository contains:
|
||||
|
||||
- a **Python implementation** of the protocol (`src/`, `tests/`),
|
||||
- a **Python implementation** of the protocol (`src/crisis/`, `tests/`),
|
||||
- an **event recorder** that exports a deterministic simulation run to JSON,
|
||||
- **CrisisViz** — a native macOS / SwiftUI curriculum visualizer that walks the protocol end-to-end across ten chapters: cast intro, gossip mechanics, partition, round derivation, virtual voting, leader election, total order, the data-availability problem, erasure-coded recovery, and Byzantine fork detection.
|
||||
- **CrisisViz** — a native macOS / SwiftUI curriculum visualizer that walks the protocol end-to-end across ten chapters,
|
||||
- **crisis_agents** — a coordination layer that lifts the protocol from "consensus between machines" to "consensus between AI agents," with a decentralized async event-driven engine and quorum-ratified byzantine alarms.
|
||||
|
||||
Everything in the visualizer is in extreme slow motion and serialized for didactic clarity. A signed speed slider scrubs the chapter forward and backward at any rate from $-16\times$ to $+16\times$; narration is bound to whichever beat the playhead is on.
|
||||
Everything in the visualizer is in extreme slow motion and serialized for didactic clarity. A signed speed slider scrubs each chapter forward and backward at any rate from $-16\times$ to $+16\times$; narration is bound to whichever beat the playhead is on.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -32,6 +33,7 @@ flowchart TD
|
|||
|
||||
Algos --> RealRT
|
||||
Algos --> SimRT
|
||||
Algos --> AgentLayer
|
||||
|
||||
subgraph RealRT["🌐 <b>Real runtime — <code>node.py</code> + <code>gossip.py</code></b><br/><i>scalable, deployable</i>"]
|
||||
Node["CrisisNode<br/>asyncio · TCP push/pull gossip<br/>3 concurrent loops<br/>CLI: <code>crisis-node</code>"]
|
||||
|
|
@ -43,12 +45,20 @@ flowchart TD
|
|||
SimNode --- SimCtl
|
||||
end
|
||||
|
||||
subgraph AgentLayer["🤖 <b>Crisis-Agents — <code>src/crisis_agents/</code></b><br/><i>decentralized, asynchronous</i>"]
|
||||
Agent["CrisisAgent ×N<br/>owns own LamportGraph<br/>emit · receive · gossip · detect"]
|
||||
Mom["Mothership<br/>bootstrap + event-loop driver<br/>no clock · no privileged state<br/>CLI: <code>crisis-agents</code>"]
|
||||
Agent --- Mom
|
||||
end
|
||||
|
||||
SimRT --> Rec
|
||||
Rec["📼 <b>Recorder — <code>recorder.py</code></b><br/>instruments every algorithm call<br/>captures events + per-step snapshots"]
|
||||
Rec --> Export
|
||||
Export["📦 <b>JSON exporter — <code>export_json.py</code></b><br/>writes <code>crisis_data.json</code>"]
|
||||
Export --> Viz
|
||||
|
||||
AgentLayer --> ProofJSON["🧾 <b>proof_*.json</b><br/>multi-signer byzantine proof<br/>schema_version=2"]
|
||||
|
||||
subgraph Viz["🎬 <b>CrisisViz — native macOS / SwiftUI</b>"]
|
||||
Player["Keynote-style player<br/>10 chapters · ~18 min @ 1×<br/>scrubbable −16× to +16×"]
|
||||
Testbed["Testbed harness<br/>invariants · source audit<br/>PNG sweep · 36 MP4 clips"]
|
||||
|
|
@ -58,53 +68,77 @@ flowchart TD
|
|||
classDef pure fill:#eee8d5,stroke:#586e75,color:#073642
|
||||
classDef real fill:#fce5cd,stroke:#cc4125,color:#660000
|
||||
classDef sim fill:#d9ead3,stroke:#38761d,color:#0b3d0b
|
||||
classDef agents fill:#fff2cc,stroke:#bf9000,color:#3d2e00
|
||||
classDef rec fill:#cfe2f3,stroke:#2c5f8f,color:#062b4d
|
||||
classDef viz fill:#ead1dc,stroke:#741b47,color:#3d0a26
|
||||
classDef proof fill:#fce5e8,stroke:#a64d59,color:#3d0014
|
||||
class Paper paper
|
||||
class Algos pure
|
||||
class RealRT real
|
||||
class SimRT sim
|
||||
class AgentLayer agents
|
||||
class Rec,Export rec
|
||||
class Viz viz
|
||||
class ProofJSON proof
|
||||
```
|
||||
|
||||
**Key architectural fact** — the recording pipeline that feeds CrisisViz only exercises the **`SimulatedNode`** path (in-process, deterministic, in-memory message passing). The **`CrisisNode`** TCP runtime is a separately developed PoC of how a real network deployment would look; it is _not_ what produces `crisis_data.json`. The two runtimes are siblings, not layers.
|
||||
**Three independent consumers of the protocol.** `src/crisis/` provides the pure algorithms (Lamport graphs, virtual voting, total order, mutation detection). Three sibling layers sit on top:
|
||||
|
||||
- **`CrisisNode`** — a deployable distributed runtime (TCP gossip, three concurrent asyncio loops). Has no consumers in this repo; meant as a reference for how a real network deployment would look.
|
||||
- **`SimulatedNode`** — an in-process deterministic simulator whose recording becomes `crisis_data.json`, the file CrisisViz visualizes.
|
||||
- **`crisis_agents`** — agent-coordination layer. Each AI agent participates as a Crisis node; the network catches byzantine equivocation through decentralized detection + quorum voting. The engine is asynchronous and event-driven — no global clock, no privileged observer.
|
||||
|
||||
The three are **siblings, not layers**: refactoring one doesn't break the others. CrisisViz and crisis_agents don't know each other exists.
|
||||
|
||||
---
|
||||
|
||||
## Repository layout
|
||||
|
||||
```
|
||||
crisis/ ← git root
|
||||
├── Crisis.mirco-richter-2019.pdf the paper
|
||||
├── README.md this file
|
||||
├── INSTALL.md fresh-macOS install guide
|
||||
├── LICENSE MIT (code only; paper is CC-BY-4.0)
|
||||
├── pyproject.toml Python ≥3.11, networkx, pytest
|
||||
├── crisis_data.json simulation export (source of truth)
|
||||
crisis/ ← git root
|
||||
├── Crisis.mirco-richter-2019.pdf the paper
|
||||
├── README.md this file
|
||||
├── INSTALL.md fresh-macOS install guide
|
||||
├── LICENSE MIT (code only; paper is CC-BY-4.0)
|
||||
├── pyproject.toml Python ≥3.11, networkx, pytest
|
||||
├── crisis_data.json simulation export (source of truth)
|
||||
│
|
||||
├── src/crisis/ ── PROTOCOL PoC (Python) ──
|
||||
│ ├── crypto.py, message.py random-oracle hash + Message/Vertex
|
||||
│ ├── graph.py, weight.py, rounds.py Lamport DAG + PoW weight + round derivation
|
||||
│ ├── voting.py, order.py BBA virtual voting + total order
|
||||
│ ├── gossip.py, node.py real TCP runtime (CrisisNode)
|
||||
│ ├── demo.py in-process simulation harness
|
||||
│ ├── recorder.py event instrumentation
|
||||
│ └── export_json.py JSON exporter for CrisisViz
|
||||
├── tests/ pytest suite
|
||||
├── src/crisis/ ── PROTOCOL PoC (Python) ──
|
||||
│ ├── crypto.py, message.py random-oracle hash + Message/Vertex
|
||||
│ ├── graph.py, weight.py, rounds.py Lamport DAG + PoW weight + round derivation
|
||||
│ ├── voting.py, order.py BBA virtual voting + total order
|
||||
│ ├── gossip.py, node.py real TCP runtime (CrisisNode)
|
||||
│ ├── demo.py in-process simulation harness
|
||||
│ ├── recorder.py event instrumentation
|
||||
│ └── export_json.py JSON exporter for CrisisViz
|
||||
│
|
||||
└── CrisisViz/ ── VISUALIZER (Swift / macOS 26) ──
|
||||
├── src/crisis_agents/ ── AGENT COORDINATION (Python) ──
|
||||
│ ├── README.md architecture & walkthrough
|
||||
│ ├── agent.py CrisisAgent + MockAgent + MockByzantineAgent
|
||||
│ ├── live_agent.py LiveClaudeAgent (Anthropic SDK)
|
||||
│ ├── boundary.py trust-set + open() trigger
|
||||
│ ├── mothership.py bootstrap + async event-loop driver
|
||||
│ ├── claim.py ClaimMessage payload
|
||||
│ ├── alarm.py decentralized detection
|
||||
│ ├── vote.py AlarmClaim + quorum tally
|
||||
│ ├── proof.py multi-signer ProofDocument
|
||||
│ ├── cli.py crisis-agents CLI entry point
|
||||
│ └── scenarios/fact_check.py the canonical demo
|
||||
│
|
||||
├── tests/ pytest suite (170 tests, ~0.8s)
|
||||
│
|
||||
└── CrisisViz/ ── VISUALIZER (Swift / macOS 26) ──
|
||||
├── Package.swift, bundle.sh, package-dmg.sh
|
||||
├── Sources/CrisisViz/ App, Engine, Model, Chapters, Views, Glass, Testbed, Canvas
|
||||
├── README.md Swift-side human guide
|
||||
└── HANDOFF.md agent-to-agent engineering log
|
||||
├── Sources/CrisisViz/ App, Engine, Model, Chapters, Views, Glass, Testbed, Canvas
|
||||
├── README.md Swift-side human guide
|
||||
└── HANDOFF.md agent-to-agent engineering log
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick start
|
||||
|
||||
There are three audiences. Pick the one that matches what you want to do.
|
||||
Four audiences. Pick the one that matches what you want to do.
|
||||
|
||||
### 🧮 Verify the protocol — pytest
|
||||
|
||||
|
|
@ -114,9 +148,9 @@ source .venv/bin/activate # set up per INSTALL.md if first time
|
|||
pytest -q
|
||||
```
|
||||
|
||||
Runs the algorithm unit tests (crypto, graph, rounds, weight, message, order, voting, recorder, simulation). Should be green in under a second.
|
||||
Runs all 170 tests across the protocol algorithms and the crisis_agents layer. Should be green in under a second.
|
||||
|
||||
### 🧪 Run a deterministic simulation — Python CLI
|
||||
### 🧪 Run a deterministic protocol simulation — Python CLI
|
||||
|
||||
```sh
|
||||
python -m crisis.demo --nodes 4 --byzantine 1 --rounds 10
|
||||
|
|
@ -129,7 +163,23 @@ python -m crisis.export_json --steps 80 -o crisis_data.json
|
|||
cp crisis_data.json CrisisViz/Sources/CrisisViz/crisis_data.json
|
||||
```
|
||||
|
||||
### 🎬 Watch the visualizer — Swift / macOS
|
||||
### 🤖 Run the AI-agent coordination demo — Python CLI
|
||||
|
||||
```sh
crisis-agents demo
```
|
||||
|
||||
Walks a six-phase scenario: a closed honest team, a byzantine joiner who equivocates on a fact-check statement, an asynchronous gossip + detection event loop, and a quorum-ratified proof. Output ends with a `proof_*.json` document that any third party can self-verify. See **[src/crisis_agents/README.md](src/crisis_agents/README.md)** for the architecture.
|
||||
|
||||
For real Claude sub-agents instead of scripted mocks:
|
||||
|
||||
```sh
pip install -e ".[live]"   # adds anthropic SDK
export ANTHROPIC_API_KEY=...
crisis-agents demo --live
```
|
||||
|
||||
### 🎬 Watch the protocol visualizer — Swift / macOS
|
||||
|
||||
```sh
|
||||
cd CrisisViz
|
||||
|
|
@ -144,9 +194,10 @@ Then arrow keys ←/→ to navigate, **Space** to play/pause, the bottom slider
|
|||
|
||||
## Where to read next
|
||||
|
||||
- **[INSTALL.md](INSTALL.md)** — clone-to-running on a fresh macOS box. Prerequisites, Python venv setup, Swift toolchain, regenerating sim data, troubleshooting.
|
||||
- **[INSTALL.md](INSTALL.md)** — clone-to-running on a fresh macOS box. Prerequisites, Python venv setup, Swift toolchain, regenerating sim data, running the agents demo, troubleshooting.
|
||||
- **[src/crisis_agents/README.md](src/crisis_agents/README.md)** — the AI-agent coordination layer: architecture, six-phase walkthrough, decentralization principles, async event loop, quorum formula, live Claude mode, proof JSON shape.
|
||||
- **[CrisisViz/README.md](CrisisViz/README.md)** — Swift-side guide: serial-timeline pattern, testbed outputs, controls, cast convention.
|
||||
- **[CrisisViz/HANDOFF.md](CrisisViz/HANDOFF.md)** — engineering log for the next coding agent: current state, architecture pointers, hard-won rules.
|
||||
- **[CrisisViz/HANDOFF.md](CrisisViz/HANDOFF.md)** — engineering log for the next coding agent.
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
272
src/crisis_agents/README.md
Normal file
|
|
@ -0,0 +1,272 @@
|
|||
# crisis_agents — coordination layer for AI agent teams
|
||||
|
||||
A Python package that lifts the Crisis consensus protocol from "consensus between machines" to "consensus between AI agents." Each participant is a Crisis node with its own Lamport graph; the network catches byzantine equivocation via decentralized detection and quorum-ratified alarms. The engine is **asynchronous** and **event-driven** — no global clock, no privileged observer.
|
||||
|
||||
> If you're new to this repo, start at the [parent README](../../README.md). This document is the architectural reference for the agent layer.
|
||||
|
||||
---
|
||||
|
||||
## Threat model
|
||||
|
||||
The setting is a small team of AI agents (Claude sub-agents, in our live demo) coordinated by an orchestrator we call the **mothership**.
|
||||
|
||||
- **Normal life** — the team is closed. Agents talk freely with each other and the mothership. No Crisis layer; the conversation is the medium.
|
||||
- **Boundary opens** — an external agent of unknown trust joins. Its internal motivation may diverge from the team's task. It may equivocate — telling one peer one thing while telling another peer the opposite — to mislead the network.
|
||||
- **Crisis to the rescue** — from the moment the boundary opens, every claim is wrapped into a Crisis Message with the emitting agent's stable process_id and a PoW nonce. The per-agent Lamport DAG is the immutable, replayable ledger. Mutation detection (built on `LamportGraph.find_mutations` from the protocol layer) catches equivocation. Every honest agent who has gossiped enough to see both contradictory variants raises an alarm. A quorum of independent alarms produces a network-ratified proof of malfeasance.
|
||||
|
||||
What's deliberately **not** in scope (this is a PoC):
|
||||
|
||||
- Visualization. The CrisisViz application is a separate effort that visualizes the protocol PoC; visualizing an agent-coordination run would require a substantial new chapter set there.
|
||||
- Real TCP gossip. Agents talk via in-process function calls in the mothership process. The existing `crisis.gossip.GossipServer` shows how it would look across sockets.
|
||||
- Detection of *false claims that aren't equivocations*. An agent who consistently lies but never equivocates is out-voted, not "caught." Catching it would require a ground-truth oracle, which is application-layer, not protocol-layer.
|
||||
|
||||
---
|
||||
|
||||
## Two architectural principles, enforced by tests
|
||||
|
||||
### 1. No chokepoint
|
||||
|
||||
Every honest agent maintains **its own** `LamportGraph`. The mothership does NOT hold a privileged graph of the whole network. Detection runs on each agent independently; alarms are emitted by each detector independently; proofs are signed by a quorum of detectors.
|
||||
|
||||
The regression-test file `tests/test_no_chokepoint.py` asserts:
|
||||
|
||||
- After the full lifecycle, every honest agent's *ratified-alarms set* is byte-identical to every other honest agent's.
|
||||
- The mothership does not expose `all_graphs`, `graph_of`, `_graphs`, or any other privileged collection.
|
||||
- A single byzantine accuser alone cannot ratify an alarm.
|
||||
|
||||
### 2. No clock
|
||||
|
||||
Crisis is designed for asynchronous P2P networks. Any synchronicity in the protocol is *virtual*: it is derived inside the consensus algorithm from the causal structure of the Lamport graph, not imposed from outside by a coordinator.
|
||||
|
||||
The driver loop is **event-driven and quiescence-terminated**, not turn-counted:
|
||||
|
||||
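```python
def run_until_quiescent(max_steps=200):
    progress = True  # prime the loop; every pass has to re-earn it
    steps = 0
    while progress and steps < max_steps:  # assumed: max_steps is a safety cap, not a clock
        progress = False
        steps += 1
        # 1. Any agent has something to emit? Let them speak.
        # 2. Any gossip pair has new info? Exchange.
        # 3. Any agent has detected a new mutation? Emit AlarmClaim.
        # Each of the three sets progress = True whenever it does work.
```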
|
||||
|
||||
`tests/test_async_quiescence.py` asserts:
|
||||
|
||||
- `CrisisAgent.try_emit()` takes no `turn` argument.
|
||||
- `AlarmClaim` has no `detected_at_turn` field (the wall-clock-implying name); only `emitted_at_step`, which is a per-agent local sequence number.
|
||||
- Two runs of the same scenario produce identical end states (determinism — no hidden wall-time dependence).
|
||||
- The loop alone (no manual phase orchestration) ratifies an alarm.
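
The first two assertions are structural and can be checked with the standard library alone. The sketch below shows the shape such a check might take. `SketchAgent` and `SketchAlarmClaim` are hypothetical stand-ins, not the real `CrisisAgent` and `AlarmClaim`, and the actual test file may be written differently.

```python
import dataclasses
import inspect


class SketchAgent:
    """Stand-in for an agent; hypothetical, for illustration only."""

    def try_emit(self):
        return None


@dataclasses.dataclass(frozen=True)
class SketchAlarmClaim:
    """Stand-in payload carrying a per-agent local sequence number."""

    accused: str
    statement_id: str
    emitted_at_step: int


def takes_no_turn_argument(method):
    # A clock-free emitter must not accept a coordinator-supplied turn.
    return "turn" not in inspect.signature(method).parameters


def field_names(cls):
    return {f.name for f in dataclasses.fields(cls)}


assert takes_no_turn_argument(SketchAgent.try_emit)
assert "detected_at_turn" not in field_names(SketchAlarmClaim)
assert "emitted_at_step" in field_names(SketchAlarmClaim)
```

Checking signatures and field names rather than behavior makes the test fail as soon as a turn counter is reintroduced, even if no scenario exercises it yet.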
|
||||
|
||||
---
|
||||
|
||||
## The mental model
|
||||
|
||||
```mermaid
flowchart TB
    subgraph Closed["🟢 Phase 1 — closed team (no Crisis)"]
        Mom1["mothership"]
        A1["agent_α"]
        B1["agent_β"]
        C1["agent_γ"]
        Mom1 <--> A1
        Mom1 <--> B1
        Mom1 <--> C1
        A1 <--> B1
        B1 <--> C1
    end

    Boundary{"<b>BOUNDARY OPENS</b><br/>agent_δ joins;<br/>trust unknown"}
    Closed --> Boundary --> Open

    subgraph Open["🟡 Phase 2 — Crisis active, async event loop"]
        Mom2["mothership<br/>(bootstrap + driver only)"]
        A2["agent_α<br/>own LamportGraph<br/>detect · alarm"]
        B2["agent_β<br/>own LamportGraph<br/>detect · alarm"]
        C2["agent_γ<br/>own LamportGraph<br/>detect · alarm"]
        D2["agent_δ ⚠<br/>own LamportGraph<br/><i>byzantine</i>"]

        D2 -. variant_A .-> A2
        D2 -. variant_A .-> C2
        D2 -. variant_B .-> B2
        A2 <-. gossip .-> B2
        B2 <-. gossip .-> C2
        A2 <-. gossip .-> C2
        A2 -- alarm --> Mom2
        B2 -- alarm --> Mom2
        C2 -- alarm --> Mom2
    end

    Open --> Quorum
    Quorum{"<b>QUORUM VOTE</b><br/>≥ ⌈2N/3⌉ honest signers<br/>independently agree"}
    Quorum --> Proof
    Proof["<b>📜 Multi-signer proof</b><br/>signed JSON; replayable<br/>schema_version=2"]

    classDef closed fill:#d9ead3,stroke:#38761d
    classDef boundary fill:#fff2cc,stroke:#bf9000
    classDef open fill:#fce5cd,stroke:#cc4125
    classDef quorum fill:#cfe2f3,stroke:#2c5f8f
    classDef proof fill:#ead1dc,stroke:#741b47
    class Closed closed
    class Boundary,Quorum boundary
    class Open open
    class Proof proof
```
|
||||
|
||||
---
|
||||
|
||||
## Six-phase walkthrough (the `crisis-agents demo`)
|
||||
|
||||
The canonical scenario is `scenarios/fact_check.py`: three honest agents and one byzantine adjudicate six factual statements about a small reference document.
|
||||
|
||||
### Phase 1 — closed team, no Crisis
|
||||
The mothership drives `run_closed_phase()` until quiescent. Each honest agent emits its six fact-check claims via plain function calls — appended to a flat log. Per-agent LamportGraphs aren't yet allocated. **No Crisis overhead.**
|
||||
|
||||
### Phase 2 — boundary opens
|
||||
`mothership.open_boundary(agent_delta)`. Atomically: δ is added to the trust set, a fresh `LamportGraph` is created on every agent (including δ), and `boundary.is_open` flips to `True`.
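
The three effects of opening the boundary can be pictured with a minimal sketch. Everything here is a stand-in: `SketchBoundary` and this free-standing `open_boundary` are hypothetical, an empty list takes the place of a `LamportGraph`, and "atomic" only means that the new state is built first and installed afterwards in a single-threaded setting.

```python
from dataclasses import dataclass, field


@dataclass
class SketchBoundary:
    """Trust set plus an open flag (hypothetical stand-in for `Boundary`)."""

    trusted: set = field(default_factory=set)
    is_open: bool = False


def open_boundary(boundary, graphs, newcomer):
    """Admit `newcomer`, give every member a fresh graph, then flip the flag."""
    members = boundary.trusted | {newcomer}
    # Build the new state off to the side so a failure leaves the old state intact.
    fresh_graphs = {name: [] for name in members}  # [] stands in for a LamportGraph
    boundary.trusted = members
    graphs.clear()
    graphs.update(fresh_graphs)
    boundary.is_open = True


boundary = SketchBoundary(trusted={"alpha", "beta", "gamma"})
graphs = {}
open_boundary(boundary, graphs, "delta")
```

After the call, the newcomer is trusted, every member (including the newcomer) has an empty graph, and the flag is set.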
|
||||
|
||||
### Phase 3 — asynchronous event loop
|
||||
`mothership.run_until_quiescent()`. The driver cycles through:
|
||||
|
||||
1. **Emission** — `agent.try_emit()` is called on each agent. Returned `AgentTurn`s are first-hop routed to their target subset (or broadcast). The byzantine emits an intro (broadcast), then a pair of contradictory variants (split delivery).
|
||||
2. **Gossip** — every ordered pair `(sender, receiver)` exchanges what `sender` has that `receiver` doesn't. Eventually-consistent propagation.
|
||||
3. **Alarm emission** — `agent.pending_alarm_claims()` runs `LamportGraph.find_mutations(...)` on each agent's own graph and produces `AlarmClaim`s for any newly observed equivocation. AlarmClaims are wrapped as Crisis Messages and broadcast.
|
||||
|
||||
The loop exits when none of these three concerns makes progress. The returned `QuiescenceReport` carries `steps`, `emissions`, `gossip_transfers`, `alarm_claims_emitted`, and `reached_quiescence`.
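
The gossip step and the quiescence exit can be illustrated in isolation. The sketch below is an assumption-laden simplification: `gossip_until_quiescent` is a hypothetical helper, agents are reduced to sets of message ids, and emission and alarm handling are left out entirely.

```python
from itertools import permutations


def gossip_until_quiescent(known, max_steps=200):
    """Exchange set differences between every ordered pair until nothing moves.

    `known` maps an agent name to the set of message ids it holds.
    Returns (steps, transfers).
    """
    steps = transfers = 0
    progress = True
    while progress and steps < max_steps:
        progress = False
        steps += 1
        for sender, receiver in permutations(sorted(known), 2):
            missing = known[sender] - known[receiver]
            if missing:
                known[receiver] |= missing
                transfers += len(missing)
                progress = True
    return steps, transfers


# Split delivery: alpha and gamma saw variant_A, beta saw variant_B.
known = {
    "alpha": {"intro", "variant_A"},
    "beta": {"intro", "variant_B"},
    "gamma": {"intro", "variant_A"},
}
steps, transfers = gossip_until_quiescent(known)
```

The loop stops on the first pass in which no pair transfers anything, so termination depends on the state of the agents rather than on a turn counter. After it stops, every agent holds both contradictory variants.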
|
||||
|
||||
### Phase 4 — decentralized detection
|
||||
Each agent independently runs `detect_mutations()` on its own graph. In our scenario, every honest agent observes the byzantine's same-id spacelike pair and reports it. The byzantine doesn't accuse itself.
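
To make "same-id spacelike pair" concrete, here is a minimal sketch on a toy DAG. It is not the `LamportGraph.find_mutations` implementation, which may apply further criteria. The helper names, the dict-of-parents representation, and the vertex labels are all hypothetical.

```python
from itertools import combinations


def ancestors(parents, vertex):
    """All vertices reachable from `vertex` by following parent links."""
    seen, stack = set(), list(parents[vertex])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def spacelike(parents, a, b):
    """True when neither vertex is a causal ancestor of the other."""
    return a not in ancestors(parents, b) and b not in ancestors(parents, a)


def same_author_spacelike_pairs(parents, author_of, author):
    """Pairs of distinct vertices by `author` with no causal order between them."""
    own = sorted(v for v in parents if author_of[v] == author)
    return [(a, b) for a, b in combinations(own, 2) if spacelike(parents, a, b)]


# delta's intro is followed by two variants that do not reference each other.
parents = {"intro": [], "variant_A": ["intro"], "variant_B": ["intro"], "ack": ["variant_A"]}
author_of = {"intro": "delta", "variant_A": "delta", "variant_B": "delta", "ack": "alpha"}
pairs = same_author_spacelike_pairs(parents, author_of, "delta")
```

An honest author extends its own history, so its vertices form a chain and no pair is spacelike. The two variants from delta share a parent but neither references the other, which is what the check reports.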
|
||||
|
||||
### Phase 5 — ratification by quorum
|
||||
The quorum threshold is
|
||||
|
||||
$$\text{quorum}(N) = \left\lceil \frac{2N}{3} \right\rceil$$
|
||||
|
||||
where $N$ is the boundary size at ratification. For our scenario $N=4$ (3 honest + 1 byzantine), so the threshold is $\left\lceil 2 \cdot 4 / 3 \right\rceil = 3$ — every honest agent must concur. `tally_alarms(graph, threshold)` groups AlarmClaim vertices by `(accused, statement_id, witness_pair)`, counts unique signer process_ids per group, and ratifies groups meeting the threshold. **All honest agents produce identical `RatifiedAlarm` lists** (this is the no-chokepoint property in action).
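
The threshold and the grouping rule can be sketched in a few lines. This is a simplification under stated assumptions: `quorum_for` is named in the package, but the body here is a guess at one way to compute the ceiling; `tally` is a hypothetical stand-in for `tally_alarms` that works on plain tuples instead of graph vertices; and treating the witness pair as order-insensitive is an assumption.

```python
from collections import defaultdict


def quorum_for(n):
    """ceil(2n/3) in integer arithmetic, avoiding float rounding."""
    return (2 * n + 2) // 3


def tally(alarm_claims, threshold):
    """Group claims by (accused, statement_id, witness_pair); keep groups with enough signers.

    Each claim is a tuple (signer, accused, statement_id, witness_pair).
    """
    signers = defaultdict(set)
    for signer, accused, statement_id, witness_pair in alarm_claims:
        # Assumed: (a, b) and (b, a) name the same pair of witnesses.
        key = (accused, statement_id, tuple(sorted(witness_pair)))
        signers[key].add(signer)  # a set ignores repeated claims from one signer
    return {key: sorted(s) for key, s in signers.items() if len(s) >= threshold}


claims = [
    ("alpha", "delta", "s03", ("w1", "w2")),
    ("beta", "delta", "s03", ("w2", "w1")),
    ("gamma", "delta", "s03", ("w1", "w2")),
    ("delta", "alpha", "s01", ("w3", "w4")),  # a lone byzantine accusation
]
ratified = tally(claims, quorum_for(4))
```

With $N=4$ the threshold is 3, so the group signed by the three honest agents is ratified, while the single accusation from delta falls short. Because the result depends only on the set of claims, any agent that has seen the same claims computes the same answer.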
|
||||
|
||||
### Phase 6 — proof emission
|
||||
`build_proof(ratified_alarm)` produces a self-contained JSON document. Schema:
|
||||
|
||||
```json
{
  "schema_version": 2,
  "accused_process_id_hex": "...",
  "statement_id": "s03",
  "witness_digests": ["...", "..."],
  "signer_process_id_hexes": ["...", "...", "..."],
  "quorum_threshold": 3,
  "summary": "agent id=... emitted contradictory Crisis vertices about ..."
}
```
|
||||
|
||||
`verify_proof_self_consistent(proof)` checks distinct witnesses, distinct signers, signer count ≥ threshold. Future Phase-6+ work: full replay verification that re-derives the alarm from a recorded simulation log.
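
The three self-consistency checks map directly onto the schema fields. The sketch below is not the package's `verify_proof_self_consistent`; `looks_self_consistent` is a hypothetical helper that operates on the parsed JSON, and every value in `example` is a placeholder.

```python
import json


def looks_self_consistent(proof):
    """Check distinct witnesses, distinct signers, and signer count >= threshold."""
    witnesses = proof["witness_digests"]
    signers = proof["signer_process_id_hexes"]
    return (
        len(set(witnesses)) == len(witnesses)
        and len(set(signers)) == len(signers)
        and len(signers) >= proof["quorum_threshold"]
    )


# Placeholder values only; real proofs carry hex digests and process ids.
example = json.loads("""
{
  "schema_version": 2,
  "accused_process_id_hex": "dd",
  "statement_id": "s03",
  "witness_digests": ["w1", "w2"],
  "signer_process_id_hexes": ["aa", "bb", "cc"],
  "quorum_threshold": 3,
  "summary": "placeholder values for illustration"
}
""")
ok = looks_self_consistent(example)
```

A check like this only confirms that the document is internally coherent. It says nothing about whether the witnesses exist in anyone's graph, which is what the replay verification mentioned above would add.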
|
||||
|
||||
---
|
||||
|
||||
## Module reference
|
||||
|
||||
| File | What it owns |
|---|---|
| `claim.py` | `Claim` dataclass — the application-layer payload (verdict + evidence) |
| `boundary.py` | `Boundary` — trust set, `open()` trigger |
| `agent.py` | `CrisisAgent` (abstract) + `MockAgent` + `MockByzantineAgent`. Each agent owns its `LamportGraph`, `emit_claim`, `receive`, `gossip_to`, `detect_mutations`, `pending_alarm_claims` |
| `live_agent.py` | `LiveClaudeAgent` — same interface, backed by real Anthropic API calls |
| `mothership.py` | `Mothership` — bootstrap + async event-loop driver. No privileged graph state. `run_closed_phase()`, `run_until_quiescent()`, `ratified_alarms_from(name)` |
| `alarm.py` | `LocalAlarm` + `detect_mutations_in_graph(graph, ...)` — pure function, runs on one agent's graph |
| `vote.py` | `AlarmClaim` payload, `RatifiedAlarm`, `quorum_for(n)`, `tally_alarms(graph, threshold)` |
| `proof.py` | `ProofDocument` (schema v2), `build_proof`, `verify_proof_self_consistent` |
| `cli.py` | `crisis-agents demo` + `crisis-agents verify` |
| `scenarios/fact_check.py` | The canonical demo scenario: reference doc, six statements, scripted agents |
| `scenarios/reference_doc.txt` | The factual paragraph the demo adjudicates |
|
||||
|
||||
---
|
||||
|
||||
## Reuse map from `src/crisis/`

Almost all the heavy lifting comes from the protocol layer; `crisis_agents` is a thin adapter.

| `src/crisis/` primitive | How `crisis_agents` uses it |
|---|---|
| `Message`, `Vertex` | Claims and AlarmClaims become `Message.payload`. Agent's stable id → `Message.id`. |
| `LamportGraph` | One per agent. `extend()`, `find_mutations()`, `are_spacelike()` all reused. |
| `LamportGraph.find_mutations(pid)` | The core of decentralized detection. Returns same-id spacelike groups. |
| `ProofOfWorkWeight` + `mine_nonce()` | Each emission's PoW comes from here, with a shared weight system across the network so PoW is verifiable across graphs. |
| `digest(name)[:ID_LENGTH]` | Agent process_id derivation. Same convention as `crisis.demo.Simulation` so agents could coexist with simulated nodes in a future mixed scenario. |

---

## Build · run · test

```sh
# From repo root, after setup per INSTALL.md
cd /path/to/crisis
source .venv/bin/activate
pip install -e ".[dev]"                   # editable install with pytest

# All tests, including crisis_agents
pytest -q                                 # ~170 tests in 0.8s

# Just the agent layer
pytest tests/test_claim.py tests/test_boundary.py tests/test_agent*.py \
       tests/test_mothership.py tests/test_alarm.py tests/test_vote.py \
       tests/test_proof.py tests/test_demo_fact_check.py \
       tests/test_no_chokepoint.py tests/test_async_quiescence.py -v

# Run the demo (mocked, deterministic)
crisis-agents demo --out-dir /tmp/crisis_demo

# Run with real Claude sub-agents (requires API key + extras)
pip install -e ".[live]"
export ANTHROPIC_API_KEY=sk-ant-...
crisis-agents demo --live --model claude-haiku-4-5-20251001

# Verify a proof
crisis-agents verify /tmp/crisis_demo/proof_*.json
```

---

## The live-Claude path

`LiveClaudeAgent` (in `live_agent.py`) makes one Anthropic Messages API call per `try_emit()` invocation, asking Claude to fact-check the scenario's statements against the reference document. The response is parsed as a JSON array of `Claim`-shaped objects; malformed responses degrade gracefully (the agent emits nothing rather than crashing).

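The "emit nothing rather than crash" behaviour can be sketched as a defensive parser. This is an assumption-laden illustration, not the code in `live_agent.py`: `parse_claims` is a hypothetical helper, the required field names are guesses based on the `Claim` description (verdict + evidence), and rejecting the whole response when any item is malformed is one possible policy.

```python
import json

# Assumed field names for a Claim-shaped object (not confirmed by the package).
REQUIRED_FIELDS = {"statement_id", "verdict", "evidence"}

def parse_claims(raw):
    """Parse a model response into a list of claim dicts.

    Returns an empty list for anything that is not a JSON array of
    objects carrying the required fields, so the caller emits nothing.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(data, list):
        return []
    for item in data:
        if not isinstance(item, dict) or not REQUIRED_FIELDS.issubset(item):
            return []
    return data

good = '[{"statement_id": "s03", "verdict": "false", "evidence": "para 2"}]'
print(len(parse_claims(good)))            # 1
print(parse_claims("Sure! Here you go"))  # []
print(parse_claims('{"verdict": "ok"}'))  # []
```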
The byzantine joiner stays **mocked** even in `--live` mode: getting deterministic equivocation out of an LLM would require multiple API calls per turn (one per peer subset) and would still yield it unreliably, and the demo's narrative is cleaner with a scripted byzantine. The honest agents are the real LLM participants.

Default model: `claude-haiku-4-5-20251001` (fast, cheap, plenty of capability for structured-output adjudication). Override with `--model`.

The live path is intentionally excluded from CI because it costs API credits and produces nondeterministic output.

---

## Test taxonomy

| Test file | What it asserts |
|---|---|
| `tests/test_claim.py` | Claim dataclass validation + JSON round-trip |
| `tests/test_boundary.py` | Boundary state machine (closed → open) |
| `tests/test_mothership.py` | Per-agent graph ownership; broadcast vs. targeted delivery; gossip propagation; no privileged attribute |
| `tests/test_alarm.py` | Decentralized detection; every honest agent finds the same mutation; canonical witness pairs |
| `tests/test_vote.py` | AlarmClaim round-trip; quorum formulas; tally determinism |
| `tests/test_proof.py` | ProofDocument schema; JSON round-trip; tampered-witness/below-quorum rejection |
| `tests/test_demo_fact_check.py` | End-to-end scenario produces one ratified alarm; CLI output contains all six phases |
| `tests/test_live_agent.py` | LiveClaudeAgent parsing (fake Anthropic client; no real API calls) |
| **`tests/test_no_chokepoint.py`** | **Centerpiece: every honest agent's ratified set is byte-identical; no privileged attributes exist** |
| **`tests/test_async_quiescence.py`** | **Centerpiece: no clock; `try_emit()` takes no argument; `AlarmClaim.detected_at_turn` doesn't exist; two runs converge identically** |

The two centerpiece files are sentinels: if you ever re-introduce a chokepoint or a wall clock, one of those tests should fail.

---

## What's deliberately out of scope

- **CrisisViz integration.** The visualizer's data file (`crisis_data.json`) is produced by `crisis.demo.Simulation`, not by `crisis_agents`. A future CrisisViz upgrade could absorb agent-coordination runs (multi-DAG rendering, gossip arrows, alarm-vote convergence), but that is a separate effort, sketched in the parent README.
- **Real TCP gossip.** In-process function calls only. Lifting to multi-process requires plugging into `crisis.gossip.GossipServer`, which is independent work.
- **Cryptographic signatures beyond what Crisis already provides.** Crisis already provides nonces, message-digest chaining, and PoW. Agent identity is `digest(name)[:32]`. We don't add a separate identity PKI.
- **Sybil resistance.** Threat model is "a few byzantine joiners with valid PoW", not "an attacker spawning unlimited identities." Sybil defense is what the PoW weight in Crisis is *for*; it's not the agent layer's concern.
- **Byzantine false accusations.** A byzantine could emit a false AlarmClaim against an honest agent. The quorum mechanism prevents ratification (honest agents won't second the false claim, so it stays at 1-of-N). Second-order detection of false accusers isn't in this PoC.

---

## Pointers

- Parent README: [`../../README.md`](../../README.md)
- Install guide: [`../../INSTALL.md`](../../INSTALL.md)
- The paper this work is based on: [`../../Crisis.mirco-richter-2019.pdf`](../../Crisis.mirco-richter-2019.pdf)