crisis

mirror of https://github.com/saymrwulf/crisis.git synced 2026-05-14 20:37:54 +00:00

Author	SHA1	Message	Date
saymrwulf	0976239ebd	crisis_agents: drop the wall-clock, drive asynchronously to quiescence

The previous driver imposed a synchronous turn-counted clock that the Crisis paper explicitly forbids: Crisis is supposed to work in asynchronous P2P networks, with any synchronicity being virtual and derived inside the consensus algorithm from the DAG structure, not imposed externally by a coordinator. This commit removes the wall clock.

What changed in the engine:
- `Mothership.run_crisis_phase(num_turns, gossip_rounds_per_turn)` is replaced by `run_until_quiescent(max_steps=200)`. The loop interleaves three concerns on each iteration (emissions, gossip, and alarm emissions) until none make progress. Termination is by quiescence, not by a fixed turn count. `max_steps` is a safety bound (loop-iteration cap), not an exposed clock.
- `Mothership.run_closed_phase(num_turns)` becomes `run_closed_phase(max_steps=50)`. Same quiescence model: the closed-phase conversation runs until no agent has more to say.
- Agents grew `pending_alarm_claims()`: each agent checks its own graph for un-alarmed mutations and produces AlarmClaims directly. The driver loop calls this every iteration, so alarms emit and propagate in the same loop as regular emissions and gossip, with no separate "alarm phase."
- `Mothership.emit_alarms_from_detectors()` and the explicit `run_gossip_round()` step are no longer needed by callers; both are subsumed by the async loop. `run_gossip_round()` stays as a helper but tests no longer call it externally.

What changed in the agent interface:
- `CrisisAgent.next_turn(turn, received_claims)` becomes `try_emit()`, with no arguments. Agents in an async network don't see a global tick. They decide based on their own internal state.
- `CrisisAgent.observe(claim)` is the new optional callback the closed-phase loop uses to feed context into agents that care (overridden by LiveClaudeAgent to populate its prompt buffer).
- `pending_alarm_claims()` is idempotent: an internal `_already_alarmed` set tracks claims this agent has emitted, so the loop calls it every step without flooding the network with duplicate alarms.

What changed in the dataclass schema:
- `AlarmClaim.detected_at_turn` -> `emitted_at_step`. The word "turn" implies a global clock; "step" is a per-agent sequence number used only for log ordering (local, not networked).
- `ClosedPhaseEntry.turn` and `CrisisPhaseEntry.turn` -> `step`. Same rename, same reasoning.
- `Scenario.closed_phase_turns` and `Scenario.crisis_phase_turns` are gone. The scenario no longer prescribes how many turns; it just provides agents and lets the async loop run them out.

What changed in the CLI:
- Phase 3 reports "drove to quiescence in N step(s)" with a breakdown of regular emissions / gossip transfers / alarm emissions, instead of "ran N turns".
- `QuiescenceReport` (new dataclass) carries the run statistics back from `run_until_quiescent`/`run_closed_phase`: steps taken, emissions made, gossip transfers, alarm claims emitted, plus whether termination was via quiescence or max-step cap.

New regression tests (`test_async_quiescence.py`):
- `test_run_until_quiescent_terminates`: the loop must exit.
- `test_two_runs_produce_identical_final_state`: determinism check; if anything in the loop depended on real wall time, this would fail.
- `test_max_steps_bound_caps_runtime`: setting max_steps=1 exits immediately and `QuiescenceReport.reached_quiescence` reflects reality.
- `test_no_turn_argument_exposed_to_agents`: introspects `CrisisAgent.try_emit` signature; fails if anyone re-adds a `turn` parameter.
- `test_no_turn_field_on_alarmclaim`: introspects the dataclass fields; fails if `detected_at_turn` reappears.
- `test_alarms_propagate_through_async_loop_alone`: the loop alone (no manual emit_alarms / run_gossip_round) ratifies an alarm.
- `test_quiescence_report_counts_match_logs`: sanity check that the report's emission count equals the crisis log length.

Suite: 163 -> 170 tests, all green in 0.79s.

Behavioral end-state is identical to the previous (synchronous) version: same fact-check scenario, same byzantine equivocation, same proof JSON shape, same three signers, same quorum-met outcome. The difference is structural: the protocol now matches the paper's async shape, and a future port to actual TCP gossip + concurrent agents needs no change to this engine.

CrisisViz: still untouched. The `crisis_data.json` pipeline that drives the visualizer is orthogonal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 22:06:56 +02:00
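The quiescence-driven loop described in commit 0976239ebd can be illustrated with a minimal sketch. It is not the repository's `Mothership.run_until_quiescent`: the `QuiescenceReport` field names, the `emit`/`gossip` callables, and the two-agent toy network are all assumptions made for illustration, and alarm emission is folded into `emit` for brevity. The point it shows is that the loop stops when an iteration makes no progress, with `max_steps` acting only as a safety cap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QuiescenceReport:
    """Hypothetical stand-in; field names are assumptions, not the real schema."""
    steps: int = 0
    emissions: int = 0
    gossip_transfers: int = 0
    reached_quiescence: bool = False


def run_until_quiescent(emit: Callable[[], int],
                        gossip: Callable[[], int],
                        max_steps: int = 200) -> QuiescenceReport:
    """Interleave emission and gossip until one full iteration makes no progress."""
    report = QuiescenceReport()
    for _ in range(max_steps):
        emitted = emit()        # number of new claims produced this iteration
        transferred = gossip()  # number of messages newly delivered to peers
        if emitted == 0 and transferred == 0:
            report.reached_quiescence = True
            break
        report.steps += 1
        report.emissions += emitted
        report.gossip_transfers += transferred
    return report


# Toy network: two agents, each with a backlog of claims and a set of seen claims.
backlog = {"a": ["a1", "a2"], "b": ["b1"]}
seen = {"a": set(), "b": set()}


def emit_once() -> int:
    """Each agent with something left to say emits one claim into its own view."""
    count = 0
    for name in sorted(backlog):
        if backlog[name]:
            seen[name].add(backlog[name].pop(0))
            count += 1
    return count


def gossip_once() -> int:
    """All-pairs exchange: every agent ends up with the union of all views."""
    union = seen["a"] | seen["b"]
    moved = sum(len(union - view) for view in seen.values())
    for name in seen:
        seen[name] = set(union)
    return moved


report = run_until_quiescent(emit_once, gossip_once)
print(report)
```

Because nothing in the loop reads the system clock, two runs over the same initial state take the same number of steps, which is the property a determinism test of this kind relies on.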
saymrwulf	a1064660d5	Decentralize crisis_agents: agents own graphs, detect locally, vote by quorum

The previous design routed every Crisis message through a `Mothership` that also held every agent's LamportGraph, ran the byzantine scan from a privileged vantage, and built proofs from its own view. That made the mothership a chokepoint, exactly what a BFT layer is supposed to remove. This commit redistributes responsibility along the lines you'd expect from a real open protocol:

Each `CrisisAgent` now owns:
- its own `LamportGraph` (the agent's view of the network)
- `emit_claim(claim) → Message`: wraps a Claim into a fully-valid Crisis Message built from the agent's OWN graph state, with chain link + cross references + mined PoW nonce
- `receive(message)`: extends my graph if integrity holds; idempotent
- `gossip_to(peer) → int`: shares everything I have with peer until quiescence (Algorithm 4 in the paper, in-process flavor)
- `detect_mutations() → list[LocalAlarm]`: scans MY graph for same-id spacelike vertex pairs via the existing `LamportGraph.find_mutations`, filtered by application-layer `statement_id` so cross-detector AlarmClaims canonicalize

The `Mothership` shrinks to coordinator-only:
- bootstrap (register honest agents; trigger boundary open with a joiner)
- clock (call each agent's `next_turn()` per turn)
- first-hop routing (sender's emission → declared target subset)
- all-pairs gossip rounds between turns
- `emit_alarms_from_detectors()`: poll each agent for its LocalAlarms, wrap any returned alarms into AlarmClaim payloads, broadcast them as Crisis Messages over the gossip layer

Gone (regression-tested in `test_no_chokepoint.py`):
- `Mothership._graphs`, `Mothership.all_graphs()`, `Mothership.graph_of()`
- `alarm.scan_for_mutations(mothership)`
- any path where the mothership reads an agent's internal state

New voting layer (`crisis_agents/vote.py`):
- `AlarmClaim`: a Crisis-payload dataclass discriminated by `kind="alarm"`. Wraps the accused process_id, statement_id, witness_digests, and detection turn. Round-trips through JSON same as Claim.
- `quorum_for(n) = ceil(2n/3)`: classic BFT threshold.
- `tally_alarms(graph, threshold)`: groups AlarmClaim vertices by (accused, statement_id, witness_pair), counts unique signer process_ids, ratifies groups meeting the threshold. Deterministic ordering so two equal graphs produce equal `RatifiedAlarm` lists.
- `RatifiedAlarm`: the network-level consensus on byzantine behavior.

Multi-signer proofs (`crisis_agents/proof.py`):
- schema_version bumped 1 → 2.
- ProofDocument now embeds every signer's process_id_hex and the quorum threshold that was met. Self-consistency check enforces distinct signers, witness pairs, and signer count ≥ threshold.

Byzantine scenario rewrite:
- `MockByzantineAgent` now takes an `intro_claim` for its first turn (a benign broadcast). The intro is technically necessary: the agent's two contradictory variants both chain to the intro vertex, so they can propagate through gossip; without it, the second variant would fail the chain constraint in any graph already holding the first.
- `fact_check` scenario: closed phase still has 3 honest agents emitting 6 claims each into the closed log; Crisis phase grew to 2 turns (intro + equivocation) so the byzantine can establish its same-id anchor before equivocating.

End-to-end CLI output reframed around six phases:
1. closed team (no Crisis)
2. boundary opens
3. emission + gossip
4. decentralized detection (each agent reports its own findings)
5. alarms emitted + gossiped + ratified by quorum
6. proof emission

Tests (51 fresh + 5 carried over for boundary):
- `test_mothership.py`: per-agent graph ownership, broadcast vs. targeted delivery semantics, gossip propagation, regression guards against the removed centralization attributes.
- `test_alarm.py`: every honest agent independently detects the same mutation; the byzantine doesn't detect itself; witness pairs are canonical across detectors.
- `test_vote.py`: AlarmClaim round-trip, quorum formulas, tally determinism, mothership convenience method matches direct tallying.
- `test_proof.py`: build_proof from RatifiedAlarm; multi-signer JSON round-trip; tampered-witness/below-quorum/duplicate-signer rejection.
- `test_no_chokepoint.py` (the centerpiece): after the full lifecycle, every honest agent's ratified-alarm set is byte-identical. A single byzantine accuser alone cannot ratify. Forbidden attributes don't exist on Mothership.

Full suite: 163 tests, all green in 0.80s.

CrisisViz: untouched by this refactor. The `crisis_data.json` pipeline the visualizer consumes is produced by the orthogonal `crisis.demo.Simulation`, which this commit doesn't touch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 21:55:49 +02:00
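The quorum rule and tally described in commit a1064660d5 can be sketched as follows. Only the formula `quorum_for(n) = ceil(2n/3)` and the grouping key (accused, statement_id, witness_pair) come from the commit message. The flat tuple records, the agent names, and the function signature here are assumptions for illustration; the real `tally_alarms` takes a graph and returns `RatifiedAlarm` objects.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical flat record: (signer_id, accused_id, statement_id, witness_pair).
AlarmRecord = Tuple[str, str, str, Tuple[str, str]]
GroupKey = Tuple[str, str, Tuple[str, ...]]


def quorum_for(n: int) -> int:
    """ceil(2n/3) in integer arithmetic, avoiding float rounding."""
    return (2 * n + 2) // 3


def tally_alarms(alarms: Iterable[AlarmRecord], threshold: int) -> List[GroupKey]:
    """Return the groups whose count of distinct signers meets the threshold."""
    signers: Dict[GroupKey, Set[str]] = defaultdict(set)
    for signer, accused, statement_id, witness_pair in alarms:
        # Canonicalize the pair so detectors that saw the witnesses in a
        # different order still land in the same group.
        key = (accused, statement_id, tuple(sorted(witness_pair)))
        signers[key].add(signer)  # a set ignores repeated alarms from one signer
    # Sorted output keeps the result independent of input order.
    return sorted(key for key, who in signers.items() if len(who) >= threshold)


alarms = [
    ("alice", "dave", "s03", ("d1", "d2")),
    ("bob", "dave", "s03", ("d2", "d1")),    # same pair, reversed order
    ("alice", "dave", "s03", ("d1", "d2")),  # duplicate from the same signer
    ("dave", "carol", "s01", ("x1", "x2")),  # lone accuser, below quorum
]
ratified = tally_alarms(alarms, quorum_for(3))
print(ratified)
```

Counting distinct signers rather than raw alarm messages is what keeps a single accuser from ratifying an alarm by repeating itself.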
saymrwulf	6aa2c54b68	Add LiveClaudeAgent: back honest agents with real Claude API calls

`crisis-agents demo --live` swaps the three honest MockAgents for LiveClaudeAgent instances that issue one Anthropic Messages API call per turn. The byzantine joiner stays mocked: making the byzantine deterministic with an LLM would require multiple API calls per turn (one per peer subset) with unreliable results. Keeping the equivocator scripted makes the demo more legible.

Prompt shape: the honest agent receives the reference doc, a list of statements still to adjudicate, and the last 12 claims observed from peers; it responds with a JSON array of {statement_id, verdict, confidence, evidence} objects. The parser tolerates markdown fences and per-item validation failures; malformed responses produce no emissions rather than crashing the demo.

Default model: claude-haiku-4-5-20251001, fast enough and cheap enough for short-form structured-output adjudication. Override with `--model <id>`.

Dependency: anthropic SDK as an optional install: `pip install -e ".[live]"`. Lazy-imported so the mocked path never needs it.

Tests: 6 new tests in test_live_agent.py using a fake Anthropic client (no real API calls in CI):
- clean JSON response parsing
- markdown-fence tolerance
- malformed-response graceful degradation
- per-item validation skipping
- already-adjudicated statement filtering (the agent doesn't keep re-asking about statements it has already answered)
- evidence-length truncation to Claim.EVIDENCE_MAX_LEN

Suite: 145 -> 150 tests, all green in 0.77s.

Manual test (not in CI; requires API credits):
    pip install -e ".[live]"
    export ANTHROPIC_API_KEY=...
    crisis-agents demo --live

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 16:38:25 +02:00
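The tolerant response parsing described in commit 6aa2c54b68 might look roughly like the sketch below. It makes no API calls and is not the repository's parser: `strip_fence`, `parse_verdicts`, and the `EVIDENCE_MAX_LEN` value of 280 are assumptions for illustration. Only the four field names and the three behaviours (fence tolerance, per-item skipping, no crash on malformed input) come from the commit message.

```python
import json
from typing import Any, Dict, List

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block
REQUIRED = ("statement_id", "verdict", "confidence", "evidence")
EVIDENCE_MAX_LEN = 280  # assumed value; the real Claim.EVIDENCE_MAX_LEN is not shown


def strip_fence(text: str) -> str:
    """Drop a leading fence line (with optional language tag) and a trailing fence."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def parse_verdicts(raw: str) -> List[Dict[str, Any]]:
    """Return the valid items; malformed input yields an empty list, not an error."""
    try:
        data = json.loads(strip_fence(raw))
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    valid = []
    for item in data:
        # Skip any item that is not an object or lacks a required key.
        if not isinstance(item, dict) or any(key not in item for key in REQUIRED):
            continue
        cleaned = dict(item)
        cleaned["evidence"] = str(item["evidence"])[:EVIDENCE_MAX_LEN]
        valid.append(cleaned)
    return valid


reply = "\n".join([
    FENCE + "json",
    '[{"statement_id": "s01", "verdict": "true", "confidence": 0.9, "evidence": "cited"},',
    ' {"statement_id": "s02"}]',
    FENCE,
])
print(parse_verdicts(reply))       # only the complete s01 item survives
print(parse_verdicts("not json"))  # []
```

Returning an empty list on any parse failure means a bad model reply simply produces no emissions for that agent, so the rest of the run is unaffected.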
saymrwulf	b8684297fa	Add crisis_agents: Crisis as a coordination layer for AI agent teams

A new sibling Python package, `crisis_agents`, that lifts the Crisis protocol from "consensus between machines" to "consensus between AI agents". Threat model: a team of sub-agents normally talks freely with its orchestrator (the "mothership"); when the team's boundary opens and an external agent of unknown trust joins, the mothership activates the Crisis layer so byzantine equivocation is detectable.

Two-phase orchestration model, plus a proof step:
- Phase 1 (closed team, no Crisis): agents emit claims directly, the mothership collects them flat.
- Phase 2 (boundary opens): every subsequent claim is wrapped into a Crisis Message with the agent's stable process_id and a PoW nonce, delivered into per-agent LamportGraphs, and after each turn the mothership scans for mutations via LamportGraph.find_mutations.
- Phase 3 (proof): when an alarm fires, the mothership emits a replayable JSON proof-of-malfeasance document with the contradictory witnesses, their delivery sets, and DAG cross-references showing which honest agents saw what.

Modules:
- claim.py: Claim dataclass + JSON round-trip
- boundary.py: membership tracker + open() trigger
- agent.py: CrisisAgent abstract + MockAgent + MockByzantineAgent (the latter equivocates by emitting two variants to disjoint peer subsets at the same logical turn)
- mothership.py: orchestrator driving both phases, building Crisis Messages from Claims, per-agent LamportGraphs, log
- alarm.py: scan_for_mutations (same-agent same-turn distinct digests with non-identical delivery sets, verified spacelike via LamportGraph.are_spacelike on the honest-agent graphs)
- proof.py: build_proof + ProofDocument + JSON serializer + verify_proof_self_consistent
- cli.py: `crisis-agents demo` + `crisis-agents verify`
- scenarios/: fact_check (reference doc + 6 statements + scripted honest/byzantine agents producing a deterministic equivocation on statement s03)

Tests: 50 new tests across test_claim, test_boundary, test_mothership, test_alarm, test_proof, test_demo_fact_check. End-to-end test runs the fact_check scenario, asserts exactly one alarm raised, proof is built, re-serialized JSON passes self-consistency. Full suite (existing crisis + new crisis_agents) green in 0.77s (145 tests).

Out of scope (deliberately): visualization (separate CrisisViz upgrade later), real TCP gossip (agents talk via in-process function calls in the mothership), false-claim detection without equivocation (an agent that consistently lies but never equivocates is out-voted, not "caught"; catching it would require a ground-truth oracle).

Reuse from existing crisis package: Message, Vertex, LamportGraph, LamportGraph.find_mutations, ProofOfWorkWeight, digest. The new code is a thin adapter layer; the protocol substrate did the heavy lifting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 16:38:11 +02:00
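The equivocation criterion that commit b8684297fa gives for `alarm.py` (same agent, same turn, distinct digests, non-identical delivery sets) can be sketched over a flat delivery log. The `Delivery` record and `find_equivocations` are hypothetical names, and the sketch deliberately omits the spacelike verification against the honest agents' LamportGraphs that the commit also mentions.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Delivery:
    """Hypothetical flat record of one emitted message and who received it."""
    agent: str
    turn: int
    digest: str
    delivered_to: FrozenSet[str]


def find_equivocations(log: List[Delivery]) -> List[Tuple[Delivery, Delivery]]:
    """Pair up messages from one agent and turn that differ in digest and audience."""
    by_slot: Dict[Tuple[str, int], List[Delivery]] = defaultdict(list)
    for entry in log:
        by_slot[(entry.agent, entry.turn)].append(entry)
    pairs = []
    for slot in sorted(by_slot):  # sorted slots give a stable report order
        for first, second in combinations(by_slot[slot], 2):
            if first.digest != second.digest and first.delivered_to != second.delivered_to:
                pairs.append((first, second))
    return pairs


log = [
    Delivery("alice", 1, "a1", frozenset({"bob", "carol", "dave"})),
    Delivery("dave", 1, "d1", frozenset({"alice"})),
    Delivery("dave", 1, "d2", frozenset({"bob", "carol"})),  # contradicts d1
]
pairs = find_equivocations(log)
for first, second in pairs:
    print(first.agent, first.digest, second.digest)
```

An agent that sends one consistent message to everyone never appears in the result, which matches the commit's note that consistent lying without equivocation is out of scope.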
saymrwulf	7f830a36ef	Advance Python test coverage: voting, recorder, simulation extensions

Pre-existing tests covered crypto / graph / message / order / rounds / weight, but left three high-value modules unverified:
- voting.py: 25 KB of BBA virtual leader election + safe voting pattern (Algorithms 6 & 7), the heart of the protocol. Zero tests. Now 14 tests covering the five public entry points (`build_knowledge_graph`, `select_quorum`, `voting_set`, `compute_safe_voting_pattern`, `compute_virtual_leader_election`) plus `initial_vote`. Uses a small in-process Simulation to produce realistic multi-round graphs.
- recorder.py: the bridge that turns simulation runs into the JSON consumed by CrisisViz. Zero tests despite being the choke point: if recorder silently drops fields, the viz lies. Now 11 tests covering EventRecorder bookkeeping (sequence, filtering), SimulationRecording integration (STEP_BEGIN/END, MESSAGE_CREATED/DELIVERED), capture_snapshot well-formedness, and JSON-serializability of both snapshots and event data.
- test_simulation.py extended with three regression guards:
  - test_byzantine_vertices_flagged_in_snapshots: ensures the `is_byzantine_source` flag survives the recorder pipeline. CrisisViz's Ch10 (byzantine) chapter relies on this to colour Dave's lane red.
  - test_recorder_deterministic_with_seed: same seed produces identical event-stream length and type ordering. Tightens the existing vertex-count determinism check.
  - test_consensus_pipeline_progresses: a fast claim that rounds advance past 0 and the SVP / voting code paths engage. The stronger claim (full convergence + non-empty total order) takes minutes in pure Python and belongs in a separate long-running benchmark, not the unit-test suite, but the weaker claim is sufficient to catch the dead-pipeline failure mode that motivated regenerating crisis_data.json on 2026-05-04.

Suite: 72 -> 100 tests, all green in ~0.75s.

Explicitly out of scope (separate engineering effort):
- gossip.py / node.py TCP integration tests: heavy harness;
- export_json.py: thin composition of tested layers;
- Swift XCTest: the CrisisViz testbed harness already covers the curriculum-correctness layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 15:52:30 +02:00
saymrwulf	1df4790fb4	Initial implementation of the Crisis protocol (Richter, 2019)

Complete Python PoC of "Probabilistically Self Organizing Total Order in Unstructured P2P Networks". Implements all 10 algorithms from the paper: message generation, integrity checks, Lamport graphs, virtual synchronous rounds, safe voting patterns, virtual leader election (BA*), longest chain rule, total order via Kahn's algorithm, and push/pull gossip. Includes simulation harness, full node binary, and 72 passing tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-23 13:20:30 +02:00
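Commit 1df4790fb4 mentions a total order via Kahn's algorithm. A minimal sketch of that technique on a small reference DAG follows. The `parents` mapping, the vertex names, and the lexicographic tie-break through a heap are assumptions chosen for illustration; the commit message does not say how the repository breaks ties between vertices that are ready at the same time.

```python
import heapq
from typing import Dict, Iterable, List


def total_order(parents: Dict[str, Iterable[str]]) -> List[str]:
    """Kahn's algorithm. `parents[v]` lists the vertices that v references.

    Assumes every referenced vertex is itself a key of `parents`.
    """
    indegree = {vertex: 0 for vertex in parents}
    children: Dict[str, List[str]] = {vertex: [] for vertex in parents}
    for vertex, refs in parents.items():
        for ref in refs:
            indegree[vertex] += 1
            children[ref].append(vertex)

    # A min-heap makes the choice among ready vertices deterministic.
    ready = [vertex for vertex, degree in indegree.items() if degree == 0]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        vertex = heapq.heappop(ready)
        order.append(vertex)
        for child in children[vertex]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, child)

    if len(order) != len(parents):
        raise ValueError("graph contains a cycle; no total order exists")
    return order


# Genesis vertex g, one message each from two processes, then a vertex citing both.
dag = {"g": [], "a1": ["g"], "b1": ["g"], "a2": ["a1", "b1"]}
order = total_order(dag)
print(order)  # ['g', 'a1', 'b1', 'a2']
```

With a deterministic tie-break, any two nodes holding the same DAG compute the same sequence regardless of the order in which they received the vertices.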

6 commits