mirror of
https://github.com/saymrwulf/crisis.git
synced 2026-05-14 20:37:54 +00:00
6 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| 0976239ebd |
crisis_agents: drop the wall-clock, drive asynchronously to quiescence
The previous driver imposed a synchronous turn-counted clock that the
Crisis paper explicitly forbids — Crisis is supposed to work in
asynchronous P2P networks, with any synchronicity being virtual and
derived inside the consensus algorithm from the DAG structure, not
imposed externally by a coordinator. This commit removes the wall clock.
What changed in the engine:
- `Mothership.run_crisis_phase(num_turns, gossip_rounds_per_turn)`
is replaced by `run_until_quiescent(max_steps=200)`. The loop
interleaves three concerns on each iteration — emissions, gossip,
and alarm emissions — until none make progress. Termination is by
quiescence, not by a fixed turn count. `max_steps` is a safety
bound (loop-iteration cap), not an exposed clock.
- `Mothership.run_closed_phase(num_turns)` becomes
`run_closed_phase(max_steps=50)`. Same quiescence model — the
closed-phase conversation runs until no agent has more to say.
- Agents grew `pending_alarm_claims()`: each agent checks its own
graph for un-alarmed mutations and produces AlarmClaims directly.
The driver loop calls this every iteration, so alarms emit and
propagate in the same loop as regular emissions and gossip — no
separate "alarm phase."
- `Mothership.emit_alarms_from_detectors()` and the explicit
`run_gossip_round()` step are no longer needed by callers; both
are subsumed by the async loop. `run_gossip_round()` stays as a
helper but tests no longer call it externally.
What changed in the agent interface:
- `CrisisAgent.next_turn(turn, received_claims)` becomes
`try_emit()` — no arguments. Agents in an async network don't see
a global tick. They decide based on their own internal state.
- `CrisisAgent.observe(claim)` is the new optional callback the
closed-phase loop uses to feed context into agents that care
(overridden by LiveClaudeAgent to populate its prompt buffer).
- `pending_alarm_claims()` is idempotent: an internal
`_already_alarmed` set tracks claims this agent has emitted, so
the loop calls it every step without flooding the network with
duplicate alarms.
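The idempotence described for `pending_alarm_claims()` can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `SketchAgent`, its `detected` list, and the dict layout of an alarm are hypothetical; only `pending_alarm_claims`, `_already_alarmed`, and `kind="alarm"` come from the commit messages.

```python
class SketchAgent:
    """Hypothetical stand-in for a CrisisAgent; only alarm bookkeeping is shown."""

    def __init__(self):
        # Mutation keys this agent found in its own graph (assumed layout).
        self.detected = []
        # Keys that have already been turned into an alarm.
        self._already_alarmed = set()

    def pending_alarm_claims(self):
        """Return alarms for mutations not yet alarmed; safe to call every step."""
        fresh = []
        for key in self.detected:
            if key not in self._already_alarmed:
                self._already_alarmed.add(key)
                fresh.append({"kind": "alarm", "mutation": key})
        return fresh


agent = SketchAgent()
agent.detected.append(("byz", "s03"))
first = agent.pending_alarm_claims()   # one new alarm
second = agent.pending_alarm_claims()  # nothing new on the second call
```

Because the set is updated before the alarm is returned, a driver can poll every agent on every loop iteration without producing duplicates.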
What changed in the dataclass schema:
- `AlarmClaim.detected_at_turn` -> `emitted_at_step`. The word
"turn" implies a global clock; "step" is a per-agent sequence
number used only for log ordering — local, not networked.
- `ClosedPhaseEntry.turn` and `CrisisPhaseEntry.turn` -> `step`.
Same rename, same reasoning.
- `Scenario.closed_phase_turns` and `Scenario.crisis_phase_turns`
are gone. The scenario no longer prescribes how many turns; it
just provides agents and lets the async loop run them out.
What changed in the CLI:
- Phase 3 reports "drove to quiescence in N step(s)" with a
breakdown of regular emissions / gossip transfers / alarm
emissions, instead of "ran N turns".
- `QuiescenceReport` (new dataclass) carries the run statistics
back from `run_until_quiescent`/`run_closed_phase` — steps taken,
emissions made, gossip transfers, alarm claims emitted, plus
whether termination was via quiescence or max-step cap.
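A minimal sketch of the quiescence-driven loop and its report, assuming the three concerns are passed in as callbacks that return how much work they did. The callback style and every `QuiescenceReport` field name except `reached_quiescence` are assumptions made for this sketch; the real method lives on `Mothership` and drives agents directly.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QuiescenceReport:
    # Field names other than reached_quiescence are assumed for illustration.
    steps: int = 0
    emissions: int = 0
    gossip_transfers: int = 0
    alarm_emissions: int = 0
    reached_quiescence: bool = False


def run_until_quiescent(
    emit: Callable[[], int],
    gossip: Callable[[], int],
    emit_alarms: Callable[[], int],
    max_steps: int = 200,
) -> QuiescenceReport:
    """Interleave the three concerns until one iteration makes no progress."""
    report = QuiescenceReport()
    for _ in range(max_steps):
        emitted = emit()
        transferred = gossip()
        alarmed = emit_alarms()
        if emitted == 0 and transferred == 0 and alarmed == 0:
            report.reached_quiescence = True
            break
        report.steps += 1
        report.emissions += emitted
        report.gossip_transfers += transferred
        report.alarm_emissions += alarmed
    return report


# Toy usage: three queued emissions, each gossiped once, no alarms.
pending = ["c1", "c2", "c3"]
in_flight = []


def emit() -> int:
    if pending:
        in_flight.append(pending.pop())
        return 1
    return 0


def gossip() -> int:
    moved = len(in_flight)
    in_flight.clear()
    return moved


report = run_until_quiescent(emit, gossip, lambda: 0)
```

In this sketch `steps` counts only iterations that made progress, so a run that stops at the cap leaves `reached_quiescence` as `False`.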
New regression tests (`test_async_quiescence.py`):
- `test_run_until_quiescent_terminates`: the loop must exit.
- `test_two_runs_produce_identical_final_state`: determinism check —
if anything in the loop depended on real wall time, this would
fail.
- `test_max_steps_bound_caps_runtime`: setting max_steps=1 exits
immediately and `QuiescenceReport.reached_quiescence` reports whether
the run actually quiesced or stopped at the cap.
- `test_no_turn_argument_exposed_to_agents`: introspects
`CrisisAgent.try_emit` signature; fails if anyone re-adds a
`turn` parameter.
- `test_no_turn_field_on_alarmclaim`: introspects the dataclass
fields; fails if `detected_at_turn` reappears.
- `test_alarms_propagate_through_async_loop_alone`: the loop alone
(no manual emit_alarms / run_gossip_round) ratifies an alarm.
- `test_quiescence_report_counts_match_logs`: sanity check that
the report's emission count equals the crisis log length.
Suite: 163 -> 170 tests, all green in 0.79s.
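The two introspection guards can be sketched with the standard library alone. The `CrisisAgent` and `AlarmClaim` classes below are stand-ins defined only so the example runs on its own; the real tests would import them from `crisis_agents`, and their exact bodies are not shown in the commit message.

```python
import inspect
from dataclasses import dataclass, fields


class CrisisAgent:  # stand-in for the real class
    def try_emit(self):
        return None


@dataclass
class AlarmClaim:  # stand-in carrying only the field relevant to the check
    emitted_at_step: int = 0


def test_no_turn_argument_exposed_to_agents():
    # Fails if anyone re-adds a `turn` parameter to try_emit.
    params = inspect.signature(CrisisAgent.try_emit).parameters
    assert "turn" not in params


def test_no_turn_field_on_alarmclaim():
    # Fails if the old field name reappears on the dataclass.
    names = {f.name for f in fields(AlarmClaim)}
    assert "detected_at_turn" not in names
    assert "emitted_at_step" in names
```

Guards of this kind check the shape of the interface rather than behavior, which is why they stay cheap to run.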
Behavioral end-state is identical to the previous (synchronous)
version: same fact-check scenario, same byzantine equivocation, same
proof JSON shape, same three signers, same quorum-met outcome. The
difference is structural: the protocol now matches the paper's async
shape, and a future port to actual TCP gossip + concurrent agents
needs no change to this engine.
CrisisViz: still untouched. The `crisis_data.json` pipeline that
drives the visualizer is orthogonal.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| a1064660d5 |
Decentralize crisis_agents: agents own graphs, detect locally, vote by quorum
The previous design routed every Crisis message through a `Mothership` that
also held every agent's LamportGraph, ran the byzantine scan from a
privileged vantage, and built proofs from its own view. That made the
mothership a chokepoint — exactly what a BFT layer is supposed to remove.
This commit redistributes responsibility along the lines you'd expect from
a real open protocol:
Each `CrisisAgent` now owns:
- its own `LamportGraph` (the agent's view of the network)
- `emit_claim(claim) → Message`: wraps a Claim into a fully-valid Crisis
Message built from the agent's OWN graph state, with chain link + cross
references + mined PoW nonce
- `receive(message)`: extends my graph if integrity holds; idempotent
- `gossip_to(peer) → int`: shares everything I have with peer until
quiescence (Algorithm 4 in the paper, in-process flavor)
- `detect_mutations() → list[LocalAlarm]`: scans MY graph for same-id
spacelike vertex pairs via the existing
`LamportGraph.find_mutations`, filtered by application-layer
`statement_id` so cross-detector AlarmClaims canonicalize
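The `receive` and `gossip_to` contract above can be illustrated with a toy agent whose "graph" is just a dict keyed by digest. This is a simplification under stated assumptions: `SketchAgent` is hypothetical, and the real `receive` also checks message integrity against a `LamportGraph`, which is omitted here.

```python
class SketchAgent:
    """Hypothetical agent; a dict keyed by digest stands in for its graph."""

    def __init__(self, name):
        self.name = name
        self.graph = {}

    def receive(self, digest, payload):
        """Store a message once; repeated delivery is a no-op. True if new."""
        if digest in self.graph:
            return False
        self.graph[digest] = payload
        return True

    def gossip_to(self, peer):
        """Offer every held message to the peer; return how many were new."""
        transferred = 0
        for digest, payload in self.graph.items():
            if peer.receive(digest, payload):
                transferred += 1
        return transferred


alice, bob = SketchAgent("alice"), SketchAgent("bob")
alice.receive("d1", "claim one")
alice.receive("d2", "claim two")
first_round = alice.gossip_to(bob)   # both messages are new to bob
second_round = alice.gossip_to(bob)  # nothing left to transfer
```

A return value of zero is what lets a driver recognize that a pair of agents has nothing more to exchange.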
The `Mothership` shrinks to coordinator-only:
- bootstrap (register honest agents; trigger boundary open with a joiner)
- clock (call each agent's `next_turn()` per turn)
- first-hop routing (sender's emission → declared target subset)
- all-pairs gossip rounds between turns
- emit_alarms_from_detectors(): poll each agent for its LocalAlarms,
wrap any returned alarms into AlarmClaim payloads, broadcast them as
Crisis Messages over the gossip layer
Gone (regression-tested in `test_no_chokepoint.py`):
- `Mothership._graphs`, `Mothership.all_graphs()`, `Mothership.graph_of()`
- `alarm.scan_for_mutations(mothership)`
- any path where the mothership reads an agent's internal state
New voting layer (`crisis_agents/vote.py`):
- `AlarmClaim`: a Crisis-payload dataclass discriminated by `kind="alarm"`.
Wraps the accused process_id, statement_id, witness_digests, and
detection turn. Round-trips through JSON the same way Claim does.
- `quorum_for(n) = ceil(2n/3)`: classic BFT threshold.
- `tally_alarms(graph, threshold)`: groups AlarmClaim vertices by
(accused, statement_id, witness_pair), counts unique signer
process_ids, ratifies groups meeting the threshold. Deterministic
ordering so two equal graphs produce equal `RatifiedAlarm` lists.
- `RatifiedAlarm`: the network-level consensus on byzantine behavior.
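A sketch of the quorum rule and the tally, assuming alarms arrive as plain dicts rather than graph vertices. The dict keys and the return shape are assumptions for illustration; only `quorum_for`, `tally_alarms`, and the grouping key `(accused, statement_id, witness_pair)` are taken from the description above.

```python
from collections import defaultdict


def quorum_for(n: int) -> int:
    """ceil(2n/3), computed in integers to avoid float rounding."""
    return (2 * n + 2) // 3


def tally_alarms(alarms, threshold):
    """Ratify every alarm group whose distinct signers meet the threshold.

    Each alarm is a dict with keys signer, accused, statement_id and
    witness_pair (a layout assumed for this sketch).
    """
    signers = defaultdict(set)
    for alarm in alarms:
        # Sort the pair so detectors reporting (a, b) and (b, a) agree.
        pair = tuple(sorted(alarm["witness_pair"]))
        key = (alarm["accused"], alarm["statement_id"], pair)
        signers[key].add(alarm["signer"])
    # Sort the output so equal inputs always give equal results.
    return sorted(
        (key, sorted(who)) for key, who in signers.items() if len(who) >= threshold
    )


def make(signer, pair):
    return {"signer": signer, "accused": "byz", "statement_id": "s03",
            "witness_pair": pair}


alarms = [make("a", ("d2", "d1")), make("b", ("d1", "d2")),
          make("c", ("d1", "d2")), make("a", ("d1", "d2"))]
ratified = tally_alarms(alarms, quorum_for(4))
lone_accuser = tally_alarms([make("byz", ("d1", "d2"))], quorum_for(4))
```

With four participants the threshold is 3, so the repeated alarm from signer `a` does not inflate the count and a single accuser cannot ratify alone.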
Multi-signer proofs (`crisis_agents/proof.py`):
- schema_version bumped 1 → 2.
- ProofDocument now embeds every signer's process_id_hex and the
quorum threshold that was met. Self-consistency check enforces
distinct signers, witness pairs, and signer count ≥ threshold.
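The self-consistency rules listed above might look roughly like this. The function name `check_proof`, its arguments, and the reading of "witness pairs" as "two distinct digests" are all assumptions; the commit message does not spell out the actual checks in `proof.py`.

```python
def check_proof(signers, witness_pair, threshold):
    """Return a list of problems; an empty list means the proof is consistent."""
    problems = []
    if len(set(signers)) != len(signers):
        problems.append("duplicate signer")
    if len(set(witness_pair)) != 2:
        problems.append("witnesses are not two distinct digests")
    if len(set(signers)) < threshold:
        problems.append("below quorum")
    return problems


ok = check_proof(["a", "b", "c"], ("d1", "d2"), threshold=3)
bad = check_proof(["a", "a"], ("d1", "d1"), threshold=3)
```

Returning the full list of problems rather than stopping at the first one makes rejection messages easier to act on.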
Byzantine scenario rewrite:
- `MockByzantineAgent` now takes an `intro_claim` for its first turn (a
benign broadcast). The intro is technically necessary: the agent's two
contradictory variants both chain to the intro vertex, so they can
propagate through gossip — without it, the second variant would fail
the chain constraint in any graph already holding the first.
- `fact_check` scenario: closed phase still has 3 honest agents emitting
6 claims each into the closed log; Crisis phase grew to 2 turns (intro
+ equivocation) so the byzantine can establish its same-id anchor
before equivocating.
End-to-end CLI output reframed around six phases:
1. closed team (no Crisis)
2. boundary opens
3. emission + gossip
4. decentralized detection (each agent reports its own findings)
5. alarms emitted + gossiped + ratified by quorum
6. proof emission
Tests (51 fresh + 5 carried over for boundary):
- `test_mothership.py`: per-agent graph ownership, broadcast vs.
targeted delivery semantics, gossip propagation, regression guards
against the removed centralization attributes.
- `test_alarm.py`: every honest agent independently detects the same
mutation; the byzantine doesn't detect itself; witness pairs are
canonical across detectors.
- `test_vote.py`: AlarmClaim round-trip, quorum formulas, tally
determinism, mothership convenience method matches direct tallying.
- `test_proof.py`: build_proof from RatifiedAlarm; multi-signer JSON
round-trip; tampered-witness/below-quorum/duplicate-signer rejection.
- `test_no_chokepoint.py` (the centerpiece): after the full lifecycle,
every honest agent's ratified-alarm set is byte-identical. A single
byzantine accuser alone cannot ratify. Forbidden attributes don't
exist on Mothership.
Full suite: 163 tests, all green in 0.80s.
CrisisViz: untouched by this refactor. The `crisis_data.json` pipeline
the visualizer consumes is produced by the orthogonal
`crisis.demo.Simulation`, which this commit doesn't touch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| 6aa2c54b68 |
Add LiveClaudeAgent — back honest agents with real Claude API calls
`crisis-agents demo --live` swaps the three honest MockAgents for
LiveClaudeAgent instances that issue one Anthropic Messages API call
per turn. The byzantine joiner stays mocked: getting an LLM to
equivocate deterministically would require multiple API calls per turn
(one per peer subset) with unreliable results. Keeping the equivocator
scripted makes the demo easier to follow.
Prompt shape: the honest agent receives the reference doc, a list of
statements still to adjudicate, and the last 12 claims observed from
peers; it responds with a JSON array of {statement_id, verdict,
confidence, evidence} objects. The parser tolerates markdown fences
and per-item validation failures; malformed responses produce no
emissions rather than crashing the demo.
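A tolerant parser along the lines described could be sketched as below. The function name `parse_verdicts` and the exact validation rule (an item must be an object carrying all four keys) are assumptions; the real parser may validate types and value ranges as well.

```python
import json

REQUIRED_KEYS = {"statement_id", "verdict", "confidence", "evidence"}
FENCE_MARK = "`" * 3  # a markdown code fence, built here to avoid nesting fences


def parse_verdicts(text):
    """Return the valid verdict objects in a response, or [] if unusable."""
    body = text.strip()
    if body.startswith(FENCE_MARK) and body.endswith(FENCE_MARK):
        # Drop the opening fence line (with any language tag) and the closing one.
        body = "\n".join(body.splitlines()[1:-1])
    try:
        items = json.loads(body)
    except json.JSONDecodeError:
        return []  # malformed response: emit nothing instead of crashing
    if not isinstance(items, list):
        return []
    # Skip individual items that are not objects or lack a required key.
    return [i for i in items if isinstance(i, dict) and REQUIRED_KEYS.issubset(i)]


good_item = {"statement_id": "s01", "verdict": "true",
             "confidence": 0.9, "evidence": "matches the reference doc"}
payload = json.dumps([good_item, {"verdict": "false"}])
fenced = FENCE_MARK + "json\n" + payload + "\n" + FENCE_MARK

from_fenced = parse_verdicts(fenced)
from_garbage = parse_verdicts("Sorry, I cannot answer that.")
```

Only the well-formed item survives the fenced response, and the non-JSON reply yields an empty list.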
Default model: claude-haiku-4-5-20251001 — fast enough and cheap
enough for short-form structured-output adjudication. Override with
`--model <id>`.
Dependency: anthropic SDK as an optional install — `pip install -e
".[live]"`. Lazy-imported so the mocked path never needs it.
Tests: 6 new tests in test_live_agent.py using a fake Anthropic client
(no real API calls in CI):
- clean JSON response parsing
- markdown-fence tolerance
- malformed-response graceful degradation
- per-item validation skipping
- already-adjudicated statement filtering (the agent doesn't keep
re-asking about statements it has already answered)
- evidence-length truncation to Claim.EVIDENCE_MAX_LEN
Suite: 145 -> 150 tests, all green in 0.77s.
Manual test (not in CI; requires API credits):
    pip install -e ".[live]"
    export ANTHROPIC_API_KEY=...
    crisis-agents demo --live
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| b8684297fa |
Add crisis_agents — Crisis as a coordination layer for AI agent teams
A new sibling Python package, `crisis_agents`, that lifts the Crisis
protocol from "consensus between machines" to "consensus between AI
agents". Threat model: a team of sub-agents normally talks freely
with its orchestrator (the "mothership"); when the team's boundary
opens and an external agent of unknown trust joins, the mothership
activates the Crisis layer so byzantine equivocation is detectable.
Orchestration model (two phases plus a proof step):
Phase 1 — closed team, no Crisis: agents emit claims directly, the
mothership collects them flat.
Phase 2 — boundary opens: every subsequent claim is wrapped into a
Crisis Message with the agent's stable process_id and a PoW nonce,
delivered into per-agent LamportGraphs, and after each turn the
mothership scans for mutations via LamportGraph.find_mutations.
Phase 3 — proof: when an alarm fires, the mothership emits a
replayable JSON proof-of-malfeasance document with the contradictory
witnesses, their delivery sets, and DAG cross-references showing
which honest agents saw what.
Modules:
- claim.py Claim dataclass + JSON round-trip
- boundary.py membership tracker + open() trigger
- agent.py CrisisAgent abstract + MockAgent + MockByzantineAgent
(the latter equivocates by emitting two variants to
disjoint peer subsets at the same logical turn)
- mothership.py orchestrator driving both phases, building Crisis
Messages from Claims, per-agent LamportGraphs, log
- alarm.py scan_for_mutations: same-agent same-turn distinct
digests with non-identical delivery sets, verified
spacelike via LamportGraph.are_spacelike on the
honest-agent graphs
- proof.py build_proof + ProofDocument + JSON serializer +
verify_proof_self_consistent
- cli.py `crisis-agents demo` + `crisis-agents verify`
- scenarios/ fact_check: reference doc + 6 statements + scripted
honest/byzantine agents producing a deterministic
equivocation on statement s03
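The first half of the `alarm.py` rule (same agent, same turn, distinct digests, non-identical delivery sets) can be sketched on plain tuples. The function name `find_equivocations` and the tuple layout are hypothetical, and the spacelike verification via `LamportGraph.are_spacelike` is deliberately left out.

```python
from collections import defaultdict
from itertools import combinations


def find_equivocations(deliveries):
    """deliveries: iterable of (agent, turn, digest, recipients) tuples.

    Returns (agent, turn, digest_a, digest_b) for each pair of distinct
    digests that one agent sent at the same logical turn to different
    recipient sets.
    """
    seen = defaultdict(dict)  # (agent, turn) -> {digest: set of recipients}
    for agent, turn, digest, recipients in deliveries:
        seen[(agent, turn)].setdefault(digest, set()).update(recipients)
    found = []
    for agent, turn in sorted(seen):
        by_digest = seen[(agent, turn)]
        for a, b in combinations(sorted(by_digest), 2):
            if by_digest[a] != by_digest[b]:
                found.append((agent, turn, a, b))
    return found


deliveries = [
    ("honest1", 1, "d0", {"honest2", "byz"}),
    ("byz", 1, "d1", {"honest1"}),   # variant shown to one subset
    ("byz", 1, "d2", {"honest2"}),   # contradictory variant shown to another
]
alarms = find_equivocations(deliveries)
```

An agent that sends one digest to everyone produces no pair and is therefore never flagged by this step.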
Tests: 50 new tests across test_claim, test_boundary, test_mothership,
test_alarm, test_proof, test_demo_fact_check. End-to-end test runs the
fact_check scenario, asserts exactly one alarm raised, proof is built,
re-serialized JSON passes self-consistency. Full suite (existing
crisis + new crisis_agents) green in 0.77s — 145 tests.
Out of scope (deliberately): visualization (separate CrisisViz upgrade
later), real TCP gossip (agents talk via in-process function calls in
the mothership), false-claim detection without equivocation (an
agent that consistently lies but never equivocates is out-voted, not
"caught"; catching it would require a ground-truth oracle).
Reuse from existing crisis package: Message, Vertex, LamportGraph,
LamportGraph.find_mutations, ProofOfWorkWeight, digest. The new code
is a thin adapter layer; the protocol substrate did the heavy lifting.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| 7f830a36ef |
Advance Python test coverage — voting, recorder, simulation extensions
Pre-existing tests covered crypto / graph / message / order / rounds /
weight, but left three high-value modules unverified:
- voting.py — 25 KB of BBA virtual leader election + safe voting
pattern (Algorithms 6 & 7), the heart of the protocol. Zero
tests. Now 14 tests covering the five public entry points
(`build_knowledge_graph`, `select_quorum`, `voting_set`,
`compute_safe_voting_pattern`, `compute_virtual_leader_election`)
plus `initial_vote`. Uses a small in-process Simulation to
produce realistic multi-round graphs.
- recorder.py — the bridge that turns simulation runs into the
JSON consumed by CrisisViz. Zero tests despite being the choke
point: if recorder silently drops fields, the viz lies. Now 11
tests covering EventRecorder bookkeeping (sequence, filtering),
SimulationRecording integration (STEP_BEGIN/END,
MESSAGE_CREATED/DELIVERED), capture_snapshot well-formedness,
and JSON-serializability of both snapshots and event data.
- test_simulation.py extended with three regression guards:
- test_byzantine_vertices_flagged_in_snapshots: ensures the
`is_byzantine_source` flag survives the recorder pipeline.
CrisisViz's Ch10 (byzantine) chapter relies on this to
colour Dave's lane red.
- test_recorder_deterministic_with_seed: same seed produces
identical event-stream length and type ordering. Tightens
the existing vertex-count determinism check.
- test_consensus_pipeline_progresses: a fast claim that rounds
advance past 0 and the SVP / voting code paths engage. The
stronger claim (full convergence + non-empty total order)
takes minutes in pure Python and belongs in a separate
long-running benchmark, not the unit-test suite — but the
weaker claim is sufficient to catch the dead-pipeline
failure mode that motivated regenerating crisis_data.json
on 2026-05-04.
Suite: 72 -> 100 tests, all green in ~0.75s.
Explicitly out of scope (separate engineering effort):
- gossip.py / node.py TCP integration tests — heavy harness;
- export_json.py — thin composition of tested layers;
- Swift XCTest — the CrisisViz testbed harness already covers
the curriculum-correctness layer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| 1df4790fb4 |
Initial implementation of the Crisis protocol (Richter, 2019)
Complete Python PoC of "Probabilistically Self Organizing Total Order in
Unstructured P2P Networks". Implements all 10 algorithms from the paper:
message generation, integrity checks, Lamport graphs, virtual synchronous
rounds, safe voting patterns, virtual leader election (BA*), longest chain
rule, total order via Kahn's algorithm, and push/pull gossip. Includes
simulation harness, full node binary, and 72 passing tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
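For readers unfamiliar with the "total order via Kahn's algorithm" step named in the initial commit, here is a generic sketch of Kahn's topological sort. Breaking ties by node name with a min-heap is an assumption made so the output is deterministic; the tie-breaking rule the paper or this repository uses is not stated here.

```python
import heapq


def total_order(nodes, edges):
    """Kahn's algorithm; ties among ready nodes break by smallest name."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, child)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")
    return order


order = total_order(["d", "c", "b", "a"], [("a", "c"), ("b", "c"), ("c", "d")])
```

Any two nodes holding the same DAG and applying the same tie-break rule arrive at the same sequence, which is the property a total order needs.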