mirror of
https://github.com/saymrwulf/crisis.git
synced 2026-05-14 20:37:54 +00:00
The previous driver imposed a synchronous turn-counted clock that the
Crisis paper explicitly forbids — Crisis is supposed to work in
asynchronous P2P networks, with any synchronicity being virtual and
derived inside the consensus algorithm from the DAG structure, not
imposed externally by a coordinator. This commit removes the wall clock.
What changed in the engine:
- `Mothership.run_crisis_phase(num_turns, gossip_rounds_per_turn)`
is replaced by `run_until_quiescent(max_steps=200)`. The loop
interleaves three concerns on each iteration — emissions, gossip,
and alarm emissions — until none make progress. Termination is by
quiescence, not by a fixed turn count. `max_steps` is a safety
bound (loop-iteration cap), not an exposed clock.
- `Mothership.run_closed_phase(num_turns)` becomes
`run_closed_phase(max_steps=50)`. Same quiescence model — the
closed-phase conversation runs until no agent has more to say.
- Agents grew `pending_alarm_claims()`: each agent checks its own
graph for un-alarmed mutations and produces AlarmClaims directly.
The driver loop calls this every iteration, so alarms emit and
propagate in the same loop as regular emissions and gossip — no
separate "alarm phase."
- `Mothership.emit_alarms_from_detectors()` and the explicit
`run_gossip_round()` step are no longer needed by callers; both
are subsumed by the async loop. `run_gossip_round()` stays as a
helper but tests no longer call it externally.
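The driver shape can be sketched as follows. Agent method names and the return convention are illustrative, not the real `Mothership` internals:

```python
def run_until_quiescent(agents, max_steps=200):
    """Interleave regular emissions, gossip, and alarm emissions on each
    iteration; stop when a full pass makes no progress (quiescence).
    `max_steps` is only a safety cap, not an exposed clock.
    Returns (steps_taken, reached_quiescence)."""
    for step in range(1, max_steps + 1):
        progressed = False
        for agent in agents:
            progressed |= bool(agent.try_emit())              # regular emission
            progressed |= bool(agent.gossip())                # graph sync with peers
            progressed |= bool(agent.pending_alarm_claims())  # alarms, same loop
        if not progressed:
            return step, True                                 # quiescent
    return max_steps, False                                   # cap fired
```

The point of the shape: nothing outside the loop counts turns, and the loop itself only observes whether its three concerns made progress.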
What changed in the agent interface:
- `CrisisAgent.next_turn(turn, received_claims)` becomes
`try_emit()` — no arguments. Agents in an async network don't see
a global tick. They decide based on their own internal state.
- `CrisisAgent.observe(claim)` is the new optional callback the
closed-phase loop uses to feed context into agents that care
(overridden by LiveClaudeAgent to populate its prompt buffer).
- `pending_alarm_claims()` is idempotent: an internal
`_already_alarmed` set tracks claims this agent has emitted, so
the loop calls it every step without flooding the network with
duplicate alarms.
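In this shape the idempotence lives in the agent, not the driver. A hypothetical sketch (the real `CrisisAgent` carries graph state; the mutation strings here stand in for detector output):

```python
class SketchAgent:
    """Illustrative only: shows the no-argument interface and the
    `_already_alarmed` dedup set, not the real CrisisAgent."""

    def __init__(self, detected_mutations):
        self._detected = list(detected_mutations)
        self._already_alarmed = set()  # claims this agent has already emitted

    def try_emit(self):
        # no `turn` parameter: decisions come from local state only
        return None

    def observe(self, claim):
        # optional hook; subclasses that care (e.g. a live-LLM agent)
        # override this to buffer incoming context
        pass

    def pending_alarm_claims(self):
        # idempotent: each detected mutation yields at most one alarm,
        # so the driver can call this every step without flooding
        fresh = [m for m in self._detected if m not in self._already_alarmed]
        self._already_alarmed.update(fresh)
        return fresh
```

Calling `pending_alarm_claims()` twice returns the fresh claims once, then an empty list.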
What changed in the dataclass schema:
- `AlarmClaim.detected_at_turn` -> `emitted_at_step`. The word
"turn" implies a global clock; "step" is a per-agent sequence
number used only for log ordering — local, not networked.
- `ClosedPhaseEntry.turn` and `CrisisPhaseEntry.turn` -> `step`.
Same rename, same reasoning.
- `Scenario.closed_phase_turns` and `Scenario.crisis_phase_turns`
are gone. The scenario no longer prescribes how many turns; it
just provides agents and lets the async loop run them out.
What changed in the CLI:
- Phase 3 reports "drove to quiescence in N step(s)" with a
breakdown of regular emissions / gossip transfers / alarm
emissions, instead of "ran N turns".
- `QuiescenceReport` (new dataclass) carries the run statistics
back from `run_until_quiescent`/`run_closed_phase` — steps taken,
emissions made, gossip transfers, alarm claims emitted, plus
whether termination was via quiescence or max-step cap.
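The field names below are a plausible sketch of `QuiescenceReport`, inferred from the description above rather than copied from the source:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuiescenceReport:
    """Run statistics handed back by the driver loops (field names assumed)."""
    steps_taken: int
    emissions: int
    gossip_transfers: int
    alarm_claims_emitted: int
    reached_quiescence: bool  # False means the max-step cap fired first
```

A caller can then print "drove to quiescence in N step(s)" or flag that the cap fired, without ever exposing a clock to the agents.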
New regression tests (`test_async_quiescence.py`):
- `test_run_until_quiescent_terminates`: the loop must exit.
- `test_two_runs_produce_identical_final_state`: determinism check —
if anything in the loop depended on real wall time, this would
fail.
- `test_max_steps_bound_caps_runtime`: setting max_steps=1 exits
immediately and `QuiescenceReport.reached_quiescence` reflects
reality.
- `test_no_turn_argument_exposed_to_agents`: introspects
`CrisisAgent.try_emit` signature; fails if anyone re-adds a
`turn` parameter.
- `test_no_turn_field_on_alarmclaim`: introspects the dataclass
fields; fails if `detected_at_turn` reappears.
- `test_alarms_propagate_through_async_loop_alone`: the loop alone
(no manual emit_alarms / run_gossip_round) ratifies an alarm.
- `test_quiescence_report_counts_match_logs`: sanity check that
the report's emission count equals the crisis log length.
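The two introspection guards need nothing beyond the standard library; the stand-ins below replace the real `CrisisAgent.try_emit` and `AlarmClaim` for illustration:

```python
import dataclasses
import inspect


@dataclasses.dataclass(frozen=True)
class AlarmClaim:  # stand-in carrying only the renamed field
    emitted_at_step: int


def try_emit(self):  # stand-in for CrisisAgent.try_emit
    return None


def test_no_turn_argument_exposed_to_agents():
    # fails if anyone re-adds a `turn` parameter
    assert "turn" not in inspect.signature(try_emit).parameters


def test_no_turn_field_on_alarmclaim():
    # fails if `detected_at_turn` reappears on the dataclass
    names = {f.name for f in dataclasses.fields(AlarmClaim)}
    assert "detected_at_turn" not in names
    assert "emitted_at_step" in names
```

Because they inspect signatures and dataclass fields rather than behavior, these tests catch a regression to the turn-based interface at review time, not at runtime.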
Suite: 163 -> 170 tests, all green in 0.79s.
Behavioral end-state is identical to the previous (synchronous)
version: same fact-check scenario, same byzantine equivocation, same
proof JSON shape, same three signers, same quorum-met outcome. The
difference is structural: the protocol now matches the paper's async
shape, and a future port to actual TCP gossip + concurrent agents
needs no change to this engine.
CrisisViz: still untouched. The `crisis_data.json` pipeline that
drives the visualizer is orthogonal.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
152 lines
5.7 KiB
Python
"""
|
|
vote.py — turn local alarms into ratified alarms via gossip + quorum.
|
|
|
|
Design:
|
|
1. An honest agent that detects a mutation emits an `AlarmClaim` —
|
|
a special payload structured like a Crisis Claim but with `kind="alarm"`
|
|
and the accused/witnesses encoded in fields. AlarmClaims are wrapped
|
|
into ordinary Crisis Messages with the *detector's* process id, so they
|
|
gossip through the network like everything else.
|
|
|
|
2. After enough gossip, every honest agent's graph contains AlarmClaim
|
|
vertices from every other honest detector. Tallying happens locally:
|
|
`count_alarm_votes(graph, accused, statement_id)` counts unique signer
|
|
ids and returns the set of detectors who have weighed in.
|
|
|
|
3. A `RatifiedAlarm` is produced when ≥ quorum_threshold detectors agree
|
|
on the same (accused, statement_id, witness_digests) tuple.
|
|
|
|
The quorum threshold is `ceil(2 * N_trusted / 3)` where N_trusted is the
|
|
size of the boundary set at the moment of ratification. In the canonical
|
|
fact_check scenario: N=4 (3 honest + 1 byzantine), 2/3 rounded up = 3. So
|
|
all three honest detectors must concur — exactly the protection we want
|
|
against a single byzantine accuser ostracizing an honest agent.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import json
|
|
import math
|
|
from dataclasses import asdict, dataclass, field
|
|
|
|
from crisis.graph import LamportGraph
|
|
|
|
from crisis_agents.alarm import LocalAlarm
|
|
|
|
|
|
ALARM_KIND = "alarm"
|
|
|
|
|
|
@dataclass(frozen=True)
|
|
class AlarmClaim:
|
|
"""An on-the-wire alarm — the detector's statement, signed via the
|
|
Crisis message wrapping (process_id + PoW nonce).
|
|
|
|
Serializes to JSON for the Crisis Message payload. Recognizable by
|
|
`kind == "alarm"`, distinguishing it from a regular `Claim` payload
|
|
(which has `kind` absent or != "alarm" by convention).
|
|
|
|
`emitted_at_step` is the agent's local sequence number for ordering;
|
|
it is NOT a global clock tick — Crisis is asynchronous.
|
|
"""
|
|
accused_process_id_hex: str
|
|
statement_id: str
|
|
witness_digests: tuple[str, str]
|
|
emitted_at_step: int
|
|
kind: str = ALARM_KIND
|
|
|
|
def to_payload(self) -> bytes:
|
|
return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")
|
|
|
|
@classmethod
|
|
def from_payload(cls, payload: bytes) -> "AlarmClaim":
|
|
obj = json.loads(payload.decode("utf-8"))
|
|
if obj.get("kind") != ALARM_KIND:
|
|
raise ValueError("not an AlarmClaim payload")
|
|
return cls(
|
|
accused_process_id_hex=obj["accused_process_id_hex"],
|
|
statement_id=obj["statement_id"],
|
|
witness_digests=tuple(obj["witness_digests"]), # type: ignore[arg-type]
|
|
emitted_at_step=obj["emitted_at_step"],
|
|
)
|
|
|
|
@classmethod
|
|
def from_local_alarm(cls, alarm: LocalAlarm, emitted_at_step: int) -> "AlarmClaim":
|
|
return cls(
|
|
accused_process_id_hex=alarm.accused_process_id_hex,
|
|
statement_id=alarm.statement_id,
|
|
witness_digests=alarm.witness_digests,
|
|
emitted_at_step=emitted_at_step,
|
|
)
|
|
|
|
|
|
@dataclass(frozen=True)
|
|
class RatifiedAlarm:
|
|
"""Network-level consensus on a byzantine equivocation.
|
|
|
|
Produced by `tally_alarms()` when ≥ quorum signers have emitted matching
|
|
AlarmClaims into a graph.
|
|
"""
|
|
accused_process_id_hex: str
|
|
statement_id: str
|
|
witness_digests: tuple[str, str]
|
|
signer_process_id_hexes: tuple[str, ...] # sorted, unique
|
|
quorum_threshold: int
|
|
|
|
@property
|
|
def signer_count(self) -> int:
|
|
return len(self.signer_process_id_hexes)
|
|
|
|
|
|
def quorum_for(n_trusted: int) -> int:
|
|
"""Quorum threshold: ceil(2 * n / 3)."""
|
|
return math.ceil(2 * n_trusted / 3)
|
|
|
|
|
|
def collect_alarm_claims(graph: LamportGraph) -> list[tuple[bytes, AlarmClaim]]:
|
|
"""Walk `graph` and return every (signer_process_id, AlarmClaim) pair.
|
|
|
|
Vertices that aren't AlarmClaim-payloaded are skipped silently. The
|
|
signer's process id is the vertex's `id` field — that's the Crisis-layer
|
|
cryptographic signature of who emitted the claim.
|
|
"""
|
|
out: list[tuple[bytes, AlarmClaim]] = []
|
|
for v in graph.all_vertices():
|
|
try:
|
|
claim = AlarmClaim.from_payload(v.payload)
|
|
except (ValueError, TypeError):
|
|
continue
|
|
out.append((v.id, claim))
|
|
return out
|
|
|
|
|
|
def tally_alarms(graph: LamportGraph, *, quorum_threshold: int) -> list[RatifiedAlarm]:
|
|
"""Count AlarmClaims in `graph` and emit RatifiedAlarms for groups that
|
|
meet quorum.
|
|
|
|
Groups by (accused, statement_id, witness_digests). Counts unique signer
|
|
process_ids per group. If the count meets or exceeds `quorum_threshold`,
|
|
the group ratifies.
|
|
|
|
The same agent's graph being scanned multiple times produces identical
|
|
results — there's no implicit ordering or non-determinism. Two agents'
|
|
graphs that have converged via gossip produce the same RatifiedAlarms.
|
|
"""
|
|
by_group: dict[tuple[str, str, tuple[str, str]], set[bytes]] = {}
|
|
for signer_pid, claim in collect_alarm_claims(graph):
|
|
key = (claim.accused_process_id_hex, claim.statement_id, claim.witness_digests)
|
|
by_group.setdefault(key, set()).add(signer_pid)
|
|
|
|
ratified: list[RatifiedAlarm] = []
|
|
for (accused, statement_id, witnesses), signers in by_group.items():
|
|
if len(signers) >= quorum_threshold:
|
|
ratified.append(RatifiedAlarm(
|
|
accused_process_id_hex=accused,
|
|
statement_id=statement_id,
|
|
witness_digests=witnesses,
|
|
signer_process_id_hexes=tuple(sorted(s.hex() for s in signers)),
|
|
quorum_threshold=quorum_threshold,
|
|
))
|
|
# Stable ordering so equal graphs produce equal lists
|
|
ratified.sort(key=lambda r: (r.accused_process_id_hex, r.statement_id))
|
|
return ratified
|