crisis/tests/test_mothership.py
saymrwulf 0976239ebd crisis_agents: drop the wall-clock, drive asynchronously to quiescence
The previous driver imposed a synchronous turn-counted clock that the
Crisis paper explicitly forbids — Crisis is supposed to work in
asynchronous P2P networks, with any synchronicity being virtual and
derived inside the consensus algorithm from the DAG structure, not
imposed externally by a coordinator. This commit removes the wall clock.

What changed in the engine:

  - `Mothership.run_crisis_phase(num_turns, gossip_rounds_per_turn)`
    is replaced by `run_until_quiescent(max_steps=200)`. The loop
    interleaves three concerns on each iteration — emissions, gossip,
    and alarm emissions — until none make progress. Termination is by
    quiescence, not by a fixed turn count. `max_steps` is a safety
    bound (loop-iteration cap), not an exposed clock.

  - `Mothership.run_closed_phase(num_turns)` becomes
    `run_closed_phase(max_steps=50)`. Same quiescence model — the
    closed-phase conversation runs until no agent has more to say.

  - Agents grew `pending_alarm_claims()`: each agent checks its own
    graph for un-alarmed mutations and produces AlarmClaims directly.
    The driver loop calls this every iteration, so alarms emit and
    propagate in the same loop as regular emissions and gossip — no
    separate "alarm phase."

  - `Mothership.emit_alarms_from_detectors()` and the explicit
    `run_gossip_round()` step are no longer needed by callers; both
    are subsumed by the async loop. `run_gossip_round()` stays as a
    helper but tests no longer call it externally.

What changed in the agent interface:

  - `CrisisAgent.next_turn(turn, received_claims)` becomes
    `try_emit()` — no arguments. Agents in an async network don't see
    a global tick. They decide based on their own internal state.

  - `CrisisAgent.observe(claim)` is the new optional callback the
    closed-phase loop uses to feed context into agents that care
    (overridden by LiveClaudeAgent to populate its prompt buffer).

  - `pending_alarm_claims()` is idempotent: an internal
    `_already_alarmed` set tracks claims this agent has emitted, so
    the loop calls it every step without flooding the network with
    duplicate alarms.

What changed in the dataclass schema:

  - `AlarmClaim.detected_at_turn` -> `emitted_at_step`. The word
    "turn" implies a global clock; "step" is a per-agent sequence
    number used only for log ordering — local, not networked.

  - `ClosedPhaseEntry.turn` and `CrisisPhaseEntry.turn` -> `step`.
    Same rename, same reasoning.

  - `Scenario.closed_phase_turns` and `Scenario.crisis_phase_turns`
    are gone. The scenario no longer prescribes how many turns; it
    just provides agents and lets the async loop run them out.

What changed in the CLI:

  - Phase 3 reports "drove to quiescence in N step(s)" with a
    breakdown of regular emissions / gossip transfers / alarm
    emissions, instead of "ran N turns".

  - `QuiescenceReport` (new dataclass) carries the run statistics
    back from `run_until_quiescent`/`run_closed_phase` — steps taken,
    emissions made, gossip transfers, alarm claims emitted, plus
    whether termination was via quiescence or max-step cap.

New regression tests (`test_async_quiescence.py`):

  - `test_run_until_quiescent_terminates`: the loop must exit.
  - `test_two_runs_produce_identical_final_state`: determinism check —
    if anything in the loop depended on real wall time, this would
    fail.
  - `test_max_steps_bound_caps_runtime`: setting max_steps=1 exits
    immediately and `QuiescenceReport.reached_quiescence` reflects
    reality.
  - `test_no_turn_argument_exposed_to_agents`: introspects
    `CrisisAgent.try_emit` signature; fails if anyone re-adds a
    `turn` parameter.
  - `test_no_turn_field_on_alarmclaim`: introspects the dataclass
    fields; fails if `detected_at_turn` reappears.
  - `test_alarms_propagate_through_async_loop_alone`: the loop alone
    (no manual emit_alarms / run_gossip_round) ratifies an alarm.
  - `test_quiescence_report_counts_match_logs`: sanity check that
    the report's emission count equals the crisis log length.

Suite: 163 -> 170 tests, all green in 0.79s.

Behavioral end-state is identical to the previous (synchronous)
version: same fact-check scenario, same byzantine equivocation, same
proof JSON shape, same three signers, same quorum-met outcome. The
difference is structural: the protocol now matches the paper's async
shape, and a future port to actual TCP gossip + concurrent agents
needs no change to this engine.

CrisisViz: still untouched. The `crisis_data.json` pipeline that
drives the visualizer is orthogonal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 22:06:56 +02:00

166 lines
6.7 KiB
Python

"""Tests for the slimmed-down Mothership (bootstrap + clock + routing only)."""
import pytest
from crisis_agents.agent import MockAgent, MockByzantineAgent
from crisis_agents.claim import Claim
from crisis_agents.mothership import Mothership
def _claim(sid: str, verdict: str = "true", evidence: str = "ok") -> Claim:
return Claim(statement_id=sid, verdict=verdict, confidence=0.9, # type: ignore[arg-type]
evidence=evidence, timestamp_logical=0)
def _intro(name: str = "delta") -> Claim:
"""A benign 'I have joined' claim for the byzantine's first turn."""
return Claim(statement_id=f"intro:{name}", verdict="unknown", confidence=1.0,
evidence=f"{name} joining the team", timestamp_logical=0)
class TestClosedPhase:
def test_no_dag_in_closed_phase_for_active_agents(self):
"""In the closed phase, agents don't extend their graphs."""
m = Mothership()
m.add_agent(MockAgent("a", [[_claim("s01")]]))
m.add_agent(MockAgent("b", [[_claim("s01")]]))
report = m.run_closed_phase()
# Two agents emitted one claim each via the closed-phase log
assert len(m.run_result.closed_log) == 2
# The async loop reached quiescence within the step budget
assert report.reached_quiescence
assert report.emissions == 2
# No Crisis messages sent yet, so per-agent graphs are still empty
for agent in m.agents.values():
assert agent.graph.vertex_count() == 0
assert not m.boundary.is_open
def test_add_agent_after_open_rejected(self):
m = Mothership()
m.add_agent(MockAgent("a", [[_claim("s01")]]))
m.open_boundary(MockByzantineAgent("byz", _intro("byz"), [], set(), set()))
with pytest.raises(RuntimeError, match="cannot add_agent"):
m.add_agent(MockAgent("late", []))
class TestCrisisPhaseAgentOwnership:
def test_each_agent_owns_its_graph(self):
"""After open_boundary every agent has its own LamportGraph."""
m = Mothership()
m.add_agent(MockAgent("a", [[]]))
m.add_agent(MockAgent("b", [[]]))
joiner = MockByzantineAgent("d", _intro(), [], set(), set())
m.open_boundary(joiner)
# Each agent has a graph attribute, and they're distinct objects
graphs = [a.graph for a in m.agents.values()]
assert len(graphs) == 3
assert len({id(g) for g in graphs}) == 3 # distinct identity
for g in graphs:
assert g.vertex_count() == 0
def test_broadcast_emission_reaches_every_agent(self):
"""A target_subset=None emission ends up in every peer's graph."""
m = Mothership()
m.add_agent(MockAgent("a", [[]]))
m.add_agent(MockAgent("b", [[]]))
# Joiner with a single broadcast intro, no equivocation script
joiner = MockByzantineAgent("d", _intro(), [], set(), set())
m.open_boundary(joiner)
m.run_until_quiescent()
for name, agent in m.agents.items():
assert agent.graph.vertex_count() == 1, (
f"agent {name!r} should have received the intro broadcast"
)
def test_targeted_emission_seeds_disjoint_views(self):
"""After the async loop with gossip, every honest agent sees both
variants — but the byzantine itself never has both in its own graph
(it never re-receives its own targeted emissions, and gossip from
honest peers may or may not feed them back).
The protocol-level invariant: the byzantine's two contradictory
vertices end up reachable to every honest agent. THAT is what
decentralized detection depends on.
"""
m = Mothership()
m.add_agent(MockAgent("a", [[]]))
m.add_agent(MockAgent("b", [[]]))
byz = MockByzantineAgent(
"d", _intro(),
scripted_pairs=[(
_claim("s03", verdict="true", evidence="to_a"),
_claim("s03", verdict="false", evidence="to_b"),
)],
split_a={"a"},
split_b={"b"},
)
m.open_boundary(byz)
m.run_until_quiescent()
# Every honest agent's graph has both variants of the equivocation
# (the post-condition that lets decentralized detection work).
for name in ("a", "b"):
payloads = [v.payload for v in m.agents[name].graph.all_vertices()]
assert any(b'"verdict":"true"' in p for p in payloads), (
f"agent {name!r} missing the true-variant"
)
assert any(b'"verdict":"false"' in p for p in payloads), (
f"agent {name!r} missing the false-variant"
)
class TestGossipRound:
def test_gossip_propagates_byzantine_equivocation(self):
"""After one gossip round, every honest agent has both variants —
the prerequisite for decentralized detection."""
m = Mothership()
m.add_agent(MockAgent("a", [[]]))
m.add_agent(MockAgent("b", [[]]))
m.add_agent(MockAgent("c", [[]]))
byz = MockByzantineAgent(
"d", _intro(),
scripted_pairs=[(
_claim("s03", verdict="true", evidence="to_ac"),
_claim("s03", verdict="false", evidence="to_b"),
)],
split_a={"a", "c"},
split_b={"b"},
)
m.open_boundary(byz)
# Two turns (intro + equivocation), then gossip
m.run_until_quiescent()
# After gossip, every honest agent should have both byzantine variants
# (intro + 2 equivocations = 3 vertices minimum). The byzantine itself
# ends up with intro + everything its peers shared back.
for name in ("a", "b", "c"):
payloads = [v.payload for v in m.agents[name].graph.all_vertices()]
assert any(b'"verdict":"true"' in p for p in payloads), (
f"agent {name!r} missing the true-variant after gossip"
)
assert any(b'"verdict":"false"' in p for p in payloads), (
f"agent {name!r} missing the false-variant after gossip"
)
def test_mothership_doesnt_hold_a_graph_dict(self):
"""Regression guard against the chokepoint we just removed."""
m = Mothership()
# The old API exposed `m.all_graphs()` and `m.graph_of(name)`.
# Neither should exist now.
assert not hasattr(m, "all_graphs")
assert not hasattr(m, "graph_of")
assert not hasattr(m, "_graphs")
def test_run_crisis_phase_requires_open_boundary(self):
m = Mothership()
m.add_agent(MockAgent("a", [[_claim("s01")]]))
with pytest.raises(RuntimeError, match="boundary not yet open"):
m.run_until_quiescent()