crisis/tests/test_demo_fact_check.py
saymrwulf 0976239ebd crisis_agents: drop the wall-clock, drive asynchronously to quiescence
The previous driver imposed a synchronous turn-counted clock that the
Crisis paper explicitly forbids — Crisis is supposed to work in
asynchronous P2P networks, with any synchronicity being virtual and
derived inside the consensus algorithm from the DAG structure, not
imposed externally by a coordinator. This commit removes the wall clock.

What changed in the engine:

  - `Mothership.run_crisis_phase(num_turns, gossip_rounds_per_turn)`
    is replaced by `run_until_quiescent(max_steps=200)`. The loop
    interleaves three concerns on each iteration — emissions, gossip,
    and alarm emissions — until none make progress. Termination is by
    quiescence, not by a fixed turn count. `max_steps` is a safety
    bound (loop-iteration cap), not an exposed clock.

  - `Mothership.run_closed_phase(num_turns)` becomes
    `run_closed_phase(max_steps=50)`. Same quiescence model — the
    closed-phase conversation runs until no agent has more to say.

  - Agents grew `pending_alarm_claims()`: each agent checks its own
    graph for un-alarmed mutations and produces AlarmClaims directly.
    The driver loop calls this every iteration, so alarms emit and
    propagate in the same loop as regular emissions and gossip — no
    separate "alarm phase."

  - `Mothership.emit_alarms_from_detectors()` and the explicit
    `run_gossip_round()` step are no longer needed by callers; both
    are subsumed by the async loop. `run_gossip_round()` stays as a
    helper but tests no longer call it externally.

What changed in the agent interface:

  - `CrisisAgent.next_turn(turn, received_claims)` becomes
    `try_emit()` — no arguments. Agents in an async network don't see
    a global tick. They decide based on their own internal state.

  - `CrisisAgent.observe(claim)` is the new optional callback the
    closed-phase loop uses to feed context into agents that care
    (overridden by LiveClaudeAgent to populate its prompt buffer).

  - `pending_alarm_claims()` is idempotent: an internal
    `_already_alarmed` set tracks claims this agent has emitted, so
    the loop calls it every step without flooding the network with
    duplicate alarms.

What changed in the dataclass schema:

  - `AlarmClaim.detected_at_turn` -> `emitted_at_step`. The word
    "turn" implies a global clock; "step" is a per-agent sequence
    number used only for log ordering — local, not networked.

  - `ClosedPhaseEntry.turn` and `CrisisPhaseEntry.turn` -> `step`.
    Same rename, same reasoning.

  - `Scenario.closed_phase_turns` and `Scenario.crisis_phase_turns`
    are gone. The scenario no longer prescribes how many turns; it
    just provides agents and lets the async loop run them out.

What changed in the CLI:

  - Phase 3 reports "drove to quiescence in N step(s)" with a
    breakdown of regular emissions / gossip transfers / alarm
    emissions, instead of "ran N turns".

  - `QuiescenceReport` (new dataclass) carries the run statistics
    back from `run_until_quiescent`/`run_closed_phase` — steps taken,
    emissions made, gossip transfers, alarm claims emitted, plus
    whether termination was via quiescence or max-step cap.

New regression tests (`test_async_quiescence.py`):

  - `test_run_until_quiescent_terminates`: the loop must exit.
  - `test_two_runs_produce_identical_final_state`: determinism check —
    if anything in the loop depended on real wall time, this would
    fail.
  - `test_max_steps_bound_caps_runtime`: setting max_steps=1 exits
    immediately and `QuiescenceReport.reached_quiescence` reflects
    reality.
  - `test_no_turn_argument_exposed_to_agents`: introspects
    `CrisisAgent.try_emit` signature; fails if anyone re-adds a
    `turn` parameter.
  - `test_no_turn_field_on_alarmclaim`: introspects the dataclass
    fields; fails if `detected_at_turn` reappears.
  - `test_alarms_propagate_through_async_loop_alone`: the loop alone
    (no manual emit_alarms / run_gossip_round) ratifies an alarm.
  - `test_quiescence_report_counts_match_logs`: sanity check that
    the report's emission count equals the crisis log length.

Suite: 163 -> 170 tests, all green in 0.79s.

Behavioral end-state is identical to the previous (synchronous)
version: same fact-check scenario, same byzantine equivocation, same
proof JSON shape, same three signers, same quorum-met outcome. The
difference is structural: the protocol now matches the paper's async
shape, and a future port to actual TCP gossip + concurrent agents
needs no change to this engine.

CrisisViz: still untouched. The `crisis_data.json` pipeline that
drives the visualizer is orthogonal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 22:06:56 +02:00

104 lines
3.5 KiB
Python

"""End-to-end test: the fact_check scenario walks the decentralized flow
and produces a quorum-ratified proof."""
import json
from pathlib import Path
from crisis_agents.cli import main as cli_main
from crisis_agents.mothership import Mothership
from crisis_agents.proof import (
ProofDocument,
build_proof,
verify_proof_self_consistent,
)
from crisis_agents.scenarios import build_fact_check_scenario
from crisis_agents.vote import quorum_for
class TestFactCheckEndToEnd:
def test_scenario_loads(self):
s = build_fact_check_scenario()
assert s.name == "fact_check"
assert len(s.honest_agents) == 3
assert s.byzantine_joiner.name == "agent_delta"
assert "Pluto" in s.reference_doc
def test_runs_through_all_phases(self):
s = build_fact_check_scenario()
m = Mothership()
for a in s.honest_agents:
m.add_agent(a)
m.run_closed_phase()
m.open_boundary(s.byzantine_joiner)
# One async run, no clock — alarms emit and propagate inside the loop
report = m.run_until_quiescent()
assert report.reached_quiescence
assert report.alarm_claims_emitted >= 3
threshold = quorum_for(m.boundary.size())
ratified_sets = [
m.ratified_alarms_from(name)
for name in ("agent_alpha", "agent_beta", "agent_gamma")
]
assert ratified_sets[0] == ratified_sets[1] == ratified_sets[2]
assert len(ratified_sets[0]) == 1
r = ratified_sets[0][0]
assert r.statement_id == "s03"
assert r.quorum_threshold == threshold
assert r.signer_count >= threshold
def test_proof_round_trips_through_json(self, tmp_path):
s = build_fact_check_scenario()
m = Mothership()
for a in s.honest_agents:
m.add_agent(a)
m.run_closed_phase()
m.open_boundary(s.byzantine_joiner)
m.run_until_quiescent()
r = m.ratified_alarms_from("agent_alpha")[0]
proof = build_proof(r)
out = tmp_path / "proof.json"
out.write_text(proof.to_json())
reloaded = ProofDocument.from_json(out.read_text())
assert verify_proof_self_consistent(reloaded).ok
class TestCli:
def test_cli_demo_runs(self, tmp_path, capsys):
exit_code = cli_main(["demo", "--scenario", "fact_check",
"--out-dir", str(tmp_path)])
assert exit_code == 0
captured = capsys.readouterr()
# The five named phases appear
for phase in ("Phase 1", "Phase 2", "Phase 3",
"Phase 4", "Phase 5", "Phase 6"):
assert phase in captured.out
# The chokepoint-free marker prints
assert "no chokepoint" in captured.out
# Exactly one proof file written
proofs = list(tmp_path.glob("proof_*.json"))
assert len(proofs) == 1
obj = json.loads(proofs[0].read_text())
assert obj["statement_id"] == "s03"
assert len(obj["signer_process_id_hexes"]) >= 3
def test_cli_verify_passes_on_valid_proof(self, tmp_path, capsys):
cli_main(["demo", "--scenario", "fact_check", "--out-dir", str(tmp_path)])
proof_path = next(tmp_path.glob("proof_*.json"))
capsys.readouterr()
exit_code = cli_main(["verify", str(proof_path)])
assert exit_code == 0
out = capsys.readouterr().out
assert "self-consistent: True" in out
def test_cli_unknown_scenario(self, capsys):
exit_code = cli_main(["demo", "--scenario", "nonexistent"])
assert exit_code == 2