Adversary
(Orchestration Pattern)
Name
Adversary
Also known as: Critic, Red-Team Role, Negative Channel.
Intent
Assign a structurally separate role whose only job is to find failures in another role’s output, and require that role to emit a negative channel the orchestrator can inspect.
Problem
A proposer produces work. The same proposer is then asked to “also list weaknesses,” or a critic role is placed beside the proposer but given the same context and an optional feedback prompt.
That can look like verification:
- the transcript contains a message from a role named critic;
- the prompt says “be critical”;
- the critic gives suggestions before approval;
- the workflow can continue when the critic says the work looks acceptable.
The boundary still collapses when the proposer grades its own work or when the negative channel is optional. A critic that can return an empty feedback string without recording “no defect found” has not performed an adversarial pass. It has only added a chance for the model to rationalize the draft.
verification_design.md Principle 1 rejects self-review as a verification signal. Principle 7 names the stronger form: cross-family verification beats self-verification. Adversary is the single-role orchestration primitive underneath those principles. It makes who critiques whom a runtime fact, not a tone instruction.
Forces
- Separate role vs. single-agent self-critique. A second role costs tokens and routing complexity; self-critique is cheaper but preserves the same blind spot.
- Shared context vs. blind critique. Full context is easy to pass, but it can contaminate the critic. A Blind Oracle or Cross-Family verifier may be needed for stronger independence.
- Mandatory negative channel vs. optional feedback. Optional feedback collapses to “looks good.” A mandatory negative channel must either list defects or record an explicit no-defect verdict.
- Single critic vs. panel. One adversary is the primitive. Multi-round disagreement belongs to Debate.
- Same family vs. cross-family. A same-family adversary can still share latent priors with the proposer. Cross-Family strengthens the role boundary.
Solution
Make adversarial assignment explicit in code.
The orchestrator names a proposer, names a critic, rejects critic_id == proposer_id, and requires a structured critique artifact. The artifact must contain:
- proposer identity;
- critic identity;
- the artifact being critiqued;
- weaknesses, risks, or rejected assumptions;
- suggestions or next action;
- a score or verdict;
- an explicit no_defect_found verdict if no weakness is found.
The role label is not enough. The load-bearing structure is identity separation plus a required negative channel.
Mechanism
- Assign role identities. Give each proposal an author_id, and give each critique a distinct critic_id.
- Reject self-critique. The orchestrator refuses to route a proposal back to its author as the adversary.
- Run the critic under a findings schema. The critic returns structured weaknesses, suggestions, score, and verdict.
- Require a negative channel. A critique must contain at least one weakness or an explicit no_defect_found verdict.
- Route findings onward. Findings gate release, route to Backpressure, or escalate. Routing is outside this card; the adversary only creates the failure signal.
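Routing itself is outside this card, but a minimal sketch can show how a downstream step might consume the failure signal. Everything here is an assumption made for illustration: the function name route_critique, the max_revisions budget, and the policy of escalating once that budget is spent. Only the decision labels (release, revise, escalate) and the verdict values come from this card.

```python
from dataclasses import dataclass
from typing import Literal

Verdict = Literal["defects_found", "no_defect_found"]
Decision = Literal["release", "revise", "escalate"]


@dataclass(frozen=True)
class Critique:
    proposer_id: str
    critic_id: str
    weaknesses: tuple[str, ...]
    verdict: Verdict


def route_critique(
    critique: Critique, revisions_so_far: int, max_revisions: int = 2
) -> Decision:
    """Map a critique to a routing decision (hypothetical policy)."""
    if critique.critic_id == critique.proposer_id:
        # A self-critique carries no adversarial signal, so it is never released.
        return "escalate"
    if critique.verdict == "no_defect_found" and not critique.weaknesses:
        # Explicit no-defect verdict with nothing listed: the gate opens.
        return "release"
    if revisions_so_far < max_revisions:
        # Defects (or a contradictory critique) go back to the proposer.
        return "revise"
    # Revision budget spent: hand the findings to a higher authority.
    return "escalate"


with_defect = Critique("planner", "critic", ("No rollback check is defined.",), "defects_found")
clean = Critique("planner", "critic", (), "no_defect_found")
```

Under this policy a finding is never silently dropped: each critique ends in exactly one of the three decisions, which is what keeps the adversary from becoming toothless.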
Pattern / Antipattern
The same task: evaluate a proposal before it can advance. The antipattern side is intentionally uncovered in this pass. The pattern side shows the minimal identity and negative-channel assertions a verifier can inspect.
Antipattern: uncovered confirmatory-critic instance
No strict Adversary antipattern was promoted from the OSS bench surveyed for this catalog.
The natural failure shape is a confirmatory critic: a role named critic or adversary that shares the proposer’s context, asks for constructive feedback, and can approve without recording weaknesses or an explicit no-defect verdict. That shape is already covered by Adversarial Frame. A same-family critic that is treated as independent evidence is already covered by Cross-Family.
This card keeps the Antipattern instance empty rather than inventing a second copy of those failures. When a strict instance is mined, re-author this section around the assertion critic_id == proposal.author_id or negative_channel_present is False.
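The assertion named above can be written as a small predicate for screening candidate instances. This is a sketch, not a mined antipattern: the ObservedCritique record and its field layout are hypothetical, and it assumes a critique found in the wild may carry a free-form or empty verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedCritique:
    author_id: str               # author of the proposal being critiqued
    critic_id: str               # role that produced the critique
    weaknesses: tuple[str, ...]
    verdict: str                 # free-form in mined code; may be empty


def negative_channel_present(c: ObservedCritique) -> bool:
    # Present when at least one weakness is listed or no-defect is explicit.
    return len(c.weaknesses) > 0 or c.verdict == "no_defect_found"


def is_strict_antipattern(c: ObservedCritique) -> bool:
    # Mirrors the card's assertion:
    # critic_id == proposal.author_id or negative_channel_present is False
    return c.critic_id == c.author_id or negative_channel_present(c) is False
```

A candidate qualifies as a strict instance only when this predicate is true. A confirmatory tone alone is not enough, which is why that shape stays with Adversarial Frame.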
Pattern: separate critic with mandatory findings
The structured implementation refuses self-critique and validates that every critique either names weaknesses or records an explicit no-defect verdict.
from dataclasses import dataclass
from typing import Literal

Verdict = Literal["defects_found", "no_defect_found"]


@dataclass(frozen=True)
class Proposal:
    proposal_id: str
    author_id: str
    content: str


@dataclass(frozen=True)
class Critique:
    proposal_id: str
    proposer_id: str
    critic_id: str
    weaknesses: tuple[str, ...]
    suggestions: tuple[str, ...]
    score: int
    verdict: Verdict


def require_adversary(proposal: Proposal, critic_id: str) -> None:
    # Identity separation: the author may not act as its own adversary.
    if critic_id == proposal.author_id:
        raise ValueError("critic must be distinct from proposer")


def require_negative_channel(critique: Critique) -> None:
    # The critique must list a weakness or state no_defect_found explicitly.
    has_weakness = len(critique.weaknesses) > 0
    has_no_defect_verdict = critique.verdict == "no_defect_found"
    if not (has_weakness or has_no_defect_verdict):
        raise ValueError("critique must include weaknesses or no_defect_found")


def run_adversary(proposal: Proposal, critic_id: str, critic_fn) -> Critique:
    # Check identity before the critic runs; check the artifact after.
    require_adversary(proposal, critic_id)
    critique = critic_fn(proposal=proposal, critic_id=critic_id)
    require_negative_channel(critique)
    return critique


proposal = Proposal(
    proposal_id="p-017",
    author_id="planner",
    content="Ship the migration without a rollback check.",
)


def critic_fn(proposal: Proposal, critic_id: str) -> Critique:
    return Critique(
        proposal_id=proposal.proposal_id,
        proposer_id=proposal.author_id,
        critic_id=critic_id,
        weaknesses=("No rollback check is defined.",),
        suggestions=("Add a rollback verification gate before release.",),
        score=42,
        verdict="defects_found",
    )


critique = run_adversary(proposal, critic_id="critic", critic_fn=critic_fn)

# The two assertions a verifier can inspect.
assert critique.critic_id != proposal.author_id
assert critique.weaknesses or critique.verdict == "no_defect_found"
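The example above exercises only the accepting path. The sketch below covers the two rejections, using a condensed copy of the same definitions so that it runs on its own. The helper rejection_reason and the stub empty_critic are hypothetical names added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    proposal_id: str
    author_id: str


@dataclass(frozen=True)
class Critique:
    proposer_id: str
    critic_id: str
    weaknesses: tuple[str, ...]
    verdict: str


def run_adversary(proposal, critic_id, critic_fn):
    # Condensed form of the identity and negative-channel checks.
    if critic_id == proposal.author_id:
        raise ValueError("critic must be distinct from proposer")
    critique = critic_fn(proposal=proposal, critic_id=critic_id)
    if not (critique.weaknesses or critique.verdict == "no_defect_found"):
        raise ValueError("critique must include weaknesses or no_defect_found")
    return critique


def empty_critic(proposal, critic_id):
    # Claims defects but lists none, so the negative channel is empty.
    return Critique(proposal.author_id, critic_id, (), "defects_found")


def rejection_reason(proposal, critic_id, critic_fn):
    """Return the error message if the adversarial pass is rejected, else None."""
    try:
        run_adversary(proposal, critic_id, critic_fn)
    except ValueError as err:
        return str(err)
    return None


proposal = Proposal("p-017", "planner")
self_review = rejection_reason(proposal, "planner", empty_critic)
empty_review = rejection_reason(proposal, "critic", empty_critic)
```

The self-critique is refused before the critic function is ever called, while the empty critique is refused only after the artifact exists. That ordering matches the Mechanism steps.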
AutoGPT’s multi_agent_debate.py in classic/original_autogpt/ has this shape as a legacy v1 instance. Its critique artifact records critic_id, target_agent_id, weaknesses, suggestions, and score. Its critique phase skips self-critique by skipping j == i, so a proposal owner does not critique itself.
AutoGen’s writer and critic example in the migration guide is a partial instance. The critic is a named role in a RoundRobinGroupChat, and TextMentionTermination("APPROVE") makes explicit approval the release condition. That shape is also Backpressure because unresolved critic feedback keeps the loop running.
Determinism Move
Adversary constrains self_review_bias by making the proposer unable to satisfy the adversarial step alone. The critic identity is external to the proposal, and the assertion rejects critic_id == proposer_id.
A same-family critic can still share the proposer’s blind spots. Adversary by itself does not constrain same_family_bias; the recorded role boundary is where Cross-Family can attach its enforced family check.
The determinism move is making the negative channel mandatory and the critic’s identity external.
Observable Signal
Every Adversary report should include:
- proposer id;
- critic id;
- self-critique skipped boolean;
- negative-channel present boolean;
- weakness count;
- critique score;
- verdict;
- routing decision, such as release, revise, or escalate.
A useful report makes the role boundary visible:
proposal_id: p-017
proposer_id: planner
critic_id: critic
self_critique_skipped: true
negative_channel_present: true
weakness_count: 1
critique_score: 42
verdict: defects_found
routing_decision: revise
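One way to produce that report is to derive the boolean fields from the critique instead of letting the critic assert them. The sketch below assumes a hypothetical build_report helper and reads self_critique_skipped as "the critic is not the proposer"; the card does not define the field more precisely than that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Critique:
    proposal_id: str
    proposer_id: str
    critic_id: str
    weaknesses: tuple[str, ...]
    score: int
    verdict: str


def build_report(critique: Critique, routing_decision: str) -> dict:
    """Assemble the observable-signal fields from a critique (hypothetical helper)."""
    return {
        "proposal_id": critique.proposal_id,
        "proposer_id": critique.proposer_id,
        "critic_id": critique.critic_id,
        # Derived, not self-reported: true only when the identities differ.
        "self_critique_skipped": critique.critic_id != critique.proposer_id,
        "negative_channel_present": bool(critique.weaknesses)
        or critique.verdict == "no_defect_found",
        "weakness_count": len(critique.weaknesses),
        "critique_score": critique.score,
        "verdict": critique.verdict,
        "routing_decision": routing_decision,
    }


report = build_report(
    Critique("p-017", "planner", "critic", ("No rollback check is defined.",), 42, "defects_found"),
    routing_decision="revise",
)
```

Computing the two booleans in the orchestrator keeps them out of the critic's control, so a critic cannot claim independence it does not have.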
Failure Modes
- Confirmatory Critic: the role is named critic, but the prompt asks for constructive feedback and approval. Use Adversarial Frame so the critic must search for failure before approval.
- Self-Critique: the critic and proposer are the same role or model call. Assert identity separation before routing the critique.
- Optional Negative Channel: the critic can return empty feedback with no recorded no_defect_found verdict. Reject empty critiques unless the no-defect verdict is explicit.
- Toothless Adversary: findings are produced but never gate, revise, or escalate. Connect the report to Backpressure or Escalation Chain.
Use When
Use this pattern when:
- a single proposer’s blind spots are costly;
- the workflow can afford a second role;
- the system needs an explicit failure-search step before release;
- the critique should create a routable artifact, not only prose;
- later Backpressure, Escalation Chain, or Debate steps need a negative signal.
Do Not Use When
Do not reach for Adversary when:
- the task is trivial and a second role would add process noise;
- the critic would be the same model, same prompt context, and same family, with no recorded independence;
- an Executable Analog or Comparator can decide the property without an LLM critic;
- the desired structure is multi-round disagreement among several roles. Use Debate for that.
If only a same-family critic is available, label the result as a weak adversarial pass and maximize executable checks around it.
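Labeling a weak pass can be made mechanical if each role records its model family. The sketch below is an assumption-heavy illustration: the family strings, the adversarial_strength function, and the three labels are all hypothetical, and the enforced family check itself belongs to Cross-Family.

```python
from typing import Literal

Strength = Literal["invalid_self_critique", "weak_same_family", "cross_family"]


def adversarial_strength(
    proposer_id: str,
    critic_id: str,
    proposer_family: str,
    critic_family: str,
) -> Strength:
    """Label how much independence an adversarial pass can claim."""
    if critic_id == proposer_id:
        # Not an adversarial pass at all.
        return "invalid_self_critique"
    if critic_family == proposer_family:
        # Distinct roles, shared priors: record the pass as weak.
        return "weak_same_family"
    return "cross_family"


label = adversarial_strength("planner", "critic", "family-a", "family-a")
```

Storing the label next to the critique lets later steps decide how many executable checks to require around a weak pass.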
Evidence
- Verification Design Principles 1 and 7: the design doc rejects self-review as a verification signal and frames independent verification as stronger than same-family review.
- AutoGPT multi-agent debate: the orchestration sweep records a direct Adversary instance: AgentCritique names critic and target identities, records weaknesses and suggestions, and skips self-critique in the critique phase.
- AutoGPT legacy caveat: the same evidence lives in classic/original_autogpt/, so it is treated as a legacy v1 implementation, not a current framework recommendation.
- AutoGen writer and critic migration guide: the orchestration sweep records a partial instance where a critic role must emit APPROVE before the writer/critic loop terminates.
- No promoted antipattern: the orchestration sweep did not promote a strict Adversary antipattern; this card cross-references Adversarial Frame and Cross-Family instead of inventing one.
Related Patterns
- Adversarial Frame: defines the default-no and admissibility logic an adversary applies to each finding.
- Cross-Family: strengthens the adversary by making the critic come from a different model family.
- Debate: generalizes Adversary into multi-round, multi-critic disagreement.
- Escalation Chain: receives unresolved adversary findings when the critic cannot safely approve.
- Backpressure: routes adversary findings back to the proposer for revision.
Updated 2026-06-10