
Your AI Agents Are Hallucinating Confidence: How Multi-Agent Architecture Scrubs Sycophancy

AI agents, sycophancy, AI hallucinations

Your AI agent just cost you a prospect. The cold email it drafted was grammatically perfect and strategically toxic. You trusted it because it looked good. You shipped it because the agent said it was ready.

That's not a model failure. It's an architecture failure. And it's a form of hallucination nobody's talking about.

Sycophancy Is Hallucination

Everyone talks about AI hallucinations — models making up facts, citing papers that don't exist, inventing statistics. Those are easy to catch. One Google search exposes a fake citation.

But there's another kind of hallucination that's far more dangerous: sycophancy.

When an AI agent tells you your strategy is "compelling" without evidence, that's a hallucination. When it rates every lead as "high-priority," that's a hallucination. When it validates your assumptions instead of stress-testing them, it's generating fiction — fiction that feels right.

Sycophantic AI is hallucinating confidence where none is warranted. And unlike a fake citation, you won't fact-check it. Because it confirms what you already believed.

Maybe you're just starting to explore AI tools for your work. Maybe you've already built agents that draft content, research leads, or analyze strategies. Either way, you've probably noticed: these tools are really good at saying yes.

Single-Agent Workflows Guarantee Sycophancy

The problem isn't the model. It's the architecture.

A single-agent workflow works like this: you ask for something, the agent produces it, you review it. The agent's job is to complete your request — not to stress-test it, not to tell you it's a bad idea, not to find the failure mode you didn't see.

LLMs are trained on human preferences, and humans prefer agents that agree with them. The RLHF loop optimizes for compliance, not critique. Ask for a blog post, get a blog post. Ask for validation of a pricing strategy, get validation.

It will sound good. It will be well-formatted. It will be wrong.

I learned this the hard way in week one of running AI agents for my security consulting practice. My content agent produced LinkedIn posts that were grammatically perfect and strategically garbage. My outreach agent wrote cold DMs that I would never respond to. My research agent flagged "opportunities" that weren't opportunities.

The work looked good. That was the problem.

The Multi-Agent Fix: Workers → Adversary → Human

The fix isn't better prompts. It's an adversary in the loop.

Here's the architecture:

Worker agents produce deliverables — content drafts, outreach copy, lead research, regulatory alerts. They're optimized to execute tasks well.

An adversary agent reviews every deliverable before you see it. Its job is NOT to validate. It's to find what's wrong, what's weak, what's missing, and what will fail.

You make the final call, but now you're reviewing adversarial output instead of sycophantic output. The agent already found the holes. You just decide whether to fix them or kill the work entirely.
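As a minimal sketch, the loop looks something like this. Everything here is illustrative: `worker_draft` and `adversary_review` are hypothetical stand-ins for real LLM calls running in separate sessions.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    verdict: str                                   # "SHIP" | "REVISE" | "KILL"
    critical: list = field(default_factory=list)   # must-fix issues
    reason: str = ""                               # one sentence, no hedging

def worker_draft(task: str) -> str:
    """Worker agent: optimized to produce the deliverable, not to judge it."""
    return f"Draft for: {task}"  # stand-in for a real LLM call

def adversary_review(deliverable: str) -> Review:
    """Adversary agent: a separate session whose only job is to find failure."""
    # Stand-in logic; a real implementation prompts a fresh model session
    # loaded with your standards documents and review framework.
    if "generic" in deliverable.lower():
        return Review("KILL", ["Zero differentiation"], "Anyone could post this.")
    return Review("SHIP", [], "No critical issues found.")

def pipeline(task: str) -> tuple:
    draft = worker_draft(task)
    review = adversary_review(draft)  # mandatory: every deliverable, every time
    return draft, review              # the human sees both, then decides

draft, review = pipeline("LinkedIn post on SOC 2 for sub-10-person startups")
```

The key structural choice is that `adversary_review` never sees the worker's conversation, only its output.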

This isn't "AI reviewing AI" in the abstract. This is a specific, opinionated framework with a job description designed to fight sycophancy.

This adversarial review layer is part of the multi-agent org chart I run for my security consultancy — Scout handles research, Maven produces content, Hunter manages outreach, Sentinel monitors regulatory changes, and Mother reviews everything before it ships. One person, five agents, zero sycophancy in the pipeline.

Building the Adversary

I call my adversary agent Mother — after the ship's AI in Alien — the one that prioritizes the mission over the crew, calm and clinical in its honesty.

The adversary evaluates every deliverable on dimensions specific to the work type: hook strength and differentiation for content, personalization and spam-test for outreach, assumption validity and failure scenarios for strategy, signal quality and actionability for research. Each review ends with a structured verdict:

🔴 Critical Issues — must fix before shipping
🟡 Weaknesses — should improve, not blockers
🟢 What Works — 1-2 lines max (not the adversary's job to flatter)
🔧 Recommended Changes — specific, actionable edits
Verdict: SHIP / REVISE / KILL

One word. Then one sentence explaining why. No hedging.
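Because the verdict is one fixed word, it is easy to enforce mechanically. Here is a sketch of a parser that rejects any review missing a clean verdict line; the exact format string is my assumption, modeled on the structure above.

```python
import re

def parse_verdict(review_text: str):
    """Extract (verdict, reason) from a review ending 'Verdict: WORD. Reason.'"""
    match = re.search(r"Verdict:\s*(SHIP|REVISE|KILL)\.?\s*(.*)", review_text)
    if match is None:
        # A review without a one-word verdict is itself a failed deliverable.
        raise ValueError("Review missing a SHIP/REVISE/KILL verdict; reject it.")
    return match.group(1), match.group(2).strip()

review = (
    "Critical: opener negs the prospect.\n"
    "Weakness: CTA assumes trust that does not exist yet.\n"
    "Verdict: KILL. This burns the prospect."
)
verdict, reason = parse_verdict(review)
# verdict == "KILL", reason == "This burns the prospect."
```

Forcing the model into a fixed vocabulary also makes hedged verdicts ("probably SHIP?") fail loudly instead of slipping through.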

First Week: What the Numbers Showed

Before Mother, I was manually reviewing agent output and shipping about 80% of it. I thought my review process was solid. It wasn't — I was just confirming my own biases faster.

Mother's first week reviewing all agent deliverables:

  • 14 deliverables reviewed across content, outreach, research, and strategy
  • 3 killed outright — work that would have burned prospects or wasted money
  • 6 sent back for revision with specific changes
  • 5 shipped as-is

64% catch rate. Nearly two-thirds of what my worker agents produced needed significant changes or shouldn't have been produced at all. And these were deliverables I would have previously shipped after my own "review" — which mostly consisted of reading them, thinking "looks good," and hitting send.

Yes, that's a small sample — one week, 14 pieces. But the pattern held. Over the following month, the catch rate stayed between 55% and 70% depending on the type of work. The architecture consistently found problems I would have missed.

The 6 revised deliverables that incorporated Mother's feedback measurably outperformed: one cold email rewrite went from vendor-framed slop to a 40% response rate after peer-positioning changes Mother demanded.

Three Examples From Production

1. Weak Differentiation in Content

My content agent drafted a LinkedIn post about SOC 2 compliance. Well-written. Good structure. Completely generic.

Mother's verdict:

🔴 Critical: Zero differentiation — every compliance consultant could post this verbatim. No practitioner detail, no "show the work" moment. This is ChatGPT filling space.

Verdict: KILL. Generic content actively damages positioning as a practitioner-first expert.

The content agent revised with a specific case study — how I actually implement SOC 2 controls for startups with fewer than 10 employees, including the specific tools and the timeline. That version shipped.

This is where one of Mother's core rules shows its value: never lead with praise. Most review processes start with "this is good, but..." That softens the blow and buries the real problem. Mother starts with what's broken.

2. Cold Email That Negged a Prospect

My outreach agent produced a cold email to a VP of Engineering. Personalized company name, referenced a recent funding round, clear CTA. Looked professional.

Mother's verdict:

🔴 Critical: Opens by pointing out the prospect's missing trust center — that's a neg, not a value prop. You're criticizing them before establishing rapport. Also: "I help companies like yours..." is immediate vendor framing. Prospect files this under "sales email" before sentence two.

Verdict: KILL. This burns the prospect. Better to not send than to send this.

The rewrite led with a relevant insight about post-funding compliance timelines specific to their industry. Peer positioning, not vendor pitch. That version got responses.

Here's another counterintuitive principle at work: specificity over courtesy. Saying "this is weak" doesn't help. Saying "this CTA assumes the reader already trusts you" gives something concrete to fix.

3. Ad Strategy Burning Money on a Broken Funnel

I was running Google Ads at $25/day targeting "CISO as a service" keywords. High-intent traffic: a 14% click-through rate, $3 CPC, 93 clicks in two weeks.

Zero conversions.

🔴 Critical: 93 clicks with 0% conversion rate means the funnel is fundamentally broken, not underoptimized. Either conversion tracking isn't firing or the landing page doesn't match the search intent. "CISO as a service" traffic is landing on a SOC 2 compliance page — that's a mismatch. These visitors searched for CISO services and got a compliance audit pitch.

Verdict: KILL the current configuration. Fix tracking, fix the landing page match, then re-enable.

Investigation confirmed: conversion tracking was misconfigured AND the landing page didn't match the keyword intent. Two problems that would have continued burning budget indefinitely. My worker agents called the campaign "performing well" based on the CTR. Mother looked at what actually mattered — conversions — and killed it.

This demonstrates the third principle: find the failure mode. Every strategy has one. Most agents won't look for it. Mother names it before you spend money learning it.
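The arithmetic behind that verdict is worth making explicit. Here's a quick sanity check using the figures from the example; the spend is derived from clicks times CPC rather than reported directly, and the `min_clicks` threshold is my assumption.

```python
clicks, cpc, conversions = 93, 3.00, 0

spend = clicks * cpc                     # $279 over two weeks, nothing to show for it
conversion_rate = conversions / clicks   # 0.0, the metric the worker agents ignored

def funnel_broken(clicks: int, conversions: int, min_clicks: int = 50) -> bool:
    """Enough traffic with zero conversions points to broken tracking or a
    landing-page mismatch, not an optimization problem."""
    return clicks >= min_clicks and conversions == 0

assert funnel_broken(clicks, conversions)  # KILL the config, then investigate
```

A 14% CTR makes the campaign look healthy; the check above is blind to CTR on purpose, because clicks that never convert are just spend.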

How to Implement This Yourself

You don't need fancy tooling. You need five things:

1. Separate sessions for worker and adversary agents. Don't run them in the same conversation. If the adversary inherits the worker's framing and context, it will inherit the sycophancy. Independence is structural, not optional.

2. A mandatory trigger. The adversary runs on every deliverable before you review it. Not "when I remember to ask." Not "when something looks off." Every. Single. Time. The whole point is catching things that look fine but aren't.

3. A specific framework. Generic instructions like "review this critically" produce generic critique. Define the exact dimensions that matter for your work: what does good content look like in your voice? What makes outreach effective for your ICP? What qualifies a lead in your market? The adversary needs your standards, not generic best practices.

4. Standards documentation. Feed your adversary reference material about your brand voice, your positioning, your outreach benchmarks, your conversion data. Mother can only evaluate against standards she knows about. I maintain content standards and outreach standards as separate documents she references on every review.

5. Discipline to kill work. This is the hard part. When the adversary says KILL, trust the process. The sunk cost of a drafted email is zero. The cost of a burned prospect is a lost deal. The cost of generic content is eroded positioning. Ship less, ship better.
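Points 3 and 4 can be wired together by assembling the adversary's prompt from your own standards. This is a hypothetical sketch: the dimensions are the ones listed earlier in this post, and the file name in the comment is illustrative.

```python
# Review dimensions per work type, taken from the framework described above.
DIMENSIONS = {
    "content":  ["hook strength", "differentiation", "practitioner detail"],
    "outreach": ["personalization", "spam-test", "peer positioning"],
    "strategy": ["assumption validity", "failure scenarios"],
    "research": ["signal quality", "actionability"],
}

def build_adversary_prompt(work_type: str, standards: str) -> str:
    """Assemble the system prompt for a fresh adversary session (point 1)."""
    dims = ", ".join(DIMENSIONS[work_type])
    return (
        "You are an adversarial reviewer. Your job is NOT to validate.\n"
        f"Evaluate this {work_type} deliverable on: {dims}.\n"
        f"Judge it against these standards:\n{standards}\n"
        "End with exactly one line: 'Verdict: SHIP|REVISE|KILL. <one sentence>'."
    )

# In practice the standards text comes from files you maintain, e.g.
# open("content_standards.md").read(); it is hardcoded here for the sketch.
prompt = build_adversary_prompt("content", "Voice: practitioner-first. Show the work.")
```

Loading standards from documents you already maintain means the adversary's bar rises automatically as your positioning sharpens, with no prompt surgery required.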

The Meta Problem

Most people building AI agent workflows want them to be helpful. That's the wrong goal.

You want them to be useful. Helpfulness optimizes for compliance — yes, here's what you asked for. Usefulness optimizes for outcomes — here's why what you asked for won't work, and here's what will.

A helpful agent says "great idea!" and executes.
A useful agent says "here's why that idea will fail" and offers an alternative.

Sycophancy feels efficient in the moment. You ask, it delivers, you ship. No friction. But you're shipping hallucinated confidence. The market will find the holes you didn't.

An adversary agent finds them first.

Factual hallucinations get caught by a search engine.
Sycophantic hallucinations get caught by an adversary.

Build one.

Get in Touch

If you're curious how this might apply to your domain — or if you're wrestling with compliance and want to talk to someone who actually implements this stuff day-to-day, not just advises on it from a distance — I'm around.

AI agents, sycophancy, AI hallucinations, multi-agent architecture, AI governance, compliance automation
