Confident lies are worse than hedged ones

30 Apr 2026 · 6 minute read

Max Wiesner

Co-founder, KnitKnot

The same lie, two ways

Here are two things ChatGPT might say about your company in a head-to-head comparison. Both are false.

“Acme does not support SOC 2 Type II compliance.”

“Acme’s SOC 2 Type II compliance status isn’t well-documented, so it may be worth verifying directly.”

If you’re just measuring accuracy, these are the same: one false claim, one penalty. But if you’re a buyer reading this while evaluating compliance tools, they land completely differently. The first one closes a door. The second one sends the buyer to your website, where they find out you’ve had SOC 2 Type II for three years.

The difference isn’t accuracy. It’s conviction. And those are independent axes.

The accuracy-conviction matrix

Most AI benchmarks operate on one dimension: is the claim true or false? That matters. But it’s half the picture. The other half is how confidently the AI states the claim, because that determines whether the buyer acts on it, questions it, or ignores it entirely.

Those two axes create four quadrants, and each one tells a different story:

	High conviction	Low conviction
True claim	Best case. Buyer trusts it, acts on it. Your positioning lands.	Wasted truth. Correct info that the buyer second-guesses. You had the answer and the AI undersold it.
False claim	Worst case. Buyer acts on misinformation. The deal may be over before you know it happened.	Damage contained. The AI is wrong but sounds unsure. Buyer might verify. The door stays open.

A benchmark that only measures accuracy treats the left column and right column as identical. But the buyer outcome is radically different. A confident falsehood is the worst quadrant because the buyer has no reason to question it. A hedged truth is a missed opportunity because the buyer does question it, even though the answer was right.

The less obvious insight is that the two low-conviction quadrants (top-right and bottom-right) produce the same buyer behavior. A true claim and a false claim stated with low conviction both send the buyer elsewhere to verify. The difference is what they find when they get there.
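To make the two axes concrete, here is a minimal Python sketch of the matrix. It assumes conviction has already been reduced to a high/low boolean, and the names `Quadrant`, `quadrant`, and `buyer_likely_verifies` are illustrative, not part of any KnitKnot API.

```python
from enum import Enum


class Quadrant(Enum):
    """The four cells of the accuracy-conviction matrix."""
    BEST_CASE = "best case"                # true claim, high conviction
    WASTED_TRUTH = "wasted truth"          # true claim, low conviction
    WORST_CASE = "worst case"              # false claim, high conviction
    DAMAGE_CONTAINED = "damage contained"  # false claim, low conviction


def quadrant(is_true: bool, high_conviction: bool) -> Quadrant:
    """Place a claim in the matrix from its two independent properties."""
    if is_true:
        return Quadrant.BEST_CASE if high_conviction else Quadrant.WASTED_TRUTH
    return Quadrant.WORST_CASE if high_conviction else Quadrant.DAMAGE_CONTAINED


def buyer_likely_verifies(q: Quadrant) -> bool:
    """Low-conviction claims, true or false, push the buyer to check elsewhere."""
    return q in (Quadrant.WASTED_TRUTH, Quadrant.DAMAGE_CONTAINED)


# The two false claims from the opening example: same accuracy, different quadrants.
confident = quadrant(is_true=False, high_conviction=True)
hedged = quadrant(is_true=False, high_conviction=False)
print(confident.value, "|", hedged.value)  # worst case | damage contained
```

An accuracy-only benchmark sees just the `is_true` argument, which is exactly why it cannot tell these two claims apart.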

How we measure conviction today

Every AI response gets classified into one of three conviction tiers: certain, tentative, or uncertain. The classification happens inside the same semantic judging pass that scores the rest of the response. The judge reads the hedging language (“I think,” “it seems,” “possibly,” “to my knowledge”) and the assertion language (“definitely,” “certainly,” “is the clear choice”) in context and assigns the tier.

Our first version counted regex matches against the raw text. It was cheap, but it misread quoted hedges, negated assertions, and hedges aimed at the competitor rather than at you. Reading conviction is a comprehension task, so it moved into the judge. The tiering stays deliberately conservative: we’d rather miss a confident lie than falsely amplify a penalty.
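The failure modes of raw counting are easy to reproduce. The Python sketch below is a hypothetical reconstruction of a regex-counting tierer, not the original code: the marker lists reuse the example phrases above, and the majority rule (ties become "tentative") is an assumption.

```python
import re

# Hypothetical marker lists built from the example phrases in the post.
HEDGES = ["i think", "it seems", "possibly", "to my knowledge"]
ASSERTIONS = ["definitely", "certainly", "is the clear choice"]


def count_markers(text: str, markers: list[str]) -> int:
    """Count case-insensitive, whole-phrase matches of any marker in text."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in markers
    )


def naive_tier(text: str) -> str:
    """Tier a response by comparing raw hedge and assertion counts."""
    hedges = count_markers(text, HEDGES)
    assertions = count_markers(text, ASSERTIONS)
    if assertions > hedges:
        return "certain"
    if hedges > assertions:
        return "uncertain"
    return "tentative"


# Negated assertion: really a hedge, but "definitely" is counted as confidence.
print(naive_tier("I can't say Acme is definitely SOC 2 compliant."))  # certain
# Quoted hedge: the claim about Acme is flat, but the quote contains "possibly".
print(naive_tier('Acme\'s slogan is "Possibly the last tool you need." Acme lacks SOC 2.'))  # uncertain
# Hedge aimed at the competitor: Globex is hedged, the claim about Acme is not.
print(naive_tier("Globex possibly supports SSO, but Acme does not support SOC 2 Type II."))  # uncertain
```

Each of the three examples gets the wrong tier for the claim about Acme, because the counter sees words and not what they modify.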

The conviction tier then modifies our claim accuracy component asymmetrically:

Certain + false claims: False claim count multiplied by 1.3 before computing accuracy. The boost scales with the number of false claims: three confident lies in the same response count as 3.9 false claims and weigh more heavily than three hedged ones, which count as 3.

Uncertain + true claims: Accuracy multiplied by 0.9. The information is correct, but the buyer doesn’t know that. A hedged truth carries less weight in a purchasing decision than a confident one.

Tentative: No adjustment. The base score stands.
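As a rough illustration of how the three adjustments above combine, here is a Python sketch. It assumes claim accuracy is true claims divided by total claims, which the post does not state, and the constant and function names are hypothetical.

```python
from typing import Optional

FALSE_CLAIM_BOOST = 1.3   # applied to the false-claim count when the tier is "certain"
UNCERTAIN_DISCOUNT = 0.9  # applied to accuracy when the tier is "uncertain"


def adjusted_accuracy(true_claims: int, false_claims: int, tier: str) -> Optional[float]:
    """Claim accuracy with the asymmetric conviction adjustment.

    Assumes accuracy = true / (true + weighted false); this formula is an
    assumption for illustration, not the documented scoring function.
    """
    if tier not in ("certain", "tentative", "uncertain"):
        raise ValueError(f"unknown tier: {tier}")
    # Confident lies count for more than one false claim each.
    weighted_false = false_claims * FALSE_CLAIM_BOOST if tier == "certain" else false_claims
    total = true_claims + weighted_false
    if total == 0:
        return None  # no checkable claims in the response
    accuracy = true_claims / total
    if tier == "uncertain":
        accuracy *= UNCERTAIN_DISCOUNT  # hedged truths carry less weight
    return accuracy


# The same response (2 true, 3 false claims) under each tier.
for tier in ("tentative", "uncertain", "certain"):
    print(tier, round(adjusted_accuracy(2, 3, tier), 3))
```

Under these assumptions the same response scores 0.4 when tentative, 0.36 when uncertain, and about 0.339 when certain. Note that the boost only moves the score when the response also contains true claims; a response made entirely of false claims scores zero in every tier.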

This is an approximation. Three buckets and two constants don’t capture the full conviction spectrum. But even this coarse model surfaces real patterns: some engines state falsehoods with more certainty than others, consistently, across dozens of evaluations.

Where this goes

The conviction axis opens up analysis that pure accuracy benchmarks can’t do.

Per-claim conviction scoring. Right now we classify conviction at the response level. The next step is per-claim: the AI might hedge on one claim and assert another confidently in the same paragraph. A response with one confident lie and four hedged truths tells a different story than five tentative claims.

Engine conviction profiles. We already see that engines have different conviction signatures. Some models hedge systematically. Some state everything with equal confidence regardless of whether they have evidence. Plotting conviction against accuracy per engine reveals which models are calibrated (high conviction correlates with high accuracy) and which are overconfident (high conviction, mixed accuracy).

Conviction drift. As models update, their conviction patterns shift. A model that used to hedge on a topic might start asserting it confidently after a training data refresh. Tracking conviction over time reveals when an engine’s relationship with your company’s information changes, even if the underlying accuracy stays flat.
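A per-engine profile can start as simply as averaging accuracy within each conviction tier. The Python sketch below assumes response-level records of `(engine, tier, accuracy)`; the function names, the 0.8 threshold, and the toy data are all hypothetical. A fuller calibration check would compare accuracy across tiers rather than looking only at "certain" responses.

```python
from collections import defaultdict
from statistics import mean


def conviction_profile(records):
    """Mean accuracy per conviction tier for each engine.

    records: iterable of (engine, tier, accuracy) tuples, one per judged response.
    Returns {engine: {tier: mean_accuracy}}.
    """
    buckets = defaultdict(list)
    for engine, tier, accuracy in records:
        buckets[(engine, tier)].append(accuracy)
    profile = defaultdict(dict)
    for (engine, tier), values in buckets.items():
        profile[engine][tier] = mean(values)
    return dict(profile)


def is_overconfident(tier_accuracy, threshold=0.8):
    """Flag an engine whose 'certain' responses are often wrong.

    The 0.8 threshold is an illustrative assumption, not a calibrated value.
    """
    certain = tier_accuracy.get("certain")
    return certain is not None and certain < threshold


# Toy input for illustration only; these are not measured results.
records = [
    ("engine_a", "certain", 0.95),
    ("engine_a", "certain", 0.90),
    ("engine_a", "uncertain", 0.50),
    ("engine_b", "certain", 0.60),
    ("engine_b", "certain", 0.70),
    ("engine_b", "tentative", 0.65),
]
profile = conviction_profile(records)
for engine, tiers in sorted(profile.items()):
    print(engine, {t: round(a, 3) for t, a in tiers.items()}, is_overconfident(tiers))
```

Running the same function separately per model version or per time window gives a crude first view of conviction drift.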

The goal is a continuous conviction score, 0.0 to 1.0, derived from linguistic markers. The penalty becomes a curve, not a step function. A claim stated at 0.95 conviction that’s wrong is dramatically worse than one at 0.6 conviction. And a true claim at 0.3 conviction is almost as bad as a hedged false one, because the buyer walks away unconvinced either way.
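One possible shape for that curve, sketched in Python: a convex function of conviction, so that hedged claims of either kind stay close to zero impact and confident ones dominate. The function name, the [-1, 1] range, and the exponent `gamma` are hypothetical choices for illustration, not the planned scoring model.

```python
def claim_impact(is_true: bool, conviction: float, gamma: float = 2.0) -> float:
    """Hypothetical buyer impact of a single claim, in [-1, 1].

    Impact magnitude is conviction ** gamma. With gamma > 1 the curve is convex,
    so hedged claims of either kind stay near zero and confident ones dominate.
    """
    if not 0.0 <= conviction <= 1.0:
        raise ValueError("conviction must be in [0, 1]")
    magnitude = conviction ** gamma
    return magnitude if is_true else -magnitude


for is_true, conviction in [(False, 0.95), (False, 0.6), (True, 0.3), (False, 0.3)]:
    print(is_true, conviction, round(claim_impact(is_true, conviction), 4))
```

With the default `gamma` of 2, a wrong claim at 0.95 conviction scores about -0.90 versus -0.36 at 0.6, while a true claim at 0.3 earns only 0.09, barely more than the -0.09 of a false claim at the same conviction.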

Accuracy tells you what the AI knows. Conviction tells you what the buyer believes. We think you need both.
