Why AI recommends your competitor instead of you

05 Jun 2026 · 9 minute read

Max Wiesner

Co-founder, KnitKnot

The question everyone asks first

When AI recommends your competitor, the reason is usually not your product. Across 33,000 evaluations on ChatGPT, Claude, Perplexity, and Gemini, the losses trace to who owns the sources, which facts went stale, and whose features framed the comparison. Those are all content problems, not product problems.

Companies assume otherwise. The first thing everyone wants to know when they run a benchmark is why they’re losing, and the assumed answer is always: not enough features, wrong positioning, missing a capability the competitor has. Sometimes that’s true. Usually it isn’t.

The 33,000 evaluations cover head-to-head comparisons and brand perception queries for companies in B2B software, scored with structured signal extraction across 39,000+ feature comparisons and 513 distinct features. Here’s what we actually found.

Models disagree with each other 48.6% of the time

This was the finding that surprised us most.

We looked at every prompt that was evaluated by at least two engines with a decisive outcome (win or loss, excluding ties). 2,358 prompts met that criterion. Of those, 1,145 produced different outcomes depending on which model answered.

48.6%. Nearly half.
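For readers who want to reproduce this kind of count on their own benchmark data, here is a minimal sketch. It assumes each evaluation is a hypothetical `(prompt_id, engine, outcome)` record with outcomes `"win"`, `"loss"`, or `"tie"`. It also assumes one reading of the filter above: a prompt is eligible when at least two engines returned a decisive outcome for it. The record shape, the function name, and the sample rows are illustrative, not our production pipeline.

```python
from collections import defaultdict


def disagreement_rate(evaluations):
    """Return (split_prompts, eligible_prompts, rate).

    evaluations: iterable of (prompt_id, engine, outcome) tuples, where
    outcome is "win", "loss", or "tie". This record shape is hypothetical.
    """
    decisive = defaultdict(dict)
    for prompt_id, engine, outcome in evaluations:
        if outcome in ("win", "loss"):  # ties are excluded
            decisive[prompt_id][engine] = outcome

    # Eligible: at least two engines gave a decisive outcome for the prompt.
    eligible = [outcomes for outcomes in decisive.values() if len(outcomes) >= 2]
    # Split: those engines did not all agree on the outcome.
    split = sum(1 for outcomes in eligible if len(set(outcomes.values())) > 1)
    rate = split / len(eligible) if eligible else 0.0
    return split, len(eligible), rate


# Invented sample rows, for illustration only.
sample = [
    ("p1", "chatgpt", "win"), ("p1", "gemini", "loss"),     # engines disagree
    ("p2", "chatgpt", "win"), ("p2", "claude", "win"),      # engines agree
    ("p3", "perplexity", "tie"), ("p3", "claude", "loss"),  # one decisive result, not eligible
]
split, eligible, rate = disagreement_rate(sample)
print(split, eligible, rate)  # 1 2 0.5

# The headline figure, from the counts reported above.
print(round(1145 / 2358 * 100, 1))  # 48.6
```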

Same company. Same competitor. Same prompt. Different engine, different winner. A buyer who asks ChatGPT gets told to go with Vendor A. A buyer who asks Gemini gets told to go with Vendor B. Both responses are confident. Both cite sources. Both sound authoritative. They just disagree.

This means a company’s competitive position in AI isn’t a single number. It’s four numbers, one per model, and they may tell opposite stories. The buyer’s impression depends on which app they happened to open.

Not all engines are equal

The disagreement isn’t random. Each engine has systematic biases that show up consistently across companies.

Engine	Win rate	Avg score	Absence rate	Positive sentiment
Gemini	74.4%	67.5	17.7%	67.7%
Claude	71.9%	65.8	23.9%	63.8%
ChatGPT	69.4%	64.4	26.3%	63.1%
Perplexity	66.1%	62.7	38.8%	58.3%

The spread on win rate is 8.3 percentage points between the most favorable engine (Gemini, 74.4%) and the toughest (Perplexity, 66.1%). That’s not noise. For a company that shows up in hundreds of buyer evaluations a month across models, 8 points is the difference between winning two-thirds of comparisons and winning three-quarters of them.

The absence numbers are even more striking. On Perplexity, companies are completely absent from 38.8% of brand perception evaluations. On Gemini, it’s 17.7%. A 21-point gap in whether the model even knows you exist.
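The two gaps quoted above are plain subtraction on the table, and the same check works for any column. A small sketch, with the table values copied in by hand; the dictionary layout and the `spread` helper are illustrative names, not part of any tool:

```python
# Per-engine figures copied from the table above (all values in percent,
# except avg_score, which is a score).
engines = {
    "Gemini":     {"win_rate": 74.4, "avg_score": 67.5, "absence_rate": 17.7, "positive": 67.7},
    "Claude":     {"win_rate": 71.9, "avg_score": 65.8, "absence_rate": 23.9, "positive": 63.8},
    "ChatGPT":    {"win_rate": 69.4, "avg_score": 64.4, "absence_rate": 26.3, "positive": 63.1},
    "Perplexity": {"win_rate": 66.1, "avg_score": 62.7, "absence_rate": 38.8, "positive": 58.3},
}


def spread(metric):
    """Return (highest engine, lowest engine, gap) for one column."""
    values = {name: row[metric] for name, row in engines.items()}
    highest = max(values, key=values.get)
    lowest = min(values, key=values.get)
    return highest, lowest, round(values[highest] - values[lowest], 1)


print(spread("win_rate"))      # ('Gemini', 'Perplexity', 8.3)
print(spread("absence_rate"))  # ('Perplexity', 'Gemini', 21.1)
```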

Why each engine behaves differently

The differences map to how each model finds and weighs information.

ChatGPT uses Bing’s index. It favors well-established sources with high domain authority. Companies with strong traditional SEO tend to do relatively well on ChatGPT, but the index can lag behind content updates by weeks or months. ChatGPT produces the highest rate of negative sentiment (14.8%) and the second-highest absence rate, which suggests it’s both opinionated and selective about who it includes.

Claude uses Brave Search. Its source mix skews differently from Bing, which means different pages show up as high-influence sources. We’ve seen cases where a company’s technical documentation ranks well in Brave but not Bing, producing a Claude evaluation that’s grounded in different source material than ChatGPT’s. Claude lands in the middle on most metrics.

Perplexity is the outlier. It’s the toughest engine across the board: lowest win rate, lowest average score, highest absence rate, and the most neutral sentiment (29.9% neutral vs ~20% for other engines). Perplexity cites Reddit at disproportionate rates, which means community perception carries more weight than vendor content. If an 11-month-old Reddit thread says your product has a limitation you’ve since fixed, Perplexity is the engine most likely to still be repeating it.

Gemini is the most favorable. Highest win rate, highest score, lowest absence rate, most positive sentiment. We don’t have a definitive explanation for why. One hypothesis: Gemini draws from Google’s index, which is the deepest and most current. Companies with strong Google SEO infrastructure have more source material available for Gemini to synthesize from, and more of it is current.

The absence problem

In 26.6% of brand perception evaluations across all engines, the benchmarked company was completely absent from the AI’s response. Not misrepresented. Not mentioned with wrong facts. Just not there. The AI answered a question about the company’s category and didn’t include them. Same company, same category, same question: one engine includes you, another doesn’t.

Absence is harder to fix than inaccuracy. When AI gets a fact wrong, the fix is specific: update the page, add structured data, publish a correction. When AI doesn’t know you exist, the fix is broader: build the content authority that earns you a place in the AI’s answer set. That takes time, and the winner-take-all dynamics mean the window for establishing presence is narrowing.

28.6% of evaluations score below 50

This is the number that reframes the overall win rate.

The aggregate decisive win rate across all engines is 70.7%. That sounds fine. Most companies are winning most of their head-to-head comparisons most of the time. But the distribution has a long tail.

28.6% of all evaluations, more than 1 in 4, produce an AI Presence Score below 50. 15.5% score below 30. In those evaluations, the AI is actively working against the company: wrong facts, competitor-biased framing, missing coverage, negative sentiment.

The 70.7% average hides these. A company can win 75% of its ChatGPT evaluations and lose 60% of its Perplexity evaluations. The blended number looks healthy. The per-engine number reveals a channel where buyers are being systematically steered away.
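To make the arithmetic concrete, here is a sketch using the two rates from the example above (75% wins on ChatGPT, 60% losses on Perplexity). The evaluation counts are invented purely for illustration, as are the function name and the 50% flag threshold.

```python
def blended_win_rate(per_engine):
    """per_engine maps engine name -> (wins, losses) over decisive evaluations."""
    wins = sum(w for w, _ in per_engine.values())
    total = sum(w + l for w, l in per_engine.values())
    return wins / total


# Hypothetical volumes: 400 decisive ChatGPT evaluations at a 75% win rate,
# 100 decisive Perplexity evaluations at a 40% win rate (60% losses).
per_engine = {
    "ChatGPT": (300, 100),
    "Perplexity": (40, 60),
}

blended = blended_win_rate(per_engine)
# Flag any engine where the company loses more often than it wins.
weak = [name for name, (w, l) in per_engine.items() if w / (w + l) < 0.5]

print(f"{blended:.0%}")  # 68%
print(weak)              # ['Perplexity']
```

With these made-up volumes the blended rate lands at 68%, close to the aggregate, while the Perplexity channel on its own is a net loss. The flag only appears when the rate is computed per engine.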

What actually determines the recommendation

Based on our scoring decomposition, the recommendation in any single evaluation is driven by three interacting signals.

Source influence. Whose content shaped the AI’s answer? If the highest-gravity source is the competitor’s comparison page, the evaluation framework is built from their perspective. This is the most common driver of losses we see, and the most fixable. Publishing a comparison page that answers the buyer’s question from your frame changes the source that shapes the AI’s answer.

Claim accuracy. Did the AI get the facts right? Wrong pricing, feature misattribution, and outdated positioning all degrade the score. When the AI says you don’t support a feature you’ve had for a year, that’s a claim error that directly costs you the recommendation.

Feature-level outcomes. Across 39,117 feature comparisons and 513 distinct features, the pattern is clear: companies don’t win or lose across the board. They win on some features and lose on others. The features the AI chooses to compare determine who wins the evaluation. And the features the AI chooses to compare are shaped by the source material it has access to.

These three signals interact. A competitor-owned high-gravity source introduces the competitor’s feature strengths as the evaluation criteria, quotes their pricing accurately (because it’s from their own page), and produces a recommendation that logically follows from a framework designed to make them look good.

The fix in almost every case is specific content, not product changes. Write the comparison page. Update the pricing. Publish the feature documentation that makes your strengths the evaluation criteria instead of theirs.

What this changes about monitoring

If you’re monitoring one model, you’re seeing one of four pictures, and on nearly half of decisive prompts another engine picks a different winner. The model you’re tracking might show you winning while the model half your buyers use shows you losing. Single-model monitoring isn’t wrong. It’s incomplete in a way that creates false confidence.

Cross-model benchmarking isn’t just about coverage. It’s about finding the engines where you’re weakest and understanding why. The per-engine differences aren’t random. They trace to source selection, index freshness, and community content weighting. A company that’s losing on Perplexity because of a stale Reddit thread has a different problem than a company that’s losing on ChatGPT because of a competitor’s comparison page.

There’s an irony in how we ended up exposing this. KnitKnot runs an MCP server, so you can connect Claude or ChatGPT directly to your workspace and ask the assistant itself how your AI presence changed this week: score trends per engine, mention rollups, competitor deep dives, the full picture. The same models that disagree about you become the interface for monitoring the disagreement.

Four models, four source ecosystems, four sets of buyer impressions. The recommendation your buyer gets depends on which one they ask. The question is whether you know what each of them says.
