Ask an AI for 100 Arguments — It Repeats Itself

AI text sounds fluent but its arguments repeat: Pangram CEO Max Spero says models cluster in a narrow band, while human arguments span a much wider space.

4 min read · Evgenii Arsentev, PhD

Max Spero, CEO of the AI text detection company Pangram, described something counterintuitive about how language models give themselves away. It's not sloppy grammar. It's not telltale phrases like "certainly" or "as an AI". It's that when you ask a language model for 100 arguments on any topic, you get 100 arguments that cluster into a narrow band. Ask 100 different people the same question and you get something far more scattered.

"Ask an LLM for 100 arguments on a topic and they'll cluster in a narrow band, whereas the space of human arguments is going to be very diverse," Spero told AI Policy Perspectives. Pangram's detector uses a deep-learning classifier that looks for these structural patterns rather than surface-level word choices. The classifier can flag AI-generated text even when it can't fully explain which pattern it found: it learns to recognize the shape of AI reasoning, not its vocabulary.

Why argument diversity is a hard problem for AI

This is a deeper problem for AI text than surface fluency would suggest. Language models are trained to produce coherent, reasonable-sounding text, and they do. But "reasonable-sounding" means drawing from the most common, most well-supported, most expected arguments in the training data. Models develop a kind of gravity toward the center: the conventional, the canonical, the well-trodden. Individual humans escape that center all the time through personal experience, contrarian instincts, gaps in knowledge, and the simple randomness of how they approach any question. Models lack that kind of variation. Their outputs tend to be statistically more predictable than a human writer's, even when they sound perfectly natural.
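To make "clustering in a narrow band" concrete, here is a minimal sketch of one way to score how spread out a set of arguments is. It is not Pangram's method, which the article describes only as a deep-learning classifier. The function names, the word-overlap (Jaccard) distance, and the toy sentences are all assumptions chosen for illustration; word overlap is a crude stand-in for meaning, and a real analysis would more likely compare sentence embeddings.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word set for one argument (a crude lexical proxy for its content)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0.0 for identical word sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def mean_pairwise_distance(arguments):
    """Average distance over all pairs; a higher value suggests a more scattered set."""
    sets = [tokens(t) for t in arguments]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy data: three near-duplicates versus three unrelated angles.
clustered = [
    "remote work saves commuting time",
    "remote work saves commuting costs",
    "remote work saves commuting stress",
]
scattered = [
    "remote work saves commuting time",
    "my dog barks during every call",
    "offices subsidize downtown lunch spots",
]

print(mean_pairwise_distance(clustered))
print(mean_pairwise_distance(scattered))
```

On this toy data the near-duplicate set scores well below the unrelated set, which is the pattern Spero describes at a much larger scale.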

What I'd actually do

If you're producing content with AI that needs to stand up to scrutiny — for publication, applications, or anything where AI detection could matter — the problem isn't that it sounds robotic. The problem is that it sounds average. Force variety: generate four or five different versions of the same argument, pick the most unexpected one, rewrite key sections yourself, and explicitly ask for unusual or unpopular angles. That's not a perfect defense, but it breaks the clustering.
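The "pick the most unexpected one" step can be roughed out in code. The sketch below assumes that "unexpected" means "least like the other drafts" and measures that with word overlap; both are simplifying assumptions, and `most_unexpected` is a hypothetical helper, not part of any tool mentioned here. It can shortlist an outlier, but it can't judge whether the outlier is any good, so the final choice still needs a human.

```python
import re

def tokens(text):
    """Lowercase word set for one draft."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard_distance(a, b):
    """1 - overlap ratio of two word sets; 1.0 means no shared words."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def most_unexpected(drafts):
    """Return the draft whose word set is, on average, farthest from the others."""
    if len(drafts) < 2:
        raise ValueError("need at least two drafts to compare")
    sets = [tokens(d) for d in drafts]

    def avg_distance(i):
        # Mean distance from draft i to every other draft.
        others = [jaccard_distance(sets[i], s) for j, s in enumerate(sets) if j != i]
        return sum(others) / len(others)

    best = max(range(len(drafts)), key=avg_distance)
    return drafts[best]

# Hypothetical drafts: three variations on one theme and one different angle.
drafts = [
    "remote work saves commuting time",
    "remote work saves commuting costs",
    "remote work saves commuting stress",
    "my dog barks during every call",
]
print(most_unexpected(drafts))
```

With these four drafts the function returns the last one, the only draft that shares no words with the rest.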

The practical implication runs in two directions. If you're trying to detect AI content, argument-pattern analysis is likely more durable than looking for specific phrases — because phrases are easy to vary, while the underlying distribution of arguments is much harder to change without fundamentally altering the model. And if you're trying to produce content that doesn't set off detectors, fluency alone won't do it. You need genuine unpredictability, which comes from human judgment, not from running the output through a second AI.

Pangram operates in a market that matters most where authenticity is a hard requirement: academic submissions, journalism, legal filings, content moderation at scale. The insight that argument diversity is the real signal — rather than surface features — is probably more durable than most early detection methods, which LLMs quickly learned to evade. It works because it targets something that's baked into how models are trained, not just how they phrase things. Whether that advantage persists as models grow larger and more diverse in their outputs is an open question — but for now, it's the most interesting angle on AI detection this week.

#ai-detection #llm #research #pangram

Author

Evgenii Arsentev

PhD · Chief Product Officer at a tech company

Source: the-decoder.com