A Startup Says It Cracked AI's Decade-Old Bottleneck

Miami startup Subquadratic says its SubQ model beats the transformer's quadratic attention limit, claiming up to 56x the speed and a 12M-token context window, but experts want real proof.

Published 2026-06-19 · 4 min read

Evgenii Arsentev · PhD


Subquadratic, a Miami startup that left stealth in May 2026, claims its SubQ model breaks through the quadratic attention bottleneck that has constrained large language models for nearly a decade. The headline numbers, reported by MIT Technology Review, are loud: up to 56 times faster than FlashAttention-based models, a context window of up to 12 million tokens, and 98% accuracy on needle-in-a-haystack retrieval at the 6-to-12-million-token scale.

The bottleneck is real and old. In a standard transformer, attention is "dense" — every token is compared against every other token, so doubling the length of your text roughly quadruples the computation. That quadratic cost is why long documents get expensive fast and why context windows have practical limits. SubQ uses sparse attention instead: a dynamic, text-specific mechanism that selects only the token relationships that matter rather than grinding through every pair. Founded by CTO Alex Whedon and CEO Justin Dangel, the company says that shift is what unlocks the speed and the length.
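To make the scaling argument concrete, here is a minimal sketch that only counts how many token pairs each approach has to score. It is not SubQ's method, which has not been published in detail. The functions `dense_pairs` and `sparse_pairs` and the fixed per-token budget `K` are hypothetical names and values chosen for illustration.

```python
def dense_pairs(n_tokens: int) -> int:
    """Pairs scored by dense attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, k: int) -> int:
    """Pairs scored if each token attends to at most k others.

    k is an assumed fixed budget, not a published SubQ parameter.
    """
    return n_tokens * min(k, n_tokens)


K = 1024  # illustrative budget only
for n in (1_000, 2_000, 4_000, 8_000):
    # Doubling n quadruples the dense count; once n exceeds K,
    # the sparse count only doubles.
    print(n, dense_pairs(n), sparse_pairs(n, K))
```

The counting is the easy part. The hard part, and the thing the company's claims rest on, is choosing which relationships to keep without first scoring all of them, and this sketch says nothing about that.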

The numbers — and the asterisks

On cost, the claim is striking: the company says running the RULER 128 benchmark cost it $8, versus the $2,600 it cites for Anthropic's Opus 4.6, a roughly 325-fold gap. SubQ also posted 89.7% on LiveCodeBench, a competitive-coding test, and reportedly chewed through a 400-document analysis task in seconds. An independent evaluator, Appen, ran tests; its director Jeanine Sinanan-Singh said the architecture "could be a game changer."

Then the asterisks. Public access is thin: the company claims tens of thousands of signups, but they sit behind a long waitlist, so almost nobody outside has stress-tested the model. SubQ reused weights from China's open-source Qwen model rather than training from scratch, which complicates clean comparisons. And AI researcher Will Depue put the skeptic's case plainly: "public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck." Benchmarks, as ever, aren't the same as real-world use across messy, varied tasks.

What I'd actually do

Don't rewrite your plans around SubQ yet — wait for independent access and third-party numbers on tasks you care about. But do file the direction away: sparse, selective attention is where a lot of the field is heading, and cheap long context is coming whether or not this particular startup is the one that delivers it.

Why this matters even if you never touch SubQ: the quadratic wall is the reason today's models forget the start of a long conversation, choke on a whole codebase, or charge real money to read a long PDF. Break that wall cheaply and a lot of frustrating limits quietly relax — you could hand a model an entire book, a year of email, or a giant repo and have it actually hold the whole thing in mind. My honest read: the claims are exciting and the caveats are equally serious, and the right posture is interested-but-unconvinced until outsiders can run it. A genuine breakthrough here would matter enormously — which is exactly why it deserves more proof than a launch-week benchmark sheet.

#llm #attention #long-context

Author

Evgenii Arsentev

PhD · Chief Product Officer at a tech company

Source: technologyreview.com