Probably Raises $9M for AI That Barely Hallucinates

Probably raised $9M from Andreessen Horowitz to build AI aiming for 99.99% accuracy by checking model output against hard validators before it reaches you.

4 min read · Evgenii Arsentev, PhD

A new startup called Probably has raised $9 million in seed funding from Andreessen Horowitz to build AI that aims for 99.99% accuracy — the kind of reliability you expect from ordinary deterministic software, not from a chatbot that occasionally invents facts. The company, led by founder and CEO Peter Elias, says its goal is to stop hallucinations and factual errors before they ever reach a user, rather than apologizing for them afterward.

The approach is the interesting part. Instead of betting everything on a bigger, smarter model, Probably wraps a language model in what Elias calls a 'data science mech suit': the model's output is checked against deterministic validators, and the model itself is trained against that harness so it learns to produce answers the validators will accept. 'The better your harness engineering is, the weaker the model can be,' Elias told TechCrunch — and that has a practical payoff. The current product runs on a model he describes as 'four classes weaker' than today's frontier systems, which means it can run on local hardware instead of a data center and costs far less per token.
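To make the harness idea concrete, here is a minimal Python sketch of the checking half only: generate an answer, run it past deterministic validators, and retry with the errors as feedback. It is an illustration under assumptions, not Probably's implementation. The names `run_with_harness`, `must_be_number`, `must_match_total`, and `fake_model` are hypothetical, the stub stands in for a real model call, and training the model against the harness is out of scope here.

```python
from typing import Callable, List, Optional

# A validator returns an error message, or None when the check passes.
Validator = Callable[[str], Optional[str]]


def must_be_number(answer: str) -> Optional[str]:
    """Reject anything that does not parse as a plain number."""
    try:
        float(answer)
    except ValueError:
        return f"not a number: {answer!r}"
    return None


def must_match_total(expected: float) -> Validator:
    """Build a validator that compares the answer to a total computed from source data."""
    def check(answer: str) -> Optional[str]:
        try:
            value = float(answer)
        except ValueError:
            return None  # must_be_number already reports this case
        if abs(value - expected) > 0.005:
            return f"expected {expected}, got {value}"
        return None
    return check


def run_with_harness(
    generate: Callable[[List[str]], str],
    validators: List[Validator],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first answer every validator accepts, or None if none passes."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        answer = generate(feedback)
        feedback = []
        for validate in validators:
            error = validate(answer)
            if error is not None:
                feedback.append(error)
        if not feedback:
            return answer  # only validated output reaches the user
    return None  # refuse rather than pass along an unverified answer


def fake_model(feedback: List[str]) -> str:
    """Stand-in for a model call: wrong at first, correct once it sees feedback."""
    return "1250.00" if feedback else "about 1,200"


result = run_with_harness(fake_model, [must_be_number, must_match_total(1250.0)])
print(result)  # 1250.00
```

The design choice worth noticing is the last line of `run_with_harness`: when nothing passes, the harness returns nothing instead of its best guess, which is what stops an error before it reaches the user.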

What Probably actually ships first

The first product is a data-science tool that produces answers from complicated datasets, complete with citations and an audit trail you can follow back to the source. That focus is deliberate: Probably is targeting domains where a confident wrong answer is expensive, like accounting and healthcare. Elias also took a swipe at the rest of the industry, saying he finds it 'really interesting that the big AI labs have not even attempted to do this' — the labs mostly compete on raw model capability, not on guaranteeing the output is correct.

Why this matters for you

Most of us have already been burned by a confident, wrong AI answer — a made-up citation, a number that was off, a function that doesn't exist. The usual fix is 'use a bigger model and hope,' which is slow and expensive and still fails. Probably is a bet on a different lever: the scaffolding around the model. If a weaker, cheaper, local model plus good validation can hit near-perfect accuracy on a narrow task, that's a more honest path to AI you can actually trust with money and health decisions. My own take, after a lot of time building with these tools, is that this is the right instinct — the gains that matter day to day come less from a smarter model and more from the harness that checks its work before it reaches me. A $9M seed round won't settle whether it works at scale, but the direction is the one I'd bet on.

What I'd actually do

Until 'verified AI' is the default, build your own tiny harness. For anything that touches numbers, dates, or money, ask the model to show its sources and the steps, then check one or two by hand. Treat an answer with no citation and no way to verify it as a draft, not a fact — that single habit catches most of the damage.
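If you want to automate part of that habit, the rule above can be sketched in a few lines of Python. This is a rough sketch under assumptions: the `Claim` structure, the `review` function, the status labels, and the invoice IDs are all made up for illustration, and it presumes you have already pulled each claimed number and its cited source out of the model's answer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Claim:
    text: str                        # the sentence the model produced
    value: float                     # the number it asserts
    source_id: Optional[str] = None  # the source record it cited, if any


def review(claim: Claim, sources: Dict[str, float], tolerance: float = 0.01) -> str:
    """Label a claim 'verified', 'mismatch', or 'draft'."""
    if claim.source_id is None:
        return "draft"      # no citation: treat as a draft, not a fact
    if claim.source_id not in sources:
        return "mismatch"   # cites a source that does not exist
    if abs(sources[claim.source_id] - claim.value) <= tolerance:
        return "verified"
    return "mismatch"       # cited source disagrees with the claimed number


# Hypothetical source data you trust, keyed by record ID.
sources = {"invoice-17": 420.50, "invoice-18": 99.00}

claims = [
    Claim("Invoice 17 totals $420.50", 420.50, "invoice-17"),
    Claim("Invoice 18 totals $199.00", 199.00, "invoice-18"),
    Claim("Q3 revenue was $1M", 1_000_000.0),
]

statuses = [review(c, sources) for c in claims]
print(statuses)  # ['verified', 'mismatch', 'draft']
```

Only the "verified" claims are safe to act on. A "mismatch" needs a human look, and a "draft" stays a draft until someone finds a source for it.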


Author: Evgenii Arsentev, PhD, Chief Product Officer at a tech company

Source: techcrunch.com