Baidu's Free OCR Model Reads Any Document at Once

Baidu's MIT-licensed Unlimited-OCR model processes entire multi-page documents in one pass, runs locally via Ollama, and needs no cloud subscription or per-request fees.

Evgenii Arsentev, PhD · 3 min read

Baidu published Unlimited-OCR on Hugging Face in June 2026: a 3-billion-parameter model under the MIT license that converts scanned documents and PDFs into plain text entirely on your own machine. No API fees, no cloud upload, no third-party service handling your documents. It runs locally through Ollama or LM Studio, or on a server you control that exposes an API compatible with standard AI tools.

OCR — optical character recognition — is the technology that turns a photo of a contract, a scanned invoice stack, or a page of printed text into editable, searchable words. Until recently, doing this accurately on complex documents usually meant paying a cloud service and sending your files somewhere else. Unlimited-OCR is designed to handle that task locally, including multi-page documents with mixed layouts.

What makes it different from standard OCR

The model's technical paper, published on arXiv in June 2026, describes "one-shot long-horizon parsing": the ability to take an entire document and process it in a single pass rather than slicing it into individual pages and stitching the results together afterward. That matters because stitching can introduce errors at page boundaries, especially when text or tables cross from one page to the next. The model supports a context window of up to 32,768 tokens, which is meant to cover a typical multi-page contract or a detailed report.
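To make the 32,768-token limit concrete, here is a minimal sketch of how you might check whether a document fits in one pass and, if it does not, group consecutive pages into several passes. The per-page token counts are hypothetical inputs (the article does not say how many tokens a page consumes), and the greedy grouping is my own fallback idea, not something described in Baidu's paper.

```python
from typing import List

CONTEXT_WINDOW = 32_768  # token limit quoted in the article


def plan_passes(page_tokens: List[int], budget: int = CONTEXT_WINDOW) -> List[List[int]]:
    """Greedily group consecutive page indices so each group fits the token budget.

    page_tokens holds an assumed token cost per page (hypothetical values;
    measure them for your own documents).
    """
    passes: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, tokens in enumerate(page_tokens):
        if tokens > budget:
            raise ValueError(
                f"page {index} needs {tokens} tokens, more than the {budget} budget"
            )
        if used + tokens > budget:
            # The current group is full: close it and start a new one.
            passes.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        passes.append(current)
    return passes


if __name__ == "__main__":
    # Twelve pages at an assumed 1,000 tokens each fit in a single pass.
    print(plan_passes([1_000] * 12))
```

A result with one group means the whole document goes through in a single pass. More than one group means you are back to stitching, so the page-boundary caveat above applies again.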

Two resolution modes are available: a 640-pixel mode for speed and a 1024-pixel mode for complex layouts with small text or dense tables. PDFs are first converted to images, then processed. Baidu built on the foundations of existing open-source projects including DeepSeek-OCR and PaddleOCR, and the result inherits their multilingual strengths.
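As a rough illustration of what the two modes mean for a rendered page, the sketch below computes the size a page image would be scaled to. It assumes the mode number is the length of the longest side and that the aspect ratio is preserved. That is my assumption, not something the article confirms, and the model's real preprocessing may pad or tile instead. The mode names `fast` and `detailed` are also hypothetical labels.

```python
from typing import Tuple

# Hypothetical labels for the two modes described above.
MODES = {"fast": 640, "detailed": 1024}


def target_size(width: int, height: int, mode: str = "fast") -> Tuple[int, int]:
    """Scale (width, height) so the longest side equals the mode's pixel size."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = MODES[mode]
    side = max(width, height)
    # Multiply before dividing so an exact ratio stays exact.
    new_width = max(1, round(width * longest / side))
    new_height = max(1, round(height * longest / side))
    return new_width, new_height


if __name__ == "__main__":
    # An A4 page rendered at 300 dpi is 2480 x 3508 pixels.
    for mode in MODES:
        print(mode, target_size(2480, 3508, mode))
```

Under this assumption, the fast mode leaves far fewer pixels per character on a dense page, which is why small text and tight tables are the cases where the larger mode is worth the extra time.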

Why this matters if you build things

A large share of the most useful data in any business sits in PDF files: contracts, bank statements, invoices, medical records, insurance forms. Pulling that data into a format you can query or process automatically is a common request in automation projects. A free, locally running model removes the per-request fees from that work, and it is especially useful when the documents contain sensitive information that shouldn't leave your machine.

The 3-billion-parameter size keeps hardware requirements manageable: it runs on a consumer GPU or a modest server, not a dedicated AI cluster. The MIT license permits commercial use without royalties; its main condition is that you keep the copyright and license notice with the software. Baidu has also released quantized versions compatible with llama.cpp, Ollama, LM Studio, and Jan, covering the most common local deployment setups.
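A quick back-of-envelope check on the hardware claim: memory for the weights is roughly parameter count times bits per weight. The sketch below is arithmetic only. The bit widths are illustrative rather than the specific quantization formats Baidu ships, and the figure excludes activations, image processing, and the KV cache, so treat the result as a lower bound.

```python
def weight_memory_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    if parameters <= 0 or bits_per_weight <= 0:
        raise ValueError("parameters and bits_per_weight must be positive")
    return parameters * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 3e9  # the 3-billion-parameter size quoted above
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: about {weight_memory_gb(params, bits):.1f} GB of weights")
```

At 16 bits per weight that is about 6 GB, and at 4 bits about 1.5 GB, which is why a model this size is plausible on a consumer GPU.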

What I'd actually do

Pull it via Ollama, feed it a PDF you already have (a contract or invoice with a clear layout is a good first test), and see how clean the output text is. Then pipe that text to Claude for any summarization or data extraction you need. A local OCR model feeding a hosted AI for reasoning is a practical setup: the original files stay on your machine, though the extracted text does go to the hosted service.
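Here is a minimal sketch of the OCR step against a local Ollama server, using only the Python standard library. It posts one page image to Ollama's `/api/generate` endpoint on the default port. The model tag `unlimited-ocr` and the prompt wording are hypothetical placeholders, so check the model's Hugging Face page for the real tag and recommended prompt. It also assumes you have already rendered the PDF page to a PNG with a tool of your choice.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "unlimited-ocr"  # hypothetical tag; use whatever name you pulled
PROMPT = "Convert this document page to plain text."  # hypothetical prompt


def build_ocr_request(image_bytes: bytes, model: str = MODEL_TAG, prompt: str = PROMPT) -> dict:
    """Build the JSON body for a single-image, non-streaming generate call."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def run_ocr(image_path: str, url: str = OLLAMA_URL) -> str:
    """Send one page image to the local server and return the generated text."""
    with open(image_path, "rb") as handle:
        payload = build_ocr_request(handle.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # "page-1.png" is a placeholder path for a page you rendered yourself.
    print(run_ocr("page-1.png"))
```

The text that `run_ocr` returns is what you would then hand to Claude or another hosted model. The image itself never leaves your machine.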

Tags: OCR, document parsing, Baidu, open source, local AI

Author: Evgenii Arsentev, PhD · Chief Product Officer at a tech company

Source: huggingface.co