Tesseract vs Google Vision vs space-ocr for receipts

A fair, fact-checked 3-way comparison for receipt OCR: Tesseract (open-source, self-hosted, pixel hOCR), Google Cloud Vision (cloud text + pixel boxes + recognition confidence), and space-ocr (structured receipt fields + line items + per-value match ratio + queryable sheet + CSV + automatic language detection) — led by a live demo.

8 min read · 2026-06-25

If you need to pull totals, dates, and line items off receipts, three very different tools come up — and they're easy to lump together even though they solve different parts of the problem.

Tesseract is the classic open-source engine. It's free, private, and you run it yourself: feed it an image, get back text with word/line bounding boxes in pixels. There's no hosted API and no structuring into fields — that's your code to write.
Google Cloud Vision is strong cloud OCR at scale. TEXT_DETECTION / DOCUMENT_TEXT_DETECTION return the full text plus a boundingPoly (in pixels) and a per-word recognition confidence — text and geometry, but not structured receipt fields by default. (Structured key-value extraction at Google is a separate product, Document AI.)
space-ocr is verification-first and structured: send a receipt image, get back receipt/invoice fields and line items, each value carrying its own box and a match_ratio for how much of it was actually found on the page — plus a queryable sheet, one-click CSV, and automatic Japanese/Korean/Chinese/English detection with no language pack to install or select.

This is a fair comparison. Tesseract and Vision are genuinely good at what they do; the question is which part of "image → checkable receipt data" each one covers. The page leads with a live demo you can poke at, not a feature grid you have to trust.

Proof first: a receipt extraction you can check

Most OCR comparisons ask you to trust a screenshot. Here's what neither Tesseract nor Vision puts in front of you by default: an extraction where every value points back to the exact spot on the receipt it came from. Hover any field below: the box on the receipt is where that value was read, and each value carries a match ratio for how many of its characters were actually located on the page.

[Interactive demo: source receipts with extracted-field bounding boxes. Verified fields shown: KINSHO · 合計 (total) 2,045; ライフ · 合計 (total) 4,286.]

Every value carries a verified on-page location — bbox + 4-point vertices + match_ratio — on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns. Hover a field to trace it back to the pixels it came from.

Each extracted receipt field carries its own bounding box and match ratio — not just a value, but where on the page it sits and how well it matched.

The 3-way comparison, row by row

All three read receipts and return coordinates. The differences are in who runs it, what coordinate system you get, whether structured fields and line items come out of the box, what kind of confidence each value carries, and how the data leaves the tool. The table states verified facts for each — use it as a checklist for your own receipts.

Capability	Tesseract (open-source)	Google Cloud Vision	space-ocr
Hosting / ops	Self-hosted; you run the binary/library, no hosted API	Cloud API (GCP project, key, quotas)	Cloud API — one HTTPS call with a Bearer key, no infra
Coordinate system	Word/line boxes in pixels of the source image (hOCR / TSV `left,top,width,height` / ALTO)	`boundingPoly` vertices in pixels of the source image	`bbox` `{xmin,ymin,xmax,ymax}` on a 0–1000 normalized grid + oriented 4-point `vertices`
Structured receipt fields	None — text + geometry only; you write the structuring	Not by default (text + geometry); structured fields are a separate product, Document AI	Yes — `templateId` `receipt`/`invoice` or your own field list
Line items	Not built in	Not built in	An `array` field with `children`, each cell individually positioned
Per-value confidence	Recognition confidence per word, 0–100	Recognition confidence per word, range [0, 1]	`match_ratio` — share of the value's characters located on the page — plus a `bbox_source` label
Languages / CJK	Japanese, Korean, Chinese & many more — via trained-data packs you install per language	Broadly multilingual, including CJK	One engine auto-detects Japanese, Korean, Chinese, English & more — no language pack to install or select
Export	Build it yourself from hOCR/TSV/ALTO	Build it yourself from the JSON	One-click CSV (UTF-8 BOM, line items unfolded)
Verification UI	None built in	None built in	Built into the app — click a cell and its exact region lights up on the original

What "match ratio" means, and why it's not the same as a recognition confidence. Tesseract reports a per-word confidence (0–100) and Vision reports one too (range [0, 1]) — both are the model's self-reported certainty that it read the text correctly. space-ocr adds a different signal. The language model returns each field's text — and a hint of which word tokens it used — but never the boxes themselves. The engine then character-matches that text against the symbols the vision OCR actually detected on the receipt, lands a box on the real pixels those characters were found at, and scores each value with a match_ratio: the share of the value's characters located on the page (treated as a confident match at ≥ 0.85, labelled vision_symbol_match). The token hints can be noisy — they sometimes swap between repeated rows — so column- and row-consistency checks validate them rather than trusting them blindly. This isn't a claim that the other engines can't character-match; it's that space-ocr documents a character-coverage ratio per value, which is a different question from "how sure was the model."
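To make the idea of a character-coverage ratio concrete, here is a minimal sketch in Python. It is not space-ocr's implementation, which the text above describes only at a high level; the function name `match_ratio` and the order-blind multiset matching are assumptions made for illustration. It shows how one misread digit pulls a value below the 0.85 threshold.

```python
from collections import Counter

CONFIDENT_THRESHOLD = 0.85  # threshold quoted in the text above


def match_ratio(value: str, detected_symbols: list[str]) -> float:
    """Share of the value's non-space characters that pair with a detected symbol.

    Illustrative only: each detected symbol is consumed at most once, and
    character position is ignored, which a production matcher would not do.
    """
    wanted = [ch for ch in value if not ch.isspace()]
    if not wanted:
        return 0.0
    available = Counter(detected_symbols)
    found = 0
    for ch in wanted:
        if available[ch] > 0:
            available[ch] -= 1
            found += 1
    return found / len(wanted)


# Symbols the OCR pass detected on the page (hypothetical sample).
symbols = list("合計2,045")

for candidate in ("2,045", "2,845"):
    ratio = match_ratio(candidate, symbols)
    status = "confident" if ratio >= CONFIDENT_THRESHOLD else "review"
    print(candidate, ratio, status)
```

In this toy version, "2,845" scores 4 of 5 characters because no "8" was detected on the page, so it falls under the threshold even if the model that produced it was confident.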

Where Tesseract is the right call

A fair comparison names where each tool wins. Reach for Tesseract when:

Privacy or air-gap matters. Nothing leaves your machine — it's a local binary, Apache-2.0, no network call. For receipts with personal data that can't go to a cloud, this is decisive.
Cost must be zero per scan. There's no per-page fee, ever; you pay only for the compute you already own.
You're fine writing the structuring layer. Tesseract gives you text plus word/line boxes (hOCR, TSV, ALTO). Turning "合計 2,045" into { total: "2,045" } with the right box is code you write and maintain — and on noisy phone-photo receipts that's a non-trivial amount of it.

If you want full control and zero recurring cost and you're happy to own the field-extraction logic, Tesseract is hard to beat.
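As a rough idea of what that structuring layer involves, the sketch below parses TSV in the column layout Tesseract emits and pulls out the word to the right of a total label. The sample rows, the `find_total` helper, and the "take the next word on the same line" rule are all illustrative assumptions; real receipts need far more handling (split amounts, skew, label variants).

```python
import csv
import io

# Hand-written sample in Tesseract's TSV column layout, not real output.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "5\t1\t1\t1\t1\t1\t40\t30\t120\t28\t96\tKINSHO\n"
    "5\t1\t2\t1\t1\t1\t40\t400\t60\t26\t91\t合計\n"
    "5\t1\t2\t1\t1\t2\t300\t400\t90\t26\t88\t2,045\n"
)


def find_total(tsv_text, label="合計"):
    """Return the word following `label` on the same line, with its pixel box."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = {}
    for row in reader:
        # Level 5 rows are words; other levels describe pages, blocks, lines.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        key = (row["page_num"], row["block_num"], row["par_num"], row["line_num"])
        lines.setdefault(key, []).append(row)

    for words in lines.values():
        words.sort(key=lambda w: int(w["left"]))
        texts = [w["text"] for w in words]
        if label in texts and texts.index(label) + 1 < len(words):
            w = words[texts.index(label) + 1]
            left, top = int(w["left"]), int(w["top"])
            return {
                "total": w["text"],
                "box": (left, top, left + int(w["width"]), top + int(w["height"])),
                "conf": float(w["conf"]),  # Tesseract's 0-100 scale
            }
    return None


print(find_total(SAMPLE_TSV))
```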

Where Google Cloud Vision is the right call

Reach for Vision when:

You need raw OCR at large scale and you're already on Google Cloud — DOCUMENT_TEXT_DETECTION returns a clean Pages → Blocks → Paragraphs → Words → Symbols hierarchy with a per-word recognition confidence.
You want geometry, not opinions. Vision returns exactly what it read and where, in pixels, and leaves the interpretation to you.
You'll add Document AI for structure. When you do need key-value fields and invoice line items from Google, the official OCR docs point you to Document AI (Form Parser / the pretrained Invoice parser) — a separate, processor-based product layered on top.
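For a sense of what working with that hierarchy looks like, here is a small sketch that walks a Vision-style JSON response and yields each word with its confidence and pixel vertices. The `SAMPLE_RESPONSE` dictionary is hand-written for illustration (it is not captured API output), and `iter_words` is a hypothetical helper name. Coordinates default to 0 when missing, since zero-valued fields can be omitted from the JSON.

```python
# Hand-written sample shaped like a DOCUMENT_TEXT_DETECTION JSON response.
SAMPLE_RESPONSE = {
    "fullTextAnnotation": {
        "pages": [{
            "blocks": [{
                "paragraphs": [{
                    "words": [
                        {
                            "confidence": 0.98,
                            "boundingBox": {"vertices": [
                                {"x": 40, "y": 400}, {"x": 100, "y": 400},
                                {"x": 100, "y": 426}, {"x": 40, "y": 426},
                            ]},
                            "symbols": [{"text": "合"}, {"text": "計"}],
                        },
                        {
                            "confidence": 0.93,
                            "boundingBox": {"vertices": [
                                {"x": 300, "y": 400}, {"x": 390, "y": 400},
                                {"x": 390, "y": 426}, {"x": 300, "y": 426},
                            ]},
                            "symbols": [{"text": c} for c in "2,045"],
                        },
                    ]
                }]
            }]
        }]
    }
}


def iter_words(response):
    """Yield (text, confidence, vertices) for every word in the hierarchy."""
    full_text = response.get("fullTextAnnotation", {})
    for page in full_text.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    vertices = [
                        (v.get("x", 0), v.get("y", 0))
                        for v in word.get("boundingBox", {}).get("vertices", [])
                    ]
                    yield text, word.get("confidence"), vertices


for text, confidence, vertices in iter_words(SAMPLE_RESPONSE):
    print(text, confidence, vertices)
```

Everything past this point (deciding that "2,045" is the total because it sits beside "合計") remains your code or a Document AI processor.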

One operational note that bites receipt pipelines on every cloud OCR, Vision included: pixel coordinates are tied to the exact image you uploaded. Resize or recompress the receipt, or mishandle EXIF rotation from a phone, and the overlay boxes drift. Vision represents rotation around the top-left corner and doesn't hand back an explicit rotation angle, so reconciling orientation is on you.
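The resize half of that problem is simple arithmetic, sketched below under the assumption that you know the dimensions of the image you uploaded and of the image you display. `rescale_vertices` is a hypothetical helper; it deliberately does nothing about EXIF rotation, which has to be resolved before the sizes are compared.

```python
def rescale_vertices(vertices, source_size, display_size):
    """Map pixel vertices from the uploaded image onto a resized display copy.

    vertices: list of (x, y) in pixels of the uploaded image.
    source_size / display_size: (width, height) tuples.
    """
    source_w, source_h = source_size
    display_w, display_h = display_size
    return [(x * display_w / source_w, y * display_h / source_h) for x, y in vertices]


box = [(300, 400), (390, 400), (390, 426), (300, 426)]
print(rescale_vertices(box, source_size=(1200, 1600), display_size=(600, 800)))
```

If the overlay still drifts after this, the usual suspect is a mismatch between the orientation of the file that was uploaded and the orientation the browser or viewer applies when displaying it.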

Where space-ocr fits instead

A verification-first tool earns its place for receipts when one or more of these matters:

You want structured fields and line items without writing them. Send a receipt, get total, date, store name, and a positioned line-item array back — no parsing layer to build on top of raw OCR.
You want to verify, not just trust. Every value returns with its on-page box and a match_ratio, and clicking a cell highlights exactly where it was read. A value below 0.85 flags itself as worth a second look.
You process Japanese, Korean, or Chinese receipts and want zero setup. One engine runs CJK and Latin scripts with automatic language detection — no language parameter to set. (Tesseract and Vision read CJK too, but you install or select the language and still build the field-structuring layer yourself.)
You don't want to stand up storage. Results land in a sheet you can query server-side (GET /view) and export to CSV in one click — no database.
You want flat, predictable pricing and zero ops. A flat ¥10 per image (about $0.05), a free tier of 100 scans a month with no credit card, and a $39/month Pro plan — and one HTTPS call instead of a server to run or a GCP project to wire up.
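The "worth a second look" rule from the list above is easy to apply on the client side. The sketch below assumes each extracted field has already been reduced to a dictionary with `name`, `value`, and `match_ratio` keys; that flat shape and the sample values are assumptions for illustration, not the documented response format.

```python
REVIEW_THRESHOLD = 0.85  # threshold quoted in the text above


def needs_review(fields, threshold=REVIEW_THRESHOLD):
    """Return the names of fields whose match_ratio falls below the threshold."""
    return [f["name"] for f in fields if f.get("match_ratio", 0.0) < threshold]


# Hypothetical, already-flattened fields.
fields = [
    {"name": "store", "value": "KINSHO", "match_ratio": 1.0},
    {"name": "total", "value": "2,045", "match_ratio": 1.0},
    {"name": "date", "value": "2026-06-25", "match_ratio": 0.6},
]

print(needs_review(fields))
```

A field with no `match_ratio` at all is treated as 0.0 here, so it is flagged rather than silently trusted.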

The whole call is one HTTP request. The engine takes raster images (JPEG, PNG, GIF, BMP, TIFF, WebP); the web app additionally converts PDF pages to images for you.

extract a receipt — one request, structured fields back

curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/receipt.jpg",
    "imageType": "url",
    "templateId": "receipt"
  }'
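The same request can be built from Python with only the standard library. This sketch mirrors the curl call above (same URL, headers, and body); `build_request` is a hypothetical helper name, and the network call is left commented out so the snippet runs without a key.

```python
import json
import os
import urllib.request


def build_request(image_url, api_key):
    """Build the POST request shown in the curl example, without sending it."""
    body = {"image": image_url, "imageType": "url", "templateId": "receipt"}
    return urllib.request.Request(
        "https://api.space-ocr.com/ocr/fields",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "https://example.com/receipt.jpg",
    os.environ.get("SPACE_OCR_API_KEY", "YOUR_KEY"),
)
print(req.get_method(), req.full_url)

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```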

Each value comes back with a bbox ({ xmin, ymin, xmax, ymax } on a 0–1000 grid), four vertices for an oriented box that follows a tilted phone photo, a match_ratio, and a bbox_source. Because the grid is normalized to 0–1000 rather than pixels of the uploaded file, the same coordinates overlay correctly whether you display the receipt at thumbnail or full size. For the full coordinate model see an OCR API with bounding boxes; for getting positioned line items out of receipts and invoices, see extract line items from invoices.
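Turning a normalized box into pixels for an overlay is a single scaling step. The sketch below assumes only the bbox shape described above ({ xmin, ymin, xmax, ymax } on a 0–1000 grid); the `to_pixels` helper and the sample box are illustrative.

```python
GRID = 1000  # the normalized grid runs from 0 to 1000 on both axes


def to_pixels(bbox, width, height):
    """Scale a 0-1000 normalized bbox to pixel edges for a given display size."""
    return {
        "left": bbox["xmin"] * width / GRID,
        "top": bbox["ymin"] * height / GRID,
        "right": bbox["xmax"] * width / GRID,
        "bottom": bbox["ymax"] * height / GRID,
    }


bbox = {"xmin": 250, "ymin": 250, "xmax": 325, "ymax": 275}  # hypothetical value

print(to_pixels(bbox, 1200, 1600))  # full-size display
print(to_pixels(bbox, 300, 400))    # thumbnail of the same receipt
```

The same stored box serves both renderings, which is the practical benefit of not tying coordinates to the uploaded file's pixel dimensions.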

Click any receipt cell and the matching region lights up on the original image — value-by-value verification that neither Tesseract nor Vision ships built in.

Line items: the part raw OCR leaves to you

Receipts are mostly a list of rows — description, quantity, unit price — and that's exactly where text-plus-boxes stops being enough. Tesseract and Vision both hand you words with coordinates; reconstructing which words belong to which row, and which column, is your code. space-ocr models a line item as an array field whose children describe one row, and each cell keeps its own bounding box — so a wrapped description or a discount line stays individually traceable. Pull those into a sheet with /upload, query them server-side with GET /view (where, sort, select), and download CSV with the line items unfolded — none of which is a re-OCR and none of which is charged.
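If you ever need to reproduce the "unfolded" CSV shape yourself, for example from Tesseract or Vision output you have already structured, the sketch below shows one way to do it. The receipt dictionary, its key names, and the line-item values are made up for illustration; only the idea (one CSV row per line item, receipt-level fields repeated, UTF-8 with a BOM so spreadsheet apps detect the encoding) comes from the text above.

```python
import csv
import io

ITEM_COLUMNS = ["description", "quantity", "unit_price"]  # assumed column names


def unfold_to_csv(receipt):
    """Write one row per line item, repeating receipt-level fields on each row."""
    header_fields = {k: v for k, v in receipt.items() if k != "items"}
    columns = list(header_fields) + ITEM_COLUMNS
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in receipt["items"]:
        writer.writerow({**header_fields, **item})
    # "utf-8-sig" prepends the UTF-8 byte order mark.
    return buffer.getvalue().encode("utf-8-sig")


# Hypothetical structured receipt; the line items are invented sample data.
receipt = {
    "store": "ライフ",
    "total": "4,286",
    "items": [
        {"description": "牛乳", "quantity": "1", "unit_price": "198"},
        {"description": "食パン", "quantity": "2", "unit_price": "158"},
    ],
}

print(unfold_to_csv(receipt).decode("utf-8-sig"))
```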

How to try all three on the same receipt

Pick one representative receipt
Choose a real receipt — ideally a slightly tilted phone photo with line items and, if relevant, CJK text — so the comparison reflects your actual inputs rather than a clean scan.
Run Tesseract locally
Install Tesseract and the language data you need, then run it for hOCR or TSV output. Note the per-word boxes (pixels) and confidence (0–100), and how much code it takes to turn those words into total, date, and line items.
Call Google Cloud Vision
Send the same image to DOCUMENT_TEXT_DETECTION. You'll get text, a Pages→Blocks→Paragraphs→Words→Symbols hierarchy with pixel boundingPolys, and a per-word recognition confidence in range [0, 1] — but you still structure the fields yourself, or add Document AI.
Call space-ocr with the receipt template
POST the same image to /ocr/fields with imageType 'url' or 'base64' and templateId 'receipt'. Each value comes back with a bbox, oriented vertices, a match_ratio, and a bbox_source — structured fields and line items, no parsing layer to write.
Compare verification, not just text
Line up the three: pixel boxes you must reconcile to your display size vs. a 0–1000 normalized grid; recognition confidence vs. a character-coverage match_ratio; raw words vs. positioned line items you can query with GET /view and export to CSV.
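When lining up the results, it helps to put the two recognition confidences on one scale first. The helper below is a small sketch with a hypothetical name; it maps Tesseract's 0–100 value onto [0, 1] and treats the -1 that Tesseract's TSV output uses for non-word rows as "no confidence". A match_ratio already lives on 0–1, but it measures character coverage rather than recognition certainty, so it belongs in its own column rather than being averaged with the other two.

```python
def tesseract_conf_to_unit(conf):
    """Map a Tesseract word confidence (0-100) onto [0, 1]; None for -1 rows."""
    value = float(conf)
    if value < 0:
        return None
    return value / 100.0


# Hypothetical readings of the same receipt total from each tool.
comparison = {
    "tesseract_confidence": tesseract_conf_to_unit("88"),
    "vision_confidence": 0.93,
    "space_ocr_match_ratio": 1.0,  # different measure: character coverage
}

for name, score in comparison.items():
    print(f"{name}: {score}")
```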

Tesseract vs Google Vision vs space-ocr — which is best for receipts?

It depends on what you need. Tesseract is free, private, and self-hosted, but returns only text with pixel boxes — you write the receipt-field and line-item logic. Google Cloud Vision is strong cloud OCR returning text, pixel boundingPolys, and a per-word recognition confidence, but no structured receipt fields by default (that's Google Document AI, a separate product). space-ocr returns structured receipt/invoice fields and positioned line items, each value carrying a match_ratio and an on-page box, plus a queryable sheet, CSV, and automatic language detection. Choose Tesseract for zero-cost privacy, Vision for raw OCR at scale, and space-ocr when you want structured, verifiable receipts with zero ops.

Do Tesseract or Google Vision return receipt fields like total and line items?

Not on their own. Tesseract outputs text with word/line bounding boxes (hOCR, TSV, ALTO) and a per-word confidence 0–100 — no key-value structuring. Vision's TEXT_DETECTION / DOCUMENT_TEXT_DETECTION return text plus pixel boundingPolys and a per-word recognition confidence in range [0, 1], but not structured fields; Google's own docs point to Document AI for form parsing and invoice line items. space-ocr returns total, date, store name, and a positioned line-item array directly from a receipt template.

What's the difference between a recognition confidence and a match ratio?

A recognition confidence — Tesseract's 0–100 or Vision's [0, 1] — is the model's self-reported certainty that it read the characters correctly. space-ocr's match_ratio is a different measure: the share of an extracted value's characters that were actually located among the symbols the vision OCR detected on the page (a confident match at 0.85 and above). It answers 'how much of this value did we find on the page', which is the question you care about when auditing a receipt total. It's not a claim that the other engines can't character-match — only that space-ocr documents this coverage ratio per value.

Why do my overlay boxes drift after resizing a receipt image?

Because pixel coordinates are tied to the exact image you uploaded. Tesseract and Vision both return boxes in pixels of the source image, so resizing or recompressing the receipt, or mishandling EXIF rotation from a phone camera, makes the overlay drift — Vision represents rotation around the top-left corner without an explicit angle, leaving orientation for you to reconcile. space-ocr returns coordinates on a 0–1000 normalized grid and applies EXIF orientation on load, so the boxes line up with the receipt as displayed regardless of display size.

Do any of these handle Japanese, Korean, and Chinese receipts?

space-ocr runs Japanese, Korean, Chinese, English, and other scripts through one engine with automatic language detection — there's no language parameter to set, and it normalizes full-width/half-width characters, hyphen variants, CJK spacing, and vertical Han. Tesseract supports many languages via trained data files you install per language, and Google Vision has broad language coverage; in both cases you still write the receipt-structuring layer on top of the raw OCR yourself.
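If you build the structuring layer yourself on top of Tesseract or Vision, part of that normalization can be approximated with Python's standard library. The sketch below is not space-ocr's normalizer: it applies Unicode NFKC (which folds full-width digits and punctuation to ASCII and half-width katakana to full-width) and then maps a hand-picked set of hyphen-like characters to "-". The set in `HYPHEN_VARIANTS` is an assumption, and the katakana long-vowel mark "ー" is deliberately left alone because it is part of ordinary words.

```python
import unicodedata

# Hyphen-like code points folded to ASCII "-" (illustrative, not exhaustive):
# U+2010 hyphen, U+2011 non-breaking hyphen, U+2012 figure dash,
# U+2013 en dash, U+2014 em dash, U+2212 minus sign.
HYPHEN_VARIANTS = dict.fromkeys(
    map(ord, "\u2010\u2011\u2012\u2013\u2014\u2212"), "-"
)


def normalize_text(text):
    """Fold width variants with NFKC, then unify hyphen-like characters."""
    return unicodedata.normalize("NFKC", text).translate(HYPHEN_VARIANTS)


print(normalize_text("\uff12\uff0c\uff10\uff14\uff15"))  # full-width "2,045"
print(normalize_text("\uff97\uff72\uff8c"))              # half-width katakana
print(normalize_text("03\u22121234"))                    # minus sign as hyphen
```

Normalizing both the OCR text and the expected label before comparing them avoids misses such as a full-width "２，０４５" failing to match "2,045".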

Run a verifiable receipt extraction on your own images

Free tier — 100 scans a month, no credit card, no server to run. Every value comes back with its on-page location and a match ratio.

Best OCR Software for Receipts and Invoices (2026 Guide)

OCR API with Bounding Boxes: Verify Every Value (2026)

Extract Line Items From Invoices Automatically | space-ocr