Tesseract vs Google Vision vs space-ocr for receipts
A fair, fact-checked 3-way comparison for receipt OCR: Tesseract (open-source, self-hosted, pixel hOCR), Google Cloud Vision (cloud text + pixel boxes + recognition confidence), and space-ocr (structured receipt fields + line items + per-value match ratio + queryable sheet + CSV + automatic language detection) — led by a live demo.
If you need to pull totals, dates, and line items off receipts, three very different tools come up — and they're easy to lump together even though they solve different parts of the problem.
- Tesseract is the classic open-source engine. It's free, private, and you run it yourself: feed it an image, get back text with word/line bounding boxes in pixels. There's no hosted API and no structuring into fields — that's your code to write.
- Google Cloud Vision is strong cloud OCR at scale. TEXT_DETECTION / DOCUMENT_TEXT_DETECTION return the full text plus a boundingPoly (in pixels) and a per-word recognition confidence: text and geometry, but not structured receipt fields by default. (Structured key-value extraction at Google is a separate product, Document AI.)
- space-ocr is verification-first and structured: send a receipt image, get back receipt/invoice fields and line items, each value carrying its own box and a match_ratio for how much of it was actually found on the page, plus a queryable sheet, one-click CSV, and automatic Japanese/Korean/Chinese/English detection with no language pack to install or select.
This is a fair comparison. Tesseract and Vision are genuinely good at what they do; the question is which part of "image → checkable receipt data" each one covers. It leads with a live demo you can poke at, not a feature grid you have to trust.
Proof first: a receipt extraction you can check
Most OCR comparisons ask you to trust a screenshot. Here's the thing none of these tools puts in front of you by default: an extraction where every value points back to the exact spot on the receipt it came from. Hover any field below: the box on the receipt is where that value was read, and each value carries a match ratio for the share of its characters that were actually located on the page.

Every value carries a verified on-page location — bbox + 4-point vertices + match_ratio — on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns. Hover a field to trace it back to the pixels it came from.
The 3-way comparison, row by row
All three read receipts and return coordinates. The differences are in who runs it, what coordinate system you get, whether structured fields and line items come out of the box, what kind of confidence each value carries, and how the data leaves the tool. The table states verified facts for each — use it as a checklist for your own receipts.
| Capability | Tesseract (open-source) | Google Cloud Vision | space-ocr |
|---|---|---|---|
| Hosting / ops | Self-hosted; you run the binary/library, no hosted API | Cloud API (GCP project, key, quotas) | Cloud API — one HTTPS call with a Bearer key, no infra |
| Coordinate system | Word/line boxes in pixels of the source image (hOCR / TSV left,top,width,height / ALTO) | boundingPoly vertices in pixels of the source image | bbox {xmin,ymin,xmax,ymax} on a 0–1000 normalized grid + oriented 4-point vertices |
| Structured receipt fields | None — text + geometry only; you write the structuring | Not by default (text + geometry); structured fields are a separate product, Document AI | Yes — templateId receipt/invoice or your own field list |
| Line items | Not built in | Not built in | An array field with children, each cell individually positioned |
| Per-value confidence | Recognition confidence per word, 0–100 | Recognition confidence per word, range [0, 1] | match_ratio — share of the value's characters located on the page — plus a bbox_source label |
| Languages / CJK | Japanese, Korean, Chinese & many more — via trained-data packs you install per language | Broadly multilingual, including CJK | One engine auto-detects Japanese, Korean, Chinese, English & more — no language pack to install or select |
| Export | Build it yourself from hOCR/TSV/ALTO | Build it yourself from the JSON | One-click CSV (UTF-8 BOM, line items unfolded) |
| Verification UI | None built in | None built in | Built into the app — click a cell and its exact region lights up on the original |
What "match ratio" means, and why it's not the same as a recognition confidence. Tesseract reports a per-word confidence (0–100) and Vision reports one too (range [0, 1]) — both are the model's self-reported certainty that it read the text correctly. space-ocr adds a different signal. The language model returns each field's text — and a hint of which word tokens it used — but never the boxes themselves. The engine then character-matches that text against the symbols the vision OCR actually detected on the receipt, lands a box on the real pixels those characters were found at, and scores each value with a match_ratio: the share of the value's characters located on the page (treated as a confident match at ≥ 0.85, labelled vision_symbol_match). The token hints can be noisy — they sometimes swap between repeated rows — so column- and row-consistency checks validate them rather than trusting them blindly. This isn't a claim that the other engines can't character-match; it's that space-ocr documents a character-coverage ratio per value, which is a different question from "how sure was the model."
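To make the distinction concrete, here is a minimal sketch of a character-coverage score. It is an illustration under loud assumptions, not space-ocr's implementation: it only counts characters as a multiset and ignores position, whereas the engine described above matches against located symbols and applies row and column consistency checks. The function name and sample data are hypothetical; only the 0.85 threshold comes from the text.

```python
from collections import Counter

CONFIDENT = 0.85  # threshold quoted in the text above


def match_ratio(value, ocr_symbols):
    """Share of the value's non-space characters that also appear among the
    symbols the OCR detected (simplified: counts only, no positions)."""
    wanted = Counter(ch for ch in value if not ch.isspace())
    total = sum(wanted.values())
    if total == 0:
        return 0.0
    available = Counter(ch for sym in ocr_symbols for ch in sym if not ch.isspace())
    found = sum(min(count, available[ch]) for ch, count in wanted.items())
    return found / total


# Hypothetical symbols detected on a receipt line.
symbols = list("合計¥2,045")

for candidate in ("2,045", "2,845"):
    ratio = match_ratio(candidate, symbols)
    status = "confident" if ratio >= CONFIDENT else "needs review"
    print(candidate, round(ratio, 2), status)
```

A value the language model misread ("2,845" where the page says "2,045") loses coverage because the "8" was never detected on the page, regardless of how certain any model felt about it.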
Where Tesseract is the right call
A fair comparison names where each tool wins. Reach for Tesseract when:
- Privacy or air-gap matters. Nothing leaves your machine — it's a local binary, Apache-2.0, no network call. For receipts with personal data that can't go to a cloud, this is decisive.
- Cost must be zero per scan. There's no per-page fee, ever; you pay only for the compute you already own.
- You're fine writing the structuring layer. Tesseract gives you text plus word/line boxes (hOCR, TSV, ALTO). Turning "合計 2,045" into { total: "2,045" } with the right box is code you write and maintain, and on noisy phone-photo receipts that's a non-trivial amount of it.
If you want full control and zero recurring cost and you're happy to own the field-extraction logic, Tesseract is hard to beat.
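As a rough idea of what that structuring layer involves, the sketch below parses Tesseract's TSV output with the standard library and looks for a total next to a label word. The sample rows are made up, and the "take the right-most word on the label's line" rule and the label list are assumptions for illustration; real receipts need far more handling than this.

```python
import csv
import io

# Made-up sample in Tesseract's TSV column layout; real output has many more rows.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "4\t1\t1\t1\t1\t0\t40\t900\t420\t36\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t40\t900\t90\t36\t91.2\t合計\n"
    "5\t1\t1\t1\t1\t2\t360\t902\t100\t34\t88.5\t2,045\n"
)


def read_words(tsv_text, min_conf=0.0):
    """Return word-level rows (level 5) with pixel boxes and 0-100 confidence."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if row["level"] != "5" or not text:
            continue
        conf = float(row["conf"])
        if conf < min_conf:
            continue
        words.append({
            "text": text,
            "line": (row["block_num"], row["par_num"], row["line_num"]),
            "box": (int(row["left"]), int(row["top"]), int(row["width"]), int(row["height"])),
            "conf": conf,
        })
    return words


def find_total(words, labels=("合計", "TOTAL", "Total")):
    """Hypothetical rule: the total is the right-most word on the label's line."""
    for label in words:
        if label["text"] not in labels:
            continue
        right = [w for w in words
                 if w["line"] == label["line"] and w["box"][0] > label["box"][0]]
        if right:
            value = max(right, key=lambda w: w["box"][0])
            return {"total": value["text"], "box": value["box"], "conf": value["conf"]}
    return None


print(find_total(read_words(SAMPLE_TSV)))
```

The box here is left, top, width, height in pixels of the source image, which is what you would carry forward to draw an overlay.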
Where Google Cloud Vision is the right call
Reach for Vision when:
- You need raw OCR at large scale and you're already on Google Cloud. DOCUMENT_TEXT_DETECTION returns a clean Pages → Blocks → Paragraphs → Words → Symbols hierarchy with a per-word recognition confidence.
- You want geometry, not opinions. Vision returns exactly what it read and where, in pixels, and leaves the interpretation to you.
- You'll add Document AI for structure. When you do need key-value fields and invoice line items from Google, the official OCR docs point you to Document AI (Form Parser / the pretrained Invoice parser) — a separate, processor-based product layered on top.
One operational note that bites receipt pipelines on every cloud OCR, Vision included: pixel coordinates are tied to the exact image you uploaded. Resize or recompress the receipt, or mishandle EXIF rotation from a phone, and the overlay boxes drift. Vision represents rotation around the top-left corner and doesn't hand back an explicit rotation angle, so reconciling orientation is on you.
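A small sketch of the reconciliation work this implies: rescaling pixel vertices when the displayed image differs in size from the uploaded one, and estimating text orientation from the first edge of the polygon. It assumes vertices arrive as dicts with x and y keys in reading order (top-left first), and that a zero coordinate may be missing from the dict; the helper names are hypothetical.

```python
import math


def to_xy(vertex):
    """Read one vertex dict, treating a missing coordinate as 0 (assumption)."""
    return (vertex.get("x", 0), vertex.get("y", 0))


def scale_vertices(vertices, src_size, dst_size):
    """Map pixel vertices from the uploaded image size to the displayed size."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [(x * sx, y * sy) for x, y in vertices]


def text_angle_degrees(vertices):
    """Angle of the edge from vertex 0 to vertex 1, in image coordinates
    (y grows downward), so 0 is upright text and 90 reads top to bottom."""
    (x0, y0), (x1, y1) = vertices[0], vertices[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


upright = [to_xy(v) for v in [{"y": 10}, {"x": 100, "y": 10},
                              {"x": 100, "y": 30}, {"x": 0, "y": 30}]]
sideways = [(100, 0), (100, 50), (80, 50), (80, 0)]

print(text_angle_degrees(upright))
print(text_angle_degrees(sideways))
print(scale_vertices(upright, src_size=(1000, 2000), dst_size=(500, 1000)))
```

If the angle you compute disagrees with how the image is displayed, EXIF rotation was probably applied on one side and not the other.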
Where space-ocr fits instead
A verification-first tool earns its place for receipts when one or more of these matters:
- You want structured fields and line items without writing them. Send a receipt, get total, date, store name, and a positioned line-item array back, with no parsing layer to build on top of raw OCR.
- You want to verify, not just trust. Every value returns with its on-page box and a match_ratio, and clicking a cell highlights exactly where it was read. A value below 0.85 flags itself as worth a second look.
- You process Japanese, Korean, or Chinese receipts and want zero setup. One engine runs CJK and Latin scripts with automatic language detection, so there is no language parameter to set. (Tesseract and Vision read CJK too, but you install or select the language and still build the field-structuring layer yourself.)
- You don't want to stand up storage. Results land in a sheet you can query server-side (GET /view) and export to CSV in one click, with no database.
- You want flat, predictable pricing and zero ops. A flat ¥10 per image (about $0.05), a free tier of 100 scans a month with no credit card, and a $39/month Pro plan, plus one HTTPS call instead of a server to run or a GCP project to wire up.
The whole call is one HTTP request. The engine takes raster images (JPEG, PNG, GIF, BMP, TIFF, WebP); the web app additionally converts PDF pages to images for you.
```bash
curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/receipt.jpg",
    "imageType": "url",
    "templateId": "receipt"
  }'
```

Each value comes back with a bbox ({ xmin, ymin, xmax, ymax } on a 0–1000 grid), four vertices for an oriented box that follows a tilted phone photo, a match_ratio, and a bbox_source. Because the grid is normalized to 0–1000 rather than pixels of the uploaded file, the same coordinates overlay correctly whether you display the receipt at thumbnail or full size. For the full coordinate model see an OCR API with bounding boxes; for getting positioned line items out of receipts and invoices, see extract line items from invoices.
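Drawing an overlay from the normalized grid is a single scaling step. The sketch below assumes one value shaped like the fields described above (a bbox with xmin, ymin, xmax, ymax plus a match_ratio); the surrounding response envelope is not shown in this article, so the sample dict and helper names are hypothetical.

```python
GRID = 1000              # normalized grid size described above
REVIEW_THRESHOLD = 0.85  # match_ratio below this is worth a second look


def grid_to_pixels(bbox, display_width, display_height):
    """Convert a 0-1000 grid bbox to a pixel rectangle at the displayed size."""
    return {
        "left": bbox["xmin"] * display_width / GRID,
        "top": bbox["ymin"] * display_height / GRID,
        "width": (bbox["xmax"] - bbox["xmin"]) * display_width / GRID,
        "height": (bbox["ymax"] - bbox["ymin"]) * display_height / GRID,
    }


def needs_review(value):
    """Flag a value whose character coverage falls under the threshold."""
    return value["match_ratio"] < REVIEW_THRESHOLD


# Hypothetical single value, shaped after the fields named in the text.
total = {
    "text": "2,045",
    "bbox": {"xmin": 100, "ymin": 250, "xmax": 300, "ymax": 300},
    "match_ratio": 0.8,
}

print(grid_to_pixels(total["bbox"], display_width=400, display_height=800))
print(grid_to_pixels(total["bbox"], display_width=200, display_height=400))
print(needs_review(total))
```

The same bbox yields a correct rectangle at both display sizes without knowing the dimensions of the uploaded file.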
Line items: the part raw OCR leaves to you
Receipts are mostly a list of rows — description, quantity, unit price — and that's exactly where text-plus-boxes stops being enough. Tesseract and Vision both hand you words with coordinates; reconstructing which words belong to which row, and which column, is your code. space-ocr models a line item as an array field whose children describe one row, and each cell keeps its own bounding box — so a wrapped description or a discount line stays individually traceable. Pull those into a sheet with /upload, query them server-side with GET /view (where, sort, select), and download CSV with the line items unfolded — none of which is a re-OCR and none of which is charged.
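For a sense of the row reconstruction that raw word boxes leave to you, here is a deliberately naive sketch that clusters words into rows by vertical centre. The tolerance value and the sample boxes are assumptions; it only works on a roughly upright image and would need deskewing, column detection, and wrapped-line handling before it could cope with a tilted phone photo.

```python
def group_into_rows(words, tolerance=0.5):
    """Cluster word boxes into rows by vertical centre, then order each row
    left to right. `tolerance` is a fraction of the word height (assumption)."""
    def centre_y(word):
        return word["top"] + word["height"] / 2

    rows = []
    for word in sorted(words, key=centre_y):
        cy = centre_y(word)
        if rows and abs(cy - rows[-1]["cy"]) <= tolerance * word["height"]:
            rows[-1]["words"].append(word)
        else:
            rows.append({"cy": cy, "words": [word]})
    return [
        [w["text"] for w in sorted(row["words"], key=lambda w: w["left"])]
        for row in rows
    ]


# Made-up pixel boxes for two line items with slight vertical jitter.
words = [
    {"text": "Coffee", "left": 40, "top": 100, "width": 120, "height": 30},
    {"text": "2", "left": 300, "top": 102, "width": 20, "height": 30},
    {"text": "4.00", "left": 400, "top": 98, "width": 70, "height": 30},
    {"text": "Bagel", "left": 40, "top": 150, "width": 100, "height": 30},
    {"text": "1", "left": 300, "top": 151, "width": 20, "height": 30},
    {"text": "3.50", "left": 400, "top": 149, "width": 70, "height": 30},
]

print(group_into_rows(words))
```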
How to try all three on the same receipt
- Pick one representative receipt. Choose a real receipt, ideally a slightly tilted phone photo with line items and, if relevant, CJK text, so the comparison reflects your actual inputs rather than a clean scan.
- Run Tesseract locally. Install Tesseract and the language data you need, then run it for hOCR or TSV output. Note the per-word boxes (pixels) and confidence (0–100), and how much code it takes to turn those words into total, date, and line items.
- Call Google Cloud Vision. Send the same image to DOCUMENT_TEXT_DETECTION. You'll get text, a Pages → Blocks → Paragraphs → Words → Symbols hierarchy with pixel boundingPolys, and a per-word recognition confidence in range [0, 1], but you still structure the fields yourself, or add Document AI.
- Call space-ocr with the receipt template. POST the same image to /ocr/fields with imageType 'url' or 'base64' and templateId 'receipt'. Each value comes back with a bbox, oriented vertices, a match_ratio, and a bbox_source: structured fields and line items, no parsing layer to write.
- Compare verification, not just text. Line up the three: pixel boxes you must reconcile to your display size vs. a 0–1000 normalized grid; recognition confidence vs. a character-coverage match_ratio; raw words vs. positioned line items you can query with GET /view and export to CSV.
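To line the three outputs up side by side, it helps to put the pixel boxes and confidences onto the same scales first. The sketch below converts a Tesseract box and a Vision polygon to a 0–1000 grid and rescales Tesseract's 0–100 confidence to 0–1. The function names are hypothetical, rounding to integers is an arbitrary choice, and you need the uploaded image's pixel dimensions to do this at all.

```python
GRID = 1000


def tesseract_to_grid(left, top, width, height, image_w, image_h):
    """Tesseract TSV box (pixels) to an axis-aligned box on the 0-1000 grid."""
    return {
        "xmin": round(left * GRID / image_w),
        "ymin": round(top * GRID / image_h),
        "xmax": round((left + width) * GRID / image_w),
        "ymax": round((top + height) * GRID / image_h),
    }


def vision_to_grid(vertices, image_w, image_h):
    """Vision polygon (pixel x, y pairs) to its axis-aligned hull on the grid."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return {
        "xmin": round(min(xs) * GRID / image_w),
        "ymin": round(min(ys) * GRID / image_h),
        "xmax": round(max(xs) * GRID / image_w),
        "ymax": round(max(ys) * GRID / image_h),
    }


def tesseract_conf_to_unit(conf):
    """Rescale Tesseract's 0-100 word confidence to the 0-1 range Vision uses."""
    return conf / 100


image_w, image_h = 1000, 2000  # pixel size of the uploaded receipt (made up)
print(tesseract_to_grid(360, 902, 100, 34, image_w, image_h))
print(vision_to_grid([(360, 902), (460, 902), (460, 936), (360, 936)], image_w, image_h))
print(tesseract_conf_to_unit(88.5))
```

Putting the numbers on one scale makes them easier to read together, but a rescaled recognition confidence is still a different signal from a character-coverage match_ratio.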
Run a verifiable receipt extraction on your own images
Free tier — 100 scans a month, no credit card, no server to run. Every value comes back with its on-page location and a match ratio.