space ocr

AI OCR that you don't have to take on faith

space-ocr uses an LLM to structure your documents, then validates every value against the real OCR symbols on the page and scores each one with a match_ratio.

AI OCR sounds like the answer to messy documents: hand a receipt or an invoice to a model and get clean, structured fields back. The trouble is what happens when the model is wrong. A language model will return a confident, well-formatted value whether or not it actually read it off the page, and most tools hand you that value with no way to tell the difference.

space-ocr takes a stricter line. An LLM does the structuring, but it doesn't get the final word. The model returns each value plus the word-token ids it thinks it used; the engine then character-matches that value against the symbols Google Vision actually detected on the page, locates it with a box, and scores how well it matched. So the AI is part of the pipeline, not the judge of it. You can check every value it produced.

See the AI's output, checked

Hover any field below — the box on the receipt is where that value was actually found on the page, not where the model claimed it was. Every value, box, and match score here is read straight from a real parsed result, not a mockup.

[Interactive demo: two receipts with extracted-field bounding boxes. Verified fields shown: KINSHO, 合計 (total) 2,045; ライフ, 合計 (total) 4,286.]

Each value with a box carries a verified on-page location — bbox + 4-point vertices + match_ratio — on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns. Hover a field to trace it back to the pixels it came from.
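
To draw a returned box on your own image, the 0–1000 coordinates have to be scaled back to pixels. Here is a minimal sketch of that conversion. The xmin/ymin/xmax/ymax names come from the description above; the dict shape and the to_pixels helper are assumptions made for illustration, not the documented response format.

```python
def to_pixels(bbox, width, height):
    """Scale a box on the 0-1000 normalized grid to pixel coordinates.

    bbox is assumed to be a dict with xmin/ymin/xmax/ymax keys (hypothetical shape).
    width and height are the pixel dimensions of the image you are drawing on.
    """
    return {
        "xmin": round(bbox["xmin"] * width / 1000),
        "ymin": round(bbox["ymin"] * height / 1000),
        "xmax": round(bbox["xmax"] * width / 1000),
        "ymax": round(bbox["ymax"] * height / 1000),
    }


# Made-up box on a 1200 x 1600 pixel image.
box = {"xmin": 250, "ymin": 100, "xmax": 750, "ymax": 150}
print(to_pixels(box, width=1200, height=1600))
# {'xmin': 300, 'ymin': 160, 'xmax': 900, 'ymax': 240}
```

Because the grid is normalized, the same box works on any resized copy of the image; only width and height change.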

AI output, validated against the page
The LLM returns each value and the word-token ids it used — never coordinates. CharMatcher runs first and matches that value, character by character, against the symbols Vision actually detected.
Every value located and scored
Each field comes back with a bounding box (xmin/ymin/xmax/ymax on a 0–1000 grid), four oriented vertices, and a match_ratio. 0.85 or higher is a confident match; 1.0 means every character was found.
Templates or auto-fields, no schema
Apply a built-in templateId like receipt or invoice, define your own fields, or set autoFields and let the model propose the schema. No schema to write for common documents.
Audit trail: original vs edited
When you correct a cell, the edit is stored beside the original OCR value rather than overwriting it — so what the AI read and what a human changed both stay on record.
Line items the model can't fake
Repeated-value columns are checked for column and row consistency instead of being trusted blindly, so a model that swaps two rows gets caught, not exported.
Languages on autopilot
Japanese, Korean, Chinese, and English in one engine, mixed scripts handled — no language hint to set. The model and the matcher both work across scripts.
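
The audit-trail behavior described above, where an edit sits beside the original OCR value instead of replacing it, can be pictured with a small data structure. This is an illustrative model only: the Cell class and its field names are hypothetical, not space-ocr's storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """One sheet cell that keeps the OCR reading and any human correction side by side."""

    ocr_value: str                      # what the pipeline read; never overwritten
    edited_value: Optional[str] = None  # human correction, if one was made

    def edit(self, new_value: str) -> None:
        # Store the correction next to the original instead of replacing it.
        self.edited_value = new_value

    @property
    def current(self) -> str:
        # The value to show or export: the edit if present, otherwise the OCR reading.
        return self.ocr_value if self.edited_value is None else self.edited_value


cell = Cell(ocr_value="1,280")   # made-up value for illustration
cell.edit("1,260")
print(cell.current, "| originally read as", cell.ocr_value)
```

Keeping both values means a reviewer can always see what was read from the page and what a person changed.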

How AI OCR works in space-ocr

Upload an image and an LLM reads the document into structured fields, returning each value with the word-token ids it used. Before the result reaches you, CharMatcher matches each value's characters against the symbols Google Vision detected on the page, producing the box, the oriented vertices, and the match_ratio. If the model supplied token ids, the engine looks up the corresponding Vision word boxes and can override a field's source to token_id. For repeated-value columns it relies on column clustering and row consistency instead, because a model's token hints can be wrong there.
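
To make the scoring idea concrete, here is a deliberately simplified sketch of a character-match score: the share of a value's characters that can be found among the symbols detected on the page. It is not the CharMatcher implementation. The real engine also locates the value and produces a box, while this toy version ignores position entirely. The function name and signature are assumptions.

```python
from collections import Counter


def match_ratio(value: str, page_symbols: list[str]) -> float:
    """Share of the value's non-space characters found among the detected symbols.

    Each detected symbol can be used once. Position on the page is ignored,
    which the real matcher would not do; this only illustrates the score.
    """
    expected = [ch for ch in value if not ch.isspace()]
    if not expected:
        return 0.0
    available = Counter(page_symbols)
    found = 0
    for ch in expected:
        if available[ch] > 0:
            available[ch] -= 1
            found += 1
    return found / len(expected)


# Pretend the OCR layer detected these symbols on the page.
symbols = list("合計2,045")

print(match_ratio("2,045", symbols))  # 1.0: every character was found
print(match_ratio("2,945", symbols))  # 0.8: the model's "9" is not on the page
```

In the second call the model returned a well-formatted value with one character that was never detected, so the score falls below the 0.85 mark and the value would be flagged.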

You don't have to write a schema. Pass a built-in templateId like receipt or invoice, define your own fields, or set autoFields and let the model suggest the structure. The web app rasterizes PDFs page by page first; the public API takes raster images directly (JPEG, PNG, GIF, BMP, TIFF, WebP).

structure a document, with every value checked
curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/receipt.jpg",
    "imageType": "url",
    "templateId": "receipt"
  }'
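
The same request can be made from Python with only the standard library. This sketch mirrors the curl call above: the endpoint, headers, and JSON keys are copied from it, while the helper names build_request and send are hypothetical. The shape of the response is not shown in this page, so the code returns the parsed JSON without assuming its layout.

```python
import json
import urllib.request

API_URL = "https://api.space-ocr.com/ocr/fields"


def build_request(image_url: str, template_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST request shown in the curl example, without sending it."""
    body = json.dumps({
        "image": image_url,
        "imageType": "url",
        "templateId": template_id,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (needs a real key and network access):
#   import os
#   req = build_request("https://example.com/receipt.jpg", "receipt",
#                       os.environ["SPACE_OCR_API_KEY"])
#   result = send(req)
```

Splitting the build and send steps keeps the request easy to inspect or log before any scan is spent.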

How to run AI OCR you can verify

  1. Send a document
    Send an image to /ocr/fields as a URL or as base64 data. In the app you can drop a PDF and each page is rasterized first; the public API takes raster images.
  2. Let the AI structure it
    Pass a built-in templateId, define your own fields, or set autoFields so the model proposes the schema. The LLM returns each value plus the word-token ids it used.
  3. Read the checked result
    Each value comes back with its bbox, vertices, match_ratio, and bbox_source, plus a field_bboxes map locating every field — the AI's output validated against the page.
  4. Verify the low scores
    Click a cell to highlight the exact region it was read from. A match_ratio below 0.85 marks a value worth a second look; your edit is stored beside the original OCR value.
  5. Export or query
    Download CSV (UTF-8 BOM, line items unfolded) or query a stored sheet with GET /view using where, sort, and select — no re-OCR, no extra charge.
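
Step 4 above can also be automated: once results are in hand, pick out the values that deserve a second look. The sketch below assumes each field is a dict with name, value, and match_ratio keys. That shape, the sample data, and the needs_review helper are assumptions for illustration, and the real response may be organized differently. Only the 0.85 threshold comes from this page.

```python
CONFIDENT = 0.85  # threshold stated on this page for a confident match


def needs_review(fields, threshold=CONFIDENT):
    """Return fields with a missing or below-threshold match_ratio, lowest score first."""
    flagged = [
        f for f in fields
        if f.get("match_ratio") is None or f["match_ratio"] < threshold
    ]
    # Treat a missing score as 0.0 so unlocated values sort to the top.
    return sorted(flagged, key=lambda f: f.get("match_ratio") or 0.0)


# Made-up fields for illustration only.
fields = [
    {"name": "total", "value": "2,045", "match_ratio": 1.0},
    {"name": "date", "value": "2024-01-05", "match_ratio": 0.8},
    {"name": "store", "value": "KINSHO", "match_ratio": None},
]

for field in needs_review(fields):
    print(field["name"], field["match_ratio"])
# store None
# date 0.8
```

Treating a missing score as the most urgent case is a design choice in this sketch: a value that could not be located at all is the one most worth checking by hand.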

Simple, predictable pricing

Pay $0.05 per image (¥10 / ₩100), with a free tier of 100 scans a month and no credit card required. Flat plans add monthly scans, more sheets, and storage.

Free: $0
  • 100 scans / month
  • 3 sheets
  • 1 GB storage
Starter: $19/mo
  • 400 scans / month
  • 10 sheets
  • 10 GB storage
Pro (most popular): $49/mo
  • 1,100 scans / month
  • Unlimited sheets
  • 100 GB storage
What makes this AI OCR different from a model that just returns JSON?
The LLM structures the document, but it never has the last word. It returns each value plus the word-token ids it used, and the engine then character-matches that value against the symbols Google Vision actually detected on the page. You get a box and a match_ratio for every value, so you can check the AI rather than trust it.
Does the AI return the coordinates?
No. The LLM returns the value and the word-token ids it used, not coordinates. CharMatcher runs first and produces the bounding box, oriented vertices, and match_ratio by matching characters against the detected symbols. Token ids are a secondary override, and for repeated-value columns the engine checks column and row consistency instead of trusting them.
How do I know whether to trust a given value?
Read its match_ratio. It is the share of expected characters that were located on the page, from 0.0 to 1.0. A value at 0.85 or higher is a confident match; 1.0 means every character was found. A value below 0.85 is flagged so you know to look closer. We don't quote an accuracy percentage — we give you a per-value score instead.
Can the AI propose the fields for me?
Yes. Set autoFields and the model suggests a schema for the document, or pass a built-in templateId like receipt, invoice, delivery, business_card, or driver_license, or define your own fields — including an array field with children for line items.
What happens to the original value when I fix the AI's output?
Your edit is stored beside the original OCR value, not on top of it. The AI's reading and the human correction both stay on record, so a sheet has an audit trail you can review. CSV export is UTF-8 BOM (Excel- and CJK-safe) and line items unfold into sub-rows.
How much does it cost?
$0.05 per image (¥10 / ₩100 per scan), with a free tier of 100 scans a month and no credit card. Flat plans (Starter and Pro) add monthly scans, more sheets, and storage — see the plans above.

Use AI on your documents without trusting it blindly

Free tier — 100 scans a month, no credit card. Every value the model produces comes back located and scored.
