An OCR API with source coordinates you can gate on
How to trust OCR output: every value carries its on-page source location (bbox + vertices) and a match_ratio you can gate on. Flag match_ratio < 0.85 for review, keep edits beside the original, and query stored results with GET /view — no re-OCR.
Most OCR APIs hand you a value and a recognition confidence: the model's self-reported certainty that it read the characters correctly. That number tells you how sure the model feels — it doesn't tell you where on the page the value came from, or let you cheaply re-check it later. For anything that touches money, compliance, or a downstream system of record, "trust the model" isn't an audit story.
This is a how-to-trust-your-OCR workflow. The idea behind an OCR API with source coordinates is simple: every extracted value should carry the exact spot on the page it was read from — a bounding box plus oriented vertices — and a match_ratio that says how much of the value was actually located on the page. With those two things you can gate: auto-accept the confident values, route the rest to a human, and prove after the fact where each number lives. This article is about the verification workflow, not the coordinate formats themselves — for the format landscape (pixels vs. normalized, polygons vs. quads) see an OCR API with bounding boxes.
Proof first: an extraction that points back to the page
Here's the thing to check before reading another word. Hover any field below — the box on the receipt is exactly where that value was read, and each value carries a match_ratio for how much of it was found on the page. This isn't a mockup; the boxes are drawn from the same bbox/vertices/match_ratio the API returns.

Every value carries a verified on-page location — bbox + 4-point vertices + match_ratio — on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns. Hover a field to trace it back to the pixels it came from.
Recognition confidence vs. a match ratio you can gate on
Google Cloud Vision, Amazon Textract, Tesseract, and Azure AI Document Intelligence all return geometry and a recognition confidence — the model's certainty it read the glyphs right. That's genuinely useful, but it's a different signal from "how much of this value did we actually find on the page."
space-ocr returns a match_ratio: the share of a value's characters that were located among the symbols the vision OCR actually detected. It's a coverage score, not a self-report. You can gate on it: treat match_ratio >= 0.85 as a confident match (the engine labels it vision_symbol_match), and send anything below that to a human. Paired with each value's bbox_source provenance label, you get a defensible answer to "where did this number come from, and how sure are we?"
How the coordinates are derived, and why they're checkable. The language model returns each field's text plus a hint of which word tokens it used, never the boxes. The engine's CharMatcher runs first: it matches that text, character by character, against the symbols the vision OCR actually detected on the page. The box lands on those real symbols, and match_ratio scores how much of the value was found (a field is treated as confidently matched at >= 0.85). The model's token hints are a secondary override. They can be noisy and sometimes swap between repeated rows, so column- and row-consistency checks validate them rather than trusting them blindly.
The bbox_source label tells you which path produced the box:
- vision_symbol_match: located by CharMatcher
- token_id / token_id_hybrid: token-hint override
- low_confidence: matched below 0.85
- shared_value: propagated from a merged cell
The point isn't that the model can't be wrong; it's that every value is checked back against the page with a score you can gate on.
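To make the idea of a coverage score concrete, here is a minimal sketch of one way such a ratio could be computed. It is not the engine's CharMatcher: the function names `coverage_ratio` and `label_for` are hypothetical, the greedy in-order matching is an assumption chosen for brevity, and the real engine also weighs token hints and merged cells, which this ignores.

```python
GATE = 0.85  # threshold taken from the article; everything else here is illustrative


def coverage_ratio(value, detected_symbols):
    """Share of the value's non-space characters found, in order,
    among the symbols the OCR detected (hypothetical greedy matcher)."""
    wanted = [c for c in value if not c.isspace()]
    if not wanted:
        return 0.0
    found, pos = 0, 0
    for ch in wanted:
        try:
            idx = detected_symbols.index(ch, pos)  # search only to the right
        except ValueError:
            continue  # this character was not located on the page
        found += 1
        pos = idx + 1
    return found / len(wanted)


def label_for(ratio):
    """Map a ratio to the two gate-related bbox_source labels."""
    return "vision_symbol_match" if ratio >= GATE else "low_confidence"
```

With this sketch, a value of "42.50" checked against a page where only "42.5" was detected scores 4 of 5 characters and would be labelled low_confidence, which is the behaviour the gate relies on.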
What comes back per value
Call POST /ocr/fields with one image and you get structured fields where each value carries:
- bbox: integer { xmin, ymin, xmax, ymax } on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right). To draw on the source image: pixel_x = bbox_x / 1000 * image_width.
- vertices: four ordered points (tl, tr, br, bl) for an oriented box that follows a tilted phone photo.
- match_ratio: 0–1 character coverage; >= 0.85 is a confident match.
- bbox_source: the provenance label (vision_symbol_match, token_id, token_id_hybrid, low_confidence, shared_value).
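The grid-to-pixel conversion can be sketched as a small helper. The function name `bbox_to_pixels` is hypothetical, and scaling y by the image height is an assumption that mirrors the x formula given above.

```python
GRID = 1000  # the normalized grid runs from 0 to 1000 on both axes


def bbox_to_pixels(bbox, image_width, image_height):
    """Convert a 0-1000 normalized bbox to integer pixel coordinates.

    Multiplying before dividing gives the same result as
    bbox_x / 1000 * image_width while avoiding small float drift.
    """
    return (
        round(bbox["xmin"] * image_width / GRID),
        round(bbox["ymin"] * image_height / GRID),
        round(bbox["xmax"] * image_width / GRID),
        round(bbox["ymax"] * image_height / GRID),
    )


# Example: a box on a 2000 x 3000 pixel photo
box = bbox_to_pixels({"xmin": 100, "ymin": 250, "xmax": 500, "ymax": 300}, 2000, 3000)
```

The returned tuple is in (left, top, right, bottom) order, which most drawing libraries accept for rectangles. The same scaling should apply to each point in vertices, since they share the grid.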
The request is one HTTPS call with a Bearer key — no SDK, and the engine takes raster images (JPEG, PNG, GIF, BMP, TIFF, WebP).
curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/receipt.jpg",
    "imageType": "url",
    "templateId": "receipt"
  }'

Gate on match_ratio: flag low-confidence values for review
The verification workflow lives in one rule: auto-accept the confident values, route the rest to a human. Walk the returned fields, and anything with match_ratio < 0.85 (or a bbox_source of low_confidence) goes into a review queue alongside its on-page box so a reviewer can see exactly which characters were and weren't found.
import json
import os
import urllib.request

API = "https://api.space-ocr.com"
KEY = os.environ["SPACE_OCR_API_KEY"]
GATE = 0.85  # values below this go to human review


def ocr_fields(image_url):
    """POST one image URL to /ocr/fields and return the extracted fields."""
    body = json.dumps({
        "image": image_url,
        "imageType": "url",
        "templateId": "receipt",
    }).encode()
    req = urllib.request.Request(
        f"{API}/ocr/fields", data=body,  # passing data makes this a POST
        headers={
            "Authorization": f"Bearer {KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)["data"]


def flag_for_review(data):
    """Return (auto_accept, needs_review) split on match_ratio and bbox_source."""
    auto, review = [], []
    for name, v in data.items():
        # skip entries that are not extracted values with a score
        if not isinstance(v, dict) or "match_ratio" not in v:
            continue
        row = {
            "field": name,
            "value": v.get("value"),
            "match_ratio": v["match_ratio"],
            "bbox_source": v.get("bbox_source"),
            "bbox": v.get("bbox"),  # where on the page to highlight
        }
        confident = (v["match_ratio"] >= GATE
                     and v.get("bbox_source") != "low_confidence")
        (auto if confident else review).append(row)
    return auto, review


data = ocr_fields("https://example.com/receipt.jpg")
_, needs_review = flag_for_review(data)
for r in needs_review:
    print(f"REVIEW {r['field']}={r['value']!r} "
          f"match_ratio={r['match_ratio']:.2f} "
          f"source={r['bbox_source']} bbox={r['bbox']}")

Because every flagged value carries its bbox, the review tool doesn't need to re-OCR anything. It just highlights the region on the original image. That's the same interaction the demo above shows: click a cell, and the source region lights up. A reviewer fixes the value in place, and the correction is stored beside the original OCR value rather than overwriting it, so you keep a full audit trail of what the engine read versus what a human accepted. For the provenance/audit story end to end, see building an OCR audit trail; for hands-on box-level validation, see how to validate OCR with bounding boxes.
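Storing a correction beside the original could look something like the record below. This is a sketch under assumptions, not the product's storage schema: the class `ReviewedValue` and all of its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ReviewedValue:
    """One extracted value plus an optional human correction (hypothetical schema)."""
    field_name: str
    ocr_value: str            # what the engine read; never overwritten
    match_ratio: float
    bbox: dict                # on-page location for highlighting
    corrected_value: Optional[str] = None
    reviewer: Optional[str] = None
    reviewed_at: Optional[str] = None

    def correct(self, new_value, reviewer):
        """Record a correction without touching the original OCR value."""
        self.corrected_value = new_value
        self.reviewer = reviewer
        self.reviewed_at = datetime.now(timezone.utc).isoformat()

    @property
    def accepted_value(self):
        """The human correction if there is one, else the OCR value."""
        if self.corrected_value is not None:
            return self.corrected_value
        return self.ocr_value


rec = ReviewedValue("total", "42.5O", 0.80,
                    {"xmin": 700, "ymin": 880, "xmax": 790, "ymax": 905})
rec.correct("42.50", reviewer="alice")
```

Keeping ocr_value immutable and layering the correction on top means an auditor can always see both what the engine read and what a person accepted, and when.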
Query stored results server-side — without re-OCR
Once documents are processed into a sheet (via POST /upload), you don't re-run OCR to audit them. GET /view queries the stored results server-side — where, sort, select, limit, offset, and boxes — at no charge. That makes "show me every row where the match was weak" or "pull the high-value invoices" a single read, not a re-extraction.
The where filter accepts repeated clauses (AND'd together) with operators = != > >= < <= ~ (~ is contains), matching either a column or ocrStatus. Set boxes=1 to keep each cell's vertices/field_bboxes in the response so you can highlight straight from the query.
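For callers working in Python, the query string for GET /view could be assembled along these lines. The helper `build_view_url` is hypothetical; the parameter names come from the description above, and the `total>=1000` clause is only an illustrative filter on a column.

```python
from urllib.parse import urlencode

API = "https://api.space-ocr.com"


def build_view_url(path, where=(), sort=None, select=None,
                   boxes=False, limit=None, offset=None):
    """Build a GET /view URL; each where clause becomes its own parameter."""
    params = [("path", path)]
    params += [("where", clause) for clause in where]  # repeated clauses are AND'd
    if sort:
        params.append(("sort", sort))
    if select:
        params.append(("select", ",".join(select)))
    if boxes:
        params.append(("boxes", "1"))
    if limit is not None:
        params.append(("limit", str(limit)))
    if offset is not None:
        params.append(("offset", str(offset)))
    return f"{API}/view?" + urlencode(params)  # urlencode percent-escapes values


url = build_view_url(
    "/invoices/2026-06",
    where=["ocrStatus~low", "total>=1000"],
    sort="-invoice_date",
    select=["vendor", "total", "invoice_date"],
    boxes=True,
    limit=50,
)
```

Passing a list of tuples to urlencode, rather than a dict, is what allows the where key to repeat. The resulting URL still needs the same Authorization: Bearer header as the other calls.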
# rows the engine matched weakly, newest first, with boxes for highlighting
curl -s -G https://api.space-ocr.com/view \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  --data-urlencode "path=/invoices/2026-06" \
  --data-urlencode "where=ocrStatus~low" \
  --data-urlencode "sort=-invoice_date" \
  --data-urlencode "select=vendor,total,invoice_date" \
  --data-urlencode "boxes=1" \
  --data-urlencode "limit=50"

How to build a verify-before-trust OCR pipeline
- Extract with source coordinates: POST the image to /ocr/fields with imageType 'url' or 'base64'. Each value comes back with a bbox, oriented vertices, a match_ratio, and a bbox_source. Language is detected automatically.
- Gate on match_ratio: Auto-accept values with match_ratio at or above 0.85 (bbox_source vision_symbol_match); route anything below, or labelled low_confidence, into a human review queue, carrying its bbox along.
- Verify by highlighting, not re-OCR: Draw each flagged value's bbox on the original image so a reviewer sees exactly which characters were and weren't found. In the app, clicking a cell lights up its source region for you.
- Store edits beside the original: When a reviewer corrects a value, keep the correction next to the original OCR value rather than overwriting it. You retain a full audit trail of what the engine read versus what a human accepted.
- Query the stored sheet server-side: Push images into a sheet with /upload, then audit with GET /view (for example, where=ocrStatus~low with boxes=1) to pull weakly-matched rows for review with no re-OCR and no charge.
What are source coordinates in an OCR API?
How is match_ratio different from a recognition confidence?
How do I flag low-confidence values for human review?
Can I query stored OCR results without re-running OCR?
What does bbox_source tell me?
Extract values you can actually verify
Free tier — 100 scans a month, no credit card. Every value comes back with its on-page source location and a match_ratio you can gate on.