An OCR API with source coordinates you can gate on

How to trust OCR output: every value carries its on-page source location (bbox + vertices) and a match_ratio you can gate on. Flag match_ratio < 0.85 for review, keep edits beside the original, and query stored results with GET /view — no re-OCR.

8 min read · 2026-06-25

Most OCR APIs hand you a value and a recognition confidence: the model's self-reported certainty that it read the characters correctly. That number tells you how sure the model feels — it doesn't tell you where on the page the value came from, or let you cheaply re-check it later. For anything that touches money, compliance, or a downstream system of record, "trust the model" isn't an audit story.

This is a how-to-trust-your-OCR workflow. The idea behind an OCR API with source coordinates is simple: every extracted value should carry the exact spot on the page it was read from — a bounding box plus oriented vertices — and a match_ratio that says how much of the value was actually located on the page. With those two things you can gate: auto-accept the confident values, route the rest to a human, and prove after the fact where each number lives. This article is about the verification workflow, not the coordinate formats themselves — for the format landscape (pixels vs. normalized, polygons vs. quads) see an OCR API with bounding boxes.

Proof first: an extraction that points back to the page

Here's the thing to check before reading another word. Hover any field below — the box on the receipt is exactly where that value was read, and each value carries a match_ratio for how much of it was found on the page. This isn't a mockup; the boxes are drawn from the same bbox/vertices/match_ratio the API returns.

[Interactive demo: source receipts with extracted-field bounding boxes. Verified fields shown: KINSHO, 合計 (total) 2,045; ライフ, 合計 (total) 4,286.]

Every value carries a verified on-page location — bbox + 4-point vertices + match_ratio — on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns. Hover a field to trace it back to the pixels it came from.

Every value carries its own source location — a bbox, oriented vertices, and a match_ratio — so an extraction is something you can check, not just trust.

Recognition confidence vs. a match ratio you can gate on

Google Cloud Vision, Amazon Textract, Tesseract, and Azure AI Document Intelligence all return geometry and a recognition confidence — the model's certainty it read the glyphs right. That's genuinely useful, but it's a different signal from "how much of this value did we actually find on the page."

space-ocr returns a match_ratio: the share of a value's characters that were located among the symbols the vision OCR actually detected. It's a coverage score, not a self-report. You can gate on it: treat match_ratio >= 0.85 as a confident match (the engine labels it vision_symbol_match), and send anything below that to a human. Paired with each value's bbox_source provenance label, you get a defensible answer to "where did this number come from, and how sure are we?"


How the coordinates are derived, and why they're checkable. The language model returns each field's text plus a hint of which word tokens it used, but never the boxes. The engine's CharMatcher runs first: it matches that text, character by character, against the symbols the vision OCR actually detected on the page. The box lands on those real symbols, and match_ratio scores how much of the value was found (a field is treated as confidently matched at >= 0.85). The model's token hints are a secondary override. They can be noisy and sometimes swap between repeated rows, so column- and row-consistency checks validate them rather than trusting them blindly. The bbox_source label tells you which path produced the box: vision_symbol_match (CharMatcher), token_id / token_id_hybrid (token-hint override), low_confidence (matched below 0.85), or shared_value (propagated from a merged cell). The point isn't that the model can't be wrong; it's that every value is checked back against the page with a score you can gate on.
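To make the idea of a coverage score concrete, here is a toy sketch in Python. It is not the engine's CharMatcher, whose matching logic isn't spelled out in this article. `coverage_ratio` is a hypothetical helper that only counts how many characters of a value can be found among a list of detected symbols, ignoring position and order.

```python
from collections import Counter


def coverage_ratio(value, detected_symbols):
    """Toy coverage score: share of `value`'s characters found among the symbols.

    Hypothetical illustration only. The real matcher also places the box on
    the matched symbols; this sketch ignores position and order entirely.
    """
    if not value:
        return 0.0
    available = Counter(detected_symbols)  # each detected symbol is usable once
    found = 0
    for ch in value:
        if available[ch] > 0:
            available[ch] -= 1
            found += 1
    return found / len(value)


# The vision OCR saw the letter O where the value has the digit 0,
# so only four of the five characters are covered.
ratio = coverage_ratio("2,045", list("合計2,O45"))
print(f"{ratio:.2f}", "confident" if ratio >= 0.85 else "needs review")
```

A value the model produced but the page doesn't support scores low on a measure like this regardless of how certain the model was, which is what makes a coverage score worth gating on.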

What comes back per value

Call POST /ocr/fields with one image and you get structured fields where each value carries:

  • bbox — integer { xmin, ymin, xmax, ymax } on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right). To draw on the source image: pixel_x = bbox_x / 1000 * image_width.
  • vertices — four ordered points (tl, tr, br, bl) for an oriented box that follows a tilted phone photo.
  • match_ratio — 0–1 character coverage; >= 0.85 is a confident match.
  • bbox_source — the provenance label (vision_symbol_match, token_id, token_id_hybrid, low_confidence, shared_value).
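The grid-to-pixel conversion can be sketched in a few lines of Python. The x formula is the one given above; scaling y by the image height is an assumption made by symmetry, and `bbox_to_pixels` is a hypothetical helper name, not part of the API.

```python
def bbox_to_pixels(bbox, image_width, image_height):
    """Scale a 0-1000 normalized bbox to integer pixel coordinates.

    Returns (left, top, right, bottom). Assumes y values scale by
    image_height the same way x values scale by image_width.
    """
    def scale(v, size):
        return round(v / 1000 * size)

    return (
        scale(bbox["xmin"], image_width),
        scale(bbox["ymin"], image_height),
        scale(bbox["xmax"], image_width),
        scale(bbox["ymax"], image_height),
    )


# Example box on a 2000 x 3000 px photo (values are made up for illustration).
box = {"xmin": 100, "ymin": 200, "xmax": 500, "ymax": 250}
print(bbox_to_pixels(box, image_width=2000, image_height=3000))
# -> (200, 600, 1000, 750)
```

The resulting tuple is in the left/top/right/bottom order that most drawing libraries expect for a rectangle.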

The request is one HTTPS call with a Bearer key — no SDK, and the engine takes raster images (JPEG, PNG, GIF, BMP, TIFF, WebP).

extract fields — every value comes back with source coordinates
curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/receipt.jpg",
    "imageType": "url",
    "templateId": "receipt"
  }'

Gate on match_ratio: flag low-confidence values for review

The verification workflow lives in one rule: auto-accept the confident values, route the rest to a human. Walk the returned fields, and anything with match_ratio < 0.85 (or a bbox_source of low_confidence) goes into a review queue alongside its on-page box so a reviewer can see exactly which characters were and weren't found.

flag_low_confidence.py
import json
import os
import urllib.request

API = "https://api.space-ocr.com"
KEY = os.environ["SPACE_OCR_API_KEY"]
GATE = 0.85  # values below this go to human review


def ocr_fields(image_url):
    body = json.dumps({
        "image": image_url,
        "imageType": "url",
        "templateId": "receipt",
    }).encode()
    req = urllib.request.Request(
        f"{API}/ocr/fields", data=body,
        headers={
            "Authorization": f"Bearer {KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)["data"]


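def flag_for_review(data):
    """Split fields into (auto_accept, needs_review).

    A field is auto-accepted only if match_ratio >= GATE and the engine
    did not label its box low_confidence.
    """
    auto, review = [], []
    for name, v in data.items():
        # Skip entries that are not per-value dicts carrying a match_ratio.
        if not isinstance(v, dict) or "match_ratio" not in v:
            continue
        row = {
            "field": name,
            "value": v.get("value"),
            "match_ratio": v["match_ratio"],
            "bbox_source": v.get("bbox_source"),
            "bbox": v.get("bbox"),  # where on the page to highlight
        }
        confident = (v["match_ratio"] >= GATE
                     and v.get("bbox_source") != "low_confidence")
        (auto if confident else review).append(row)
    return auto, review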


data = ocr_fields("https://example.com/receipt.jpg")
_, needs_review = flag_for_review(data)
for r in needs_review:
    print(f"REVIEW {r['field']}={r['value']!r} "
          f"match_ratio={r['match_ratio']:.2f} "
          f"source={r['bbox_source']} bbox={r['bbox']}")

Because every flagged value carries its bbox, the review tool doesn't need to re-OCR anything — it just highlights the region on the original image. That's the same interaction the demo above shows: click a cell, the source region lights up. A reviewer fixes the value in place, and the correction is stored beside the original OCR value rather than overwriting it — so you keep a full audit trail of what the engine read versus what a human accepted. For the provenance/audit story end to end, see building an OCR audit trail; for hands-on box-level validation, see how to validate OCR with bounding boxes.
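One way to keep a correction beside the original in your own review tool is a small record type. This is a minimal sketch, not the product's storage schema: `ReviewedValue` and all of its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ReviewedValue:
    """One extracted value plus an optional human correction (hypothetical schema)."""
    field_name: str
    ocr_value: str                      # what the engine read; never modified
    match_ratio: float
    bbox: dict                          # 0-1000 normalized box, kept for highlighting
    corrected_value: Optional[str] = None
    corrected_by: Optional[str] = None
    corrected_at: Optional[datetime] = None

    def correct(self, new_value, reviewer):
        """Record a correction next to the original instead of overwriting it."""
        self.corrected_value = new_value
        self.corrected_by = reviewer
        self.corrected_at = datetime.now(timezone.utc)

    @property
    def accepted_value(self):
        """The value downstream systems should use."""
        if self.corrected_value is None:
            return self.ocr_value
        return self.corrected_value


# Illustrative data: the engine read a letter O where the receipt shows a zero.
total = ReviewedValue("total", "2,O45", 0.80,
                      {"xmin": 610, "ymin": 720, "xmax": 760, "ymax": 750})
total.correct("2,045", reviewer="alice")
print(total.ocr_value, "->", total.accepted_value)
```

Because `ocr_value` is never reassigned, an audit can always compare what the engine read with what the reviewer accepted, and when.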

Click any value and its source region lights up on the original — verification with no re-OCR, because the box travelled with the value.

Query stored results server-side — without re-OCR

Once documents are processed into a sheet (via POST /upload), you don't re-run OCR to audit them. GET /view queries the stored results server-side — where, sort, select, limit, offset, and boxes — at no charge. That makes "show me every row where the match was weak" or "pull the high-value invoices" a single read, not a re-extraction.

The where filter accepts repeated clauses (AND'd together) with operators = != > >= < <= ~ (~ is contains), matching either a column or ocrStatus. Set boxes=1 to keep each cell's vertices/field_bboxes in the response so you can highlight straight from the query.

pull low-confidence rows from a stored sheet — no re-OCR, no charge
# rows the engine matched weakly, newest first, with boxes for highlighting
curl -s -G https://api.space-ocr.com/view \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  --data-urlencode "path=/invoices/2026-06" \
  --data-urlencode "where=ocrStatus~low" \
  --data-urlencode "sort=-invoice_date" \
  --data-urlencode "select=vendor,total,invoice_date" \
  --data-urlencode "boxes=1" \
  --data-urlencode "limit=50"
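The same query can be assembled in Python with the standard library. This sketch rests on two assumptions: that "repeated clauses" means repeating the `where` query parameter, and that a numeric comparison is written like `total>=1000`. `build_view_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API = "https://api.space-ocr.com"


def build_view_url(path, where=(), sort=None, select=None,
                   boxes=False, limit=None, offset=None):
    """Build a GET /view URL; each where clause becomes its own `where` parameter."""
    params = [("path", path)]
    # Assumption: repeated `where` parameters are AND'd together by the server.
    params += [("where", clause) for clause in where]
    if sort:
        params.append(("sort", sort))
    if select:
        params.append(("select", ",".join(select)))
    if boxes:
        params.append(("boxes", "1"))
    if limit is not None:
        params.append(("limit", str(limit)))
    if offset is not None:
        params.append(("offset", str(offset)))
    return f"{API}/view?{urlencode(params)}"


url = build_view_url(
    "/invoices/2026-06",
    where=["ocrStatus~low", "total>=1000"],  # second clause is illustrative
    sort="-invoice_date",
    select=["vendor", "total", "invoice_date"],
    boxes=True,
    limit=50,
)
print(url)
# Round-trip check: both clauses survive URL encoding.
print(parse_qs(urlsplit(url).query)["where"])
```

Send the resulting URL with the same `Authorization: Bearer` header as the curl call. Letting `urlencode` do the escaping keeps operators such as `>=` intact.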

How to build a verify-before-trust OCR pipeline

  1. Extract with source coordinates
    POST the image to /ocr/fields with imageType 'url' or 'base64'. Each value comes back with a bbox, oriented vertices, a match_ratio, and a bbox_source — language is detected automatically.
  2. Gate on match_ratio
    Auto-accept values with match_ratio at or above 0.85 (bbox_source vision_symbol_match); route anything below — or labelled low_confidence — into a human review queue, carrying its bbox along.
  3. Verify by highlighting, not re-OCR
    Draw each flagged value's bbox on the original image so a reviewer sees exactly which characters were and weren't found. In the app, clicking a cell lights up its source region for you.
  4. Store edits beside the original
    When a reviewer corrects a value, keep the correction next to the original OCR value rather than overwriting it. You retain a full audit trail of what the engine read versus what a human accepted.
  5. Query the stored sheet server-side
    Push images into a sheet with /upload, then audit with GET /view — for example where=ocrStatus~low with boxes=1 — to pull weakly-matched rows for review with no re-OCR and no charge.

What are source coordinates in an OCR API?
Source coordinates are the exact location on the page that an extracted value was read from. space-ocr returns, per value, a bbox of integer xmin/ymin/xmax/ymax on a 0–1000 normalized grid plus four oriented vertices (top-left, top-right, bottom-right, bottom-left). To draw on the source image you convert with pixel_x = bbox_x / 1000 * image_width. They let you point any value back to the precise region it came from, for highlighting or audit.

How is match_ratio different from a recognition confidence?
A recognition confidence is the model's self-reported certainty that it read the characters correctly. Google Cloud Vision, Amazon Textract, Tesseract, and Azure AI Document Intelligence all document one. match_ratio is a coverage score: the share of the value's characters that were actually located among the symbols the vision OCR detected on the page. Because it measures how much of the value was found on the page rather than how confident the model feels, it's a signal you can gate on: treat 0.85 and above as a confident match and send the rest to a human.

How do I flag low-confidence values for human review?
Walk the returned fields and route anything with match_ratio below 0.85, or a bbox_source of low_confidence, into a review queue. Each flagged value carries its own bbox, so the reviewer sees exactly which region to check without re-running OCR. The Python snippet in this article splits the fields into auto-accept and needs-review on that 0.85 gate.

Can I query stored OCR results without re-running OCR?
Yes. After documents are processed into a sheet, GET /view queries the stored results server-side (where, sort, select, limit, offset, and boxes) with no re-OCR and no charge. You can pull, for example, every row where ocrStatus matches 'low', sorted newest first, with boxes=1 to keep each cell's coordinates for highlighting. The where filter supports = != > >= < <= and ~ (contains).

What does bbox_source tell me?
bbox_source is a provenance label for how each box was derived. vision_symbol_match means CharMatcher located the value's characters among the detected symbols (the usual confident path, match_ratio at or above 0.85). token_id and token_id_hybrid mean the box came from the model's word-token hint as a secondary override. low_confidence means the character match landed below 0.85. shared_value means the box was propagated from a merged cell. It lets you audit not just where a value is, but how its location was established.

Extract values you can actually verify

Free tier — 100 scans a month, no credit card. Every value comes back with its on-page source location and a match_ratio you can gate on.
