Validate OCR output with bounding boxes
A confidence score tells you a model is unsure; a bounding box tells you where to look. Here's how to use per-field boxes and match ratios to validate extracted data instead of trusting it blind.
Validating OCR usually means re-reading the document yourself — slow, and exactly the work you were trying to avoid. Bounding boxes change the loop: instead of comparing two columns of text, you let your eye jump to the spot on the image where each value was read from. Right or wrong becomes obvious in a glance.

Every value carries a verified on-page location (bbox, 4-point vertices, and match_ratio) on a 0–1000 normalized grid, where (0,0) is the top-left corner and (1000,1000) is the bottom-right. This is the same shape the live API returns. Hover a field to trace it back to the pixels it came from.
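To draw a box on the source image, the normalized coordinates have to be scaled to the image's pixel size. The sketch below shows that arithmetic. It assumes the bbox is ordered as [x_min, y_min, x_max, y_max] and that vertices are (x, y) pairs; neither ordering is confirmed here, so check it against a real response first. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

GRID = 1000  # size of the normalized grid described above


def bbox_to_pixels(
    bbox: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates.

    Assumption: bbox is ordered [x_min, y_min, x_max, y_max].
    """
    x_min, y_min, x_max, y_max = bbox
    return (
        round(x_min * width / GRID),
        round(y_min * height / GRID),
        round(x_max * width / GRID),
        round(y_max * height / GRID),
    )


def vertices_to_pixels(
    vertices: Sequence[Sequence[float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Scale the four (x, y) corner points of an oriented box to pixels."""
    return [
        (round(x * width / GRID), round(y * height / GRID)) for x, y in vertices
    ]
```

For a 2000 × 3000 pixel scan, a normalized box of [100, 200, 500, 250] maps to pixels (200, 600, 1000, 750).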
Two signals that travel with every value
space-ocr returns more than text on each field:
- A bounding box + oriented vertices: where the value was read. If the box sits on the wrong part of the page, the value is suspect even if it looks plausible.
- A match_ratio (0–1): the fraction of the value's characters that were actually located on the page. The engine treats ≥ 0.85 as a confident match; lower values are your queue of fields to eyeball.
Together they turn validation into triage: sort by match ratio, look only at the low ones, and confirm each by its box.
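As a minimal sketch of that triage, the snippet below filters a list of fields to those under the 0.85 threshold and orders them lowest first. The field dictionaries and their keys (name, value, match_ratio) are an assumed shape for illustration, not the documented response format. A field with no match_ratio is treated as 0.0 so that it lands at the front of the queue.

```python
from typing import Any, Dict, List

CONFIDENT_MATCH = 0.85  # threshold the engine treats as a confident match


def triage(
    fields: List[Dict[str, Any]], threshold: float = CONFIDENT_MATCH
) -> List[Dict[str, Any]]:
    """Return fields below the threshold, least certain first."""

    def ratio(field: Dict[str, Any]) -> float:
        # Assumption: a missing match_ratio means "not located", so use 0.0.
        return float(field.get("match_ratio", 0.0))

    flagged = [f for f in fields if ratio(f) < threshold]
    return sorted(flagged, key=ratio)


# Hypothetical extracted fields, for illustration only.
fields = [
    {"name": "invoice_number", "value": "INV-0042", "match_ratio": 1.0},
    {"name": "total", "value": "1,204.50", "match_ratio": 0.62},
    {"name": "due_date", "value": "2024-03-01", "match_ratio": 0.80},
]

review_queue = triage(fields)
for field in review_queue:
    print(field["name"], field["match_ratio"])
```

Here only total and due_date reach the review queue, in that order; invoice_number is skipped because its ratio is above the threshold.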
Why the box can be trusted
The boxes don't come from the language model. The model returns the value and the word-token IDs it used; the engine resolves those tokens to regions the vision OCR actually detected. So a box is evidence the text exists on the page — not a guess drawn around where it should be. For the full reasoning, see Document OCR with an audit trail.
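One way to picture the token-to-region step is as a lookup followed by a union of rectangles. The sketch below is an illustration of the idea only, not the engine's actual implementation: the token table, the integer token IDs, and the [x_min, y_min, x_max, y_max] box ordering are all assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed order: x_min, y_min, x_max, y_max


def union_box(token_boxes: Dict[int, Box], token_ids: Iterable[int]) -> Optional[Box]:
    """Merge the boxes of the referenced tokens into one enclosing box.

    Returns None when none of the IDs were detected on the page, which is
    the case where no box should be drawn at all.
    """
    boxes = [token_boxes[t] for t in token_ids if t in token_boxes]
    if not boxes:
        return None
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# Hypothetical tokens detected by the vision OCR, keyed by token ID.
detected = {
    17: (120, 80, 180, 100),
    18: (185, 80, 260, 102),
}

located = union_box(detected, [17, 18])   # both tokens exist on the page
missing = union_box(detected, [99])       # token was never detected
```

The point of the design is visible in the second call: a value whose tokens were never detected gets no box, rather than a plausible-looking rectangle.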
- Pull the result with boxes. Call GET /view (boxes are included by default) or POST /ocr/fields; every value carries a bbox and match_ratio.
- Sort by match ratio. Surface fields below 0.85 first; those are the values most worth a human glance.
- Confirm by location. Click or hover a flagged value to see the exact region it was read from on the source image.
- Override and move on. Correct any miss inline; the original OCR value is kept for the record.
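If you mirror the last step in your own review tooling, the key property is that a correction never overwrites what the OCR read. A minimal sketch, assuming a hypothetical ReviewedField record of your own rather than anything returned by the API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedField:
    """A field under review; hypothetical structure for illustration."""

    name: str
    ocr_value: str                      # what the OCR read; never modified
    match_ratio: float
    corrected_value: Optional[str] = None

    @property
    def value(self) -> str:
        """The value to use downstream: the correction if one exists."""
        if self.corrected_value is not None:
            return self.corrected_value
        return self.ocr_value

    def override(self, new_value: str) -> None:
        """Record a human correction alongside the original OCR value."""
        self.corrected_value = new_value


# Hypothetical example: the OCR read a letter O where a zero belongs.
total = ReviewedField(name="total", ocr_value="1O4.50", match_ratio=0.6)
total.override("104.50")
```

After the override, total.value yields the corrected string while total.ocr_value still holds the original reading for the audit record.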