PDF OCR that turns documents into data you can check
Extract structured data from PDFs and scans with space-ocr: line items, built-in templates, CSV/JSON export, and every value returned with its on-page location and a match score.
PDFs are where data goes to hide. An invoice, a stack of receipts, a delivery note — the numbers are right there on the page, but getting them into a spreadsheet usually means retyping. PDF OCR promises to fix that: read the document, get structured fields back. The catch is that most tools stop at a plausible guess and leave you to trust it.
space-ocr answers a stricter question. It turns a PDF into structured rows, and it returns every value with the exact spot on the page it was read from — a box you can see, plus a score for how well it matched. So you don't have to take the extraction on faith; you can check it.
See a real extraction you can check
Hover any field below — the box on the receipt is where that value was read. Every number, box, and match score here is read straight from a real parsed result, not a mockup.

Each value with a box carries a verified on-page location — bbox + 4-point vertices + match_ratio — on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns. Hover a field to trace it back to the pixels it came from.
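To draw one of these boxes on your own page image, the normalized coordinates have to be scaled to pixels. The sketch below shows that arithmetic. It assumes the bbox arrives as [x_min, y_min, x_max, y_max], which this page does not spell out, so check a real response before relying on the field order. The names `grid_to_pixels` and `PixelBox` are illustrative helpers, not part of space-ocr.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


GRID = 1000  # normalized grid described above: (0,0) top-left, (1000,1000) bottom-right


def grid_to_pixels(bbox, image_width, image_height):
    """Scale a normalized box to pixel coordinates.

    Assumption: bbox is [x_min, y_min, x_max, y_max]. The real field
    order may differ, so confirm it against an actual API response.
    """
    x_min, y_min, x_max, y_max = bbox
    return PixelBox(
        left=round(x_min * image_width / GRID),
        top=round(y_min * image_height / GRID),
        right=round(x_max * image_width / GRID),
        bottom=round(y_max * image_height / GRID),
    )


# Hypothetical box on a 1200 x 1600 px page image.
print(grid_to_pixels([250, 100, 750, 150], 1200, 1600))
# PixelBox(left=300, top=160, right=900, bottom=240)
```

Because the grid is independent of resolution, the same box can be redrawn on a thumbnail or a full-size render by changing only the width and height.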
How PDF OCR works in space-ocr
Drop a PDF into the app and each page is rendered to an image, then read and turned into structured fields — a multi-page PDF becomes a set of rows you can sort, filter, and export. If you're calling the API directly, send the page images (the public API takes raster images — JPEG, PNG, GIF, BMP, TIFF, WebP), and you get the same structured result back.
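If your page images are local files rather than URLs, the request body has to carry the image itself. Here is a minimal sketch of building that body. The `image`, `imageType`, and `templateId` keys come from the curl example on this page; the value "base64" for `imageType` and the use of raw base64 text (rather than a data URI) are assumptions, and `build_request_body` is a hypothetical helper.

```python
import base64
import json
from pathlib import Path

# Raster formats this page lists for the public API.
ACCEPTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".webp"}


def build_request_body(image_bytes, filename, template_id):
    """Return a JSON request body for one page image.

    Assumption: base64 images are sent with imageType "base64" and the
    raw base64 text in "image". Only the "url" form is shown on this page.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in ACCEPTED_SUFFIXES:
        # A PDF lands here: render its pages to images before calling the API.
        raise ValueError(f"{filename}: not one of the accepted raster formats")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(
        {"image": encoded, "imageType": "base64", "templateId": template_id}
    )


# Placeholder bytes stand in for a real PNG file's contents.
body = build_request_body(b"placeholder image bytes", "invoice-page-1.png", "invoice")
print(json.loads(body)["templateId"])  # invoice
```

Rejecting unsupported extensions before the call is a design choice in this sketch; it catches a raw PDF early instead of spending a request on it.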
You don't have to write a schema for common documents. Pass a built-in templateId like receipt or invoice, or define your own fields — including an array field whose children describe one line-item row.
curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/invoice-page-1.png",
    "imageType": "url",
    "templateId": "invoice"
  }'
How to OCR a PDF
- Add your PDF: In the app, drop a PDF; each page is rendered to an image and queued for OCR. For the API, send the page images (url or base64) to /ocr/fields.
- Pick a template or fields: Pass a built-in templateId like 'receipt' or 'invoice', or supply your own fields, including an array field with children for line-item tables.
- Read the structured result: Each value returns with its bbox, vertices, match_ratio, and bbox_source, plus a field_bboxes map locating every field on the page.
- Verify anything: Click a cell to highlight the exact region it was read from; a match_ratio below 0.85 flags a value worth a closer look. Edits are stored beside the original OCR value.
- Export or query: Download CSV (UTF-8 BOM, line items unfolded) or query a stored sheet with GET /view using where, sort, and select, with no re-OCR and no extra charge.
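The verify and export steps above can be approximated in your own code. The sketch below flags fields under the 0.85 match_ratio threshold and writes a CSV with a UTF-8 BOM and one row per line item. The input shapes are assumptions made for illustration: a dict of field name to {"value", "match_ratio"} and a list of line-item dicts. The actual response layout and the app's own CSV layout may differ, and `needs_review` and `to_csv` are hypothetical helpers.

```python
import csv
import io

REVIEW_THRESHOLD = 0.85  # the steps above flag a match_ratio below this


def needs_review(fields):
    """Return the names of fields whose match_ratio is under the threshold."""
    return [name for name, f in fields.items() if f["match_ratio"] < REVIEW_THRESHOLD]


def to_csv(header_fields, line_items):
    """Write one CSV row per line item, repeating the document-level fields.

    Assumption: "unfolded" means each line item becomes its own row.
    The leading "\ufeff" becomes the UTF-8 byte order mark when the text
    is saved as UTF-8, which helps spreadsheet apps detect the encoding.
    """
    item_columns = list(line_items[0]) if line_items else []
    out = io.StringIO()
    out.write("\ufeff")
    writer = csv.DictWriter(out, fieldnames=list(header_fields) + item_columns)
    writer.writeheader()
    for item in line_items or [{}]:
        writer.writerow({**header_fields, **item})
    return out.getvalue()


# Made-up sample data, not a real parsed result.
fields = {
    "vendor": {"value": "Acme", "match_ratio": 0.98},
    "total": {"value": "42.00", "match_ratio": 0.71},
}
line_items = [
    {"description": "Widget", "amount": "30.00"},
    {"description": "Gasket", "amount": "12.00"},
]

print(needs_review(fields))  # ['total']
header = {name: f["value"] for name, f in fields.items()}
csv_text = to_csv(header, line_items)
```

Repeating the document-level fields on every row keeps each line self-describing, so the sheet can be sorted or filtered without losing track of which document a line item came from.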
Simple, predictable pricing
Pay $0.05 per image (¥10 / ₩100), with a free tier of 100 scans a month and no credit card. Flat plans add monthly scans, more sheets, and storage.
Can I OCR a PDF with space-ocr?
Does PDF OCR keep the location of each value?
Can it extract tables and line items from a PDF?
What can I export PDF OCR results to?
How much does PDF OCR cost?
Which languages does it handle?
Turn your own PDFs into checkable data
Free tier — 100 scans a month, no credit card. Every value comes back with its on-page location.