PDF OCR

PDF OCR that turns documents into data you can check

Extract structured data from PDFs and scans with space-ocr: line items, built-in templates, CSV/JSON export, and every value returned with its on-page location and a match score.

PDFs are where data goes to hide. An invoice, a stack of receipts, a delivery note — the numbers are right there on the page, but getting them into a spreadsheet usually means retyping. PDF OCR promises to fix that: read the document, get structured fields back. The catch is that most tools stop at a plausible guess and leave you to trust it.

space-ocr answers a stricter question. It turns a PDF into structured rows, and it returns every value with the exact spot on the page it was read from — a box you can see, plus a score for how well it matched. So you don't have to take the extraction on faith; you can check it.

See a real extraction you can check

On the live page, hovering a field highlights the box on the receipt where that value was read. Every number, box, and match score shown here comes straight from a real parsed result, not a mockup.

Receipts with extracted-field bounding boxes (verified fields)

KINSHO · 合計 (total) 2,045

ライフ · 合計 (total) 4,286

Each value with a box carries a verified on-page location (bbox, four-point vertices, and match_ratio) on a 0–1000 normalized grid, running from (0,0) at the top-left to (1000,1000) at the bottom-right. This is the same shape the live API returns, so each field can be traced back to the pixels it came from.
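Because coordinates live on a 0–1000 grid rather than in pixels, drawing a box on your own rendering of the page takes one scaling step. The sketch below assumes a field's bbox arrives as a dict with xmin/ymin/xmax/ymax keys and its vertices as four [x, y] pairs; the helper names (to_pixels, bbox_to_pixels, enclosing_box) are hypothetical, and the exact JSON nesting should be checked against the API docs.

```python
from typing import Dict, Sequence, Tuple

GRID = 1000  # both axes are normalized to 0..1000, origin at the top-left

def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Map one normalized (x, y) point onto a page image of width x height pixels."""
    return round(x / GRID * width), round(y / GRID * height)

def bbox_to_pixels(bbox: Dict[str, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale an xmin/ymin/xmax/ymax box from the 0..1000 grid to pixel coordinates."""
    x0, y0 = to_pixels(bbox["xmin"], bbox["ymin"], width, height)
    x1, y1 = to_pixels(bbox["xmax"], bbox["ymax"], width, height)
    return x0, y0, x1, y1

def enclosing_box(vertices: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Axis-aligned box around four oriented vertices, e.g. from a tilted photo."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}

# Hypothetical field on a 1240 x 1754 px page image.
print(bbox_to_pixels({"xmin": 612, "ymin": 830, "xmax": 910, "ymax": 862}, 1240, 1754))
# (759, 1456, 1128, 1512)
```

Keeping the vertices alongside the axis-aligned box matters for skewed scans: the vertices follow the tilt, while enclosing_box gives the smallest upright rectangle that still covers them.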

Every value located

Each field returns a bounding box (xmin/ymin/xmax/ymax on a 0–1000 grid), four oriented vertices, and a match_ratio — so a number traces back to the exact spot on the page.

Line items, not just totals

Tables come back as repeating rows with a position for every cell, even when a line wraps or merges.

Built-in templates

Apply a receipt, invoice, delivery note, business card, or ID template with one templateId — or define your own fields.

Clean exports

CSV with a UTF-8 BOM (Excel- and CJK-safe, line items unfolded) and JSON over a REST API with async jobs and signed webhooks.

Languages on autopilot

Japanese, Korean, Chinese, and English in one engine — no language hint to set, mixed scripts handled.

Phone photos welcome

EXIF rotation is applied on load and boxes follow the document's tilt, so a skewed scan or photo still lines up.
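If you build a CSV yourself from the JSON result, the two export choices described above are easy to reproduce. Python's utf-8-sig codec writes the byte-order mark that lets Excel recognize the file as UTF-8 (keeping Japanese, Korean, and Chinese text intact), and "unfolding" line items means one CSV row per item with the document-level fields repeated. This is a minimal sketch, not space-ocr's own exporter; the record layout (a dict with an items list) and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List

def unfold_rows(doc: Dict) -> List[Dict]:
    """One row per line item, with the document-level fields repeated on every row."""
    header = {k: v for k, v in doc.items() if k != "items"}
    items = doc.get("items") or [{}]  # a document without line items still yields one row
    return [{**header, **item} for item in items]  # item keys win on a name clash

def to_csv_bytes(docs: List[Dict]) -> bytes:
    """Serialize documents to CSV bytes, UTF-8 with a BOM via the 'utf-8-sig' codec."""
    rows = [row for doc in docs for row in unfold_rows(doc)]
    fieldnames: List[str] = []
    for row in rows:  # union of keys, in first-seen order
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8-sig")

receipt = {
    "vendor": "Sample Store",  # hypothetical data
    "total": "770",
    "items": [{"description": "Coffee", "amount": "450"},
              {"description": "Bagel", "amount": "320"}],
}
print(to_csv_bytes([receipt]).decode("utf-8-sig"))
```

Values such as 1,234 contain the delimiter, so the csv module quotes them automatically; rows missing a column get an empty cell through restval.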

How PDF OCR works in space-ocr

Drop a PDF into the app and each page is rendered to an image, then read and turned into structured fields, so a multi-page PDF becomes a set of rows you can sort, filter, and export. If you call the API directly, send the page images instead: the public API takes raster images (JPEG, PNG, GIF, BMP, TIFF, or WebP) and returns the same structured result.
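Since the public API takes raster images rather than PDFs, a pre-flight check can catch a PDF (or any other unsupported file) before it is uploaded. The sketch below sniffs the standard file signatures ("magic bytes") of the six listed formats; the function names are hypothetical, and the API's own validation remains the authority.

```python
from typing import Optional

def sniff_format(head: bytes) -> Optional[str]:
    """Return the raster format named by a file's first bytes, or None if unsupported."""
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if head.startswith(b"BM"):  # loose check: BMP only has a two-byte signature
        return "bmp"
    if head.startswith((b"II*\x00", b"MM\x00*")):  # little- and big-endian TIFF
        return "tiff"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":  # WebP lives in a RIFF container
        return "webp"
    return None  # e.g. a PDF starts with b"%PDF-": render its pages to images first

def is_uploadable(path: str) -> bool:
    """Read just enough of the file to identify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(12)) is not None
```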

You don't have to write a schema for common documents. Pass a built-in templateId like receipt or invoice, or define your own fields — including an array field whose children describe one line-item row.

extract fields from a page image

curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/invoice-page-1.png",
    "imageType": "url",
    "templateId": "invoice"
  }'
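The same call can be made from Python with only the standard library. This sketch mirrors the curl command above (same endpoint, bearer token from SPACE_OCR_API_KEY, and JSON keys) and adds a path for local files by base64-encoding them. The imageType value "base64" is an assumption based on the "url or base64" option mentioned in the steps below, and build_request is a hypothetical helper; confirm both against the API docs.

```python
import base64
import json
import os
import urllib.request

ENDPOINT = "https://api.space-ocr.com/ocr/fields"

def build_request(image: str, template_id: str = "invoice", *, api_key: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST to /ocr/fields for an image URL or a local file."""
    if image.startswith(("http://", "https://")):
        payload = {"image": image, "imageType": "url", "templateId": template_id}
    else:
        with open(image, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        # Assumption: local files are sent inline with imageType "base64".
        payload = {"image": encoded, "imageType": "base64", "templateId": template_id}
    key = api_key or os.environ.get("SPACE_OCR_API_KEY", "")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and parse the body with json.load; keeping construction separate from sending makes the payload easy to inspect or test.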

How to OCR a PDF

Add your PDF
In the app, drop a PDF — each page is rendered to an image and queued for OCR. For the API, send the page images (url or base64) to /ocr/fields.
Pick a template or fields
Pass a built-in templateId like 'receipt' or 'invoice', or supply your own fields — including an array field with children for line-item tables.
Read the structured result
Each value returns with its bbox, vertices, match_ratio, and bbox_source, plus a field_bboxes map locating every field on the page.
Verify anything
Click a cell to highlight the exact region it was read from; a match_ratio below 0.85 flags a value worth a closer look. Edits are stored beside the original OCR value.
Export or query
Download CSV (UTF-8 BOM, line items unfolded) or query a stored sheet with GET /view using where, sort, and select — no re-OCR, no extra charge.
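The 0.85 review threshold from the steps above is simple to apply in code once a result is parsed. This sketch assumes, hypothetically, that fields arrive as a mapping from field name to an object carrying value and match_ratio; the real nesting (including how line-item rows are represented) may differ, so treat the shape and the needs_review name as illustrative.

```python
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.85  # below this, a value is worth a closer look

def needs_review(fields: Dict[str, Dict], threshold: float = REVIEW_THRESHOLD) -> List[Tuple[str, float]]:
    """Return (field name, match_ratio) for each located value under the threshold, worst first."""
    flagged = [
        (name, f["match_ratio"])
        for name, f in fields.items()
        if f.get("match_ratio") is not None and f["match_ratio"] < threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1])

fields = {  # hypothetical parsed result
    "total": {"value": "770", "match_ratio": 0.98},
    "due_date": {"value": "2025-01-31", "match_ratio": 0.72},
    "vendor": {"value": "Sample Store", "match_ratio": 0.84},
    "note": {"value": ""},  # no box, so no match_ratio to judge
}
print(needs_review(fields))
# [('due_date', 0.72), ('vendor', 0.84)]
```

Fields without a match_ratio are skipped rather than flagged, since there is no located region to compare against.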

Simple, predictable pricing

Pay $0.05 per image (¥10 / ₩100), with a free tier of 100 scans a month and no credit card. Flat plans add monthly scans, more sheets, and storage.

Free: no card required. 100 scans / month, 3 sheets, 1 GB storage.

Starter: $19/mo. 400 scans / month, 10 sheets, 10 GB storage.

Turn your own PDFs into checkable data

Free tier — 100 scans a month, no credit card. Every value comes back with its on-page location.

