Japanese OCR that returns data you can check

Read Japanese receipts, invoices, and delivery notes with space-ocr: mixed scripts, full-width and vertical text, CJK-safe CSV, every value located with a box.

Japanese is where ordinary OCR quietly falls apart. A single receipt mixes kanji, kana, half-width katakana, full-width digits, and a stray run of English, and the totals might sit in a vertical column down the right edge. Most tools either force you to pick a language first or hand back a flat blob of text that loses the layout. Japanese OCR that actually helps has to read all of that at once and tell you where each number came from.

space-ocr does both. It reads Japanese documents into structured fields and returns every value with the exact spot on the page it was read from: a box you can see, plus a score for how well the text matched the characters detected on the page. Language detection is automatic, so there is no hint to set; one engine handles Japanese, Korean, Chinese, and English together.

See a real Japanese extraction you can check

The two receipts read here are real: a KINSHO 布施店 slip totalling 2,045 and a ライフ国分店 slip totalling 4,286, both dated August 2019. Every value, box, and match score comes straight from a parsed result, not a mockup, and the boxes follow each line of mixed kanji-kana-digit text.

[Figure: receipts with extracted-field bounding boxes. Verified fields: KINSHO · 合計 2,045; ライフ · 合計 4,286]

Each value with a box carries a verified on-page location (bbox, 4-point vertices, and match_ratio) on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns, so any field can be traced back to the pixels it came from.

Language detection is automatic

No language hint to choose. A low-cost vision pass detects the script and routes it, so Japanese, Korean, Chinese, and English go through one engine without you flagging anything.

Full-width, vertical, mixed scripts

Kanji, hiragana, katakana, half-width katakana, full-width digits, and English in the same line are normalized together. Vertical columns are detected by text flow and grouped into the right rows.
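
To make "normalized together" concrete, here is a minimal sketch of width folding using Unicode NFKC from Python's standard library. It illustrates the general technique only; it is an assumption that anything like it runs inside space-ocr, and the helper name normalize_width is hypothetical.

```python
import unicodedata


def normalize_width(text: str) -> str:
    """Fold full-width and half-width forms into one canonical form (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Full-width digits and punctuation fold to ASCII, and half-width katakana
# fold to standard katakana, so both spellings compare equal afterwards.
samples = ["２，０４５", "2,045", "ﾗｲﾌ", "ライフ"]
normalized = [normalize_width(s) for s in samples]
```

After folding, the first two samples are the same string and so are the last two, which is what lets a full-width total and its half-width spelling resolve to one value.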

CJK-safe CSV, no mojibake

Exports are CSV with a UTF-8 BOM, so Excel opens 店舗名, 合計, and product names correctly instead of garbled characters. Line items unfold into sub-rows.
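
If you build your own CSV from API results, the same BOM trick is easy to reproduce. The sketch below uses Python's "utf-8-sig" codec, which prepends the UTF-8 BOM; the helper name rows_to_csv_bytes is hypothetical and the rows reuse the store names and totals quoted on this page.

```python
import csv
import io
from typing import Iterable, List


def rows_to_csv_bytes(header: List[str], rows: Iterable[List[str]]) -> bytes:
    """Serialize rows to CSV bytes that start with a UTF-8 BOM."""
    buf = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    # "utf-8-sig" writes the BOM (EF BB BF) before the encoded text.
    return buf.getvalue().encode("utf-8-sig")


data = rows_to_csv_bytes(
    ["店舗名", "合計"],
    [["KINSHO 布施店", "2,045"], ["ライフ国分店", "4,286"]],
)
```

Writing data to a .csv file in binary mode gives a file that Excel detects as UTF-8, so the Japanese headers are not garbled.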

Every value located

Each field returns a bounding box (xmin/ymin/xmax/ymax on a 0–1000 grid), four oriented vertices, and a match_ratio — so 2,045 traces back to the exact spot on the slip.
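
Because the box is on a 0–1000 grid, drawing it on the original image means scaling by the image's pixel size. This is a minimal sketch: the function name bbox_to_pixels is hypothetical and the sample box values are invented for illustration, not taken from a real response.

```python
from typing import Dict, Tuple

GRID = 1000  # boxes are normalized to a 0-1000 grid, origin at top-left


def bbox_to_pixels(
    bbox: Dict[str, float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Scale a normalized bbox to (left, top, right, bottom) in pixels."""
    left = round(bbox["xmin"] * width / GRID)
    top = round(bbox["ymin"] * height / GRID)
    right = round(bbox["xmax"] * width / GRID)
    bottom = round(bbox["ymax"] * height / GRID)
    return left, top, right, bottom


# Illustrative box on a 1200 x 1600 pixel photo (values are made up).
example_box = {"xmin": 100, "ymin": 250, "xmax": 400, "ymax": 300}
pixels = bbox_to_pixels(example_box, width=1200, height=1600)
```

The same scaling applies to each of the four vertices when you need the tilted outline rather than the axis-aligned box.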

Real JP documents, line items and all

Receipts, invoices, and delivery notes come back with totals, dates, store names, and a repeating row per line item, each cell keeping its own position even when text wraps.

Phone photos welcome

EXIF rotation is applied on load and boxes follow the document's tilt, so a snapshot of a crumpled receipt taken at an angle still lines up.

How Japanese OCR works in space-ocr

The LLM never invents coordinates. It reads the document and returns each value plus the word-token ids it used. A character matcher then runs first, matching the characters of each value against the symbols Vision actually detected on the page. That match produces the box, the oriented vertices, and the match_ratio; the token ids are a secondary override. So full-width and half-width forms of the same digit still resolve to one value, and you get a confidence signal for every field instead of a number you have to trust blindly.
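
To give a feel for what a character-level match score means, here is a rough sketch that compares an extracted value with the text detected on the page after width folding. It is not space-ocr's matcher: the function name illustrative_match_ratio and the use of difflib's similarity ratio are assumptions chosen only to show the idea.

```python
import unicodedata
from difflib import SequenceMatcher


def illustrative_match_ratio(extracted: str, detected: str) -> float:
    """Similarity in [0, 1] between an extracted value and detected text.

    Both strings are NFKC-folded first, so full-width and half-width
    spellings of the same characters count as identical.
    """
    a = unicodedata.normalize("NFKC", extracted)
    b = unicodedata.normalize("NFKC", detected)
    if not a and not b:
        return 1.0  # two empty strings are a perfect (if trivial) match
    return SequenceMatcher(None, a, b).ratio()


same_value = illustrative_match_ratio("2,045", "２，０４５")  # width differs only
one_digit_off = illustrative_match_ratio("2,045", "2,046")  # last digit differs
```

A value whose characters all appear on the page scores at the top of the range, while a misread digit pulls the score down, which is the kind of signal a per-field match_ratio gives you.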

Drop a PDF into the app and each page is rendered to an image first, then read — handy for multi-page invoices and delivery notes. If you call the API directly, send the page images (the public API takes raster images — JPEG, PNG, GIF, BMP, TIFF, WebP) and the structured result is the same. Pass a built-in templateId like receipt, invoice, or delivery, or define your own fields including an array field whose children describe one line-item row.

extract fields from a Japanese receipt image

curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/receipt-jp.jpg",
    "imageType": "url",
    "templateId": "receipt"
  }'
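
The same call can be made from Python with only the standard library. This sketch mirrors the curl request above (same URL, headers, and JSON body); the function names are hypothetical, and the response is assumed to be JSON.

```python
import json
import urllib.request

API_URL = "https://api.space-ocr.com/ocr/fields"


def build_fields_request(
    image_url: str, template_id: str, api_key: str
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps(
        {"image": image_url, "imageType": "url", "templateId": template_id}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract_fields(image_url: str, template_id: str, api_key: str) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_fields_request(image_url, template_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as extract_fields("https://example.com/receipt-jp.jpg", "receipt", os.environ["SPACE_OCR_API_KEY"]) would then return the parsed result; keeping the request builder separate makes it easy to inspect or test without network access.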

How to OCR a Japanese document

1. Add your document
   In the app, drop a receipt, invoice, or PDF; each page is rendered to an image and queued for OCR. For the API, send the page images (url or base64) to /ocr/fields. No language setting is needed.
2. Pick a template or fields
   Pass a built-in templateId like 'receipt', 'invoice', or 'delivery', or supply your own fields, including an array field with children for line-item tables.
3. Read the structured result
   Each value returns with its bbox, vertices, match_ratio, and bbox_source, plus a field_bboxes map locating every field on the page, full-width and vertical text included.
4. Verify anything
   Click a cell to highlight the exact region it was read from; a match_ratio below 0.85 flags a value worth a closer look. Edits are stored beside the original OCR value.
5. Export or query
   Download CSV (UTF-8 BOM so Japanese opens cleanly, line items unfolded) or query a stored sheet with GET /view using where, sort, and select, with no re-OCR and no extra charge.
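
The 0.85 review rule from the verification step is simple to automate on your side. The sketch below assumes a mapping from field name to an object carrying match_ratio, loosely modelled on the field_bboxes map mentioned above; the exact response layout, the function name fields_to_review, and the sample scores are all assumptions.

```python
from typing import Any, Dict, List

REVIEW_THRESHOLD = 0.85  # values scoring below this deserve a closer look


def fields_to_review(
    field_bboxes: Dict[str, Dict[str, Any]],
    threshold: float = REVIEW_THRESHOLD,
) -> List[str]:
    """Return the names of fields whose match_ratio is below the threshold.

    A field with no match_ratio at all is treated as needing review.
    """
    return sorted(
        name
        for name, info in field_bboxes.items()
        if info.get("match_ratio", 0.0) < threshold
    )


# Made-up scores for illustration; real responses will differ.
sample = {
    "total": {"match_ratio": 0.97},
    "date": {"match_ratio": 0.72},
    "store_name": {},
}
flagged = fields_to_review(sample)
```

Here flagged contains date and store_name, so a reviewer only needs to open the two cells that are actually in doubt.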

Simple, predictable pricing

Pay $0.05 per image (¥10 / ₩100), with a free tier of 100 scans a month and no credit card. Flat plans add monthly scans, more sheets, and storage.

Plan      Price             Scans / month   Sheets   Storage
Free      Free, no card     100             3        1 GB
Starter   $19/mo            400             10       10 GB

Turn your own Japanese documents into checkable data

Free tier — 100 scans a month, no credit card. Every value comes back with its on-page location.

