Extract invoice line items to JSON — the exact request and response shape
The developer's JSON-API contract for invoice line items: request rows as a type:'array' field, get back each cell with its own bbox, vertices and match_ratio, parse it in Python or JS, and unfold the array to CSV. Code-heavy, with a live demo you can check.
If you want to extract invoice line items to JSON from a scanned or photographed invoice, the part that matters to a developer isn't "does it OCR" — it's the contract. What field do you send to request a table of rows? What exactly comes back for each cell? Can you trust a number enough to push it downstream, or do you need to flag it for review? This guide is the contract, end to end: the request field spec, a representative response shape, a parse snippet in Python and JS, and how to unfold the array into CSV.
space-ocr models line items as a single field of type:"array" whose children describe one row. The response gives you, per cell, the verbatim value plus its own bbox, oriented vertices, a match_ratio (how much of the value was actually located on the page), and a bbox_source. That per-cell positioning is what lets you trace a wrapped or merged line item back to the exact pixels it was read from — instead of trusting a flat string.
For a higher-level walkthrough of the table-extraction problem, see extract line items from invoices; for the async, sheet-and-webhook side, see the invoice data extraction API. This article stays on the JSON wire format.
Proof first: every cell points back to the page
Before the JSON, see the thing the JSON encodes. In the live demo, the box drawn on the receipt for each field is exactly where that value was read, and each value carries a match_ratio for how much of it was found on the page. The line-item rows are the same array structure this article documents.

Every value carries a verified on-page location (bbox, 4-point vertices, and match_ratio) on a 0–1000 normalized grid (0,0 top-left to 1000,1000 bottom-right), the same shape the live API returns.
The request: line items as a type:"array" field
You call POST /ocr/fields with one image and a fields array. A scalar field (invoice number, total) is a { name, type } pair. Line items are one field whose type is "array" and whose children describe the shape of a single row — the engine then returns as many rows as it finds on the page.
Field names and params are camelCase (the snake_case aliases like image_type still work but are deprecated). The engine takes raster images — JPEG, PNG, GIF, BMP, TIFF, WebP — sent as a URL or as pure base64; there's no PDF parsing at the engine layer (the web app converts PDF pages to PNG before OCR). Language is auto-detected, so there's no language parameter to set.
{
"image": "https://example.com/invoice.jpg",
"imageType": "url",
"fields": [
{ "name": "invoice_no", "type": "string" },
{ "name": "invoice_date", "type": "string", "description": "YYYY-MM-DD if printed" },
{ "name": "total", "type": "string", "description": "grand total as printed, keep separators" },
{
"name": "line_items",
"type": "array",
"description": "one object per invoice row",
"children": [
{ "name": "description", "type": "string" },
{ "name": "quantity", "type": "string" },
{ "name": "unit_price", "type": "string" },
{ "name": "amount", "type": "string" }
]
}
]
}
If you'd rather not spell out the schema, pass templateId: "invoice" (or "receipt") and the built-in schema supplies sensible fields and a line-item array for you. If you send both, your own fields take precedence. The full curl is below.
curl -s https://api.space-ocr.com/ocr/fields \
-H "Authorization: Bearer $SPACE_OCR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"image": "https://example.com/invoice.jpg",
"imageType": "url",
"templateId": "invoice"
}'
The response: each cell value carries its own coordinates
The response is { "status": "success", "data": { ... } }. A scalar field is an object with value, bbox, vertices, match_ratio, and bbox_source. The array field is a value that is a list of rows; each row exposes a field_bboxes map keyed by child name, where each child holds its own value and the same coordinate quartet.
A few keys you'll rely on when parsing:
- bbox: integer { xmin, ymin, xmax, ymax } on a 0–1000 normalized grid (0,0 top-left, 1000,1000 bottom-right). It is not pixels and not left/top/width/height. Convert with pixel_x = bbox.xmin / 1000 * image_width.
- vertices: four ordered points (tl, tr, br, bl) for an oriented box that follows a tilted phone photo.
- match_ratio: the share of the value's characters located on the page, 0–1. >= 0.85 is a confident match; below that is worth a human glance.
- bbox_source: how the box was derived. One of vision_symbol_match (the usual character-match path), token_id / token_id_hybrid (a secondary override from the model's word-token hint), low_confidence (match below 0.85), or shared_value (propagated from a merged cell).
{
"status": "success",
"data": {
"invoice_no": {
"value": "INV-2049",
"bbox": { "xmin": 612, "ymin": 84, "xmax": 788, "ymax": 116 },
"vertices": [
{ "x": 612, "y": 84 }, { "x": 788, "y": 85 },
{ "x": 788, "y": 116 }, { "x": 612, "y": 115 }
],
"match_ratio": 1.0,
"bbox_source": "vision_symbol_match"
},
"total": {
"value": "4,286",
"bbox": { "xmin": 690, "ymin": 902, "xmax": 842, "ymax": 938 },
"vertices": [
{ "x": 690, "y": 902 }, { "x": 842, "y": 903 },
{ "x": 842, "y": 938 }, { "x": 690, "y": 937 }
],
"match_ratio": 0.9285,
"bbox_source": "vision_symbol_match"
},
"line_items": {
"value": [
{
"field_bboxes": {
"description": {
"value": "96K\u8abf\u88fd\u8c46\u4e73",
"bbox": { "xmin": 96, "ymin": 412, "xmax": 388, "ymax": 446 },
"match_ratio": 0.9230,
"bbox_source": "vision_symbol_match"
},
"quantity": {
"value": "2",
"bbox": { "xmin": 612, "ymin": 412, "xmax": 642, "ymax": 446 },
"match_ratio": 1.0,
"bbox_source": "token_id"
},
"unit_price": {
"value": "316",
"bbox": { "xmin": 720, "ymin": 412, "xmax": 800, "ymax": 446 },
"match_ratio": 1.0,
"bbox_source": "vision_symbol_match"
},
"amount": {
"value": "632",
"bbox": { "xmin": 860, "ymin": 412, "xmax": 944, "ymax": 446 },
"match_ratio": 0.9375,
"bbox_source": "vision_symbol_match"
}
}
}
]
}
}
}
Why those coordinates can be trusted: the model never returns the boxes. The language model returns each value's text, and at most a hint of which word tokens it used, but not the geometry. The engine then character-matches that text against the symbols the vision OCR actually detected on the page, lands the box on those real pixels, and scores each value with a match_ratio (treated as a confident match at ≥ 0.85). The token hint is only a secondary override, and it can be noisy (the model sometimes swaps tokens between repeated rows), so column- and row-consistency checks validate it rather than trusting it blindly. The takeaway for your parser: don't gate on the model's certainty, gate on match_ratio, which tells you how much of each value was found on the actual page.
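Because bbox and vertices sit on the 0–1000 grid rather than in pixels, drawing or cropping a cell needs one scaling step. The following is a minimal sketch of that conversion, applying the pixel_x = bbox.xmin / 1000 * image_width rule from the key list above to both axes. The function names and the 2000x3000 image size are hypothetical, chosen only for illustration; they are not part of the API.

```python
from typing import Dict, List, Tuple

GRID = 1000  # size of the normalized grid the response coordinates use


def bbox_to_pixels(bbox: Dict[str, int], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized bbox to (left, top, right, bottom) in pixels."""
    return (
        round(bbox["xmin"] * width / GRID),
        round(bbox["ymin"] * height / GRID),
        round(bbox["xmax"] * width / GRID),
        round(bbox["ymax"] * height / GRID),
    )


def vertices_to_pixels(vertices: List[Dict[str, int]], width: int, height: int) -> List[Tuple[int, int]]:
    """Scale the four ordered points (tl, tr, br, bl) to pixel (x, y) pairs."""
    return [
        (round(v["x"] * width / GRID), round(v["y"] * height / GRID))
        for v in vertices
    ]


# invoice_no bbox from the sample response, on an assumed 2000x3000 px image
sample_bbox = {"xmin": 612, "ymin": 84, "xmax": 788, "ymax": 116}
print(bbox_to_pixels(sample_bbox, 2000, 3000))  # (1224, 252, 1576, 348)
```

The (left, top, right, bottom) order is a choice made here because many imaging libraries accept it for crops; reorder it if your drawing code expects left/top/width/height.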
Parsing it in code
The shape is regular: scalars at data[name].value, rows at data["line_items"].value, and each cell at row["field_bboxes"][child].value. Iterate the rows, pull each child's value and match_ratio, and flag anything below 0.85 for review before it goes downstream.
import json
import os
import urllib.request

API = "https://api.space-ocr.com/ocr/fields"
KEY = os.environ["SPACE_OCR_API_KEY"]

body = json.dumps({
    "image": "https://example.com/invoice.jpg",
    "imageType": "url",
    "templateId": "invoice",
}).encode()
# Passing data= makes urllib send this as a POST
req = urllib.request.Request(
    API, data=body,
    headers={"Authorization": f"Bearer {KEY}",
             "Content-Type": "application/json"},
)
resp = json.load(urllib.request.urlopen(req))
data = resp["data"]

# scalar fields
print("invoice_no:", data.get("invoice_no", {}).get("value"))
print("total:", data.get("total", {}).get("value"))

# line items: data["line_items"]["value"] is a list of rows
for i, row in enumerate(data.get("line_items", {}).get("value", [])):
    cells = row["field_bboxes"]
    record = {name: cell.get("value") for name, cell in cells.items()}
    # cells whose match_ratio is missing or below 0.85 go to review
    low = [name for name, cell in cells.items()
           if cell.get("match_ratio", 0) < 0.85]
    print(f"row {i}: {record}" + (f" REVIEW: {low}" if low else ""))
The JavaScript shape is identical: data.line_items.value is an array of { field_bboxes } objects, and Object.entries(row.field_bboxes) gives you [childName, cell] pairs with cell.value, cell.bbox, and cell.match_ratio.
To unfold the array into CSV, write one CSV row per line item:
import csv

# `data` is the response "data" object from the previous step.
rows = data.get("line_items", {}).get("value", [])
invoice_no = data.get("invoice_no", {}).get("value", "")
child_cols = ["description", "quantity", "unit_price", "amount"]
header = ["invoice_no", *child_cols, "min_match_ratio"]

# UTF-8 BOM so Excel reads CJK and currency text correctly
with open("line_items.csv", "w", encoding="utf-8-sig", newline="") as f:
    w = csv.writer(f)
    w.writerow(header)
    for row in rows:
        cells = row["field_bboxes"]
        values = [cells.get(c, {}).get("value", "") for c in child_cols]
        # note: a child missing from the row counts as 1.0 here, so it does not lower the minimum
        ratios = [cells.get(c, {}).get("match_ratio", 1.0) for c in child_cols]
        w.writerow([invoice_no, *values, round(min(ratios), 4)])
That utf-8-sig (UTF-8 BOM) is what lets Excel open the file with Japanese, Korean, and currency text intact. Values are preserved verbatim ("7,855" keeps its separators, full-width characters stay full-width), so normalize in your own code, not by trusting the OCR to have stripped anything. If you'd rather not run the export at all, push images into a stored sheet and let the server unfold array rows for you: that's the scanned documents to CSV path. For the coordinate model in depth, see an OCR API with bounding boxes.
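Since values come back verbatim, numeric columns need a normalization pass before any arithmetic. The sketch below shows one way to do that with the standard library. It rests on assumptions that are not part of the API: "," is treated as a thousands separator and "." as the decimal point (which is wrong for locales that swap them), and parse_amount is a hypothetical helper name.

```python
import unicodedata
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(raw: Optional[str]) -> Optional[Decimal]:
    """Convert a verbatim OCR string such as "7,855" to a Decimal, or None."""
    # NFKC folds full-width digits and punctuation into their ASCII forms.
    text = unicodedata.normalize("NFKC", raw or "")
    # Assumption: commas are thousands separators; drop them along with
    # whitespace and currency symbols (Unicode category "Sc").
    kept = [
        ch for ch in text
        if not ch.isspace() and ch != "," and unicodedata.category(ch) != "Sc"
    ]
    try:
        number = Decimal("".join(kept))
    except InvalidOperation:
        return None  # letters or other text remained: leave it for review
    return number if number.is_finite() else None


print(parse_amount("7,855"))  # 7855
```

Returning None instead of guessing keeps a cell like a product description from silently becoming a number; pair it with the match_ratio check so both unreadable and unparseable cells land in the same review queue.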
How to extract invoice line items to JSON
- Get a key: Sign up for the free tier (100 scans a month, no credit card) and grab your spocr_ API key. Authenticate every request with Authorization: Bearer.
- Define line items as an array field: In the fields array of POST /ocr/fields, add one field of type 'array' whose children describe a single row (description, quantity, unit_price, amount). Or pass templateId 'invoice' to use the built-in schema. The engine takes raster images (JPEG, PNG, GIF, BMP, TIFF, WebP) as a URL or pure base64.
- Send the image and read the response: POST one image. The response is { status: 'success', data: {...} }; line items live at data['line_items'].value as a list of rows, each with a field_bboxes map keyed by child name.
- Parse each cell and check match_ratio: For each row, read child values from row['field_bboxes'][child].value, and use each cell's bbox (0–1000 grid), vertices, and match_ratio. Flag any cell with match_ratio below 0.85 for review before it goes downstream.
- Unfold to CSV: Write one CSV row per line item, expanding the array, with a UTF-8 BOM so Excel reads CJK and currency text correctly. Or push images into a stored sheet and export CSV server-side with array rows already unfolded, with no re-OCR charge.
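The request and parse steps above can be sketched as two pure functions that are easy to unit-test without calling the API. This is an illustration under assumptions: build_request and split_rows are hypothetical helper names, the sample data is made up to exercise the 0.85 threshold, and only fields shown in this article (image, imageType, fields, children, field_bboxes, match_ratio) are used.

```python
from typing import Any, Dict, List, Tuple

REVIEW_THRESHOLD = 0.85  # the confident-match cutoff quoted in this article


def build_request(image_url: str, row_columns: List[str]) -> Dict[str, Any]:
    """Build a POST /ocr/fields body with one array field for line items."""
    return {
        "image": image_url,
        "imageType": "url",
        "fields": [{
            "name": "line_items",
            "type": "array",
            "children": [{"name": col, "type": "string"} for col in row_columns],
        }],
    }


def split_rows(data: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[Tuple[int, str]]]:
    """Return (records, review): plain row dicts plus (row_index, child) pairs to check."""
    records, review = [], []
    for index, row in enumerate(data.get("line_items", {}).get("value", [])):
        cells = row.get("field_bboxes", {})
        records.append({name: cell.get("value") for name, cell in cells.items()})
        for name, cell in cells.items():
            # a missing match_ratio is treated as 0, so the cell is flagged
            if cell.get("match_ratio", 0) < REVIEW_THRESHOLD:
                review.append((index, name))
    return records, review


# Made-up response data: the second cell falls below the threshold.
sample = {"line_items": {"value": [{"field_bboxes": {
    "quantity": {"value": "2", "match_ratio": 1.0},
    "amount": {"value": "632", "match_ratio": 0.8},
}}]}}
records, review = split_rows(sample)
print(records, review)
```

Keeping the threshold in one constant means the parser and any CSV export agree on what counts as a confident match.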
Extract invoice line items to JSON your code can trust
Free tier — 100 scans a month, no credit card. Every cell comes back with its own bounding box and a match ratio.