
Extract invoice line items to JSON — the exact request and response shape

The developer's JSON-API contract for invoice line items: request rows as a type:'array' field, get back each cell with its own bbox, vertices and match_ratio, parse it in Python or JS, and unfold the array to CSV. Code-heavy, with a live demo you can check.

8 min read · 2026-06-25

If you want to extract invoice line items to JSON from a scanned or photographed invoice, the part that matters to a developer isn't "does it OCR" — it's the contract. What field do you send to request a table of rows? What exactly comes back for each cell? Can you trust a number enough to push it downstream, or do you need to flag it for review? This guide is the contract, end to end: the request field spec, a representative response shape, a parse snippet in Python and JS, and how to unfold the array into CSV.

space-ocr models line items as a single field of type:"array" whose children describe one row. The response gives you, per cell, the verbatim value plus its own bbox, oriented vertices, a match_ratio (how much of the value was actually located on the page), and a bbox_source. That per-cell positioning is what lets you trace a wrapped or merged line item back to the exact pixels it was read from — instead of trusting a flat string.

For a higher-level walkthrough of the table-extraction problem, see extract line items from invoices; for the async, sheet-and-webhook side, see the invoice data extraction API. This article stays on the JSON wire format.

Proof first: every cell points back to the page

Before the JSON, see the thing the JSON encodes. Hover any field below — the box on the receipt is exactly where that value was read, and each value carries a match_ratio for how much of it was found on the page. The line-item rows are the same array structure this article documents.

[Interactive demo: source receipts from KINSHO (合計 / total 2,045) and ライフ (合計 / total 4,286), with a bounding box over each verified field]

Every value carries a verified on-page location — bbox + 4-point vertices + match_ratio — on a 0–1000 normalized grid (0,0 top-left → 1000,1000 bottom-right), the same shape the live API returns. Hover a field to trace it back to the pixels it came from.

Each extracted value — including every line-item cell — carries its own bounding box and match ratio: not just a number, but where on the page it lives and how well it matched.

The request: line items as a type:"array" field

You call POST /ocr/fields with one image and a fields array. A scalar field (invoice number, total) is a { name, type } pair. Line items are one field whose type is "array" and whose children describe the shape of a single row — the engine then returns as many rows as it finds on the page.

Request parameter names are camelCase (imageType, templateId). The snake_case aliases such as image_type still work but are deprecated. The names you give your own fields (invoice_no, line_items) are your choice, and they come back as the keys under data. The engine takes raster images (JPEG, PNG, GIF, BMP, TIFF, WebP), sent either as a URL or as pure base64. There's no PDF parsing at the engine layer; the web app converts PDF pages to PNG before OCR. Language is auto-detected, so there's no language parameter to set.
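If the image is a local file rather than a URL, the body carries the bytes as base64 instead. Below is a minimal Python sketch. It assumes that "pure base64" means the bare payload with no data:image/...;base64, prefix. The imageType value "base64" is an assumption, because this article only shows "url", so confirm it against the API reference. The helper names are hypothetical.

```python
import base64
import json
from pathlib import Path


def pure_base64(data: bytes) -> str:
    # Plain base64 text, with no "data:image/...;base64," prefix in front.
    return base64.b64encode(data).decode("ascii")


def strip_data_uri(value: str) -> str:
    # Browser canvases and FileReader often yield a data URI; keep only the payload after the comma.
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value


def base64_request_body(image_bytes: bytes, template_id: str = "invoice") -> bytes:
    body = {
        "image": pure_base64(image_bytes),
        # ASSUMPTION: "base64" is a guess at the imageType value for inline images;
        # the article only shows "url". Check the API reference before relying on it.
        "imageType": "base64",
        "templateId": template_id,
    }
    return json.dumps(body).encode()


def body_from_file(path: str) -> bytes:
    # Read a local JPEG/PNG/etc. and wrap it in a request body.
    return base64_request_body(Path(path).read_bytes())
```

body_from_file("invoice.jpg") returns bytes that could be passed as data= to the urllib.request.Request shown in the parsing step later in this article.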

request body — POST /ocr/fields
{
  "image": "https://example.com/invoice.jpg",
  "imageType": "url",
  "fields": [
    { "name": "invoice_no", "type": "string" },
    { "name": "invoice_date", "type": "string", "description": "YYYY-MM-DD if printed" },
    { "name": "total", "type": "string", "description": "grand total as printed, keep separators" },
    {
      "name": "line_items",
      "type": "array",
      "description": "one object per invoice row",
      "children": [
        { "name": "description", "type": "string" },
        { "name": "quantity", "type": "string" },
        { "name": "unit_price", "type": "string" },
        { "name": "amount", "type": "string" }
      ]
    }
  ]
}
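Hand-written JSON schemas tend to drift as an integration grows. The sketch below generates the same fields array from Python instead. The helpers scalar and array_field are hypothetical names, not part of any space-ocr SDK. Every child keeps type "string", as in the request body above, so separators and full-width characters come back untouched.

```python
import json
from typing import Optional


def scalar(name: str, description: Optional[str] = None) -> dict:
    # One string-typed field; strings keep separators and full-width characters verbatim.
    field = {"name": name, "type": "string"}
    if description:
        field["description"] = description
    return field


def array_field(name: str, children: list[str], description: Optional[str] = None) -> dict:
    # One array field whose children describe the shape of a single row.
    field = {"name": name, "type": "array"}
    if description:
        field["description"] = description
    field["children"] = [scalar(c) for c in children]
    return field


fields = [
    scalar("invoice_no"),
    scalar("invoice_date", "YYYY-MM-DD if printed"),
    scalar("total", "grand total as printed, keep separators"),
    array_field("line_items",
                ["description", "quantity", "unit_price", "amount"],
                "one object per invoice row"),
]

body = {"image": "https://example.com/invoice.jpg", "imageType": "url", "fields": fields}
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Field for field, the printed JSON has the same content as the hand-written request body.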

If you'd rather not spell out the schema, pass templateId: "invoice" (or "receipt"), and the built-in schema supplies sensible fields and a line-item array for you. If you send both, your own fields win. The curl below takes the template route.

curl — one image in, structured JSON out
curl -s https://api.space-ocr.com/ocr/fields \
  -H "Authorization: Bearer $SPACE_OCR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "https://example.com/invoice.jpg",
    "imageType": "url",
    "templateId": "invoice"
  }'

The response: each cell value carries its own coordinates

The response is { "status": "success", "data": { ... } }. A scalar field is an object with value, bbox, vertices, match_ratio, and bbox_source. The array field's value is a list of rows. Each row exposes a field_bboxes map keyed by child name, and each child holds its own value plus the same four keys: bbox, vertices, match_ratio, and bbox_source.

A few keys you'll rely on when parsing:

  • bbox — integer { xmin, ymin, xmax, ymax } on a 0–1000 normalized grid (0,0 top-left, 1000,1000 bottom-right). It is not pixels and not left/top/width/height. Convert with pixel_x = bbox.xmin / 1000 * image_width and pixel_y = bbox.ymin / 1000 * image_height, and likewise for xmax and ymax.
  • vertices — four ordered points (tl, tr, br, bl) for an oriented box that follows a tilted phone photo.
  • match_ratio — the share of the value's characters located on the page, from 0 to 1. A value at 0.85 or above is a confident match; anything lower is worth a human glance.
  • bbox_source — how the box was derived: vision_symbol_match (the usual character-match path), token_id / token_id_hybrid (a secondary override from the model's word-token hint), low_confidence (match below 0.85), or shared_value (propagated from a merged cell).
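To draw these boxes over the original image, scale the grid coordinates by the image's real size. Below is a minimal sketch; the function names are hypothetical. It converts vertices to pixels before measuring tilt. On a non-square image the 0–1000 grid stretches one axis more than the other, so an angle measured on the grid itself would be distorted.

```python
import math

GRID = 1000  # bbox and vertices live on a 0-1000 normalized grid


def bbox_to_pixels(bbox: dict, width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for an image of width x height."""
    return (
        round(bbox["xmin"] * width / GRID),
        round(bbox["ymin"] * height / GRID),
        round(bbox["xmax"] * width / GRID),
        round(bbox["ymax"] * height / GRID),
    )


def vertices_to_pixels(vertices: list[dict], width: int, height: int) -> list[tuple[float, float]]:
    """Scale the four ordered points (tl, tr, br, bl) into pixel space."""
    return [(v["x"] * width / GRID, v["y"] * height / GRID) for v in vertices]


def top_edge_angle(pixel_vertices: list[tuple[float, float]]) -> float:
    """Tilt of the top edge (tl -> tr) in degrees; positive means it slopes down to the right."""
    (x0, y0), (x1, y1) = pixel_vertices[0], pixel_vertices[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```

For the invoice_no box in the response below, bbox_to_pixels({"xmin": 612, "ymin": 84, "xmax": 788, "ymax": 116}, 2000, 3000) gives (1224, 252, 1576, 348) on a 2000×3000 image.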
response shape (representative; line-item cells also carry vertices, omitted here for brevity)
{
  "status": "success",
  "data": {
    "invoice_no": {
      "value": "INV-2049",
      "bbox": { "xmin": 612, "ymin": 84, "xmax": 788, "ymax": 116 },
      "vertices": [
        { "x": 612, "y": 84 }, { "x": 788, "y": 85 },
        { "x": 788, "y": 116 }, { "x": 612, "y": 115 }
      ],
      "match_ratio": 1.0,
      "bbox_source": "vision_symbol_match"
    },
    "total": {
      "value": "4,286",
      "bbox": { "xmin": 690, "ymin": 902, "xmax": 842, "ymax": 938 },
      "vertices": [
        { "x": 690, "y": 902 }, { "x": 842, "y": 903 },
        { "x": 842, "y": 938 }, { "x": 690, "y": 937 }
      ],
      "match_ratio": 0.9285,
      "bbox_source": "vision_symbol_match"
    },
    "line_items": {
      "value": [
        {
          "field_bboxes": {
            "description": {
              "value": "96K\u8abf\u88fd\u8c46\u4e73",
              "bbox": { "xmin": 96, "ymin": 412, "xmax": 388, "ymax": 446 },
              "match_ratio": 0.9230,
              "bbox_source": "vision_symbol_match"
            },
            "quantity": {
              "value": "2",
              "bbox": { "xmin": 612, "ymin": 412, "xmax": 642, "ymax": 446 },
              "match_ratio": 1.0,
              "bbox_source": "token_id"
            },
            "unit_price": {
              "value": "316",
              "bbox": { "xmin": 720, "ymin": 412, "xmax": 800, "ymax": 446 },
              "match_ratio": 1.0,
              "bbox_source": "vision_symbol_match"
            },
            "amount": {
              "value": "632",
              "bbox": { "xmin": 860, "ymin": 412, "xmax": 944, "ymax": 446 },
              "match_ratio": 0.9375,
              "bbox_source": "vision_symbol_match"
            }
          }
        }
      ]
    }
  }
}
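Scalars and array cells share the same keys, so one walker can flatten a whole response into a uniform list. That is handy for drawing every box in an overlay or for logging match_ratio per cell. Below is a sketch that assumes the shape shown above; iter_cells is a hypothetical name.

```python
from typing import Any, Iterator


def iter_cells(data: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat record per located value; scalars get row=None, array cells their row index."""
    for name, field in data.items():
        if not isinstance(field, dict):
            continue  # skip anything that isn't a field object
        value = field.get("value")
        if isinstance(value, list):  # array field: a list of rows
            for i, row in enumerate(value):
                for child, cell in row.get("field_bboxes", {}).items():
                    yield {"field": name, "row": i, "child": child,
                           "value": cell.get("value"), "bbox": cell.get("bbox"),
                           "match_ratio": cell.get("match_ratio"),
                           "bbox_source": cell.get("bbox_source")}
        else:  # scalar field
            yield {"field": name, "row": None, "child": None, "value": value,
                   "bbox": field.get("bbox"), "match_ratio": field.get("match_ratio"),
                   "bbox_source": field.get("bbox_source")}
```

For example, collections.Counter(c["bbox_source"] for c in iter_cells(data)) shows how many cells came from each derivation path.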

Why those coordinates can be trusted: the model never returns the boxes. The language model returns each value's text — and at most a hint of which word tokens it used — but not the geometry. The engine then character-matches that text against the symbols the vision OCR actually detected on the page, lands the box on those real pixels, and scores each value with a match_ratio (treated as a confident match at ≥ 0.85). The token hint is only a secondary override, and it can be noisy — the model sometimes swaps tokens between repeated rows — so column- and row-consistency checks validate it rather than trusting it blindly. The takeaway for your parser: don't gate on the model's certainty, gate on match_ratio, which tells you how much of each value was found on the actual page.
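The column and row consistency checks described above run inside the engine. You may want an extra client-side guard against a value landing in the wrong column. One simple heuristic is to compare each cell's horizontal center with the median center of its column. The sketch below implements that heuristic; it is hypothetical and is not the engine's method. The 60-unit tolerance on the 0–1000 grid is an arbitrary starting point to tune on your own invoices.

```python
from statistics import median


def column_outliers(rows: list[dict], child: str, tolerance: float = 60) -> list[int]:
    """Return row indices whose `child` cell sits far from that column's median x-center.

    Coordinates are on the 0-1000 grid, so `tolerance` is in grid units.
    """
    centers = {}
    for i, row in enumerate(rows):
        bbox = row.get("field_bboxes", {}).get(child, {}).get("bbox")
        if bbox:
            centers[i] = (bbox["xmin"] + bbox["xmax"]) / 2
    if len(centers) < 3:  # too few cells for a meaningful median
        return []
    mid = median(centers.values())
    return [i for i, x in centers.items() if abs(x - mid) > tolerance]
```

Pass it data["line_items"]["value"] and a child name such as "amount", then route any returned rows to review alongside low match_ratio cells.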

Parsing it in code

The shape is regular: scalars at data[name].value, rows at data["line_items"].value, and each cell at row["field_bboxes"][child].value. Iterate the rows, pull each child's value and match_ratio, and flag anything below 0.85 for review before it goes downstream.

parse_line_items.py
import json
import os
import urllib.request

API = "https://api.space-ocr.com/ocr/fields"
KEY = os.environ["SPACE_OCR_API_KEY"]
REVIEW_THRESHOLD = 0.85  # below this, a cell is worth a human glance

body = json.dumps({
    "image": "https://example.com/invoice.jpg",
    "imageType": "url",
    "templateId": "invoice",
}).encode()

req = urllib.request.Request(
    API, data=body, method="POST",
    headers={"Authorization": f"Bearer {KEY}",
             "Content-Type": "application/json"},
)
# urlopen raises HTTPError on 4xx/5xx; the timeout keeps a stalled call from hanging
with urllib.request.urlopen(req, timeout=60) as r:
    resp = json.load(r)
if resp.get("status") != "success":
    raise RuntimeError(f"unexpected response: {resp}")
data = resp["data"]

# scalar fields
print("invoice_no:", data.get("invoice_no", {}).get("value"))
print("total:", data.get("total", {}).get("value"))

# line items: data["line_items"]["value"] is a list of rows
for i, row in enumerate(data.get("line_items", {}).get("value", [])):
    cells = row.get("field_bboxes", {})
    record = {name: cell.get("value") for name, cell in cells.items()}
    # a missing or null match_ratio counts as 0, so that cell gets flagged
    low = [name for name, cell in cells.items()
           if (cell.get("match_ratio") or 0) < REVIEW_THRESHOLD]
    print(f"row {i}: {record}" + (f"  REVIEW: {low}" if low else ""))

The JavaScript shape is identical — data.line_items.value is an array of { field_bboxes } objects, and Object.entries(row.field_bboxes) gives you [childName, cell] pairs with cell.value, cell.bbox, and cell.match_ratio.

lineitems_to_csv.py — unfold the array, one CSV row per item
import csv

# `data` is the response "data" object from the previous step.
rows = data.get("line_items", {}).get("value", [])
invoice_no = data.get("invoice_no", {}).get("value", "")

child_cols = ["description", "quantity", "unit_price", "amount"]
header = ["invoice_no", *child_cols, "min_match_ratio"]

# UTF-8 BOM so Excel reads CJK and currency text correctly
with open("line_items.csv", "w", encoding="utf-8-sig", newline="") as f:
    w = csv.writer(f)
    w.writerow(header)
    for row in rows:
        cells = row.get("field_bboxes", {})
        values = [cells.get(c, {}).get("value", "") for c in child_cols]
        # a missing cell or null ratio counts as 0.0 (the same default as the parse step),
        # so the row's minimum exposes it instead of hiding it as a confident 1.0
        ratios = [cells.get(c, {}).get("match_ratio") or 0.0 for c in child_cols]
        w.writerow([invoice_no, *values, round(min(ratios), 4)])

That utf-8-sig (UTF-8 BOM) is what lets Excel open the file with Japanese, Korean, and currency text intact. Values are preserved verbatim — "7,855" keeps its separators, full-width characters stay full-width — so normalize in your own code, not by trusting the OCR to have stripped anything. If you'd rather not run the export at all, push images into a stored sheet and let the server unfold array rows for you: that's the scanned documents to CSV path. For the coordinate model in depth, see an OCR API with bounding boxes.
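Values arrive verbatim, so a small normalizer is needed to turn them into numbers you can check. Below is a minimal sketch. It assumes "," is always a thousands separator and "." the decimal point, so European "1.234,50" style amounts would need a different rule. The names to_decimal and amount_mismatch, and the currency list, are hypothetical. The row check multiplies quantity by unit_price and compares the result with amount. Discounts, weight-priced items, or tax-inclusive lines can legitimately fail it, so treat a mismatch as a review flag, not a rejection.

```python
import unicodedata
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical set of currency marks to drop; extend it for your documents.
CURRENCY_MARKS = "¥$€£円"


def to_decimal(text: Optional[str]) -> Optional[Decimal]:
    """Parse a verbatim printed number, or return None if it doesn't look like one.

    NFKC folds full-width digits and punctuation ('７，８５５' -> '7,855', '￥' -> '¥'),
    then currency marks, spaces and ',' thousands separators are removed.
    """
    if not text:
        return None
    s = unicodedata.normalize("NFKC", text)
    for ch in CURRENCY_MARKS + ", ":
        s = s.replace(ch, "")
    try:
        value = Decimal(s)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def amount_mismatch(row: dict) -> bool:
    """True when quantity, unit_price and amount all parse but quantity * unit_price != amount."""
    cells = row.get("field_bboxes", {})
    q, p, a = (to_decimal(cells.get(k, {}).get("value"))
               for k in ("quantity", "unit_price", "amount"))
    if q is None or p is None or a is None:
        return False  # nothing to compare; the match_ratio gate still applies
    return q * p != a
```

For the sample row in the response above, 2 × 316 = 632, so amount_mismatch returns False.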

Array line items unfold to one CSV row per item, UTF-8 BOM so Excel reads CJK and currency text correctly.

How to extract invoice line items to JSON

  1. Get a key
    Sign up for the free tier (100 scans a month, no credit card) and grab your API key, which starts with spocr_. Send it on every request in an Authorization: Bearer header.
  2. Define line items as an array field
    In the fields array of POST /ocr/fields, add one field of type 'array' whose children describe a single row (description, quantity, unit_price, amount). Or pass templateId 'invoice' to use the built-in schema. The engine takes raster images (JPEG, PNG, GIF, BMP, TIFF, WebP) as a URL or pure base64.
  3. Send the image and read the response
    POST one image. The response is { status: 'success', data: {...} }; line items live at data['line_items'].value as a list of rows, each with a field_bboxes map keyed by child name.
  4. Parse each cell and check match_ratio
    For each row, read child values from row['field_bboxes'][child].value, and use each cell's bbox (0–1000 grid), vertices, and match_ratio. Flag any cell with match_ratio below 0.85 for review before it goes downstream.
  5. Unfold to CSV
    Write one CSV row per line item, expanding the array, with a UTF-8 BOM so Excel reads CJK and currency text correctly. Or push images into a stored sheet and export CSV server-side with array rows already unfolded — no re-OCR charge.
How do I request invoice line items as JSON instead of one flat string?
Send POST /ocr/fields with a fields array, and model the table as a single field of type 'array' whose children describe one row — for example children description, quantity, unit_price, amount. The engine returns as many rows as it finds, and each cell comes back individually with its own value, bbox, vertices, match_ratio, and bbox_source. If you'd rather not spell out the schema, pass templateId 'invoice' or 'receipt' and the built-in schema supplies the line-item array for you.
What is the exact JSON response shape for line items?
The response is { status: 'success', data: {...} }. Scalar fields sit at data[name] as an object with value, bbox, vertices, match_ratio and bbox_source. The array field's value is a list of rows; each row has a field_bboxes map keyed by child name, and each child holds its own value plus the same coordinate quartet. So in code you read rows at data['line_items'].value and each cell at row['field_bboxes'][child].value.
What coordinate system are the bounding boxes in?
Each bbox has integer keys { xmin, ymin, xmax, ymax } on a 0–1000 normalized grid (0,0 top-left, 1000,1000 bottom-right). It is not in pixels and not left/top/width/height. Convert to pixels with pixel_x = bbox.xmin / 1000 * image_width and pixel_y = bbox.ymin / 1000 * image_height. Each value also carries four ordered vertices (tl, tr, br, bl) for an oriented box that follows a tilted photo. Because the grid is normalized, you can scale it to whatever size the image is displayed at, without tracking the exact pixel dimensions of the file you uploaded.
How do I know which extracted values to trust before pushing them downstream?
Gate on match_ratio, the share of each value's characters actually located on the page (0–1). A value at 0.85 or above is a confident match, usually with bbox_source vision_symbol_match. Below that, it's labeled low_confidence and is worth a human glance. The model returns text, not coordinates. The engine character-matches that text against the symbols the vision OCR detected, so match_ratio reflects how much of the value was found on the real page rather than the model's self-reported certainty.
How do I export the JSON line items to CSV?
Iterate data['line_items'].value, pull each child's value from row['field_bboxes'], and write one CSV row per item — unfolding the array so each line item becomes its own row. Write the file with a UTF-8 BOM (Python's 'utf-8-sig') so Excel reads Japanese, Korean, and currency text correctly. Values are preserved verbatim, including comma separators and full-width characters, so do any normalization in your own code. A stored sheet can also export CSV server-side with array rows already unfolded.

Extract invoice line items to JSON your code can trust

Free tier — 100 scans a month, no credit card. Every cell comes back with its own bounding box and a match ratio.
