
Using the space ocr Claude skill to turn documents into structured, queryable data

A dependency-free Claude Code skill that turns document images into structured, located fields and stores them as queryable sheets behind the space ocr API — no database, SDK, or MCP server.

The space ocr skill is a capability you add to Claude Code: a Claude skill, packaged as a Claude Code plugin. Install it once, and from then on Claude Code can take a document image — an invoice, a receipt, a business card, an ID, a form — and turn it into structured fields rather than a wall of loose text. Each field comes back tied to the exact spot on the page it was read from, and the results can be stored and queried through one API.

It is worth being clear about what this replaces. The common alternative is to paste an image into a chat and ask the model to read it — which works until you need the same fields every time, a record of where each value sat on the page, or somewhere to keep the results. The other alternative is to stand up your own pipeline: an OCR engine, a parsing layer, a database, maybe a vector store. The space ocr skill sits between those. Extraction runs on the API's servers and returns a fixed shape; the same API stores the results as folders, sheets, and rows. There is no database to build and nothing to embed.

The runtime is deliberately small. The skill ships a single standard-library Python script, scripts/space_ocr.py. There is no pip install, no MCP server, and no SDK; the only requirement is python3. Every command prints JSON to stdout, which is what lets the assistant chain calls and reason over results. Requests authenticate with an Authorization: Bearer spocr_... header against the base URL https://api.space-ocr.com (override with SPACE_OCR_API_BASE if you need to).
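
To make the authentication scheme concrete, here is a minimal sketch of how a standard-library client might assemble such a request. It is not the code inside scripts/space_ocr.py; the helper name build_request and the "/example" path are placeholders invented for illustration, and the request is only built, never sent.

```python
import os
import urllib.request

DEFAULT_BASE = "https://api.space-ocr.com"


def build_request(path, api_key, env=None):
    """Build (but do not send) a request carrying the Bearer header."""
    env = os.environ if env is None else env
    # SPACE_OCR_API_BASE overrides the default base URL when it is set.
    base = env.get("SPACE_OCR_API_BASE", DEFAULT_BASE).rstrip("/")
    return urllib.request.Request(
        base + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# "/example" is a hypothetical path, not a documented endpoint.
req = build_request("/example", "spocr_xxx", env={})
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the environment in as a plain mapping keeps the helper easy to exercise without touching the real process environment.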

Install it

You install the skill once, from inside Claude Code, using two slash commands. The first registers the plugin marketplace; the second installs the space-ocr plugin from it. After this, Claude Code knows the skill exists and can invoke the space_ocr.py client on your behalf — you do not run these two commands again per project.

/plugin marketplace add oisidonut/claude-space-ocr-skill
/plugin install space-ocr@space-ocr

Set up the API key

Create an API key at https://space-ocr.com → Settings → API Keys. New accounts get 100 free scans with no card required. The script reads the key from the environment variable SPACE_OCR_API_KEY (a value beginning with spocr_), or from a .env file in the project root. Once the key is in place, a one-line balance call confirms the setup end to end — it authenticates and prints your remaining quota. Reads like this cost nothing.
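
The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual parser: the function name load_api_key is hypothetical, the environment variable is assumed to take precedence over the .env file, and only simple NAME=value lines are handled.

```python
import os
from pathlib import Path


def load_api_key(env=None, dotenv_path=".env"):
    """Return the API key from the environment, else from a .env file."""
    env = os.environ if env is None else env
    key = env.get("SPACE_OCR_API_KEY")  # assumed to win over .env
    if not key:
        path = Path(dotenv_path)
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                # Skip comments and anything that is not NAME=value.
                if line.startswith("#") or "=" not in line:
                    continue
                name, _, value = line.partition("=")
                if name.strip() == "SPACE_OCR_API_KEY":
                    key = value.strip().strip("'\"")
                    break
    if not key:
        raise RuntimeError("SPACE_OCR_API_KEY is not set")
    if not key.startswith("spocr_"):
        raise ValueError("API key should begin with 'spocr_'")
    return key
```

Failing early on a missing or malformed key gives a clearer error than a rejected request later on.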

export SPACE_OCR_API_KEY=spocr_xxx
python3 scripts/space_ocr.py balance

The balance output reports free.remaining (your free scans), any flatfee allowance, and your paid balance. The quota model is simple: one OCR call, or one uploaded image, costs exactly 1 scan; reads cost 0. Viewing and querying stored rows are free. If a scan fails — for example, the engine cannot make sense of the image — it is auto-refunded, so a bad input does not silently eat quota.
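
The quota arithmetic is simple enough to write down. The sketch below assumes the balance JSON nests the free allowance as {"free": {"remaining": N}}, inferred from the free.remaining field name; the flatfee and paid balances are left out because their exact keys are not shown here. Both function names are hypothetical.

```python
def scans_required(image_count, read_count=0):
    """Each OCR call or uploaded image costs 1 scan; reads cost 0."""
    return image_count * 1 + read_count * 0


def free_scans_cover(balance, image_count):
    """True if the free allowance alone covers a batch of images."""
    remaining = balance["free"]["remaining"]  # assumed JSON shape
    return scans_required(image_count) <= remaining


balance = {"free": {"remaining": 100}}  # illustrative value, not real output
print(free_scans_cover(balance, 40))   # True
print(free_scans_cover(balance, 120))  # False
```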

Extract one document

For a true one-off, point ocr at a single image and pick a built-in template. The template fixes which fields are pulled, so the output shape is predictable. Below, the invoice template returns invoice fields — number, date, vendor, total, line items, and so on — each with its on-page location.

python3 scripts/space_ocr.py ocr invoice.jpg --template invoice

Built-in --template ids cover the common documents: invoice, receipt, business_card, purchase_order, delivery, quote, bankbook, passport, driver_license, resident_card, my_number_card, and residence_card. For a document type that isn't on that list, pass your own schema with --fields <schema.json>, or use --auto to let the engine infer roughly four to eight fields. --auto is not a blank cheque: it has rejection gates, so an unstructured, empty, or sideways photo returns an error and the scan is refunded rather than inventing fields to fill the gap. The image argument can be a file path, a URL, or base64.
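
Because every command prints JSON to stdout, the CLI is easy to wrap from another script. The following is a minimal sketch of that pattern, not part of the skill: ocr_argv and run_json are hypothetical helper names, and the shape of the returned JSON is left unspecified because it is not spelled out here.

```python
import json
import subprocess
import sys

# Built-in template ids, as listed above.
TEMPLATES = {
    "invoice", "receipt", "business_card", "purchase_order", "delivery",
    "quote", "bankbook", "passport", "driver_license", "resident_card",
    "my_number_card", "residence_card",
}


def ocr_argv(image, template, script="scripts/space_ocr.py"):
    """Build the argument list for `ocr <image> --template <id>`."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown built-in template: {template}")
    # sys.executable is the running interpreter, standing in for python3.
    return [sys.executable, script, "ocr", image, "--template", template]


def run_json(argv):
    """Run a command and parse its stdout as JSON."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)


# Example (runs the real CLI, so it spends one scan):
# result = run_json(ocr_argv("invoice.jpg", "invoice"))
```

With check=True a non-zero exit raises CalledProcessError, so a failed command surfaces as an exception rather than as a JSON parse error.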

A folder of documents becomes a sheet

Once you have more than one document, don't extract them one by one and read the JSON back. Create a sheet with a column schema, then upload the images into it; OCR runs server-side, asynchronously, and the extracted values land as rows behind the API. Pass --wait to block until every upload's OCR has finished. This is the point where space ocr stops being an OCR call and starts being storage.

# 1) define the columns once, then create the sheet
python3 scripts/space_ocr.py create sheet /invoices "March" --columns columns.json
# -> returns a path like /invoices/8G90wq...  (reuse this uniqueKey path)

# 2) drop every image in; OCR runs server-side (async), --wait blocks until done
python3 scripts/space_ocr.py upload /invoices/8G90wq... *.jpg --wait

Paths gotcha. Folders are addressed by their name (/invoices). A sheet or memo is addressed by the uniqueKey path that create returns (for example /invoices/8G90wq...), not by its display name. A sheet titled "March" does not live at /invoices/March. Capture and reuse the path from create for every later upload, view, query, and edit. To re-find it, view or space the parent folder and read the matching item's path.
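
Re-finding a sheet by its title might look like the sketch below. The listing shape is an assumption made purely for illustration: a dict with an "items" list whose entries carry "name" and "path" keys. The real JSON may differ, and the paths in the sample data are invented placeholders.

```python
def find_item_path(listing, title):
    """Return the uniqueKey path of the single item whose name matches."""
    matches = [
        item["path"]
        for item in listing["items"]  # hypothetical listing shape
        if item.get("name") == title
    ]
    if len(matches) != 1:
        # Display names are not addresses, so duplicates are possible.
        raise LookupError(f"expected one item named {title!r}, found {len(matches)}")
    return matches[0]


listing = {
    "items": [
        {"name": "March", "path": "/invoices/PLACEHOLDER_KEY_1"},
        {"name": "April", "path": "/invoices/PLACEHOLDER_KEY_2"},
    ]
}
print(find_item_path(listing, "March"))  # /invoices/PLACEHOLDER_KEY_1
```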

If you need to drive the async OCR yourself instead of blocking, the upload call returns job ids you can poll with job <JOB_ID>. Both ocr and upload also accept an Idempotency-Key: retrying a call with the same key replays the cached result and does not charge a second scan — useful for safe retries inside a batch.
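
A retry only replays the cached result if it sends the same key, so the key should be derived from the input rather than generated fresh each time. One possible approach, sketched here with a hypothetical helper, is to hash the image bytes together with the template id. How the key is then passed to the CLI or API is not covered by this sketch.

```python
import hashlib


def idempotency_key(image_bytes, template):
    """Derive a stable key from the image content and the template id."""
    digest = hashlib.sha256()
    digest.update(template.encode("utf-8"))
    digest.update(b"\0")  # separator so the two parts cannot run together
    digest.update(image_bytes)
    return digest.hexdigest()


first = idempotency_key(b"fake image bytes", "invoice")
retry = idempotency_key(b"fake image bytes", "invoice")
print(first == retry)  # True
```

The same image submitted under a different template yields a different key, so switching templates is not mistaken for a retry.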

Answer questions from the stored rows

With the documents now in a sheet, answering questions about them is a query, not another round of OCR. Push the work to the server with --where, --sort, and --limit so only the rows you need come back into context. Reads are free, so filtering is effectively your SELECT statement over the sheet. The query below pulls invoices with a total of at least 40,000, sorted highest first, capped at twenty rows.

python3 scripts/space_ocr.py query /invoices/8G90wq... \
  --where 'total>=40000' --sort total:desc --limit 20

query returns lean rows with the bounding boxes dropped, which is what you want for reasoning and aggregation; view returns the same rows with the box geometry intact when you need to point at the page. The rule of thumb: once a document is a row, you query the row — you do not re-open the source image or spend a scan re-reading it.
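
To pin down what the example query asks for, here is a local Python equivalent of --where 'total>=40000' --sort total:desc --limit 20 over a handful of made-up rows. It is only an illustration of the semantics; the real filtering happens server-side, and the sketch assumes total is stored as a number rather than a verbatim string such as "40,000".

```python
def select_rows(rows, min_total, limit):
    """Filter by total >= min_total, sort descending, keep the first `limit`."""
    kept = [row for row in rows if row["total"] >= min_total]
    kept.sort(key=lambda row: row["total"], reverse=True)
    return kept[:limit]


# Invented sample rows, not real API output.
rows = [
    {"vendor": "A", "total": 52000},
    {"vendor": "B", "total": 12000},
    {"vendor": "C", "total": 40000},
    {"vendor": "D", "total": 98000},
]
print([row["vendor"] for row in select_rows(rows, 40000, 20)])
# ['D', 'A', 'C']
```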

The four rules that keep it lean and trustworthy

The skill is built around four behaviors, and following them is what keeps usage cheap and the results checkable:

  1. Store, don't dump. After more than one document, write rows into a sheet instead of pasting raw OCR JSON back into the conversation. The heavy data lives behind the API, not in the context window.
  2. Check before you scan. Run balance before a batch so you know there's quota for the whole folder, and reuse an existing row rather than re-scanning a document you've already processed.
  3. Answer from stored rows. Use query <sheet> with server-side --where / --sort / --limit. Reads are free; re-OCR is not.
  4. Cite the location, flag uncertainty. Every value carries a field_bboxes location, so say where on the page it came from, and surface anything blank or low-confidence rather than asserting it. Prefer verbatim extraction — copy values exactly as printed instead of normalizing dates or computing totals, because reformatted or derived values can't be anchored to the page and their boxes drift.
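
Rule 4 can be sketched as a small review pass over one row. The shapes here are assumptions for illustration only: values is a dict of field name to extracted text, and field_bboxes maps the same names to a box. Confidence scores are not modeled because their format is not described here.

```python
def review_row(values, field_bboxes):
    """Split a row into citable fields and fields that need a human look."""
    cited, flagged = {}, []
    for name, value in values.items():
        box = field_bboxes.get(name)
        if value in (None, "") or not box:
            # Blank value or no location: surface it instead of asserting it.
            flagged.append(name)
        else:
            cited[name] = {"value": value, "box": box}
    return cited, flagged


values = {"number": "INV-001", "date": "", "total": "40,000"}
field_bboxes = {
    "number": [(100, 50), (300, 50), (300, 80), (100, 80)],
    "date": [(100, 90), (300, 90), (300, 120), (100, 120)],
}
cited, flagged = review_row(values, field_bboxes)
print(sorted(cited))  # ['number']
print(flagged)        # ['date', 'total']
```
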

Why the values are verifiable. Each returned value carries a 4-point bounding box re-anchored to the actual Vision-API symbols on the page — not an LLM's guess at coordinates — on a 0–1000 normalized grid. Because the location is real, the web dashboard can draw the box on the document, and you can check any extracted value by eye against the spot it was read from.
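
Drawing a box on the page means scaling it from the 0–1000 grid to the image's pixel size. A minimal sketch of that conversion follows, assuming each of the four points is an (x, y) pair; the point order and the helper name to_pixels are assumptions.

```python
def to_pixels(box, width, height):
    """Scale a 4-point box from the 0-1000 grid to pixel coordinates."""
    return [
        (round(x * width / 1000), round(y * height / 1000))
        for x, y in box
    ]


# A made-up box on a 2000 x 3000 pixel page image.
box = [(100, 200), (500, 200), (500, 250), (100, 250)]
print(to_pixels(box, 2000, 3000))
# [(200, 600), (1000, 600), (1000, 750), (200, 750)]
```

Because the grid is normalized, the same stored box works for any rendering size of the document.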

In short

Install the plugin once, set an API key, and you have a dependency-free way for Claude Code to extract structured, located fields from documents and keep them in a queryable workspace — no database, no vector store, no SDK. Single documents go through ocr; a folder goes through create sheet plus upload; questions are answered with free query reads; and every value can be traced back to where it sits on the page.
