Turn a PDF packet into clean spreadsheet rows and a review queue.

Goal: each invoice, statement, form, or order packet becomes a clean row set with source document IDs, page numbers, line items, totals, missing-field flags, and a handoff note.

1. Lock the field map

The first pass should define the document type, required columns, source document ID, page reference, vendor or customer field, date field, total field, line-item columns, and review rules for missing or inconsistent values.
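
As a rough illustration, the field map could be written down as plain data before any extraction runs. The sketch below is an assumption, not a fixed schema: the `FIELD_MAP` keys, the choice of required fields, and the `missing_fields` helper are all hypothetical names chosen to mirror the sample row in the next step.

```python
# Hypothetical field map for one document type (an invoice packet).
# Which fields count as "required" is an assumption for illustration.
FIELD_MAP = {
    "document_type": "invoice",
    "required": ["document_id", "page", "vendor", "date",
                 "purchase_order", "document_total"],
    "line_item": ["item_code", "description", "quantity",
                  "unit_price", "line_total"],
    "optional": ["tax"],
}


def missing_fields(row: dict, field_map: dict = FIELD_MAP) -> list:
    """Return the required fields that are absent, None, or blank in a row."""
    missing = []
    for name in field_map["required"]:
        value = row.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing
```

Keeping the map as data rather than logic means a second document type (a statement or order form) would only need a second map, under the same assumption that review rules key off the required list.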

2. Use a stable row shape

{
  "document_id": "INV-1001",
  "page": 1,
  "vendor": "Northwind Supplies",
  "date": "2026-05-01",
  "purchase_order": "PO-7781",
  "item_code": "SKU-CLN-25",
  "description": "Cleaner concentrate",
  "quantity": 4,
  "unit_price": 250.00,
  "line_total": 1000.00,
  "tax": 80.50,
  "document_total": 1280.50,
  "review_status": "ready"
}
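
One way to keep that shape stable is to declare it once and derive the spreadsheet columns from the declaration. This is a minimal sketch: the `LineItemRow` class and `rows_to_csv` helper are hypothetical names, and it assumes one row per line item with document-level fields (vendor, date, tax, document total) repeated on each row, which is how the sample above reads.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class LineItemRow:
    """One line item, carrying its source document ID and page number."""
    document_id: str
    page: int
    vendor: Optional[str]
    date: Optional[str]            # ISO date string such as "2026-05-01"
    purchase_order: Optional[str]
    item_code: str
    description: str
    quantity: float
    unit_price: float
    line_total: float
    tax: Optional[float]
    document_total: Optional[float]
    review_status: str = "ready"


# Column order comes from the dataclass, so every export has the same header.
COLUMNS = [f.name for f in fields(LineItemRow)]


def rows_to_csv(rows) -> str:
    """Serialize rows to CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

Because the header is derived from the class, adding a column later changes the export in one place only.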

3. Build the first paid slice

  • Process one sample packet or one recurring document type.
  • Return clean CSV or Excel-ready rows with source references.
  • Separate ready rows from rows that need buyer review.
  • Compare line items, tax, and document totals where those fields exist.
  • Deliver a short handoff note so the buyer can approve the next batch.
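
The split between ready rows and review rows, plus the handoff note, can be sketched in a few lines. Everything named here is an assumption for illustration: the `REVIEW_FIELDS` tuple, the `needs_review` status value, the `missing_fields` column, and the wording of the note.

```python
# Fields whose absence sends a row to the review queue (assumed list).
REVIEW_FIELDS = ("vendor", "date", "purchase_order", "document_total")


def split_rows(rows):
    """Return (ready, review) lists; review rows record what is missing."""
    ready, review = [], []
    for row in rows:
        missing = [name for name in REVIEW_FIELDS if row.get(name) in (None, "")]
        if missing:
            review.append({**row, "review_status": "needs_review",
                           "missing_fields": missing})
        else:
            ready.append({**row, "review_status": "ready"})
    return ready, review


def handoff_note(ready, review) -> str:
    """Summarize the batch in one sentence for the buyer."""
    documents = {row["document_id"] for row in ready + review}
    return (f"{len(documents)} document(s) processed: {len(ready)} row(s) ready, "
            f"{len(review)} row(s) need buyer review.")
```

Copying each row with `{**row, ...}` leaves the extracted input untouched, so the original values stay available if the buyer disputes a flag.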

4. Acceptance checks

  • Every row includes document ID and page reference.
  • Missing vendor, date, purchase order, or total is visible in the review queue.
  • The sum of line-item totals, plus tax where present, is checked against the declared document total.
  • Duplicate document IDs update the existing document group instead of creating a second, conflicting one.
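
These checks could be automated roughly as follows. The sketch assumes that `tax` and `document_total` are document-level values repeated on each line-item row, that the declared total equals the sum of line totals plus tax, and that a one-cent tolerance is acceptable; the function names are hypothetical. Grouping by `document_id` means a repeated ID lands in the same group rather than starting a new one.

```python
from collections import defaultdict
from decimal import Decimal


def to_money(value) -> Decimal:
    """Convert via str so float inputs like 80.5 stay exact."""
    return Decimal(str(value))


def acceptance_problems(rows, tolerance=Decimal("0.01")):
    """Return human-readable problems; an empty list means the batch passes."""
    problems = []
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        if not row.get("document_id") or row.get("page") is None:
            problems.append(f"row {index}: missing document_id or page")
            continue
        # Rows sharing a document ID join the same document group.
        groups[row["document_id"]].append(row)

    for doc_id, doc_rows in groups.items():
        declared = doc_rows[0].get("document_total")
        if declared is None:
            problems.append(f"{doc_id}: missing document_total")
            continue
        tax = to_money(doc_rows[0].get("tax") or 0)
        line_sum = sum((to_money(r.get("line_total") or 0) for r in doc_rows),
                       Decimal("0"))
        if abs(line_sum + tax - to_money(declared)) > tolerance:
            problems.append(
                f"{doc_id}: line items plus tax = {line_sum + tax}, "
                f"declared total = {declared}"
            )
    return problems
```

Under these assumptions, the single sample row from step 2 would not reconcile on its own (1000.00 + 80.50 against a declared 1280.50), so the check only passes once the document's remaining line items are included in the same group.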