Accuracy methodology

What 98% accuracy means.

ExtractInvoice reports headline accuracy on core invoice fields: vendor, dates, totals, currency, document type, invoice number, terms, and PO number when those fields are present in the labeled ground truth. Line items are measured separately because receipt modifiers and zero-amount options can otherwise distort the headline number.

Measured fields

Vendor name
Invoice number
Invoice date and due date
Subtotal, tax, total, and currency
Document type
Payment terms when present on the invoice
PO number when present on the invoice
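The page does not publish its scoring code. The following is a minimal sketch of how a core field-level accuracy figure like this could be computed. It assumes a few things: hypothetical field names, a simple normalization step, and exact matching after normalization. It shows that fields missing from the ground truth are skipped rather than counted, and that line items are left out of the headline figure.

```python
from typing import Any, Optional

# Hypothetical field names; the actual ExtractInvoice schema is not documented here.
CORE_FIELDS = (
    "vendor_name", "invoice_number", "invoice_date", "due_date",
    "subtotal", "tax", "total", "currency", "document_type",
    "payment_terms", "po_number",
)


def normalize(value: Any) -> Optional[str]:
    """Assumed normalization: collapse whitespace and case-fold."""
    if value is None:
        return None
    text = " ".join(str(value).split()).casefold()
    return text or None


def core_field_accuracy(documents: list[tuple[dict, dict]]) -> float:
    """Share of ground-truth core fields that the prediction matches.

    Each item is (predicted, ground_truth). A field absent from the ground
    truth (e.g. no PO number on the invoice) is skipped, so it counts as
    neither correct nor incorrect. Line items are not scored here; they
    would be reported as a separate metric.
    """
    correct = 0
    total = 0
    for predicted, truth in documents:
        for field in CORE_FIELDS:
            expected = normalize(truth.get(field))
            if expected is None:
                continue  # field not present on this document
            total += 1
            if normalize(predicted.get(field)) == expected:
                correct += 1
    return correct / total if total else 0.0
```

In this sketch, a document whose ground truth has no PO number contributes nothing for that field. That is how a "when present" rule keeps optional fields from inflating or deflating the score.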

How to read the claim

The headline number is a core field-level accuracy claim, not a promise that every document will be extracted perfectly. The benchmark runs below score the core fields listed above, including terms and PO numbers when they are present.

Line items are still extracted and tested, but priced sale rows and strict receipt-modifier reconstruction are reported as their own quality metrics. The product is designed around review: approve clean fields quickly, correct low-confidence fields before exporting, and keep the final accounting data under your control.

Current benchmark runs

These are reproducible internal spot checks from the labeled test corpus. The stratified benchmark sample intentionally overrepresents files that are harder than clean digital PDFs. That makes it better suited to finding cases that need review than to measuring performance on ideal uploads alone.

100-item coverage benchmark

98.5%

100 labeled invoices and receipts with stratified edge-case coverage

Core field accuracy; strict line-item accuracy was tracked separately at 89.6%

Admin Testing Center: seed=1293106513, count=100, stratified=true

Repeat coverage benchmark

99.2%

99 completed labeled invoices and receipts from the same seed

Core field accuracy; strict line-item accuracy was tracked separately at 90.5%

Admin Testing Center: seed=1293106513, count=100, stratified=true
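Both runs were drawn with the same parameters (seed=1293106513, count=100, stratified=true). The Admin Testing Center's sampler is not described on this page. The following hypothetical sketch shows why a fixed seed makes a stratified draw repeatable. The function name, the round-robin strategy, and the category labels are illustrative assumptions, not the tool's actual implementation.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, count, seed):
    """Draw up to `count` items spread across strata, reproducibly.

    Hypothetical sketch: items are grouped by category (e.g. native PDF,
    scan, handwritten), each group is shuffled with a seeded RNG, and items
    are taken round-robin so every category is represented.
    """
    rng = random.Random(seed)  # local RNG: same seed and corpus -> same sample
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    # Shuffle each stratum in a fixed (sorted) order so the draw is deterministic.
    pools = []
    for key in sorted(strata):
        pool = list(strata[key])
        rng.shuffle(pool)
        pools.append(pool)

    sample = []
    while len(sample) < count and any(pools):
        for pool in pools:
            if pool and len(sample) < count:
                sample.append(pool.pop())
    return sample
```

Because the sample depends only on the corpus, the seed, and the count, rerunning with the same parameters revisits the same documents. Score differences between runs then reflect extraction behavior rather than a different document mix.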

What usually works best

Typed invoices, native PDFs, clear scans, and well-lit photos typically produce the strongest results. Standard vendor layouts also tend to produce higher confidence.

What might need review

These cases often still parse correctly, but they are where confidence scores and a quick human check matter most.

Handwritten invoices
Low-resolution scans or blurry photos
Unusual table layouts
Missing labels, cropped pages, or faded text
Invoices with multiple currencies or manual corrections

Try it against your own invoices.

The best test is your actual workflow, with the vendors and layouts you see every day.
