Receipt Evals

Small eval-driven workflow for extracting receipt details and deciding which expenses should be reviewed.

Status: Selected project
Type: Evals
Link: External project

The question

How do you make an AI workflow more reliable before adding more complexity?

I started with a deliberately small receipt-review pipeline:

  1. extract_receipt_details(image_path) reads a receipt image and returns structured data.
  2. evaluate_receipt_for_audit(receipt_details) decides whether the expense needs review.
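The separation between the two steps can be sketched with Pydantic models as the contract on each side. This is a minimal illustration, not the project's actual code: the field names (merchant, total, items, needs_audit, reasons), the 50.00 threshold, and the rule-based audit logic are all assumptions made for the example. The extraction step is left out because it depends on a model call; only its output type is shown.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class LineItem(BaseModel):
    # Hypothetical line-item shape; the real schema may differ.
    description: str
    amount: float


class ReceiptDetails(BaseModel):
    # Output contract of the extraction step (assumed fields).
    # Optional fields let the extractor say "not found" instead of guessing.
    merchant: Optional[str] = None
    total: Optional[float] = None
    items: List[LineItem] = Field(default_factory=list)


class AuditDecision(BaseModel):
    # Output contract of the audit step (assumed fields).
    needs_audit: bool
    reasons: List[str] = Field(default_factory=list)


def evaluate_receipt_for_audit(receipt_details: ReceiptDetails) -> AuditDecision:
    """Rule-based stand-in for the audit step; the rules here are invented."""
    reasons = []
    if receipt_details.merchant is None:
        reasons.append("missing merchant")
    if receipt_details.total is None:
        reasons.append("missing total")
    elif receipt_details.total > 50.0:  # assumed review threshold
        reasons.append("total over 50.00")
    return AuditDecision(needs_audit=bool(reasons), reasons=reasons)


if __name__ == "__main__":
    details = ReceiptDetails(merchant="Cafe", total=72.5)
    print(evaluate_receipt_for_audit(details))

    # Malformed extraction output fails at the contract boundary,
    # before it can reach the business decision.
    try:
        ReceiptDetails(total="not a number")
    except ValidationError as exc:
        print("rejected:", type(exc).__name__)
```

Because the audit function only ever sees a validated ReceiptDetails, an extraction error and a bad audit decision show up as two different kinds of failure.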

Why start small

The first goal is not a broad product surface. It is to understand failure modes. Each run saves the extraction JSON and the audit JSON as separate files, repeated runs are kept rather than overwritten, and outputs can be compared against labeled examples with a lightweight assessment helper.
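A rough sketch of what that bookkeeping might look like, using only the standard library. The file naming scheme and the helper names save_run and compare_to_label are hypothetical and chosen for this example; the project's own helpers may work differently.

```python
import json
from pathlib import Path


def save_run(output_dir, receipt_id, stage, payload):
    """Write one stage's output to its own JSON file.

    Each call picks the next unused run number, so earlier outputs
    for the same receipt and stage are never overwritten.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    run = 1
    while True:
        path = output_dir / f"{receipt_id}.{stage}.run{run}.json"
        if not path.exists():
            break
        run += 1
    path.write_text(json.dumps(payload, indent=2, sort_keys=True))
    return path


def compare_to_label(predicted, expected):
    """Return {field: (predicted_value, expected_value)} for each labeled field that differs.

    Only fields present in the label are checked; extra predicted fields are ignored.
    """
    return {
        key: (predicted.get(key), want)
        for key, want in expected.items()
        if predicted.get(key) != want
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Extraction and audit outputs go to separate files.
        print(save_run(tmp, "receipt_001", "extraction", {"merchant": "Cafe", "total": 12.0}).name)
        print(save_run(tmp, "receipt_001", "audit", {"needs_audit": False}).name)
        # A second extraction run is stored next to the first one.
        print(save_run(tmp, "receipt_001", "extraction", {"merchant": "Cafe", "total": 12.5}).name)

    label = {"merchant": "Cafe", "total": 12.5}
    print(compare_to_label({"merchant": "Cafe", "total": 12.0}, label))
```

Keeping every run makes it possible to see whether a failure is stable or intermittent, and a per-field diff points at which part of the extraction went wrong rather than reporting a single pass/fail.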

The next step is a batch eval harness once the useful metrics are clear.

What it demonstrates

  • Structured output contracts with Pydantic
  • Image-to-data extraction
  • Explicit separation between extraction and business decisions
  • Ground-truth comparison
  • An eval-driven approach to iteration

The code is available on GitHub.