Build an invoice validation agent that reads invoices and master service agreements (MSAs), checks each invoice against the contract terms, and produces a structured decision. Wrap Tensorlake’s Document Ingestion API in tool functions that parse the invoice and MSA documents, and use the OpenAI Agents SDK to run the agentic loop and decide which tools to call.
Colab examples are coming soon. This page describes the workflow you will implement. For Agents SDK usage and tool patterns, see the OpenAI Agents SDK docs.

Inputs and output

  • Inputs: invoice PDF, MSA PDF
  • Output: validation decision JSON, list of issues, citations to source locations in the documents

Step 1: Define schemas

Keep schemas small and version them in code. The goal is a stable interface for downstream validation logic.

Invoice fields

from pydantic import BaseModel, Field

class InvoiceFields(BaseModel):
    invoice_number: str | None = Field(default=None, description="Invoice number")
    invoice_date: str | None = Field(default=None, description="Invoice date in ISO 8601 format (YYYY-MM-DD)")
    vendor_name: str | None = Field(default=None, description="Vendor or supplier name")
    customer_name: str | None = Field(default=None, description="Customer name")
    currency: str | None = Field(default=None, description="Currency code such as USD")
    subtotal: float | None = Field(default=None, description="Subtotal amount")
    tax: float | None = Field(default=None, description="Tax amount")
    total: float | None = Field(default=None, description="Total amount")
    billing_period_start: str | None = Field(default=None, description="Billing period start date in ISO 8601 format (YYYY-MM-DD)")
    billing_period_end: str | None = Field(default=None, description="Billing period end date in ISO 8601 format (YYYY-MM-DD)")
    po_number: str | None = Field(default=None, description="Purchase order number if present")

MSA terms

from pydantic import BaseModel, Field

class MsaTerms(BaseModel):
    supplier_name: str | None = Field(default=None, description="Supplier name in the MSA")
    customer_name: str | None = Field(default=None, description="Customer name in the MSA")
    effective_date: str | None = Field(default=None, description="MSA effective date in ISO 8601 format (YYYY-MM-DD)")
    termination_date: str | None = Field(default=None, description="MSA termination date in ISO 8601 format (YYYY-MM-DD), if present")
    payment_terms: str | None = Field(default=None, description="Net terms or payment terms")
    currency: str | None = Field(default=None, description="Contract currency if specified")
    rate_card_summary: str | None = Field(default=None, description="Summary of rates and billing model")
    requires_po: bool | None = Field(default=None, description="Whether a PO is required for invoicing")

Step 2: Define document ingestion tools

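Define tool functions that run Tensorlake structured extraction and return the extracted fields, the page layout used for citations, and the parse ID. The agent calls these tools as needed.
from pydantic import BaseModel
from tensorlake.documentai import DocumentAI, StructuredExtractionOptions

# DocumentAI() picks up your API key from the TENSORLAKE_API_KEY environment variable
doc_ai = DocumentAI()

def _extract(file_url: str, schema_name: str, schema: type[BaseModel]):
    """Run structured extraction on one document and block until it finishes."""
    parse_id = doc_ai.extract(
        file_url=file_url,
        structured_extraction_options=[
            StructuredExtractionOptions(schema_name=schema_name, json_schema=schema),
        ],
    )
    result = doc_ai.wait_for_completion(parse_id)
    if not result.structured_data:
        raise RuntimeError(f"No structured data returned for {file_url} (parse_id={parse_id})")
    # result.pages carries page numbers and bounding boxes used for citations
    return result.structured_data[0].data, result.pages, parse_id

# The docstrings below become tool descriptions when wrapped with function_tool
def parse_invoice(file_url: str) -> dict:
    """Extract invoice fields and page layout (for citations) from an invoice PDF URL."""
    data, pages, parse_id = _extract(file_url, "InvoiceFields", InvoiceFields)
    return {"invoice": data, "citations": pages, "parse_id": parse_id}

def parse_msa(file_url: str) -> dict:
    """Extract contract terms and page layout (for citations) from an MSA PDF URL."""
    data, pages, parse_id = _extract(file_url, "MsaTerms", MsaTerms)
    return {"msa": data, "citations": pages, "parse_id": parse_id}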

Step 3: Validate deterministically

Write validations as code. Keep them explicit and testable. Common checks:
  • Vendor and customer match the MSA
  • Currency matches the contract
  • Billing period dates are within the contract term
  • PO number present when required
  • Totals are consistent with subtotal and tax
  • Line items and rates match the rate card rules you enforce (this requires extending the invoice schema with line items)
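def validate_invoice(invoice: dict, msa: dict) -> dict:
    """Run deterministic checks on extracted invoice fields against MSA terms."""
    issues: list[dict] = []

    def add_issue(code: str, message: str, field: str | None = None) -> None:
        issues.append({"code": code, "message": message, "field": field})

    # PO requirement: enforced only when the MSA explicitly requires a PO
    if msa.get("requires_po") is True and not invoice.get("po_number"):
        add_issue("MISSING_PO", "PO number is required by the MSA", "po_number")

    # Currency: compare normalized codes so "usd" and "USD " still match
    msa_currency = (msa.get("currency") or "").strip().upper()
    invoice_currency = (invoice.get("currency") or "").strip().upper()
    if msa_currency and invoice_currency and msa_currency != invoice_currency:
        add_issue("CURRENCY_MISMATCH", "Invoice currency does not match MSA currency", "currency")

    # Parties: loose, case-insensitive substring match on names
    if msa.get("supplier_name") and invoice.get("vendor_name"):
        if msa["supplier_name"].lower() not in invoice["vendor_name"].lower():
            add_issue("VENDOR_MISMATCH", "Invoice vendor does not match MSA supplier", "vendor_name")

    if msa.get("customer_name") and invoice.get("customer_name"):
        if msa["customer_name"].lower() not in invoice["customer_name"].lower():
            add_issue("CUSTOMER_MISMATCH", "Invoice customer does not match MSA customer", "customer_name")

    # Any issue routes the invoice to human review instead of auto-approval
    status = "valid" if not issues else "needs_review"
    return {"status": status, "issues": issues}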

Step 4: Orchestrate with OpenAI Agents SDK

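Wrap the ingestion tools and the validation function as agent tools, give the agent explicit instructions, and let it decide which tools to call.
from typing import Literal

from agents import Agent, Runner, function_tool  # pip install openai-agents; needs OPENAI_API_KEY
from pydantic import BaseModel

class Issue(BaseModel):
    code: str
    message: str
    field: str | None

class ValidationDecision(BaseModel):
    status: Literal["valid", "needs_review"]
    issues: list[Issue]
    citations: list[str]  # e.g. "invoice, page 1: total"

agent = Agent(
    name="Invoice validator",
    instructions=(
        "Validate an invoice against its MSA. Call parse_invoice and parse_msa to extract fields, "
        "then call validate_invoice with the extracted invoice and msa objects. Report its status "
        "and issues unchanged, and cite the source pages that support each issue."
    ),
    # Plain dict parameters do not fit the strict JSON schema, so relax it for validate_invoice
    tools=[function_tool(parse_invoice), function_tool(parse_msa), function_tool(validate_invoice, strict_mode=False)],
    output_type=ValidationDecision,
)

# Placeholder URLs: point these at your own invoice and MSA
result = Runner.run_sync(agent, "invoice_url: https://example.com/invoice.pdf\nmsa_url: https://example.com/msa.pdf")
print(result.final_output.model_dump_json(indent=2))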

Citations and auditability

Tensorlake returns layout information that includes page numbers and bounding boxes for extracted content. Use citations to show evidence for extracted fields, attach source references to audit logs, and speed up exception review.
Need help building your first agent workflow? Join our Slack Community.