Tensorlake’s Document AI delivers industry-leading accuracy on document parsing. We measure what matters: structural preservation of document layout and downstream usability of the extracted data, the two things that most often break in production. This page presents our benchmarking methodology and results, comparing Tensorlake against leading document parsing solutions.

Our Evaluation Framework

Two-Stage Methodology

We mirror real-world workflows with a two-stage evaluation process.
Stage 1: Document Reading Abilities (OCR and Structural Preservation)
  • Models generate Markdown/HTML output
  • Evaluated using TEDS (Tree-Edit-Distance-based Similarity)
  • Compares predicted Markdown against ground truth, capturing structural fidelity in tables and complex layouts (a minimal scoring sketch follows this list)
  • Answers: “Is this table still a table?” Not just “Is the text similar?”
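For illustration, here is a minimal sketch of a TEDS-style score computed with the zss tree-edit-distance package and lxml. It is not the benchmark's evaluation code: the official TEDS implementation also weights differences in cell text, which this sketch ignores.

```python
# Minimal TEDS-style sketch: structural similarity between two HTML tables.
# Assumes `pip install zss lxml`; trees are labelled by tag name only (no cell text).
from lxml import html
from zss import Node, simple_distance


def to_tree(element) -> Node:
    """Convert an lxml element into a zss Node tree labelled by tag name."""
    node = Node(element.tag)
    for child in element:
        node.addkid(to_tree(child))
    return node


def tree_size(node: Node) -> int:
    return 1 + sum(tree_size(child) for child in node.children)


def teds(pred_html: str, gt_html: str) -> float:
    """TEDS = 1 - edit_distance / max(tree sizes); 1.0 means identical structure."""
    pred = to_tree(html.fragment_fromstring(pred_html))
    gt = to_tree(html.fragment_fromstring(gt_html))
    distance = simple_distance(pred, gt)
    return 1.0 - distance / max(tree_size(pred), tree_size(gt))


# Prints a score below 1.0 because the row structure differs between the two tables.
print(teds("<table><tr><td>a</td><td>b</td></tr></table>",
           "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>"))
```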
Stage 2: Structured JSON Extraction (Downstream Usability)
  • Markdown is passed through a standardized LLM (GPT-4o) with predefined schemas
  • Evaluated using JSON F1 (Field-Level Precision and Recall)
  • Isolates how OCR quality impacts real extraction workflows
  • Precision measures the correctness of extracted fields; recall measures the completeness of required-field capture
  • The F1 score combines both metrics for a holistic view (a minimal scoring sketch follows this list)
  • Answers: “Can automation use this data?” Not just “Is text present?”
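To make the metric concrete, here is a minimal sketch of field-level precision, recall, and F1 over flat predicted and ground-truth records. A production scorer would also need to handle nested schemas and value normalization, which this sketch omits.

```python
# Minimal sketch of field-level JSON F1 over flat dictionaries.
# A field counts as correct only if it is present and its value matches the ground truth exactly.
def json_f1(predicted: dict, ground_truth: dict) -> dict:
    correct = sum(1 for key, value in predicted.items()
                  if key in ground_truth and ground_truth[key] == value)
    precision = correct / len(predicted) if predicted else 0.0      # correctness of extracted fields
    recall = correct / len(ground_truth) if ground_truth else 0.0   # completeness of required fields
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(json_f1(
    {"invoice_number": "INV-017", "total": "98.00"},                           # predicted
    {"invoice_number": "INV-017", "total": "98.10", "due_date": "2024-10-01"}  # ground truth
))
```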
This methodology ensures fair comparisons by varying only the OCR models while keeping extraction constant.
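For reference, a minimal sketch of that fixed extraction step using the OpenAI Python client with structured outputs is shown below; the schema and prompt are illustrative placeholders, not the benchmark's actual schemas.

```python
# Sketch of the Stage 2 harness: the model, prompt, and schema stay fixed,
# and only the Markdown produced by each OCR system changes between runs.
# The schema below is a placeholder, not one of the benchmark schemas.
from openai import OpenAI

client = OpenAI()

INVOICE_SCHEMA = {
    "name": "invoice",
    "schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "string"},
        },
        "required": ["invoice_number", "total_amount"],
        "additionalProperties": False,
    },
}


def extract_fields(markdown: str) -> str:
    """Run the same GPT-4o extraction over Markdown produced by any OCR system."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract the requested fields from the document."},
            {"role": "user", "content": markdown},
        ],
        response_format={"type": "json_schema", "json_schema": INVOICE_SCHEMA},
    )
    return response.choices[0].message.content  # JSON string matching the schema
```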

Document Reading Benchmark Results

Datasets

  • OCRBench v2: 400 diverse document images (invoices, contracts, forms), measuring overall structural and text accuracy. The data was audited to ensure consistency in ground truth.
  • OmniDocBench: 512 document images with complex tables, focusing on table parsing capabilities. We use the v1.5 evaluation code from the official repository.
Key Finding: Tensorlake achieves the highest TEDS score, indicating superior structural preservation while maintaining competitive text accuracy. The gap between open-source and production-grade systems is substantial.

Table Parsing

Evaluated on 512 document images with tables from OmniDocBench (a CVPR-accepted benchmark).
¹ Marker’s number is from the officially published OmniDocBench repository.
Key Finding: On complex, multi-page tables, Tensorlake leads with 86.79% TEDS. Open-source solutions struggle to preserve table structure (sub-70% TEDS).

Structured Extraction Benchmark Results

Datasets used:

  • We collected 100 document pages of proprietary data spanning banking, retail, and insurance sectors. This represents actual production workloads: invoices with water damage, scanned contracts with skewed text, bank statements with multi-level tables.
  • Ground-truth schemas were generated using Gemini 2.5 Pro and audited by human reviewers to ensure accuracy.
Key Findings:
  • Tensorlake achieves 91.7% F1, demonstrating that superior OCR quality feeds better extraction
  • The gap between 91.7% and 68.9% F1 is massive: roughly 5 extra fields correctly extracted out of every 20 (see the check after this list)
  • In production processing thousands of documents daily, this accuracy gap compounds into significant error reduction
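As a back-of-envelope check of the "five extra fields" figure (treating F1 as the approximate fraction of fields extracted correctly, which is a simplification):

```python
# Rough check of the field-level gap, assuming 20 required fields per document.
fields_per_document = 20
tensorlake_f1, docling_f1 = 0.917, 0.689
extra_correct_fields = (tensorlake_f1 - docling_f1) * fields_per_document
print(round(extra_correct_fields, 1))  # ~4.6, i.e. roughly 5 more correct fields per 20
```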

Production Impact Example

For an insurance claims processor handling 10,000 documents per month:
  • At 85% F1: 1,500 documents require manual review
  • At 90% F1: 1,000 documents require manual review
  • At 91.7% F1 (Tensorlake): 830 documents require manual review
Result: Tensorlake cuts monthly manual reviews from 1,500 → 830 (a 45% reduction vs the 85% baseline).
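These figures follow from a simple routing model in which the share of documents sent to manual review is approximated by 1 - F1:

```python
# Back-of-envelope model behind the claims-processing example:
# documents estimated to fall below the accuracy bar are routed to manual review,
# approximated here as (1 - F1) of the monthly volume.
def manual_reviews(monthly_documents: int, f1: float) -> int:
    return round(monthly_documents * (1 - f1))


for f1 in (0.85, 0.90, 0.917):
    print(f"{f1:.1%} F1 -> {manual_reviews(10_000, f1):,} manual reviews")  # 1,500 / 1,000 / 830
```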

Cost & Performance Comparison

Accuracy without affordability isn’t practical. Here’s the complete picture:
Provider                        Cost / 1,000 Pages   TEDS     JSON F1
Docling (open-source)           Free*                63.3%    68.9%
Marker (open-source)            Free*                71.1%    71.2%
Azure Document Intelligence     $10                  78.6%    88.1%
AWS Textract                    $15                  81.0%    88.4%
Tensorlake                      $10                  84.1%    91.7%
*Free but requires self-hosting infrastructure

Visual Comparison: Where Competitors Fail

Example: Contact Information Extraction

When parsing Section 21 (NOTICES) of a real estate contract:
  • Azure: Missing opening parenthesis in the phone number; the two-column layout collapsed into a confusing single column.
  • AWS Textract: Completely wrong phone number in the buyer field (it shows the seller’s phone); the buyer’s phone, (123)456-7890, is missing entirely.
  • Tensorlake: Perfect extraction of both phone numbers, (123)456-7890 and (456)789-1234; the two-column structure is preserved with clear buyer/seller separation, and all contact fields are accurately captured.
In legal documents, phone numbers are critical contact information. Errors like these cause compliance issues and workflow failures.

Reproducibility

To reproduce our table results:
  1. Generate Markdown outputs using models listed above
  2. Run evaluation from OmniDocBench repository
  3. Use the table subset (512 images) with the v1.5 code version

Deep Dive: Full Benchmark Analysis

Read our comprehensive blog post: The Document Parsing Benchmark That Actually Matters. The blog includes:
  • Detailed failure mode analysis
  • Additional benchmark datasets
  • Technical methodology deep-dive
  • Production deployment case studies
  • Code examples and reproducibility guides
Benchmarks were conducted in October 2024 using OCRBench v2 (400 images), OmniDocBench (512 table images), and a proprietary enterprise dataset (100 pages). All results are reproducible using public datasets and standardized evaluation frameworks.