Our Evaluation Framework
Two-Stage Methodology
We mirror real-world workflows with a two-stage evaluation process:

Stage 1: Document Reading Abilities (OCR and Structural Preservation)
- Models generate Markdown/HTML output
- Evaluated using TEDS (Tree Edit Distance Similarity)
- Compares predicted Markdown against the ground truth, measuring structural fidelity in tables and complex layouts (see the sketch below)
- Answers: “Is this table still a table?” Not just “Is the text similar?”
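To make the metric concrete, here is a minimal sketch of TEDS, assuming tables serialized in bracket notation and scored with the `apted` tree edit distance package. The official benchmark scoring code (which operates on full HTML trees) is the source of truth; this is only an illustration.

```python
# Minimal TEDS sketch (assumes `pip install apted`). Illustrative only;
# the benchmark's official scoring code is authoritative.
from apted import APTED
from apted.helpers import Tree


def teds(pred_brackets: str, gt_brackets: str) -> float:
    """Tree Edit Distance Similarity between two trees in bracket notation,
    e.g. "{table{tr{td}{td}}{tr{td}{td}}}" for a 2x2 table."""
    pred, gt = Tree.from_text(pred_brackets), Tree.from_text(gt_brackets)
    distance = APTED(pred, gt).compute_edit_distance()
    # Normalize by the size of the larger tree: 1.0 means identical structure.
    n_pred, n_gt = pred_brackets.count("{"), gt_brackets.count("{")
    return 1.0 - distance / max(n_pred, n_gt, 1)


if __name__ == "__main__":
    gt = "{table{tr{td}{td}}{tr{td}{td}}}"   # 2x2 table
    pred = "{table{tr{td}{td}}{tr{td}}}"     # second row lost a cell
    print(f"TEDS = {teds(pred, gt):.3f}")
```

Because the score is computed on the tree, dropping a cell or flattening a row lowers TEDS even when every character of text is recognized correctly.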
Stage 2: Structured Extraction
- The Markdown output is passed through a standardized LLM (GPT-4o) with predefined schemas
- Evaluated using JSON F1 (Field-Level Precision and Recall)
- Isolates how OCR quality impacts real extraction workflows
- Precision measures the correctness of extracted fields; recall measures the completeness of required-field capture
- The F1 score combines both metrics for a holistic view (see the sketch below)
- Answers: “Can automation use this data?” Not just “Is text present?”
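As a minimal sketch of the field-level scoring, assuming flat key-value JSON and exact-match comparison of values (the `field_f1` helper and the invoice fields are illustrative, not the benchmark's actual scorer):

```python
# Minimal field-level F1 sketch; assumes flat JSON objects and exact-match
# comparison of field values. Illustrative only, not the benchmark scorer.
def field_f1(predicted: dict, ground_truth: dict) -> dict:
    # True positives: fields present in both with matching values.
    tp = sum(1 for k, v in ground_truth.items() if predicted.get(k) == v)
    precision = tp / len(predicted) if predicted else 0.0    # correctness of extracted fields
    recall = tp / len(ground_truth) if ground_truth else 0.0  # completeness of required fields
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gt = {"invoice_number": "INV-1042", "total": "1,250.00",
          "currency": "USD", "due_date": "2024-07-01"}
    pred = {"invoice_number": "INV-1042", "total": "1,250.00",
            "currency": "USO"}  # one OCR error, one missing field
    print(field_f1(pred, gt))   # precision 0.667, recall 0.5, f1 ~0.571
```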
Document Reading Benchmark Results
Datasets
- OCRBench v2: 400 diverse document images (invoices, contracts, forms), measuring overall structural and text accuracy. The data was audited to ensure consistent ground truth.
- OmniDocBench: 512 document images with complex tables, focusing on table parsing capabilities. We use the v1.5 evaluation code from the official repository.

Table Parsing
Evaluated on 512 document images with tables from OmniDocBench (CVPR-accepted benchmark):
Structured Extraction Benchmark Results
Datasets used:
- We collected 100 document pages of proprietary data spanning banking, retail, and insurance sectors. This represents actual production workloads: invoices with water damage, scanned contracts with skewed text, bank statements with multi-level tables.
- Ground truth schemas were generated using Gemini Pro 2.5 and audited by human reviewers to ensure accuracy.
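To make the schema-driven extraction step from Stage 2 concrete, here is a hedged sketch of passing parsed Markdown through GPT-4o with a predefined schema via the OpenAI chat completions API. The example schema, prompt, and JSON-mode setup are illustrative assumptions, not the exact benchmark pipeline.

```python
# Illustrative sketch of the Stage 2 extraction step: parsed Markdown plus a
# predefined schema goes to GPT-4o, which returns JSON for field-level scoring.
# The schema and prompt here are assumptions, not the benchmark's exact setup.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INVOICE_SCHEMA = {
    "invoice_number": "string",
    "invoice_date": "string (YYYY-MM-DD)",
    "total_amount": "string",
    "currency": "string (ISO 4217)",
}


def extract_fields(markdown: str, schema: dict) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # force valid JSON output
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Extract the requested fields from the document. "
                        "Return JSON matching this schema, using null for "
                        f"missing fields: {json.dumps(schema)}"},
            {"role": "user", "content": markdown},
        ],
    )
    return json.loads(response.choices[0].message.content)
```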

- Tensorlake achieves 91.7% F1, demonstrating that superior OCR quality feeds better extraction
- The gap between 91.7% and 68.9% F1 is massive: roughly 5 extra fields correctly extracted out of every 20
- For production workloads processing thousands of documents daily, this accuracy gap compounds into significant error reduction
Production Impact Example
For an insurance claims processor handling 10,000 documents per month:
- At 85% F1: 1,500 documents require manual review
- At 90% F1: 1,000 documents require manual review
- At 91.7% F1 (Tensorlake): 830 documents require manual review
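These counts follow from a simple model: assume every field-level error sends a document to manual review, so the review load is roughly (1 - F1) times monthly volume. A minimal sketch of the arithmetic:

```python
# Manual-review estimate, assuming every field-level error triggers a review:
# reviews ~= (1 - F1) * monthly volume. Illustrative model, not measured data.
def manual_reviews(f1: float, monthly_volume: int = 10_000) -> int:
    return round((1 - f1) * monthly_volume)


for f1 in (0.85, 0.90, 0.917):
    print(f"F1 = {f1:.1%}: ~{manual_reviews(f1):,} documents need manual review")
# F1 = 85.0%: ~1,500 | F1 = 90.0%: ~1,000 | F1 = 91.7%: ~830
```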
Cost & Performance Comparison
Accuracy without affordability isn’t practical. Here’s the complete picture:

| Provider | Cost/1,000 Pages | TEDS | JSON F1 |
|---|---|---|---|
| Docling (open-source) | Free* | 63.3% | 68.9% |
| Marker (open-source) | Free* | 71.1% | 71.2% |
| Azure Document Intelligence | $10 | 78.6% | 88.1% |
| AWS Textract | $15 | 81.0% | 88.4% |
| Tensorlake | $10 | 84.1% | 91.7% |
Visual Comparison: Where Competitors Fail
Example: Contact Information Extraction
When parsing Section 21 (NOTICES) of a real estate contract:
Reproducibility
To reproduce our table results:
- Generate Markdown outputs using the models listed above
- Run the evaluation from the OmniDocBench repository
- Use the document data with tables (512 images) with the v1.5 code version
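For a quick local sanity check outside the official pipeline, the sketch below averages TEDS over paired prediction and ground-truth files. The bracket-notation inputs, directory layout, and filename matching are assumptions; only the official OmniDocBench v1.5 code produces numbers comparable to the table above.

```python
# Quick sanity check only: averages TEDS over paired prediction / ground-truth
# files in bracket notation. The directory layout and file format are
# assumptions; use the official OmniDocBench v1.5 evaluation for real numbers.
from pathlib import Path

from apted import APTED
from apted.helpers import Tree


def teds(pred_brackets: str, gt_brackets: str) -> float:
    distance = APTED(Tree.from_text(pred_brackets),
                     Tree.from_text(gt_brackets)).compute_edit_distance()
    return 1.0 - distance / max(pred_brackets.count("{"),
                                gt_brackets.count("{"), 1)


def mean_teds(pred_dir: str, gt_dir: str) -> float:
    scores = []
    for gt_path in sorted(Path(gt_dir).glob("*.txt")):
        pred_path = Path(pred_dir) / gt_path.name  # same filename in both dirs
        if pred_path.exists():
            scores.append(teds(pred_path.read_text(), gt_path.read_text()))
        else:
            scores.append(0.0)                     # missing prediction scores zero
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(f"mean TEDS: {mean_teds('predictions', 'ground_truth'):.3f}")
```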
Deep Dive: Full Benchmark Analysis
Read our comprehensive blog post: The Document Parsing Benchmark That Actually Matters

The blog includes:
- Detailed failure mode analysis
- Additional benchmark datasets
- Technical methodology deep-dive
- Production deployment case studies
- Code examples and reproducibility guides