Tensorlake’s Document AI delivers industry-leading accuracy on document parsing. We measure what matters: can downstream systems actually use this output? This page presents our comprehensive benchmarking methodology and results comparing Tensorlake against leading document parsing solutions.

Why These Benchmarks Matter

Traditional OCR metrics like Word Error Rate (WER) and Character Error Rate (CER) don’t predict production success. A document can achieve 99% text similarity while completely failing if:
  • Tables collapse into flat text, destroying data relationships
  • Reading order is scrambled, corrupting RAG context
  • Critical fields are missing (invoice totals, ID numbers)
  • Charts and figures are ignored, losing key visual information
We measure structural preservation and downstream usability because that’s what breaks in production.

Our Evaluation Framework

Two-Stage Methodology

We mirror real-world workflows with a two-stage evaluation process.
Stage 1: Document → OCR (Structural Preservation)
  • Models generate Markdown/HTML output
  • Evaluated using TEDS (Tree Edit Distance Similarity)
  • Measures preservation of reading order, table integrity, and layout coherence
Stage 2: OCR → JSON (Downstream Usability)
  • Markdown passed through standardized LLM (GPT-4o) with predefined schemas
  • Evaluated using JSON F1 (Field-Level Precision and Recall)
  • Isolates how OCR quality impacts real extraction workflows
This methodology ensures fair comparisons by varying only the OCR models while keeping extraction constant. No vendor gets an advantage from proprietary extraction pipelines.
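To make the two-stage flow concrete, here is a minimal sketch in Python. The `parse_to_markdown` placeholder, the invoice schema, and the prompt wording are illustrative assumptions rather than our production harness; the extraction step uses the OpenAI Python SDK's chat completions API with GPT-4o to mirror the standardized Stage 2 described above.

```python
# Minimal sketch of the two-stage evaluation flow (illustrative only).
import json
from openai import OpenAI

client = OpenAI()

# Illustrative extraction schema; the real gold schemas are generated per
# document type and audited by human reviewers (see Ground Truth & Reproducibility).
INVOICE_SCHEMA = {
    "invoice_number": "string",
    "invoice_date": "string (YYYY-MM-DD)",
    "total_amount": "number",
    "line_items": [{"description": "string", "quantity": "number", "amount": "number"}],
}


def parse_to_markdown(document_path: str, vendor: str) -> str:
    """Stage 1: run the vendor's OCR/parsing model. Hypothetical placeholder."""
    raise NotImplementedError("Call the vendor-specific parsing API here.")


def extract_json(markdown: str, schema: dict) -> dict:
    """Stage 2: standardized extraction with GPT-4o, identical for every vendor."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Extract the fields defined by this schema from the "
                        "document below. Return only JSON.\n" + json.dumps(schema)},
            {"role": "user", "content": markdown},
        ],
    )
    return json.loads(response.choices[0].message.content)
```

Because the same extraction step is applied to every vendor's Markdown, any difference in JSON F1 traces back to the quality of Stage 1.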

Key Metrics Explained

TEDS (Tree Edit Distance Similarity)

  • Compares predicted vs. ground-truth Markdown/HTML tree structures
  • Captures structural fidelity in tables and complex layouts
  • Widely adopted in OCRBench v2 and OmniDocBench
  • Answers: “Is this table still a table?” Not just “Is the text similar?” (see the simplified computation sketch below)
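As a rough illustration of the idea behind TEDS, the sketch below builds two table trees and scores their similarity with the open-source `zss` (Zhang–Shasha) tree edit distance package. It is deliberately simplified: the full metric used by OCRBench v2 and OmniDocBench also accounts for cell-text similarity and spanning cells, which this toy version ignores.

```python
# Simplified TEDS sketch using the `zss` package (pip install zss).
# Trees are given as nested (label, [children]) tuples for brevity.
from zss import Node, simple_distance


def build_tree(spec) -> Node:
    label, children = spec
    node = Node(label)
    for child in children:
        node.addkid(build_tree(child))
    return node


def tree_size(node: Node) -> int:
    return 1 + sum(tree_size(child) for child in node.children)


def teds(pred_spec, gt_spec) -> float:
    pred, gt = build_tree(pred_spec), build_tree(gt_spec)
    distance = simple_distance(pred, gt)          # minimum number of node edits
    return 1.0 - distance / max(tree_size(pred), tree_size(gt))


# A 2x2 ground-truth table vs. a prediction that dropped one cell.
gt = ("table", [("tr", [("td", []), ("td", [])]),
                ("tr", [("td", []), ("td", [])])])
pred = ("table", [("tr", [("td", []), ("td", [])]),
                  ("tr", [("td", [])])])
print(f"TEDS: {teds(pred, gt):.2f}")              # 1 - 1/7 ≈ 0.86
```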

JSON F1 Score

  • Precision: Correctness of extracted fields
  • Recall: Completeness of required field capture
  • F1: Harmonic mean balancing both
  • Answers: “Can automation use this data?” Not just “Is text present?” (see the sketch below)
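Below is a minimal sketch of field-level JSON F1, assuming exact value matching after flattening both documents to path/value pairs. The production scorer may apply additional normalization (numbers, dates, whitespace), so treat this as the idea rather than the exact implementation.

```python
# Field-level precision/recall/F1 between predicted and gold JSON (sketch).
def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf field."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}{index}.")
    else:
        yield prefix.rstrip("."), obj


def json_f1(predicted: dict, gold: dict) -> dict:
    pred_fields, gold_fields = set(flatten(predicted)), set(flatten(gold))
    true_positives = len(pred_fields & gold_fields)
    precision = true_positives / len(pred_fields) if pred_fields else 0.0
    recall = true_positives / len(gold_fields) if gold_fields else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = {"invoice_number": "INV-001", "total_amount": 1250.00}
pred = {"invoice_number": "INV-001", "total_amount": 1200.00}   # one wrong value
print(json_f1(pred, gold))   # precision 0.5, recall 0.5, f1 0.5
```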

Public Benchmark Results

Document Parsing (English) - OCRBench v2

Evaluated on 400 images from the public OCRBench v2 dataset, measuring both structural preservation and text accuracy.
Key Finding: Tensorlake achieves the highest TEDS score, indicating superior structural preservation while maintaining competitive text accuracy. The gap between open-source and production-grade systems is substantial.

Table Parsing - OmniDocBench

Evaluated on 512 document images with tables from OmniDocBench (CVPR-accepted benchmark):
| Model | TEDS | TEDS (Structure only) |
| --- | --- | --- |
| Marker¹ | 57.88% | 71.17% |
| Docling | 63.84% | 77.68% |
| Azure | 78.14% | 83.61% |
| Textract | 80.75% | 88.78% |
| Tensorlake | 86.79% | 90.62% |
¹ Numbers from the officially published OmniDocBench repository.
Key Finding: On complex, multi-page tables, Tensorlake leads with 86.79% TEDS and 90.62% Structure-only TEDS. Open-source solutions struggle to preserve table structure (sub-70% TEDS).

Enterprise Document Performance

Real-World Dataset (100 Pages)

We evaluated 100 document pages spanning the banking, retail, and insurance sectors. This represents actual production workloads: invoices with water damage, scanned contracts with skewed text, bank statements with multi-level tables.
Key Findings:
  • Tensorlake achieves 91.7% F1, demonstrating that superior OCR quality feeds better extraction
  • The gap between Tensorlake’s 91.7% and Docling’s 68.9% F1 is massive: roughly 5 extra fields correctly extracted out of every 20
  • In production processing thousands of documents daily, this accuracy gap compounds into significant error reduction

Production Impact Example

For an insurance claims processor handling 10,000 documents per month:
  • At 85% F1: 1,500 documents require manual review
  • At 90% F1: 1,000 documents require manual review
  • At 91.7% F1 (Tensorlake): 830 documents require manual review
Result: Tensorlake cuts monthly manual reviews from 1,500 → 830 (a 45% reduction vs the 85% baseline); the arithmetic is sketched below.
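The numbers above follow from a simple assumption: every document whose extraction is not fully correct (roughly 1 - F1 of the volume) is routed to manual review. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope arithmetic behind the claims-processing example.
# Assumption: documents needing manual review ≈ (1 - F1) * monthly volume.
MONTHLY_VOLUME = 10_000


def manual_reviews(f1: float, volume: int = MONTHLY_VOLUME) -> int:
    return round((1 - f1) * volume)


for label, f1 in [("85% baseline", 0.85), ("90%", 0.90), ("Tensorlake 91.7%", 0.917)]:
    print(f"{label}: {manual_reviews(f1):,} manual reviews/month")

baseline, tensorlake = manual_reviews(0.85), manual_reviews(0.917)
print(f"Reduction vs. baseline: {(baseline - tensorlake) / baseline:.0%}")   # ~45%
```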

Cost & Performance Comparison

Accuracy without affordability isn’t practical. Here’s the complete picture:
| Provider | Cost / 1,000 Pages | TEDS | JSON F1 |
| --- | --- | --- | --- |
| Docling (open-source) | Free* | 63.3% | 68.9% |
| Marker (open-source) | $6 | 71.1% | 71.2% |
| Azure Document Intelligence | $10 | 78.6% | 88.1% |
| AWS Textract | $15 | 81.0% | 88.4% |
| Tensorlake | $10 | 84.1% | 91.7% |
*Free to license, but requires self-hosting infrastructure and incurs manual correction costs.
Value Proposition: Tensorlake delivers the highest accuracy at mid-tier pricing, matching Azure’s cost while exceeding both Azure and AWS in quality.

Visual Comparison: Where Competitors Fail

Example: Contact Information Extraction

When parsing Section 21 (NOTICES) of a real estate contract:
Azure:
  • Missing opening parenthesis in phone number
  • Two-column layout collapsed into confusing single column
AWS Textract:
  • Completely wrong phone number in buyer field (shows seller’s phone)
  • Buyer’s phone (123)456-7890 entirely missing
Tensorlake:
  • Perfect extraction of both phone numbers: (123)456-7890 and (456)789-1234
  • Two-column structure preserved with clear buyer/seller separation
  • All contact fields accurately captured
In legal documents, phone numbers are critical contact information. Errors like these cause compliance issues and workflow failures.

Why Tensorlake Wins: Multi-Modal Understanding

Documents communicate through more than text. Tensorlake’s multi-modal approach captures:

For RAG Applications

  • Chart summarization: Converts figures into descriptive text for retrieval
  • Visual content capture: a research paper’s Figure 3 becomes “Bar chart showing 15% performance improvement across three benchmark datasets”
  • Preserved context: Reading order maintained for accurate semantic retrieval

For Workflow Automation

  • Signature detection: Identifies stamps, signatures, and annotations
  • Form understanding: Preserves spatial relationships in complex layouts
  • Table integrity: Multi-level tables maintain hierarchical structure

The Tensorlake Advantage

  1. Superior OCR Performance - Best-in-class recognition on degraded, scanned documents representing real-world conditions
  2. Reading Order Preservation - Ensures pipelines process documents in logical sequence (critical for RAG)
  3. Spatial Structure Integrity - Tables stay tables, forms stay forms
  4. Multi-Modal Parsing - Captures figures, charts, signatures, and annotations
  5. Rigorous Evaluation - Public benchmarks + private enterprise datasets

Ground Truth & Reproducibility

Public Datasets

  • OCRBench v2: We audited and corrected inconsistencies in the published ground truth
  • OmniDocBench: CVPR-accepted benchmark, using v1.5 evaluation code

JSON Schema Generation

  • Initial schemas generated via Gemini 2.5 Pro
  • Human reviewers audit and correct all gold standards
  • Ensures high-quality, unbiased evaluation

Reproducibility

To reproduce our table results:
  1. Generate Markdown outputs using models listed above
  2. Run evaluation from OmniDocBench repository
  3. Use the table subset (512 images) with the v1.5 evaluation code (an illustrative aggregation sketch follows below)
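For orientation only, the loop below shows how per-document TEDS scores could be aggregated once the Markdown/HTML outputs are generated. The directory layout is an assumption for this sketch, the scoring reuses the simplified TEDS idea from earlier, and the authoritative numbers come from running the v1.5 evaluation code shipped in the OmniDocBench repository itself.

```python
# Illustrative aggregation loop; NOT the official OmniDocBench evaluation code.
from html.parser import HTMLParser
from pathlib import Path
from statistics import mean

from zss import Node, simple_distance


class TableTreeBuilder(HTMLParser):
    """Build a zss tree of tag nodes from an HTML string (cell text ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].addkid(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()


def html_to_tree(html: str) -> Node:
    builder = TableTreeBuilder()
    builder.feed(html)
    return builder.root


def tree_size(node: Node) -> int:
    return 1 + sum(tree_size(child) for child in node.children)


def teds_from_html(pred_html: str, gt_html: str) -> float:
    pred, gt = html_to_tree(pred_html), html_to_tree(gt_html)
    return 1.0 - simple_distance(pred, gt) / max(tree_size(pred), tree_size(gt))


PRED_DIR = Path("outputs/tensorlake")          # hypothetical per-model output folder
GT_DIR = Path("omnidocbench/ground_truth")     # hypothetical ground-truth folder

scores = []
for gt_file in sorted(GT_DIR.glob("*.html")):
    pred_file = PRED_DIR / gt_file.name
    # A missing prediction counts as a complete structural miss.
    score = teds_from_html(pred_file.read_text(), gt_file.read_text()) if pred_file.exists() else 0.0
    scores.append(score)

if scores:
    print(f"Mean TEDS over {len(scores)} table documents: {mean(scores):.2%}")
```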
Benchmarks were conducted using OCRBench v2 (400 images), OmniDocBench (512 table images), and a proprietary enterprise dataset (100 pages) in October 2024. All results are reproducible using public datasets and standardized evaluation frameworks.