Tensorlake Documentation

Overview

PDFs are designed for printing, not data extraction. When a logical table spans multiple pages or is split across columns on a single page, most parsers output disconnected fragments — breaking the semantic integrity of the data and making it difficult for downstream LLMs and RAG pipelines to reason over. Tensorlake’s Agentic Table Merging reconstructs these fragments into a single coherent table by reasoning over content and context, not just geometry. Enable it with table_merging=True in your ParsingOptions.

Enabling Table Merging

Set table_merging=True in your ParsingOptions:

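```python
from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models import ParsingOptions

doc_ai = DocumentAI(api_key="YOUR_TENSORLAKE_CLOUD_API_KEY")

# Upload the PDF; the returned file ID references it in the parse request.
file_id = doc_ai.upload(path="document.pdf")

# Turn on agentic table merging.
parsing_options = ParsingOptions(
    table_merging=True,
)

# Submit the parse job; this returns a parse ID rather than the result.
parse_id = doc_ai.read(
    file_id=file_id,
    parsing_options=parsing_options,
)

# Block until the parse job finishes, then return its result.
result = doc_ai.wait_for_completion(parse_id)
```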

How It Works

Rather than relying on geometric position alone, an agent analyzes the content and context around each table fragment to decide whether it is a continuation of the previous one. For each candidate pair, the agent examines:

The end of the previous table fragment
The text in the gap between them (e.g. "Page 14 of 92", "(continued)", boilerplate disclaimers)
The start of the next table fragment
Whether column structures are compatible (same number of columns, matching or repeated headers)

This allows the agent to ignore irrelevant footer noise while correctly identifying continuation cues. Two merge scenarios are handled:

Cross-page merges — tables that continue across one or more page breaks, often with repeated or noisy headers and footers
Same-page merges — tables split into multiple columns on a single page (e.g. an alphabetical list split left/right) that logically belong together
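The agent's decision is model-driven. The signals listed above can still be illustrated with a simple rule-based stand-in. The sketch below is hypothetical and is not Tensorlake's implementation. It makes three assumptions. First, each fragment is a list of rows, where each row is a list of cell strings. Second, `Page N of M` lines count as footer noise and `(continued)` counts as a continuation cue. Third, the helper names `is_continuation` and `merge_fragments` are invented here. For a same-page merge, the fragments are passed in reading order: left column, then right column.

```python
import re
from typing import List

Row = List[str]

# Hypothetical patterns, for illustration only.
NOISE = [re.compile(r"^page \d+ of \d+$", re.IGNORECASE)]  # footer noise
CUES = [re.compile(r"\(continued\)", re.IGNORECASE)]        # continuation cues


def is_continuation(prev: List[Row], gap_text: str, nxt: List[Row]) -> bool:
    """Guess whether fragment `nxt` continues fragment `prev`."""
    # Column structures must be compatible: same number of columns.
    if len(prev[-1]) != len(nxt[0]):
        return False
    lines = [ln.strip() for ln in gap_text.splitlines() if ln.strip()]
    has_cue = any(p.search(ln) for ln in lines for p in CUES)
    # Gap lines that are neither known noise nor a continuation cue.
    unexplained = [ln for ln in lines if not any(p.search(ln) for p in NOISE + CUES)]
    repeated_header = prev[0] == nxt[0]
    # Merge on an explicit cue, a repeated header, or a gap containing only noise.
    return has_cue or repeated_header or not unexplained


def merge_fragments(fragments: List[List[Row]]) -> List[Row]:
    """Concatenate fragments in reading order, dropping repeated header rows."""
    header = fragments[0][0]
    merged = [header]
    for frag in fragments:
        rows = frag[1:] if frag[0] == header else frag
        for row in rows:
            if len(row) != len(header):
                raise ValueError(f"expected {len(header)} columns, got {row!r}")
            merged.append(row)
    return merged


# Same-page example: an alphabetical list split into left and right columns.
left = [["Security", "Shares", "Value"], ["Apple", "10", "1900"], ["Boeing", "4", "800"]]
right = [["Security", "Shares", "Value"], ["Chevron", "7", "1000"]]
if is_continuation(left, "Page 14 of 92", right):
    for row in merge_fragments([left, right]):
        print(row)
```

Fixed patterns like these fail on the harder cases the gap text can contain, such as boilerplate disclaimers that change from page to page. That limitation is why reasoning over content and context matters.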

Output

When table merging is enabled, the parse result includes a merged_tables array. Each entry in the array represents a reconstructed table:

| Field | Description |
|---|---|
| `merged_table_id` | Unique identifier for the merged table (e.g. `cross_page_merge_1_3`) |
| `merged_table_html` | Full HTML representation of the unified table |
| `start_page` | Page number where the first fragment was found |
| `end_page` | Page number where the last fragment was found |
| `pages_merged` | Number of pages spanned by the merged table |
| `summary` | Human-readable summary of the merged table's content |
| `merge_actions` | Details of the merge: the pages involved (`pages`) and the target column count (`target_columns`, which may be `null`) |
| `merged_at` | ISO 8601 timestamp of when the merge was performed |
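Downstream code often needs `merged_table_html` as rows rather than markup. The following is a minimal sketch that uses only Python's standard-library `html.parser`. It assumes the HTML closes every `<tr>`, `<th>`, and `<td>` tag explicitly. It does not handle `colspan`, `rowspan`, or nested tables. The name `html_table_to_rows` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of <th>/<td> cells, one list per <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(table_html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


print(html_table_to_rows(
    "<table><tr><th>Security</th><th>Shares</th></tr>"
    "<tr><td>Apple</td><td>10</td></tr></table>"
))
# prints [['Security', 'Shares'], ['Apple', '10']]
```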

Example: cross-page merge

A financial table spanning three pages is merged into a single entry:

```json
{
  "merged_table_id": "cross_page_merge_1_3",
  "merged_table_html": "<table>...</table>",
  "start_page": 1,
  "end_page": 3,
  "pages_merged": 3,
  "summary": "Financial results for the quarter and nine months ended September 30, 2025...",
  "merge_actions": {
    "pages": [1, 2, 3],
    "target_columns": 10
  },
  "merged_at": "2026-01-10T03:12:10.785866+00:00"
}
```
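A few consistency checks follow from the field definitions. `pages_merged` should equal `end_page - start_page + 1`. `merge_actions.pages` should start at `start_page` and end at `end_page`. These rules are derived from the field descriptions, not from a documented guarantee. The sketch below applies them to the example entry; `check_page_fields` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def check_page_fields(entry: Dict[str, Any]) -> List[str]:
    """Return inconsistencies between the page-related fields (empty if none)."""
    problems: List[str] = []
    start, end = entry["start_page"], entry["end_page"]
    if end < start:
        problems.append(f"end_page {end} is before start_page {start}")
    span = end - start + 1
    if entry["pages_merged"] != span:
        problems.append(f"pages_merged is {entry['pages_merged']}, but pages {start}-{end} span {span}")
    pages = (entry.get("merge_actions") or {}).get("pages")
    if pages and (pages[0] != start or pages[-1] != end):
        problems.append(f"merge_actions.pages {pages} does not run from {start} to {end}")
    return problems


example = json.loads("""
{
  "merged_table_id": "cross_page_merge_1_3",
  "start_page": 1,
  "end_page": 3,
  "pages_merged": 3,
  "merge_actions": {"pages": [1, 2, 3], "target_columns": 10}
}
""")
print(check_page_fields(example))  # prints []
```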

Example: same-page column merge

A holdings table split into two columns on one page is unified into a single continuous structure:

```json
{
  "merged_table_id": "same_page_merge_2_3",
  "merged_table_html": "<table>...</table>",
  "start_page": 2,
  "end_page": 2,
  "pages_merged": 1,
  "summary": "Both tables share the same column structure (Security, Shares, Value) and represent a continuous alphabetical list of stock holdings...",
  "merge_actions": {
    "pages": [2],
    "target_columns": null
  }
}
```
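The two examples differ in ways that client code should tolerate. In the same-page example, `target_columns` is `null` and `merged_at` is absent. A defensive reader might look like the sketch below. It assumes a merge is same-page exactly when `start_page == end_page`, and it parses `merged_at` with `datetime.fromisoformat` only when the field is present. `describe_merge` is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict


def describe_merge(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize one merged_tables entry, tolerating null or missing optional fields."""
    actions = entry.get("merge_actions") or {}
    raw_ts = entry.get("merged_at")
    return {
        "id": entry["merged_table_id"],
        # A merge confined to a single page is a same-page (column) merge.
        "kind": "same-page" if entry["start_page"] == entry["end_page"] else "cross-page",
        # May be None, as in the same-page example.
        "target_columns": actions.get("target_columns"),
        # ISO 8601 string -> timezone-aware datetime; None when the field is absent.
        "merged_at": datetime.fromisoformat(raw_ts) if raw_ts else None,
    }


same_page = {
    "merged_table_id": "same_page_merge_2_3",
    "start_page": 2,
    "end_page": 2,
    "pages_merged": 1,
    "merge_actions": {"pages": [2], "target_columns": None},
}
print(describe_merge(same_page))
# prints {'id': 'same_page_merge_2_3', 'kind': 'same-page', 'target_columns': None, 'merged_at': None}
```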

Common Use Cases

Financial documents — reconstruct multi-page income statements, balance sheets, and loan tables for accurate numeric reasoning
Research papers — unify results tables that span pages so LLMs can compare rows and compute aggregates
Portfolio and fund reports — merge holdings tables split across columns for reliable sector aggregation and exposure calculations
RAG pipelines — produce coherent table chunks that improve retrieval quality and reduce hallucinations on questions that depend on full table context
