Overview
PDFs are designed for printing, not data extraction. When a logical table spans multiple pages or is split across columns on a single page, most parsers output disconnected fragments. This breaks the semantic integrity of the data and makes it difficult for downstream LLMs and RAG pipelines to reason over. Tensorlake’s Agentic Table Merging reconstructs these fragments into a single coherent table by reasoning over content and context, not just geometry. Enable it with `table_merging=True` in your `ParsingOptions`.
Enabling Table Merging
Set `table_merging=True` in your `ParsingOptions`.
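A minimal sketch of the setting, assuming the options object accepts `table_merging` as a keyword argument. The `ParsingOptions` class below is a local stand-in so the snippet runs on its own; the real class comes from the Tensorlake SDK, and its import path and remaining fields are not shown here.

```python
from dataclasses import dataclass


@dataclass
class ParsingOptions:
    """Local stand-in for the SDK's options class (an assumption, not the real one)."""

    table_merging: bool = False  # the only field this sketch models


# Turn on table merging for a parse request.
options = ParsingOptions(table_merging=True)
print(options.table_merging)
```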
How It Works
Rather than relying on geometric position alone, an agent analyzes the content and context around each table fragment to decide whether it is a continuation of the previous one. For each candidate pair, the agent examines:
- The end of the previous table fragment
- The text in the gap between them (e.g. "Page 14 of 92", "(continued)", boilerplate disclaimers)
- The start of the next table fragment
- Whether column structures are compatible (same number of columns, matching or repeated headers)

The agent handles two kinds of merges:
- Cross-page merges — tables that continue across one or more page breaks, often with repeated or noisy headers and footers
- Same-page merges — tables split into multiple columns on a single page (e.g. an alphabetical list split left/right) that logically belong together
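To make the signals above concrete, here is a deliberately simplified, rule-based sketch. It is not Tensorlake's implementation: the real feature uses an agent that reasons over content, whereas this code only hard-codes two of the checks (gap text that looks like footer noise, and matching column counts). All function names and patterns are hypothetical.

```python
import re

# Patterns for text that commonly sits between two fragments of one table.
# These are illustrative guesses, not an exhaustive or official list.
NOISE_PATTERNS = [
    re.compile(r"^page \d+ of \d+$", re.IGNORECASE),
    re.compile(r"^\(?continued\)?$", re.IGNORECASE),
]


def is_gap_noise(line):
    """Return True if a gap line is blank or looks like a footer/continuation marker."""
    text = line.strip()
    return text == "" or any(p.match(text) for p in NOISE_PATTERNS)


def columns_compatible(prev_rows, next_rows):
    """Check that the column count matches at the seam between two fragments."""
    if not prev_rows or not next_rows:
        return False
    return len(prev_rows[-1]) == len(next_rows[0])


def drop_repeated_header(prev_rows, next_rows):
    """If the next fragment restarts with the first fragment's header row, skip it."""
    if prev_rows and next_rows and next_rows[0] == prev_rows[0]:
        return next_rows[1:]
    return next_rows


def try_merge(prev_rows, gap_lines, next_rows):
    """Merge two fragments when the gap is only noise and the columns line up."""
    if not all(is_gap_noise(line) for line in gap_lines):
        return None
    if not columns_compatible(prev_rows, next_rows):
        return None
    return prev_rows + drop_repeated_header(prev_rows, next_rows)


# Invented sample fragments from two consecutive pages.
page_1 = [["Item", "2023", "2024"], ["Revenue", "100", "120"]]
page_2 = [["Item", "2023", "2024"], ["Net income", "10", "14"]]
merged = try_merge(page_1, ["Page 14 of 92", "(continued)"], page_2)
print(merged)
```

Fixed rules like these fail when a gap contains unexpected text or a header is reworded between pages, which is the kind of case the content-based approach is meant to cover.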
Output
When table merging is enabled, the parse result includes a `merged_tables` array. Each entry in the array represents a reconstructed table:
| Field | Description |
|---|---|
| `merged_table_id` | Unique identifier for the merged table (e.g. `cross_page_merge_1_3`) |
| `merged_table_html` | Full HTML representation of the unified table |
| `start_page` | Page number where the first fragment was found |
| `end_page` | Page number where the last fragment was found |
| `pages_merged` | Number of pages spanned by the merged table |
| `summary` | Human-readable summary of the merged table’s content |
| `merge_actions` | Details on the pages involved and target column count |
| `merged_at` | ISO 8601 timestamp of when the merge was performed |
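The fields above can be consumed along these lines. This sketch assumes the parse result is available as a plain dictionary (the SDK may expose objects with attributes instead) and uses only the Python standard library to turn `merged_table_html` into rows. The `result` value is invented for illustration, and `TableRows` and `html_to_rows` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect cell text from <tr>/<td>/<th> tags into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_rows(table_html):
    """Parse a simple (non-nested) HTML table into a list of row lists."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Hypothetical parse result; every value here is invented for illustration.
result = {
    "merged_tables": [
        {
            "merged_table_id": "cross_page_merge_1_3",
            "merged_table_html": (
                "<table><tr><th>Item</th><th>Amount</th></tr>"
                "<tr><td>Revenue</td><td>120</td></tr></table>"
            ),
            "start_page": 1,
            "end_page": 3,
            "pages_merged": 3,
        }
    ]
}

for table in result.get("merged_tables", []):
    rows = html_to_rows(table["merged_table_html"])
    print(table["merged_table_id"], table["start_page"], table["end_page"], len(rows))
```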
Example: cross-page merge
A financial table spanning three pages is merged into a single entry.
Example: same-page column merge
A holdings table split into two columns on one page is unified into a single continuous structure.
Common Use Cases
- Financial documents — reconstruct multi-page income statements, balance sheets, and loan tables for accurate numeric reasoning
- Research papers — unify results tables that span pages so LLMs can compare rows and compute aggregates
- Portfolio and fund reports — merge holdings tables split across columns for reliable sector aggregation and exposure calculations
- RAG pipelines — produce coherent table chunks that improve retrieval quality and reduce hallucinations on questions that depend on full table context
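For the RAG case, one possible approach is to emit one chunk per merged table so the whole table is retrieved together. The sketch below assumes each entry is a dictionary with the fields listed under Output; `merged_table_to_chunk`, the chunk layout, and the sample entry are hypothetical and not part of any Tensorlake API.

```python
def merged_table_to_chunk(table):
    """Build one retrieval chunk per merged table, keeping page provenance."""
    # Put the summary first so it sits alongside the table content in the chunk text.
    text = f"{table['summary']}\n\n{table['merged_table_html']}"
    metadata = {
        "merged_table_id": table["merged_table_id"],
        "start_page": table["start_page"],
        "end_page": table["end_page"],
    }
    return {"text": text, "metadata": metadata}


# Hypothetical merged table entry; all values are invented for illustration.
entry = {
    "merged_table_id": "cross_page_merge_1_3",
    "merged_table_html": "<table><tr><td>Revenue</td><td>120</td></tr></table>",
    "start_page": 1,
    "end_page": 3,
    "summary": "Income statement line items.",
}

chunk = merged_table_to_chunk(entry)
print(chunk["metadata"])
```

Keeping `start_page` and `end_page` in the metadata lets an answer cite the pages the table came from.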