# Table Merging

> Automatically merge table fragments that span multiple pages or columns into a single unified table for LLM-ready output.

## Overview

PDFs are designed for printing, not data extraction. When a logical table spans multiple pages or is split across columns on a single page, most parsers output disconnected fragments. This breaks the semantic integrity of the data and makes the table difficult for downstream LLMs and RAG pipelines to reason over.

Tensorlake's Agentic Table Merging reconstructs these fragments into a single coherent table by reasoning over content and context, not just geometry. Enable it with `table_merging=True` in your `ParsingOptions`.

## Enabling Table Merging

Pass `table_merging=True` to `ParsingOptions` in the Python SDK, or set `"table_merging": true` under `parsing_options` in the REST request body:

<CodeGroup>
  ```python Python SDK
  from tensorlake.documentai import DocumentAI
  from tensorlake.documentai.models import ParsingOptions

  doc_ai = DocumentAI(api_key="YOUR_TENSORLAKE_CLOUD_API_KEY")

  # Upload the document and keep the file ID for the parse request
  file_id = doc_ai.upload(path="document.pdf")

  # Enable table merging for this parse
  parsing_options = ParsingOptions(
      table_merging=True,
  )

  # Start the parse job and keep its ID
  parse_id = doc_ai.read(
      file_id=file_id,
      parsing_options=parsing_options,
  )

  # Wait for the job to finish and fetch the result
  result = doc_ai.wait_for_completion(parse_id)
  ```

  ```bash curl
  curl --request POST \
    --url https://api.tensorlake.ai/documents/v2/parse \
    --header "Authorization: Bearer ${TENSORLAKE_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data '{
      "file_id": "file_XXX",
      "parsing_options": {
        "table_merging": true
      }
    }'
  ```
</CodeGroup>

## How It Works

Rather than relying on geometric position alone, an agent analyzes the content and context around each table fragment to decide whether it is a continuation of the previous one. For each candidate pair, the agent examines:

* The end of the previous table fragment
* The text in the gap between them (e.g. `"Page 14 of 92"`, `"(continued)"`, boilerplate disclaimers)
* The start of the next table fragment
* Whether column structures are compatible (same number of columns, matching or repeated headers)

This allows the agent to ignore irrelevant footer noise while correctly identifying continuation cues. Two merge scenarios are handled:

* **Cross-page merges** — tables that continue across one or more page breaks, often with repeated or noisy headers and footers
* **Same-page merges** — tables split into multiple columns on a single page (e.g. an alphabetical list split left/right) that logically belong together
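
The column-compatibility check from the list above can be illustrated with a small heuristic. This is only a sketch of that one structural signal, not Tensorlake's agent, which also reasons over surrounding text. The function names `normalize` and `columns_compatible` are hypothetical and exist only for this example.

```python
def normalize(cells):
    """Lowercase and collapse whitespace so header comparison tolerates formatting noise."""
    return [" ".join(str(c).split()).lower() for c in cells]


def columns_compatible(prev_header, next_first_row):
    """Return (compatible, header_repeated) for two adjacent table fragments.

    prev_header: header cells of the earlier fragment.
    next_first_row: cells of the first row of the later fragment.
    """
    # A different column count is treated as a structural mismatch.
    if len(prev_header) != len(next_first_row):
        return False, False
    # A repeated header is a continuation cue; a caller would drop it when merging.
    header_repeated = normalize(prev_header) == normalize(next_first_row)
    return True, header_repeated


prev_header = ["Security", "Shares", "Value"]
print(columns_compatible(prev_header, ["security", " Shares", "Value "]))   # (True, True)
print(columns_compatible(prev_header, ["Acme Corp", "1,200", "$45,000"]))   # (True, False)
print(columns_compatible(prev_header, ["Total", "$45,000"]))                # (False, False)
```

A pure structural check like this cannot tell a continuation from an unrelated table that happens to have the same column count, which is why the gap text and the fragment boundaries are examined as well.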

## Output

When table merging is enabled, the parse result includes a `merged_tables` array. Each entry in the array represents a reconstructed table:

| Field               | Description                                                          |
| ------------------- | -------------------------------------------------------------------- |
| `merged_table_id`   | Unique identifier for the merged table (e.g. `cross_page_merge_1_3`) |
| `merged_table_html` | Full HTML representation of the unified table                        |
| `start_page`        | Page number where the first fragment was found                       |
| `end_page`          | Page number where the last fragment was found                        |
| `pages_merged`      | Number of pages spanned by the merged table                          |
| `summary`           | Human-readable summary of the merged table's content                 |
| `merge_actions`     | Details on the pages involved and target column count                |
| `merged_at`         | ISO 8601 timestamp of when the merge was performed                   |
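
As a minimal sketch, the fields above can be loaded into a typed structure once the parse result is available as plain JSON. The `MergedTable` class and `load_merged_table` helper are hypothetical names for this example and are not part of the Tensorlake SDK. `merged_at` is treated as optional here because the same-page example on this page omits it; that is an assumption about the output, not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class MergedTable:
    merged_table_id: str
    merged_table_html: str
    start_page: int
    end_page: int
    pages_merged: int
    summary: str
    merge_actions: dict[str, Any]          # requires Python 3.9+
    merged_at: Optional[datetime] = None   # assumed optional, see note above


def load_merged_table(entry: dict[str, Any]) -> MergedTable:
    """Build a MergedTable from one entry of the merged_tables array."""
    raw_ts = entry.get("merged_at")
    return MergedTable(
        merged_table_id=entry["merged_table_id"],
        merged_table_html=entry["merged_table_html"],
        start_page=entry["start_page"],
        end_page=entry["end_page"],
        pages_merged=entry["pages_merged"],
        summary=entry["summary"],
        merge_actions=entry.get("merge_actions") or {},
        # ISO 8601 string -> timezone-aware datetime
        merged_at=datetime.fromisoformat(raw_ts) if raw_ts else None,
    )


table = load_merged_table({
    "merged_table_id": "cross_page_merge_1_3",
    "merged_table_html": "<table>...</table>",
    "start_page": 1,
    "end_page": 3,
    "pages_merged": 3,
    "summary": "Financial results for the quarter and nine months ended September 30, 2025...",
    "merge_actions": {"pages": [1, 2, 3], "target_columns": 10},
    "merged_at": "2026-01-10T03:12:10.785866+00:00",
})
print(table.merged_table_id, table.pages_merged, table.merged_at)
```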

### Example: cross-page merge

A financial table spanning three pages is merged into a single entry:

```json
{
  "merged_table_id": "cross_page_merge_1_3",
  "merged_table_html": "<table>...</table>",
  "start_page": 1,
  "end_page": 3,
  "pages_merged": 3,
  "summary": "Financial results for the quarter and nine months ended September 30, 2025...",
  "merge_actions": {
    "pages": [1, 2, 3],
    "target_columns": 10
  },
  "merged_at": "2026-01-10T03:12:10.785866+00:00"
}
```
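
Because `merged_table_html` holds the whole table as HTML, downstream code usually needs to turn it into rows. The sketch below does that with Python's standard-library `html.parser`. The `TableRows` class, the `html_table_to_rows` helper, and the sample HTML are illustrative only; the sample is not real Tensorlake output. The sketch assumes a simple table and does not handle nested tables, `colspan`, or `rowspan`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up HTML standing in for a real merged_table_html value.
sample_html = (
    "<table>"
    "<tr><th>Item</th><th>Q3 2025</th></tr>"
    "<tr><td>Revenue</td><td>1,250</td></tr>"
    "</table>"
)
rows = html_table_to_rows(sample_html)
print(rows)  # [['Item', 'Q3 2025'], ['Revenue', '1,250']]
```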

### Example: same-page column merge

A holdings table split into two columns on one page is unified into a single continuous structure:

```json
{
  "merged_table_id": "same_page_merge_2_3",
  "merged_table_html": "<table>...</table>",
  "start_page": 2,
  "end_page": 2,
  "pages_merged": 1,
  "summary": "Both tables share the same column structure (Security, Shares, Value) and represent a continuous alphabetical list of stock holdings...",
  "merge_actions": {
    "pages": [2],
    "target_columns": null
  }
}
```
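
To tell the two merge scenarios apart in code, one option is to compare `start_page` and `end_page`, as in the sketch below. The `merge_kind` helper is a hypothetical name. It relies on the page fields rather than on the format of `merged_table_id`, which this page shows only by example.

```python
def merge_kind(entry):
    """Classify a merged_tables entry by the pages it covers."""
    return "same_page" if entry["start_page"] == entry["end_page"] else "cross_page"


# Trimmed copies of the two example entries on this page.
entries = [
    {"merged_table_id": "cross_page_merge_1_3", "start_page": 1, "end_page": 3},
    {"merged_table_id": "same_page_merge_2_3", "start_page": 2, "end_page": 2},
]

by_kind = {}
for entry in entries:
    by_kind.setdefault(merge_kind(entry), []).append(entry["merged_table_id"])

print(by_kind)
# {'cross_page': ['cross_page_merge_1_3'], 'same_page': ['same_page_merge_2_3']}
```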

## Common Use Cases

* **Financial documents** — reconstruct multi-page income statements, balance sheets, and loan tables for accurate numeric reasoning
* **Research papers** — unify results tables that span pages so LLMs can compare rows and compute aggregates
* **Portfolio and fund reports** — merge holdings tables split across columns for reliable sector aggregation and exposure calculations
* **RAG pipelines** — produce coherent table chunks that improve retrieval quality and reduce hallucinations on questions that depend on full table context
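
For the RAG case, a merged table can be indexed as a single chunk so that retrieval returns the whole table instead of a fragment. The sketch below shows one possible shape. The `table_to_chunk` helper and the `id` / `text` / `metadata` layout are assumptions for illustration and are not tied to Tensorlake or to any particular vector store.

```python
def table_to_chunk(entry):
    """Turn one merged_tables entry into a single retrieval chunk."""
    # The summary leads the text so the chunk has natural-language content to
    # match against; the full HTML follows so the model sees every row.
    text = f"{entry['summary']}\n\n{entry['merged_table_html']}"
    return {
        "id": entry["merged_table_id"],
        "text": text,
        "metadata": {
            "start_page": entry["start_page"],
            "end_page": entry["end_page"],
            "pages_merged": entry["pages_merged"],
        },
    }


entry = {
    "merged_table_id": "cross_page_merge_1_3",
    "merged_table_html": "<table>...</table>",
    "start_page": 1,
    "end_page": 3,
    "pages_merged": 3,
    "summary": "Financial results for the quarter and nine months ended September 30, 2025...",
}
chunk = table_to_chunk(entry)
print(chunk["id"], chunk["metadata"])
```

Keeping the page range in the metadata lets an application cite the source pages when it answers from the chunk.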

## Related

* [Parsing Overview](/document-ingestion/parsing/read)
* [Parse Output](/document-ingestion/parsing/parse-output)
* [Cross-page Header Correction](/document-ingestion/parsing/header-correction)
