# Key-Value Extraction

> Template-free extraction of structured field data from forms — text inputs, checkboxes, radio buttons, dropdowns, and signature lines.

## Overview

Forms are everywhere in enterprise documents — loan applications, insurance claims, medical surveys, compliance questionnaires. But processing them at scale is hard: layouts vary, fields shift position, and content is often mixed with tables, text, and illustrations on the same page.

Tensorlake's Agentic Key-Value Extraction solves this with a two-stage pipeline: it first detects whether a page component is actually a form (skipping expensive vision models on non-form content), then extracts every field into structured JSON with its name, type, value, and an optional box ID. No templates, no coordinate mapping, no per-form configuration.

Enable it with `key_value_extraction=True` in your `EnrichmentOptions`.

## Enabling Key-Value Extraction

Pass the option when you submit a parse request, through either the Python SDK or the REST API:

<CodeGroup>
  ```python Python SDK theme={null}
  from tensorlake.documentai import DocumentAI
  from tensorlake.documentai.models.options import EnrichmentOptions

  doc_ai = DocumentAI(api_key="YOUR_TENSORLAKE_CLOUD_API_KEY")

  file_id = doc_ai.upload(path="form.pdf")

  enrichment_options = EnrichmentOptions(
      key_value_extraction=True,
  )

  parse_id = doc_ai.read(
      file_id=file_id,
      enrichment_options=enrichment_options,
  )

  result = doc_ai.wait_for_completion(parse_id)
  ```

  ```bash curl theme={null}
  curl --request POST \
    --url https://api.tensorlake.ai/documents/v2/parse \
    --header "Authorization: Bearer ${TENSORLAKE_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data '{
      "file_id": "file_XXX",
      "enrichment_options": {
        "key_value_extraction": true
      }
    }'
  ```
</CodeGroup>

## How It Works

### Stage 1 — Form Detection

When Tensorlake encounters a layout component, a lightweight vision model first determines whether it is actually a form. Non-form content (tables, text blocks, illustrations) is skipped immediately, so expensive extraction models are only invoked on pages or regions that contain real form fields. This keeps costs low and processing fast.

### Stage 2 — Agentic Field Extraction

Once a form is identified, the agent extracts its fields by reasoning about:

* **Multi-field patterns** — grouping related fields such as address components or checkbox groups
* **Context** — inferring field purpose from surrounding text and document structure
* **Visual cues** — recognizing checkboxes, radio buttons, and text boxes by appearance
* **Spatial relationships** — resolving which labels correspond to which input fields
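The two stages can be pictured as a cheap gate in front of an expensive extractor. The sketch below is illustrative only and is not Tensorlake's implementation: `extract_forms`, `is_form`, and `extract_fields` are hypothetical names, and the two callables stand in for the detection and extraction models.

```python
from typing import Callable, Iterable

Field = dict  # one extracted field: box_id, field_name, type, value


def extract_forms(
    components: Iterable[object],
    is_form: Callable[[object], bool],               # hypothetical Stage 1 detector
    extract_fields: Callable[[object], list[Field]],  # hypothetical Stage 2 extractor
) -> list[list[Field]]:
    """Run the expensive extractor only on components the cheap gate accepts."""
    results = []
    for component in components:
        if not is_form(component):
            continue  # Stage 1: non-form content is skipped
        results.append(extract_fields(component))  # Stage 2: field extraction
    return results
```

Because the gate runs first, the extractor is never called for tables, text blocks, or illustrations, which is where the cost saving comes from.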

## Supported Field Types

| Type           | Description                           |
| -------------- | ------------------------------------- |
| `text`         | Free-text input fields                |
| `checkbox`     | Boolean tick boxes (`true` / `false`) |
| `radio button` | Single-select option groups           |
| `dropdown`     | Select menus with a chosen value      |
| `signature`    | Signature line fields                 |
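If you consume the output in Python, it can help to pin the type strings down in one place. The following is a minimal sketch, assuming the `type` values arrive exactly as listed in the table above; `FieldType` and `parse_field_type` are hypothetical helpers, not part of the Tensorlake SDK.

```python
from enum import Enum


class FieldType(str, Enum):
    TEXT = "text"
    CHECKBOX = "checkbox"
    RADIO_BUTTON = "radio button"  # note: the type string contains a space
    DROPDOWN = "dropdown"
    SIGNATURE = "signature"


def parse_field_type(raw: str) -> FieldType:
    """Map a `type` string to a FieldType; raises ValueError for unknown values."""
    # Trimming and lowercasing is a defensive assumption, not documented behavior.
    return FieldType(raw.strip().lower())
```

Failing loudly on an unknown type makes it easier to notice if the set of supported types changes.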

## Output

Each extracted form produces a JSON array of field objects. Each object has the following keys:

| Field        | Description                                                               |
| ------------ | ------------------------------------------------------------------------- |
| `box_id`     | Optional reference ID linking the field back to a labeled box in the form |
| `field_name` | Label or purpose of the field (e.g. `"Federal income tax withheld"`)      |
| `type`       | Input type (e.g. `"text"`, `"checkbox"`)                                  |
| `value`      | Current content of the field                                              |
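One way to work with these objects in application code is to load each one into a small typed record. This is a sketch under the assumption that `box_id` may be missing or null when a field has no labeled box; `FormField` and `field_from_dict` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FormField:
    field_name: str
    type: str
    value: Any                    # the value's JSON type is not assumed here
    box_id: Optional[str] = None  # optional, per the table above


def field_from_dict(raw: dict) -> FormField:
    """Build a FormField from one object in the extracted JSON array."""
    return FormField(
        field_name=raw["field_name"],
        type=raw["type"],
        value=raw.get("value"),
        box_id=raw.get("box_id"),
    )
```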

### Example — W-2 form

```json theme={null}
[
  {
    "box_id": "a",
    "field_name": "Employee's social security number",
    "type": "text",
    "value": "123-45-6789"
  },
  {
    "box_id": "b",
    "field_name": "Employer identification number (EIN)",
    "type": "text",
    "value": "98-7654321"
  },
  {
    "box_id": "1",
    "field_name": "Wages, tips, other compensation",
    "type": "text",
    "value": "85,000.00"
  },
  {
    "box_id": "2",
    "field_name": "Federal income tax withheld",
    "type": "text",
    "value": "12,750.00"
  },
  {
    "box_id": "c",
    "field_name": "Employer's name, address, and ZIP code",
    "type": "text",
    "value": "ABC Technologies Inc. 1234 Innovation Drive San Francisco, CA 94105"
  }
]
```
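Since values such as `"85,000.00"` come back as strings, downstream code usually needs to index and convert them. The sketch below uses two of the objects from the example above; `index_by_box_id` and `to_decimal` are hypothetical helpers, and the comma-stripping assumes US-style number formatting.

```python
import json
from decimal import Decimal

raw = """[
  {"box_id": "1", "field_name": "Wages, tips, other compensation",
   "type": "text", "value": "85,000.00"},
  {"box_id": "2", "field_name": "Federal income tax withheld",
   "type": "text", "value": "12,750.00"}
]"""


def index_by_box_id(fields: list[dict]) -> dict[str, dict]:
    """Key fields by box_id, skipping any field that has none."""
    return {f["box_id"]: f for f in fields if f.get("box_id")}


def to_decimal(value: str) -> Decimal:
    """Parse a US-formatted amount such as '85,000.00'."""
    return Decimal(value.replace(",", ""))


boxes = index_by_box_id(json.loads(raw))
wages = to_decimal(boxes["1"]["value"])
withheld = to_decimal(boxes["2"]["value"])
print(withheld / wages)  # share of wages withheld
```

`Decimal` is used rather than `float` so that currency amounts are not subject to binary rounding.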

The same output is also available as readable Markdown:

```text theme={null}
[a] **Employee's social security number** (text): 123-45-6789
[b] **Employer identification number (EIN)** (text): 98-7654321
[1] **Wages, tips, other compensation** (text): 85,000.00
[2] **Federal income tax withheld** (text): 12,750.00
[c] **Employer's name, address, and ZIP code** (text): ABC Technologies Inc. ...
```
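The line format above can be reproduced from the JSON with a short function, which may be useful for logging or review tools. This is an approximation of the format as shown, not the service's own renderer: `to_markdown_line` is a hypothetical helper, and dropping the bracketed prefix when `box_id` is absent is an assumption.

```python
def to_markdown_line(field: dict) -> str:
    """Render one field object as '[box_id] **field_name** (type): value'."""
    # Assumption: fields without a box_id are rendered without the bracket prefix.
    prefix = f"[{field['box_id']}] " if field.get("box_id") else ""
    return f"{prefix}**{field['field_name']}** ({field['type']}): {field['value']}"


fields = [
    {"box_id": "a", "field_name": "Employee's social security number",
     "type": "text", "value": "123-45-6789"},
    {"box_id": "1", "field_name": "Wages, tips, other compensation",
     "type": "text", "value": "85,000.00"},
]
print("\n".join(to_markdown_line(f) for f in fields))
```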

## Common Use Cases

* **Loan and mortgage applications** — extract applicant data, income fields, and declaration checkboxes without per-lender templates
* **Insurance claims** — pull policy numbers, claimant details, and coverage selections from variable claim form layouts
* **Medical surveys and intake forms** — capture patient responses, checkbox selections, and consent signatures
* **Tax documents** — extract labeled box values from W-2s, 1099s, and other structured government forms
* **Compliance questionnaires** — process due-diligence and KYC forms across counterparties with different layouts

## Related

* [Parsing Overview](/document-ingestion/parsing/read)
* [Parse Output](/document-ingestion/parsing/parse-output)
* [Structured Data Extraction](/document-ingestion/parsing/structured-extraction)
* [Chart Extraction](/document-ingestion/parsing/chart-extraction)
