POST /documents/v1/datasets
curl --request POST \
  --url https://api.tensorlake.ai/documents/v1/datasets \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "description": "<string>",
  "name": "<string>",
  "settings": {
    "chunkStrategy": null,
    "deliverWebhook": "false",
    "figureSummarization": true,
    "figureSummarizationPrompt": "none",
    "formDetectionMode": "vlm",
    "jsonSchema": "<any>",
    "modelProvider": null,
    "structuredExtractionPrompt": "none",
    "tableOutputMode": "json",
    "tableParsingMode": "tsr",
    "tableSummarization": true,
    "tableSummarizationPrompt": "none"
  }
}'
{
  "id": "<string>"
}

Create an ingestion workflow for structured extraction or document parsing.

A dataset is a collection of settings that help with organizing documents from the same domain and enable focused document intelligence.

The dataset’s name must be unique.

Your data is NOT sent to a third-party service (OpenAI, Anthropic, etc.). We use our own models to parse the document.

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
name (string, required)
settings (object, required)
description (string | null)
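
As a rough illustration of the body fields above, the following Python sketch assembles the request with only the standard library. The helper name `build_create_dataset_request` and the dataset name `invoices` are hypothetical, the settings values are copied from the sample request, and whether the API accepts a partial `settings` object is an assumption. The request is built but not sent.

```python
import json
import urllib.request

DATASETS_URL = "https://api.tensorlake.ai/documents/v1/datasets"


def build_create_dataset_request(token, name, settings, description=None):
    """Build (but do not send) the POST request that creates a dataset."""
    if not name:
        raise ValueError("name is required")
    if not isinstance(settings, dict):
        raise TypeError("settings must be a JSON object (dict)")

    body = {"name": name, "settings": settings}
    # description is optional (string | null), so include it only when given.
    if description is not None:
        body["description"] = description

    return urllib.request.Request(
        DATASETS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Settings values are taken from the sample request; sending only a subset
# of the settings keys is an assumption, not confirmed behavior.
request = build_create_dataset_request(
    token="<token>",
    name="invoices",  # hypothetical dataset name
    settings={"tableOutputMode": "json", "tableParsingMode": "tsr"},
)
# To send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it possible to inspect the headers and payload before any network call is made.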

Response

200
application/json
The dataset was created. Reference it by name to automatically insert structured data extracted from documents by the Structured Extraction API.
id (string, required)
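
A minimal sketch of reading the 200 response, assuming the body is the JSON object shown in the sample (`{"id": "<string>"}`). The helper name `parse_dataset_id` and the id value used below are hypothetical.

```python
import json


def parse_dataset_id(response_body):
    """Return the dataset id from a create-dataset response body."""
    payload = json.loads(response_body)
    dataset_id = payload.get("id") if isinstance(payload, dict) else None
    # The response schema marks id as a required string.
    if not isinstance(dataset_id, str) or not dataset_id:
        raise ValueError("response is missing the required string field 'id'")
    return dataset_id


# Hypothetical id value, for illustration only.
dataset_id = parse_dataset_id('{"id": "dataset_abc123"}')
```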