Create Datasets

POST /documents/v1/datasets

Create a new dataset. Datasets are used in conjunction with Structured Extraction or Document Parsing API requests.

Behavior of datasets when used with the APIs:

* Structured Extraction API - Structured data extracted from files is automatically inserted into datasets. A JSON schema is required to guide the extraction.
* Document Parsing API - Chunks of parsed documents from the Document Parsing API are automatically inserted into datasets.
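Since structured extraction needs a JSON schema, a request body for that kind of dataset might be assembled as in the sketch below. The invoice schema, the dataset name, and the helper function are hypothetical. Passing the schema as a JSON object is also an assumption, because the reference only types jsonSchema as <any>, and it is not confirmed here that the other settings may be omitted.

```python
import json

# Hypothetical JSON Schema describing the fields to extract from each file.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}


def structured_dataset_payload(name, description, schema):
    """Build a request body for a dataset meant for structured extraction."""
    if not isinstance(schema, dict) or not schema:
        raise ValueError("schema must be a non-empty JSON Schema object")
    return {
        "name": name,
        "description": description,
        "settings": {
            # Assumption: the schema is embedded as a JSON object.
            "jsonSchema": schema,
        },
    }


payload = structured_dataset_payload(
    "invoices", "Supplier invoices", INVOICE_SCHEMA
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized and sent as the body of the create request.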

curl --request POST \
  --url https://api.tensorlake.ai/documents/v1/datasets \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "name": "<string>",
  "description": "<string>",
  "settings": {
    "tableParsingMode": "tsr",
    "tableOutputMode": "markdown",
    "tableSummarization": true,
    "tableSummarizationPrompt": "none",
    "figureSummarization": true,
    "figureSummarizationPrompt": "none",
    "formDetectionMode": "vlm",
    "chunkStrategy": null,
    "jsonSchema": "<any>",
    "structuredExtractionPrompt": "none",
    "modelProvider": null,
    "deliverWebhook": false,
    "detectSignature": false,
    "skewCorrection": false,
    "disableLayoutDetection": false,
    "structuredExtractionSkipOcr": false,
    "detectStrikethrough": false
  }
}'

{
  "id": "<string>"
}
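For readers who prefer Python over curl, the same request can be sketched with the standard library alone. The function names are hypothetical; the URL, headers, and body fields are taken from the curl sample above, and the response is assumed to be a JSON object carrying an id, as in the sample response.

```python
import json
import urllib.request

DATASETS_URL = "https://api.tensorlake.ai/documents/v1/datasets"


def build_create_dataset_request(token, name, description, settings):
    """Build (but do not send) the POST request that creates a dataset."""
    body = {"name": name, "description": description, "settings": settings}
    return urllib.request.Request(
        DATASETS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_dataset(token, name, description, settings):
    """Send the request and return the id from the JSON response."""
    request = build_create_dataset_request(token, name, description, settings)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["id"]
```

Calling create_dataset(token, "invoices", "Supplier invoices", settings) with a valid token and a settings dictionary performs the network call; urllib raises urllib.error.HTTPError for non-2xx responses.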

Create an ingestion workflow for structured extraction or document parsing.

A dataset is a collection of settings that helps organize documents from the same domain and enables focused document intelligence.

The dataset’s name must be unique.

Your data is NOT sent to a third-party service (OpenAI, Anthropic, etc.). We use our own models to parse documents.

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Response

200

application/json

Create a new dataset. Reference the dataset's name in Structured Extraction API requests to have the structured data extracted from documents inserted into it automatically.

The response is of type object.
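Because the response is a JSON object carrying the dataset id, a client may want to validate it before use. The helper below is a minimal sketch; the function name and the sample id are made up for illustration.

```python
import json


def parse_dataset_id(raw_body):
    """Return the dataset id from a create-dataset response body."""
    data = json.loads(raw_body)
    if not isinstance(data, dict) or not isinstance(data.get("id"), str):
        raise ValueError("expected a JSON object with a string 'id'")
    return data["id"]


# "dataset_123" is a placeholder value, not a real id format.
print(parse_dataset_id('{"id": "dataset_123"}'))
```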
