Create Datasets

POST /documents/v1/datasets

curl --request POST \
  --url https://api.tensorlake.ai/documents/v1/datasets \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "description": "<string>",
  "name": "<string>",
  "settings": {
    "chunkStrategy": null,
    "deliverWebhook": false,
    "detectSignature": false,
    "detectStrikethrough": false,
    "disableLayoutDetection": false,
    "figureSummarization": true,
    "figureSummarizationPrompt": "none",
    "formDetectionMode": "vlm",
    "jsonSchema": "<any>",
    "modelProvider": null,
    "skewCorrection": false,
    "structuredExtractionPrompt": "none",
    "structuredExtractionSkipOcr": false,
    "tableOutputMode": "json",
    "tableParsingMode": "tsr",
    "tableSummarization": true,
    "tableSummarizationPrompt": "none"
  }
}'

{
  "id": "<string>"
}
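The settings keys in the request body are camelCase and easy to mistype. A minimal sketch of a client-side check follows. The key names are copied from the example request above. The helper `make_settings` is hypothetical, and the idea that the API accepts a partial `settings` object is an assumption, not something this page confirms.

```python
# Setting names copied from the example request body on this page.
KNOWN_SETTINGS = frozenset({
    "chunkStrategy", "deliverWebhook", "detectSignature",
    "detectStrikethrough", "disableLayoutDetection", "figureSummarization",
    "figureSummarizationPrompt", "formDetectionMode", "jsonSchema",
    "modelProvider", "skewCorrection", "structuredExtractionPrompt",
    "structuredExtractionSkipOcr", "tableOutputMode", "tableParsingMode",
    "tableSummarization", "tableSummarizationPrompt",
})


def make_settings(**overrides):
    """Return a settings dict, rejecting keys not shown in the example body.

    Hypothetical helper: it only guards against typos in key names and does
    not validate the values themselves.
    """
    unknown = sorted(set(overrides) - KNOWN_SETTINGS)
    if unknown:
        raise KeyError(f"unknown setting(s): {', '.join(unknown)}")
    return dict(overrides)


settings = make_settings(tableOutputMode="json", skewCorrection=False)
```

A misspelled key such as `tableOutputmode` raises `KeyError` before any request is made.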

Create an ingestion workflow for structured extraction or document parsing.

A dataset is a collection of settings that help organize documents from the same domain and enable focused document intelligence.

The dataset’s name must be unique.
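The same request can be assembled in Python with only the standard library. This is a sketch under assumptions: the dataset name and description are made-up values, `build_create_dataset_request` is a hypothetical helper, and whether `settings` may be omitted is not stated on this page. The request is built but not sent.

```python
import json
import urllib.request

DATASETS_URL = "https://api.tensorlake.ai/documents/v1/datasets"


def build_create_dataset_request(token, name, description, settings=None):
    """Build (but do not send) the POST request that creates a dataset."""
    payload = {"name": name, "description": description}
    if settings is not None:
        # Assumption: settings can be left out or sent partially.
        payload["settings"] = settings
    return urllib.request.Request(
        DATASETS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_dataset_request(
    token="<token>",
    name="invoices-2024",  # hypothetical name; must be unique
    description="Invoices from the finance team",
    settings={"tableOutputMode": "json", "skewCorrection": False},
)
# To send it: urllib.request.urlopen(request)
```

Because names must be unique, a second call with the same `name` should be expected to fail. The exact error response is not documented here.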

Your data is NOT sent to a third-party service (OpenAI, Anthropic, etc.); we use our own models to parse the document.

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
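A small sketch of building that header without hard-coding the token. The environment variable name `TENSORLAKE_API_KEY` and the helper `bearer_header` are assumptions made for illustration; use whatever secret storage your deployment provides.

```python
import os


def bearer_header(token=None, env_var="TENSORLAKE_API_KEY"):
    """Return the Authorization header as a dict.

    Falls back to the environment variable (an assumed name) when no
    token is passed explicitly.
    """
    token = (token or os.environ.get(env_var, "")).strip()
    if not token:
        raise ValueError(f"no auth token given and {env_var} is not set")
    return {"Authorization": f"Bearer {token}"}


headers = bearer_header("my-secret-token")
```

Stripping whitespace guards against a trailing newline picked up when the token is read from a file or a shell variable.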

Body

application/json

Response

200

application/json

Create a new dataset. Reference the dataset by name to automatically insert structured data extracted from documents by the Structured Extraction API.

The response is of type object.
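Reading the `id` out of that object can be sketched as follows. The helper `dataset_id_from_response` and the sample id are hypothetical. The only field relied on is `id`, as shown in the example response.

```python
import json


def dataset_id_from_response(body):
    """Extract the dataset id from the JSON body of a 200 response."""
    data = json.loads(body)
    if not isinstance(data, dict) or not isinstance(data.get("id"), str):
        raise ValueError("expected a JSON object with a string 'id' field")
    return data["id"]


# Sample body with a made-up id, shaped like the example response.
dataset_id = dataset_id_from_response('{"id": "dataset_abc123"}')
```

Checking the shape explicitly turns an unexpected body (for example, an error payload) into a clear exception instead of a later `KeyError`.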
