Find your dataset by name
curl --request GET \
--url https://api.tensorlake.ai/documents/v1/datasets/{dataset_name} \
--header 'Authorization: Bearer <token>'
{
"analytics": {
"totalErrorJobs": 1,
"totalJobs": 1,
"totalPendingJobs": 1,
"totalProcessingJobs": 1,
"totalSuccessfulJobs": 1
},
"createdAt": "<string>",
"description": "<string>",
"id": "<string>",
"jobs": {
"hasMore": true,
"items": [
{
"createdAt": "<string>",
"errorMessage": "<string>",
"fileId": "<string>",
"fileName": "<string>",
"id": "<string>",
"outputsUrl": "<string>",
"status": "failure",
"updatedAt": "<string>"
}
],
"nextCursor": "<string>",
"prevCursor": "<string>"
},
"name": "<string>",
"settings": {
"chunkStrategy": null,
"deliverWebhook": "false",
"figureSummarization": true,
"figureSummarizationPrompt": "none",
"formDetectionMode": "vlm",
"jsonSchema": "<any>",
"modelProvider": null,
"structuredExtractionPrompt": "none",
"tableOutputMode": "json",
"tableParsingMode": "tsr",
"tableSummarization": true,
"tableSummarizationPrompt": "none"
},
"status": "idle"
}
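The same request can be sketched in Python with only the standard library. This is a minimal sketch: the helper name build_dataset_request and the decision to percent-encode the dataset name are assumptions, not part of the Tensorlake docs. It mirrors the curl example above and builds the request without sending it.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.tensorlake.ai/documents/v1/datasets/"

def build_dataset_request(dataset_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a dataset by name.

    Hypothetical helper that mirrors the curl example.
    """
    # Percent-encode the name so spaces or slashes cannot break the URL path.
    path = urllib.parse.quote(dataset_name, safe="")
    return urllib.request.Request(
        BASE_URL + path,
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )
```

To actually send it, pass the request to urllib.request.urlopen and decode the response body with json.load.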
Modify the settings or metadata for a dataset. Note that changes to a dataset’s settings are not retroactive: they apply only to new document parsing or structured extraction executions.
The unique name constraint remains enforced. If you change the dataset’s name, the new name must be unique within your organization.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Path Parameters
dataset_name: The name of the dataset to retrieve.
Response
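analytics.totalErrorJobs: x >= 0
analytics.totalJobs: x >= 0
analytics.totalPendingJobs: x >= 0
analytics.totalProcessingJobs: x >= 0
analytics.totalSuccessfulJobs: x >= 0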
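Because every analytics counter is documented as x >= 0, a client can sanity-check the block before using it. A minimal sketch: the function name summarize_analytics and the success-rate metric are illustrative assumptions. The docs do not say how the counters relate to one another, so this sketch does not assume they sum to totalJobs.

```python
from typing import Mapping

COUNTER_KEYS = (
    "totalErrorJobs",
    "totalJobs",
    "totalPendingJobs",
    "totalProcessingJobs",
    "totalSuccessfulJobs",
)

def summarize_analytics(analytics: Mapping[str, int]) -> float:
    """Validate the counters and return the share of jobs that succeeded.

    Hypothetical helper; returns 0.0 when totalJobs is 0.
    """
    for key in COUNTER_KEYS:
        value = analytics[key]
        # Each documented counter must satisfy x >= 0.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer, got {value!r}")
    total = analytics["totalJobs"]
    return analytics["totalSuccessfulJobs"] / total if total else 0.0
```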
settings.chunkStrategy: The chunking strategy determines how the document is chunked into smaller pieces. This is only supported in markdown mode.
Available options: page, section, fragment
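A client can check a chunkStrategy value against the documented options before relying on it. In this sketch, None stands for the null shown in the sample response; treating null as "no chunking strategy set" is an assumption, and the helper name is hypothetical.

```python
from typing import Optional

CHUNK_STRATEGIES = frozenset({"page", "section", "fragment"})

def validate_chunk_strategy(value: Optional[str]) -> Optional[str]:
    """Return the value unchanged if it is null or one of the documented options."""
    if value is None or value in CHUNK_STRATEGIES:
        return value
    raise ValueError(
        f"chunkStrategy must be one of {sorted(CHUNK_STRATEGIES)} or null, got {value!r}"
    )
```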
settings.deliverWebhook: Whether to deliver a webhook when the job is completed. A webhook must be configured for this to work; if none is configured, the job is still processed but no webhook is delivered.
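The sample response shows deliverWebhook as the string "false", while the description reads like a boolean flag, so a tolerant client might accept either form. This sketch assumes only real booleans and the strings "true"/"false" (case-insensitive) are valid, and that webhook_configured is something the caller already knows from its own account setup; both the function and that parameter are hypothetical.

```python
from typing import Union

def webhook_will_be_delivered(deliver_webhook: Union[bool, str], webhook_configured: bool) -> bool:
    """Return True only if the setting is enabled AND a webhook is configured."""
    if isinstance(deliver_webhook, bool):
        enabled = deliver_webhook
    elif isinstance(deliver_webhook, str) and deliver_webhook.lower() in ("true", "false"):
        enabled = deliver_webhook.lower() == "true"
    else:
        raise ValueError(f"unexpected deliverWebhook value: {deliver_webhook!r}")
    # Without a configured webhook the job still runs, but nothing is delivered.
    return enabled and webhook_configured
```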
settings.figureSummarization: Whether to summarize the contents of the figures.
settings.figureSummarizationPrompt: The prompt to use for figure summarization. If not provided, the default prompt will be used.
settings.formDetectionMode: The method used to detect forms in the document.
Available options: vlm, object_detection
settings.jsonSchema: The JSON schema to guide structured data extraction from the file. Encode the JSON schema as a string. If provided, the pages argument will be ignored.
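Because jsonSchema must be supplied as an encoded string rather than a nested JSON object, a client would typically serialize its schema first. A minimal sketch with the standard json module; the invoice schema and its fields are hypothetical.

```python
import json

# Hypothetical schema describing the fields to extract from an invoice.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}

# Encode the schema as a string, as the jsonSchema setting expects.
json_schema_setting = json.dumps(invoice_schema)

# If the setting is returned as the same string, json.loads recovers the schema.
decoded = json.loads(json_schema_setting)
```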
settings.modelProvider: The model provider to use for structured data extraction. Specifying tensorlake will use a private model, which runs on our servers.
Available options: tensorlake, claude-3-5-sonnet-latest, gpt-4o-mini
settings.structuredExtractionPrompt: Override the prompt used for structured extraction. Use this if you want to extract data from a file with a different prompt than the default one.
settings.tableOutputMode: The mode to use for table output: JSON, Markdown, or HTML. JSON mode is great for structured tables, Markdown mode for tables without merged cells, and HTML mode for tables with merged cells.
Available options: json, markdown, html
settings.tableParsingMode: The mode to use for table parsing: Table Structure Understanding (tsr) or VLM (vlm). Table Structure Understanding is the default mode and is great for structured tables. VLM is great for unstructured or semi-structured tables.
Available options: tsr, vlm
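The guidance for tableOutputMode and tableParsingMode can be condensed into a small decision helper. This is a sketch of one reading of the descriptions above, not Tensorlake's own logic: the TableTraits fields, the function name, and the choice to let merged cells take priority over structure when picking an output mode are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TableTraits:
    """Hypothetical description of the tables a dataset is expected to contain."""
    structured: bool        # clean grid with consistent rows and columns
    has_merged_cells: bool  # cells spanning multiple rows or columns

def choose_table_modes(traits: TableTraits) -> Tuple[str, str]:
    """Return (tableOutputMode, tableParsingMode) following the documented guidance."""
    # Parsing: tsr (the default) suits structured tables; vlm suits
    # unstructured or semi-structured ones.
    parsing_mode = "tsr" if traits.structured else "vlm"
    # Output: html handles merged cells; json suits structured tables;
    # markdown suits simple tables without merged cells.
    if traits.has_merged_cells:
        output_mode = "html"
    elif traits.structured:
        output_mode = "json"
    else:
        output_mode = "markdown"
    return output_mode, parsing_mode
```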
settings.tableSummarization: Whether to summarize the contents of the tables.
settings.tableSummarizationPrompt: The prompt to use for table summarization.
jobs.items[].status: Available options: failure, pending, processing, successful
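Each job in jobs.items carries one of these statuses. A sketch for collecting the failed jobs together with their errorMessage, using the field names from the sample response; the helper name is hypothetical. It only inspects the jobs in a single response; presumably hasMore and nextCursor indicate further pages, which this sketch does not follow.

```python
from typing import Any, Dict, List, Tuple

JOB_STATUSES = frozenset({"failure", "pending", "processing", "successful"})

def failed_jobs(dataset: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (fileName, errorMessage) for each job whose status is "failure"."""
    result = []
    for job in dataset["jobs"]["items"]:
        status = job["status"]
        if status not in JOB_STATUSES:
            raise ValueError(f"unknown job status: {status!r}")
        if status == "failure":
            # errorMessage may be absent or null; fall back to an empty string.
            result.append((job["fileName"], job.get("errorMessage") or ""))
    return result
```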
status: Available options: idle, processing
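The dataset-level status is either idle or processing, so a client that wants to wait for a dataset to finish its current work could poll until it reports idle. A minimal sketch: fetch_status stands in for a call to the GET endpoint above and is hypothetical, as are the interval and timeout defaults; the docs do not specify a recommended polling rate.

```python
import time
from typing import Callable

def wait_until_idle(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
) -> None:
    """Poll fetch_status() until it returns "idle" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "idle":
            return
        if status != "processing":
            raise ValueError(f"unexpected dataset status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("dataset did not become idle in time")
        time.sleep(interval_s)
```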