PUT /documents/v2/datasets/{dataset_id}
cURL
curl --request PUT \
  --url https://api.tensorlake.ai/documents/v2/datasets/{dataset_id} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "parsing_options": {
    "table_output_mode": "html",
    "table_parsing_format": "tsr",
    "chunking_strategy": "none",
    "signature_detection": false,
    "remove_strikethrough_lines": false,
    "skew_detection": false,
    "disable_layout_detection": false,
    "ignore_sections": [],
    "cross_page_header_detection": false,
    "ocr_model": "model01"
  },
  "structured_extraction_options": [
    {
      "schema_name": "<string>",
      "json_schema": "<any>",
      "skip_ocr": true,
      "prompt": "<string>",
      "model_provider": "tensorlake",
      "partition_strategy": "none",
      "page_classes": [
        "<string>"
      ],
      "provide_citations": true
    }
  ],
  "page_classifications": [
    {
      "name": "<string>",
      "description": "<string>"
    }
  ],
  "enrichment_options": {
    "table_summarization": false,
    "table_summarization_prompt": null,
    "figure_summarization": false,
    "figure_summarization_prompt": null,
    "include_full_page_image": false
  },
  "description": "This dataset contains all invoices from 2023."
}'
{
  "name": "Invoices Dataset",
  "dataset_id": "dataset_12345",
  "description": "This dataset contains invoices for the year 2023.",
  "status": "idle",
  "created_at": "2023-10-01T12:00:00Z",
  "updated_at": "2023-10-01T12:00:00Z"
}
Update the settings or metadata of a dataset. Changes to a dataset's settings are not retroactive: they apply only to document parsing or structured extraction executions started after the update. The unique name constraint is still enforced, so if you change the name of the dataset, the new name must be unique within your organization.
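
As a rough illustration of the request described above, the following Python sketch builds the same PUT request as the cURL sample using only the standard library. The helper name `build_update_request` and the `API_BASE` constant are hypothetical names introduced for this example; the URL, HTTP method, and headers are taken from the cURL sample. The sketch only constructs the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the cURL sample on this page.
API_BASE = "https://api.tensorlake.ai/documents/v2"


def build_update_request(dataset_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates a dataset.

    `build_update_request` is a hypothetical helper, not part of any SDK.
    """
    if not dataset_id:
        raise ValueError("dataset_id is required")
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call; that step is left out here so the sketch can be run without credentials.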

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

dataset_id
string
required

The ID of the dataset to update

Body

application/json
parsing_options
object

The properties of this object define the configuration for the document parsing process.

Tensorlake provides sane defaults that work well for most documents, so this object is not required. However, every document is different, and you may want to customize the parsing process to better suit your needs.

structured_extraction_options
object[] | null

Each object in this array defines a configuration for structured data extraction.

If this array is present, the API will perform structured data extraction on the document.

page_classifications
object[] | null

Each object in this array defines a configuration for page classification.

If this array is present, the API will perform page classification on the document.

enrichment_options
object

The properties of this object help to extend the output of the document parsing process with additional information.

This includes summarization of tables and figures, which can help to provide a more comprehensive understanding of the document.

This object is not required, and the API will use default settings if it is not present.

description
string | null

A description of the dataset.

This field is optional and can be used to provide additional context about the dataset.

Example:

"This dataset contains all invoices from 2023."

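The body fields above are all optional, so one possible way to assemble a request body is to include only the fields the caller actually sets. The sketch below is an assumption-laden illustration: `build_update_body` is a hypothetical helper, and the source does not state whether omitted fields keep their stored values or fall back to defaults, so verify that behavior before relying on it. Field names come from the cURL sample.

```python
from typing import Any, Dict, List, Optional


def build_update_body(
    description: Optional[str] = None,
    parsing_options: Optional[Dict[str, Any]] = None,
    structured_extraction_options: Optional[List[Dict[str, Any]]] = None,
    page_classifications: Optional[List[Dict[str, Any]]] = None,
    enrichment_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable body containing only the fields that were set.

    Hypothetical helper: a value of None means "do not send this field".
    It cannot express an explicit JSON null, which the API accepts for
    some fields (for example, description is typed string | null).
    """
    candidates = {
        "description": description,
        "parsing_options": parsing_options,
        "structured_extraction_options": structured_extraction_options,
        "page_classifications": page_classifications,
        "enrichment_options": enrichment_options,
    }
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, `build_update_body(description="This dataset contains all invoices from 2023.")` yields a body with a single `description` key.
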
Response

Dataset updated successfully

name
string
required

The name of the dataset.

This is a human-readable name that identifies the dataset.

Example:

"Invoices Dataset"

dataset_id
string
required

The unique identifier for the dataset.

This identifier is used to refer to the dataset in API endpoints and operations.

This value is automatically generated and is unique within the organization and project context.

Example:

"dataset_12345"

status
enum<string>
required

The current status of the dataset.

This indicates whether the dataset is currently idle or processing.

Available options:
idle,
processing
created_at
string
required

The date and time when the dataset was created.

The timestamp is in RFC 3339 format (e.g., "2023-10-01T12:00:00Z").

Example:

"2023-10-01T12:00:00Z"

updated_at
string
required

The date and time when the dataset was last updated.

The timestamp is in RFC 3339 format (e.g., "2023-10-01T12:00:00Z").

Example:

"2023-10-01T12:00:00Z"

description
string | null

An optional description of the dataset.

This description is the one provided during dataset creation or update.

Example:

"This dataset contains invoices for the year 2023."

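To make the response fields above concrete, here is a minimal Python sketch that parses a response payload into a typed object. The names `Dataset`, `DatasetStatus`, `parse_rfc3339`, and `parse_dataset` are hypothetical and introduced only for this example. The timestamp helper handles the `Z` suffix shown in the examples but is not a complete RFC 3339 parser.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class DatasetStatus(Enum):
    # The two options listed for the status field.
    IDLE = "idle"
    PROCESSING = "processing"


def parse_rfc3339(value: str) -> datetime:
    """Parse timestamps like "2023-10-01T12:00:00Z" into an aware datetime."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


@dataclass(frozen=True)
class Dataset:
    name: str
    dataset_id: str
    status: DatasetStatus
    created_at: datetime
    updated_at: datetime
    description: Optional[str] = None


def parse_dataset(payload: dict) -> Dataset:
    """Convert a decoded JSON response into a Dataset.

    Raises KeyError if a required field is missing and ValueError if the
    status is not one of the documented options.
    """
    return Dataset(
        name=payload["name"],
        dataset_id=payload["dataset_id"],
        status=DatasetStatus(payload["status"]),
        created_at=parse_rfc3339(payload["created_at"]),
        updated_at=parse_rfc3339(payload["updated_at"]),
        description=payload.get("description"),
    )
```

Because setting changes are not retroactive, a caller might inspect `status` after an update to see whether the dataset is currently `processing` before submitting new work; that usage is a suggestion, not documented behavior.
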