POST /documents/v2/datasets
cURL
curl --request POST \
  --url https://api.tensorlake.ai/documents/v2/datasets \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "parsing_options": {
    "table_output_mode": "markdown",
    "table_parsing_format": "tsr",
    "chunking_strategy": "none",
    "signature_detection": false,
    "remove_strikethrough_lines": false,
    "skew_detection": false,
    "disable_layout_detection": false
  },
  "structured_extraction_options": [
    {
      "schema_name": "<string>",
      "json_schema": "<any>",
      "skip_ocr": true,
      "prompt": "<string>",
      "model_provider": "tensorlake",
      "partition_strategy": "none",
      "page_classes": [
        "<string>"
      ]
    }
  ],
  "page_classifications": [
    {
      "name": "<string>",
      "description": "<string>"
    }
  ],
  "enrichment_options": {
    "table_summarization": false,
    "table_summarization_prompt": null,
    "figure_summarization": false,
    "figure_summarization_prompt": null
  },
  "name": "invoices dataset",
  "description": "This dataset contains all invoices from 2023."
}'
{
  "name": "invoices dataset",
  "dataset_id": "dataset_12345",
  "created_at": "2023-10-01T12:00:00Z"
}

Create an ingestion workflow for structured extraction or document parsing.

A dataset is a collection of settings that help with organizing documents from the same domain and enable focused document intelligence.

The dataset’s name must be unique.

Your data is NOT sent to a third-party service (OpenAI, Anthropic, etc.). We use our own models to parse the documents.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
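As a rough illustration of the header format described above, the following sketch builds the same request as the cURL example using only the Python standard library. The helper names `build_create_dataset_request` and `send` are hypothetical and not part of any Tensorlake SDK; only the URL, method, and headers are taken from this page.

```python
import json
import urllib.request

# Endpoint URL as shown in the cURL example on this page.
API_URL = "https://api.tensorlake.ai/documents/v2/datasets"


def build_create_dataset_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for creating a dataset.

    Hypothetical helper: it only assembles the URL, headers, and JSON body.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer authentication: "Bearer <token>"
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the headers and body before any network call is made.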

Body

application/json

This object defines the request body for creating a new dataset.

A Dataset is a collection of parsed results from files.

It can be used to store and manage related data, such as invoices, receipts, or any other documents that need to be parsed and analyzed.

Once a dataset is created, you can use it to parse related files using the same configuration and options, allowing for consistent and efficient data extraction.
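The request body can be assembled programmatically. The sketch below is one possible way to do it: the helper `make_dataset_payload` is hypothetical, the section names are copied from the cURL example, and treating `name` as the only mandatory field is an assumption, since this page does not state which fields are required.

```python
import json

# Top-level sections that appear in the cURL example on this page.
KNOWN_SECTIONS = {
    "parsing_options",
    "structured_extraction_options",
    "page_classifications",
    "enrichment_options",
}


def make_dataset_payload(name, description=None, **sections):
    """Assemble a request body dict for creating a dataset.

    Assumption: only `name` is treated as mandatory here; the page does not
    say which fields the API actually requires.
    """
    unknown = set(sections) - KNOWN_SECTIONS
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    payload = {"name": name}
    if description is not None:
        payload["description"] = description
    payload.update(sections)
    return payload


# Values below are taken from the cURL example.
payload = make_dataset_payload(
    "invoices dataset",
    description="This dataset contains all invoices from 2023.",
    parsing_options={"table_output_mode": "markdown", "chunking_strategy": "none"},
)
body = json.dumps(payload)  # JSON string to send as the request body
```

Rejecting unrecognized section names locally catches typos such as `parsing_option` before the request is sent.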

Response

200
application/json

Dataset created successfully

The response is of type object.
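A minimal sketch of reading the response, assuming it always carries the three fields shown in the sample response on this page (`name`, `dataset_id`, `created_at`). The `CreatedDataset` class and `parse_created_dataset` function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CreatedDataset:
    name: str
    dataset_id: str
    created_at: datetime


def parse_created_dataset(body: str) -> CreatedDataset:
    """Parse the JSON response body into a typed record.

    Assumption: `created_at` is an ISO 8601 timestamp with a trailing "Z",
    as in the sample response.
    """
    data = json.loads(body)
    # Older Python versions do not accept "Z" in fromisoformat, so
    # normalize it to an explicit UTC offset first.
    created_at = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return CreatedDataset(
        name=data["name"],
        dataset_id=data["dataset_id"],
        created_at=created_at,
    )
```

The returned `dataset_id` is the value to keep for later requests that parse files with this dataset's configuration.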