POST /documents/v2/datasets/{dataset_id}/parse
cURL
curl --request POST \
  --url https://api.tensorlake.ai/documents/v2/datasets/{dataset_id}/parse \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "page_range": "1-5,8,10",
  "file_name": "document.pdf",
  "file_id": "file_abc123xyz",
  "mime_type": "application/pdf",
  "labels": {
    "priority": "high",
    "source": "email"
  }
}'
{
  "parse_id": "parse_id-12345",
  "created_at": "2023-10-01T12:00:00Z"
}
Parse a document using a Dataset's configuration and get the parsed results in that Dataset. This endpoint submits a file for parsing with the settings defined in the specified dataset.

Using a file

When submitting a parse job to the dataset, you can provide the content of the file in one of three ways:
  1. file_id: The ID of a file that has been previously uploaded to the Upload File endpoint. This is the most common method.
  2. file_url: A publicly accessible URL that points to the file you want to parse. The API will download the file from this URL. Redirects are also supported, but the URL and the Location header must point to a file that is publicly accessible.
  3. raw_text: Raw text content, for structured extraction from non-file sources such as emails, HTML, CSV, or XML.
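
As a minimal sketch of the exactly-one-source rule, the Python helper below builds a request body from one of the three sources and rejects any other combination. The name build_parse_body is hypothetical and not part of any Tensorlake SDK; the field names are the ones listed on this page.

```python
from typing import Optional


def build_parse_body(
    file_id: Optional[str] = None,
    file_url: Optional[str] = None,
    raw_text: Optional[str] = None,
    **optional_fields,
) -> dict:
    """Return a JSON-serializable body containing exactly one file source."""
    sources = {"file_id": file_id, "file_url": file_url, "raw_text": raw_text}
    provided = {key: value for key, value in sources.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "expected exactly one of file_id, file_url, raw_text; "
            f"got {sorted(provided)}"
        )
    body = dict(provided)
    # Optional fields (page_range, labels, file_name, mime_type) are copied
    # only when set, so unset ones are left out of the JSON entirely.
    body.update({k: v for k, v in optional_fields.items() if v is not None})
    return body
```

For example, build_parse_body(file_id="file_abc123xyz", page_range="1-5,8,10") yields a body with just those two keys, while passing both file_id and file_url raises ValueError.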

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
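
A minimal sketch of attaching this header with Python's standard library is shown below. The helper name make_parse_request is hypothetical; the host is taken from the cURL example on this page. The function only constructs the request object and sends nothing.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.tensorlake.ai"  # host from the cURL example above


def make_parse_request(dataset_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the dataset parse endpoint."""
    # Percent-encode the path parameter so unusual characters cannot alter the path.
    safe_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{API_BASE}/documents/v2/datasets/{safe_id}/parse"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request.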

Path Parameters

dataset_id
string
required

The ID of the dataset whose configuration is used to parse the document.

Body

application/json
  • file_id
  • file_url
  • raw_text

File source - must be exactly one of: file_id, file_url, or raw_text

file_id
string
required

ID of the file previously uploaded to Tensorlake. Has tensorlake- (V1) or file_ (V2) prefix.

Examples:

"file_abc123xyz"

labels
object | null

Additional metadata to identify the parse request. The labels are returned in the parse response.

Example:
{ "priority": "high", "source": "email" }
page_range
string

Comma-separated list of page numbers or ranges to parse (e.g., '1,2,3-5'). Default: all pages.
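
To illustrate the format, the sketch below expands a page_range string into individual page numbers on the client side, for example to validate input before submitting it. The helper name expand_page_range is hypothetical, and it assumes pages are numbered from 1 and ranges are inclusive; how the service treats whitespace, duplicates, or reversed ranges is not documented on this page.

```python
from typing import List


def expand_page_range(page_range: str) -> List[int]:
    """Expand a string such as '1-5,8,10' into a sorted list of page numbers."""
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # int() raises ValueError for empty or non-numeric entries.
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range entry: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```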

Examples:

"1-5,8,10"

file_name
string

Name of the file. Only populated when using file_id.

Examples:

"document.pdf"

mime_type
enum<string>
Available options:
application/pdf,
application/vnd.openxmlformats-officedocument.wordprocessingml.document,
application/msword,
application/vnd.openxmlformats-officedocument.presentationml.presentation,
application/vnd.apple.keynote,
image/jpeg,
text/plain,
text/html,
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,
application/vnd.ms-excel.sheet.macroenabled.12,
application/vnd.ms-excel,
text/xml,
text/csv,
image/png,
application/octet-stream
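
One way to choose a value for this field is to guess it from the file name and accept the guess only if it appears in the list above. The sketch below does that with Python's mimetypes module. The helper name pick_mime_type is hypothetical, and falling back to application/octet-stream is an assumption about client-side handling, not documented service behavior.

```python
import mimetypes

# The options listed for mime_type on this page.
SUPPORTED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.apple.keynote",
    "image/jpeg",
    "text/plain",
    "text/html",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel.sheet.macroenabled.12",
    "application/vnd.ms-excel",
    "text/xml",
    "text/csv",
    "image/png",
    "application/octet-stream",
})


def pick_mime_type(file_name: str) -> str:
    """Guess a MIME type from the file name, limited to the listed options."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    if guessed in SUPPORTED_MIME_TYPES:
        return guessed
    # Assumed fallback for unknown or unlisted types.
    return "application/octet-stream"
```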

Response

Dataset file parsed successfully

parse_id
string
required

The unique identifier for the parse job.

Use this identifier to track the progress and results of the parse job using the /documents/v2/parse/{parse_id} endpoint.

Example:

"parse_id-12345"

created_at
string
required

The date and time when the parse job was scheduled.

The date is in RFC 3339 format (e.g., "2023-10-01T12:00:00Z").
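
As a sketch of consuming the response, the code below reads both fields, converts created_at to a timezone-aware datetime, and derives the path of the tracking endpoint named in the parse_id description. The ParseJob class and parse_job_from_json function are hypothetical names. The timestamp handling covers the "Z"-suffixed form shown on this page and is not a complete RFC 3339 parser.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ParseJob:
    parse_id: str
    created_at: datetime

    @property
    def status_path(self) -> str:
        """Path of the endpoint used to track this job."""
        return f"/documents/v2/parse/{self.parse_id}"


def parse_job_from_json(payload: str) -> ParseJob:
    """Build a ParseJob from the JSON text of a successful response."""
    data = json.loads(payload)
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    created_at = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return ParseJob(parse_id=data["parse_id"], created_at=created_at)
```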

Example:

"2023-10-01T12:00:00Z"
