Read Document

POST /documents/v2/read

cURL

curl --request POST \
  --url https://api.tensorlake.ai/documents/v2/read \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "file_id": "<string>",
  "page_range": "<string>",
  "file_name": "<string>",
  "mime_type": "application/pdf",
  "parsing_options": {
    "table_output_mode": "html",
    "table_parsing_format": "tsr",
    "chunking_strategy": "none",
    "signature_detection": false,
    "remove_strikethrough_lines": false,
    "skew_detection": false,
    "disable_layout_detection": false,
    "ignore_sections": [],
    "cross_page_header_detection": false,
    "include_images": false,
    "barcode_detection": false,
    "ocr_model": "model01"
  },
  "enrichment_options": {
    "table_summarization": false,
    "table_summarization_prompt": null,
    "figure_summarization": false,
    "figure_summarization_prompt": null,
    "include_full_page_image": false
  },
  "labels": {
    "priority": "high",
    "source": "email"
  }
}
'

{
  "parse_id": "<string>",
  "created_at": "<string>"
}

Submit an uploaded file, an internet-reachable URL, or raw text for document parsing. The API converts the document into markdown and provides document layout information. Once the request is submitted, the API returns a parse response with a parse_id field. You can query the status and results of the parse operation with the Get Parse Result endpoint. If you have configured a webhook, we will notify you when the job completes, whether it succeeds or fails.
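As a rough illustration of the submit step, the sketch below builds the same POST request as the cURL sample using only Python's standard library. The helper name `build_read_request` and the placeholder token are assumptions made for illustration; only the URL, the headers, and the field names come from this page.

```python
import json
import urllib.request

# URL taken from the cURL sample on this page.
READ_URL = "https://api.tensorlake.ai/documents/v2/read"


def build_read_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the read endpoint.

    Hypothetical helper: it only mirrors the headers and JSON body
    shown in the cURL sample.
    """
    return urllib.request.Request(
        READ_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "<token>" is a placeholder; the file ID is the example value from this page.
request = build_read_request("<token>", {"file_id": "file_abc123xyz"})

# To actually submit (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     parse_job = json.load(response)  # documented fields: parse_id, created_at
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.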

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

file_id
file_url
raw_text

File source - must be exactly one of: file_id, file_url, or raw_text
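The "exactly one of" rule can be checked on the client before a request is sent. The following is a minimal sketch under that reading; `file_source_of` is a hypothetical helper, and the server's own validation and error messages are not described on this page.

```python
# The three mutually exclusive file-source fields named above.
FILE_SOURCE_KEYS = ("file_id", "file_url", "raw_text")


def file_source_of(body: dict) -> str:
    """Return the single file-source key present in a request body.

    Raises ValueError when zero or more than one source is supplied.
    This is a client-side check only (an assumption for illustration).
    """
    present = [key for key in FILE_SOURCE_KEYS if body.get(key) is not None]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {FILE_SOURCE_KEYS}, got {present or 'none'}"
        )
    return present[0]


source = file_source_of({"file_id": "file_abc123xyz", "page_range": "1-5,8,10"})
```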

file_id

string

required

ID of a file previously uploaded to Tensorlake. The ID has a tensorlake- (V1) or file_ (V2) prefix.

Example:

"file_abc123xyz"

page_range

string

Comma-separated list of page numbers or ranges to parse (e.g., '1,2,3-5'). Default: all pages.

Example:

"1-5,8,10"
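To make the page_range syntax concrete, here is a small sketch that expands such a string into individual page numbers. The helper `expand_page_range` is hypothetical and runs on the client only; how the API treats whitespace, duplicates, or reversed ranges is not stated on this page, so those choices are assumptions.

```python
from typing import List


def expand_page_range(page_range: str) -> List[int]:
    """Expand a string such as '1-5,8,10' into a list of page numbers.

    Assumptions for illustration: surrounding whitespace is ignored and
    a range whose start exceeds its end is rejected.
    """
    pages: List[int] = []
    for part in page_range.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    return pages


pages = expand_page_range("1-5,8,10")  # [1, 2, 3, 4, 5, 8, 10]
```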

file_name

string

Name of the file. Only populated when using file_id.

Example:

"document.pdf"

mime_type

enum<string>

Available options:

application/pdf,

application/vnd.openxmlformats-officedocument.wordprocessingml.document,

application/msword,

application/vnd.openxmlformats-officedocument.presentationml.presentation,

application/vnd.ms-powerpoint,

application/vnd.apple.keynote,

image/jpeg,

image/tiff,

text/plain,

text/html,

text/markdown,

text/x-markdown,

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,

application/vnd.ms-excel.sheet.macroenabled.12,

application/vnd.ms-excel,

text/xml,

text/csv,

image/png,

text/rtf,

application/rtf,

application/octet-stream,

application/pkcs7-mime,

application/x-pkcs7-mime,

application/pkcs7-signature

parsing_options

object

The properties of this object define the configuration for the document parsing process.

Tensorlake provides sane defaults that work well for most documents, so this object is not required. However, every document is different, and you may want to customize the parsing process to better suit your needs.

enrichment_options

object

The properties of this object help to extend the output of the document parsing process with additional information.

This includes summarization of tables and figures, which can help to provide a more comprehensive understanding of the document.

This object is not required, and the API will use default settings if it is not present.

labels

object

Additional metadata to identify the read request. The labels are returned in the read response.

Example:

{ "priority": "high", "source": "email" }

Response

Created parse job details

parse_id

string

required

The unique identifier for the parse job

Use this ID to track the status of the parse job. Pass it to the GET /documents/v2/parse/{parse_id} endpoint to retrieve the status and results of the parse job.
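A short sketch of how the parse_id might be turned into the follow-up request URL. The host is taken from the cURL sample and the path from the description above; `parse_result_url` and the ID `parse_123` are hypothetical, and the shape of the result payload is not covered on this page.

```python
from urllib.parse import quote

# Host taken from the cURL sample on this page.
API_BASE = "https://api.tensorlake.ai"


def parse_result_url(parse_id: str) -> str:
    """Build the URL for GET /documents/v2/parse/{parse_id}."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return f"{API_BASE}/documents/v2/parse/{quote(parse_id, safe='')}"


url = parse_result_url("parse_123")  # "parse_123" is a made-up ID
```

The same Authorization header used for the read request would presumably be sent with this GET request.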

created_at

string

required

The creation date and time of the parse job.

The date is in RFC 3339 format.
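Because created_at is an RFC 3339 string, it can usually be converted to a timezone-aware datetime on the client. The sketch below assumes a timestamp with a trailing "Z" or a numeric offset; `parse_created_at` is a hypothetical helper, the sample timestamp is invented, and older Python versions may reject some fractional-second forms.

```python
from datetime import datetime, timezone


def parse_created_at(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a datetime.

    datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11
    onward, so it is rewritten as an explicit UTC offset first.
    """
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


# Invented sample value, not taken from a real API response.
created = parse_created_at("2024-05-01T12:30:00Z")
```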
