POST /documents/v1/parse_async

Submit a file for document parsing. If you have configured a webhook, we will notify you when the job is complete.

The API call returns a jobId, which you can use, together with your API key, to retrieve the results of the job.

You can reference the file in one of two ways: as a managed file, using its tensorlake-<file_id> identifier, or as a pre-signed URL or any other HTTP URL from which the file can be downloaded.

Your data is NOT sent to any third-party service (OpenAI, Anthropic, etc.). Documents are parsed with our own models.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
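
As a rough illustration of how the header and the JSON body fit together, the sketch below builds (but does not send) a request using only the Python standard library. The host `https://api.example.com` and the helper name `build_parse_request` are placeholders and not part of this reference; substitute the host given in your account documentation.

```python
import json
import urllib.request

# Assumption: placeholder host. This reference only specifies the path.
BASE_URL = "https://api.example.com"


def build_parse_request(token: str, body: dict) -> urllib.request.Request:
    """Build, without sending, a POST request for /documents/v1/parse_async."""
    return urllib.request.Request(
        url=f"{BASE_URL}/documents/v1/parse_async",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Header form described above: "Bearer <token>"
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_parse_request("my-token", {"file": "https://example.com/report.pdf"})
```

To actually submit the job, the request could be passed to `urllib.request.urlopen(req)` and the response body decoded as JSON.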

Body

application/json
file
string
required

The URL or managed file ID of the document to be processed. You can provide one of the following:

  • A publicly available URL or a presigned S3 URL
  • A tensorlake- prefixed id obtained from the /files endpoint after directly uploading a document
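
The two accepted forms of `file` can be told apart on the client before a request is sent. The following is a minimal sketch; the function name `classify_file_reference` is hypothetical, and the checks are a client-side convenience only, since the server's own validation rules are not described in this reference.

```python
from urllib.parse import urlparse

MANAGED_PREFIX = "tensorlake-"


def classify_file_reference(file: str) -> str:
    """Return "managed" for tensorlake- ids and "url" for HTTP(S) URLs."""
    # A managed file is the prefix followed by a non-empty file id.
    if file.startswith(MANAGED_PREFIX) and len(file) > len(MANAGED_PREFIX):
        return "managed"
    parsed = urlparse(file)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    raise ValueError(f"not a tensorlake- id or an HTTP(S) URL: {file!r}")
```

For example, `classify_file_reference("tensorlake-abc123")` returns `"managed"`, while a value such as `"ftp://host/file.pdf"` raises `ValueError`.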
chunkStrategy
enum<string> | null
default:
none

The chunking strategy determines how the document is chunked into smaller pieces. This is only supported in markdown mode.

Available options:
none,
page,
section,
fragment
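
Because chunking is only supported in markdown mode, it can help to check the combination of `chunkStrategy` and `outputMode` before submitting a job. The sketch below is one possible client-side check; the function name `check_chunking` is hypothetical, and treating `none` as acceptable in json mode is an assumption rather than documented behavior.

```python
from typing import Optional

# Options listed for chunkStrategy in this reference.
CHUNK_STRATEGIES = {"none", "page", "section", "fragment"}


def check_chunking(chunk_strategy: Optional[str], output_mode: str = "markdown") -> None:
    """Raise ValueError if the chunking options look inconsistent."""
    if chunk_strategy is None:
        return  # null is allowed; the documented default is "none"
    if chunk_strategy not in CHUNK_STRATEGIES:
        raise ValueError(f"unknown chunkStrategy: {chunk_strategy!r}")
    # Assumption: "none" means no chunking, so it is harmless in json mode.
    if chunk_strategy != "none" and output_mode != "markdown":
        raise ValueError("chunkStrategy is only supported in markdown mode")
```

With this check, `check_chunking("page", "markdown")` passes silently and `check_chunking("page", "json")` raises `ValueError`.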
deliverWebhook
boolean

Whether to deliver a webhook when the job is completed.

outputMode
enum<string>
default:
markdown

The output mode determines the format of the output.

  • json mode has the individual page elements and their bounding boxes. JSON mode also includes any images in the document, encoded as base64.
  • markdown mode converts the document into a markdown format. Images are summarized as text. The original images are not included in the output.
Available options:
json,
markdown
pages
string | null

The page range to be processed. Specify a range as two page numbers separated by -, for example 2-5.
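
A small helper can build the `pages` string from two integers. This is only a sketch: the name `page_range` is hypothetical, and the assumption that page numbers start at 1 is not stated in this reference.

```python
def page_range(first: int, last: int) -> str:
    """Format a page range as two numbers separated by "-"."""
    # Assumption: pages are numbered from 1.
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first}-{last}")
    return f"{first}-{last}"
```

For instance, `page_range(2, 5)` produces the string `"2-5"`, which could then be used as the value of `pages` in the request body.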

parseMode
enum<string>
default:
table_structure_understanding

The mode to use for table parsing: Table Structure Understanding or VLM. Table Structure Understanding is the default mode and works well for structured tables. VLM works well for unstructured or semi-structured tables.

Available options:
table_structure_understanding,
vlm
tableSummarization
boolean
default:
false

Whether to summarize the contents of the tables.

tableSummarizationPrompt
string | null
default:
none

The prompt to use for table summarization.

Response

200 - application/json
jobId
string
required
status
string
required
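
A response with these two required fields could be decoded as in the sketch below. The `ParseJob` class and the `parse_job_response` function are hypothetical names, and the sample values (`job-123`, `pending`) are illustrative only; this reference does not list the possible values of `status`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseJob:
    job_id: str
    status: str


def parse_job_response(raw: str) -> ParseJob:
    """Decode a 200 response body and check that both required fields exist."""
    payload = json.loads(raw)
    missing = [key for key in ("jobId", "status") if key not in payload]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return ParseJob(job_id=payload["jobId"], status=payload["status"])


# Illustrative body only; real ids and status values will differ.
job = parse_job_response('{"jobId": "job-123", "status": "pending"}')
```

The resulting `job.job_id` is the value to keep for retrieving the results later.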