Parse file async
Submit a file for document parsing. If you have configured a webhook, we will notify you when the job is complete.
The API call returns a job_id, which you can use to retrieve the results of the job with your API key.
You can either use a managed file, tensorlake-<file_id>, or provide a pre-signed URL or any HTTP URL that can be used to download the file.
Your data is NOT sent to a third-party service (OpenAI, Anthropic, etc.); we use our own models to parse the document.
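For example, a job can be submitted and polled for completion over plain HTTP. The sketch below is illustrative only: the endpoint paths (/documents/v1/parse, /documents/v1/jobs/{job_id}), the request field names (file, deliver_webhook), and the status values are assumptions rather than values documented here; substitute the actual paths and parameters for your account.

```python
import os
import time
import requests

API_BASE = "https://api.tensorlake.ai"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['TENSORLAKE_API_KEY']}"}

# Submit the document for async parsing (path and field names are assumptions).
resp = requests.post(
    f"{API_BASE}/documents/v1/parse",
    headers=HEADERS,
    json={
        "file": "https://example.com/report.pdf",  # any downloadable HTTP or presigned URL
        "deliver_webhook": False,
    },
)
resp.raise_for_status()
job_id = resp.json()["job_id"]  # returned by the API call

# Poll for the result with the same API key (retrieval path and status names assumed).
while True:
    job = requests.get(f"{API_BASE}/documents/v1/jobs/{job_id}", headers=HEADERS).json()
    if job.get("status") in ("successful", "failed"):
        break
    time.sleep(5)

print(job)
```

If you have configured a webhook, you can skip the polling loop and wait for the completion notification instead.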
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
The URL of the document to be processed. You can provide one of the following:
- A publicly available URL or a presigned S3 URL
- A tensorlake- prefixed id obtained from the /files endpoint after directly uploading a document
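Either form goes in the same request field. A brief sketch (the field name file is an assumption; the tensorlake-<file_id> format and the /files endpoint are described above):

```python
# Option 1: a publicly available or presigned S3 URL the service can download.
body_from_url = {"file": "https://my-bucket.s3.amazonaws.com/contract.pdf"}

# Option 2: a managed file id returned by the /files endpoint after a direct upload.
# Replace <file_id> with the id from the upload response.
body_from_file_id = {"file": "tensorlake-<file_id>"}
```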
The chunking strategy determines how the document is chunked into smaller pieces. This is only supported in markdown mode.
Available options: none, page, section, fragment
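Because chunking only applies to markdown output, a request that uses it would also set the output mode. A sketch of such a request body, where the field names (file, output_mode, chunking_strategy, pages) are assumptions and only the option values come from this reference:

```python
# Request markdown output split into section-level chunks for pages 1 through 5.
body = {
    "file": "tensorlake-<file_id>",
    "output_mode": "markdown",       # chunking is only supported in markdown mode
    "chunking_strategy": "section",  # one of: none, page, section, fragment
    "pages": "1-5",                  # numbers separated by - specify a page range
}
```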
Whether to deliver a webhook when the job is completed.
The output mode determines the format of the output. json mode has the individual page elements and their bounding boxes; it also includes any images in the document, encoded as base64. markdown mode converts the document into a markdown format; images are summarized as text, and the original images are not included in the output.
Available options: json, markdown
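How you consume the result depends on the mode. The sketch below assumes the completed job's payload exposes the parsed output under hypothetical keys (outputs, pages, elements, images, content); check the actual response shape before relying on it.

```python
import base64

def handle_result(job: dict, output_mode: str) -> None:
    """Illustrative handling of the two output modes; key names are assumptions."""
    if output_mode == "json":
        # json mode: individual page elements with bounding boxes, images as base64.
        for page in job.get("outputs", {}).get("pages", []):
            for element in page.get("elements", []):
                print(element.get("type"), element.get("bounding_box"))
            for image in page.get("images", []):
                raw = base64.b64decode(image["content"])
                print(f"decoded image of {len(raw)} bytes")
    else:
        # markdown mode: the document as markdown text; images are summarized as text.
        markdown = job.get("outputs", {}).get("markdown", "")
        with open("parsed.md", "w", encoding="utf-8") as fh:
            fh.write(markdown)
```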
The page range to be processed. Use numbers separated by - to specify a range.
The mode to use for table parsing: Table Structure Understanding or VLM. Table Structure Understanding is the default mode; it's great for structured tables. VLM is great for unstructured or semi-structured tables.
Available options: table_structure_understanding, vlm
Whether to summarize the contents of the tables.
The prompt to use for table summarization.
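These three table settings typically travel together in the request body. A sketch, with field names (table_parsing_mode, summarize_table, table_summarization_prompt) assumed rather than documented here; the option values are from this reference:

```python
# Use the VLM parser for a semi-structured table and request a custom summary.
body = {
    "file": "https://example.com/quarterly-report.pdf",
    "table_parsing_mode": "vlm",  # or "table_structure_understanding" (the default)
    "summarize_table": True,
    "table_summarization_prompt": "Summarize each table in two sentences, noting totals.",
}
```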