Parse file async
Submit a file for document parsing. If you have configured a webhook, we will notify you when the job is complete.
The API call returns a `job_id`, which you can use to retrieve the results of the job with your API key.
You can either use a managed file, `tensorlake://<file_id>`, or provide a pre-signed URL or any HTTP URL that can be used to download the file.
Your data is NOT sent to a third-party service (OpenAI, Anthropic, etc.); the document is parsed with our own models.
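A minimal sketch of submitting a parse job and reading back the `job_id`. The base URL, endpoint path, and request field name (`file_url`) are illustrative assumptions, not taken from this page — check the full API reference for the exact values; only the Bearer-token header shape and the `job_id` response field are documented here.

```python
import json
import urllib.request

# NOTE: the endpoint path and the "file_url" field name below are
# illustrative assumptions; consult the full API reference.
BASE_URL = "https://api.tensorlake.ai"

def auth_header(token: str) -> dict:
    """Bearer authentication header of the form described above."""
    return {"Authorization": f"Bearer {token}"}

def submit_parse_job(file_url: str, token: str) -> str:
    """Submit a document URL for async parsing and return the job_id."""
    body = json.dumps({"file_url": file_url}).encode()  # hypothetical field name
    req = urllib.request.Request(
        f"{BASE_URL}/parse",  # hypothetical path
        data=body,
        headers={**auth_header(token), "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["job_id"]
```

Usage: `submit_parse_job("tensorlake://<file_id>", "YOUR_API_KEY")` for a managed file, or pass any pre-signed or public HTTP URL instead.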
Authorizations
Bearer authentication header of the form `Bearer <token>`, where `<token>` is your auth token.
Body
The URL of the document to be processed. You can provide one of the following:
- A publicly available URL or a presigned S3 URL
- A `tensorlake://`-prefixed URL obtained from the `/files` endpoint after directly uploading a document
The chunking strategy determines how the document is chunked into smaller pieces. This is only supported in `markdown` mode.
Available options: `none`, `page`, `section`, `fragment`
Whether to deliver a webhook when the job is completed.
The output mode determines the format of the output.
`json` mode has the individual page elements and their bounding boxes; it also includes any images in the document, encoded as base64.
`markdown` mode converts the document into markdown format. Images are summarized as text; the original images are not included in the output.
Available options: `json`, `markdown`
The page range to be processed. Use numbers separated by `-` to specify a range (for example, `1-5`).
The parse mode determines the speed and accuracy of the document parsing.
`fast` mode uses faster but less accurate OCR, with smaller LLM models for figure and table summarization.
`accurate` mode uses slower but more accurate OCR, with larger LLM models for figure and table summarization.
Available options: `fast`, `accurate`
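The body parameters above can be assembled as a plain dict before sending. The field names in this sketch (`file_url`, `chunking_strategy`, `output_mode`, `parse_mode`, `pages`, `deliver_webhook`) are illustrative assumptions; only the allowed option values (`none`/`page`/`section`/`fragment`, `json`/`markdown`, `fast`/`accurate`) come from this page.

```python
from typing import Optional

def build_parse_body(file_url: str,
                     chunking_strategy: str = "none",
                     output_mode: str = "markdown",
                     parse_mode: str = "fast",
                     pages: Optional[str] = None,
                     deliver_webhook: bool = False) -> dict:
    """Assemble a parse request body; field names are hypothetical."""
    if chunking_strategy not in ("none", "page", "section", "fragment"):
        raise ValueError("chunking_strategy must be none, page, section, or fragment")
    if output_mode not in ("json", "markdown"):
        raise ValueError("output_mode must be json or markdown")
    if parse_mode not in ("fast", "accurate"):
        raise ValueError("parse_mode must be fast or accurate")
    body = {
        "file_url": file_url,
        "chunking_strategy": chunking_strategy,
        "output_mode": output_mode,
        "parse_mode": parse_mode,
        "deliver_webhook": deliver_webhook,
    }
    if pages is not None:
        body["pages"] = pages  # e.g. "1-5": numbers separated by -
    return body
```

For example, `build_parse_body("tensorlake://<file_id>", output_mode="json", pages="1-3")` requests JSON output for pages 1 through 3.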