cURL
curl --request POST \
  --url https://api.tensorlake.ai/documents/v1/parse \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "deliverWebhook": true,
    "file": "<string>",
    "pages": "<string>",
    "settings": {
      "chunkStrategy": null,
      "figureSummarizationPrompt": "none",
      "formDetectionMode": "vlm",
      "jsonSchema": "none",
      "modelProvider": null,
      "structuredExtractionPrompt": "none",
      "tableOutputMode": "json",
      "tableParsingMode": "tsr",
      "tableSummarizationPrompt": "none"
    }
  }'
{
  "fileId": "<string>",
  "jobId": "<string>",
  "status": "<string>"
}
Submit a file for document parsing. If you have configured a webhook, we will notify you when the job is complete.
For structured extraction, you can provide a schema to guide the extraction process. The schema can be in the form of a JSON Schema object.
The API call returns a job ID (jobId in the response), which you can use to retrieve the results of the job with your API key.
You can either reference a managed file as tensorlake-<file_id>, or provide a pre-signed URL or any other HTTP URL from which the file can be downloaded.
Your data is NOT sent to a third-party service (OpenAI, Anthropic, etc.); we use our own models to parse the document.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
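As a rough illustration of the request described above, the following Python sketch builds the same POST request as the cURL example using only the standard library. The helper names (file_reference, build_parse_request, submit) are hypothetical and not part of any Tensorlake SDK, and it assumes that optional fields such as pages and settings may be omitted from the body, which the reference does not state explicitly.

```python
import json
import urllib.request

# Endpoint taken from the cURL example.
PARSE_URL = "https://api.tensorlake.ai/documents/v1/parse"


def file_reference(file_id=None, url=None):
    """Return the value for the "file" field: a managed file or a download URL."""
    if (file_id is None) == (url is None):
        raise ValueError("pass exactly one of file_id or url")
    # Managed files are referenced as tensorlake-<file_id>.
    return f"tensorlake-{file_id}" if file_id is not None else url


def build_parse_request(token, file, pages=None, settings=None, deliver_webhook=False):
    """Build (but do not send) the POST request for the parse endpoint."""
    payload = {"file": file, "deliverWebhook": deliver_webhook}
    # Assumption: optional fields can be left out when they are not needed.
    if pages is not None:
        payload["pages"] = pages
    if settings:
        payload["settings"] = settings
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the jobId field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["jobId"]
```

A call might look like submit(build_parse_request(token, file_reference(file_id="..."))), where the token and file ID come from your own account. Building the request separately from sending it keeps the payload easy to inspect before any network call is made.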
settings.chunkStrategy
The chunking strategy determines how the document is split into smaller pieces.
This is only supported in markdown mode.
Available options: page, section, fragment
settings.figureSummarizationPrompt
The prompt to use for figure summarization.
settings.formDetectionMode
The mode to use for form detection.
Available options: vlm, object_detection
settings.jsonSchema
The JSON schema to guide structured data extraction from the file.
Encode the JSON schema as a string.
If provided, the pages argument will be ignored.
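Since the schema has to be passed as a string, one way to prepare it in Python is to serialize a schema object with json.dumps. The sketch below is only an illustration; the invoice schema and its field names are hypothetical and do not come from the API reference.

```python
import json

# Hypothetical schema for illustration; these field names are not from the API docs.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}

# The jsonSchema setting is a string, so serialize the schema object first.
settings = {"jsonSchema": json.dumps(invoice_schema)}
```

The resulting settings dictionary can then be placed under the settings key of the request body.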
settings.modelProvider
The model provider to use for structured data extraction.
Specifying tensorlake will use a private model, which runs on our servers.
Available options: tensorlake, claude-3-5-sonnet-latest, gpt-4o-mini
settings.structuredExtractionPrompt
Override the prompt to customize structured extraction. Use this if you want to extract data from a file using a different prompt than our default extraction prompt.
settings.tableOutputMode
The mode to use for table output: JSON, Markdown, or HTML.
JSON mode is great for structured tables.
Markdown mode is great for tables without merged cells.
HTML mode is great for tables with merged cells.
Available options: json, markdown, html
settings.tableParsingMode
The mode to use for table parsing: Table Structure Understanding or VLM.
Table Structure Understanding is the default mode. It's great for structured tables.
VLM is great for unstructured or semi-structured tables.
Available options: tsr, vlm
settings.tableSummarizationPrompt
The prompt to use for table summarization.
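The enumerated settings above can be checked on the client before a request is sent. The following sketch is a hypothetical helper (invalid_settings is not part of the API) that compares a settings dictionary against the option lists given in this reference; it assumes that null values are acceptable, since the cURL example passes null for some settings.

```python
# Option lists copied from the parameter descriptions in this reference.
ALLOWED_SETTINGS = {
    "chunkStrategy": {"page", "section", "fragment"},
    "formDetectionMode": {"vlm", "object_detection"},
    "modelProvider": {"tensorlake", "claude-3-5-sonnet-latest", "gpt-4o-mini"},
    "tableOutputMode": {"json", "markdown", "html"},
    "tableParsingMode": {"tsr", "vlm"},
}


def invalid_settings(settings):
    """Return {key: value} for enumerated settings whose value is not a listed option."""
    return {
        key: value
        for key, value in settings.items()
        if key in ALLOWED_SETTINGS
        and value is not None  # assumption: null is allowed, as in the cURL example
        and value not in ALLOWED_SETTINGS[key]
    }
```

An empty result means every enumerated setting uses one of the documented options; free-form settings such as the prompts are not checked.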