The Document Parsing API parses a document and returns a Markdown or JSON representation of it.

You can use the Markdown or JSON output for downstream processing, such as:

  • Chunking
  • Summarization
  • Image Indexing

Parsing a Document

1. Upload the Document

Upload the document to the API.

curl -X 'POST' \
'https://api.tensorlake.ai/documents/v1/upload' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file_path=@/path/to/finance_stock_report.pdf;type=application/pdf'
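
The same upload can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name build_upload_request is hypothetical, and the form field name file_path, the URL, and the headers are simply copied from the curl command above. The function only builds the request; sending it is left to the caller.

```python
import urllib.request
import uuid

UPLOAD_URL = "https://api.tensorlake.ai/documents/v1/upload"


def build_upload_request(filename: str, content: bytes, api_key: str,
                         content_type: str = "application/pdf") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request.

    Hypothetical helper: it mirrors the curl command, with the file sent
    in a single form field named "file_path".
    """
    boundary = uuid.uuid4().hex
    # One multipart part: part headers, a blank line, then the raw file bytes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file_path"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary marks the end of the form data.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        UPLOAD_URL,
        data=head + content + tail,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the PDF bytes from disk, pass them to build_upload_request, and hand the returned object to urllib.request.urlopen.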
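
2. Parse the Document

Parse the uploaded document by calling the /parse or /parse_async endpoint. The output and model parameters are described under API Reference.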

3. Get the Result

Get the result from the API, passing the document's tensorlake:// reference in the file field of the request body.

curl -X GET https://api.tensorlake.ai/documents/v1/result \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "file": "tensorlake://b5dee680-c07c-4bad-ba00-7bd16d28975d"
}'
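
As a rough Python equivalent of the curl command above, the sketch below builds the same request with the standard library. The helper name build_result_request is hypothetical. The URL, the headers, and the file field come from the curl example, including its use of a JSON body on a GET request.

```python
import json
import urllib.request

RESULT_URL = "https://api.tensorlake.ai/documents/v1/result"


def build_result_request(file_ref: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the result request shown in the curl example."""
    # The body carries the document reference, e.g. "tensorlake://<id>".
    payload = json.dumps({"file": file_ref}).encode("utf-8")
    return urllib.request.Request(
        RESULT_URL,
        data=payload,
        method="GET",  # mirrors `curl -X GET ... -d`, i.e. a GET with a body
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends the request. The shape of the response is not covered here, so parsing it is left out of the sketch.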

API Reference

Output Modes

The Document Parsing API supports two output modes:

  • markdown
  • json

Set the output mode using the output parameter of the /parse or /parse_async endpoint.

Switching between Models

You can switch between models using the model parameter of the /parse or /parse_async endpoint.

The following models are supported:

  • small: Fastest; uses smaller models for low-latency parsing.
  • medium: Slightly slower but more accurate; uses a combination of vision models and VLMs.
  • large: Slowest but most accurate; uses multiple VLMs and OCR models.
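
The output and model parameters can be validated on the client before a parse request is sent. The sketch below rests on assumptions: the helper build_parse_payload is hypothetical, the parameters are assumed to travel in a JSON body alongside a file field (as in the result request), and the defaults chosen here are illustrative rather than the API's own. Only the allowed values are taken from the lists above.

```python
import json

# Allowed values, as listed in this reference.
OUTPUT_MODES = ("markdown", "json")
MODELS = ("small", "medium", "large")


def build_parse_payload(file_ref: str, output: str = "markdown",
                        model: str = "small") -> str:
    """Return a JSON body for /parse or /parse_async (assumed body layout)."""
    if output not in OUTPUT_MODES:
        raise ValueError(f"output must be one of {OUTPUT_MODES}, got {output!r}")
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    # Assumption: the endpoint reads "file", "output", and "model" from the body.
    return json.dumps({"file": file_ref, "output": output, "model": model})
```

Rejecting unknown values locally gives a clear error before any network call is made.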

Listing Documents

You can list all the documents in a project using the following API call:

curl -X GET https://api.tensorlake.ai/documents/v1/list \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "cursor": "xxx", 
    "limit": 10
}'
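
The cursor and limit fields suggest cursor-based pagination. The sketch below shows one way to walk every page, under loudly stated assumptions: the response format is not described here, so the items and next_cursor field names are hypothetical, as is the iter_documents helper. The HTTP call itself is injected as a function so the loop stays independent of any particular client.

```python
from typing import Callable, Iterator, Optional

# fetch_page(cursor, limit) performs one list call and returns the decoded
# JSON response as a dict. The caller supplies it.
FetchPage = Callable[[Optional[str], int], dict]


def iter_documents(fetch_page: FetchPage, limit: int = 10) -> Iterator[dict]:
    """Yield documents page by page until no further cursor is returned.

    Assumption: each response holds the documents under "items" and the
    cursor for the next page under "next_cursor". Both names are hypothetical.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

A fetch_page implementation would send the list request with the given cursor and limit and return the parsed response. Adjust the two field names to match the actual response.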