Tensorlake’s Document Ingestion API turns unstructured documents into agent-ready inputs: layout-aware Markdown chunks and schema-validated structured data. It is backed by our state-of-the-art OCR and VLM models, and can be used together with our Agentic Runtime or on its own in document processing workflows.

The Document Ingestion API has the following capabilities:
  • Read: Converts any document or image to Markdown and provides layout information.
  • Extract: Extracts structured data from documents using JSON Schemas.
  • Classify: Classifies documents into categories.
  • Summarize: Summarizes tables, figures, and charts in documents.
  • Signature Detection: Detects signatures in documents.
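
To make the Extract capability more concrete, here is a minimal sketch of what a JSON Schema for an invoice might look like, together with a small stdlib-only check of a result against it. The field names (`invoice_number`, `vendor`, `total`) and the `check_against_schema` helper are hypothetical and exist only for illustration. This is not Tensorlake's validator, and it handles only top-level `required` and primitive `type` keywords.

```python
# Hypothetical schema: the field names are illustrative, not taken from Tensorlake's docs.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}

# Maps JSON Schema primitive type names to Python types.
_PY_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_against_schema(data, schema):
    """Return a list of problems; an empty list means the data passed.

    Only top-level `required` and primitive `type` checks are implemented.
    """
    problems = []
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        expected = _PY_TYPES.get(spec.get("type"))
        if name not in data or expected is None:
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and spec["type"] in ("number", "integer")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"field {name} should be of type {spec['type']}")
    return problems


print(check_against_schema({"invoice_number": "INV-1", "total": 12.5}, INVOICE_SCHEMA))
print(check_against_schema({"total": "12.5"}, INVOICE_SCHEMA))
```

For production use, a full JSON Schema validator is a better fit than this hand-rolled check, which is kept small so the idea stays visible.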

How it works

  1. Upload: Send a PDF, image, spreadsheet, or any of the 15+ supported formats.
  2. Process: Depending on your use case, our Document AI endpoints can run OCR, detect layout, extract tables, summarize figures, tables, and charts, extract structured data, detect signatures, read barcodes, and more.
  3. Receive: Get clean JSON with text, tables, bounding boxes, and structured output.
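
The three steps above can be sketched as a client-side loop that waits for a result. This sketch assumes processing is exposed as a job whose status can be polled. The `fetch_job` callable, the status names (`pending`, `processing`, `successful`, `failed`), and the `markdown` key are all hypothetical stand-ins, not the actual Tensorlake API. A fake in-memory backend replaces real HTTP calls so the example runs on its own.

```python
import time
from typing import Callable, Dict


def wait_for_result(
    fetch_job: Callable[[str], Dict],
    job_id: str,
    poll_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until it finishes, then return its final payload.

    `fetch_job` is a hypothetical callable returning a dict with a
    "status" key; the status names used here are assumptions too.
    """
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "successful":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Fake backend: each call returns the next canned response.
_responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "successful", "markdown": "# Title"},
])

result = wait_for_result(
    lambda _job_id: next(_responses),
    "job-123",
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
print(result["markdown"])
```

Passing `sleep` as a parameter keeps the loop easy to test. In real code you would leave the default and let `fetch_job` wrap whatever HTTP client you already use.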

Integration with Your Existing Workflows

The Document Ingestion API is a standalone API that can be used independently of the Agentic Runtime. You can call it directly from your existing workflows. We can also send a webhook when a document parsing job completes. See Webhooks.
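
As a rough illustration of the webhook side, the sketch below receives a POST request and extracts two fields from its JSON body using only the Python standard library. The payload keys `job_id` and `status`, the port, and the helper names are hypothetical. Consult the Webhooks page for the actual payload format and for any request verification it requires.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_job_event(raw_body: bytes) -> dict:
    """Decode a webhook body and pull out the fields we care about.

    The keys "job_id" and "status" are hypothetical placeholders.
    """
    event = json.loads(raw_body.decode("utf-8"))
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    return {"job_id": event.get("job_id"), "status": event.get("status")}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_job_event(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON, bad UTF-8, and non-object bodies.
            self.send_response(400)
            self.end_headers()
            return
        print(f"job {event['job_id']} reported status {event['status']}")
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the receiver; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver locally. Keeping the parsing in `parse_job_event` means it can be unit tested without opening a socket.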

Supported File Types

Tensorlake supports the following file types:
  • PDF
  • Images (PNG, JPG, TIFF)
  • Presentations (PPTX, Keynote)
  • Raw Text (plain text, HTML, Markdown)
  • Spreadsheets (XLSX, XLSM, XLS, CSV)
  • Word Documents (DOC, DOCX)
  • RTF
  • P7M
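
A client might want to check a file against this list before uploading it. The sketch below does that by file extension. The mapping from the formats above to extensions (for example `.key` for Keynote, `.txt` for plain text, and the `.jpeg`, `.tif`, and `.htm` variants) is an assumption, and the service itself may identify files differently, for instance by content type.

```python
from pathlib import Path

# Extensions inferred from the list above; the exact set is an assumption.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".png", ".jpg", ".jpeg", ".tiff", ".tif",  # images
    ".pptx", ".key",                           # presentations
    ".txt", ".html", ".htm", ".md",            # raw text
    ".xlsx", ".xlsm", ".xls", ".csv",          # spreadsheets
    ".doc", ".docx",                           # Word documents
    ".rtf",
    ".p7m",
}


def is_supported(path: str) -> bool:
    """Return True if the file's final extension is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["report.PDF", "scan.tiff", "contract.pdf.p7m", "archive.zip"]:
    print(name, is_supported(name))
```

The check is case-insensitive and looks only at the last suffix, so `contract.pdf.p7m` is treated as a P7M file.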