- Read: Converts any document or image to Markdown and provides layout information.
- Edit: Allows filling forms and modifying documents using a prompt.
- Extract: Extracts structured data from documents using JSON Schemas.
- Classify: Classifies pages into categories.
- Summarize: Summarizes tables, figures, and charts in documents.
- Signature Detection: Detects signatures in documents.
- Barcode Detection: Detects and reads barcodes in documents.
- Cross-page Header Correction: Fixes headers that span or repeat across pages to improve document structure.
- Table Merging: Merges tables that span multiple pages into a single unified table.
- Chart Extraction: Extracts and processes charts as distinct elements separate from figures.
- Key-Value Extraction: Extracts key-value pairs from forms such as loan applications, insurance claims, and tax documents.
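As an illustration of the Extract capability listed above, here is a minimal sketch of a JSON Schema for a loan application, built as a Python dict. The schema title and field names (`applicant_name`, `loan_amount`, and so on) are hypothetical and chosen only for illustration. How the schema is passed to the API is not shown here, so check the Extract documentation for that.

```python
import json

# Hypothetical schema for a loan application. The field names are
# illustrative assumptions and do not come from the Tensorlake docs.
loan_application_schema = {
    "title": "LoanApplication",
    "type": "object",
    "properties": {
        "applicant_name": {"type": "string"},
        "loan_amount": {"type": "number", "minimum": 0},
        "application_date": {"type": "string", "format": "date"},
        "has_signature": {"type": "boolean"},
    },
    # Fields that must be present in a valid result.
    "required": ["applicant_name", "loan_amount"],
}

# Serialize the schema to JSON text, the form in which schemas are
# normally exchanged with an API.
schema_json = json.dumps(loan_application_schema, indent=2)
print(schema_json)
```

Keeping the schema small and marking only the truly mandatory fields as `required` tends to make extraction results easier to validate downstream.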
How it works
- Upload: Send a PDF, image, spreadsheet, or any of the 15+ supported formats.
- Process: Multiple Document AI endpoints can run OCR, detect layout, extract tables, summarize figures/tables/charts, extract structured data, detect signatures, read barcodes, and more, depending on your use case.
- Receive: Get clean JSON with text, tables, bounding boxes, and structured output.
Integration with Your Existing Workflows
The Document Ingestion API is a standalone API that can be used independently of the Agentic Runtime. You can call it directly from your existing workflows. Webhooks can also notify you when a document parsing job is completed. See Webhooks.
Quickstart
Get started with the Document Ingestion API.
Read documents
Read documents and get Markdown and layout information.
Understand Parsing Output
Understand the output of the document ingestion job.
Supported File Types
Tensorlake supports the following file types:
- Images (PNG, JPG, TIFF)
- Presentations (PPTX, Keynote)
- Raw Text (plain text, HTML, Markdown)
- Spreadsheets (XLSX, XLSM, XLS, CSV)
- Word Documents (DOC, DOCX)
- RTF
- P7M
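Checking a file's extension before upload can catch unsupported inputs early. The sketch below is a local, client-side check and is not part of the Tensorlake API. The helper name `is_supported` is hypothetical, and the mapping from the listed formats to extensions (for example `.key` for Keynote and `.md` for Markdown) is an assumption. PDF is included because it is named in the upload step.

```python
from pathlib import Path

# Extensions inferred from the file types listed above. The exact
# spellings are assumptions; the service may accept other aliases.
SUPPORTED_EXTENSIONS = {
    ".pdf",                           # named in the upload step
    ".png", ".jpg", ".tiff",          # images
    ".pptx", ".key",                  # presentations (Keynote uses .key)
    ".txt", ".html", ".md",           # raw text
    ".xlsx", ".xlsm", ".xls", ".csv", # spreadsheets
    ".doc", ".docx",                  # Word documents
    ".rtf",
    ".p7m",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["report.PDF", "slides.key", "archive.zip"]:
    print(name, is_supported(name))
```

The check is case-insensitive and looks only at the final suffix, so a signed file such as `contract.pdf.p7m` is matched on `.p7m`.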