- Read: Converts any document or image to Markdown and provides layout information.
- Extract: Extracts structured data from documents using JSON Schemas.
- Classify: Classifies documents into categories.
- Summarize: Summarizes tables, figures, and charts in documents.
- Signature Detection: Detects signatures in documents.
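As an illustration of what Extract works from, here is a minimal sketch of a JSON Schema for an invoice, written as a Python dictionary. The schema, its field names, and the `missing_or_mistyped` helper are hypothetical and only show the general shape of a schema. How the schema is passed to Extract is not shown, because it depends on the endpoint's own reference.

```python
import json

# Hypothetical schema for an invoice; every field name here is illustrative.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total_amount"],
}

# Maps JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "array": list,
    "object": dict,
}


def missing_or_mistyped(record, schema):
    """Report top-level problems in an extracted record.

    Checks only required keys and primitive types of top-level properties.
    This is a small subset of JSON Schema validation, not a full validator.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in record:
            problems.append(f"missing: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in record:
            expected = _JSON_TYPES.get(spec.get("type"))
            if expected and not isinstance(record[key], expected):
                problems.append(f"wrong type: {key}")
    return problems


if __name__ == "__main__":
    # Print the schema as the JSON document you would send alongside a file.
    print(json.dumps(INVOICE_SCHEMA, indent=2))
```

Checking extracted records against the same schema you supplied is a cheap way to catch missing fields before they reach downstream systems.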
How it works
- Upload: Send a PDF, image, spreadsheet, or any of the 15+ supported formats.
- Process: Multiple Document AI endpoints run OCR, detect layout, extract tables, summarize figures, tables, and charts, extract structured data, detect signatures, read barcodes, and more, depending on your use case.
- Receive: Get clean JSON with text, tables, bounding boxes, and structured output.
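To make the Receive step concrete, the sketch below walks a result payload and pulls out fragments of one kind together with their bounding boxes. The payload shape (`pages`, `fragments`, `type`, `content`, `bbox`) is an assumption made for illustration, not the documented output format; check the parsing output reference for the actual field names.

```python
import json

# Hypothetical result payload. The keys below are an assumed shape used only
# for illustration; the real output format may differ.
SAMPLE_RESULT = json.loads("""
{
  "pages": [
    {
      "page_number": 1,
      "fragments": [
        {"type": "text", "content": "Invoice 2024-001",
         "bbox": {"x1": 72, "y1": 54, "x2": 310, "y2": 80}},
        {"type": "table", "content": "| Item | Amount |",
         "bbox": {"x1": 72, "y1": 120, "x2": 540, "y2": 300}}
      ]
    }
  ]
}
""")


def fragments_of_type(result, fragment_type):
    """Collect (page_number, content, bbox) tuples for one fragment type."""
    found = []
    for page in result.get("pages", []):
        for fragment in page.get("fragments", []):
            if fragment.get("type") == fragment_type:
                found.append(
                    (page["page_number"], fragment["content"], fragment["bbox"])
                )
    return found


if __name__ == "__main__":
    for page_number, content, bbox in fragments_of_type(SAMPLE_RESULT, "table"):
        print(page_number, content, bbox)
```

Keeping the traversal in one small function means only that function needs to change once the real field names are known.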
Integration with Your Existing Workflows
The Document Ingestion API is a standalone API that can be used independently of the Agentic Runtime. You can call it directly from your existing workflows. Webhooks can also notify you when a document parsing job completes. See Webhooks.
Quickstart
Get started with the Document Ingestion API.
Read documents
Read documents and get Markdown and layout information.
Understand Parsing Output
Understand the output of the document ingestion job.
Supported File Types
Tensorlake supports the following file types:
- Images (PNG, JPG, TIFF)
- Presentations (PPTX, Keynote)
- Raw Text (plain text, HTML, Markdown)
- Spreadsheets (XLSX, XLSM, XLS, CSV)
- Word Documents (DOC, DOCX)
- RTF
- P7M
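A client can check a file against this list before uploading it. The sketch below is one way to do that by file extension. The extension spellings `.jpeg`, `.tif`, `.txt`, `.htm`, `.md`, and `.key` are assumptions about how the listed formats usually appear on disk, and `.pdf` is included because the upload step mentions PDF. The `file_category` helper and its category names are hypothetical, and the service may identify files by content instead of by name.

```python
from pathlib import Path

# Categories follow the supported file types list. Extension spellings that
# the list does not state explicitly (.jpeg, .tif, .txt, .htm, .md, .key)
# are assumptions.
SUPPORTED_EXTENSIONS = {
    "pdf": {".pdf"},
    "images": {".png", ".jpg", ".jpeg", ".tiff", ".tif"},
    "presentations": {".pptx", ".key"},
    "raw_text": {".txt", ".html", ".htm", ".md"},
    "spreadsheets": {".xlsx", ".xlsm", ".xls", ".csv"},
    "word_documents": {".doc", ".docx"},
    "rtf": {".rtf"},
    "p7m": {".p7m"},
}


def file_category(filename):
    """Return the category for a filename, or None if the extension is not listed.

    Only the final extension is considered, and matching is case-insensitive.
    """
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["Report.DOCX", "scan.tiff", "archive.zip"]:
        print(name, "->", file_category(name))
```

A pre-check like this only avoids obviously unsupported uploads; the API's response remains the authority on whether a file was accepted.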