Document Ingestion Overview

Document Ingestion with Tensorlake is as simple as:

Upload your file

Upload your file to the Tensorlake Cloud, specify the URL of a publicly accessible file, or provide raw text.

Specify your parsing settings and parse

Define your parsing settings, including which pages to parse, what chunking strategy to use, what structured data you want extracted, and how to handle complex document fragments like tables, figures, signatures, or strikethrough.

Get the results

Retrieve the results of the parse job, including markdown chunks, a complete document layout, structured data if a schema was provided, and page classifications if requested.

The complete upload, parse, and get results flow
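The three steps above can be sketched as a small polling loop. This is only an illustration of the flow, not the Tensorlake SDK: the `IngestionClient` interface, its method names, the status values, and the settings keys are all hypothetical, and an in-memory `FakeClient` stands in for the real service so the example runs without network access.

```python
import time
from dataclasses import dataclass
from typing import Protocol


class IngestionClient(Protocol):
    """Hypothetical client interface (NOT the real Tensorlake SDK)."""

    def upload(self, path: str) -> str: ...
    def parse(self, file_id: str, settings: dict) -> str: ...
    def get_result(self, job_id: str) -> dict: ...


# Assumed status values; the real API may name them differently.
TERMINAL_STATUSES = {"done", "failed"}


def ingest(client: IngestionClient, path: str, settings: dict,
           poll_seconds: float = 1.0, max_polls: int = 60) -> dict:
    """Upload a file, start a parse job, and poll until it finishes."""
    file_id = client.upload(path)             # step 1: upload your file
    job_id = client.parse(file_id, settings)  # step 2: parse with your settings
    for _ in range(max_polls):                # step 3: get the results
        result = client.get_result(job_id)
        if result["status"] in TERMINAL_STATUSES:
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(f"parse job {job_id} did not finish after {max_polls} polls")


@dataclass
class FakeClient:
    """In-memory stand-in so the sketch runs without a network connection."""
    polls_until_done: int = 3
    polls_seen: int = 0

    def upload(self, path: str) -> str:
        return "file-1"

    def parse(self, file_id: str, settings: dict) -> str:
        return "job-1"

    def get_result(self, job_id: str) -> dict:
        self.polls_seen += 1
        if self.polls_seen < self.polls_until_done:
            return {"status": "pending"}
        return {"status": "done", "chunks": ["# Invoice 42"]}


if __name__ == "__main__":
    settings = {"pages": "1-2", "chunking": "page"}  # illustrative keys only
    print(ingest(FakeClient(), "invoice.pdf", settings, poll_seconds=0))
```

If you subscribe to webhook notifications instead, the polling loop becomes unnecessary: the completion notification tells you when to fetch the result.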

The APIs to support this workflow are:

Files

File Management endpoints to upload, list, and delete files.

Parse

Endpoints to parse uploaded documents or any remote file.

You can also use the Webhooks API to receive notifications when a parse job is completed.

Core API Functionality

While the Tensorlake API is extensive, the core functionality that sets it apart from other Document Ingestion APIs includes:

Core Function	Description
Structured Data Extraction	Pull out fields from a document. Specify schema using either JSON Schema or Pydantic Models.
Page Classification	Automatically identify and label different sections or types of pages (e.g., cover, table of contents, appendix) within a document.
Document Chunking	Split documents into chunks that agents can read or that you can index to build RAG and knowledge graph applications.
Bounding Boxes	Specifically reference every element in the document for citations and highlighting.
Summarization	Summarize tables, charts and figures in documents.
Unlimited Pages and File Size	Parse any number of large documents. You pay only for what you use.
Unlimited Fields Per Document	Capture every detail in even the most complex documents. Other APIs may limit you to roughly 100 fields.
Flexible Usage	All of the above features can be used individually or combined in a single API call, so you do not need to build custom multi-stage document parsing pipelines.
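As an illustration of the Structured Data Extraction row, here is a minimal sketch of what a JSON Schema for an invoice might look like, written as a plain Python dictionary. The field names are invented for the example, and how the schema is passed to the Parse API is not shown because this overview does not specify it. The small `missing_required` helper is likewise hypothetical; it only checks which required fields are absent from an extracted record.

```python
import json

# Hypothetical schema: the field names are examples, not a Tensorlake requirement.
invoice_schema = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Identifier printed on the invoice"},
        "issue_date": {"type": "string", "format": "date"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total_amount"],
}


def missing_required(schema: dict, record: dict) -> list:
    """Return the required top-level fields that are absent from a record."""
    return [name for name in schema.get("required", []) if name not in record]


if __name__ == "__main__":
    # Serialize the schema as it would appear in a JSON request body.
    print(json.dumps(invoice_schema, indent=2))
    print(missing_required(invoice_schema, {"invoice_number": "A-1"}))
```

A Pydantic model describing the same fields would be the other option the table mentions; the dictionary form is used here only because it needs nothing beyond the standard library.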
