Document Ingestion with Tensorlake is as simple as:

1. Upload your file: Either upload your file to the Tensorlake Cloud, or specify a URL to a publicly accessible file.

2. Specify your parsing settings and parse: Define your parsing settings, including which pages to parse, what chunking strategy to use, what structured data you want extracted, and how to handle complex document fragments such as tables, figures, signatures, or strikethrough text.

3. Get the results: Retrieve the results of the parse job, including markdown chunks, a complete document layout, and structured data if a schema was provided.
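Taken together, the three steps map onto a short client script. The sketch below uses Python with the `requests` library; the base URL, endpoint paths, payload fields, status values, and the `TENSORLAKE_API_KEY` environment variable are illustrative assumptions, so consult the API reference for the exact contract.

```python
# Minimal end-to-end sketch of the upload -> parse -> retrieve flow.
# Endpoint paths, payload field names, and response shapes below are
# assumptions, not the documented contract.
import os
import time

import requests

API_BASE = "https://api.tensorlake.ai"           # assumed base URL
API_KEY = os.environ["TENSORLAKE_API_KEY"]       # assumed env var name
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# 1. Upload your file (or pass a publicly accessible URL instead).
with open("contract.pdf", "rb") as f:
    upload = requests.post(f"{API_BASE}/files", headers=HEADERS,
                           files={"file": f})
upload.raise_for_status()
file_id = upload.json()["file_id"]               # assumed response field

# 2. Specify parsing settings and start the parse job.
parse = requests.post(
    f"{API_BASE}/parse",
    headers=HEADERS,
    json={
        "file_id": file_id,
        "pages": "1-10",                         # which pages to parse
        "chunking_strategy": "section",          # assumed option name
    },
)
parse.raise_for_status()
parse_id = parse.json()["parse_id"]              # assumed response field

# 3. Poll until the job finishes, then read the markdown chunks.
while True:
    result = requests.get(f"{API_BASE}/parse/{parse_id}", headers=HEADERS).json()
    if result["status"] in ("successful", "failure"):  # assumed status values
        break
    time.sleep(5)

for chunk in result.get("chunks", []):
    print(chunk["content"])
```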

Dedicated APIs support each step of this workflow.

You can also use the Webhooks API to receive notifications when a parse job is completed.
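If you prefer push notifications over polling, a webhook receiver can be very small. The FastAPI sketch below is a minimal example; the route path and the payload fields (`parse_id`, `status`) are assumptions, and the Webhooks API documentation defines the actual event shape and any signature verification you should perform.

```python
# Hypothetical webhook receiver: Tensorlake calls this endpoint when a
# parse job completes, instead of the client polling for status.
# Payload field names used here are assumptions.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhooks/tensorlake")
async def on_parse_completed(request: Request):
    event = await request.json()
    parse_id = event.get("parse_id")
    status = event.get("status")
    print(f"Parse job {parse_id} finished with status: {status}")
    # Fetch the full results for parse_id here, e.g. via the parse API.
    return {"ok": True}
```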

Core API Functionality

While the Tensorlake API is extensive, these are some of the core functions that set it apart from other Document Ingestion APIs:

| Core Function | Description |
| --- | --- |
| Structured Data Extraction | Pull out fields from a document. Specify the schema using either JSON Schema or Pydantic models (see the example after this table). |
| Page Classification | Automatically identify and label different sections or types of pages (e.g., cover, table of contents, appendix) within a document. |
| Document Chunking | Enable agents to read documents, or index chunks for building RAG and knowledge graph applications. |
| Bounding Boxes | Reference every element in the document precisely, for citations and highlighting. |
| Summarization | Summarize tables, charts, and figures in documents. |
| Unlimited Pages and File Size | Parse any number of large documents. You pay only for what you use. |
| Unlimited Fields Per Document | Capture every detail in even the most complex documents, instead of being limited to roughly 100 fields as with other APIs. |
| Flexible Usage | All of the above features can be used individually or combined in a single API call, so you do not need to build custom multi-stage document parsing pipelines. |
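As a concrete example of structured data extraction, the snippet below defines a Pydantic model for a simple invoice and derives the equivalent JSON Schema from it. How the schema is attached to a parse request (the `structured_extraction` field name, for instance) is an assumption; the two schema formats themselves are the point here.

```python
# Two equivalent ways to describe the fields you want extracted:
# a Pydantic model, or the JSON Schema generated from it.
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    invoice_number: str = Field(description="The invoice identifier")
    vendor_name: str = Field(description="Name of the issuing vendor")
    total_amount: float = Field(description="Invoice total, including tax")

# Pydantic can emit the same definition as JSON Schema if you prefer that form.
invoice_json_schema = Invoice.model_json_schema()

# Hypothetical request body fragment attaching the schema to a parse call;
# the field names below are assumptions.
parse_settings = {
    "file_id": "<file_id>",
    "structured_extraction": {
        "schema": invoice_json_schema,
    },
}
```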