Document Ingestion with Tensorlake is as simple as:
1. Call the parse endpoint

The parse endpoint will create a parse job with the following request payload:
  • A file source, which can be:
  • Options for parsing. See the parse settings below.
  • page_range: The range of pages to parse, e.g., 1-2 or 1,3,5. By default, all pages are parsed.
  • labels: Metadata to identify the parse request. The labels are returned along with the parse response.
The endpoint will return:
  • parse_id: The unique ID Tensorlake uses to reference the parsing job. Use this ID to retrieve the output once the job completes and to revisit the settings used for it.
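As a rough illustration of step 1, the sketch below builds (but does not send) a request that would create a parse job. Only page_range, labels, and parse_id come from the description above. The base URL, the POST /parse path, the Bearer authorization header, and the file_source field name are assumptions made for illustration, so check the API reference for the real values.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real API host.
BASE_URL = "https://api.example.com"


def build_parse_request(file_source, api_key, page_range=None, labels=None):
    """Build (but do not send) a POST request that would create a parse job."""
    payload = {"file_source": file_source}  # hypothetical field name
    if page_range is not None:
        payload["page_range"] = page_range  # e.g. "1-2" or "1,3,5"
    if labels:
        payload["labels"] = labels  # returned along with the parse response
    return urllib.request.Request(
        url=f"{BASE_URL}/parse",  # assumed path for the parse endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_parse_request(
    "file_123", "YOUR_API_KEY", page_range="1-2", labels={"source": "demo"}
)
# Sending it would look like:
#   with urllib.request.urlopen(request) as response:
#       parse_id = json.load(response)["parse_id"]
```

Building the request separately from sending it makes the payload easy to inspect and log before any network call happens.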
2. Query the status of the parsing job

The /parse/{parse_id} endpoint will return:
  • status: The status of the parsing job. This can be failure, pending, processing, or successful.
  • If the parsing job is pending or processing, you should wait a few seconds and then check again by re-calling the endpoint.
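The wait-and-recheck loop described above can be sketched as a small polling helper. The status values are the four listed above. The fetch_job callable is a hypothetical stand-in for whatever code performs the GET on /parse/{parse_id} and decodes the JSON body, and the interval and timeout defaults are arbitrary choices, not documented recommendations.

```python
import time

# The two statuses after which polling should stop.
TERMINAL_STATUSES = {"successful", "failure"}


def wait_for_parse(fetch_job, parse_id, interval=3.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status, then return the last response.

    fetch_job is any callable that takes a parse_id and returns the decoded
    JSON body of GET /parse/{parse_id} as a dict.
    """
    waited = 0.0
    while True:
        job = fetch_job(parse_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(
                f"parse job {parse_id} still {job['status']} after {waited:.0f}s"
            )
        sleep(interval)
        waited += interval


# Usage with a stand-in fetcher that simulates a job progressing.
_responses = iter(
    [{"status": "pending"}, {"status": "processing"}, {"status": "successful"}]
)
final = wait_for_parse(lambda parse_id: next(_responses), "parse_abc", sleep=lambda s: None)
print(final["status"])  # prints "successful"
```

Passing the fetcher and the sleep function as arguments keeps the loop testable without network access. A failure status is returned rather than raised, so the caller decides how to handle it.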
3. Retrieve the parsed result

When the parsing job is successful, you can retrieve the parsed result by calling the /parse/{parse_id} endpoint. The response payload will include a Response object:
  • chunks: An array of objects that contain a chunk number (specified by the chunk strategy) and the markdown content for that chunk.
  • document_layout: A JSON representation of the document’s visual structure, including page dimensions, bounding boxes for each element (text, tables, figures, signatures), and reading order.
  • labels: Labels associated with the parse job.
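Once the result is available, a common next step is to stitch the chunks back into one markdown string. The sketch below does that on a hand-written sample. The top-level keys chunks, document_layout, and labels come from the description above, but the per-chunk key names chunk_number and content are assumptions, and the sample values are invented for illustration.

```python
def chunks_to_markdown(result):
    """Join chunk markdown in chunk order, separated by blank lines."""
    # "chunk_number" and "content" are assumed key names for each chunk object.
    chunks = sorted(result.get("chunks", []), key=lambda c: c["chunk_number"])
    return "\n\n".join(c["content"] for c in chunks)


# Hand-written sample shaped like the response described above.
sample_result = {
    "chunks": [
        {"chunk_number": 2, "content": "## Terms\nNet 30."},
        {"chunk_number": 1, "content": "# Invoice 42"},
    ],
    "document_layout": {},
    "labels": {"source": "demo"},
}

markdown = chunks_to_markdown(sample_result)
print(markdown)
```

Sorting by chunk number guards against chunks arriving out of order, which the description above does not rule out.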
Figure: The complete upload, parse, and get results flow.
The APIs to support this workflow are:
You can also use the Webhooks API to receive notifications when a parse job is completed.

Core API Functionality

While the Tensorlake API is extensive, these are the core functions that set it apart from other Document Ingestion APIs:
| Core Function | Description |
|---|---|
| Structured Data Extraction | Pull out fields from a document. Specify the schema using either JSON Schema or Pydantic models. |
| Page Classification | Automatically identify and label different sections or types of pages (e.g., cover, table of contents, appendix) within a document. |
| Document Chunking | Enable agents to read documents, or index chunks for building RAG and knowledge graph applications. |
| Bounding Boxes | Reference every element in the document precisely for citations and highlighting. |
| Summarization | Summarize tables, charts, and figures in documents. |
| Unlimited Pages and File Size | Parse any number of large documents. You pay only for what you use. |
| Unlimited Fields Per Document | Capture every detail in even the most complex documents, without the limit of roughly 100 fields that other APIs impose. |
| Flexible Usage | All of the above features can be used individually or combined in a single API call, so you do not need to build custom multi-stage document parsing pipelines. |
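To make the structured extraction row more concrete, here is a minimal sketch of what a JSON Schema for an invoice might look like, plus a small check for required fields missing from an extracted record. The field names are invented for illustration, and how the schema is attached to a parse request is not shown here because it depends on the parse settings.

```python
import json

# Illustrative JSON Schema; the field names are hypothetical.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "due_date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_number", "total_amount"],
}


def missing_required(schema, record):
    """Return the required field names that are absent from an extracted record."""
    return [name for name in schema.get("required", []) if name not in record]


print(json.dumps(invoice_schema, indent=2))
missing = missing_required(invoice_schema, {"invoice_number": "INV-42"})
print(missing)  # prints ['total_amount']
```

A check like this is a cheap way to flag incomplete extractions before they reach downstream systems. It is not a full JSON Schema validator.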