Document AI Client
What it is: The main entry point for interacting with Tensorlake. It provides methods for uploading documents, creating parsing jobs, and retrieving results.
Why it matters: This is where you configure your parsing options, upload files, and manage the parsing workflow.
Learn how to get your API key from Tensorlake Cloud.
Document Upload
What it is: The first step in any ingestion workflow. Tensorlake accepts PDFs, images, raw text, presentations, and more. Once your document (or data) is uploaded, it is considered a file. Each file is assigned a file_id, which is used in parsing jobs.
Why it matters: Uploading documents enables asynchronous processing and orchestration.
Parsing Jobs
What it is: A parsing job is the process Tensorlake uses to analyze a document and return structured output. It uses the configured ParsingOptions to determine how the document should be processed.
Why it matters: This is where you define behaviors like schema extraction, signature detection, table parsing, and more.
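Because parsing jobs run asynchronously, client code typically waits for a job to reach a terminal state before reading its output. The sketch below shows one way to do that with a generic polling loop. It does not call the Tensorlake SDK: fetch_status is a hypothetical callable standing in for whatever status lookup your client exposes, and the status names ("pending", "processing", "successful", "failed") are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], str],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Poll fetch_status(job_id) until it returns a terminal status.

    fetch_status is a hypothetical stand-in for a real status lookup;
    the terminal status names below are assumed, not taken from the SDK.
    """
    terminal_statuses = {"successful", "failed"}
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status in terminal_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        time.sleep(interval)


# Usage with an in-memory stand-in that simulates a job finishing.
_simulated = iter(["pending", "processing", "successful"])


def fake_fetch_status(job_id: str) -> str:
    return next(_simulated)


final_status = wait_for_job(fake_fetch_status, "job-123", interval=0.0)
print(final_status)
```

Passing the status lookup in as a callable keeps the loop independent of any particular client, so the same function works with a real lookup or a test double.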
Parsing Options
What it is: Controls how Tensorlake parses the document. This includes chunking, table strategies, signature detection, OCR preferences, and more.
Why it matters: You can fine-tune performance and accuracy by customizing your parsing strategy.
Learn more about Parsing Options, including Signature Detection, Strikethrough Detection, and Table Parsing.
Schemas
What it is: Schemas define what structured data you want extracted. They can include keys like buyer_name, coverage_type, or signature_status, and can be supplied as JSON or an inline string.
Why it matters: Schemas make Tensorlake deterministic. No fuzzy guesses, just structured fields mapped to your business logic.
signature_status_schema.json
Learn how to define schemas here.
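To make the idea of a schema concrete, here is a minimal sketch that builds a JSON Schema style object using the three keys mentioned above and serializes it to a string. This is an illustration only, not the contents of signature_status_schema.json: the field types, descriptions, and the "required" list are assumptions, and the exact schema dialect Tensorlake expects should be checked against its schema documentation.

```python
import json

# Hypothetical schema: key names come from the text above,
# but the types and descriptions are assumed for illustration.
example_schema = {
    "type": "object",
    "properties": {
        "buyer_name": {
            "type": "string",
            "description": "Full name of the buyer as written in the document",
        },
        "coverage_type": {
            "type": "string",
            "description": "Type of coverage named in the document",
        },
        "signature_status": {
            "type": "boolean",
            "description": "True if the document has been signed",
        },
    },
    "required": ["buyer_name", "signature_status"],
}

# The same schema as an inline string, e.g. for passing in a request body.
schema_as_string = json.dumps(example_schema, indent=2)
print(schema_as_string)
```

Keeping the schema as a Python dict and serializing it at the last moment makes it easy to store in a .json file or pass inline, whichever form the caller needs.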
Structured Output
What it is: The output returned by Tensorlake after parsing. Output includes a structured, schema-aligned JSON representation of your document data, including bounding boxes, page numbers, and fragment types. If you provided a schema, the output will also include structured data that matches your schema.
Why it matters: This output is machine-readable, auditable, and easy to plug into downstream systems like LangGraph, Slack, or CRMs.
For example, here is a snippet based on this document, specifying the schema example above.
Visual Layout & Bounding Boxes
What it is: Each extracted field includes optional layout metadata, such as its position on the page, its size, and its surrounding context.
Why it matters: Useful for visual validation, audit trails, redlining, and debugging extraction behavior.
See the bounding boxes in the Playground.
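Layout metadata can also be consumed in code, for instance to locate every signature fragment and check its size on the page. The sketch below assumes a hypothetical output shape: a list of fragments, each with page_number, fragment_type, content, and a bbox holding x1, y1, x2, y2 coordinates. The field names and sample values are invented for illustration, so adapt them to the actual output you receive.

```python
from typing import Any

Fragment = dict[str, Any]

# Hypothetical parsed output; field names and values are assumptions.
sample_output: list[Fragment] = [
    {
        "page_number": 1,
        "fragment_type": "text",
        "content": "Buyer: Jane Doe",
        "bbox": {"x1": 72, "y1": 90, "x2": 300, "y2": 110},
    },
    {
        "page_number": 1,
        "fragment_type": "signature",
        "content": "",
        "bbox": {"x1": 72, "y1": 700, "x2": 250, "y2": 740},
    },
    {
        "page_number": 2,
        "fragment_type": "table",
        "content": "Coverage | Limit",
        "bbox": {"x1": 72, "y1": 120, "x2": 540, "y2": 400},
    },
]


def find_fragments(fragments: list[Fragment], fragment_type: str) -> list[Fragment]:
    """Return all fragments whose fragment_type matches."""
    return [f for f in fragments if f["fragment_type"] == fragment_type]


def bbox_size(bbox: dict[str, float]) -> tuple[float, float]:
    """Return (width, height) of a bounding box given corner coordinates."""
    return (bbox["x2"] - bbox["x1"], bbox["y2"] - bbox["y1"])


signatures = find_fragments(sample_output, "signature")
for fragment in signatures:
    width, height = bbox_size(fragment["bbox"])
    print(f"signature on page {fragment['page_number']}: {width} x {height}")
```

A check like this is one way to support audit trails: the page number and box coordinates identify exactly where on the document a value was found.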