Structured Extraction
API for extracting structured data from documents
The structured extraction API extracts structured data from documents. It is well suited to automating data extraction from invoices, RFPs, tax and financial statements, and other structured documents.
Data relevant to the schema is extracted from every page of the document, and conflicts between the per-page results are then resolved to produce a single output.
Supported file types: PDF, JPEG, PNG
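The service's actual conflict-resolution logic is not described here. Purely to illustrate the idea of merging per-page extractions into one output, the sketch below uses a simple majority vote per field. The function name `merge_page_results` and the sample field names are hypothetical, and the sketch assumes scalar (hashable) field values.

```python
from collections import Counter
from typing import Any


def merge_page_results(pages: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-page extractions into one record by majority vote per field.

    Illustrative only: this is NOT the service's algorithm.
    Assumes field values are hashable scalars (strings, numbers, booleans).
    """
    merged: dict[str, Any] = {}
    field_names = {name for page in pages for name in page}
    for name in sorted(field_names):
        # Ignore pages where the field was missing or empty.
        values = [page[name] for page in pages if page.get(name) is not None]
        if not values:
            merged[name] = None
            continue
        # On a tie, most_common keeps the value that was seen first,
        # which here means the value from the earliest page.
        merged[name] = Counter(values).most_common(1)[0][0]
    return merged


pages = [
    {"invoice_number": "INV-1", "total": 100.0},
    {"invoice_number": "INV-1", "total": None},
    {"invoice_number": "INV-7", "total": 100.0},
]
merged = merge_page_results(pages)
```

Here two of the three pages agree on `invoice_number`, so that value wins, and the empty `total` on the second page is ignored.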
Quick Start
Define a Schema
Define a schema that describes the data you want to extract from the document. Schemas are written in JSON Schema.
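As a minimal sketch, a schema for an invoice might look like the following, written as a Python dictionary and serialized to JSON. The field names (`invoice_number`, `invoice_date`, `total_amount`, `line_items`) are illustrative assumptions, not names the API requires.

```python
import json

# Hypothetical invoice schema; every field name here is an example.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Identifier printed on the invoice",
        },
        "invoice_date": {"type": "string", "format": "date"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total_amount"],
}

# Serialize to the JSON text that would be sent along with the document.
schema_json = json.dumps(invoice_schema, indent=2)
```

Adding a `description` to a field is optional in JSON Schema, but it gives the extractor more context about what the field means.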
Extract Data
Submit the document together with the schema. The request returns a job ID, which you use to retrieve the result of the extraction.
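The exact endpoint and request format are not shown on this page, so the sketch below only illustrates the general shape of such a call using Python's standard library. The `/extract` path, the `document` and `schema` body keys, the bearer-token header, and the `job_id` response key are all placeholders; check the API reference for the real names.

```python
import json
import urllib.request


def build_extract_request(
    base_url: str, api_key: str, document_ref: str, schema: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an extraction job.

    The URL path and body keys are hypothetical placeholders.
    """
    body = json.dumps({"document": document_ref, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/extract",  # placeholder path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def job_id_from_response(raw: bytes) -> str:
    """Pull the job ID out of a JSON response body (key name assumed)."""
    return json.loads(raw)["job_id"]


request = build_extract_request(
    "https://api.example.com", "YOUR_API_KEY", "doc-1", {"type": "object"}
)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and handing the response body to `job_id_from_response`.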
Retrieve the Result
Use the job ID to get the result of the extraction from the API.
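Because extraction is started with one call and its result fetched with another, a client typically polls until the job finishes. The following is a minimal polling sketch under stated assumptions: `fetch_job` is a caller-supplied function that returns the job as a dictionary, and the `status` values (`completed`, `failed`) and the `result` key are hypothetical names.

```python
import time
from typing import Any, Callable


def wait_for_result(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    poll_interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a job until it completes, fails, or the timeout elapses.

    Status names and the "result" key are assumptions, not the real API.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job["result"]
        if status == "failed":
            raise RuntimeError(f"Extraction job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Extraction job {job_id} did not finish in time")
        sleep(poll_interval)


# Usage with a stand-in for the real API call.
fake_responses = iter(
    [
        {"status": "pending"},
        {"status": "pending"},
        {"status": "completed", "result": {"invoice_number": "INV-1"}},
    ]
)
result = wait_for_result(
    lambda job_id: next(fake_responses), "job-123", sleep=lambda _: None
)
```

Passing `fetch_job` and `sleep` in as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.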
JSON Schema
The schema you provide determines which structured data is extracted from the document. It is written in JSON Schema.
Prompt
We use a proprietary prompt to extract the data from the document. You can provide your own prompt if you want to override the default prompt.
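One way to model an optional override like this on the client side is to include the prompt only when the caller supplies one, so that omitting it leaves the default in effect. In this sketch the helper `extraction_options` and the `prompt` key are hypothetical names.

```python
from typing import Any, Optional


def extraction_options(schema: dict, prompt: Optional[str] = None) -> dict[str, Any]:
    """Assemble request options, adding a prompt only if one is given.

    The "schema" and "prompt" keys are placeholders for the real field names.
    """
    options: dict[str, Any] = {"schema": schema}
    if prompt is not None:
        # Leaving the key out entirely is assumed to keep the default prompt.
        options["prompt"] = prompt
    return options


default_options = extraction_options({"type": "object"})
custom_options = extraction_options(
    {"type": "object"},
    prompt="Extract the invoice fields exactly as printed on the document.",
)
```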
Model Provider
The model provider determines which model is used to extract data from the document. If you specify the tensorlake provider, our proprietary model extracts the data, and no data is sent to any third-party LLM provider. You can also use OpenAI or Anthropic models.
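A client could represent the provider choice as a small enumeration, which makes it easy to flag when a document would leave the first-party model. Only the `tensorlake` identifier appears on this page; the `openai` and `anthropic` strings below are assumed spellings, and the helper name is hypothetical.

```python
from enum import Enum


class ModelProvider(str, Enum):
    TENSORLAKE = "tensorlake"  # identifier named in the documentation
    OPENAI = "openai"  # assumed identifier
    ANTHROPIC = "anthropic"  # assumed identifier


def uses_third_party_llm(provider: ModelProvider) -> bool:
    """Return True when the chosen provider is an external LLM vendor."""
    return provider is not ModelProvider.TENSORLAKE


# Parsing a configuration string raises ValueError for unknown providers.
provider = ModelProvider("tensorlake")
```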