parse_id field. You can query the status and results of the parse operation with the Get Parse Result endpoint.
Using a schema
For this operation, you must provide one or more schemas to guide the extraction process. Each schema must be a JSON Schema object. Provide the schemas in the structured_extraction_options array, which can contain multiple objects.
Known limitations include:
- The schema can be at most 5 levels deep
- Root level fields must be objects
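The two limitations above can be checked locally before a request is sent. The following is a minimal sketch, not the service's own validation: it assumes that a "level" means one step of nesting through `properties` or `items`, and that "root level fields must be objects" means every top-level property has `"type": "object"`. The helper names and the sample schema are hypothetical.

```python
from typing import Any

MAX_DEPTH = 5  # limit stated in the list above


def schema_depth(schema: dict[str, Any]) -> int:
    """Return the nesting depth of a JSON Schema.

    Assumption: each step through "properties" or "items" counts as one level.
    The service may count levels differently.
    """
    children = [c for c in schema.get("properties", {}).values() if isinstance(c, dict)]
    items = schema.get("items")
    if isinstance(items, dict):
        children.append(items)
    if not children:
        return 0
    return 1 + max(schema_depth(child) for child in children)


def check_schema(schema: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems: list[str] = []
    for name, field in schema.get("properties", {}).items():
        # Assumed reading of "Root level fields must be objects".
        if not isinstance(field, dict) or field.get("type") != "object":
            problems.append(f"root-level field '{name}' is not an object")
    depth = schema_depth(schema)
    if depth > MAX_DEPTH:
        problems.append(f"schema is {depth} levels deep (maximum {MAX_DEPTH})")
    return problems


# Hypothetical sample schema: one root-level object with two scalar fields.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice": {
            "type": "object",
            "properties": {
                "number": {"type": "string"},
                "total": {"type": "number"},
            },
        }
    },
}

problems = check_schema(invoice_schema)
```

If `check_schema` returns an empty list, the schema can be placed in one of the objects of the structured_extraction_options array. The key names inside each of those objects are not shown in this section, so they are left out of the sketch.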
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
File source - must be exactly one of: file_id, file_url, or raw_text
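The "exactly one of" rule can be enforced on the client before the request body is built. This is a small sketch; the function name `file_source` is hypothetical, and only the three field names come from the description above.

```python
from typing import Optional


def file_source(
    file_id: Optional[str] = None,
    file_url: Optional[str] = None,
    raw_text: Optional[str] = None,
) -> dict[str, str]:
    """Return the single file-source field to merge into the request body.

    Raises ValueError unless exactly one of the three sources is given.
    """
    candidates = {"file_id": file_id, "file_url": file_url, "raw_text": raw_text}
    provided = {key: value for key, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "provide exactly one of file_id, file_url, or raw_text; "
            f"got {sorted(provided) or 'none'}"
        )
    return provided


body = file_source(file_id="file_abc123xyz")
```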
ID of the file previously uploaded to Tensorlake. Has tensorlake- (V1) or file_ (V2) prefix.
"file_abc123xyz"
The properties of this object define the configuration for structured data extraction.
If this object is present, the API will perform structured data extraction on the document.
Additional metadata to identify the extraction request. The labels are returned in the extraction response.
{ "priority": "high", "source": "email" }
Comma-separated list of page numbers or ranges to parse (e.g., '1,2,3-5'). Default: all pages.
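To make the page-range format concrete, the sketch below expands such a string into individual page numbers. It is only an illustration of the syntax described above: it assumes ranges are inclusive and that duplicate pages collapse. The function name `expand_page_range` is hypothetical and is not part of the API.

```python
def expand_page_range(spec: str) -> list[int]:
    """Expand a string such as '1,2,3-5' into a sorted list of page numbers.

    Assumptions: ranges are inclusive, and duplicates are collapsed.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))  # int("") raises ValueError for empty parts
    return sorted(pages)


pages = expand_page_range("1,2,3-5")
```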
"1-5,8,10"
Name of the file. Only populated when using file_id.
"document.pdf"
application/pdf, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/msword, application/vnd.openxmlformats-officedocument.presentationml.presentation, application/vnd.apple.keynote, image/jpeg, text/plain, text/html, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-excel.sheet.macroenabled.12, application/vnd.ms-excel, text/xml, text/csv, image/png, application/octet-stream
Response
Created parse job details
The unique identifier for the parse job
This is the ID that can be used to track the status of the parse job.
Used in the GET /documents/v2/parse/{parse_id} endpoint to retrieve the status and results of the parse job.
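A request to that endpoint could be constructed as in the sketch below, using only the Python standard library. The base URL is an assumption and should be replaced with the one for your deployment; the helper name `build_parse_result_request` is hypothetical. The path and the Bearer header form come from this page.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.tensorlake.ai"  # assumption: replace with your actual base URL


def build_parse_result_request(parse_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a parse job's status and results."""
    safe_id = urllib.parse.quote(parse_id, safe="")
    url = f"{BASE_URL}/documents/v2/parse/{safe_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values for illustration only.
request = build_parse_result_request("parse_123", "my-token")
```

Passing `request` to `urllib.request.urlopen` would send it. The shape of the response body is not described in this section, so the sketch stops at building the request.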
The creation date and time of the parse job.
The date is in RFC 3339 format.
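An RFC 3339 timestamp can be turned into a timezone-aware `datetime` with the standard library. This is a sketch under two assumptions: the sample timestamp is made up, and the value uses either a `Z` suffix or a numeric offset. Python versions before 3.11 accept fewer fractional-second forms in `fromisoformat`, so a dedicated parser may be a better fit there.

```python
from datetime import datetime, timezone


def parse_created_at(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime."""
    # Older Python versions do not accept a trailing "Z", so map it to "+00:00".
    if value and value[-1] in "Zz":
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


created = parse_created_at("2025-01-15T10:30:00Z")  # illustrative value
```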