POST /documents/v2/extract
cURL
curl --request POST \
  --url https://api.tensorlake.ai/documents/v2/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "page_range": "1-5,8,10",
  "file_name": "document.pdf",
  "file_id": "file_abc123xyz",
  "mime_type": "application/pdf",
  "structured_extraction_options": [
    {
      "schema_name": "<string>",
      "json_schema": "<any>",
      "skip_ocr": true,
      "prompt": "<string>",
      "model_provider": "tensorlake",
      "partition_strategy": "none",
      "page_classes": [
        "<string>"
      ],
      "provide_citations": true
    }
  ],
  "labels": {
    "priority": "high",
    "source": "email"
  }
}'
{
  "parse_id": "<string>",
  "created_at": "<string>"
}
Submit an uploaded file, an internet-reachable URL, or raw text for document parsing. If you have configured a webhook, we will notify you when the job completes, whether it succeeds or fails. Once the job is submitted, the API returns a response containing a parse_id field. Use that parse_id with the Get Parse Result endpoint to query the status and results of the parse operation.
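The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_extract_request and submit_extract_job, the schema name "invoice", and the schema contents are hypothetical, and the token is a placeholder. The URL, headers, and field names are taken from the cURL sample above.

```python
import json
import urllib.request

API_URL = "https://api.tensorlake.ai/documents/v2/extract"


def build_extract_request(token, body):
    """Build (but do not send) the POST request for the extract endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_extract_job(token, body, timeout=30):
    """Send the request and return the parse_id from the JSON response."""
    request = build_extract_request(token, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["parse_id"]


# Example body: one file source (file_id) and one schema.
# The schema name and schema contents are made up for illustration.
example_body = {
    "file_id": "file_abc123xyz",
    "page_range": "1-5,8,10",
    "structured_extraction_options": [
        {
            "schema_name": "invoice",
            "json_schema": {
                "type": "object",
                "properties": {
                    "invoice": {
                        "type": "object",
                        "properties": {"total": {"type": "number"}},
                    }
                },
            },
        }
    ],
}
```

Calling submit_extract_job("<token>", example_body) would send the request and return the parse_id, which can then be passed to the Get Parse Result endpoint.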

Using a schema

For this operation, you must provide one or more schemas to guide the extraction process. Each schema must be a JSON Schema object, passed in the json_schema field of an entry in the structured_extraction_options array. The array can contain multiple entries. Known limitations:
  • A schema can be at most 5 levels deep
  • Root-level fields must be objects
Page classification labels can be combined with structured extraction so that the API performs structured extraction on only a subset of pages.
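The two limitations above can be checked locally before submitting a schema. The sketch below rests on assumptions the text does not settle: it counts one level for every nested "properties" object, descends into array "items" without counting a level, and reads "root-level fields must be objects" as requiring every top-level property to have type "object". The function names are hypothetical, and the server may count depth differently.

```python
MAX_DEPTH = 5  # limit stated in the documentation


def schema_depth(schema):
    """Return the number of nested "properties" levels in a JSON Schema."""
    if not isinstance(schema, dict):
        return 0
    items = schema.get("items")
    if isinstance(items, dict):
        # Assumption: an array does not add a level; its items may.
        return schema_depth(items)
    props = schema.get("properties")
    if not isinstance(props, dict) or not props:
        return 0
    return 1 + max(schema_depth(child) for child in props.values())


def check_schema(schema):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, field in schema.get("properties", {}).items():
        # Assumption: "root-level fields" means the top-level properties.
        if not isinstance(field, dict) or field.get("type") != "object":
            problems.append(f"root-level field {name!r} is not an object")
    depth = schema_depth(schema)
    if depth > MAX_DEPTH:
        problems.append(f"schema is {depth} levels deep (max {MAX_DEPTH})")
    return problems


good_schema = {
    "type": "object",
    "properties": {
        "invoice": {
            "type": "object",
            "properties": {
                "total": {"type": "number"},
                "vendor": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}},
                },
            },
        }
    },
}

bad_schema = {"type": "object", "properties": {"total": {"type": "number"}}}
```

Here check_schema(good_schema) returns an empty list, while check_schema(bad_schema) reports that the root-level field 'total' is not an object.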

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

File source: provide exactly one of file_id, file_url, or raw_text.
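A client can enforce the exactly-one rule before sending the request. This is a minimal sketch: the function name pick_file_source is hypothetical, and treating a field set to None as absent is an assumption about how the API interprets null values.

```python
SOURCE_FIELDS = ("file_id", "file_url", "raw_text")


def pick_file_source(body):
    """Return the name of the single file source field set in the body.

    Raises ValueError when none or more than one source is provided.
    """
    present = [name for name in SOURCE_FIELDS if body.get(name) is not None]
    if len(present) != 1:
        found = ", ".join(present) if present else "none"
        raise ValueError(
            "exactly one of file_id, file_url, or raw_text is required; "
            f"found: {found}"
        )
    return present[0]
```

For example, pick_file_source({"file_id": "file_abc123xyz"}) returns "file_id", while a body that sets both file_id and file_url raises ValueError.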

Response

200
application/json

Created parse job details

The response is of type object.