GET /documents/v2/parse/{parse_id}
cURL
curl --request GET \
  --url https://api.tensorlake.ai/documents/v2/parse/{parse_id} \
  --header 'Authorization: Bearer <token>'
{
  "parse_id": "parse_abcd1234",
  "parsed_pages_count": 5,
  "status": "pending",
  "error": null,
  "pages": null,
  "chunks": [],
  "structured_data": null,
  "page_classes": null,
  "created_at": "",
  "finished_at": null,
  "labels": {}
}

Get Parse Result

Retrieve the results of a previously submitted parse job.

The response will include:

  • Parsed content
    • Markdown (chunked if a chunking strategy is specified)
    • Document Layout
  • Structured extraction results (if schemas are provided during the parse request)
  • Page classification results (if page classifications are provided during the parse request)

Response Structure

When the job finishes successfully, the response will contain a JSON object with the following fields:

document_layout

The document_layout field contains a JSON representation of the layout of the document, held in a property called pages. Each page is represented as an object with the following properties:

  • page_number: The page number of the document.
  • page_fragments: An array of document elements, each with:
    • fragment_type: The type of the fragment (e.g., text, image, table).
    • bbox: The bounding box of the fragment, represented as an object with x1, y1, x2, and y2 coordinates.
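As a rough illustration of walking this structure, the Python sketch below tallies fragments by type. It assumes the response body has already been decoded into a dict and that the page list sits under a top-level pages key, as in the example response; the function name and the sample data are hypothetical.

```python
from collections import Counter


def count_fragment_types(result):
    """Tally fragment_type values across all pages of a decoded parse result."""
    counts = Counter()
    # `pages` is null (None) until the job finishes, so fall back to an empty list.
    for page in result.get("pages") or []:
        for fragment in page.get("page_fragments") or []:
            counts[fragment.get("fragment_type", "unknown")] += 1
    return dict(counts)


# Hypothetical sample shaped like the fields described above.
sample = {
    "pages": [
        {
            "page_number": 1,
            "page_fragments": [
                {"fragment_type": "text", "bbox": {"x1": 0, "y1": 0, "x2": 100, "y2": 20}},
                {"fragment_type": "table", "bbox": {"x1": 0, "y1": 30, "x2": 100, "y2": 90}},
            ],
        },
        {
            "page_number": 2,
            "page_fragments": [
                {"fragment_type": "text", "bbox": {"x1": 0, "y1": 0, "x2": 100, "y2": 20}},
            ],
        },
    ]
}

print(count_fragment_types(sample))  # {'text': 2, 'table': 1}
```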

chunks

The chunks field contains an array of text chunks extracted from the document. Each chunk is an object with a property called content, which is the text content of the chunk. If a chunking strategy was specified during the parse request, the text will be chunked accordingly.
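A minimal sketch of reading the chunks, assuming the response has been decoded into a Python dict and each chunk carries the content property described above. The helper name and the sample values are illustrative only.

```python
def chunks_to_text(result, separator="\n\n"):
    """Join the content of every chunk into a single string."""
    # `chunks` may be empty (or absent) while the job is still running.
    chunks = result.get("chunks") or []
    return separator.join(
        chunk["content"] for chunk in chunks if chunk.get("content")
    )


# Hypothetical sample with two chunks.
sample = {"chunks": [{"content": "# Invoice"}, {"content": "Total: 100.0"}]}
print(chunks_to_text(sample))
```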

structured_data

The structured_data field contains a JSON object keyed by each schema_name you provided in the parse request.

Each value holds the structured data extracted from the document, adhering to the corresponding schema. For example, if you provided the following schema for an invoice:

{
  "title": "Invoice",
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string"
    },
    "date": {
      "type": "string",
      "format": "date"
    },
    "total_amount": {
      "type": "number"
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {
            "type": "string"
          },
          "quantity": {
            "type": "number"
          },
          "price": {
            "type": "number"
          }
        }
      }
    }
  }
}

The structured_data field will contain objects that match that schema, such as:

{
  "invoice_number": "12345",
  "date": "2023-10-01",
  "total_amount": 100.0,
  "items": [
    {
      "description": "Item 1",
      "quantity": 2,
      "price": 50.0
    }
  ]
}

If our models cannot find any text in the document that matches the schema you provided, the structured_data field will be null.
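The lookup can be sketched in Python as follows. This assumes the decoded response is a dict, that structured_data is keyed by schema name, and that the value under a key may be either a single object or a list of objects (the sketch normalizes both to a list). The helper name and the "Invoice" key are hypothetical.

```python
def get_structured_items(result, schema_name):
    """Return the extracted items for one schema as a list (empty if none)."""
    data = result.get("structured_data")
    if data is None:
        # null structured_data: nothing matched, or the job has not finished.
        return []
    value = data.get(schema_name)
    if value is None:
        return []
    # Assumption: a value may be one object or a list of objects.
    return value if isinstance(value, list) else [value]


# Hypothetical sample reusing the invoice example above.
sample = {
    "structured_data": {
        "Invoice": {
            "invoice_number": "12345",
            "date": "2023-10-01",
            "total_amount": 100.0,
            "items": [{"description": "Item 1", "quantity": 2, "price": 50.0}],
        }
    }
}

for invoice in get_structured_items(sample, "Invoice"):
    print(invoice["invoice_number"], invoice["total_amount"])
```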

Errors

If a parse job's status is failure, the errors field (shown as error in the example response) will contain an object with details about the error.
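One way a client might surface a failed job is to turn it into an exception, sketched below. The exception class and helper are hypothetical, and the field name is an assumption: the prose refers to errors while the example response shows error, so the sketch checks both.

```python
class ParseFailed(Exception):
    """Raised when a parse job reports the failure status."""


def raise_if_failed(result):
    """Return the result unchanged unless its status is failure."""
    if result.get("status") != "failure":
        return result
    # Assumption: the details live under `error` or `errors`.
    details = result.get("error") or result.get("errors")
    raise ParseFailed(f"parse {result.get('parse_id')} failed: {details!r}")
```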

Lifecycle of a parse operation

The status field will indicate the current state of the parse job. Possible values are:

  • pending: The job is waiting to be processed.
  • processing: The job is currently being processed.
  • successful: The job has been successfully completed and the results are available.
  • failure: The job has failed, and the errors field will contain details about the error.

You can access the structured_data, chunks, and document_layout fields only when the job is in the successful state.
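Because results are only available once the job reaches a terminal state, a client typically polls this endpoint. The sketch below shows one way to do that with the Python standard library. The URL and Authorization header come from the cURL example; the function names, the two-second interval, and the attempt cap are assumptions chosen for illustration, not documented recommendations.

```python
import json
import time
import urllib.request

API_URL = "https://api.tensorlake.ai/documents/v2/parse/{parse_id}"
TERMINAL_STATES = {"successful", "failure"}


def fetch_parse_result(parse_id, token):
    """Issue one GET request and return the decoded JSON body."""
    request = urllib.request.Request(
        API_URL.format(parse_id=parse_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(parse_id, token, fetch=fetch_parse_result,
                    interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the status is successful or failure, then return the result."""
    for attempt in range(max_attempts):
        result = fetch(parse_id, token)
        if result.get("status") in TERMINAL_STATES:
            return result
        # Still pending or processing: wait before asking again.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"parse {parse_id} did not finish after {max_attempts} attempts")
```

A call such as wait_for_result("parse_abcd1234", token) would then return the final response, whose status should be checked before reading chunks or structured_data. The fetch and sleep parameters exist so the loop can be exercised without network access.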

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

parse_id
string
required

The public ID of the parse job

Query Parameters

with_options
boolean

Response

200
application/json

Parse result details

The response is of type object.