GET /documents/v2/parse/{parse_id}
cURL
curl --request GET \
  --url https://api.tensorlake.ai/documents/v2/parse/{parse_id} \
  --header 'Authorization: Bearer <token>'
{
  "parse_id": "parse_abcd1234",
  "dataset_id": null,
  "parsed_pages_count": 5,
  "status": "pending",
  "error": null,
  "pages": null,
  "chunks": [],
  "structured_data": null,
  "page_classes": null,
  "created_at": "2023-10-01T12:00:00Z",
  "finished_at": null,
  "labels": {}
}
Retrieve the results of a previously submitted parse job. The response will include:
  • Parsed content
    • Markdown (chunked if a chunking strategy is specified)
    • Pages
  • Structured extraction results (if schemas are provided during the parse request)
  • Page classification results (if page classifications are provided during the parse request)

Response Structure

When the job finishes successfully, the response will contain a JSON object with the following fields:

pages

The pages field contains a JSON representation of the document's pages. Each page is represented as an object with the following properties:
  • page_number: The page number of the document.
  • page_fragments: An array of document elements, each with:
    • content: The content of the fragment.
    • fragment_type: The type of the fragment (e.g., text, image, table).
    • bbox: The bounding box of the fragment, represented as an object with x1, y1, x2, and y2 coordinates.
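
As a rough illustration of the layout described above, the following Python sketch walks the pages array and yields one tuple per fragment. The helper names (iter_fragments, bbox_area) and the sample data are hypothetical, and the sketch assumes content is a plain string, which may not hold for every fragment type.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Fragment = Tuple[int, str, Any, Dict[str, float]]


def iter_fragments(pages: Optional[List[Dict[str, Any]]]) -> Iterator[Fragment]:
    """Yield (page_number, fragment_type, content, bbox) for every fragment."""
    # pages is null until the job succeeds, so None is treated as "no pages".
    for page in pages or []:
        for fragment in page.get("page_fragments", []):
            yield (
                page["page_number"],
                fragment["fragment_type"],
                fragment["content"],
                fragment["bbox"],
            )


def bbox_area(bbox: Dict[str, float]) -> float:
    """Area of a bounding box given as x1, y1, x2, y2 coordinates."""
    return abs(bbox["x2"] - bbox["x1"]) * abs(bbox["y2"] - bbox["y1"])


# Hypothetical sample shaped like the pages field described above.
sample_pages = [
    {
        "page_number": 1,
        "page_fragments": [
            {
                "content": "Invoice 12345",
                "fragment_type": "text",
                "bbox": {"x1": 10, "y1": 20, "x2": 110, "y2": 40},
            },
            {
                "content": "Item 1 | 2 | 50.0",
                "fragment_type": "table",
                "bbox": {"x1": 10, "y1": 60, "x2": 210, "y2": 160},
            },
        ],
    }
]

for page_number, fragment_type, content, bbox in iter_fragments(sample_pages):
    print(page_number, fragment_type, bbox_area(bbox), content)
```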

chunks

The chunks field contains an array of text chunks extracted from the document. Each chunk is an object with a property called content, which is the text content of the chunk. If a chunking strategy was specified during the parse request, the text will be chunked accordingly.
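
A minimal sketch of consuming this field, assuming only what is stated above (each chunk is an object with a content property). The function name chunks_to_text and the separator are illustrative choices, not part of the API.

```python
from typing import Any, Dict, List, Optional


def chunks_to_text(chunks: Optional[List[Dict[str, Any]]], separator: str = "\n\n") -> str:
    """Join the content of every chunk into a single string."""
    # An empty or missing chunks array yields an empty string.
    return separator.join(chunk["content"] for chunk in chunks or [])


# Hypothetical chunks, shaped like the chunks field described above.
sample_chunks = [
    {"content": "# Invoice 12345"},
    {"content": "Item 1, quantity 2, price 50.0"},
]

print(chunks_to_text(sample_chunks))
```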

structured_data

The structured_data field contains a JSON object keyed by each schema_name you provided in the parse request. Each value holds the structured data extracted from the document, adhering to the corresponding schema. For example, if you provided the following schema for an invoice:
{
  "title": "Invoice",
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string"
    },
    "date": {
      "type": "string",
      "format": "date"
    },
    "total_amount": {
      "type": "number"
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {
            "type": "string"
          },
          "quantity": {
            "type": "number"
          },
          "price": {
            "type": "number"
          }
        }
      }
    }
  }
}
The structured_data field will contain objects that match that schema, such as:
{
  "invoice_number": "12345",
  "date": "2023-10-01",
  "total_amount": 100.0,
  "items": [
    {
      "description": "Item 1",
      "quantity": 2,
      "price": 50.0
    }
  ]
}
If our models cannot find any text that complies with the schema, the structured_data field will be null. This can happen if the document contains nothing that matches the schema you provided.

Errors

If a parse job's status is failure, the error field will contain details about the error.

Lifecycle of a parse operation

The status field will indicate the current state of the parse job. Possible values are:
  • pending: The job is waiting to be processed.
  • processing: The job is currently being processed.
  • successful: The job has been successfully completed and the results are available.
  • failure: The job has failed, and the error field will contain details about the error.
You can access the structured_data, chunks, and pages fields only when the job is in the successful state.
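
The lifecycle above suggests a simple polling loop: fetch the job repeatedly until status reaches successful or failure. The sketch below is one possible shape, not an official client. The names wait_for_parse, fetch, interval_s, and max_attempts are hypothetical, and fetch stands for any zero-argument callable that performs the GET request and returns the decoded JSON body.

```python
import time
from typing import Any, Callable, Dict

# The two states after which the job no longer changes.
TERMINAL_STATUSES = {"successful", "failure"}


def wait_for_parse(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the job reaches a terminal status, then return the response."""
    for attempt in range(max_attempts):
        result = fetch()
        if result["status"] in TERMINAL_STATUSES:
            return result
        # Any other status (pending, processing, ...) means the job is still running.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"parse job not finished after {max_attempts} attempts")


def summarize(result: Dict[str, Any]) -> str:
    """Read result fields only when the job succeeded; report the error otherwise."""
    if result["status"] == "successful":
        return f"{len(result.get('chunks') or [])} chunks available"
    return f"parse failed: {result.get('error')}"
```

Injecting fetch and sleep keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.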

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

parse_id
string
required

The public ID of the parse job
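
For readers not using cURL, the same request can be assembled with Python's standard library. This is a sketch under the assumptions already shown in the cURL example (host, path, and Bearer header). The function name build_parse_request is hypothetical, and the token value is a placeholder.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.tensorlake.ai"


def build_parse_request(parse_id: str, token: str) -> urllib.request.Request:
    """Build the GET request for a parse job without sending it."""
    # Percent-encode the path parameter so unexpected characters cannot alter the URL.
    path = "/documents/v2/parse/" + urllib.parse.quote(parse_id, safe="")
    return urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_parse_request("parse_abcd1234", "<token>")
print(request.get_method(), request.full_url)

# Sending the request needs network access and a valid token:
#   import json
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```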

Query Parameters

with_options
boolean

Response

Parse result details (JSON) or progress stream (SSE)

parse_id
string
default:""
required

The unique identifier for the parse job

This is the same as the value returned from the POST /documents/v2/parse endpoint.

Example:

"parse_abcd1234"

status
enum<string>
default:pending
required

The status of the parse job.

This indicates whether the job is pending, in progress, completed, or failed.

This can be used to track the progress of the parse operation.

Available options:
pending,
processing,
detecting_layout,
detected_layout,
extracting_data,
extracted_data,
formatting_output,
formatted_output,
successful,
failure
created_at
string
default:""
required

The date and time when the parse job was created.

The date is in RFC 3339 format.

This can be used to track when the parse job was initiated.

Example:

"2023-10-01T12:00:00Z"

dataset_id
string | null

If the parse job was scheduled from a dataset, this field contains the dataset id.

This is the identifier used in URLs and API endpoints to refer to the dataset.

parsed_pages_count
integer
default:0

The total number of pages in the document that were parsed successfully.

Required range: x >= 0
Example:

5

error
string | null

The error that occurred during any part of the parse execution.

This is only populated if the parse operation failed.

pages
object[] | null

List of pages parsed from the document.

Each page has a list of fragments, which are detected objects such as tables, text, figures, section headers, etc.

We also return the detected text, the structure of the table (if it is a table), and the bounding box of the object.

chunks
object[]

Chunks of the document.

This is a vector of Chunk objects, each containing a chunk of the document. The number of chunks depends on the chunking strategy used during parsing.

structured_data
object[] | null

Structured data extracted from the document.

The structured data is a map where the keys are the schema names provided in the parse request, and the values are StructuredData objects containing the structured data extracted from the document.

The number of structured data objects depends on the partition strategy:
  • None: one structured data object for the entire document.
  • Page: one structured data object for each page.

page_classes
object[] | null

Page classes extracted from the document.

This is a map where the keys are page class names provided in the parse request under the page_classification_options field, and the values are vectors of page numbers (1-indexed) where each page class appears.

This is used to categorize pages in the document based on the classify options provided.
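
Because page_classes maps class names to page numbers, it is sometimes handier to invert it and ask which classes apply to a given page. The sketch below assumes the map shape described in the prose above; the function name classes_by_page and the sample class names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def classes_by_page(page_classes: Optional[Dict[str, List[int]]]) -> Dict[int, List[str]]:
    """Invert {class_name: [page, ...]} into {page: [class_name, ...]}."""
    inverted = defaultdict(list)
    # page_classes is null when no classification was requested.
    for class_name, page_numbers in (page_classes or {}).items():
        for page_number in page_numbers:
            inverted[page_number].append(class_name)
    # Sort pages and class names so the output is deterministic.
    return {page: sorted(names) for page, names in sorted(inverted.items())}


# Hypothetical classification result with 1-indexed page numbers.
sample_page_classes = {"invoice": [1, 2], "terms_and_conditions": [2, 3]}
print(classes_by_page(sample_page_classes))
```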

finished_at
string | null

The date and time when the parse job was finished.

The date is in RFC 3339 format.

This is null if the parse job is still pending or in progress.
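
Since created_at and finished_at are both RFC 3339 timestamps, the elapsed time of a job can be derived from them. The following is a small sketch, with hypothetical helper names, that assumes timestamps look like the example "2023-10-01T12:00:00Z". Older Python versions may reject other valid RFC 3339 variants in datetime.fromisoformat.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp such as 2023-10-01T12:00:00Z into an aware datetime."""
    # Before Python 3.11, fromisoformat does not accept a trailing "Z",
    # so it is rewritten as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def job_duration(result: Dict[str, Any]) -> Optional[timedelta]:
    """Return finished_at minus created_at, or None while the job is unfinished."""
    finished_at = result.get("finished_at")
    if finished_at is None:
        return None
    return parse_rfc3339(finished_at) - parse_rfc3339(result["created_at"])


# Hypothetical response fragment; the finished_at value is made up for illustration.
sample = {"created_at": "2023-10-01T12:00:00Z", "finished_at": "2023-10-01T12:00:42Z"}
print(job_duration(sample))
```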

labels
object

Labels associated with the parse job.

These are the key-value (JSON) pairs submitted with the parse request.

This can be used to categorize or tag the parse job for easier identification and filtering.

It can be undefined if no labels were provided in the request.
