- Parsed content
- Markdown (chunked if a chunking strategy is specified)
- Pages
- Structured extraction results (if schemas are provided during the parse request)
- Page classification results (if page classifications are provided during the parse request)
Response Structure
When the job finishes successfully, the response will contain a JSON object with the following fields:
pages
The pages field contains a JSON representation of the chunks of the page/document. Each page is represented as an object with the following properties:
- page_number: The page number of the document.
- page_fragments: An array of document elements, each with:
  - content: The content of the fragment.
  - fragment_type: The type of the fragment (e.g., text, image, table).
  - bbox: The bounding box of the fragment, represented as an object with x1, y1, x2, and y2 coordinates.
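As a rough sketch of how the pages structure described above might be consumed, the Python below walks a hand-written sample and picks out the table fragments along with the size of their bounding boxes. Only the field names (page_number, page_fragments, content, fragment_type, bbox, x1, y1, x2, y2) come from the description. The sample values and the helper names `fragments_of_type` and `bbox_size` are assumptions made for illustration.

```python
# Hand-written sample shaped like the `pages` field described above.
# The values are illustrative and are not real API output.
sample_pages = [
    {
        "page_number": 1,
        "page_fragments": [
            {
                "content": "Invoice #1001",
                "fragment_type": "text",
                "bbox": {"x1": 10, "y1": 20, "x2": 210, "y2": 40},
            },
            {
                "content": "Item | Qty",
                "fragment_type": "table",
                "bbox": {"x1": 10, "y1": 60, "x2": 410, "y2": 160},
            },
        ],
    },
]


def fragments_of_type(pages, fragment_type):
    """Return (page_number, fragment) pairs whose fragment_type matches."""
    return [
        (page["page_number"], fragment)
        for page in pages
        for fragment in page.get("page_fragments", [])
        if fragment.get("fragment_type") == fragment_type
    ]


def bbox_size(bbox):
    """Return (width, height) of a bounding box given as x1, y1, x2, y2."""
    return bbox["x2"] - bbox["x1"], bbox["y2"] - bbox["y1"]


tables = fragments_of_type(sample_pages, "table")
for page_number, fragment in tables:
    width, height = bbox_size(fragment["bbox"])
    print(page_number, fragment["content"], width, height)
```

The helpers take plain dictionaries, so they can be applied to the decoded JSON response without any client library.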
chunks
The chunks field contains an array of text chunks extracted from the document. Each chunk is an object with a property
called content, which is the text content of the chunk. If a chunking strategy was specified during the parse request,
the text will be chunked accordingly.
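A minimal sketch of reading the chunks array, assuming the response has already been decoded into Python dictionaries. The sample data and the helper name `chunk_texts` are hypothetical. Only the content property comes from the description above.

```python
# Hand-written sample shaped like the `chunks` field described above.
sample_chunks = [
    {"content": "First chunk of text."},
    {"content": "Second chunk of text."},
]


def chunk_texts(chunks):
    """Return the content string of every chunk, skipping empty entries."""
    return [chunk["content"] for chunk in (chunks or []) if chunk.get("content")]


texts = chunk_texts(sample_chunks)
# Joining with blank lines is one simple way to rebuild a single text body.
full_text = "\n\n".join(texts)
print(full_text)
```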
structured_data
The structured_data field contains a JSON object with every schema_name you provided in the parse request as a key.
Each value holds the structured data items extracted from the document, adhering to the specified schema.
For example, if you provided a schema for an invoice, the structured_data field will contain objects that match that schema.
The structured_data field can also be null. This can happen if the document does not contain any text that matches the schema you provided.
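Because structured_data is keyed by schema name and may be null, a lookup needs to guard against both a missing field and a missing key. The sketch below shows one way to do that. The schema name "invoice", the fields inside it, and the helper name `structured_for` are all hypothetical, and the inner shape of each value is an assumption rather than something this page specifies.

```python
def structured_for(result, schema_name):
    """Return the extracted data for one schema name, or None if absent.

    Handles both a null `structured_data` field and a schema name that
    has no entry in the map.
    """
    data = result.get("structured_data")
    if data is None:
        return None
    return data.get(schema_name)


# Hypothetical results: one with extracted data, one where nothing matched.
result_with_data = {
    "structured_data": {
        "invoice": {"invoice_number": "INV-001", "total": 120.5},
    }
}
result_without_data = {"structured_data": None}

print(structured_for(result_with_data, "invoice"))
print(structured_for(result_without_data, "invoice"))
```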
Errors
If a parse job is marked as failure, the errors field will contain an object with details about the error.
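One possible way to surface a failed job in client code is to turn the errors object into an exception. This is only a sketch: the exception class `ParseFailed`, the helper `result_or_raise`, and the "message" key in the sample error are hypothetical, since the shape of the errors object is not described here.

```python
class ParseFailed(Exception):
    """Raised when a parse job reports the `failure` status."""

    def __init__(self, errors):
        super().__init__(f"parse job failed: {errors!r}")
        self.errors = errors


def result_or_raise(job):
    """Return the result fields of a successful job.

    Raises ParseFailed for a failed job and returns None while the job
    has not yet reached a final state.
    """
    status = job.get("status")
    if status == "failure":
        raise ParseFailed(job.get("errors"))
    if status != "successful":
        return None
    return {key: job.get(key) for key in ("pages", "chunks", "structured_data")}


# Hypothetical failed job; the contents of `errors` are made up.
failed_job = {"status": "failure", "errors": {"message": "unreadable file"}}
try:
    result_or_raise(failed_job)
except ParseFailed as exc:
    print(exc.errors)
```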
Lifecycle of a parse operation
The status field will indicate the current state of the parse job. Possible values are:
- pending: The job is waiting to be processed.
- processing: The job is currently being processed.
- successful: The job has been successfully completed and the results are available.
- failure: The job has failed, and the errors field will contain details about the failure.
When the job reaches the successful state, you can access the structured_data, chunks and pages fields.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
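The sketch below combines the Bearer header with the job lifecycle: it builds the header and polls until the job reaches successful or failure. The actual HTTP call is left to a caller-supplied function, because the request URL is not given in this section. `bearer_headers`, `wait_until_done`, `fetch_job`, and the polling defaults are hypothetical names and values chosen for illustration.

```python
import time

# The two statuses after which the job no longer changes.
TERMINAL_STATUSES = {"successful", "failure"}


def bearer_headers(token):
    """Build the Authorization header in the `Bearer <token>` form."""
    return {"Authorization": f"Bearer {token}"}


def wait_until_done(fetch_job, poll_interval_s=2.0, max_polls=30, sleep=time.sleep):
    """Call fetch_job() repeatedly until the job reaches a terminal status.

    fetch_job is any callable returning the decoded job object, for
    example a function that issues the GET request with bearer_headers().
    """
    for _ in range(max_polls):
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Stand-in for a real HTTP call so the sketch runs without network access.
_fake_statuses = iter(["pending", "processing", "successful"])


def fake_fetch():
    return {"status": next(_fake_statuses)}


final_job = wait_until_done(fake_fetch, sleep=lambda _seconds: None)
print(final_job["status"])
```

Any status outside the terminal set is treated as still in progress, so intermediate states do not need to be listed individually.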
Path Parameters
The public ID of the parse job
Query Parameters
Response
Parse result details (JSON) or progress stream (SSE)
The unique identifier for the parse job
This is the same as the value returned from the POST /documents/v2/parse endpoint.
"parse_abcd1234"
The status of the parse job.
This indicates whether the job is pending, in progress, completed, or failed.
This can be used to track the progress of the parse operation.
Possible values: pending, processing, detecting_layout, detected_layout, extracting_data, extracted_data, formatting_output, formatted_output, successful, failure
The date and time when the parse job was created.
The date is in RFC 3339 format.
This can be used to track when the parse job was initiated.
"2023-10-01T12:00:00Z"
If the parse job was scheduled from a dataset, this field contains the dataset id.
This is the identifier used in URLs and API endpoints to refer to the dataset.
The number of pages that were parsed successfully.
This is the total number of pages that were successfully parsed in the document.
x >= 0
5
Error occurred during any part of the parse execution.
This is only populated if the parse operation failed.
List of pages parsed from the document.
Each page has a list of fragments, which are detected objects such as tables, text, figures, section headers, etc.
We also return the detected text, the structure of the table (if it is a table), and the bounding box of the object.
Chunks of the document.
This is a vector of Chunk objects, each containing a chunk of the
document.
The number of chunks depends on the chunking strategy used during
parsing.
Structured data extracted from the document.
The structured data is a map where the keys are the schema names
provided in the parse request, and the values are
StructuredData objects containing the structured data extracted from
the document.
The number of structured data objects depends on the partition strategy:
- None: one structured data object for the entire document.
- Page: one structured data object for each page.
Page classes extracted from the document.
This is a map where the keys are page class names provided in the parse
request under the page_classification_options field,
and the values are vectors of page numbers (1-indexed) where each page
class appears.
This is used to categorize pages in the document based on the classify options provided.
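Since the page class map goes from class name to 1-indexed page numbers, it is sometimes handy to invert it or to use it to select page objects. The following is a sketch under assumptions: the class names, the sample data, and the helper names `classes_by_page` and `pages_in_class` are made up for illustration.

```python
def classes_by_page(page_classes):
    """Invert {class_name: [page_numbers]} into {page_number: [class_names]}."""
    inverted = {}
    for class_name, page_numbers in page_classes.items():
        for page_number in page_numbers:
            inverted.setdefault(page_number, []).append(class_name)
    # Sort by page number so the result reads in document order.
    return dict(sorted(inverted.items()))


def pages_in_class(pages, page_classes, class_name):
    """Return the page objects whose page_number is listed for class_name."""
    wanted = set(page_classes.get(class_name, []))
    return [page for page in pages if page["page_number"] in wanted]


# Hypothetical data: page numbers are 1-indexed, as stated above.
sample_page_classes = {"invoice": [1, 3], "terms": [2, 3]}
sample_pages = [{"page_number": n, "page_fragments": []} for n in (1, 2, 3)]

print(classes_by_page(sample_page_classes))
print([p["page_number"] for p in pages_in_class(sample_pages, sample_page_classes, "terms")])
```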
The date and time when the parse job was finished.
The date is in RFC 3339 format.
This can be undefined if the parse job is still in progress or pending.
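The created and finished timestamps can be turned into a duration once both are present. The sketch below parses the simple UTC form shown in the example value and returns None while the finished timestamp is still undefined. The helper names and the finished value are hypothetical. `datetime.fromisoformat` does not cover every RFC 3339 variant on older Python versions, so treat this as a starting point rather than a full parser.

```python
from datetime import datetime, timezone


def parse_rfc3339(value):
    """Parse a timestamp such as "2023-10-01T12:00:00Z" into an aware datetime.

    Returns None when the value is missing.
    """
    if value is None:
        return None
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def elapsed_seconds(created, finished):
    """Seconds between the two timestamps, or None if the job is unfinished."""
    start, end = parse_rfc3339(created), parse_rfc3339(finished)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


created_example = "2023-10-01T12:00:00Z"   # example value from this page
finished_example = "2023-10-01T12:00:42Z"  # made-up value for illustration
print(elapsed_seconds(created_example, finished_example))
print(elapsed_seconds(created_example, None))
```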
Labels associated with the parse job.
These are the key-value (JSON) pairs submitted with the parse request.
This can be used to categorize or tag the parse job for easier identification and filtering.
It can be undefined if no labels were provided in the request.
Resource usage associated with the parse job.
This includes details such as number of pages parsed, tokens used for OCR and extraction, etc.
Usage is only populated for successful jobs.
Billing is based on the resource usage.