- Parsed content
- Markdown (chunked if a chunking strategy is specified)
- Pages
- Structured extraction results (if schemas are provided during the parse request)
- Page classification results (if page classifications are provided during the parse request)
Response Structure
When the job finishes successfully, the response will contain a JSON object with the following fields:
pages
The pages field contains a JSON representation of the chunks of the page/document. Each page is represented as an object with the following properties:
- page_number: The page number of the document.
- page_fragments: An array of document elements, each with:
  - content: The content of the fragment.
  - fragment_type: The type of the fragment (e.g., text, image, table).
  - bbox: The bounding box of the fragment, represented as an object with x1, y1, x2, and y2 coordinates.
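As a rough illustration of how the pages field might be consumed, the sketch below groups fragments by type and computes each bounding-box area. The sample data is hypothetical; it assumes content is a plain string and that x2 >= x1 and y2 >= y1, neither of which is confirmed by this reference.

```python
from collections import defaultdict

# Hypothetical sample shaped like the "pages" field described above.
pages = [
    {
        "page_number": 1,
        "page_fragments": [
            {
                "fragment_type": "text",
                "content": "Invoice #1001",
                "bbox": {"x1": 10, "y1": 20, "x2": 210, "y2": 50},
            },
            {
                "fragment_type": "table",
                "content": "Item | Qty | Price",
                "bbox": {"x1": 10, "y1": 60, "x2": 410, "y2": 160},
            },
        ],
    }
]


def fragments_by_type(pages):
    """Group (page_number, content, bbox_area) tuples by fragment_type."""
    grouped = defaultdict(list)
    for page in pages:
        for fragment in page.get("page_fragments", []):
            bbox = fragment["bbox"]
            # Area assumes (x1, y1) is one corner and (x2, y2) the opposite one.
            area = (bbox["x2"] - bbox["x1"]) * (bbox["y2"] - bbox["y1"])
            grouped[fragment["fragment_type"]].append(
                (page["page_number"], fragment["content"], area)
            )
    return dict(grouped)


grouped = fragments_by_type(pages)
```

Grouping by fragment_type makes it easy to, for example, pull out only the tables from a long document.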
chunks
The chunks field contains an array of text chunks extracted from the document. Each chunk is an object with a property called content, which is the text content of the chunk. If a chunking strategy was specified during the parse request, the text will be chunked accordingly.
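A minimal sketch of reading the chunks field, using hypothetical sample data. It relies only on the content property described above and treats a missing chunks value as an empty list, which is an assumption.

```python
# Hypothetical sample shaped like the "chunks" field described above.
chunks = [
    {"content": "Invoice #1001"},
    {"content": "Total due: 120.00"},
]


def chunk_texts(chunks):
    """Return the non-empty text of each chunk, in order."""
    return [c["content"] for c in (chunks or []) if c.get("content")]


texts = chunk_texts(chunks)
# Rebuild one text blob, separating chunks with a blank line.
full_text = "\n\n".join(texts)
```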
structured_data
The structured_data field contains a JSON object with every schema_name you provided in the parse request as a key. Each value holds the structured data items extracted from the document, adhering to the specified schema.
For example, if you provided a schema for an invoice, the structured_data field will contain objects that match that schema.
If no structured data is extracted, the structured_data field will be null. This can happen if the document does not contain any text that matches the schema you provided.
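The following sketch shows one way to read structured_data defensively. The schema name "Invoice" and the fields invoice_number and total are hypothetical and are not taken from this reference. Because the exact shape of each value is not spelled out here, the helper accepts either a single object or a list, and treats null as "nothing extracted".

```python
def items_for_schema(structured_data, schema_name):
    """Return the extracted items for one schema as a list (empty if none)."""
    if structured_data is None:  # the whole field is null: nothing matched
        return []
    value = structured_data.get(schema_name)
    if value is None:
        return []
    # Assumption: a value may be one object or a list of objects.
    return value if isinstance(value, list) else [value]


# Hypothetical response fragment for a schema named "Invoice".
structured_data = {
    "Invoice": [
        {"invoice_number": "1001", "total": 120.0},
    ]
}

invoices = items_for_schema(structured_data, "Invoice")
```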
Errors
If a parse job is marked as failure, the errors field will contain an object with details about the error.
Lifecycle of a parse operation
The status field will indicate the current state of the parse job. Possible values are:
- pending: The job is waiting to be processed.
- processing: The job is currently being processed.
- successful: The job has been successfully completed and the results are available.
- failure: The job has failed, and the errors field will contain details about the failure.
Once the job is in the successful state, you can access the structured_data, chunks and pages fields.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
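A minimal sketch of attaching the Bearer header with Python's standard library. The base URL and the GET path are assumptions: this reference only names the POST /documents/v2/parse endpoint, so the path below simply appends the parse job ID to it. The request is built but not sent.

```python
import urllib.request

# Hypothetical base URL; replace it with the real API host.
BASE_URL = "https://api.example.com"


def build_parse_result_request(parse_id, token, base_url=BASE_URL):
    """Build (but do not send) a GET request for a parse job's result."""
    # Assumed path: the POST endpoint's path followed by the parse job ID.
    url = f"{base_url}/documents/v2/parse/{parse_id}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_parse_result_request("parse_abcd1234", "YOUR_TOKEN")
```

Passing the request to urllib.request.urlopen would perform the call; that step is left out because the real host is not given here.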
Path Parameters
The public ID of the parse job
Query Parameters
Response
Parse result details (JSON) or progress stream (SSE)
The unique identifier for the parse job
This is the same as the value returned from the POST /documents/v2/parse endpoint.
Example: "parse_abcd1234"
The status of the parse job.
This indicates whether the job is pending, in progress, completed, or failed.
This can be used to track the progress of the parse operation.
Possible values: pending, processing, detecting_layout, detected_layout, extracting_data, extracted_data, formatting_output, formatted_output, successful, failure
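One way a client might interpret these status values is sketched below. It assumes that successful and failure are the only terminal states and that every other listed value means the job is still running; this reference does not state that explicitly.

```python
# Assumption: only these two states mean the job will not change any more.
TERMINAL_STATUSES = {"successful", "failure"}


def is_finished(status):
    """Return True once the job has reached a terminal state."""
    return status in TERMINAL_STATUSES


def summarize(result):
    """Turn a parse result object into a short human-readable summary."""
    status = result["status"]
    if status == "successful":
        return "ready"
    if status == "failure":
        return f"failed: {result.get('errors')}"
    # pending, processing, detecting_layout, extracting_data, and so on.
    return f"in progress ({status})"
```

A polling loop would typically call the endpoint until is_finished returns True, sleeping between attempts.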
The date and time when the parse job was created.
The date is in RFC 3339 format.
This can be used to track when the parse job was initiated.
Example: "2023-10-01T12:00:00Z"
If the parse job was scheduled from a dataset, this field contains the dataset id.
This is the identifier used in URLs and API endpoints to refer to the dataset.
The total number of pages in the document that were parsed successfully.
Required range: x >= 0
Example: 5
The error that occurred during any part of the parse execution.
This is only populated if the parse operation failed.
List of pages parsed from the document.
Each page has a list of fragments, which are detected objects such as tables, text, figures, section headers, etc.
Each fragment also includes the detected text, the structure of the table (if it is a table), and the bounding box of the object.
Chunks of the document.
This is a vector of Chunk objects, each containing a chunk of the document.
The number of chunks depends on the chunking strategy used during parsing.
Structured data extracted from the document.
The structured data is a map where the keys are the schema names provided in the parse request, and the values are StructuredData objects containing the structured data extracted from the document.
The number of structured data objects depends on the partition strategy:
- None: one structured data object for the entire document.
- Page: one structured data object for each page.
Page classes extracted from the document.
This is a map where the keys are page class names provided in the parse request under the page_classification_options field, and the values are vectors of page numbers (1-indexed) where each page class appears.
This is used to categorize pages in the document based on the classify options provided.
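The class-to-pages map described above can be inverted when it is more convenient to ask which classes apply to a given page. The class names in this sketch are hypothetical.

```python
def classes_by_page(page_classes):
    """Invert {class_name: [page numbers]} into {page number: [class names]}."""
    result = {}
    for class_name, page_numbers in (page_classes or {}).items():
        for number in page_numbers:
            result.setdefault(number, []).append(class_name)
    # Sort pages and class names so the output is stable.
    return {n: sorted(names) for n, names in sorted(result.items())}


# Hypothetical page classes; page numbers are 1-indexed as stated above.
page_classes = {"invoice": [1, 2], "terms": [2, 3]}
by_page = classes_by_page(page_classes)
```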
The date and time when the parse job was finished.
The date is in RFC 3339 format.
This can be undefined if the parse job is still in progress or pending.
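Because both timestamps are in RFC 3339 format, the elapsed time of a job can be computed with the standard library. This sketch handles the common forms (a trailing Z or an explicit offset) and returns None while the finish time is still undefined. It is not a full RFC 3339 parser.

```python
from datetime import datetime


def parse_rfc3339(value):
    """Parse a common RFC 3339 timestamp into an aware datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def job_duration_seconds(created, finished):
    """Return elapsed seconds, or None if the job has not finished yet."""
    if finished is None:
        return None
    return (parse_rfc3339(finished) - parse_rfc3339(created)).total_seconds()


duration = job_duration_seconds("2023-10-01T12:00:00Z", "2023-10-01T12:00:42Z")
```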
Labels associated with the parse job.
These are the key-value (JSON) pairs submitted with the parse request.
This can be used to categorize or tag the parse job for easier identification and filtering.
It can be undefined if no labels were provided in the request.