Understand how to process Documents for use in AI Agents and RAG Applications
`pending` state. It will transition to the `processing` state and then to the `successful` state when it's parsed successfully.
`ParsingOptions` class.
`/parse/{parse_id}` endpoint, or using the `get_job` SDK function.
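The status lifecycle described above lends itself to a simple polling loop. The following is a minimal sketch, not the SDK's own implementation: `wait_for_parse` and `fetch_status` are hypothetical names, and `fetch_status` stands in for whatever call you use to read the job status (a request to the `/parse/{parse_id}` endpoint or the `get_job` SDK function). It is assumed here to return the status as a plain string.

```python
import time
from typing import Callable

# States named in the text above; any other value is treated as terminal.
IN_PROGRESS_STATES = {"pending", "processing"}


def wait_for_parse(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves pending/processing and return the final status.

    `fetch_status` is a hypothetical callable supplied by the caller; how it
    talks to the API is outside the scope of this sketch.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status not in IN_PROGRESS_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"job still in progress after {max_attempts} attempts")


# Usage with a fake status source, so the sketch runs without network access.
fake_statuses = iter(["pending", "processing", "successful"])
final_status = wait_for_parse(lambda: next(fake_statuses), sleep=lambda _: None)
print(final_status)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular client and makes it easy to test.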
Parameter | Description |
---|---|
parsing_options | Customizes the document parsing process, including table parsing, chunking strategies, and more. See Parsing Options. |
enrichment_options | Summarize tables and figures present in the document. See Summarization. |
`/parse` section of the API reference.

Parameter | Description | Default Value |
---|---|---|
chunking_strategy | Choose between None, Page, Section, or Fragment. | None (no chunking) |
table_output_mode | Choose between Markdown or HTML. | HTML |
table_parsing_format | Choose between or . | TSR |
disable_layout_detection | Boolean flag to skip layout detection and directly extract text. Useful for documents with many tables or images. | false |
skew_detection | Detect and correct skewed or rotated pages. This can increase processing time. | false |
signature_detection | Detect signatures in the document. This can increase processing time and incurs additional costs. | false |
remove_strikethrough_lines | Remove strikethrough lines from the document. This can increase processing time and incurs additional costs. | false |
ignore_sections | A set of document fragments to ignore during parsing. This can be useful for excluding irrelevant sections from the output. | [] |
cross_page_header_detection | A boolean flag to enable header hierarchy detection across pages. This can improve the accuracy of header extraction in multi-page documents. | false |
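
To make the defaults in the table concrete, here is a minimal sketch that mirrors them in a plain dataclass and builds a request body containing only the values that differ from those defaults. `ParsingOptionsSketch` and `overrides` are hypothetical names for illustration and are not the SDK's `ParsingOptions` class. The string spellings of the option values and the element type of `ignore_sections` are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ParsingOptionsSketch:
    """Hypothetical stand-in that mirrors the documented defaults."""

    chunking_strategy: Optional[str] = None  # "Page", "Section", "Fragment", or None
    table_output_mode: str = "HTML"
    table_parsing_format: str = "TSR"
    disable_layout_detection: bool = False
    skew_detection: bool = False
    signature_detection: bool = False
    remove_strikethrough_lines: bool = False
    ignore_sections: List[str] = field(default_factory=list)  # element type assumed
    cross_page_header_detection: bool = False


def overrides(options: ParsingOptionsSketch) -> dict:
    """Return only the fields whose values differ from the documented defaults."""
    defaults = asdict(ParsingOptionsSketch())
    return {k: v for k, v in asdict(options).items() if v != defaults[k]}


# Usage: chunk by page and correct skewed pages, leaving everything else default.
opts = ParsingOptionsSketch(chunking_strategy="Page", skew_detection=True)
request_body = {"parsing_options": overrides(opts)}
print(request_body)
# {'parsing_options': {'chunking_strategy': 'Page', 'skew_detection': True}}
```

Sending only the overridden fields keeps the request small and lets the service apply its own defaults for everything else.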
Setting `detect_signatures` to `true` will ensure all signatures are detected throughout your document.