- Convert the document to Markdown for feeding into an LLM.
- Extract structured data from the document, as specified by a JSON schema.
The example can be run in a Google Colab notebook.
Step-by-Step Guide
The output you will see
When parsing is complete, you will see two files, structured_data.json and markdown_chunks.md, containing the structured data and the Markdown chunks.
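To inspect the results programmatically, the two output files can be read back with the Python standard library. This is a minimal sketch, assuming both files were written to the same directory; the helper name `load_outputs` and its `output_dir` parameter are hypothetical and not part of the original example.

```python
import json
from pathlib import Path


def load_outputs(output_dir="."):
    """Read the two output files and return (structured_data, markdown).

    Hypothetical helper: assumes both files live in `output_dir`.
    """
    base = Path(output_dir)

    # structured_data.json is parsed into Python objects (dict or list).
    with open(base / "structured_data.json", encoding="utf-8") as f:
        structured_data = json.load(f)

    # markdown_chunks.md is returned as one plain-text string.
    markdown = (base / "markdown_chunks.md").read_text(encoding="utf-8")

    return structured_data, markdown
```

Calling `load_outputs()` from the directory that holds the files returns the parsed JSON and the Markdown text, which can then be printed or passed on to an LLM prompt.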
structured_data.json