PDF is a widely used file format for sharing documents. Enterprise LLM applications often need to extract information that is locked inside PDF documents. With Indexify, you can build workflows that use any PDF extraction model to extract tables, images and text from PDFs.

You can use many different PDF models or APIs within a single workflow. With dynamic routing, each PDF can be sent to a different model based on its document layout.
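As a rough illustration of layout-based routing, here is a minimal sketch in plain Python. It does not use the Indexify API; `LayoutStats`, `choose_route`, the thresholds and the three extractor functions are all hypothetical stand-ins for whatever layout analysis and PDF models you actually plug in.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class LayoutStats:
    """Hypothetical layout summary produced by an earlier analysis step."""
    text_chars: int    # characters of embedded (selectable) text
    table_count: int   # number of detected tables
    image_count: int   # number of detected images


def choose_route(stats: LayoutStats) -> str:
    """Pick an extractor name from the layout summary (assumed thresholds)."""
    # Almost no embedded text but some images: likely a scanned document.
    if stats.text_chars < 50 and stats.image_count > 0:
        return "ocr"
    # Tables present: prefer a model that preserves table structure.
    if stats.table_count > 0:
        return "table"
    # Otherwise a plain text extractor is enough.
    return "text"


# Placeholder extractors; in a real workflow each would call a PDF model or API.
def extract_with_ocr(path: str) -> str:
    return f"ocr:{path}"


def extract_tables(path: str) -> str:
    return f"table:{path}"


def extract_text(path: str) -> str:
    return f"text:{path}"


EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "ocr": extract_with_ocr,
    "table": extract_tables,
    "text": extract_text,
}


def route_pdf(path: str, stats: LayoutStats) -> str:
    """Send the PDF to the extractor chosen for its layout."""
    return EXTRACTORS[choose_route(stats)](path)
```

For example, `route_pdf("invoice.pdf", LayoutStats(1200, 2, 0))` dispatches to the table extractor, while a document with no embedded text and several images goes to the OCR path.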

Examples

Multi-Modal RAG from PDFs using Inkwell

  • Table, Text and Image Extraction
  • Chunking Text
  • Embedding of Images, Text and Tables using Sentence Transformers
  • Using LanceDB for indexing and retrieval
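
The chunking step in a pipeline like this can be sketched in plain Python. This is only an assumed word-window strategy with overlap, not necessarily what the Inkwell example does; `chunk_text` and its `max_words` and `overlap` parameters are hypothetical names chosen for illustration.

```python
from typing import List


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word windows of at most max_words, sharing overlap words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval once the chunks are embedded and indexed.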