Indexify is a compute engine for building durable, data-intensive workflows and serving them as APIs. Workflows are elastic: functions run in parallel across multiple machines, and Indexify automatically manages the data flow between dependent functions. Graphs are served as live API endpoints, so they integrate easily with existing systems.

If you know Python functions, you already know how to use Indexify!

Quick Start

Jump right into building a website summarizer!

Key Features

  • Conditional Branching and Data Flow: Router functions can dynamically choose one or more edges in a graph, making it easy to invoke expert models based on inputs.
  • Local Inference: Run LLMs in workflow functions using llama.cpp, vLLM, or Hugging Face Transformers.
  • Distributed Map and Reduce: Automatically parallelizes functions over sequences across multiple machines. Reducer functions are durable and invoked as map functions finish.
  • Version Graphs and Backfill: Use the Backfill API to re-process previously processed data when functions or models are updated.
  • Request Queuing and Batching: Automatically queues and batches parallel workflow invocations to maximize GPU utilization.
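To make the router and map/reduce ideas concrete, here is a minimal plain-Python sketch. It does not use Indexify's API: every function name (`route`, `summarize_text`, `extract_table`, `chunk`, `embed`, `map_reduce`) is hypothetical, and a local thread pool stands in for the multiple machines Indexify would distribute work across.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List

# Hypothetical "expert" functions that a router can choose between.
def summarize_text(doc: str) -> str:
    return f"summary({doc})"

def extract_table(doc: str) -> str:
    return f"table({doc})"

def route(doc: str) -> List[Callable[[str], str]]:
    """Router: choose one or more downstream functions based on the input."""
    targets: List[Callable[[str], str]] = []
    if "<table>" in doc:
        targets.append(extract_table)  # only documents containing tables
    targets.append(summarize_text)     # every document is summarized
    return targets

def chunk(doc: str, size: int = 4) -> List[str]:
    """Split a document into fixed-size pieces (the sequence to map over)."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def embed(piece: str) -> int:
    """Stand-in for an embedding model: just returns the piece length."""
    return len(piece)

def map_reduce(items: Iterable[str], map_fn, reduce_fn, initial, workers: int = 4):
    """Run map_fn over items in parallel and fold each result into the
    accumulator as soon as that particular map call finishes."""
    acc = initial
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(map_fn, item) for item in items]
        for fut in as_completed(futures):
            acc = reduce_fn(acc, fut.result())
    return acc

if __name__ == "__main__":
    doc = "<table>quarterly numbers</table>"
    print([fn(doc) for fn in route(doc)])
    print(map_reduce(chunk(doc), embed, lambda acc, n: acc + n, 0))  # 32
```

Because the reducer consumes results in completion order rather than input order, it should be order-insensitive, as the sum is here.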

While traditional workflows were often linear, we chose a graph-based approach to unlock inherent parallelism in AI tasks such as embeddings, chunking, summarization, object detection, and transcription.

A webscraper and summarizer workflow built using Indexify.

Migrating to Indexify

You can adopt Indexify incrementally; get an overview of the steps with an example.

Why Indexify?

Interacting with models isn’t the most challenging aspect of building Gen AI applications. However, developing these applications requires constantly tweaking prompts, swapping in better models as they become available, and so on.

Indexify stays out of your way while you iterate: you write software as you normally would.

Further, several hurdles arise when moving from a prototype to a production-ready service:

  • State Management: Sharing and persisting the state of dependent stages in your application.
  • Version Control and Data Migration: As models evolve, developers must version workflow code and re-process existing data with newer models (e.g., improved structured extraction, summarization, or embedding models).
  • Compound Systems: Applications often require multiple models based on input context, necessitating dynamic routing of data to different functions. For instance, different document extraction models might be optimal for specific layouts, requiring a modular workflow that can adapt to various input types and contexts.
  • Hardware Optimization: Local inference with open-source or custom-trained models requires efficiently using GPUs for model inference and CPUs for the other workflow components.
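The version-control and data-migration hurdle comes down to bookkeeping: each stored output remembers which function version produced it, and a backfill re-runs only the records produced by an older version. The sketch below illustrates that idea under those assumptions; it is not Indexify's Backfill API. `Record`, `summarize_v1`, `summarize_v2`, and `backfill` are hypothetical names, and an in-memory list stands in for durable storage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Record:
    doc_id: str
    text: str
    output: Optional[str] = None
    version: Optional[str] = None  # version of the function that produced `output`

# Two hypothetical versions of the same workflow function.
def summarize_v1(text: str) -> str:
    return text[:10]

def summarize_v2(text: str) -> str:
    return text[:20].upper()

def backfill(records: List[Record], fn: Callable[[str], str], version: str) -> List[str]:
    """Re-process every record whose output was not produced by `version`.
    Returns the ids that were (re)processed; a repeated run is a no-op."""
    stale = [r for r in records if r.version != version]
    for r in stale:
        r.output = fn(r.text)
        r.version = version
    return [r.doc_id for r in stale]

if __name__ == "__main__":
    store = [Record("a", "hello world"), Record("b", "graph-based workflows")]
    print(backfill(store, summarize_v1, "v1"))  # ['a', 'b']
    print(backfill(store, summarize_v1, "v1"))  # [] (already up to date)
    print(backfill(store, summarize_v2, "v2"))  # ['a', 'b']
    print(store[0].output)                      # HELLO WORLD
```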

Once you start using Indexify, you don’t have to worry about:

  • Handling state between functions.
  • Distributing functions across multiple machines for parallelism.
  • Building APIs to integrate with other systems.
  • Developing scripts for versioning and backfilling data.

You can program workflows as if they were running on a single machine, while benefiting from a robust, distributed infrastructure.
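The following sketch shows, on a single machine, the kind of work the list above describes: passing each function's output to its dependents and running independent functions in parallel. It is a hypothetical plain-Python runner, not Indexify's API; `run_graph` and the node functions are illustrative names, threads stand in for machines, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) decides which functions are ready to run.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List

def run_graph(nodes: Dict[str, Callable[..., Any]],
              deps: Dict[str, List[str]],
              source_input: Any,
              workers: int = 4) -> Dict[str, Any]:
    """Run each node once its upstream nodes have finished.
    A node receives its upstream outputs as positional arguments;
    nodes with no upstream receive `source_input`."""
    sorter = TopologicalSorter({name: deps.get(name, []) for name in nodes})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Every node returned here has all of its inputs available,
            # so the whole batch can run in parallel.
            futures = {}
            for name in sorter.get_ready():
                upstream = deps.get(name, [])
                args = [results[u] for u in upstream] if upstream else [source_input]
                futures[name] = pool.submit(nodes[name], *args)
            for name, fut in futures.items():
                results[name] = fut.result()  # the runner holds state between functions
                sorter.done(name)
    return results

# Hypothetical workflow: scrape a page, then summarize and chunk it in parallel.
def scrape(url: str) -> str:
    return f"page text from {url}"

def summarize(text: str) -> str:
    return text[:9]

def chunk(text: str) -> List[str]:
    return text.split()

def count(chunks: List[str]) -> int:
    return len(chunks)

if __name__ == "__main__":
    out = run_graph(
        nodes={"scrape": scrape, "summarize": summarize, "chunk": chunk, "count": count},
        deps={"summarize": ["scrape"], "chunk": ["scrape"], "count": ["chunk"]},
        source_input="https://example.com",
    )
    print(out["summarize"], out["count"])  # page text 4
```

Here `summarize` and `chunk` depend only on `scrape`, so they run concurrently in the same wave, while `count` waits for `chunk`. This runner finishes a whole wave before starting the next, which is simpler, though less parallel, than starting each function the moment its own inputs are ready.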