Indexify
Open Source Compute Engine for Agentic Data Applications
AI data applications are often complex workflows that ingest and process data continuously. A single application might run inference with multiple open-source LLMs, call LLM APIs, and perform post-processing such as masking PII, all on a heterogeneous pool of compute resources.
Existing open-source tools and libraries such as Airflow, Temporal, and Prefect are not designed to handle all aspects of a modern data stack for AI engineering teams.
We have open-sourced Indexify to democratize this infrastructure for engineering teams who want to own the stack behind their data applications.
Our Document AI Engine is built on Indexify as well.
Quick Start
Let’s create a simple workflow that summarizes a website on demand! It demonstrates how to build a workflow and serve it as a remote Python API.
Install
Install the Indexify SDK.
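The SDK ships as the `indexify` Python package, so a typical install looks like this:

```bash
pip install indexify
```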
Define the Graph
We will write two functions, `scrape_website` and `summarize_text`.
We then create a Graph, `website-summarizer`, that executes the scrape function first and passes its output to the summarizer.
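A minimal sketch of what this might look like. The `indexify_function` decorator and `Graph` class are assumed from the SDK, and the function bodies are illustrative placeholders rather than production code:

```python
from indexify import indexify_function, Graph


@indexify_function()
def scrape_website(url: str) -> str:
    # Fetch the raw page content. httpx is used here only for illustration;
    # any HTTP client would work.
    import httpx

    return httpx.get(url).text


@indexify_function()
def summarize_text(text: str) -> str:
    # Placeholder summarizer: keep the first few sentences. A real workflow
    # would call an LLM (a local model or an LLM API) here instead.
    sentences = text.replace("\n", " ").split(". ")
    return ". ".join(sentences[:3])


# The graph runs scrape_website first, then feeds its output to summarize_text.
g = Graph(name="website-summarizer", start_node=scrape_website)
g.add_edge(scrape_website, summarize_text)
```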
Test the Graph In-Process
The graph can be run as-is in your Python process, which is useful for testing.
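A sketch of an in-process run, assuming the local `Graph.run` and `Graph.output` methods; the URL is just an example:

```python
# Invoke the graph locally; keyword arguments are passed to the start node.
invocation_id = g.run(block_until_done=True, url="https://example.com")

# Read the outputs produced by the summarize_text function.
print(g.output(invocation_id, "summarize_text"))
```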
Deploying a Graph as a Remote API
When it’s time to consume your graph from other applications, you can serve it as an API. There are many ways to run the server in production, but here we run it on a laptop to show how it works.
Note: The `indexify-cli` command is part of the `indexify` Python package installed earlier.
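One way to start everything locally; the exact subcommand may differ between versions, and `server-dev-mode` is assumed here:

```bash
indexify-cli server-dev-mode
```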
This starts the following processes -
- Server: Orchestrates functions in the graph, stores execution state, and hosts Remote Graph APIs.
- Executor: Runs the individual functions in the graph.
Once the server is ready, you can deploy the graph -
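For example, using the SDK's remote graph API (the `RemoteGraph.deploy` call is an assumption; it registers the graph with the server running on your machine):

```python
from indexify import RemoteGraph

# Register the graph definition with the running Indexify server.
RemoteGraph.deploy(g)
```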
Call a Graph Endpoint
Once the graph is deployed, you can get a reference to the Graph in any application.
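For example, assuming a `RemoteGraph.by_name` lookup:

```python
from indexify import RemoteGraph

# Fetch a handle to the graph that was deployed to the server.
graph = RemoteGraph.by_name("website-summarizer")
```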
You can now call the graph as a remote API.
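Invoking it mirrors the in-process run, except execution happens on the server (method names as assumed above):

```python
# The call is routed to the Indexify server, which schedules the functions
# on an executor and stores the results.
invocation_id = graph.run(block_until_done=True, url="https://example.com")
print(graph.output(invocation_id, "summarize_text"))
```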