Incremental Adoption
Migrating to Indexify is easy! Most multi-stage application code can be ported with minimal changes. Here is an example -
Original Code
The following code reads a list of documents, chunks them, embeds the chunks, and writes the embeddings to a database.
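A minimal sketch of such a pipeline is below. `read_documents`, `embed`, and `write_to_db` are hypothetical placeholders for your document loader, embedding model, and vector database; only the shape of the pipeline matters here.

```python
from typing import List

def read_documents(path: str) -> List[str]:
    # Hypothetical stand-in for your document loader.
    ...

def embed(chunk: str) -> List[float]:
    # Hypothetical stand-in for a call to your embedding model.
    ...

def write_to_db(vector: List[float]) -> None:
    # Hypothetical stand-in for a write to your vector database.
    ...

def chunk_text(text: str, chunk_size: int = 500) -> List[str]:
    # Naive fixed-size chunking.
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

# Everything runs synchronously, on one machine, one document at a time.
for document in read_documents("data/docs"):
    for chunk in chunk_text(document):
        write_to_db(embed(chunk))
```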
This is a typical pattern in many applications. While this works in notebooks and in prototypes, it has the following limitations -
- The code is synchronous and runs on a single machine. It can’t scale to large datasets.
- If any step fails, you have to re-run the entire process, e.g. re-chunking and re-embedding if the database write fails.
- You have to build a solution for re-indexing data if you change the embedding model.
- You have to build a solution for managing access control and namespaces for different data sources.
- You have to wrap it with FastAPI or another server framework to make it callable from other applications.
Migration to Indexify
Decorate functions which are units of work
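The same functions, now decorated so Indexify can schedule each one as an independent unit of work. This is a sketch based on the SDK's documented pattern: the decorator name `indexify_function` and its import path are assumptions and may differ in your SDK version; `embed` and `write_to_db` remain the hypothetical helpers from the original code.

```python
from typing import List
from indexify import indexify_function  # decorator name assumed; check your SDK version

@indexify_function()
def chunk_text(text: str, chunk_size: int = 500) -> List[str]:
    # Same chunking logic as before, now an independently schedulable unit.
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

@indexify_function()
def embed_chunks(chunks: List[str]) -> List[List[float]]:
    # embed() is the same hypothetical embedding call as in the original code.
    return [embed(chunk) for chunk in chunks]

@indexify_function()
def write_embeddings(embeddings: List[List[float]]) -> None:
    # write_to_db() is the same hypothetical database write as before.
    for vector in embeddings:
        write_to_db(vector)
```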
Define and Deploy the Graph
This is the main change. We define a graph that connects these functions. The edges of the graph define the data flow across the functions.
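A sketch of defining and deploying the graph. The class and method names (`Graph`, `add_edge`, `RemoteGraph.deploy`), the graph name, and the server URL are assumptions following the SDK's pattern; verify them against your SDK version.

```python
from indexify import Graph, RemoteGraph  # names assumed; check your SDK version

# Data flows along the edges: chunk_text -> embed_chunks -> write_embeddings.
g = Graph(name="document-embedder", start_node=chunk_text)
g.add_edge(chunk_text, embed_chunks)
g.add_edge(embed_chunks, write_embeddings)

# Deploy the graph to a running Indexify server; the URL is an example.
graph = RemoteGraph.deploy(g, server_url="http://localhost:8900")
```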
At this point, your graph is deployed as a remote API. You can call it from any application code.
Call the Graph from Applications
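A sketch of invoking the deployed graph from another application. The method names (`RemoteGraph.by_name`, `run`, `output`) and the way keyword arguments feed the start node are assumptions based on the SDK's pattern and may differ in your version.

```python
from indexify import RemoteGraph  # name assumed; check your SDK version

# Look up the deployed graph by name from any application process.
graph = RemoteGraph.by_name("document-embedder", server_url="http://localhost:8900")

# Invoke the whole pipeline remotely; keyword arguments feed the start node.
invocation_id = graph.run(block_until_done=True, text="contents of a document...")

# Optionally fetch the output of a specific function for this invocation.
chunks = graph.output(invocation_id, "chunk_text")
```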
Benefits
- You get a remote API for your workflow without writing any server code.
- Invocations of the graph are automatically load balanced and parallelized.
- If any step fails, only that step is retried; the entire process does not have to be re-run.
- Graphs are automatically versioned, so when you roll out a new model you can re-run the graph on older data.
- The embedding function can run on GPUs while chunking and database writes run on CPUs, so every second of GPU time is spent on embedding and you are not paying for a GPU while a database write is happening.