Key Concepts
Indexify represents data-intensive AI workflows as graphs, where each node is a function operating on data and each edge represents data flow between functions. The main components are described below.
Graphs
Graphs are multi-step workflows created by connecting multiple functions together. A graph has the following attributes:
- Node: Represents a function that operates on data.
- Start Node: The first function that is executed when the graph is invoked.
- Edges: Represent data flow between functions.
- Conditional Edge: Evaluates the output of the previous function and decides which edges to take. Conditional edges are like if-else statements in programming.
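To make these terms concrete, here is a minimal plain-Python sketch of a graph as a start node plus an edge table. It does not use the Indexify SDK: `TinyGraph`, `add_edge`, `run`, and the two example functions are hypothetical names chosen for illustration, and conditional edges are left out.

```python
from collections import deque
from typing import Any, Callable, Dict, List


class TinyGraph:
    """Illustrative stand-in for a workflow graph (not the Indexify API)."""

    def __init__(self, start_node: Callable[[Any], Any]) -> None:
        self.start_node = start_node
        # Maps a function name to the functions that consume its output.
        self.edges: Dict[str, List[Callable[[Any], Any]]] = {}

    def add_edge(self, src: Callable[[Any], Any], dst: Callable[[Any], Any]) -> None:
        self.edges.setdefault(src.__name__, []).append(dst)

    def run(self, payload: Any) -> Dict[str, Any]:
        """Run the start node, then pass each output along its outgoing edges."""
        outputs: Dict[str, Any] = {}
        queue = deque([(self.start_node, payload)])
        while queue:
            fn, value = queue.popleft()
            result = fn(value)
            outputs[fn.__name__] = result
            for downstream in self.edges.get(fn.__name__, []):
                queue.append((downstream, result))
        return outputs


def load_text(path: str) -> str:
    # Hypothetical node: pretends to load a document.
    return f"contents of {path}"


def count_words(text: str) -> int:
    # Hypothetical node: consumes the output of load_text.
    return len(text.split())


graph = TinyGraph(start_node=load_text)
graph.add_edge(load_text, count_words)
results = graph.run("report.txt")
```

Here `load_text` is the start node, and the single edge carries its output into `count_words`.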
Functions
Functions are regular Python functions decorated with the @indexify_function() decorator.
Functions can be executed in a distributed manner. Each function's output is stored, so if a downstream function fails, execution can resume from that stored output instead of re-running the upstream function.
The decorator accepts additional parameters that configure retry behavior, placement constraints, and more.
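The resume behavior can be pictured with a small plain-Python sketch. This is an assumption-laden illustration, not how Indexify stores outputs: `output_store`, `run_cached`, and the two example functions are hypothetical, and the store is keyed by function name only, which is enough for a single invocation.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory output store, keyed by function name.
output_store: Dict[str, Any] = {}
call_counts = {"extract": 0, "summarize": 0}


def run_cached(fn: Callable[..., Any], *args: Any) -> Any:
    """Run fn once and store its output; later calls reuse the stored value."""
    if fn.__name__ not in output_store:
        output_store[fn.__name__] = fn(*args)
    return output_store[fn.__name__]


def extract(doc: str) -> str:
    call_counts["extract"] += 1
    return doc.upper()


def summarize(text: str) -> str:
    call_counts["summarize"] += 1
    if call_counts["summarize"] == 1:
        # Simulate a failure on the first attempt only.
        raise RuntimeError("transient failure")
    return text[:5]


def run_pipeline(doc: str) -> str:
    text = run_cached(extract, doc)  # reused from the store on a retry
    return summarize(text)


try:
    summary = run_pipeline("hello world")
except RuntimeError:
    # The retry resumes from the stored output of extract.
    summary = run_pipeline("hello world")
```

After the retry, `extract` has run once while `summarize` has run twice, which is the point of storing upstream outputs.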
Namespaces
Namespaces are logical abstractions for storing related content. They allow data to be partitioned along security requirements or organizational boundaries.
Programming Model
Map
Indexify automatically parallelizes a function across multiple machines when the upstream function returns a sequence and the downstream function accepts a single element of that sequence.
Use Cases: Generating an embedding for every chunk of a document.
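The shape of a map step can be sketched in plain Python. This is only an illustration of the pattern: a local thread pool stands in for the multiple machines Indexify would use, and `split_into_chunks`, `embed_chunk`, and `run_map` are hypothetical names with a toy "embedding".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(document: str) -> List[str]:
    """Upstream function: returns a sequence of chunks."""
    return [s.strip() for s in document.split(".") if s.strip()]


def embed_chunk(chunk: str) -> List[float]:
    """Downstream function: accepts a single chunk.

    Toy stand-in for an embedding: [character count, word count].
    """
    return [float(len(chunk)), float(chunk.count(" ") + 1)]


def run_map(document: str) -> List[List[float]]:
    chunks = split_into_chunks(document)
    # Each chunk is processed independently, so the calls can run in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(embed_chunk, chunks))


embeddings = run_map("Indexify runs graphs. Each node is a function.")
```

The upstream function returns a list and the downstream function takes one element, which is the condition under which the fan-out applies.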
Reducing/Accumulating from Sequences
Reduce functions in Indexify aggregate outputs from one or more functions that return sequences. They operate with the following characteristics:
- Lazy Evaluation: Reduce functions are invoked incrementally as elements become available for aggregation. This allows for efficient processing of large datasets or streams of data.
- Stateful Aggregation: The aggregated value is persisted between invocations. Each time the Reduce function is called, it receives the current aggregated state along with the new element to be processed.
Use Cases: Aggregating a summary from hundreds of web pages.
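The two characteristics above can be sketched in plain Python. This is a minimal illustration, not the Indexify reduce API: `SummaryState`, `accumulate`, `fetch_pages`, and `run_reduce` are hypothetical names, and a generator stands in for pages arriving over time.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SummaryState:
    """Aggregated state carried between invocations."""
    pages_seen: int = 0
    total_words: int = 0


def accumulate(state: SummaryState, page_text: str) -> SummaryState:
    """Reduce step: receives the current state plus one new element."""
    return SummaryState(
        pages_seen=state.pages_seen + 1,
        total_words=state.total_words + len(page_text.split()),
    )


def fetch_pages() -> Iterator[str]:
    # Hypothetical upstream function that yields pages one at a time.
    yield "first page text"
    yield "second page has more text"
    yield "third"


def run_reduce(pages: Iterable[str]) -> SummaryState:
    state = SummaryState()
    for page in pages:
        # Invoked once per element, as soon as that element is available.
        state = accumulate(state, page)
    return state


final_state = run_reduce(fetch_pages())
```

Because `accumulate` only ever sees the current state and one element, the full sequence never needs to be held in memory.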
Dynamic Routing
Functions can route data to different nodes based on custom logic, enabling dynamic branching.
Use Cases: Processing outputs differently based on classification results.
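A routing step can be sketched in plain Python as a function that inspects its input and returns the next function to run. The names here (`classify`, `route`, `ROUTES`, and the two processors) are hypothetical and do not come from the Indexify SDK; the keyword-based classifier is a placeholder for a real model.

```python
from typing import Callable, Dict


def classify(text: str) -> str:
    """Placeholder classifier: a real workflow might call a model here."""
    return "invoice" if "total due" in text.lower() else "letter"


def process_invoice(text: str) -> str:
    return "invoice:" + text


def process_letter(text: str) -> str:
    return "letter:" + text


# Maps a classification label to the downstream function that handles it.
ROUTES: Dict[str, Callable[[str], str]] = {
    "invoice": process_invoice,
    "letter": process_letter,
}


def route(text: str) -> Callable[[str], str]:
    """Router: picks the downstream node based on the classification result."""
    return ROUTES[classify(text)]


def run_routed(text: str) -> str:
    return route(text)(text)


invoice_result = run_routed("Total due: 40 EUR")
letter_result = run_routed("Dear team")
```

Only the selected branch runs for a given input, which mirrors the if-else behavior of a conditional edge.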