We have open-sourced the core task scheduler and orchestration engine that powers Tensorlake Workflows. If you are building data-intensive workflows for AI applications or data science projects, you can use it as the foundation of a platform for your company.

Our Document Ingestion Engine is built on Indexify as well.

Indexify is an alternative to -

  • Apache Airflow, Prefect and Temporal - For building and running data-intensive workflows.
  • Apache Spark/Ray/Dask - For doing map-reduce style parallel processing.

In short, you get a durable workflow engine like Temporal, with the ability to scale out like Spark across thousands of nodes in a cluster.

In addition, you can run workflows across many different clouds or compute providers. For example, you can run the control plane in AWS, store data in S3, and run the data processing on GCP, Azure, Lambda Labs, DigitalOcean, and others.

This lets you acquire the right compute resources at the right price point, with the developer experience of building and testing workflows locally and then deploying and running them in the cloud.

Quick Start

Let’s create a simple workflow to summarize a website on-demand! It demonstrates how to build and serve a workflow as a remote Python API.

1. Install

Install the Indexify SDK, along with the openai and requests packages used in this example.

pip install indexify openai requests
2. Define the Graph

We will write two functions, scrape_website and summarize_text, and create a Graph named website-summarizer that runs the scraper first and then passes its output to the summarizer.

from indexify import tensorlake_function, Graph

@tensorlake_function()
def scrape_website(url: str) -> str:
    import requests
    return requests.get(f"https://r.jina.ai/{url}").text

@tensorlake_function()
def summarize_text(text: str) -> str:
    from openai import OpenAI
    completion = OpenAI().chat.completions.create(
        model="gpt-4o-mini-2024-07-18",
        messages=[
            {"role": "system", "content": "You are a helpful assistant. Generate a summary of this website"},
            {"role": "user", "content": text},
        ],
    )
    return completion.choices[0].message.content

g = Graph(name="website-summarizer", start_node=scrape_website)
g.add_edge(scrape_website, summarize_text)
3. Test the Graph In-Process

The graph can be run as-is in the local process, which is useful for testing.

invocation_id = g.run(url="https://en.wikipedia.org/wiki/Golden_State_Warriors")
results = g.output(invocation_id, "summarize_text")
print(results)
4. Deploy the Graph as a Remote API

When it’s time to consume your graph from other applications, you can serve it as an API. There are many ways to run the server in production, but here we run it on a laptop to show how it works.

indexify-cli server-dev-mode

Note: The indexify-cli command is part of the indexify Python package installed earlier.

This starts the following processes -

  • Server: Orchestrates functions in the graph, stores execution state, and hosts Remote Graph APIs.
  • Executor: Runs the individual functions in the graph.
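To build intuition for what the Server does, here is a minimal plain-Python sketch of executing a linear graph: each node's output is piped into the next node along the graph's edges. This is an illustration only, not Indexify's implementation, and the stub functions stand in for scrape_website and summarize_text.

```python
# Sketch of linear-graph execution (illustration only, not Indexify's code).
# Nodes are plain functions; `edges` maps each node to its successor.

def run_graph(start_node, edges, **kwargs):
    """Run a chain of functions, piping each output into the next node."""
    node = start_node
    result = node(**kwargs)
    while node in edges:
        node = edges[node]
        result = node(result)
    return result

# Stub stand-ins for the two workflow functions.
def scrape(url: str) -> str:
    return f"<contents of {url}>"

def summarize(text: str) -> str:
    return f"summary of {text}"

output = run_graph(scrape, {scrape: summarize}, url="https://example.com")
# output == "summary of <contents of https://example.com>"
```

In the real system, the Server durably records each function's output and schedules the next function on an Executor, so a crash mid-workflow can resume from the last completed step instead of restarting.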

Once the server is ready, you can deploy the graph -

from indexify import RemoteGraph
RemoteGraph.deploy(g, server_url="http://localhost:8900")
5. Call a Graph Endpoint

Once the graph is deployed, you can get a reference to it from any application.

graph = RemoteGraph.by_name(name="website-summarizer", server_url="http://localhost:8900")

You can now call the graph as a remote API.

invocation_id = graph.run(block_until_done=True, url="https://en.wikipedia.org/wiki/Golden_State_Warriors")
results = graph.output(invocation_id, "summarize_text")
print(results)