Tensorlake Workflows orchestrate data ingestion and transformation pipelines. Workflows are distributed and can ingest and process large volumes of data in parallel. You can run inference on ingested data with AI models at any point in the workflow. Workflows are durable, so every piece of ingested data is guaranteed to be processed.

Overview of Workflows

Workflows are defined by writing Python functions that process data. The functions are connected to each other to form a workflow that captures the end-to-end data ingestion and transformation logic.

Workflows are represented as Graphs. This enables parallel execution of disjoint parts of the workflow, as well as dynamic routing of data across functions, similar to branching with if-else statements on the input data (see the router sketch below).
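
As a sketch of dynamic routing, a router function inspects the input and returns the function that should run next. The tensorlake_router decorator and its return convention are assumptions made for this illustration; the tensorlake_function decorator is introduced in the next section.

from typing import List, Union

from tensorlake import tensorlake_function, tensorlake_router

@tensorlake_function()
def summarize(text: str) -> str:
    return text[:100]

@tensorlake_function()
def index_document(text: str) -> int:
    return len(text)

# Hypothetical router: chooses the downstream function based on the input,
# much like an if-else statement over the data.
@tensorlake_router()
def route_by_size(text: str) -> List[Union[summarize, index_document]]:
    if len(text) > 1000:
        return [summarize]
    return [index_document]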

Tensorlake Functions

Tensorlake functions are the building blocks of workflows. They are Python functions decorated with the @tensorlake_function decorator.

from tensorlake import Image, tensorlake_function

# Container image the function runs in, with its Python dependencies.
image = (
    Image()
    .run("pip install transformers")
    .build()
)

@tensorlake_function(image=image, input_encoding="json")
def my_function(data: str) -> int:
    return len(data)

You can write any Python code inside the function and depend on any Python package. Each function runs inside a container built from the specification of the Image object.
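
For example, a function can load a model from a package installed in its image and run inference on the incoming data. This is a minimal sketch reusing the image defined above; the sentiment-analysis pipeline is an illustrative assumption, and it presumes the image also provides a model backend such as PyTorch.

from tensorlake import tensorlake_function

@tensorlake_function(image=image, input_encoding="json")
def classify_text(data: str) -> str:
    # Import heavy dependencies inside the function so they resolve in the
    # function's container, which has transformers installed via the image.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")  # assumed model; downloads on first run
    result = classifier(data)[0]
    return result["label"]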

Graphs

You can string together multiple functions to form a workflow.

from tensorlake import Graph

g = Graph(start_node=my_function, name="my_workflow", description="My workflow")
g.add_edge(my_function, my_function1)
g.add_edge(my_function, my_function2)

In the above example, my_function is the start node of the workflow. The input to the workflow is passed to my_function, and its output is passed to both my_function1 and my_function2, which run as separate branches.

Workflows are exposed as HTTP endpoints. The body of the request is passed to the start node of the workflow, in this case my_function.

curl -X POST http://localhost:8000/workflows/my_workflow/run \
  -H "Content-Type: application/json" \
  -d '{"data": "Hello, world!"}'
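
You can also invoke a deployed workflow from Python. The run method and its block_until_done parameter below are assumptions made for illustration; consult the SDK reference for the exact invocation API.

from tensorlake import RemoteGraph

# Assumed invocation sketch: look up the deployed workflow and invoke it.
# Keyword arguments are passed to the start node, my_function.
g = RemoteGraph(name="my_workflow")
invocation_id = g.run(block_until_done=True, data="Hello, world!")
print(invocation_id)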

Outputs

Tensorlake workflows allow retrieving the outputs of any function in the workflow.

from tensorlake import RemoteGraph

# Connect to the deployed workflow by name.
g = RemoteGraph(name="my_workflow")

# Retrieve the outputs produced by my_function1.
g.outputs(my_function1)

Automatic ScaleOuts

If a function returns a list and its downstream functions each accept a single element of that list, Tensorlake automatically parallelizes those edges by bringing up many instances of the downstream function.

from typing import List

@tensorlake_function(image=image, input_encoding="json")
def chunk_essay(essay: str) -> List[str]:
    return essay.split("\n\n")

@tensorlake_function(image=image, input_encoding="json")
def count_words(text: str) -> int:
    return len(text.split(" "))
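
Wiring these two functions into a graph fans out count_words over every chunk returned by chunk_essay. The snippet below reuses the Graph API shown earlier; the workflow name is illustrative.

from tensorlake import Graph

# chunk_essay returns a list; each element is sent to a separate
# count_words invocation, which Tensorlake runs in parallel.
g = Graph(start_node=chunk_essay, name="essay_word_counts", description="Count words per chunk")
g.add_edge(chunk_essay, count_words)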

In the above example, count_words is called in parallel for each chunk of the essay. With a tool like Airflow, you would need something like Spark, Ray, or Dask to parallelize this processing; Tensorlake does it automatically.

Get Started with Tensorlake Serverless

Deploy your first Tensorlake Serverless workflow

Key Concepts

Learn about the key concepts for building workflows