Overview
Workflows are not Generally Available yet. Please contact us to get access if you are interested in building and running hassle-free, data-intensive workflows on GPUs.
Tensorlake Workflows orchestrate data ingestion and transformation. Workflows are distributed, so they can ingest and process large volumes of data in parallel, and they are durable, so every piece of ingested data is guaranteed to be processed.
Workflows are defined by writing Python functions that process data. The functions are connected to each other to form a workflow that captures the end-to-end data ingestion and transformation logic.
We represent workflows as Graphs, which enables parallel execution of disjoint parts of a workflow.
Functions
Functions are the building blocks of workflows. They are regular Python functions decorated with the @tensorlake_function() decorator.
Functions can be executed in a distributed manner, and their outputs are stored, so if a downstream function fails, it can be resumed from the outputs of its upstream functions.
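For example, a minimal sketch of a decorated workflow function (the import path follows the SDK's published examples; adjust it to your installed version):

```python
from typing import List

from tensorlake import tensorlake_function


# A regular Python function turned into a workflow function by the decorator.
@tensorlake_function()
def generate_numbers(count: int) -> List[int]:
    # The returned value is stored and passed to downstream functions.
    return list(range(count))
```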
Graphs
Graphs define the data flow in a workflow across functions.
A Graph contains:
- Start Function: The first function that is executed when the graph is invoked.
- Edges: Connections between functions that route a function's outputs to the functions downstream of it.
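For illustration, a small graph with two functions could be wired up as follows. The graph name is a placeholder, and the Graph constructor and add_edge call follow the SDK's published examples; treat exact signatures as assumptions for your version.

```python
from typing import List

from tensorlake import Graph, tensorlake_function


@tensorlake_function()
def generate_numbers(count: int) -> List[int]:
    return list(range(count))


@tensorlake_function()
def squared(number: int) -> int:
    return number * number


# generate_numbers is the start function; the edge routes its outputs to squared.
g = Graph(
    name="number_squarer",  # placeholder graph name
    start_node=generate_numbers,
    description="Generates numbers and squares each one",
)
g.add_edge(generate_numbers, squared)
```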
Testing and Deploying Graphs
Graphs and Functions can be tested locally like any other Python code.
Install the tensorlake package:
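For example, with pip (assuming the SDK is published on PyPI under the same name):

```bash
# Assumes the SDK is published on PyPI as "tensorlake"
pip install tensorlake
```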
Run the graphs locally:
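A local test run is a sketch along these lines, continuing the example graph from above. The run and output calls follow the SDK's local-execution pattern; exact signatures may differ in your version.

```python
from typing import List

from tensorlake import Graph, tensorlake_function


@tensorlake_function()
def generate_numbers(count: int) -> List[int]:
    return list(range(count))


@tensorlake_function()
def squared(number: int) -> int:
    return number * number


g = Graph(name="number_squarer", start_node=generate_numbers)
g.add_edge(generate_numbers, squared)

if __name__ == "__main__":
    # Run the graph in-process; keyword arguments go to the start function.
    invocation_id = g.run(count=5)

    # Read the stored outputs of a function by its name.
    print(g.output(invocation_id, "squared"))
```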
Deploying Graphs
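Once a graph works locally, it can be deployed to Tensorlake so it runs as a managed, remotely callable workflow. A minimal sketch using the SDK's RemoteGraph helper, assuming the graph g from the previous example lives in a module named workflow.py (the module name is illustrative):

```python
from tensorlake import RemoteGraph

# `g` is the Graph object from the local-testing example;
# `workflow` is an illustrative module name for where it is defined.
from workflow import g

# Deploy the graph so it is served behind an HTTP endpoint under its name.
RemoteGraph.deploy(g)
```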
Calling Graphs
Once a Graph is deployed, it is exposed as an HTTP endpoint that waits for data to be sent to it. You can invoke the graph using the HTTP API or the Python SDK.
Using HTTP API
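For example, with Python's requests library. The base URL, request path, payload shape, and authentication header below are illustrative assumptions; consult the API reference for your deployment for the exact invocation endpoint.

```python
import os

import requests

# The base URL, request path, payload shape, and auth header below are
# illustrative assumptions; check the API reference for the exact endpoint.
TENSORLAKE_API = "https://api.tensorlake.ai"  # hypothetical base URL
GRAPH_NAME = "number_squarer"                 # the deployed graph's name

response = requests.post(
    f"{TENSORLAKE_API}/namespaces/default/compute_graphs/{GRAPH_NAME}/invoke",  # hypothetical path
    headers={"Authorization": f"Bearer {os.environ['TENSORLAKE_API_KEY']}"},
    json={"count": 5},  # keyword arguments for the graph's start function
    timeout=30,
)
response.raise_for_status()
print(response.json())
```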
Using Python SDK
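A remote invocation sketch with the SDK's RemoteGraph helper; the method names follow the SDK's published examples, but treat exact signatures as assumptions for your version.

```python
from tensorlake import RemoteGraph

# Look up the deployed graph by its name.
graph = RemoteGraph.by_name("number_squarer")

# Keyword arguments go to the graph's start function; block until the
# invocation finishes so its outputs are available to read.
invocation_id = graph.run(block_until_done=True, count=5)

# Read the stored outputs of a function in the graph by its name.
print(graph.output(invocation_id, "squared"))
```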
Resource Usage Metering
Workflows are billed based on the number of seconds any of their functions are running. We don't charge you when no functions are running, or when data is being ingested but not yet processed by any function.