
Tensorlake applications

Tensorlake applications are defined with the top-level @application decorator. You can define as many applications as you want in your project; each one is assigned a unique entry point based on the name of its Python function.
from tensorlake.applications import application, function

# This application's name will be `hello_world`.
@application()
@function()
def hello_world():
    print("Hello, world!")

# This application's name will be `hola_mundo`.
@application()
@function()
def hola_mundo():
    print("Hola, mundo!")

Configuring Tensorlake applications

The @application decorator allows you to specify the following attributes:
  1. tags - list of tags to categorize the application.
  2. retries - Retry policy for every function in the application. Functions are not retried by default if they fail. See Retries.
  3. region - The region where the application's functions will be deployed. Either US or EU. The default is US.
  4. input_serializer - The serializer to use for the input data. Either json or pickle. The default is json.
  5. output_serializer - The serializer to use for the output data. Either json or pickle. The default is json.
The following code snippet shows an example of all the application attributes set to custom values.
from tensorlake.applications import application, Retries

@application(
    tags=["example", "python"],
    retries=Retries(max_retries=3),
    region="EU",
    input_serializer="pickle",
    output_serializer="pickle"
)
@function()
def hello_world():
    print("Hello, world!")

Tensorlake functions

Tensorlake functions are the building blocks of applications. They are Python functions decorated with the @function decorator. They can be used to perform any isolated computation.
from tensorlake.applications import application, function

@application()
@function()
def my_function(data: str) -> int:
    return len(data)

Calling other Tensorlake functions

Tensorlake functions can also call other Tensorlake functions. This allows you to chain functions together into a workflow graph. A Tensorlake function call always returns a Python future. When you chain functions together, the futures are resolved by the time the main application function returns its output.
from tensorlake.applications import application, function

@application()
@function()
def my_function(data: str) -> int:
    return other_function(data)

@function()
def other_function(data: str) -> int:
    return len(data)

Tensorlake function arguments

Tensorlake functions take a single argument. The value of the argument is deserialized based on the serialization format to match the function’s type signature. For example, if your function takes a string as input, the expected input is a JSON string containing a single string value:
from tensorlake.applications import function

@function()
def greet(data: str) -> str:
    return data
Input value:
"Hello, world!"
If you want to use a complex data structure as the input, you can define a custom type and use it as the function’s argument type. For example:
from pydantic import BaseModel
from tensorlake.applications import function

class MyData(BaseModel):
    name: str
    age: int

@function()
def process_data(data: MyData) -> str:
    return data.name + " is " + str(data.age) + " years old"
Input value:
{"name": "John", "age": 30}

Configuring Tensorlake functions

The @function decorator allows you to specify the following attributes:
  1. description - A description of the function’s purpose and behavior.
  2. cpu - The number of CPUs available to the function. The default is 1.0 CPU. See CPU.
  3. memory - The memory available to the function in GB. The default is 1.0 GB. See Memory.
  4. ephemeral_disk - The ephemeral /tmp disk space available to the function in GB. The default is 2.0 GB. See Ephemeral Disk.
  5. gpu - The GPU model available to the function. The default is None (no GPU). Please contact support@tensorlake.ai to enable GPU support.
  6. timeout - The timeout for the function in seconds. The default is 5 minutes. See Timeouts.
  7. image - The image to use for the function container. A basic Debian based image by default. See Images.
  8. secrets - The secrets available to the function in its environment variables. No secrets by default. See Secrets.
  9. retries - Retry policy for the function. Functions are not retried by default if they fail. See Retries.
  10. cacheable - If True, reusing previous function outputs is allowed. False by default. See Caching.
  11. region - The region where the function will be deployed. Either US or EU. The default is US.
The following code snippet shows an example of all the function attributes set to custom values.
from tensorlake.applications import function, Image, Retries

@function(
    # Use Ubuntu as a base image instead of the default Debian
    image=Image().base_image("ubuntu:latest"),
    # Make my_secret available to the function as an environment variable
    secrets=["my_secret"],
    # Description of the function in the workflow
    description="Measures the string using its length",
    # Retry the function twice if it fails
    retries=Retries(max_retries=2),
    # Fail the function if it runs for more than 30 seconds without reporting progress
    timeout=30,
    # Reuse previous function outputs if the function gets called with the same inputs
    cacheable=True,
    # 2 CPUs are available to the function
    cpu=2,
    # 4 GB of memory is available to the function
    memory=4,
    # 2 GB of ephemeral /tmp disk space is available to the function
    ephemeral_disk=2,
    # Run the function in a container with GPU support
    gpu="nvidia-tesla-k80",
    # Run the function in the EU region
    region="EU",
)
def string_length(s: str) -> int:
    return len(s)

Classes

Sometimes a function needs expensive initialization, like loading a large model into memory. You can define a class with the @cls decorator. The class uses its __init__(self) constructor to run initialization code once on function container startup. Every method inside the class decorated with @function behaves like any other Tensorlake function.
from large_model import load_large_model
from tensorlake.applications import cls, function

@cls()
class MyCompute:
    # Accepts the same attributes as the @function() decorator
    image = ...
    timeout = ...

    def __init__(self):
        # Run initialization code once on function container startup
        self.model = load_large_model()

    @function()
    def run(self, data: str) -> int:
        return self.model.run(data)
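
This guide doesn't show the invocation syntax for class-based functions, but assuming they can be chained like free functions, usage might look like the following sketch. The measure entry point and the MyCompute.run(data) call form are assumptions, not confirmed API.
from tensorlake.applications import application, function

# Continues the MyCompute example above.
@application()
@function()
def measure(data: str) -> int:
    # Assumed call form: chains to the class-based function like any other
    return MyCompute.run(data)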

Input and Output Serialization

Inputs and outputs of functions are serialized and deserialized as JSON by default. This is a good default because applications are exposed as HTTP endpoints, which makes it possible to call them from any programming language. You can change the serialization format to pickle if you want to pass complex Python objects between functions, such as Pandas dataframes, PyTorch tensors, PIL images, etc. pickle requires that objects are serialized and deserialized on the same Python version, so all function containers must use the same Python version. The input_serializer and output_serializer attributes of the @application decorator control the serialization format. Currently supported formats are:
  • json - JSON serialization
  • pickle - Cloudpickle serialization
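
For example, pickle serialization could let an application pass NumPy arrays between functions. A minimal sketch, assuming pickle-encoded payloads on both the request and response side:
import numpy as np
from tensorlake.applications import application, function

# pickle lets complex Python objects cross function boundaries intact
@application(input_serializer="pickle", output_serializer="pickle")
@function()
def normalize(matrix: np.ndarray) -> np.ndarray:
    return scale(matrix)

@function()
def scale(matrix: np.ndarray) -> np.ndarray:
    # Divide by the Frobenius norm of the matrix
    return matrix / np.linalg.norm(matrix)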

Timeouts

When a function runs longer than its timeout, it is terminated and marked as failed. The timeout in seconds is set using the timeout attribute. The default timeout is 300 (5 minutes). Minimum is 1, maximum is 172800 (48 hours). Progress updates can be sent by the function to extend the timeout. See Request Context.
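
For example, a function that legitimately needs more than the default 5 minutes can raise its limit. A minimal sketch using only the timeout attribute described above; the download logic is illustrative:
import urllib.request
from tensorlake.applications import function

# Allow up to 10 minutes instead of the default 300 seconds
@function(timeout=600)
def download_size(url: str) -> int:
    with urllib.request.urlopen(url) as resp:  # may be slow for large files
        return len(resp.read())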

Retries

When a function fails by raising an exception or timing out, it gets retried according to its retry policy. The default retry policy is to not retry the function. You can specify a custom retry policy using the retries attribute.
from tensorlake.applications import function, Retries

# Retry the function once if it failed
@function(retries=Retries(max_retries=1))
def my_function() -> int:
    raise Exception("Something went wrong")
You can set a default retry policy for all functions in the @application decorator, as shown below. See the Applications guide.
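
An application-wide policy can coexist with a per-function one. This sketch assumes the per-function retries attribute takes precedence over the application default:
from tensorlake.applications import application, function, Retries

# Application-wide default: retry every function up to 3 times
@application(retries=Retries(max_retries=3))
@function()
def pipeline(data: str) -> int:
    return flaky_step(data)

# Per-function policy, assumed to override the application default
@function(retries=Retries(max_retries=5))
def flaky_step(data: str) -> int:
    return len(data)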

Request Context

Functions can use a request context to share state between invocations in the same request. The context has information about the current request and provides access to Tensorlake APIs for the current request. You can access the request context directly from the RequestContext class.
import time
from tensorlake.applications import RequestContext, function

@function()
def initial_function(data: str) -> int:
    ctx: RequestContext = RequestContext.get()
    start_time = time.time()

    # Set request progress to 1 out of 2.
    ctx.progress.update(1, 2)
    # Print the request information.
    print(f"Request ID: {ctx.request_id}, Graph name: {ctx.graph_name}, Graph version: {ctx.graph_version}")
    # Set a request-scoped key-value pair. It's available to other functions in the same request.
    ctx.state.set("my_function_data", data)

    # Record how long this function has been running.
    ctx.metrics.timer("initial_function", time.time() - start_time)

    # Call the next function to complete the request.
    return final_function()

@function()
def final_function() -> int:
    ctx: RequestContext = RequestContext.get()
    # Set the request progress to 2 out of 2.
    ctx.progress.update(2, 2)

    # Fetch the state data and calculate its length.
    return len(ctx.state.get("my_function_data"))
When a function publishes a progress update for its request, its timeout restarts. For example, if a function has a 4-minute timeout and calls ctx.progress.update after 2 minutes of execution, the timeout is reset at that point, allowing the function to run for another 4 minutes.
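
As a sketch, a batch-processing function can publish a progress update after each item to keep extending its timeout. Here process_item is a hypothetical helper standing in for real per-item work:
from tensorlake.applications import RequestContext, function

@function(timeout=240)
def process_batch(items: list) -> int:
    ctx: RequestContext = RequestContext.get()
    for i, item in enumerate(items):
        process_item(item)  # hypothetical per-item work that may take minutes
        # Each update restarts the 4-minute timeout window
        ctx.progress.update(i + 1, len(items))
    return len(items)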

Caching

If the cacheable function attribute is True, Tensorlake assumes that the function returns the same outputs for the same inputs. This allows Tensorlake to cache the outputs of the function and reuse them when the function is called with the same inputs again. When cached outputs are used, the function is not executed, which speeds up requests and makes them cheaper to run. The size of the cache and the caching duration are controlled by the Tensorlake Platform.
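
Caching is only safe for deterministic functions. A minimal sketch of a function where identical inputs always produce identical outputs:
import hashlib
from tensorlake.applications import function

# Deterministic, so Tensorlake may serve cached outputs instead of re-running it
@function(cacheable=True)
def fingerprint(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()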

CPU

The number of CPUs available to the function is set using the cpu attribute. Minimum is 1.0, maximum is 8.0. The default is 1.0. This is usually sufficient for functions that only call external APIs and do simple data processing. Adding more CPUs is recommended for functions that do complex data processing or work with large datasets. If a function consumes or produces multi-gigabyte payloads, at least 3 CPUs are recommended; this gives the fastest download and upload speeds for the data.

Memory

The memory available to the function in GB is set using the memory attribute. Minimum is 1.0, maximum is 32.0. The default is 1.0. This is usually sufficient for functions that only call external APIs and do simple data processing. Adding more memory is recommended for functions that do complex data processing or work with large datasets. It’s recommended to set memory to at least 2x the size of the function’s largest inputs and outputs, because during serialization and deserialization both the serialized and deserialized representations are held in memory at the same time.

Ephemeral disk

Ephemeral disk space is temporary storage available to functions at the /tmp path. It is erased when the function container terminates, so it’s best suited for temporary files that are not needed after the function finishes. Ephemeral disks are backed by fast SSD drives; using other filesystem paths like /home/ubuntu for temporary files results in slower performance. Temporary files created with Python modules like tempfile are stored in ephemeral disk space under /tmp. The ephemeral disk space available to the function in GB is set using the ephemeral_disk attribute. Minimum is 2.0, maximum is 50.0. The default is 2.0 GB. This is usually sufficient for functions that only call external APIs and do simple data processing. If the function needs to temporarily store large files or datasets on disk, increase ephemeral_disk accordingly.
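
Putting the sizing guidance from the last three sections together, a function that stages a large payload on disk might be configured like this sketch; the exact values are illustrative:
import tempfile
from tensorlake.applications import function

# At least 3 CPUs for fast transfer of multi-gigabyte payloads,
# memory at ~2x the largest input/output, and extra /tmp scratch space
@function(cpu=3.0, memory=8.0, ephemeral_disk=20.0)
def stage_and_measure(data: bytes) -> int:
    # tempfile writes to /tmp, which is backed by the fast ephemeral SSD
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        return tmp.tell()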

Streaming Progress

You can stream the progress of your requests for interactive use cases, for example to show users how far along a request is.
curl -N -X POST https://api.tensorlake.com/applications/my_application/progress \
  -H "Accept: text/event-stream"