Tensorlake applications
Tensorlake applications are the top-level decorators that define your applications. You can define as many applications as you want in your project. Each one is assigned a unique entry point based on the name of the Python function it decorates.

Configuring Tensorlake applications
The `@application` decorator allows you to specify the following attributes (see the example after this list):
- `tags` - A list of tags to categorize the application.
- `retries` - The retry policy for every function in the application. By default, a failed function is not retried. See Retries.
- `region` - The region where the function will be deployed. Either `US` or `EU`. The default is `US`.
- `input_serializer` - The serializer to use for the input data. Either `json` or `pickle`. The default is `json`.
- `output_serializer` - The serializer to use for the output data. Either `json` or `pickle`. The default is `json`.
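A minimal sketch of a configured application. The `tensorlake.applications` import path and stacking `@application` on top of `@function` are assumptions, not verbatim SDK code; the attributes are the ones listed above:

```python
from tensorlake.applications import application, function  # assumed import path

@application(
    tags=["docs", "example"],     # categorize the application
    region="EU",                  # deploy to EU instead of the default US
    input_serializer="json",      # the default, shown for completeness
    output_serializer="json",
)
@function()
def greet(name: str) -> str:
    # The entry point name is derived from this function's name: "greet".
    return f"Hello, {name}!"
```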
Tensorlake functions
Tensorlake functions are the building blocks of applications. They are Python functions decorated with the `@function` decorator, and they can be used to perform any isolated computation.
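For example, a minimal sketch (same assumed import path as above):

```python
from tensorlake.applications import function

@function()
def word_count(text: str) -> int:
    # Any isolated computation can live in a Tensorlake function.
    return len(text.split())
```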
Calling other Tensorlake functions
Tensorlake functions can also call other Tensorlake functions, which lets you chain functions together into a workflow graph. A Tensorlake function call always returns a Python future. When you chain functions together, the futures are resolved when the main application function returns its output. A sketch follows below.
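A sketch of chaining, assuming calls between decorated functions return futures that the platform resolves (imports and decorator stacking as assumed above):

```python
from tensorlake.applications import application, function

@function()
def double(x: int) -> int:
    return x * 2

@function()
def add_one(x: int) -> int:
    return x + 1

@application()
@function()
def pipeline(x: int) -> int:
    doubled = double(x)    # returns a future, not an int
    # Passing the future along chains the functions into a workflow graph;
    # it is resolved when the application function returns its output.
    return add_one(doubled)
```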
Tensorlake function arguments
Tensorlake functions take a single argument. The value of the argument is deserialized based on the serialization format to match the function's type signature. As a basic example, if your function takes a string as input, the expected input is a JSON document containing a single string value. For example:

```json
"input value"
```
Configuring Tensorlake functions
The `@function` decorator allows you to specify the following attributes:
- `description` - A description of the function's purpose and behavior.
- `cpu` - The number of CPUs available to the function. The default is `1.0` CPU. See CPU.
- `memory` - The memory in GB available to the function. The default is `1.0` GB. See Memory.
- `ephemeral_disk` - The ephemeral `/tmp` disk space available to the function, in GB. The default is `2.0` GB. See Ephemeral Disk.
- `gpu` - The GPU model available to the function. The default is `None` (no GPU). Please contact support@tensorlake.ai to enable GPU support.
- `timeout` - The timeout for the function in seconds. The default is 5 minutes. See Timeouts.
- `image` - The image to use for the function container. A basic Debian-based image by default. See Images.
- `secrets` - The secrets available to the function in its environment variables. No secrets by default. See Secrets.
- `retries` - The retry policy for the function. By default, a failed function is not retried. See Retries.
- `cacheable` - If `True`, reusing previous function outputs is allowed. `False` by default. See Caching.
- `region` - The region where the function will be deployed. Either `US` or `EU`. The default is `US`.
Classes
Sometimes a function needs expensive initialization, like loading a large model into memory. For this, you can define a class with the `@cls` decorator. Classes use their `__init__(self)` constructor to run initialization code once, on function container startup. Every method of the class decorated with `@function` behaves like any other Tensorlake function.
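A sketch, assuming `cls` is importable alongside the other decorators (import path assumed; the "model" below is a stand-in for real loading code):

```python
from tensorlake.applications import cls, function

@cls()
class Embedder:
    def __init__(self):
        # Runs once on container startup, so the expensive setup is shared
        # across all invocations handled by this container.
        self.model = {"weights": [0.1, 0.2, 0.3]}  # stands in for a large model

    @function()
    def embed(self, text: str) -> list[float]:
        # Behaves like any other Tensorlake function.
        return [len(text) * w for w in self.model["weights"]]
```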
Input and Output Serialization
Inputs and outputs of functions are serialized and deserialized as JSON by default. This is a good default because applications are exposed as HTTP endpoints, which makes it possible to call them from any programming language. You can also change the serialization format to `pickle` if you want to pass complex Python objects between functions, such as Pandas dataframes, PyTorch tensors, PIL images, etc. `pickle` requires objects to be serialized and deserialized on the same Python version, so all function containers must use the same Python version.
The `input_serializer` and `output_serializer` attributes of the `@application` decorator can be used to change the serialization format, as shown in the sketch after this list. Currently supported formats are:
- `json` - JSON serialization
- `pickle` - Cloudpickle serialization
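For example, a sketch that switches both directions to pickle (the decorator attributes come from this page; the import path is assumed):

```python
from tensorlake.applications import application, function

@application(input_serializer="pickle", output_serializer="pickle")
@function()
def round_trip(obj: object) -> object:
    # With pickle, complex Python objects (dataframes, tensors, PIL images)
    # can cross the function boundary directly, as long as every container
    # runs the same Python version.
    return obj
```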
Timeouts
When a function runs longer than its timeout, it is terminated and marked as failed. The timeout in seconds is set using the `timeout` attribute.
The default timeout is `300` seconds (5 minutes). The minimum is `1` and the maximum is `172800` (48 hours). A function can send progress updates to extend its timeout. See Request Context.
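A sketch setting a 10-minute timeout (attribute from this page; import path assumed):

```python
import time
from tensorlake.applications import function

@function(timeout=600)  # seconds; must be between 1 and 172800
def slow_task(n: int) -> int:
    time.sleep(30)  # stand-in for long-running work
    return n * n
```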
Retries
When a function fails by raising an exception or timing out, it is retried according to its retry policy. The default policy is to not retry the function. You can specify a custom retry policy using the `retries` attribute.
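The shape of the retry policy object is not described on this page; the `Retries` class and its `max_retries` parameter below are hypothetical placeholders, not confirmed SDK API:

```python
import random
from tensorlake.applications import function
# Hypothetical: the retry policy class name and its parameters are assumptions.
from tensorlake.applications import Retries

@function(retries=Retries(max_retries=3))
def flaky(x: int) -> int:
    if random.random() < 0.3:
        raise RuntimeError("transient failure")  # raising triggers a retry
    return x
```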
Request Context
Functions can use a request context to share state between invocations within the same request. The context carries information about the current request and provides access to Tensorlake APIs for the current request. You can access the request context directly through the `RequestContext` class.
Calling `ctx.update_progress` resets the function's timeout. For example, if a function with a 4-minute timeout calls `ctx.update_progress` after 2 minutes of execution, the timeout is reset to 4 minutes from that point, allowing the function to run for another 4 minutes.
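A sketch, assuming the context is fetched through a `RequestContext` accessor; the exact accessor name (`RequestContext.get()`) and the `update_progress` signature are assumptions:

```python
import time
from tensorlake.applications import function, RequestContext

@function(timeout=240)
def long_job(steps: int) -> str:
    ctx = RequestContext.get()  # assumed accessor, not confirmed SDK API
    for step in range(steps):
        time.sleep(10)                     # stand-in for a chunk of real work
        ctx.update_progress(step / steps)  # assumed signature; resets the timeout
    return "done"
```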
Caching
If the `cacheable` function attribute is `True`, Tensorlake assumes that the function returns the same outputs for the same inputs.
This allows Tensorlake to cache the outputs of the function and reuse them when the function is called with the same inputs again.
When cached outputs are used, the function is not executed, which speeds up requests and makes them cheaper to run.
The size of the cache and the caching duration are controlled by the Tensorlake Platform.
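For example, a sketch marking a deterministic function as cacheable (import path assumed):

```python
import hashlib
from tensorlake.applications import function

@function(cacheable=True)
def checksum(data: str) -> str:
    # Deterministic for a given input, so cached outputs are safe to reuse.
    return hashlib.sha256(data.encode()).hexdigest()
```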
CPU
The number of CPUs available to the function is set using the `cpu` attribute. The minimum is `1.0` and the maximum is `8.0`.
The default is `1.0`, which is usually sufficient for functions that only call external APIs and do simple data processing.
Adding more CPUs is recommended for functions that do complex data processing or work with large datasets.
If a function consumes or produces multi-gigabyte inputs or outputs, at least 3 CPUs are recommended; this gives the fastest download and upload speeds for the data.
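For example, a sketch reserving extra CPUs for a data-heavy function (import path assumed):

```python
from tensorlake.applications import function

@function(cpu=4.0)  # between 1.0 and 8.0; >= 3 recommended for multi-gigabyte I/O
def merge_shards(shards: list) -> list:
    # CPU-heavy merge over a large dataset.
    return sorted(item for shard in shards for item in shard)
```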
Memory
The memory in GB available to the function is set using the `memory` attribute. The minimum is `1.0` and the maximum is `32.0`.
The default is `1.0`, which is usually sufficient for functions that only call external APIs and do simple data processing.
Adding more memory is recommended for functions that do complex data processing or work with large datasets.
It's recommended to set `memory` to at least 2x the size of the largest inputs and outputs of the function, because while inputs and outputs are being deserialized and serialized, both the serialized and deserialized representations are kept in memory.
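For example, a function whose largest input is around 4 GB would follow the 2x rule with `memory=8.0` (a sketch; import path assumed):

```python
from tensorlake.applications import function

# Largest input is ~4 GB, so reserve at least 2x that: both the serialized
# and deserialized representations are held in memory during (de)serialization.
@function(memory=8.0)
def aggregate(rows: list) -> dict:
    return {"count": len(rows)}
```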
Ephemeral disk
Ephemeral disk space is temporary storage available to functions at the `/tmp` path. It is erased when the function container terminates, which makes it ideal for temporary files that are not needed after the function execution completes. Ephemeral disks are backed by fast SSD drives; using other filesystem paths like `/home/ubuntu` for temporary files results in slower performance. Temporary files created with Python modules like `tempfile` are stored in ephemeral disk space inside `/tmp`.
The GB of ephemeral disk space available to the function is set using the `ephemeral_disk` attribute. The minimum is `2.0` and the maximum is `50.0`.
The default is `2.0` GB, which is usually sufficient for functions that only call external APIs and do simple data processing.
If the function needs to temporarily store large files or datasets on disk, increase the `ephemeral_disk` attribute accordingly.
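A sketch that reserves extra scratch space and writes to it via `tempfile`, which lands in `/tmp` (import path assumed):

```python
import tempfile
from tensorlake.applications import function

@function(ephemeral_disk=20.0)  # GB, between 2.0 and 50.0
def transcode(data: bytes) -> int:
    # tempfile writes under /tmp, which is backed by the fast ephemeral SSD.
    with tempfile.NamedTemporaryFile(dir="/tmp") as scratch:
        scratch.write(data)
        scratch.flush()
        return scratch.tell()  # number of bytes written
```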
Streaming Progress
You can stream the progress of your requests for interactive use cases, to notify users about the progress of a request.