Functions

The `@tensorlake_function` decorator allows you to specify the following attributes:

- `image` - The image to use for the function container. A basic Debian-based image by default. See Images.
- `input_encoder` - The serializer to use for the input of the function. `json` by default. See Input and output serialization.
- `output_encoder` - The serializer to use for the output of the function. `json` by default. See Input and output serialization.
- `secrets` - The secrets available to the function in its environment variables. No secrets by default. See Secrets.
- `next` - Functions called with the outputs of this function. This allows chaining functions together into a workflow graph. See Dynamic routing.
- `name` - The name of the function in the workflow. By default, it is the name of the Python function.
- `description` - A description of the function in the workflow. Visible when viewing workflow details.
- `retries` - The retry policy for the function. By default, a failed function is not retried. See Retries.
- `timeout` - The timeout for the function in seconds. The default is 5 minutes. See Timeouts.
- `use_ctx` - If `True`, the request context is passed to the function. `False` by default. See Request Context.
- `accumulate` - If not `None`, turns the function into a reducer. `None` by default. See Map-Reduce.
- `cacheable` - If `True`, reusing previous function outputs is allowed. `False` by default. See Caching.
- `cpu` - The number of CPUs available to the function. The default is `1.0` CPU. See CPU.
- `memory` - The memory in GB available to the function. The default is `1.0` GB. See Memory.
- `ephemeral_disk` - The ephemeral `/tmp` disk space available to the function in GB. The default is `2.0` GB. See Ephemeral Disk.
- `gpu` - The GPU model and count available to the function. The default is `None` (no GPU). Please contact support@tensorlake.ai to enable GPU support.
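As a sketch of how these attributes are passed, the snippet below uses a stand-in decorator that simply records its keyword arguments; the real decorator comes from the `tensorlake` package (the import path is not shown here), and the attribute values are illustrative:

```python
# Stand-in for the real @tensorlake_function decorator, so this sketch is
# self-contained and runnable; it only records the attributes it is given.
def tensorlake_function(**attrs):
    def wrap(fn):
        fn.attrs = attrs
        return fn
    return wrap

@tensorlake_function(
    name="extract_text",   # defaults to the Python function name
    timeout=600,           # seconds; default is 300 (5 minutes)
    cpu=2.0,               # default is 1.0
    memory=4.0,            # GB; default is 1.0
    cacheable=True,        # default is False
)
def extract_text(document: str) -> str:
    return document.upper()

print(extract_text.attrs["timeout"])  # 600
```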
Classes

Sometimes a function needs expensive initialization, like loading a large model into memory. You can define a function as a class that inherits from `TensorlakeCompute` and use its `__init__(self)` constructor to run initialization code once on function container startup.

Under the hood, every function defined with the `@tensorlake_function()` decorator gets converted into a `TensorlakeCompute` instance.
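A minimal sketch of the class form, using a stand-in base class so the example runs without the `tensorlake` package; the `run` method name for the function body is an assumption here:

```python
# Stand-in base class; the real TensorlakeCompute comes from the
# tensorlake package.
class TensorlakeCompute:
    name = None

class EmbedText(TensorlakeCompute):
    name = "embed_text"

    def __init__(self):
        # Expensive one-time initialization, e.g. loading a large model.
        # Runs once when the function container starts.
        self.model = {"loaded": True}  # placeholder for a real model

    def run(self, text: str) -> list:
        # Invoked per request; reuses the already-initialized self.model.
        assert self.model["loaded"]
        return [float(len(text))]

fn = EmbedText()
print(fn.run("hello"))  # [5.0]
```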
Input and Output Serialization

Inputs and outputs of functions are serialized and deserialized as JSON by default. This is a good default because workflows are exposed as HTTP endpoints, making it possible to call them from any programming language. You can also change the serialization format to `cloudpickle` if you want to pass complex Python objects between functions, such as Pandas dataframes, PyTorch tensors, PIL images, etc.

`cloudpickle` requires the objects to be serialized and deserialized on the same Python version, so the function containers involved must use the same Python version.

The `input_encoder` and `output_encoder` attributes can be used to change the serialization format. Currently supported formats are:

- `json` - JSON serialization
- `cloudpickle` - Cloudpickle serialization
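To see the trade-off locally, the sketch below contrasts JSON with the pickle family; `cloudpickle` extends the standard pickle protocol, and plain `pickle` is used here so the example needs no extra dependency:

```python
import json
import pickle  # cloudpickle builds on the same protocol

# JSON round-trips plain data and is language-agnostic:
payload = {"pages": 3, "text": "hello"}
assert json.loads(json.dumps(payload)) == payload

# But JSON cannot represent arbitrary Python objects:
class Chunk:
    def __init__(self, text):
        self.text = text

try:
    json.dumps(Chunk("hi"))
except TypeError:
    print("JSON cannot serialize a Chunk instance")

# Pickle-family serializers can, at the cost of tying both ends
# to the same Python version:
blob = pickle.dumps(Chunk("hi"))
assert pickle.loads(blob).text == "hi"
```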
Timeouts

When a function runs longer than its timeout, it is terminated and marked as failed. The timeout in seconds is set using the `timeout` attribute.

The default timeout is `300` seconds (5 minutes). The minimum is `1` second and the maximum is `172800` seconds (48 hours). A function can send progress updates to extend its timeout. See Request Context.
Retries

When a function fails by raising an exception or timing out, it is retried according to its retry policy. The default retry policy is to not retry the function. You can specify a custom retry policy using the `retries` attribute.
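The sketch below is a plain-Python illustration of what a retry policy does; the actual policy object configured via `retries` is Tensorlake's, not this loop:

```python
import time

def run_with_retries(fn, max_retries=3, delay=0.01):
    """Call fn, retrying up to max_retries times on any exception."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(delay)  # a real policy would typically back off

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky))  # "ok" on the third attempt
```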
Request Context
Ifuse_ctx
function attribute is True
then the function gets a request context as its first parameter with name ctx
.
The context has information about the current request and provides access to Tensorlake APIs for the current request.
By default, the request context is not passed to the function.
ctx.update_progress
after 2 minutes of execution,
then the timeout is reset to 4 minutes from that point, allowing the function to run for another 4 minutes.
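Below is a sketch of the shape of a context-aware function, using a hypothetical stand-in context; the exact signature of `update_progress` is an assumption, and the real `ctx` object is supplied by Tensorlake when `use_ctx=True`:

```python
# Hypothetical stand-in for the request context Tensorlake passes in.
class RequestContext:
    def __init__(self, request_id):
        self.request_id = request_id
        self.progress = []

    def update_progress(self, current, total):
        # In Tensorlake, progress updates also extend the function timeout.
        self.progress.append((current, total))

def process_pages(ctx, pages):
    # ctx must be the first parameter when use_ctx=True.
    results = []
    for i, page in enumerate(pages, start=1):
        results.append(page.upper())
        ctx.update_progress(i, len(pages))
    return results

ctx = RequestContext("req-123")
print(process_pages(ctx, ["a", "b"]))  # ['A', 'B']
```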
Caching

If the `cacheable` function attribute is `True`, then Tensorlake assumes that the function returns the same outputs for the same inputs. This allows Tensorlake to cache the outputs of the function and reuse them when the function is called with the same inputs again. When cached outputs are used, the function is not executed, which speeds up requests and makes them cheaper to run. The size of the cache and the caching duration are controlled by the Tensorlake Platform.
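A local analogy for this behavior is `functools.lru_cache`: repeated calls with the same input reuse the stored output instead of re-running the function body:

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=None)
def normalize(text: str) -> str:
    calls["n"] += 1  # counts real executions only
    return text.strip().lower()

normalize(" Hello ")
normalize(" Hello ")  # cache hit: the function body does not run again
print(calls["n"])  # 1
```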
CPU

The number of CPUs available to the function is set using the `cpu` attribute. The minimum is `1.0` and the maximum is `8.0`. The default is `1.0`, which is usually sufficient for functions that only call external APIs and do simple data processing. Adding more CPUs is recommended for functions that do complex data processing or work with large datasets. If a function consumes or produces multi-gigabyte inputs or outputs, then at least 3 CPUs are recommended; this results in the fastest download and upload speeds for the data.
Memory
GB memory available to the function is set using thememory
attribute. Minimum is 1.0
, maximum is 32.0
.
The default is 1.0
. This is usually sufficient for functions that only call external APIs and do simple data processing.
Adding more memory is recommended for functions that do complex data processing or work with large datasets.
It’s recommended to set memory
to at least 2x the size of the largest inputs and outputs of the function.
This is because when the inputs/outputs are deserialized/serialized both serialized and deserialized representations are
kept in memory.
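The 2x rule above can be applied mechanically; the payload size here is a made-up example:

```python
# Hypothetical largest input/output of the function, in GB.
largest_payload_gb = 3.5

# At least 2x, because serialized and deserialized forms coexist in
# memory, clamped to the allowed range of 1.0-32.0 GB.
memory_gb = min(max(2 * largest_payload_gb, 1.0), 32.0)
print(memory_gb)  # 7.0
```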
Ephemeral disk

Ephemeral disk space is temporary storage available to functions at the `/tmp` path. It gets erased when the function container terminates, so it's best suited to temporary files that are not needed after the function execution completes. Ephemeral disks are backed by fast SSD drives; using other filesystem paths like `/home/ubuntu` for temporary files will result in slower performance. Temporary files created using Python modules like `tempfile` are stored in ephemeral disk space under `/tmp`.

The GB of ephemeral disk space available to the function is set using the `ephemeral_disk` attribute. The minimum is `2.0` and the maximum is `50.0`. The default is `2.0` GB, which is usually sufficient for functions that only call external APIs and do simple data processing. If the function needs to temporarily store large files or datasets on disk, increase the `ephemeral_disk` attribute accordingly.
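For example, Python's `tempfile` module places files under the ephemeral disk area (normally `/tmp` inside the function container):

```python
import os
import tempfile

# Write an intermediate result to ephemeral disk space.
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
    f.write("intermediate result")
    path = f.name

# Read it back within the same function execution.
with open(path) as f:
    assert f.read() == "intermediate result"

# Cleanup is optional: the whole area is erased when the container stops.
os.remove(path)
```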
Graphs

You can string together multiple functions to form a workflow. `my_function` is the start node of the workflow, and the input to the workflow is passed to `my_function`. Workflows are exposed as HTTP endpoints; the body of the request is passed to the start node of the workflow, in this case `my_function`.
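A self-contained sketch of chaining with `next`, again using a stand-in decorator in place of the real one:

```python
# Stand-in decorator that records the `next` functions, so the sketch
# runs without the tensorlake package.
def tensorlake_function(next=None):
    def wrap(fn):
        fn.next = next or []
        return fn
    return wrap

def summarize(text: str) -> str:
    return text[:10]

@tensorlake_function(next=[summarize])
def my_function(document: str) -> str:
    # Start node: receives the HTTP request body as its input.
    return document.strip()

# A minimal local walk of the chain, for illustration only:
out = my_function(" hello world ")
for nxt in my_function.next:
    out = nxt(out)
print(out)  # "hello worl"
```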
Retrieving Output

Tensorlake workflows allow retrieving the outputs of any function in the workflow.

Streaming Progress

You can stream the progress of your requests for interactive use cases, to notify users about the progress of the request.

Default retries

You can set a default retry policy for all the functions in a workflow using the `retries` attribute of the workflow. Each function can override the default retry policy by setting its own `retries` attribute. See Retries.