Tensorlake scales your @function() sandboxes automatically. In most cases, you do not need to configure anything. Start with defaults, then tune only if you have a specific latency or cost goal.

Default Behavior

With just @function(), Tensorlake does this automatically:
  • Creates containers when requests arrive
  • Scales to zero when idle
  • Adds more containers as traffic grows
from tensorlake.applications import function

@function()
def agent(prompt: str) -> str:
    ...
This is the simplest and most cost-efficient setup for many async and internal workloads.

Scaling Settings

Use these only when default on-demand scaling is not enough:
  • warm_containers (ready-to-serve buffer): keeps extra pre-started containers ready so bursts start faster
  • max_containers (capacity ceiling): caps total containers so scale and cost stay bounded
How they work together:
  • warm_containers adds ready capacity above current demand.
  • max_containers limits the final upper bound.
  • If demand exceeds max_containers, requests wait in queue.
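The interaction above can be sketched with a bounded-concurrency analogy. This is a local Python model, not Tensorlake's actual scheduler: a semaphore plays the role of max_containers, and requests beyond the cap block until a slot frees, mirroring how queued requests wait.

```python
import threading
import time

# Local analogy only: a semaphore stands in for max_containers.
MAX_CONTAINERS = 3
capacity = threading.BoundedSemaphore(MAX_CONTAINERS)

active = 0   # requests currently "in a container"
peak = 0     # highest concurrency observed
lock = threading.Lock()
results = []

def handle(request_id: int) -> None:
    global active, peak
    with capacity:  # blocks (waits "in queue") if all slots are busy
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulated work
        with lock:
            active -= 1
        results.append(request_id)

threads = [threading.Thread(target=handle, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak, len(results))  # peak never exceeds MAX_CONTAINERS; all 10 requests complete
```

No request is dropped: demand above the cap simply waits, which is the same trade-off max_containers makes between bounded cost and added latency.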

Practical Examples

1) Reduce cold starts

If this is a user-facing endpoint and startup delay is noticeable:
@function(warm_containers=2)
def agent(prompt: str) -> str:
    ...

2) Cap spend or protect downstream APIs

If you need to bound scale:
@function(max_containers=10)
def agent(prompt: str) -> str:
    ...
When all 10 are busy, new requests wait in queue.
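To reason about how long queued requests might wait, a back-of-envelope estimate helps. The numbers below are illustrative assumptions, not measured Tensorlake behavior:

```python
# Rough queue-wait estimate (all numbers are illustrative assumptions):
max_containers = 10       # capacity cap from the decorator
avg_request_seconds = 2.0 # assumed average time one request takes
queued_requests = 25      # assumed backlog when all containers are busy

# Each "drain cycle" of avg_request_seconds completes max_containers requests,
# so the last queued request waits roughly this long:
est_wait = (queued_requests / max_containers) * avg_request_seconds
print(est_wait)  # 5.0
```

If an estimate like this exceeds your latency budget, raise max_containers; if it is acceptable, the cap is doing its job of bounding spend.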

3) Balance low latency with bounded scale

If you want faster startup plus bounded scaling:
@function(warm_containers=2, max_containers=20)
def agent(prompt: str) -> str:
    ...
Result:
  • 2 warm containers are ready for faster responses
  • Scale is still capped at 20 containers

4) High-throughput with a safety ceiling

@function(
    warm_containers=4,
    max_containers=50,
)
def agent(prompt: str) -> str:
    ...

How to Choose Values

Start with @function() and add knobs only for a specific goal:
  • Lower first-request latency: set warm_containers=1, then increase gradually.
  • Budget or downstream protection: set max_containers to a safe upper limit.
  • Stable setup: add a small warm_containers buffer, then cap with max_containers.
  • Keep changes incremental: update one knob, test, then adjust.
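For the budget/downstream-protection case, a simple sizing calculation can anchor the choice of max_containers. The figures here are hypothetical stand-ins; substitute your own downstream limit and measured per-container throughput:

```python
# Illustrative sizing arithmetic (numbers are assumptions, not defaults):
downstream_limit_rps = 100  # e.g., a downstream API allows 100 requests/second
per_container_rps = 12      # observed throughput of a single container

# Cap containers so aggregate throughput stays under the downstream limit.
max_containers = downstream_limit_rps // per_container_rps
print(max_containers)  # 8
```

Floor division keeps the aggregate safely below the limit; rounding up instead would risk tripping downstream rate limiting at full scale.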

Learn More

Scale-Out & Queuing

How queueing works when demand exceeds available capacity

Rate Limits

Pattern for handling transient API failures safely