# Autoscaling

> Autoscaling guide for Orchestration endpoints

Tensorlake scales your `@function()` sandboxes automatically.

In most cases, you do not need to configure anything. Start with defaults, then tune only if you have a specific latency or cost goal.

## Default Behavior

With just `@function()`, Tensorlake does this automatically:

* Creates containers when requests arrive
* Scales to zero when idle
* Adds more containers as traffic grows

```python theme={null}
from tensorlake.applications import function

@function()
def agent(prompt: str) -> str:
    ...
```

This is the simplest and most cost-efficient setup for many async and internal workloads.

## Scaling Settings

Use these only when default on-demand scaling is not enough:

| Setting           | What it controls      | What happens                                                    |
| ----------------- | --------------------- | --------------------------------------------------------------- |
| `warm_containers` | Ready-to-serve buffer | Keeps extra pre-started containers ready so bursts start faster |
| `max_containers`  | Capacity ceiling      | Caps total containers so scale and cost stay bounded            |

How they work together:

* `warm_containers` adds ready capacity above current demand.
* `max_containers` limits the final upper bound.
* If demand exceeds `max_containers`, additional requests wait in the queue until a container is free.
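
The interaction between the two settings can be pictured with a toy model. The sketch below is not Tensorlake's scheduler: `plan_capacity` and `CapacityPlan` are hypothetical names, and it assumes each container serves one request at a time and that the warm buffer shrinks as the cap is approached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapacityPlan:
    busy: int    # containers currently serving a request
    warm: int    # idle containers kept ready for the next burst
    queued: int  # requests waiting for a free container


def plan_capacity(
    in_flight: int,
    warm_containers: int = 0,
    max_containers: Optional[int] = None,
) -> CapacityPlan:
    """Toy model (an assumption, not Tensorlake's algorithm)."""
    if max_containers is None:
        # No ceiling: every request gets a container, plus the warm buffer.
        return CapacityPlan(busy=in_flight, warm=warm_containers, queued=0)
    busy = min(in_flight, max_containers)
    # The warm buffer only fits in whatever room is left under the cap.
    warm = min(warm_containers, max_containers - busy)
    # Anything beyond the cap waits in the queue.
    queued = in_flight - busy
    return CapacityPlan(busy=busy, warm=warm, queued=queued)


print(plan_capacity(25, warm_containers=2, max_containers=20))
# CapacityPlan(busy=20, warm=0, queued=5)
```

In this model, 25 concurrent requests against a cap of 20 leave 5 requests queued and no room for the warm buffer.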

## Practical Examples

### 1) Reduce cold starts

If this is a user-facing endpoint and startup delay is noticeable:

```python theme={null}
from tensorlake.applications import function

@function(warm_containers=2)
def agent(prompt: str) -> str:
    ...
```

### 2) Cap spend or protect downstream APIs

If you need to bound scale:

```python theme={null}
from tensorlake.applications import function

@function(max_containers=10)
def agent(prompt: str) -> str:
    ...
```

When all 10 containers are busy, new requests wait in the queue until one becomes free.
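
One way to pick the cap when protecting a downstream API is to divide that API's concurrency limit by the number of concurrent calls each container can make. A minimal sketch of that arithmetic follows; `safe_max_containers` and both of its inputs are illustrative assumptions, not Tensorlake settings.

```python
def safe_max_containers(downstream_concurrency_limit: int, calls_per_container: int) -> int:
    """Largest container count that cannot exceed the downstream limit.

    Hypothetical helper: both arguments are numbers you measure or look up
    yourself; neither is something Tensorlake reports.
    """
    if calls_per_container < 1:
        raise ValueError("calls_per_container must be at least 1")
    if downstream_concurrency_limit < calls_per_container:
        raise ValueError("a single container would already exceed the downstream limit")
    # Integer division rounds down, so the total stays at or under the limit.
    return downstream_concurrency_limit // calls_per_container


# A downstream API that tolerates 40 concurrent calls, with each container
# making up to 4 calls at once, suggests max_containers=10.
print(safe_max_containers(40, 4))  # 10
```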

### 3) Balance low latency with bounded scale

If you want faster startup plus bounded scaling:

```python theme={null}
from tensorlake.applications import function

@function(warm_containers=2, max_containers=20)
def agent(prompt: str) -> str:
    ...
```

Result:

* 2 warm containers are ready for faster responses
* Scale is still capped at 20 containers

### 4) High throughput with a safety ceiling

If you expect sustained high traffic and still want a hard upper bound:

```python theme={null}
from tensorlake.applications import function

@function(
    warm_containers=4,
    max_containers=50,
)
def agent(prompt: str) -> str:
    ...
```

## How to Choose Values

Start with `@function()` and add knobs only for a specific goal:

* Lower first-request latency: set `warm_containers=1`, then increase gradually.
* Budget or downstream protection: set `max_containers` to a safe upper limit.
* Stable setup: add a small `warm_containers` buffer, then cap with `max_containers`.
* Keep changes incremental: update one knob, test, then adjust.
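
For a first guess at `max_containers`, Little's law (average concurrency equals arrival rate times average time per request) gives a rough starting point. The sketch below assumes one request per container at a time; `suggest_max_containers` and its `headroom` factor are hypothetical, and the result is an estimate to test against real traffic, not a recommendation from Tensorlake.

```python
import math


def suggest_max_containers(
    requests_per_second: float,
    avg_seconds_per_request: float,
    headroom: float = 1.5,
) -> int:
    """Estimate a container ceiling from expected load (illustrative only)."""
    if requests_per_second <= 0 or avg_seconds_per_request <= 0:
        raise ValueError("rate and duration must be positive")
    if headroom < 1:
        raise ValueError("headroom must be at least 1")
    # Little's law: average number of requests in flight at once.
    avg_concurrency = requests_per_second * avg_seconds_per_request
    # Pad for bursts, then round up to a whole container.
    return math.ceil(avg_concurrency * headroom)


# 5 requests/second at 3 seconds each is about 15 in flight on average;
# with 1.5x headroom that rounds up to 23.
print(suggest_max_containers(5, 3.0))  # 23
```

Treat the output as the first value to try, then adjust one setting at a time as described above.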

## Learn More

<CardGroup cols={2}>
  <Card title="Scale-Out & Queuing" icon="chart-line-up" href="/applications/scale-out-queuing">
    How queueing works when demand exceeds available capacity
  </Card>

  <Card title="Retries" icon="rotate" href="/applications/retries">
    Pattern for handling transient API failures safely
  </Card>
</CardGroup>