# Scale-Out & Queuing

> Workflows scale automatically as endpoints are called, with configurable scaling per function

Workflows scale out automatically as their endpoints are called. When you invoke a workflow, Tensorlake starts containers for each function as needed, processes the request, and scales the containers back down once they are idle.

Each function in your workflow can have its own scaling configuration. You control scaling behavior with two parameters: `warm_containers` and `max_containers`.

**Example workflow with scaling:**

```python theme={null}
from tensorlake.applications import application, function

@function(warm_containers=2, max_containers=10)
def enrich_data(record_id: str) -> dict:
    # 2 containers always warm, scales up to 10
    ...

@function()
def transform(data: dict) -> dict:
    # Transform the enriched data
    ...

@application()
@function()
def process_workflow(record_id: str) -> dict:
    enriched = enrich_data.future(record_id)
    return transform.future(enriched)
```

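When you call `POST /applications/process_workflow`, the workflow endpoint scales automatically and each function scales according to its own configuration: `enrich_data` keeps 2 containers warm and grows to at most 10, while `transform` and `process_workflow` use the default behavior.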

## Scaling Parameters

Configure scaling in the `@function()` decorator:

```python theme={null}
from tensorlake.applications import function

@function(
    warm_containers=2,
    max_containers=10
)
def process_data(data: str) -> str:
    ...
```

### `warm_containers`

Number of pre-warmed containers to keep ready. Warm containers already have your code and dependencies loaded, so the requests they serve skip cold-start latency.

```python theme={null}
@function(warm_containers=3)
def classify_document(content: str) -> str:
    """3 containers are always warm and ready to handle requests."""
    # Critical first step in workflow - needs low latency
    ...
```

Use warm containers when:

* You need low-latency responses
* Cold starts are unacceptable for your use case
* You have predictable baseline traffic
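
For predictable baseline traffic, one rough way to pick a starting value is Little's law: average concurrency equals arrival rate times average request duration. The sketch below is a rule of thumb rather than Tensorlake guidance. The helper name `estimate_warm_containers` and the `headroom` factor are hypothetical, and it assumes each container handles one request at a time.

```python
import math


def estimate_warm_containers(requests_per_second: float,
                             avg_duration_seconds: float,
                             headroom: float = 1.2) -> int:
    """Estimate baseline concurrency with Little's law (L = lambda * W).

    Assumption: one request per container at a time. `headroom` pads the
    estimate so small traffic bumps still land on warm containers.
    """
    if requests_per_second < 0 or avg_duration_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    average_concurrency = requests_per_second * avg_duration_seconds
    return math.ceil(average_concurrency * headroom)


# Example: about 4 requests per second, each taking about 0.5 seconds.
suggested = estimate_warm_containers(4, 0.5)
print(suggested)  # prints 3
```

Treat the result as a starting point and adjust it against observed latency and cost.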

### `max_containers`

Maximum number of containers that can run at the same time for the function. Once this limit is reached, additional requests are queued automatically and processed in FIFO order as containers become available.

```python theme={null}
@function(max_containers=5)
def bounded_processing(data: str) -> str:
    """No more than 5 containers will run simultaneously."""
    ...
```

## Automatic Queuing

When all containers for a function are busy and `max_containers` has been reached, Tensorlake automatically queues incoming requests. No configuration is needed because queuing is built into the platform.

* Requests are processed in **FIFO order**
* Queued requests begin processing as soon as a container becomes available
* No separate queue infrastructure (Redis, SQS, RabbitMQ) is required

```python theme={null}
@function(max_containers=3)
def process_with_llm(data: str) -> str:
    """At most 3 concurrent LLM calls. Additional requests are queued."""
    # Expensive workflow step - limit concurrency to control costs
    ...
```
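
To see how a container cap turns a burst of requests into a FIFO queue, here is a small local simulation. It does not call Tensorlake and is not the platform's scheduler. `simulate_fifo_queue` and its arguments are hypothetical names, and the model assumes one request per container at a time with a fixed service time.

```python
import heapq


def simulate_fifo_queue(arrival_times, service_seconds, max_containers):
    """Return (start, finish) times for each request, in arrival order.

    Hypothetical local model: one request per container at a time,
    fixed service time, requests served strictly in arrival (FIFO) order.
    """
    if max_containers < 1:
        raise ValueError("max_containers must be at least 1")
    # Min-heap of the times at which each container next becomes free.
    free_at = [0.0] * max_containers
    schedule = []
    for arrival in sorted(arrival_times):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)  # queued until a container frees up
        finish = start + service_seconds
        heapq.heappush(free_at, finish)
        schedule.append((start, finish))
    return schedule


# Five requests arrive together; only three containers may run at once.
schedule = simulate_fifo_queue([0.0] * 5, service_seconds=2.0, max_containers=3)
for index, (start, finish) in enumerate(schedule):
    print(f"request {index}: starts at {start:.1f}s, finishes at {finish:.1f}s")
```

In this run the first three requests start immediately, and the remaining two start at 2.0 seconds, when containers free up.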

## Combined Behaviors

Combine both parameters for fine-grained control over latency and cost:

### Low-latency with bounded scale

```python theme={null}
@function(
    warm_containers=2,   # 2 containers ready for instant response
    max_containers=10    # Scale up to 10, then queue
)
def extract_entities(document: str) -> dict:
    # First step in document workflow - needs low latency
    ...
```

### High-throughput with bounded cost

```python theme={null}
@function(
    warm_containers=4,   # 4 containers pre-warmed
    max_containers=50    # Scale up to 50, then queue
)
def enrich_from_api(record_id: str) -> dict:
    # High-volume workflow step with bounded scale
    ...
```

## Scaling in Workflows

Each function in your workflow scales independently. This allows different workflow steps to have different scaling profiles based on their resource requirements and latency needs:

```python theme={null}
@function(warm_containers=5, max_containers=50)
def fetch_data(record_id: str) -> dict:
    """High-throughput data fetching with low latency."""
    ...

@function(max_containers=3)
def analyze_with_llm(data: dict) -> dict:
    """Expensive LLM analysis, bounded concurrency to control costs."""
    ...

@application()
@function()
def process_record(record_id: str) -> dict:
    # fetch_data can handle 50 concurrent requests
    data = fetch_data.future(record_id)
    # analyze_with_llm is limited to 3 concurrent executions
    return analyze_with_llm.future(data)
```

In this workflow, `fetch_data` can scale to 50 containers for high throughput, while `analyze_with_llm` is capped at 3 to control costs. When you call the `process_record` endpoint, both functions scale independently based on their configuration.
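
Because each step scales on its own, the step with the lowest peak throughput limits the whole workflow. The back-of-the-envelope sketch below illustrates this with assumed per-request durations (0.5 seconds for `fetch_data`, 6 seconds for `analyze_with_llm`). Those numbers and the helper `stage_throughput` are illustrative, not measured values or part of the SDK, and the model assumes one request per container at a time.

```python
def stage_throughput(max_containers: int, seconds_per_request: float) -> float:
    """Peak requests per second for one step, assuming one request per container."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return max_containers / seconds_per_request


# Durations are assumptions for illustration; measure your own functions.
stages = {
    "fetch_data": stage_throughput(max_containers=50, seconds_per_request=0.5),
    "analyze_with_llm": stage_throughput(max_containers=3, seconds_per_request=6.0),
}

# The slowest step bounds how fast the workflow can complete requests.
bottleneck = min(stages, key=stages.get)
print(stages)
print(f"bottleneck: {bottleneck} at {stages[bottleneck]} requests/second")
```

Under these assumptions `analyze_with_llm` is the bottleneck, so work that `fetch_data` finishes faster than that rate waits in the queue for `analyze_with_llm`.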

## Default Behavior

Without any scaling parameters, workflow functions scale dynamically:

* Containers scale from zero based on demand when the workflow endpoint is called
* There is no upper bound on container count
* Cold starts occur for the first request after an idle period
* Requests are never queued, because there is no container limit to reach
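
The interaction between the two parameters and their defaults can be summarized with a small model. This is a sketch of the behavior described on this page, not Tensorlake code. `plan_burst` and `BurstPlan` are hypothetical names, and the model assumes all requests arrive at once, warm containers are idle, and each container handles one request at a time.

```python
from typing import NamedTuple, Optional


class BurstPlan(NamedTuple):
    served_warm: int   # requests that land on pre-warmed containers
    cold_started: int  # requests that need a newly started container
    queued: int        # requests that wait for a container to free up


def plan_burst(concurrent_requests: int,
               warm_containers: int = 0,
               max_containers: Optional[int] = None) -> BurstPlan:
    """Split a burst into warm, cold-started, and queued requests.

    Defaults mirror the default behavior: no warm containers and no upper
    bound (max_containers=None), so nothing is ever queued.
    """
    if concurrent_requests < 0 or warm_containers < 0:
        raise ValueError("counts must be non-negative")
    if max_containers is None:
        running = concurrent_requests
    else:
        running = min(concurrent_requests, max_containers)
    served_warm = min(running, warm_containers)
    return BurstPlan(served_warm=served_warm,
                     cold_started=running - served_warm,
                     queued=concurrent_requests - running)


print(plan_burst(25, warm_containers=2, max_containers=10))
print(plan_burst(25))  # defaults: every request cold-starts, none queue
```

With `warm_containers=2` and `max_containers=10`, a burst of 25 requests splits into 2 served warm, 8 cold-started, and 15 queued. With the defaults, all 25 cold-start and none are queued.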

## Learn More

<CardGroup cols={2}>
  <Card title="SDK Reference" icon="rectangle-code" href="/applications/concepts">
    Full @function() decorator reference.
  </Card>

  <Card title="Agentic Patterns" icon="robot" href="/applications/overview">
    Structuring agents for scale.
  </Card>

  <Card title="Building Workflows" icon="diagram-project" href="/applications/building-workflows">
    Multi-step data workflows.
  </Card>
</CardGroup>
