Workflows scale out automatically as their endpoints are called. When you invoke a workflow, Tensorlake spins up containers for each function as needed, processes the request, and scales back down when idle. Each function in your workflow can have its own scaling configuration. You control scaling behavior with three parameters: warm_containers, min_containers, and max_containers.

Example workflow with scaling:
from tensorlake.applications import application, function

@function(warm_containers=2, max_containers=10)
def enrich_data(record_id: str) -> dict:
    # 2 containers always warm, scales up to 10
    ...

@function()
def transform(enriched: dict) -> dict:
    # Default scaling - no parameters set
    ...

@application()
@function()
def process_workflow(record_id: str) -> dict:
    enriched = enrich_data.awaitable(record_id)
    return transform.awaitable(enriched)
When you call POST /applications/process_workflow, the workflow endpoint scales automatically, and each function scales based on its configuration.
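For reference, here is a minimal sketch of invoking that endpoint over HTTP with the requests library. The base URL, authentication header, and payload shape are illustrative assumptions; check your deployment's API reference for the exact invocation format.
import requests

# Assumed base URL, credential, and payload shape - adjust for your deployment
response = requests.post(
    "https://api.tensorlake.ai/applications/process_workflow",
    headers={"Authorization": "Bearer <your-api-key>"},  # placeholder credential
    json={"record_id": "rec_123"},                       # example function argument
)
print(response.json())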

Scaling Parameters

Configure scaling in the @function() decorator:
from tensorlake.applications import function

@function(
    warm_containers=2,
    min_containers=1,
    max_containers=10
)
def process_data(data: str) -> str:
    ...

warm_containers

Number of pre-warmed containers to keep ready. Warm containers have your code and dependencies loaded, eliminating cold start latency for incoming requests.
@function(warm_containers=3)
def classify_document(content: str) -> str:
    """3 containers are always warm and ready to handle requests."""
    # Critical first step in workflow - needs low latency
    ...
Use warm containers when:
  • You need low-latency responses
  • Cold starts are unacceptable for your use case
  • You have predictable baseline traffic

min_containers

Guaranteed minimum number of containers. Unlike warm_containers, these containers may also be actively processing requests. Tensorlake will never scale below this number.
@function(min_containers=2)
def always_available(data: str) -> str:
    """At least 2 containers are always running."""
    ...

max_containers

Maximum number of containers. Once this limit is reached, additional requests are automatically queued and processed in FIFO order as containers become available.
@function(max_containers=5)
def bounded_processing(data: str) -> str:
    """No more than 5 containers will run simultaneously."""
    ...

Automatic Queuing

When all containers for a function are busy and max_containers has been reached, Tensorlake automatically queues incoming requests. No configuration is needed — queuing is built into the platform.
  • Requests are processed in FIFO order
  • Queued requests begin processing as soon as a container becomes available
  • No separate queue infrastructure (Redis, SQS, RabbitMQ) is required
@function(max_containers=3)
def process_with_llm(data: str) -> str:
    """At most 3 concurrent LLM calls. Additional requests are queued."""
    # Expensive workflow step - limit concurrency to control costs
    ...

Combined Behaviors

Combine parameters for fine-grained control:

Low-latency with bounded scale

@function(
    warm_containers=2,   # 2 containers ready for instant response
    max_containers=10    # Scale up to 10, then queue
)
def extract_entities(document: str) -> dict:
    # First step in document workflow - needs low latency
    ...

Guaranteed capacity with ceiling

@function(
    min_containers=3,    # Never fewer than 3
    max_containers=20    # Never more than 20
)
def transform_records(records: list[dict]) -> list[dict]:
    # ETL workflow step - guaranteed capacity with cost control
    ...

Full control

@function(
    min_containers=2,    # Always at least 2 running
    warm_containers=4,   # 4 containers pre-warmed (includes the 2 min)
    max_containers=50    # Scale up to 50, then queue
)
def enrich_from_api(record_id: str) -> dict:
    # High-volume workflow step with fine-grained control
    ...

Scaling in Workflows

Each function in your workflow scales independently. This allows different workflow steps to have different scaling profiles based on their resource requirements and latency needs:
@function(warm_containers=5, max_containers=50)
def fetch_data(record_id: str) -> dict:
    """High-throughput data fetching with low latency."""
    ...

@function(max_containers=3)
def analyze_with_llm(data: dict) -> dict:
    """Expensive LLM analysis, bounded concurrency to control costs."""
    ...

@application()
@function()
def process_record(record_id: str) -> dict:
    # fetch_data can handle 50 concurrent requests
    data = fetch_data.awaitable(record_id)
    # analyze_with_llm is limited to 3 concurrent executions
    return analyze_with_llm.awaitable(data)
In this workflow, fetch_data can scale to 50 containers for high throughput, while analyze_with_llm is capped at 3 to control costs. When you call the process_record endpoint, both functions scale independently based on their configuration.

Default Behavior

Without any scaling parameters, workflow functions scale dynamically:
  • Containers scale from zero based on demand when the workflow endpoint is called
  • There is no upper bound on container count
  • Cold starts occur for the first request after an idle period
  • No automatic queuing (unlimited scaling)
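As a minimal sketch, a function declared without scaling parameters uses these defaults:
from tensorlake.applications import function

@function()  # no scaling parameters
def parse_record(raw: str) -> dict:
    # Scales from zero on demand with no upper bound and no queuing.
    # The first request after an idle period incurs a cold start.
    ...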

Learn More