Workflows scale out automatically as their endpoints are called. When you invoke a workflow, Tensorlake spins up containers for each function as needed, processes the request, and scales back down when idle. Each function in your workflow can have its own scaling configuration. You control scaling behavior with three parameters: warm_containers, min_containers, and max_containers.

Example workflow with scaling:
from tensorlake.applications import application, function

@function(warm_containers=2, max_containers=10)
def enrich_data(record_id: str) -> dict:
    # 2 containers always warm, scales up to 10
    ...

@function()
def transform(enriched: dict) -> dict:
    # Default scaling - no parameters set
    ...

@application()
@function()
def process_workflow(record_id: str) -> dict:
    enriched = enrich_data.awaitable(record_id)
    return transform.awaitable(enriched)
When you call POST /applications/process_workflow, the workflow endpoint scales automatically, and each function scales based on its configuration.
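For reference, here is a minimal sketch of invoking that endpoint over HTTP with the requests library. The base URL, authentication header, and payload shape are illustrative assumptions; check your deployment's API reference for the exact invocation format.
import requests

# Assumed base URL, credential, and payload shape - adjust for your deployment
response = requests.post(
    "https://api.tensorlake.ai/applications/process_workflow",
    headers={"Authorization": "Bearer <your-api-key>"},  # placeholder credential
    json={"record_id": "rec_123"},                       # example function argument
)
print(response.json())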

Scaling Parameters

Configure scaling in the @function() decorator:
from tensorlake.applications import function

@function(
    warm_containers=2,
    min_containers=1,
    max_containers=10
)
def process_data(data: str) -> str:
    ...

warm_containers

Number of pre-warmed containers to keep ready. Warm containers have your code and dependencies loaded, eliminating cold start latency for incoming requests.
@function(warm_containers=3)
def classify_document(content: str) -> str:
    """3 containers are always warm and ready to handle requests."""
    # Critical first step in workflow - needs low latency
    ...
Use warm containers when:
  • You need low-latency responses
  • Cold starts are unacceptable for your use case
  • You have predictable baseline traffic

min_containers

Guaranteed minimum number of containers. Unlike warm_containers, these containers may also be actively processing requests. Tensorlake will never scale below this number.
@function(min_containers=2)
def always_available(data: str) -> str:
    """At least 2 containers are always running."""
    ...

max_containers

Maximum number of containers. Once this limit is reached, additional requests are automatically queued and processed in FIFO order as containers become available.
@function(max_containers=5)
def bounded_processing(data: str) -> str:
    """No more than 5 containers will run simultaneously."""
    ...

Automatic Queuing

When all containers for a function are busy and max_containers has been reached, Tensorlake automatically queues incoming requests. No configuration is needed — queuing is built into the platform.
  • Requests are processed in FIFO order
  • Queued requests begin processing as soon as a container becomes available
  • No separate queue infrastructure (Redis, SQS, RabbitMQ) is required
@function(max_containers=3)
def process_with_llm(data: str) -> str:
    """At most 3 concurrent LLM calls. Additional requests are queued."""
    # Expensive workflow step - limit concurrency to control costs
    ...

Combined Behaviors

Combine parameters for fine-grained control:

Low-latency with bounded scale

@function(
    warm_containers=2,   # 2 containers ready for instant response
    max_containers=10    # Scale up to 10, then queue
)
def extract_entities(document: str) -> dict:
    # First step in document workflow - needs low latency
    ...

Guaranteed capacity with ceiling

@function(
    min_containers=3,    # Never fewer than 3
    max_containers=20    # Never more than 20
)
def transform_records(records: list[dict]) -> list[dict]:
    # ETL workflow step - guaranteed capacity with cost control
    ...

Full control

@function(
    min_containers=2,    # Always at least 2 running
    warm_containers=4,   # 4 containers pre-warmed (includes the 2 min)
    max_containers=50    # Scale up to 50, then queue
)
def enrich_from_api(record_id: str) -> dict:
    # High-volume workflow step with fine-grained control
    ...

Scaling in Workflows

Each function in your workflow scales independently. This allows different workflow steps to have different scaling profiles based on their resource requirements and latency needs:
@function(warm_containers=5, max_containers=50)
def fetch_data(record_id: str) -> dict:
    """High-throughput data fetching with low latency."""
    ...

@function(max_containers=3)
def analyze_with_llm(data: dict) -> dict:
    """Expensive LLM analysis, bounded concurrency to control costs."""
    ...

@application()
@function()
def process_record(record_id: str) -> dict:
    # fetch_data can handle 50 concurrent requests
    data = fetch_data.awaitable(record_id)
    # analyze_with_llm is limited to 3 concurrent executions
    return analyze_with_llm.awaitable(data)
In this workflow, fetch_data can scale to 50 containers for high throughput, while analyze_with_llm is capped at 3 to control costs. When you call the process_record endpoint, both functions scale independently based on their configuration.

Default Behavior

Without any scaling parameters, workflow functions scale dynamically:
  • Containers scale from zero based on demand when the workflow endpoint is called
  • There is no upper bound on container count
  • Cold starts occur for the first request after an idle period
  • No automatic queuing (unlimited scaling)
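As a minimal sketch, a function declared without scaling parameters uses these defaults:
from tensorlake.applications import function

@function()  # no scaling parameters
def parse_record(raw: str) -> dict:
    # Scales from zero on demand with no upper bound and no queuing.
    # The first request after an idle period incurs a cold start.
    ...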

Learn More