LLM providers return rate-limit errors, APIs time out, and web scrapes hit transient failures. Tensorlake handles retries at the platform level — each retry is durable, meaning any nested function calls that already succeeded are served from checkpoints instead of re-executing.
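
A minimal sketch of that behavior (function names are illustrative; assumes a nested call is a plain invocation of one @function from another):

```python
from tensorlake.applications import function

@function()
def fetch_source(url: str) -> str:
    # Succeeds once; its return value is checkpointed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

@function(retries=3)
def analyze_source(url: str) -> dict:
    # If the step below fails, the whole function retries, but
    # fetch_source's result is served from its checkpoint instead
    # of re-fetching the page.
    text = fetch_source(url)
    return {"url": url, "chars": len(text)}  # stand-in for a flaky step, e.g. an LLM call
```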

Configuring Retries

Set the retries parameter on any @function() to automatically retry on failure. This is especially useful for LLM calls that return structured output — if the LLM returns malformed data, Pydantic validation fails and Tensorlake retries the entire call:
```python
from pydantic import BaseModel
from tensorlake.applications import function

class ResearchFindings(BaseModel):
    summary: str
    sources: list[str]
    confidence: float

@function(retries=3)
def extract_findings(text: str) -> ResearchFindings:
    from openai import OpenAI

    # Disable client-level retries to avoid unpredictable behavior
    response = OpenAI(max_retries=0).chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract research findings as JSON."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    # If validation fails, Tensorlake retries the entire function
    return ResearchFindings.model_validate_json(response.choices[0].message.content)
```
How retries work:
  • Rate limit errors, timeouts, or exceptions trigger automatic retries
  • Validation failures (e.g., Pydantic ValidationError) also trigger retries
  • Tensorlake retries up to the configured retries count (3 in this example) with exponential backoff
  • Any nested function calls that already succeeded are served from checkpoints, not re-executed
Disable client-level retries (e.g., OpenAI’s max_retries=0) when using Tensorlake retries. Layering both creates unpredictable behavior and inflated retry counts.

Rate Limiting External APIs

When calling external APIs with rate limits, you can control the total number of concurrent calls using the formula:

Total concurrent calls = max_containers × concurrency

This allows you to respect API rate limits by capping the maximum number of parallel requests your function can make:
```python
from tensorlake.applications import function

@function(
    retries=3,
    max_containers=5,  # Maximum 5 containers
    concurrency=2,     # Each container handles 2 concurrent requests
)
def call_rate_limited_api(query: str) -> dict:
    # Total concurrent calls: 5 × 2 = 10 requests max
    import requests

    response = requests.get(
        "https://api.example.com/search",
        params={"q": query},  # let requests handle URL encoding
        timeout=10,
    )
    response.raise_for_status()  # raise on 4xx/5xx so Tensorlake retries kick in
    return response.json()
```
Use cases:
  • Respect API quotas — If an API allows 100 concurrent requests, set max_containers=50 and concurrency=2 (see the sketch after this list)
  • Control costs — Limit concurrent LLM calls to manage token spend
  • Prevent overload — Cap requests to internal services that can’t handle high concurrency
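A minimal sketch of the quota case above (the endpoint and function name are illustrative):

```python
from tensorlake.applications import function

@function(retries=3, max_containers=50, concurrency=2)
def query_search_api(query: str) -> dict:
    # 50 containers × 2 concurrent requests = at most 100 in flight,
    # matching the provider's 100-concurrent-request quota.
    import requests

    response = requests.get(
        "https://api.example.com/search", params={"q": query}, timeout=10
    )
    response.raise_for_status()
    return response.json()
```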
See Scale-Out & Queuing for more on max_containers and request queuing.