Durable Execution is in Technical Preview

This feature is currently in technical preview and under active development. Please contact us on Slack if you’d like to ask a question or try it out.
Agents call LLMs, scrape websites, query databases, and invoke external APIs. Any of these can fail: rate limits, timeouts, transient network errors, OOM kills. Without durability, a failure means restarting the entire agent from scratch, repeating every LLM call and API request.

Tensorlake checkpoints every @function() call automatically. When a request fails, you replay it and only the failed step re-executes. Everything before it is served from the checkpoint.

This page covers the recovery patterns. For automatic retries on transient failures (rate limits, validation errors), see Retries & Rate Limits. For long-running functions that need to extend their deadline as they make progress, see Timeouts. For try/except patterns and graceful degradation, see Error Handling.
LLM calls are unlike normal API calls. They are non-deterministic: the same prompt can produce a different response on every invocation. This makes re-execution dangerous, not just wasteful.

Consider a travel agent that plans a trip. On the first run, the LLM decides on flights to Whistler. The agent books the flights, then crashes while searching for hotels. Without durable execution, the agent restarts from scratch. This time the LLM decides on Japan instead. Now the user has unwanted Whistler flights and a completely different trip plan.

Making LLM calls durable solves three problems at once:
Consistency — Prior LLM decisions are preserved on replay. The agent resumes searching for Whistler hotels, not re-planning the entire trip.
Cost — LLM inference is expensive. Re-executing 14 successful tool-calling iterations because the 15th failed wastes tokens and money.
Rate limits — Agentic applications multiply downstream calls by an order of magnitude. Re-executing all of them increases the chance of hitting rate limits again.
On Tensorlake, every @function() call is automatically checkpointed. When a request is replayed, previously successful LLM calls return their recorded outputs — the model is not called again.
The most common agent pattern is a loop: the LLM decides which tool to call, the tool runs, the result feeds back into the LLM. Each iteration is an expensive operation, an LLM inference plus a tool execution.

Wrap each tool in its own @function() to make every tool call a checkpoint:
If the agent crashes on iteration 15, a replay skips the first 14 iterations entirely. The LLM calls, web searches, and document reads from those iterations are all served from checkpoints. The agent resumes from iteration 15 with the full conversation history intact.
When you process a batch of items in parallel using map, each item is an independent function call with its own checkpoint. If 3 out of 1,000 items fail, replay only re-processes those 3.
```python
from tensorlake.applications import application, function


@function(timeout=120)
def process_document(doc_url: str) -> dict:
    """Parse a single document. Each call is independently checkpointed."""
    content = fetch_and_parse(doc_url)
    extracted = extract_fields(content)
    return extracted


@function()
def aggregate_results(results: list[dict], acc: dict) -> dict:
    """Combine results as they arrive."""
    acc["documents"].append(results)
    return acc


@application()
@function()
def batch_processor(doc_urls: list[str]) -> dict:
    results = process_document.map(doc_urls)
    summary = results.reduce(aggregate_results, {"documents": []})
    return summary
```
This is the pattern behind durable data ingestion pipelines. Whether you’re processing SEC filings, insurance forms, or research papers, partial failures don’t lose the work already completed.
When a function sends an email, charges a credit card, or writes to an external database, you don’t want that action repeated on replay. Wrap the side effect in its own @function() — since the function’s output is checkpointed, replay skips it entirely.