How It Works
By default, agents scale from zero. The first request after an idle period experiences a “cold start” while the container loads your code and dependencies. Subsequent requests are served by warm containers until the agent goes idle again. When multiple requests arrive simultaneously, Tensorlake automatically creates more containers to handle the load in parallel. There's no upper limit by default: your agent scales to meet demand.

Configuration Options
You can tune scaling behavior with a few optional parameters on `@function()`:
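For orientation, here is a minimal sketch using only the defaults. The `@function()` decorator is taken from this page; the import path and the `answer` function are assumptions for illustration:

```python
from tensorlake import function  # import path assumed; adjust to your SDK version

# With no scaling parameters, the defaults apply: scale from zero,
# no upper limit, one request per container at a time.
@function()
def answer(question: str) -> str:
    # Placeholder agent logic for illustration.
    return f"answered: {question}"
```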
Keep Containers Warm
If cold starts are problematic for your use case, keep some containers pre-warmed:
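A minimal sketch, assuming `warm_containers` takes the number of containers to keep warm (the import path and function body are illustrative):

```python
from tensorlake import function  # import path assumed

# Keep 2 containers warm so the first request after an idle
# period skips the cold start.
@function(warm_containers=2)
def answer(question: str) -> str:
    return f"answered: {question}"  # placeholder agent logic
```

Warm containers keep your code and dependencies loaded, trading a small standing cost for consistent latency.

Limit Maximum Concurrency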
To control costs or respect API rate limits, cap the maximum number of concurrent executions:
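A sketch of a hard cap, assuming `max_containers` bounds the number of simultaneous containers (import path and function body are illustrative):

```python
from tensorlake import function  # import path assumed

# Run at most 10 containers concurrently, bounding cost and
# keeping downstream API usage within rate limits.
@function(max_containers=10)
def answer(question: str) -> str:
    return f"answered: {question}"  # placeholder agent logic
```

Guarantee Minimum Capacity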
Ensure a baseline level of capacity is always available:
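A sketch of a capacity floor, assuming `min_containers` sets how many containers stay up even with no traffic (import path and function body are illustrative):

```python
from tensorlake import function  # import path assumed

# Keep at least 1 container running at all times, so capacity
# is available the moment a request arrives.
@function(min_containers=1)
def answer(question: str) -> str:
    return f"answered: {question}"  # placeholder agent logic
```

Control Request Concurrency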
By default, each container handles one request at a time. If your agent is I/O-bound (waiting on API calls or database queries), you can increase per-container concurrency. The total number of concurrent requests is `max_containers` × `concurrency`.
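A sketch combining both knobs for an I/O-bound agent; the per-container semantics of `concurrency` follow the formula above, and everything else is illustrative:

```python
from tensorlake import function  # import path assumed

# Up to 10 containers, each handling 5 requests at a time:
# max_containers × concurrency = 50 concurrent requests.
@function(max_containers=10, concurrency=5)
def answer(question: str) -> str:
    # I/O-bound work (API calls, database queries) is where
    # per-container concurrency pays off; body is a placeholder.
    return f"answered: {question}"
```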
For example, `max_containers=10` and `concurrency=5` allows up to 50 concurrent requests.
Combining Options
You can combine parameters for fine-grained control:
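A combined sketch with illustrative values; the four parameters are from this page, while the import path and function body are assumptions:

```python
from tensorlake import function  # import path assumed

@function(
    min_containers=1,    # guaranteed baseline capacity
    warm_containers=2,   # pre-warmed containers to avoid cold starts
    max_containers=20,   # hard cap to control costs
    concurrency=5,       # requests handled per container
)
def answer(question: str) -> str:
    return f"answered: {question}"  # placeholder agent logic
```

With these values, up to 100 requests (20 × 5) can run concurrently, while at least one container is always available.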
When to Configure Scaling
Most agents work fine with the defaults. Consider configuring scaling when:
- Latency is critical: use `warm_containers` to eliminate cold starts
- You have cost constraints: use `max_containers` to cap spending
- External APIs have rate limits: use `max_containers` and `concurrency` to stay within limits
- You need guaranteed capacity: use `min_containers` to ensure availability