Run parallel data analysis, model training, and benchmarking tasks in secure, isolated sandbox environments. Each sandbox can have its own dependencies and resource limits, allowing you to compare different models or process large datasets concurrently. This example demonstrates how to benchmark several scikit-learn classification models in parallel by running each in its own sandbox.

Example: Parallel Model Benchmarking

The following script benchmarks five different scikit-learn models on the Iris dataset. Each model is trained and evaluated in a separate, concurrent sandbox.
import asyncio
import json

from dotenv import load_dotenv
load_dotenv()

from tensorlake.sandbox import SandboxClient


async def run_model_benchmark(model_name, sklearn_path):
    """
    Runs a model benchmark inside an isolated sandbox.
    Returns a dict with model name and accuracy.
    """
    module_path, class_name = sklearn_path.rsplit('.', 1)

    code = f"""
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from {module_path} import {class_name}
import json

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3)

model = {class_name}()
model.fit(X_train, y_train)

score = model.score(X_test, y_test)
print(json.dumps({{"model": "{model_name}", "accuracy": score}}))
"""

    def _sync_benchmark():
        client = SandboxClient()
        with client.create_and_connect() as sandbox:
            print(f"🚀 Sandbox started for {model_name}...")
            # install scikit-learn and its dependencies in the sandbox
            sandbox.run("pip", ["install", "--user", "--break-system-packages", "numpy", "scikit-learn"])
            # run the code in the sandbox
            result = sandbox.run("python", ["-c", code])

            output_data = json.loads(result.stdout.strip())

            return output_data

    return await asyncio.to_thread(_sync_benchmark)

async def main():
    models_to_test: dict[str, str] = {
        "Random Forest": "sklearn.ensemble.RandomForestClassifier",
        "SVM": "sklearn.svm.SVC",
        "Logistic Regression": "sklearn.linear_model.LogisticRegression",
        "Decision Tree": "sklearn.tree.DecisionTreeClassifier",
        "KNN": "sklearn.neighbors.KNeighborsClassifier",
    }

    tasks = [run_model_benchmark(name, path) for name, path in models_to_test.items()]
    print("Gathering results from all sandboxes...\n")
    results = await asyncio.gather(*tasks)

    print("--- Benchmark Results ---")
    for r in results:
        print(f"{r['model']:<20}: {r['accuracy']:.4f}")

if __name__ == "__main__":
    asyncio.run(main())

How It Works

The script orchestrates the parallel execution of model benchmarks using Python’s asyncio library.
1. Parallel Execution: The main function defines a dictionary of models to test and builds the list of asynchronous tasks with a list comprehension. asyncio.gather then runs all of these tasks concurrently.
2. Sandbox Task: The run_model_benchmark function is responsible for a single benchmark. For each model, it:
  • Creates a new, isolated sandbox.
  • Installs the necessary Python libraries (numpy and scikit-learn) inside the sandbox using sandbox.run(). The --break-system-packages flag is used to comply with PEP 668 in newer Python environments.
  • Executes a Python script that trains the model on the Iris dataset and calculates its accuracy.
  • Prints the results as a JSON string to standard output.
  • Captures the stdout, parses the JSON, and returns the result.
3. Aggregate Results: Once all sandboxes have completed their tasks, asyncio.gather returns a list of all the results, which are then printed to the console.
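The fan-out/gather pattern behind these steps can be sketched on its own, without the sandbox API. Here asyncio.to_thread moves each blocking call off the event loop so asyncio.gather can overlap them; the worker function, its sleep, and the accuracy value are stand-ins, not real sandbox work:

```python
import asyncio
import time


def blocking_benchmark(name: str) -> dict:
    # Stand-in for the synchronous sandbox work (create, install, run).
    time.sleep(0.2)
    return {"model": name, "accuracy": 1.0}


async def main() -> list[dict]:
    names = ["Random Forest", "SVM", "KNN"]
    # Each blocking call runs in its own thread; gather overlaps them
    # and returns results in the same order as the input tasks.
    tasks = [asyncio.to_thread(blocking_benchmark, n) for n in names]
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```

Because the three 0.2-second calls overlap, the total wall time stays close to 0.2 seconds rather than the 0.6 seconds a sequential loop would take.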
This example uses the python-dotenv library to load your Tensorlake API key from a .env file. Create a file named .env in your project root and add your key:
TENSORLAKE_API_KEY="your-api-key-here"
The SandboxClient will automatically use this key.
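A missing key otherwise surfaces only later, as an authentication error inside the sandbox calls. A fail-fast check right after load_dotenv() makes the misconfiguration obvious up front; the helper below is just a suggestion, using the variable name shown above:

```python
import os


def require_api_key() -> str:
    # Fail fast with a clear message instead of a late auth error.
    key = os.environ.get("TENSORLAKE_API_KEY")
    if not key:
        raise RuntimeError("TENSORLAKE_API_KEY is not set; add it to your .env file")
    return key
```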

Pro Tips

Faster Execution with Snapshots

The example installs dependencies every time a sandbox is created. This is simple but inefficient for repeated runs. To significantly speed up your workflow, you can use Snapshots.
  1. Create a “base” sandbox and install all your dependencies.
  2. Create a snapshot of that sandbox.
  3. Start new sandboxes from the snapshot ID. The new sandboxes will have all the dependencies pre-installed, saving you valuable setup time.
Learn more in the Snapshots guide.
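As a rough sketch of that workflow: the snapshot() method and the snapshot_id keyword below are assumptions for illustration, not the actual SDK surface; check the Snapshots guide for the real calls.

```python
def build_base_snapshot(client):
    """Install dependencies once, then snapshot the sandbox.

    The snapshot() method name is hypothetical.
    """
    with client.create_and_connect() as sandbox:
        sandbox.run("pip", ["install", "--user", "--break-system-packages", "numpy", "scikit-learn"])
        return sandbox.snapshot()  # hypothetical: returns a reusable snapshot id


def run_from_snapshot(client, snapshot_id, code):
    """Start from the snapshot so dependencies are already installed.

    The snapshot_id keyword argument is hypothetical.
    """
    with client.create_and_connect(snapshot_id=snapshot_id) as sandbox:
        return sandbox.run("python", ["-c", code])
```

With this split, the pip install cost is paid once in build_base_snapshot rather than once per benchmark.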

Instant Sandboxes with Warm Pools

For the lowest possible latency, you can use Warm Pools to maintain a set of pre-warmed, ready-to-use sandboxes. Claiming a sandbox from a pool is nearly instantaneous. Learn more in the Pools guide.
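The idea behind a warm pool can be illustrated with a plain queue of pre-built resources. This is pure standard-library Python, not the Tensorlake API (see the Pools guide for the real calls): creation cost is paid up front, so claiming is just a cheap dequeue.

```python
import queue


class WarmPool:
    """Toy warm pool: resources are created ahead of time; claim() is a cheap dequeue."""

    def __init__(self, factory, size: int):
        self._q: queue.Queue = queue.Queue()
        for _ in range(size):
            self._q.put(factory())  # pay the creation cost up front

    def claim(self):
        # Near-instant: no creation happens at claim time.
        # Raises queue.Empty if the pool is exhausted.
        return self._q.get_nowait()

    def release(self, resource) -> None:
        self._q.put(resource)  # return the resource for reuse
```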
