Run parallel data analysis, model training, and benchmarking tasks in secure, isolated sandbox environments. Each sandbox can have its own dependencies and resource limits, allowing you to compare different models or process large datasets concurrently. This example demonstrates how to benchmark several scikit-learn classification models in parallel by running each in its own sandbox.
The following script benchmarks five different scikit-learn models on the Iris dataset. Each model is trained and evaluated in a separate, concurrent sandbox.
```python
import asyncio
import json

from dotenv import load_dotenv

load_dotenv()

from tensorlake.sandbox import Sandbox


async def run_model_benchmark(model_name, sklearn_path):
    """
    Runs a model benchmark inside an isolated sandbox.
    Returns a dict with model name and accuracy.
    """
    module_path, class_name = sklearn_path.rsplit('.', 1)
    code = f"""
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from {module_path} import {class_name}
import json

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3)
model = {class_name}()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(json.dumps({{"model": "{model_name}", "accuracy": score}}))
"""

    def _sync_benchmark():
        sandbox = Sandbox.create()
        print(f"🚀 Sandbox started for {model_name}...")
        # Install scikit-learn and its dependencies in the sandbox
        sandbox.run("pip", ["install", "--user", "--break-system-packages", "numpy", "scikit-learn"])
        # Run the benchmark code in the sandbox
        result = sandbox.run("python", ["-c", code])
        output_data = json.loads(result.stdout.strip())
        return output_data

    return await asyncio.to_thread(_sync_benchmark)


async def main():
    models_to_test: dict[str, str] = {
        "Random Forest": "sklearn.ensemble.RandomForestClassifier",
        "SVM": "sklearn.svm.SVC",
        "Logistic Regression": "sklearn.linear_model.LogisticRegression",
        "Decision Tree": "sklearn.tree.DecisionTreeClassifier",
        "KNN": "sklearn.neighbors.KNeighborsClassifier",
    }

    tasks = [run_model_benchmark(name, path) for name, path in models_to_test.items()]

    print("Gathering results from all sandboxes...\n")
    results = await asyncio.gather(*tasks)

    print("--- Benchmark Results ---")
    for r in results:
        print(f"{r['model']:<20}: {r['accuracy']:.4f}")


if __name__ == "__main__":
    asyncio.run(main())
```
The script orchestrates the parallel execution of model benchmarks using Python's `asyncio` library.

1. **Parallel Execution**: The `main` function defines a dictionary of models to test and creates a list of asynchronous tasks using a list comprehension. `asyncio.gather` runs all of these tasks concurrently.
2. **Sandbox Task**: The `run_model_benchmark` function is responsible for a single benchmark. For each model, it:
   - Creates a new, isolated sandbox.
   - Installs the necessary Python libraries (`numpy` and `scikit-learn`) inside the sandbox using `sandbox.run()`. The `--break-system-packages` flag is used to comply with PEP 668 in newer Python environments.
   - Executes a Python script that trains the model on the Iris dataset and calculates its accuracy.
   - Prints the result as a JSON string to standard output.
   - Captures the stdout, parses the JSON, and returns the result.
3. **Aggregate Results**: Once all sandboxes have completed their tasks, `asyncio.gather` returns a list of all the results, which are then printed to the console.
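The orchestration pattern above can be seen in miniature without the sandbox service. The sketch below is a local stand-in: it substitutes a plain subprocess for `sandbox.run()`, but uses the same `asyncio.to_thread` + `asyncio.gather` structure and the same JSON-over-stdout convention for returning results:

```python
import asyncio
import json
import subprocess
import sys


def run_snippet(code: str) -> dict:
    # Stand-in for sandbox.run("python", ["-c", code]): run the snippet locally
    result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    # Parse the single JSON line the snippet printed to stdout
    return json.loads(result.stdout.strip())


async def main() -> list[dict]:
    # Each snippet prints its result as JSON, just like the benchmark code
    snippets = [
        f'import json; print(json.dumps({{"task": {i}, "square": {i * i}}}))'
        for i in range(3)
    ]
    # Offload each blocking subprocess call to a thread, then run them concurrently
    tasks = [asyncio.to_thread(run_snippet, code) for code in snippets]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    for r in asyncio.run(main()):
        print(r)
```

`asyncio.gather` preserves the order of the tasks it was given, which is why the results can be matched back to the models without extra bookkeeping.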
This example uses the python-dotenv library to load your Tensorlake API key from a .env file. Create a file named .env in your project root and add your key:
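A minimal `.env` file might look like the following. The variable name `TENSORLAKE_API_KEY` is an assumption here; check the Tensorlake documentation or dashboard for the exact name the SDK expects:

```shell
TENSORLAKE_API_KEY=your_api_key_here
```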
The example installs dependencies every time a sandbox is created. This is simple but inefficient for repeated runs. To significantly speed up your workflow, you can use Snapshots.
1. Create a “base” sandbox and install all your dependencies.
2. Create a snapshot of that sandbox.
3. Start new sandboxes from the snapshot ID. The new sandboxes will have all the dependencies pre-installed, saving you valuable setup time.
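The steps above might look roughly like the sketch below. This is illustrative pseudocode, not a verified implementation: the method names `base.snapshot()` and the `snapshot_id` argument to `Sandbox.create()` are assumptions — consult the Snapshots reference for the actual API:

```python
from tensorlake.sandbox import Sandbox

# 1. Create a "base" sandbox and install dependencies once
base = Sandbox.create()
base.run("pip", ["install", "--user", "--break-system-packages", "numpy", "scikit-learn"])

# 2. Snapshot it (hypothetical method name -- see the Snapshots docs)
snapshot_id = base.snapshot()

# 3. Start new sandboxes from the snapshot: dependencies are pre-installed,
#    so the pip install step is skipped entirely
sandbox = Sandbox.create(snapshot_id=snapshot_id)
result = sandbox.run("python", ["-c", "import sklearn; print(sklearn.__version__)"])
print(result.stdout)
```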