scikit-learn classification models in parallel by running each in its own sandbox.
Example: Parallel Model Benchmarking
The following script benchmarks five different scikit-learn models on the Iris dataset. Each model is trained and evaluated in a separate, concurrent sandbox.
How It Works
The script orchestrates the parallel execution of model benchmarks using Python’s asyncio library.
1. Parallel Execution: The main function defines a dictionary of models to test and creates a list of asynchronous tasks using a list comprehension. asyncio.gather runs all these tasks concurrently.
2. Sandbox Task: The run_model_benchmark function is responsible for a single benchmark. For each model, it:
- Creates a new, isolated sandbox.
- Installs the necessary Python libraries (numpy and scikit-learn) inside the sandbox using sandbox.run(). The --break-system-packages flag lets pip install into the sandbox's system Python, which newer environments mark as externally managed under PEP 668.
- Executes a Python script that trains the model on the Iris dataset and calculates its accuracy.
- Has that script print its results as a JSON string to standard output.
- Captures the stdout, parses the JSON, and returns the result.
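The per-model work in step 2 can be sketched without any SDK calls. This is a minimal sketch, not the original script: the template text, the hypothetical `build_script` and `parse_result` helpers, the `test_size=0.3`/`random_state=42` split, and the exact pip argument list are all assumptions. How the install command and the rendered script are handed to `sandbox.run()` depends on that method's actual signature, so check the SDK reference for that part.

```python
import json

# Hypothetical template for the script each sandbox runs. The import line and
# the constructor expression (e.g. "LogisticRegression(max_iter=200)") are
# filled in per model; doubled braces survive str.format as literal braces.
SANDBOX_SCRIPT = """\
import json
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
{import_line}

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = {constructor}
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(json.dumps({{"model": "{name}", "accuracy": accuracy}}))
"""

# Assumed install command. --break-system-packages lets pip write to a system
# Python that PEP 668 marks as externally managed.
INSTALL_CMD = ["pip", "install", "--break-system-packages", "numpy", "scikit-learn"]


def build_script(name: str, import_line: str, constructor: str) -> str:
    """Render the training script for one model."""
    return SANDBOX_SCRIPT.format(
        name=name, import_line=import_line, constructor=constructor
    )


def parse_result(stdout: str) -> dict:
    """Parse the JSON result from the last non-empty line of captured stdout.

    Reading only the last line tolerates any extra output printed earlier.
    """
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("sandbox produced no output")
    return json.loads(lines[-1])
```

Printing a single JSON line and reading back only the last line keeps the contract between host and sandbox small: the sandbox can log freely, and the host still gets one structured result.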
asyncio.gather returns a list of all the results, which are then printed to the console.
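The orchestration in step 1 follows the standard `asyncio.gather` pattern, sketched below. Here `run_model_benchmark` is a hypothetical stub that sleeps instead of creating a sandbox, and the five model names and estimator expressions are illustrative rather than copied from the original script. This lets the concurrency pattern run without a Tensorlake account.

```python
import asyncio
import random


async def run_model_benchmark(name: str, estimator: str) -> dict:
    """Hypothetical stand-in for the real benchmark function.

    The real version creates a sandbox, installs dependencies, runs the
    training script and parses its stdout. Here asyncio.sleep simulates that
    latency so the concurrency pattern can run anywhere.
    """
    await asyncio.sleep(random.uniform(0.05, 0.2))
    return {"model": name, "estimator": estimator}


async def main() -> list:
    # Illustrative model list: name -> estimator expression a real script
    # might render into each sandbox's training code.
    models = {
        "logistic_regression": "LogisticRegression(max_iter=200)",
        "k_nearest_neighbors": "KNeighborsClassifier()",
        "decision_tree": "DecisionTreeClassifier()",
        "random_forest": "RandomForestClassifier()",
        "svm": "SVC()",
    }
    # One coroutine per model, built with a list comprehension.
    tasks = [run_model_benchmark(name, est) for name, est in models.items()]
    # gather runs all coroutines concurrently and returns their results in
    # the same order as `tasks`, regardless of which finishes first.
    results = await asyncio.gather(*tasks)
    for result in results:
        print(result)
    return results


if __name__ == "__main__":
    asyncio.run(main())
```

By default, the first exception raised by any benchmark propagates out of `gather` while the other coroutines keep running. Passing `return_exceptions=True` instead places exceptions in the result list, which is often preferable when one failing sandbox should not hide the other models' results.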
This example uses the python-dotenv library to load your Tensorlake API key from a .env file. Create a file named .env in your project root and add your key. The SandboxClient will automatically use this key.
Pro Tips
Faster Execution with Snapshots
The example installs dependencies every time a sandbox is created. This is simple but inefficient for repeated runs. To significantly speed up your workflow, you can use Snapshots:
- Create a “base” sandbox and install all your dependencies.
- Create a snapshot of that sandbox.
- Start new sandboxes from the snapshot ID. The new sandboxes have all the dependencies pre-installed, so they skip the installation step.
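The snapshot workflow can be modelled with a toy, in-memory stand-in. Every class and method here (`FakeSandbox`, `FakeSandboxService`, `snapshot`, `create_from_snapshot`) is hypothetical and does not reflect the Tensorlake SDK. Consult its Snapshots documentation for the real calls. The sketch only shows the shape of the workflow: the slow install runs once, and every later sandbox starts from the captured state.

```python
import time
from dataclasses import dataclass, field


# Hypothetical stand-ins, NOT the Tensorlake API: they only model the three
# steps of building a base sandbox, snapshotting it, and starting from it.
@dataclass
class FakeSandbox:
    packages: set = field(default_factory=set)

    def install(self, *pkgs: str) -> None:
        time.sleep(0.05 * len(pkgs))  # simulate slow dependency installation
        self.packages.update(pkgs)


class FakeSandboxService:
    def __init__(self) -> None:
        self._snapshots = {}  # snapshot ID -> frozen copy of installed packages

    def create(self) -> FakeSandbox:
        return FakeSandbox()

    def snapshot(self, sandbox: FakeSandbox) -> str:
        snapshot_id = f"snap-{len(self._snapshots) + 1}"
        self._snapshots[snapshot_id] = frozenset(sandbox.packages)
        return snapshot_id

    def create_from_snapshot(self, snapshot_id: str) -> FakeSandbox:
        # Restoring copies the captured state; no install step runs.
        return FakeSandbox(packages=set(self._snapshots[snapshot_id]))


service = FakeSandboxService()

# 1. Create a base sandbox and install dependencies once.
base = service.create()
base.install("numpy", "scikit-learn")

# 2. Snapshot it.
snap_id = service.snapshot(base)

# 3. Start each benchmark sandbox from the snapshot ID.
workers = [service.create_from_snapshot(snap_id) for _ in range(5)]
print(all("scikit-learn" in w.packages for w in workers))  # prints True
```

In the benchmarking script, step 3 would replace the per-sandbox install in `run_model_benchmark`, so the five concurrent benchmarks share one installation cost instead of paying it five times.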