tensorlake/ubuntu-vnc is a managed desktop image for browser automation and computer-use agents. It boots XFCE, TigerVNC, and Firefox for you, and the SDK connects through the authenticated sandbox proxy, so you can drive the desktop without manually exposing port 5901.

This guide builds on Sandboxes. With a Tensorlake API key, you can create a desktop sandbox, capture screenshots, and send mouse and keyboard input in a few lines of code.

Prerequisites

pip install tensorlake
export TENSORLAKE_API_KEY=your-api-key
The current managed tensorlake/ubuntu-vnc image uses tensorlake as its VNC password.
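A missing API key is easier to diagnose before any sandbox is created. The sketch below is a small preflight check; `require_api_key` is a hypothetical helper, not part of the SDK, and it assumes the client reads `TENSORLAKE_API_KEY` from the environment, as the export above suggests.

```python
import os


def require_api_key(env=os.environ, name="TENSORLAKE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: the variable name comes from the export above.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating a sandbox client")
    return key
```

Calling `require_api_key()` once at the top of a script turns a confusing authentication failure into an immediate, readable error.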

Launch a Desktop Sandbox

Use tensorlake/ubuntu-vnc when you want a full Linux desktop instead of a shell-only environment.
from tensorlake.sandbox import SandboxClient

client = SandboxClient.for_cloud()

with client.create_and_connect(image="tensorlake/ubuntu-vnc") as sandbox:
    print(sandbox.sandbox_id)
You still get a normal Sandbox object back, so computer use fits naturally alongside run(), file operations, PTY sessions, snapshots, and tunnels.

Capture Screenshots

Once the sandbox is running, attach to the desktop and save a PNG. This is the easiest way to inspect the layout and discover click coordinates before sending pointer events.

Fresh desktop sandboxes can take a few seconds to finish starting XFCE and other desktop services. If your first screenshot is blank or input does not land where you expect, wait briefly after connecting and then retry.
import time
from pathlib import Path
from tensorlake.sandbox import SandboxClient

client = SandboxClient.for_cloud()

with client.create_and_connect(image="tensorlake/ubuntu-vnc") as sandbox:
    with sandbox.connect_desktop(password="tensorlake") as desktop:
        # Give XFCE a few seconds to finish starting before the first capture.
        time.sleep(4.0)
        screenshot = desktop.screenshot()
        Path("sandbox-desktop.png").write_bytes(screenshot)

        print(desktop.width, desktop.height)
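The "wait briefly and retry" advice can also be written as a small retry loop instead of a single fixed sleep. This is a minimal sketch: `capture_when_ready` and `larger_than` are hypothetical helpers, and the readiness test (a blank frame compresses to a very small PNG) is a heuristic assumption, not documented SDK behavior.

```python
import time


def capture_when_ready(take_screenshot, looks_ready, attempts=5, delay=1.0, sleep=time.sleep):
    """Call take_screenshot() until looks_ready(image) is true or attempts run out.

    Returns the last image captured, ready or not, so the caller can inspect it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    image = b""
    for attempt in range(attempts):
        image = take_screenshot()
        if looks_ready(image):
            break
        # Only pause when another attempt will follow.
        if attempt < attempts - 1:
            sleep(delay)
    return image


def larger_than(min_bytes):
    """Heuristic readiness check: treat very small PNG payloads as blank frames."""
    return lambda image: len(image) >= min_bytes
```

With a connected desktop, this might be called as `capture_when_ready(desktop.screenshot, larger_than(20_000))`. The 20,000-byte threshold is a guess and would need tuning against real screenshots.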

Send Keyboard and Mouse Input

The desktop client supports keyboard shortcuts, typed input, clicks, double-clicks, mouse movement, and scrolling. The example below uses a reliable keyboard-driven flow: open a terminal, type a command, and then verify the result from the sandbox shell.
import time
from tensorlake.sandbox import SandboxClient

client = SandboxClient.for_cloud()

with client.create_and_connect(image="tensorlake/ubuntu-vnc") as sandbox:
    with sandbox.connect_desktop(password="tensorlake") as desktop:
        desktop.press(["ctrl", "alt", "t"])
        # Wait for the terminal window to open and take keyboard focus.
        time.sleep(4.0)

        desktop.type_text("echo docs-test > /tmp/desktop-test.txt")
        desktop.press("enter")
        # Give the shell time to run the command and write the file.
        time.sleep(3.0)

        # Mouse helpers are also available when you know the coordinates.
        desktop.move_mouse(640, 400)
        desktop.scroll_down()

    result = sandbox.run("bash", ["-lc", "cat /tmp/desktop-test.txt"])
    print(result.stdout.strip())  # docs-test
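The fixed sleeps in the example are simple but can be too short on a slow start and wasteful on a fast one. One alternative is to poll for the expected result. The helper below is a generic sketch; `wait_until` is a hypothetical name and does not come from the SDK.

```python
import time


def wait_until(check, timeout=10.0, interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value, or None if the deadline is reached first.
    """
    deadline = clock() + timeout
    while True:
        value = check()
        if value:
            return value
        if clock() >= deadline:
            return None
        sleep(interval)
```

In the flow above, `check` could be a function that runs `sandbox.run("bash", ["-lc", "cat /tmp/desktop-test.txt"])` and returns `result.stdout.strip()`. That assumes `run()` returns empty stdout rather than raising while the file does not exist yet.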
Coordinate-based actions are screen-relative. A common workflow is:
  1. Take a screenshot.
  2. Inspect the desktop layout and note the coordinates you care about.
  3. Use move_mouse() / moveMouse(), click(), double_click() / doubleClick(), and scroll() with those coordinates.
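Hard-coded pixel coordinates break if the desktop size changes. One way to make the workflow above more portable is to store positions as fractions of the screen and convert them using the reported width and height. `to_screen_point` is a hypothetical helper written for this sketch, not an SDK function.

```python
def to_screen_point(rel_x, rel_y, width, height):
    """Map fractions in [0, 1] to pixel coordinates clamped inside the screen."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    x = round(rel_x * (width - 1))
    y = round(rel_y * (height - 1))
    # Clamp so out-of-range fractions still land on a valid pixel.
    return (min(max(x, 0), width - 1), min(max(y, 0), height - 1))
```

With a connected desktop, the result can be passed straight to the pointer call shown earlier, for example `desktop.move_mouse(*to_screen_point(0.5, 0.5, desktop.width, desktop.height))`.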

Reconnect to an Existing Sandbox

If a sandbox is already running, connect by sandbox ID and attach to the desktop without creating a new VM.
from pathlib import Path
from tensorlake.sandbox import SandboxClient

sandbox_id = "your-running-sandbox-id"

client = SandboxClient.for_cloud()

with client.connect(sandbox_id) as sandbox:
    with sandbox.connect_desktop(password="tensorlake") as desktop:
        Path("existing-sandbox.png").write_bytes(desktop.screenshot())
When you connect to an existing sandbox this way, leaving the with block only closes the client connection. It does not terminate the running VM.

Desktop API Surface

Python uses snake_case, while JavaScript uses camelCase, but both SDKs expose the same core capabilities:
  • Screenshots: screenshot()
  • Mouse input: move_mouse() / moveMouse(), mouse_press() / mousePress(), mouse_release() / mouseRelease(), click(), double_click() / doubleClick(), scroll(), scroll_up() / scrollUp(), and scroll_down() / scrollDown()
  • Keyboard input: key_down() / keyDown(), key_up() / keyUp(), press(), and type_text() / typeText()
  • Desktop size: width and height
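The low-level key_down() and key_up() calls are useful when a modifier has to stay held across several actions. The sketch below wraps them in a context manager so keys are always released, even if the body raises. `hold_keys` is a hypothetical helper, and it assumes key_down() and key_up() each accept a single key name, which this guide does not confirm.

```python
from contextlib import contextmanager


@contextmanager
def hold_keys(desktop, *keys):
    """Hold the given keys down for the duration of the with block.

    Keys are released in reverse order, even if the body raises.
    Assumes desktop.key_down(key) and desktop.key_up(key) take one key name.
    """
    pressed = []
    try:
        for key in keys:
            desktop.key_down(key)
            pressed.append(key)
        yield
    finally:
        # Release only the keys that were actually pressed.
        for key in reversed(pressed):
            desktop.key_up(key)
```

A call such as `with hold_keys(desktop, "shift"): desktop.press("tab")` would then send the chord and release the modifier afterwards.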
connect_desktop() and connectDesktop() go through the authenticated sandbox proxy, so you do not need to bind or expose the VNC port yourself. If you need manual access for debugging, you can still create a TCP tunnel to port 5901.
tl sbx tunnel <sandbox_id> 5901 --listen-port 15901
After the tunnel is up, you can connect with any VNC client using localhost:15901.
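Before opening a VNC client, it can help to confirm that the tunnel is actually forwarding to a VNC server. A VNC (RFB) server opens every connection by sending a 12-byte version banner such as `RFB 003.008`. The sketch below reads that banner; `read_rfb_banner` is a hypothetical helper, and it assumes the tunnel forwards raw TCP unmodified on the listen port used above.

```python
import socket


def read_rfb_banner(host="localhost", port=15901, timeout=3.0):
    """Connect to host:port and return the 12-byte RFB version banner as text.

    Raises OSError if the connection fails and ValueError if the peer
    does not greet like a VNC server.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        banner = b""
        # recv() may return fewer bytes than requested, so loop until complete.
        while len(banner) < 12:
            chunk = conn.recv(12 - len(banner))
            if not chunk:
                break
            banner += chunk
    if len(banner) != 12 or not banner.startswith(b"RFB "):
        raise ValueError(f"unexpected greeting: {banner!r}")
    return banner.decode("ascii").strip()
```

If the function returns a version string, the tunnel is up and a VNC client should be able to connect to the same host and port.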