tensorlake/ubuntu-vnc is a managed desktop image for browser automation and computer-use agents. It boots XFCE, TigerVNC, and Firefox for you, and the SDK connects through the authenticated sandbox proxy so you can drive the desktop without manually exposing port 5901.
This guide builds on Sandboxes. If you already have a Tensorlake API key, you can create a desktop sandbox, capture screenshots, and send mouse and keyboard input in just a few lines.
Prerequisites
- A Tensorlake API key
- The Tensorlake SDK for Python or JavaScript
The tensorlake/ubuntu-vnc image uses tensorlake as its VNC password.
Launch a Desktop Sandbox
Use tensorlake/ubuntu-vnc when you want a full Linux desktop instead of a shell-only environment.
You get a regular Sandbox object back, so computer use fits naturally alongside run(), file operations, PTY sessions, snapshots, and tunnels.
Capture Screenshots
Once the sandbox is running, attach to the desktop and save a PNG. This is the easiest way to inspect the layout and discover click coordinates before sending pointer events.
Fresh desktop sandboxes can take a few seconds to finish starting XFCE and other desktop services. If your first screenshot is blank or input does not land where you expect, wait briefly after connecting and then retry.
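The wait-and-retry advice above can be sketched as a small polling helper. This is a minimal sketch, not part of the Tensorlake SDK: `wait_for_desktop`, `looks_ready`, their parameters, and the size threshold are all assumptions made for illustration. The helper accepts any zero-argument callable that returns PNG bytes (for example, a thin wrapper around the desktop client's `screenshot()`), and treats a very small PNG as probably blank, since a single-color frame compresses to very few bytes.

```python
import time
from typing import Callable

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_ready(png: bytes, min_bytes: int = 20_000) -> bool:
    """Heuristic (an assumption; tune it for your resolution): a valid PNG
    of at least min_bytes is unlikely to be a blank, single-color frame."""
    return png.startswith(PNG_SIGNATURE) and len(png) >= min_bytes


def wait_for_desktop(
    take_screenshot: Callable[[], bytes],
    attempts: int = 10,
    delay_s: float = 1.0,
    is_ready: Callable[[bytes], bool] = looks_ready,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll take_screenshot() until is_ready() accepts a frame, then return it."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        png = take_screenshot()
        if is_ready(png):
            return png
        # Sleep only between attempts, not after the final one.
        if attempt < attempts - 1:
            sleep(delay_s)
    raise TimeoutError(f"desktop not ready after {attempts} screenshots")
```

The `is_ready` and `sleep` parameters are injectable so you can swap in a stricter check (such as decoding the image) or run the helper in tests without real delays.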
Send Keyboard and Mouse Input
The desktop client supports keyboard shortcuts, typed input, clicks, double-clicks, mouse movement, and scrolling. A reliable keyboard-driven flow is to open a terminal, type a command, and then verify the result from the sandbox shell.
For mouse-driven flows, work from a screenshot:
- Take a screenshot.
- Inspect the desktop layout and note the coordinates you care about.
- Use move_mouse()/moveMouse(), click(), double_click()/doubleClick(), and scroll() with those coordinates.
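When working out coordinates from a screenshot, it helps to confirm the image size and keep targets inside the desktop bounds. The sketch below is illustrative only: `png_size` and `to_pixels` are hypothetical helpers, not SDK functions. The first reads the width and height from the PNG header using the standard PNG layout; the second maps fractional positions (0.0 to 1.0) to clamped pixel coordinates.

```python
from __future__ import annotations

import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(png: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG's IHDR chunk without decoding pixels."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and
    # height as big-endian unsigned 32-bit integers.
    if len(png) < 24 or not png.startswith(PNG_SIGNATURE) or png[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", png[16:24])
    return width, height


def to_pixels(fx: float, fy: float, width: int, height: int) -> tuple[int, int]:
    """Map fractional coordinates to a pixel that lies inside the desktop."""
    x = min(max(int(fx * width), 0), width - 1)
    y = min(max(int(fy * height), 0), height - 1)
    return x, y
```

The resulting pair would then be passed to the pointer calls listed above. Their exact argument order is not shown in this guide, so check the SDK reference before relying on a form such as `click(x, y)`.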
Reconnect to an Existing Sandbox
If a sandbox is already running, connect by sandbox ID and attach to the desktop without creating a new VM.
Use noVNC in the Browser
If you want a human to interact with the sandbox desktop in real time, use a real VNC client in the browser instead of polling screenshots. noVNC is a good fit here.
The recommended architecture is:
- Keep the Tensorlake API key on your backend.
- Use the backend to open a TCP tunnel to the sandbox’s VNC port 5901.
- Bridge that local tunnel to a browser WebSocket endpoint such as /vnc/<session-id>.
- Point noVNC at your backend WebSocket and authenticate with the desktop password tensorlake.
With this setup, you do not need to expose port 5901 publicly yourself.
If you are also running an agent loop, a good pattern is to use:
- noVNC for the live human-facing desktop stream
- sandbox.connectDesktop() for screenshots and high-level computer-use actions on the backend
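The bridge step in the architecture above comes down to copying bytes in both directions between the browser-facing connection and the local tunnel. The sketch below shows only that relay core, TCP to TCP, using the Python standard library. It is an illustration under stated assumptions, not the Tensorlake implementation: `pump`, `relay`, and `serve_relay` are hypothetical names, and tunnel creation, API-key handling, and WebSocket framing are omitted. In a real backend, the client side would be a WebSocket connection from a library of your choice.

```python
import asyncio


async def pump(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass  # The peer went away; fall through and close our side.
    finally:
        writer.close()


async def relay(client_r, client_w, upstream_host: str, upstream_port: int) -> None:
    """Bridge one client connection to the local tunnel endpoint, both ways."""
    try:
        up_r, up_w = await asyncio.open_connection(upstream_host, upstream_port)
    except OSError:
        client_w.close()  # Tunnel not reachable; drop the client.
        return
    await asyncio.gather(pump(client_r, up_w), pump(up_r, client_w))


async def serve_relay(listen_port: int, upstream_host: str, upstream_port: int):
    """Listen on localhost and relay every connection to the upstream endpoint."""
    async def handle(reader, writer):
        await relay(reader, writer, upstream_host, upstream_port)

    return await asyncio.start_server(handle, "127.0.0.1", listen_port)
```

Closing the opposite writer when one direction reaches EOF keeps half-open sessions from lingering after the browser tab or the VNC server disconnects.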
Browser Client with noVNC
Install noVNC in your frontend. It is published on npm as @novnc/novnc.
Desktop API Surface
Python uses snake_case, while JavaScript uses camelCase, but both SDKs expose the same core capabilities:
- Screenshots: screenshot()
- Mouse input: move_mouse()/moveMouse(), mouse_press()/mousePress(), mouse_release()/mouseRelease(), click(), double_click()/doubleClick(), scroll(), scroll_up()/scrollUp(), and scroll_down()/scrollDown()
- Keyboard input: key_down()/keyDown(), key_up()/keyUp(), press(), and type_text()/typeText()
- Desktop size: width and height
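The naming rule is mechanical, so the JavaScript spelling of any method can be derived from the Python one. The helper below is a hypothetical illustration of that mapping, applied to the method names listed in this section; it is not part of either SDK.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case method name to its camelCase spelling."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


# Python method names taken from the list above.
PYTHON_METHODS = [
    "screenshot", "move_mouse", "mouse_press", "mouse_release", "click",
    "double_click", "scroll", "scroll_up", "scroll_down",
    "key_down", "key_up", "press", "type_text",
]

# Maps each Python name to the matching JavaScript name.
JS_METHODS = {name: snake_to_camel(name) for name in PYTHON_METHODS}
```

Single-word names such as `click` and `press` are unchanged, which matches how they appear in both SDKs above.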
connect_desktop() and connectDesktop() go through the authenticated sandbox proxy, so you do not need to bind or expose the VNC port yourself. If you need manual access for debugging, you can still create a TCP tunnel to port 5901 and point a VNC client at the tunnel's local address, for example localhost:15901.
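When debugging through a manual tunnel, a quick sanity check is to confirm that the local endpoint actually speaks VNC. An RFB server opens every connection with a 12-byte version banner such as `RFB 003.008\n`, so reading it verifies the tunnel before you launch a full client. This is a minimal sketch: `read_rfb_banner` is a hypothetical helper, and the default port simply mirrors the localhost:15901 example above.

```python
import socket


def read_rfb_banner(host: str = "127.0.0.1", port: int = 15901,
                    timeout_s: float = 5.0) -> str:
    """Connect to a VNC endpoint and return its RFB version, e.g. '003.008'."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        banner = b""
        # The ProtocolVersion message is exactly 12 bytes: b"RFB xxx.yyy\n".
        while len(banner) < 12:
            chunk = sock.recv(12 - len(banner))
            if not chunk:
                break
            banner += chunk
    if len(banner) != 12 or not banner.startswith(b"RFB ") or not banner.endswith(b"\n"):
        raise ConnectionError(f"endpoint did not send an RFB banner: {banner!r}")
    return banner[4:11].decode("ascii")
```

The helper only reads the server's greeting and then disconnects; it does not authenticate or send any input.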