Trycua released Cua, an open-source infrastructure stack for building, benchmarking, and deploying computer-use agents. The project provides sandboxes, SDKs, and benchmarks across macOS, Windows, Linux, and Android. It aims to give AI agents full desktop control without stealing user focus or requiring screen sharing.

The timing matters. Computer-use agents, AI models that can see a screen, move a cursor, click buttons, and type text, are one of the most active frontiers in applied AI research. OpenAI demoed a computer-use capability in early 2025. Anthropic shipped computer use as a beta feature of Claude 3.5 Sonnet. Google DeepMind published research on grounding agents in browser environments. But most of these efforts have been proprietary, tied to a single model provider, or limited to browser-based interaction.

Cua takes a different approach. It is model-agnostic, operating-system-agnostic, and infrastructure-agnostic. It provides four distinct layers: Cua (the sandbox API), Cua Drivers (background desktop control), Cua Bench (benchmarks and reinforcement learning environments), and Lume (macOS and Linux VM management on Apple Silicon). Together they form a full pipeline for building agents that can interact with real desktop applications.

What the stack does

The sandbox layer, accessible via pip install cua, provides a unified API for creating ephemeral or persistent virtual machines and containers. A single Sandbox.ephemeral() call can spin up a Linux container, a macOS VM, a Windows VM, or an Android emulator. The same API works for cloud instances via cua.ai or local QEMU-based VMs. The sandbox exposes shell access, screenshot capture, mouse clicks, keyboard input, and mobile gestures.
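The ephemeral lifecycle described above can be illustrated with a minimal sketch. It deliberately does not import cua: StubSandbox, ephemeral, and every method name here are hypothetical stand-ins chosen for illustration, not the library's actual API. The point is only the shape of the pattern, where a context manager creates the environment, the agent issues shell, screenshot, and input calls against one object, and the environment is torn down on exit.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field


@dataclass
class StubSandbox:
    """Hypothetical in-memory stand-in for a sandbox; NOT the cua API."""
    os_name: str
    alive: bool = True
    log: list = field(default_factory=list)

    async def shell(self, command: str) -> str:
        # Record the command instead of executing anything.
        self.log.append(("shell", command))
        return f"[{self.os_name}] ran: {command}"

    async def screenshot(self) -> bytes:
        # Return placeholder bytes rather than a real image.
        self.log.append(("screenshot",))
        return b"stub-image-bytes"

    async def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    async def type_text(self, text: str) -> None:
        self.log.append(("type", text))


@asynccontextmanager
async def ephemeral(os_name: str):
    """Create a sandbox for the duration of the block, then destroy it."""
    sandbox = StubSandbox(os_name)
    try:
        yield sandbox
    finally:
        # Teardown runs even if the agent code raises.
        sandbox.alive = False


async def demo() -> StubSandbox:
    async with ephemeral("linux") as sb:
        await sb.shell("echo hello")
        await sb.screenshot()
        await sb.click(100, 200)
        await sb.type_text("hello")
    return sb


if __name__ == "__main__":
    finished = asyncio.run(demo())
    print(finished.alive, finished.log)
```

Swapping "linux" for another OS name changes nothing in the calling code, which is the property the unified API is meant to provide. The real method names and arguments should be taken from the project's documentation.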

The driver layer is perhaps the most technically interesting piece. Cua Drivers let agents control native desktop applications in the background, clicking, typing, and verifying results without stealing the cursor or focus. This matters for production deployment. Most computer-use demos today require the agent to take over the user’s screen, which blocks the user from doing anything else. Cua Drivers run as a background process, accessible via CLI or as a Model Context Protocol (MCP) server. The project claims compatibility with Claude Code, Cursor, Codex, OpenClaw, and custom clients.

The benchmark layer, Cua Bench, evaluates agents on established datasets: OSWorld, ScreenSpot, Windows Arena, and custom tasks. It exports trajectories for training, which makes it useful for both evaluation and reinforcement learning data collection. The project includes a CLI tool that can run benchmarks with configurable parallelism.
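To make "configurable parallelism" and "trajectory export" concrete, here is a small sketch of the general pattern. It is not Cua Bench's implementation or CLI: run_task, run_benchmark, to_jsonl, and the trajectory fields are all assumptions invented for illustration, and the task runner returns canned steps instead of driving a real environment.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id: str) -> dict:
    """Hypothetical task runner returning a canned trajectory."""
    steps = [
        {"step": 0, "observation": f"{task_id}:screen0", "action": "click(10, 20)"},
        {"step": 1, "observation": f"{task_id}:screen1", "action": "type('ok')"},
    ]
    return {"task_id": task_id, "success": True, "steps": steps}


def run_benchmark(task_ids, max_workers=4):
    """Run tasks concurrently; max_workers is the parallelism knob."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as task_ids.
        return list(pool.map(run_task, task_ids))


def to_jsonl(trajectories) -> str:
    """Serialize one trajectory per line, a common format for training data."""
    return "\n".join(json.dumps(t, sort_keys=True) for t in trajectories)


def success_rate(trajectories) -> float:
    """Fraction of tasks marked successful."""
    return sum(t["success"] for t in trajectories) / len(trajectories)


if __name__ == "__main__":
    results = run_benchmark(["task-a", "task-b", "task-c"], max_workers=2)
    print(success_rate(results))
    print(to_jsonl(results))
```

The same records serve both purposes the article mentions: the success flag feeds evaluation, while the per-step observation and action pairs are what a training pipeline would consume.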

Lume, the VM management layer, uses Apple’s Virtualization.framework to create macOS and Linux VMs with near-native performance on Apple Silicon. This addresses a practical bottleneck for many agent developers: macOS VMs are notoriously difficult to provision for testing, and cloud macOS instances are expensive. Lume makes it possible to run macOS VMs locally for development and testing.

The infrastructure gap

Computer-use agents are hard to build for reasons that have little to do with the models themselves. The infrastructure challenges are substantial: provisioning operating systems, managing display servers, handling input capture, coordinating screenshots and mouse events, and running benchmarks reproducibly. Most teams solve these problems from scratch, ad hoc, for a single OS and a single model.

Cua’s bet is that this infrastructure should be a shared open-source layer, not a proprietary moat. The project is MIT-licensed. It bundles third-party components under their own licenses: Kasm (MIT), OmniParser (CC-BY-4.0), and optionally ultralytics (AGPL-3.0) for the omni agent variant.

The project is still early. The README flags Linux support as a pre-release backend. Bring-your-own-image (BYOI) support in the cloud sandbox API is marked as “soon.” The agent skill pack is described as optional. But the architecture is coherent and the scope is ambitious.

What this means for the field

Cua lowers the barrier to entry for computer-use agent development. A researcher or startup can now spin up a macOS VM, run a benchmark, and iterate on an agent without building custom infrastructure. This could accelerate the pace of research, particularly for tasks that require real desktop applications rather than synthetic web environments.

The model-agnostic design is a deliberate choice. Most computer-use demos today are tied to a specific model’s tool-use API. Cua treats the model as a plug-in: the same infrastructure works with Claude Code, open-weight models, or custom fine-tuned agents. This makes it easier to compare approaches and to train models on real computer-use trajectories.
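One way to picture the model as a plug-in is an agent loop that depends only on a small interface. The sketch below is an illustration under stated assumptions, not Cua's design: Action, Policy, ScriptedPolicy, RecordingDesktop, and run_agent are hypothetical names, and the desktop is a recorder rather than a real machine.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Action:
    kind: str            # "click", "type", or "done"
    payload: tuple = ()


class Policy(Protocol):
    """Anything that maps a screenshot to the next action can be plugged in."""
    def next_action(self, screenshot: bytes, step: int) -> Action: ...


class ScriptedPolicy:
    """Stand-in for a model: replays a fixed list of actions."""
    def __init__(self, actions):
        self._actions = list(actions)

    def next_action(self, screenshot: bytes, step: int) -> Action:
        if step < len(self._actions):
            return self._actions[step]
        return Action("done")


class RecordingDesktop:
    """Hypothetical desktop that records actions instead of performing them."""
    def __init__(self):
        self.executed = []

    def screenshot(self) -> bytes:
        return b"frame-%d" % len(self.executed)

    def execute(self, action: Action) -> None:
        self.executed.append(action)


def run_agent(policy: Policy, desktop, max_steps: int = 10):
    """Observe, ask the policy for an action, act, and repeat until done."""
    for step in range(max_steps):
        action = policy.next_action(desktop.screenshot(), step)
        if action.kind == "done":
            break
        desktop.execute(action)
    return desktop.executed


if __name__ == "__main__":
    policy = ScriptedPolicy([Action("click", (1, 2)), Action("type", ("hi",))])
    print(run_agent(policy, RecordingDesktop()))
```

Because run_agent never inspects which policy it was given, a hosted model, an open-weight model, or a fine-tuned one could each sit behind next_action, and the executed actions can be logged the same way for comparison.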

The background driver is the feature most likely to matter for production deployment. An agent that can control desktop applications without stealing focus is usable in workflows where the human and the agent share a machine. Examples include a customer support agent that fills out a CRM form in the background, a QA agent that runs test suites across multiple operating systems, and a data-entry agent that processes invoices while the user works on something else.

The benchmark layer addresses a fragmentation problem. OSWorld, ScreenSpot, and Windows Arena each test different capabilities on different operating systems. Cua Bench provides a single CLI to run all of them, with trajectory export for training. This is the kind of infrastructure that makes it possible to measure progress systematically rather than relying on cherry-picked demos.

Open questions

The project does not address safety or alignment directly. Computer-use agents that can control a full desktop raise obvious risks: an agent could delete files, send emails, install software, or exfiltrate data. Cua provides the control infrastructure but does not impose guardrails. The sandbox model mitigates some risks, since ephemeral VMs can be destroyed after a task, but agents running on real desktops via Cua Drivers operate without a sandbox boundary.

The performance characteristics are not yet documented. Running background desktop control on macOS and Windows requires efficient screenshot capture, input injection, and state tracking. Latency and resource usage will determine whether the approach is practical for production workloads.

The project’s sustainability is unclear. Open-source infrastructure projects for AI agents are proliferating, and most struggle to find a business model. Trycua offers cloud sandbox hosting via cua.ai, which could provide a revenue stream, but the open-source core is MIT-licensed and freely forkable.

Cua is infrastructure, not a product. It does not include a pre-trained agent, a fine-tuning recipe, or a deployment service. Teams that use it still need to build the agent logic, train or select a model, and handle the safety engineering.

The project fills a real gap. Computer-use agents need operating systems to run on, benchmarks to measure against, and drivers to interact with. Cua provides all three in a single open-source stack. For teams building agents that need to control real desktops, it is worth a look.