Poolside, the foundation-model lab founded by Eiso Kant and Jason Warner, released two models and two products on April 28, 2026. The company is not a coding-agent wrapper or a fine-tuning shop. It pre-trains its own models from scratch in a facility it calls the Model Factory, and it is betting that the future of AI in software engineering belongs to models built specifically for agentic coding and long-horizon work, not general-purpose chatbots retrofitted with tool calls.
The two models are Laguna M.1, a 225-billion-parameter mixture-of-experts (MoE) model with 23 billion active parameters, and Laguna XS.2, a 33-billion-parameter MoE model with 3 billion active parameters. XS.2 is released under an Apache 2.0 open-weights license and is available on Hugging Face. M.1 is available via API and through Poolside’s own terminal agent, pool, and a cloud development environment called Shimmer. Both are free for a limited time on the company’s API and on OpenRouter.
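To make the sparsity of the two models concrete, the short calculation below uses only the parameter counts quoted above and works out what share of each model's weights is active per token. The function name is illustrative, not something from Poolside's materials.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters active per token (both inputs in billions)."""
    return active_b / total_b


# Parameter counts as quoted in the article (total, active), in billions.
models = {
    "Laguna M.1": (225, 23),
    "Laguna XS.2": (33, 3),
}

for name, (total_b, active_b) in models.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both models land at roughly one-tenth of their weights active per token, so per-token compute tracks the active count while storage tracks the total.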
What is new here is not just another set of benchmark scores. It is the thesis. Poolside is not trying to beat GPT-5.4 or Claude Sonnet 4.6 on general knowledge or creative writing. It is building for a specific use case: software engineering tasks that require a model to navigate a codebase, write patches, run tests, and iterate over hundreds of steps. The benchmarks Poolside chose to publish reflect that focus: SWE-bench Verified, SWE-bench Multilingual, SWE-Bench Pro, and Terminal-Bench 2.0. No MMLU, no GSM8K, no HellaSwag.
On SWE-bench Verified, Poolside reports that Laguna M.1 outscores the dense and MoE models of comparable size with published results, including DeepSeek-V4-Flash and Qwen3.5. On Terminal-Bench 2.0, which measures a model’s ability to complete multi-step tasks in a terminal, the company claims the highest publicly available score: the benchmark table in its blog post places Laguna M.1 at the top, ahead of Devstral 2, GLM-4.7, and DeepSeek-V4-Flash. The company’s technical report is still forthcoming.
The more interesting story is XS.2. With 33 billion total parameters and only 3 billion active per token, it fits on a single GPU. It runs locally via Ollama with native MLX support. And it beats Qwen3.5, Qwen3.6, Devstral Small 2, Gemma 4, and Claude Haiku 4.5 on SWE-bench Verified, SWE-bench Multilingual, and Terminal-Bench 2.0. The only benchmark where it trails is SWE-Bench Pro, where Qwen3.6 and Claude Haiku 4.5 edge ahead. For a model that started pre-training only five weeks before release, that is a strong showing.
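Whether a model fits on one GPU depends on its total parameter count, because every expert must be held in memory even though only a few are used per token. The back-of-the-envelope estimate below is a sketch: it takes the 33-billion figure from the article, while the precisions listed are assumptions chosen for illustration rather than formats Poolside has announced. It counts weights only and ignores the KV cache, activations, and runtime overhead.

```python
# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * BYTES_PER_PARAM[precision]


for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(33, precision)
    print(f"XS.2 weights at {precision}: about {gb:.1f} GB")
```

Under these assumptions, the weights alone need about 66 GB at 16-bit precision, which is why local use on consumer hardware generally implies a quantized build.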
The open-weights decision for XS.2 is strategic. Poolside’s Kant and Warner wrote that the open-weight ecosystem in the West is still early in its development, and they want to accelerate it. The Apache 2.0 license gives developers full freedom to modify, redistribute, and deploy the model. This is a direct challenge to the closed-source dominance of OpenAI, Anthropic, and Google in the coding-agent space. It also positions Poolside as a Western alternative to DeepSeek and Qwen, which have dominated the open-weight coding-model category.
The company’s organizational structure is worth noting. The models are the work of approximately 60 people in Poolside’s Applied Research organization, covering infrastructure, architecture, data, pre-training, and reinforcement learning. That is a small team by frontier-lab standards. For comparison, DeepSeek’s V4 team is reported to be several hundred people. Poolside is betting that focus and vertical integration can compensate for raw headcount.
The products that ship alongside the models matter. pool is a terminal-based coding agent that runs the models in a loop: generate a patch, execute it in a sandbox, observe the result, iterate. Shimmer is an instant-on virtual machine sandbox with the agent pre-installed, designed for building web apps, APIs, and CLIs. Both are in preview. Both are free for a limited time. Both are designed to capture usage data that feeds back into the model training loop.
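The generate, execute, observe, iterate cycle described above can be sketched in a few lines. This is a minimal illustration, not pool's implementation: `run_agent`, `StepResult`, `AgentRun`, and the stub model and sandbox are all hypothetical names, and a real agent would call a model API and an isolated execution environment where the stubs sit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    passed: bool  # did the sandboxed run succeed (e.g. tests green)?
    log: str      # output the agent observes and feeds into the next step


@dataclass
class AgentRun:
    history: List[str] = field(default_factory=list)
    steps: int = 0
    solved: bool = False


def run_agent(
    propose_patch: Callable[[List[str]], str],
    run_in_sandbox: Callable[[str], StepResult],
    max_steps: int = 10,
) -> AgentRun:
    """Loop until the sandbox reports success or the step budget runs out."""
    run = AgentRun()
    for _ in range(max_steps):
        patch = propose_patch(run.history)  # generate a patch from what was seen so far
        result = run_in_sandbox(patch)      # execute it in isolation
        run.history.append(result.log)      # observe the outcome
        run.steps += 1
        if result.passed:
            run.solved = True
            break
    return run


# Hypothetical stand-ins so the sketch runs without a model or a real sandbox.
def fake_model(history: List[str]) -> str:
    return f"patch-v{len(history) + 1}"


def fake_sandbox(patch: str) -> StepResult:
    ok = patch == "patch-v3"  # pretend the third attempt fixes the tests
    status = "tests passed" if ok else "tests failed"
    return StepResult(passed=ok, log=f"{patch}: {status}")


run = run_agent(fake_model, fake_sandbox)
print(run.solved, run.steps)
```

The step budget matters in practice: long-horizon tasks can take hundreds of iterations, so the cap and the amount of history fed back to the model are the main tuning knobs in a loop like this.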
Poolside’s approach to reinforcement learning is also distinct. The company uses async on-policy agent RL: the model learns from rollouts generated by its own current policy, while rollout generation and weight updates run asynchronously rather than in lockstep. The agent harness the company uses internally is the same one used for benchmarking, which reduces the gap between evaluation and real-world performance. The company has also done its own work on the Muon optimizer and built a training codebase called Titan, though details remain sparse.
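Because Poolside has published few details, the toy simulation below is not its method. It illustrates one generic way to keep asynchronous training close to on-policy: tag each rollout with the policy version that produced it and discard any rollout that lags the learner by more than a fixed number of updates. Every name and parameter here (`simulate`, `sync_every`, `max_staleness`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    policy_version: int  # version of the weights that generated this rollout


def simulate(num_ticks: int, sync_every: int, max_staleness: int) -> tuple:
    """Return (used, dropped) rollout counts for a toy actor/learner pair."""
    learner_version = 0  # incremented on every weight update
    actor_version = 0    # the snapshot the actor is currently sampling from
    used = dropped = 0
    for tick in range(num_ticks):
        if tick % sync_every == 0:
            actor_version = learner_version  # actor pulls the latest weights
        rollout = Rollout(policy_version=actor_version)
        staleness = learner_version - rollout.policy_version
        if staleness <= max_staleness:
            used += 1
            learner_version += 1  # one update per accepted rollout
        else:
            dropped += 1          # too far off-policy to train on
    return used, dropped


print(simulate(num_ticks=8, sync_every=4, max_staleness=2))
print(simulate(num_ticks=8, sync_every=1, max_staleness=2))
```

With infrequent syncs some rollouts are thrown away as stale; syncing every tick wastes nothing but gives up the throughput benefit of decoupling actors from the learner. Real systems sit between these extremes.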
The risks are clear. Poolside is a small company in a capital-intensive business. Pre-training foundation models costs tens of millions of dollars per run. The company has raised significant venture funding, including $126 million across its seed and Series A rounds in 2023 and a $500 million Series B in 2024, but it is competing against labs with near-unlimited budgets. OpenAI, Anthropic, and Google can afford to train models that are general-purpose and then fine-tune them for coding. Poolside has to win on specialization and execution.
The company’s focus on agentic coding also means it is betting against the idea that general-purpose models will eventually absorb all coding-specific capabilities. If GPT-6 or Gemini 3 can match or exceed Laguna M.1 on SWE-bench without sacrificing generality, Poolside’s moat narrows. But if coding is a domain that rewards deep specialization — models trained from scratch on code, with architectures optimized for long-horizon reasoning and tool use — then Poolside’s approach could yield compounding advantages.
The release of XS.2 as open weights is the most immediately consequential move. It gives the open-source community a Western-trained coding model that can run on consumer hardware and compete with models from DeepSeek and Qwen. If developers adopt it, build tools around it, and contribute improvements, Poolside gains a distribution channel and a feedback loop that no closed model can match.
The question for AI builders is whether to bet on Poolside’s ecosystem or wait for the next generation of general-purpose models. The answer depends on how much you value specialization. If your workflow is long-horizon software engineering — patching a legacy codebase, refactoring a monolith, writing integration tests — a model trained from scratch on that task may outperform a general model that was taught to write poetry and solve math problems. If your workflow is more varied, a general model may still be the better choice.
Poolside is now live with real users, real models, and real benchmarks. The company’s next move will be the most telling: whether it can sustain the pace of improvement, keep the open-weight community engaged, and convert benchmark wins into real-world adoption. The Laguna family is no longer a promise. It is a product.