Xiaomi's MiMo Code open-sources a memory system that beats Claude Code on long tasks

Xiaomi open-sources MiMo Code V0.1.0, a terminal AI coding assistant that beats Claude Code on long-horizon tasks with a cross-session memory system.

Tessera Newsroom · 4 min read · July 1, 2026

Source Xiaomi's new open source, agentic AI coding harness MiMo Code beats ... (venturebeat.com)

On June 10, Xiaomi’s MiMo AI team open-sourced MiMo Code V0.1.0, a terminal-native AI coding assistant that the company says outperforms Anthropic’s Claude Code on long-horizon, multi-step coding tasks. The Chinese electronics giant reports that its tool beats Claude Code on SWE-bench Verified (82% vs. 79%), SWE-bench Pro (62% vs. 57%), and Terminal Bench 2 (73% vs. 68%) when paired with Xiaomi’s MiMo-V2.5-Pro model.

The headline numbers are self-reported by Xiaomi and have not been independently verified. But the architecture underneath them matters more than the scores.

MiMo Code is a fork of the open-source OpenCode agent, extended with a cross-session memory system that addresses a fundamental failure mode of current AI coding agents: context window decay. As a coding session lengthens, earlier decisions, conventions, and task state get compacted or lost. The agent forgets what it built, and the developer re-explains.

Xiaomi argues compression is the wrong fix. “What we need is not better compression, but an explicit storage-and-retrieval mechanism that decides what information should be written into persistent structures, and when it should be recalled,” the MiMo team wrote in their launch blog.

The memory system uses SQLite FTS5 full-text search across four layers: a persistent MEMORY.md file, session checkpoints, scratch notes, and per-task progress logs. A separate “checkpoint-writer” subagent runs alongside the primary coding agent, recording decisions and state without pausing the main work. When the context window approaches its limit, the primary agent consults the subagent and rebuilds its environment from structured checkpoints.
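
A minimal sketch of how a layered memory store could sit on top of SQLite FTS5 follows. The four layer names mirror the article; the table schema and the `open_store`, `remember`, and `recall` helpers are hypothetical and are not taken from MiMo Code's source. It also assumes the local SQLite build ships with FTS5 enabled.

```python
import sqlite3

# Layer names follow the article; everything else here is an assumption.
LAYERS = ("memory_md", "checkpoint", "scratch", "progress")


def open_store(path=":memory:"):
    """Open a database holding one FTS5 table for all memory layers."""
    conn = sqlite3.connect(path)
    # Only `body` is indexed for search; layer and session_id are stored as-is.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memory "
        "USING fts5(body, layer UNINDEXED, session_id UNINDEXED)"
    )
    return conn


def remember(conn, layer, session_id, body):
    """Write one note into the given layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    conn.execute(
        "INSERT INTO memory (body, layer, session_id) VALUES (?, ?, ?)",
        (body, layer, session_id),
    )


def recall(conn, query, limit=5):
    """Full-text search across all layers, best match first."""
    # bm25() returns lower values for better matches, so ascending order works.
    rows = conn.execute(
        "SELECT layer, session_id, body FROM memory "
        "WHERE memory MATCH ? ORDER BY bm25(memory) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = open_store()
remember(conn, "memory_md", "s1", "Project convention: use pytest fixtures, never unittest classes")
remember(conn, "checkpoint", "s2", "Decided to move the auth module to token refresh")
remember(conn, "progress", "s2", "Task 3 of 5 done: database schema migration applied")

hits = recall(conn, "migration")
for layer, session_id, body in hits:
    print(layer, session_id, body)
```

Keeping every layer in one indexed table means a single query can surface a convention from an old session next to a checkpoint from the current one, which is the retrieval behavior the MiMo team describes.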

Two self-improvement mechanisms round out the design. A /dream command runs roughly every seven days, reviewing historical sessions, deduplicating them, and compressing them into long-term memory. A “distill” function mines past sessions for repeated workflows that can be automated, an approach similar to ones taken by OpenAI and Anthropic.
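
The consolidation pass can be pictured as a schedule check plus a deduplication step. The sketch below is an illustration under stated assumptions: the seven-day interval comes from the article, while `dream_due`, `normalize`, and `deduplicate` are hypothetical helpers, and the real system presumably uses a model to compress notes rather than exact-match deduplication.

```python
from datetime import datetime, timedelta

DREAM_INTERVAL = timedelta(days=7)  # "roughly every seven days" per the article


def dream_due(last_run, now):
    """True when no consolidation has run yet or the interval has elapsed."""
    return last_run is None or now - last_run >= DREAM_INTERVAL


def normalize(note):
    """Collapse whitespace and case so trivially different copies compare equal."""
    return " ".join(note.lower().split())


def deduplicate(notes):
    """Keep the first occurrence of each normalized note, preserving order."""
    seen = set()
    kept = []
    for note in notes:
        key = normalize(note)
        if key not in seen:
            seen.add(key)
            kept.append(note)
    return kept


# Hypothetical notes gathered from earlier sessions.
notes = [
    "Use pnpm, not npm",
    "use  pnpm, not NPM",
    "API base path is /v2",
]
unique_notes = deduplicate(notes)
print(unique_notes)
```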

The harness itself accounts for a measurable share of the gain. Running the same MiMo-V2.5-Pro model in both harnesses, MiMo Code scored 62% on SWE-bench Pro versus 57% for Claude Code, and 73% on Terminal Bench 2 versus 68%. That is roughly five percentage points each, attributable purely to the agent system rather than the model.
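
The arithmetic behind that comparison is simple enough to check by hand. The short sketch below uses only the four scores quoted above and separates the absolute gap in percentage points from the relative improvement, which are easy to conflate; the function names are illustrative.

```python
# Scores quoted in the article, same MiMo-V2.5-Pro model in both harnesses.
scores = {
    "SWE-bench Pro": {"MiMo Code": 62, "Claude Code": 57},
    "Terminal Bench 2": {"MiMo Code": 73, "Claude Code": 68},
}


def point_gap(benchmark):
    """Absolute difference in percentage points."""
    row = scores[benchmark]
    return row["MiMo Code"] - row["Claude Code"]


def relative_gain(benchmark):
    """Improvement relative to the Claude Code score, in percent."""
    row = scores[benchmark]
    return 100 * point_gap(benchmark) / row["Claude Code"]


for name in scores:
    print(f"{name}: {point_gap(name):+d} points, {relative_gain(name):.1f}% relative")
```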

Xiaomi ran a human double-blind A/B evaluation during its internal beta, covering 576 developers working in 474 real private repositories, producing 1,213 judged head-to-head pairs against Claude Code using the same target model. Under 200 execution steps, the two systems split roughly 50/50. Past 200 steps, MiMo Code’s win rate rose above 65%.

That is the real signal. The benchmarks “still measure one-shot problem-solving ability,” Xiaomi concedes. The human evaluation captures the multi-session design goal.

MiMo Code lands in a crowded field of terminal-based coding agents: Anthropic’s Claude Code, OpenAI’s Codex CLI, Google’s Gemini CLI, and open-source players like OpenCode and Aider. Xiaomi chose Claude Code as its sole named competitor throughout its materials. Independent reference points suggest why. On the official Terminal-Bench 2.0 leaderboard at tbench.ai, OpenAI’s Codex CLI running GPT-5.5 scores 82.2%, roughly nine points above MiMo Code’s self-reported 73%. On SWE-bench Pro, however, OpenAI reports GPT-5.5 at 58.6%, below MiMo Code’s claimed 62%. MiMo Code does not yet appear on either official leaderboard, and cross-comparing self-run numbers against leaderboard submissions carries the usual configuration caveats.

The pricing is aggressive. MiMo-V2.5 starts at $0.40 per million input tokens and $2.00 per million output tokens, while V2.5-Pro runs $1.00/$3.00 per million up to 256K context, doubling beyond that. Cache hits drop input costs to as little as $0.20 per million. For comparison, Anthropic’s Claude Opus 4.8 runs $5.00/$25.00 per million, and OpenAI’s GPT-5.5 runs $5.00/$30.00 per million. MiMo Code also supports third-party backends, including token plans from DeepSeek, Moonshot’s Kimi, and Zhipu’s GLM, along with any OpenAI-compatible API.
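
To see what those list prices mean in practice, the sketch below costs out one workload across the four models. The per-million prices are the ones quoted above; the workload of 20 million input tokens and 2 million output tokens is hypothetical, and the calculation deliberately ignores cache-hit discounts and the doubled V2.5-Pro rate beyond 256K context.

```python
# USD per million tokens (input, output), as quoted in the article.
PRICES = {
    "MiMo-V2.5": (0.40, 2.00),
    "MiMo-V2.5-Pro": (1.00, 3.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def cost_usd(model, input_tokens, output_tokens):
    """List-price cost of a workload, with no caching or long-context surcharge."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 20M input tokens and 2M output tokens.
workload = (20_000_000, 2_000_000)
costs = {model: cost_usd(model, *workload) for model in PRICES}

for model, cost in costs.items():
    print(f"{model}: ${cost:,.2f}")
```

Under those assumptions the same workload costs $26 on V2.5-Pro against $150 on Claude Opus 4.8 and $160 on GPT-5.5, though real bills depend heavily on cache hit rates and context length.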

The bigger story is what this says about the AI coding stack. Scaffolding and harness engineering are becoming as important as raw model capability. Xiaomi’s V2.5-Pro post-training was explicitly designed to instill “harness awareness” — training the model to manage its own memory and context within agent scaffolds like Claude Code or OpenCode. A Xiaomi-built harness optimized around that capability is a logical next step.

The effort is led by Fuli Luo, a veteran of DeepSeek’s disruptive R1 project. Xiaomi has been building its MiMo AI division since releasing the MiMo-7B reasoning model in April 2025, followed by the MiMo-VL vision-language series, MiMo-V2-Flash, the 1-trillion-parameter MiMo-V2-Pro in March 2026, and the V2.5 flagship family in April 2026. The company is the world’s third-largest smartphone maker and has a fast-growing EV business.

The terminal AI coding agent wars are going global. Xiaomi’s entry brings aggressive pricing, an MIT-licensed open-source tool, and a memory architecture that targets the most visible failure mode of current agents. The human evaluation data suggests the memory system works on long tasks. The benchmarks are unverified. The pricing is real.
