Research / T-2026-3733

Why CORE matters: reasoning improvement without the rollout tax

CORE lets language models improve reasoning with minimal rollouts by generating insights from contrasting success and failure.

Tessera Newsroom · 2 min read · May 28, 2026

Source: CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning (arxiv.org)

The standard recipe for improving a language model’s reasoning is expensive. Run hundreds of samples, generate thousands of rollouts, update weights or optimize prompts. The cost is baked into the assumption that improvement requires scale.

A new paper posted to arXiv challenges that assumption. CORE (Contrastive Reflection) takes a different approach: instead of accumulating data, it accumulates insights.

The algorithm works by comparing a model’s successful and unsuccessful reasoning traces on a given problem. From that comparison, it generates a short natural-language “insight”: a compact description of a reasoning strategy or constraint that separates the correct path from the wrong one. That insight gets stored and fed into the prompt on future attempts.
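The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Trace`, `InsightStore`, `reflect`, and `stub_generate_insight` are hypothetical, and the insight generator is a canned stand-in for what would be a language-model call in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    """One reasoning attempt on a problem (hypothetical structure)."""
    problem: str
    reasoning: str
    correct: bool


@dataclass
class InsightStore:
    """Keeps short natural-language insights; model weights are never touched."""
    insights: List[str] = field(default_factory=list)

    def add(self, insight: str) -> None:
        insight = insight.strip()
        if insight and insight not in self.insights:  # skip blanks and duplicates
            self.insights.append(insight)

    def build_prompt(self, problem: str) -> str:
        """Prepend stored insights to the prompt for a future attempt."""
        if not self.insights:
            return f"Problem: {problem}"
        bullets = "\n".join(f"- {text}" for text in self.insights)
        return f"Insights from earlier attempts:\n{bullets}\n\nProblem: {problem}"


def reflect(traces: List[Trace], store: InsightStore,
            generate_insight: Callable[[Trace, Trace], str]) -> bool:
    """Contrast one success with one failure and store the resulting insight.

    Returns False when there is no contrast to learn from.
    """
    successes = [t for t in traces if t.correct]
    failures = [t for t in traces if not t.correct]
    if not successes or not failures:
        return False
    store.add(generate_insight(successes[0], failures[0]))
    return True


def stub_generate_insight(success: Trace, failure: Trace) -> str:
    """Assumption: stand-in for a language-model call; returns a canned insight."""
    return "Convert all quantities to the same unit before adding them."


store = InsightStore()
traces = [
    Trace("Add 2 km and 300 m.", "2 + 300 = 302", correct=False),
    Trace("Add 2 km and 300 m.", "2000 m + 300 m = 2300 m", correct=True),
]
learned = reflect(traces, store, stub_generate_insight)
prompt = store.build_prompt("Add 1.5 km and 250 m.")
print(prompt)
```

The store holds plain strings, so an insight can be read, edited, or deleted by hand before the next attempt. How the paper selects which traces to contrast, and how it words the reflection prompt, is not covered by this sketch.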

The results are striking. According to the paper, across four reasoning tasks CORE outperforms both parametric methods like GRPO and non-parametric baselines like episodic RAG and MemRL while using fewer rollouts. With as few as five training samples, it matches or exceeds the gains of methods that require hundreds.

This is not a marginal efficiency gain. It is a structural shift in how models can improve. The insight is not a weight update or a cached trace. It is an abstraction, written in natural language, that the model can reuse and combine. The paper shows that CORE is also more context-efficient than its peers, requiring fewer prompt tokens to store the same knowledge.

What makes CORE surprising is that it works at all. The received wisdom in reasoning research is that models need many examples to generalize. CORE suggests that the bottleneck is not the number of examples but the ability to extract the right signal from the contrast between success and failure. On the paper's evidence, a handful of well-chosen insights can do more than hundreds of additional samples.

For builders, the implication is practical. CORE opens a path to rapid, interpretable model improvement without the infrastructure cost of RL pipelines or the brittleness of prompt optimization. The insights are human-readable. They can be inspected, edited, and combined. The algorithm is non-parametric, meaning the model’s weights stay untouched.

The paper is available on arXiv at http://arxiv.org/abs/2605.28742v1. The question it leaves open is whether the approach scales to tasks where the reward signal is noisy or the reasoning traces are long. That is the next test.
