The AI agent stack has a fundamental design problem. Teams build multi-step agent logic on top of tooling built for single-turn chat. The gateway does not know what evals are. The eval framework does not talk to the router. The observability tool can surface a hallucination, but it cannot tell the gateway to stop routing traffic to the failing model. Every new capability means another vendor, another integration, another thing to break.
Respan, formerly Keywords AI, builds the alternative: a unified control plane that merges LLM gateway functions, evaluations, observability, and prompt optimization into a single automated platform. The company, founded by Andy Li and Raymond Huang and backed by Y Combinator (Winter 2024 batch), announced a $5 million seed round in March 2026 led by Gradient Ventures. The pitch is not incremental. It is a structural bet that the next generation of production agents will be routed not by price alone, but by trace patterns, eval signals, and real-time reliability scores.
The numbers are worth pausing on. Respan says it has routed more than 80 trillion tokens cumulatively in production, processes more than 1 billion logs and 2 trillion tokens every month, and supports more than 6.5 million end users. The customer list includes Retell AI, Mem0, Mercor, AlphaSense, Octolane AI, and Gumloop. For a company that launched as Keywords AI in early 2024, that is a fast climb.
What makes Respan different from the dozen other AI observability and gateway tools is the closed loop. In a fragmented stack, an eval tool detects a hallucination, but the gateway keeps routing traffic to the failing model because nothing told it to stop. Respan’s gateway is natively coupled with real-time evals. A hallucination trigger in production does not just end up in a log file. It immediately informs the LLM router, which triggers automated fallbacks to more reliable models or switches to safe-mode prompts. The system learns from its own production traces.
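To make the closed loop concrete, here is a minimal Python sketch of an eval-aware router. It is not Respan's implementation or API. The class name `EvalAwareRouter`, the sliding-window size, the failure-rate threshold, and the model names are all hypothetical, chosen only to show how an eval signal could feed a routing decision.

```python
from collections import deque


class EvalAwareRouter:
    """Toy router that skips models whose recent eval failure rate is too high."""

    def __init__(self, models, window=20, max_failure_rate=0.2):
        self.models = list(models)  # ordered by preference (hypothetical names)
        self.max_failure_rate = max_failure_rate
        # One sliding window of recent eval outcomes per model (True = passed).
        self.results = {m: deque(maxlen=window) for m in self.models}

    def record_eval(self, model, passed):
        """Feed one production eval outcome back into the router."""
        self.results[model].append(passed)

    def failure_rate(self, model):
        window = self.results[model]
        if not window:
            return 0.0  # no evidence yet, so treat the model as healthy
        return window.count(False) / len(window)

    def choose(self):
        """Return the first preferred model that is still healthy."""
        for model in self.models:
            if self.failure_rate(model) <= self.max_failure_rate:
                return model
        # Every model is failing: fall back to the least-bad one.
        return min(self.models, key=self.failure_rate)


router = EvalAwareRouter(["primary-model", "backup-model"], window=10)
first_choice = router.choose()

# Simulate an eval flagging five hallucinations in a row on the primary model.
for _ in range(5):
    router.record_eval("primary-model", passed=False)

second_choice = router.choose()
```

In this sketch the router prefers `primary-model` until its recent evals start failing, then shifts to `backup-model` without anyone editing a config. A production system would also need recovery probes so a drained model can earn traffic back.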
The architecture works in three stages, as described in the Gradient Ventures announcement. First, multi-environment deployment with decoupled dev, staging, and production environments, granular RBAC, load balancing, automated fallbacks, and a managed semantic caching layer. Second, long-context and agent-first consumption: handling edge cases like NGINX tuning and asynchronous logging so that agents hitting 100k-token contexts do not time out. Third, the eval-aware gateway itself, where trace patterns become routing decisions in real time.
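The asynchronous logging mentioned in the second stage can be illustrated with a small Python sketch. This is a generic pattern under stated assumptions, not a description of how Respan does it: the `AsyncLogger` class and the in-memory `written` list (standing in for a real log sink) are hypothetical. The idea is that the request path only enqueues a record, and a background thread does the slow write.

```python
import queue
import threading


class AsyncLogger:
    """Toy logger: callers enqueue records; a worker thread writes them."""

    def __init__(self):
        self._queue = queue.Queue()
        self.written = []  # stand-in for a real log sink (assumption)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record):
        # Returns immediately, so a long agent request never waits on logging.
        self._queue.put(record)

    def _drain(self):
        while True:
            record = self._queue.get()
            if record is None:  # sentinel: stop the worker
                break
            self.written.append(record)

    def close(self):
        """Flush everything queued so far, then stop the worker."""
        self._queue.put(None)
        self._worker.join()


logger = AsyncLogger()
for step in range(3):
    logger.log({"step": step, "context_tokens": 100_000})
logger.close()
```

Because the queue is first-in, first-out and a single worker drains it, records arrive at the sink in the order they were logged.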
This matters because the cost structure of agent workloads is changing. Inference costs have dropped low enough that teams can run agents in loops. But the operational cost of a five-vendor stack is not dropping. Every integration point is a potential failure mode. Teams debugging fragmented traces and manually syncing data across tools spend more time on plumbing than on the agent itself. Respan’s bet is that as agent workloads scale toward trillions of tokens, the teams running unified platforms will lap the teams running Frankenstein stacks.
The timing aligns with a broader shift in the AI infrastructure market. LangChain, Weights & Biases, and Arize AI all offer pieces of the observability and evaluation puzzle. But none of them owns the gateway. Respan’s insight is that the gateway is the natural control point. If you control the routing layer, you control the feedback loop. You can decide which model gets traffic based on real-time eval signals. You can drain traffic from a failing model before the developer even knows there is a problem.
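One way to picture draining traffic from a failing model is as a weight calculation over eval pass rates. The Python sketch below is illustrative only: the function `traffic_weights`, the 0.8 health floor, and the model names are assumptions, not anything Respan has published.

```python
def traffic_weights(pass_rates, floor=0.8):
    """Map per-model eval pass rates to routing weights that sum to 1.

    Models below `floor` are drained to zero traffic; the remaining models
    share traffic in proportion to their pass rate.
    """
    healthy = {m: r for m, r in pass_rates.items() if r >= floor}
    if not healthy:
        # Nothing is healthy: send everything to the least-bad model.
        best = max(pass_rates, key=pass_rates.get)
        return {m: (1.0 if m == best else 0.0) for m in pass_rates}
    total = sum(healthy.values())
    return {m: healthy.get(m, 0.0) / total for m in pass_rates}


# Hypothetical pass rates from recent production evals.
weights = traffic_weights({"model-a": 0.95, "model-b": 0.60, "model-c": 0.95})
```

Here `model-b` falls below the floor and receives no traffic, while `model-a` and `model-c` split the load evenly. The point of the sketch is that the decision is computed from eval data at the routing layer, with no human in the loop.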
There is a risk, of course. Unified platforms trade flexibility for coherence. A team that wants to use LangSmith for evals, Portkey for routing, and Datadog for observability will not buy Respan. But the calculus changes when the agent is mission-critical and the stack is breaking under load. Respan’s argument is that the edge cases that take down most stacks at scale (long-context timeouts, async logging under load, and eval signals that never reach the router) are handled at the platform level.
Respan’s founders bring a specific background. Li worked as a product design engineer at Apple, designing parts for AirPods, before dropping out of UIUC. Huang graduated with highest honors in three years, taught himself software engineering within a year, and declined robotics PhD offers from top schools. The team of 15 is based in San Francisco and is currently hiring for sales, security engineering, and frontend roles.
The open question is whether the market wants a unified control plane or best-in-class components. Respan’s early traction suggests that teams scaling agents are feeling the pain of fragmentation acutely enough to try a different model. If the eval-aware gateway becomes the default routing layer for production agents, the company has a chance to own the most important piece of the agent infrastructure stack.
What to watch: whether Respan opens its eval-aware routing logic as a protocol that other tools can plug into, or keeps the loop proprietary. The former would accelerate adoption. The latter would protect the moat. Either way, the loop is the product, and the loop is what the fragmented stack cannot deliver.