TauricResearch open-sourced TradingAgents in January and has been shipping updates every few weeks since. The framework, now at v0.2.5, does something most AI trading projects avoid: it models the messy internal dynamics of a real trading desk, complete with analysts who disagree, researchers who debate, and a portfolio manager who has the final say. The result is a research tool that feels less like a black-box predictor and more like a simulation of how human teams actually make investment decisions.
The architecture is the story. TradingAgents decomposes a single trading decision into nine distinct LLM-powered roles. The Analyst Team splits into four specialists: a Fundamentals Analyst evaluating company financials, a Sentiment Analyst aggregating news headlines, StockTwits, and Reddit chatter, a News Analyst monitoring macroeconomic indicators, and a Technical Analyst using indicators like MACD and RSI to detect price patterns. The Researcher Team then assigns one bullish and one bearish researcher to critically assess the analysts’ findings through structured debates. A Trader agent composes the reports and proposes a trade. Finally, a Risk Management team evaluates market volatility and liquidity before the Portfolio Manager approves or rejects the transaction.
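The staged hand-off described above can be sketched as a simple sequential pipeline. This is a minimal illustration, not the framework's actual code: the `DeskState` class, the `stub` helper, and the role identifiers are all hypothetical names, and each agent is a placeholder function where the real system would make an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable

# An "agent" here is any function from the shared state to a text report.
# These are stubs standing in for LLM-backed agents (an assumption).
Agent = Callable[["DeskState"], str]


@dataclass
class DeskState:
    ticker: str
    trade_date: str
    reports: dict[str, str] = field(default_factory=dict)


def stub(role: str) -> Agent:
    """Return a placeholder agent that records which earlier reports it could see."""
    def run(state: DeskState) -> str:
        seen = ", ".join(state.reports) or "nothing yet"
        return f"{role} on {state.ticker} (saw: {seen})"
    return run


# Stage order follows the article: analysts, researchers, trader, risk, PM.
ROLES = (
    "fundamentals_analyst", "sentiment_analyst", "news_analyst",
    "technical_analyst", "bull_researcher", "bear_researcher",
    "trader", "risk_management", "portfolio_manager",
)
PIPELINE: list[tuple[str, Agent]] = [(role, stub(role)) for role in ROLES]


def run_desk(ticker: str, trade_date: str) -> DeskState:
    """Run every role in order, letting each one read all earlier reports."""
    state = DeskState(ticker, trade_date)
    for role, agent in PIPELINE:
        state.reports[role] = agent(state)
    return state


state = run_desk("AAPL", "2026-05-01")
print(state.reports["portfolio_manager"])
```

The point of the shared state object is that later roles see everything produced before them, which is what lets the Portfolio Manager rule on the whole record rather than on a single report.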
This division of labor is not cosmetic. Each agent runs its own prompt, and the model behind each role is configurable: users can assign a powerful reasoning model like GPT-5.5 to the complex analytical roles and a cheaper, faster model like GPT-5.4-mini to the quick tasks. The framework supports OpenAI, Google, Anthropic, xAI, DeepSeek, Qwen, GLM, MiniMax, and OpenRouter, plus Ollama for local models and Azure OpenAI for enterprise deployments. The configuration is granular enough that a researcher could run the same ticker through a dozen different model combinations and compare the resulting decisions.
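One way to picture this kind of role-to-model assignment is a small lookup with a strong and a cheap tier. The sketch below is an assumption-laden illustration rather than the project's configuration API: `ModelSpec`, `model_for`, and `DEEP_ROLES` are hypothetical, the split of roles into tiers is a guess, and the lowercase model strings are placeholders, not confirmed API identifiers.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    provider: str
    model: str


# Hypothetical split: which roles count as "complex" is a guess for illustration.
DEEP_ROLES = frozenset({"bull_researcher", "bear_researcher", "portfolio_manager"})


def model_for(
    role: str,
    deep: ModelSpec,
    quick: ModelSpec,
    overrides: Optional[dict[str, ModelSpec]] = None,
) -> ModelSpec:
    """Pick a model for a role: explicit override first, then the tier default."""
    if overrides and role in overrides:
        return overrides[role]
    return deep if role in DEEP_ROLES else quick


deep = ModelSpec("openai", "gpt-5.5")        # placeholder identifier
quick = ModelSpec("openai", "gpt-5.4-mini")  # placeholder identifier
print(model_for("portfolio_manager", deep, quick).model)  # gpt-5.5
print(model_for("news_analyst", deep, quick).model)       # gpt-5.4-mini

# A comparison grid: every (deep, quick) pairing to try on the same ticker.
candidates = ["gpt-5.5", "gpt-5.4-mini", "gpt-4.1"]
grid = list(product(candidates, repeat=2))
print(len(grid))  # 9
```

Looping over `grid` and recording the final decision for each pairing is the kind of model-combination comparison the paragraph describes.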
What separates TradingAgents from the typical GitHub AI trading project is its honest treatment of reproducibility. The README is unusually direct about the problem. “TradingAgents is LLM-driven, so two runs of the same ticker and date can differ,” it states. “This is expected for a research tool built on language models, not a defect.” The variation comes from three sources: language model sampling is non-deterministic even at fixed temperature, reasoning models vary the most because their internal reasoning is itself sampled, and live data sources like news and social media return different content as time passes.
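Since run-to-run variation is expected, a natural first step for a researcher is to measure it. The helper below is a hypothetical sketch, not part of the framework: it takes the final decisions from repeated runs of the same ticker and date and reports the most common decision together with the share of runs that agreed with it. The sample decisions are made up for illustration.

```python
from collections import Counter


def decision_stability(decisions: list[str]) -> tuple[str, float]:
    """Return the modal decision and the fraction of runs that produced it."""
    if not decisions:
        raise ValueError("need at least one run")
    top, count = Counter(decisions).most_common(1)[0]
    return top, count / len(decisions)


# Illustrative outcomes from five repeated runs (made-up data).
runs = ["BUY", "BUY", "HOLD", "BUY", "SELL"]
print(decision_stability(runs))  # ('BUY', 0.6)
```

A low agreement share is a signal to report a distribution of outcomes rather than a single run's verdict.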
The framework provides pragmatic mitigations. Users can set temperature to 0.0 and pair it with a non-reasoning model like GPT-4.1, which honors temperature settings more faithfully than reasoning models. The company identity is resolved deterministically from the ticker before any agent runs, and the market analyst grounds exact price and indicator claims in a verified data snapshot. Earlier versions had issues with agents hallucinating different companies or fabricating price levels across runs; the current release addresses both through these grounding mechanisms.
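The idea of grounding numeric claims in a snapshot can be illustrated with a simple after-the-fact check. To be clear, this is not how TradingAgents implements grounding. It is a hypothetical sketch that scans a report for decimal numbers and flags any that match no value in a trusted snapshot. The snapshot values and the report text are made up.

```python
import re

# Trusted values for one ticker and date (made-up numbers for illustration).
SNAPSHOT = {"close": 189.84, "rsi": 61.2}


def ungrounded_claims(text: str, snapshot: dict[str, float],
                      tolerance: float = 0.005) -> list[float]:
    """Return decimal numbers in `text` that match no snapshot value.

    A number counts as grounded if it is within `tolerance` (relative)
    of at least one snapshot value.
    """
    numbers = [float(m) for m in re.findall(r"\d+\.\d+", text)]
    return [
        n for n in numbers
        if not any(abs(n - v) <= tolerance * abs(v) for v in snapshot.values())
    ]


report = "AAPL closed at 189.84 with RSI at 61.2; resistance sits near 205.50."
print(ungrounded_claims(report, SNAPSHOT))  # [205.5]
```

A check like this only catches numbers with a decimal point and knows nothing about context, so it is best read as a demonstration of the principle: exact figures should trace back to data the model did not generate.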
The persistence layer is another thoughtful touch. TradingAgents maintains a decision log at ~/.tradingagents/memory/trading_memory.md that records every completed run. On the next analysis for the same ticker, the framework fetches the realized return, generates a one-paragraph reflection, and injects the most recent same-ticker decisions plus recent cross-ticker lessons into the Portfolio Manager prompt. This creates a primitive form of experience accumulation across runs. The framework also supports LangGraph checkpoint resume, so a crashed or interrupted run resumes from the last successful step instead of starting over.
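The decision-log pattern is easy to prototype. The sketch below assumes a made-up entry layout (a `## TICKER | DATE | DECISION` header followed by a reflection paragraph); the real file's format is not described in the article, and `append_decision` and `recent_decisions` are hypothetical helpers. It writes to a temporary directory rather than the real `~/.tradingagents` path.

```python
import tempfile
from pathlib import Path


def append_decision(log: Path, ticker: str, date: str,
                    decision: str, reflection: str = "") -> None:
    """Append one run to the markdown log, creating the file if needed."""
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(f"## {ticker} | {date} | {decision}\n{reflection}\n\n")


def recent_decisions(log: Path, ticker: str, limit: int = 3) -> list[str]:
    """Return up to `limit` most recent entries for `ticker`, oldest first."""
    if not log.exists():
        return []
    # Assumes reflections never contain the "## " header marker themselves.
    entries = [e.strip() for e in log.read_text(encoding="utf-8").split("## ")
               if e.strip()]
    same = [e for e in entries if e.split(" | ", 1)[0] == ticker]
    return same[-limit:]


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "memory" / "trading_memory.md"
    append_decision(log, "AAPL", "2026-05-01", "BUY", "Momentum call held up.")
    append_decision(log, "MSFT", "2026-05-01", "HOLD", "No clear edge.")
    append_decision(log, "AAPL", "2026-05-02", "SELL", "Entered too late.")
    history = recent_decisions(log, "AAPL", limit=2)

# The retrieved entries would be pasted into the next prompt for this ticker.
memory_block = "\n\n".join(history)
print(memory_block)
```

Keeping the log as plain markdown means a human can read and audit exactly what the agent will be told about its own past.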
The v0.2.5 release, dated May 2026, adds a grounded Sentiment Analyst, GPT-5.5 model coverage, dual-region support for Qwen, GLM, and MiniMax, non-US alpha benchmarks, and ticker path-traversal hardening. The multi-region support is notable because it reflects a real engineering challenge: Chinese LLM providers like Qwen, GLM, and MiniMax maintain separate API endpoints for domestic and international users, with different authentication and rate limits. TradingAgents handles both with environment variables like DASHSCOPE_API_KEY for the international endpoint and DASHSCOPE_CN_API_KEY for the China endpoint.
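Selecting a credential by region might look something like the sketch below. Only the two environment variable names come from the article; the region labels `"intl"` and `"cn"` and the `resolve_dashscope_key` function are hypothetical, and the framework's own lookup logic may differ.

```python
import os
from typing import Mapping

# Variable names are from the article; the region labels are assumptions.
KEY_VARS = {"intl": "DASHSCOPE_API_KEY", "cn": "DASHSCOPE_CN_API_KEY"}


def resolve_dashscope_key(region: str,
                          env: Mapping[str, str] = os.environ) -> str:
    """Return the API key for the requested region or fail with a clear error."""
    if region not in KEY_VARS:
        raise ValueError(f"unknown region {region!r}; expected one of {sorted(KEY_VARS)}")
    var = KEY_VARS[region]
    key = env.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key


# Passing a plain dict instead of os.environ keeps the example self-contained.
fake_env = {"DASHSCOPE_API_KEY": "intl-key", "DASHSCOPE_CN_API_KEY": "cn-key"}
print(resolve_dashscope_key("cn", fake_env))  # cn-key
```

Failing loudly when a key is missing avoids the quieter failure mode of sending one region's credential to the other region's endpoint.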
The framework covers any market that Yahoo Finance supports, using exchange-suffixed tickers. US stocks like AAPL and SPY work alongside Hong Kong’s 0700.HK, Tokyo’s 7203.T, London’s AZN.L, India’s RELIANCE.NS, and China A-shares like 600519.SS for Kweichow Moutai. Crypto pairs like BTC-USD and ETH-USD are also supported. The alpha benchmark resolves automatically per market, comparing returns against the local index.
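Per-market benchmark resolution can be sketched as a lookup on the exchange suffix. The mapping below is an assumption for illustration: the index symbols are commonly used Yahoo Finance tickers, but which benchmarks the framework actually pairs with each market is not stated in the article. In this sketch, unsuffixed tickers and crypto pairs fall through to a default, which may not match the real behavior. Alpha is computed here in its simplest form, as return in excess of the benchmark.

```python
# Hypothetical suffix-to-index table (not taken from the framework).
BENCHMARKS = {
    ".HK": "^HSI",       # Hang Seng
    ".T": "^N225",       # Nikkei 225
    ".L": "^FTSE",       # FTSE 100
    ".NS": "^NSEI",      # Nifty 50
    ".SS": "000001.SS",  # SSE Composite
}
DEFAULT_BENCHMARK = "^GSPC"  # S&P 500 for unsuffixed tickers (assumption)


def benchmark_for(ticker: str) -> str:
    """Map an exchange-suffixed ticker to a local index symbol."""
    _, dot, suffix = ticker.upper().rpartition(".")
    if not dot:
        return DEFAULT_BENCHMARK
    return BENCHMARKS.get("." + suffix, DEFAULT_BENCHMARK)


def alpha(asset_return: float, benchmark_return: float) -> float:
    """Simple excess return over the benchmark for the same period."""
    return asset_return - benchmark_return


for t in ("AAPL", "0700.HK", "7203.T", "600519.SS"):
    print(t, "->", benchmark_for(t))
```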
TauricResearch is careful to frame this as a research scaffold, not a trading system. “TradingAgents framework is designed for research purposes,” the README states. “Trading performance may vary based on many factors, including the chosen backbone language models, model temperature, trading periods, the quality of data, and other non-deterministic factors. It is not intended as financial, investment, or trading advice.” The backtest results are not guaranteed to match any published figure, and the team advises treating the framework as a tool for studying multi-agent analysis rather than a strategy with a fixed, replicable return.
The most interesting question TradingAgents raises is about the sociology of multi-agent systems. By forcing models to debate each other, the framework creates a kind of adversarial verification that single-agent systems lack. The bullish and bearish researchers are explicitly instructed to find flaws in each other’s arguments. This mirrors the real-world function of a trading desk’s research department, where analysts are rewarded for poking holes in consensus views. Whether LLMs can actually perform this function reliably is an open question, but the framework provides a testbed for studying it.
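The debate structure itself is simple to express: two speakers alternate for a fixed number of rounds, and each sees the full transcript so far. The sketch below is hypothetical and uses stub functions in place of LLM calls; `debate`, `bull`, and `bear` are illustrative names, not the framework's API.

```python
from typing import Callable

# A debater maps (analyst summary, transcript so far) to its next argument.
Debater = Callable[[str, list[str]], str]


def debate(summary: str, bull: Debater, bear: Debater,
           rounds: int = 2) -> list[str]:
    """Alternate bull and bear turns, each seeing everything said before."""
    transcript: list[str] = []
    for _ in range(rounds):
        for name, speaker in (("bull", bull), ("bear", bear)):
            transcript.append(f"{name}: {speaker(summary, transcript)}")
    return transcript


def bull(summary: str, transcript: list[str]) -> str:
    # Stub: a real agent would prompt an LLM to attack the bear's last point.
    target = f"rebutting turn {len(transcript)}" if transcript else "opening"
    return f"upside case ({target})"


def bear(summary: str, transcript: list[str]) -> str:
    # Stub: a real agent would prompt an LLM to attack the bull's last point.
    return f"downside case (rebutting turn {len(transcript)})"


transcript = debate("Strong earnings, stretched valuation", bull, bear)
for line in transcript:
    print(line)
```

Because each turn receives the whole transcript, a researcher can vary the number of rounds or swap the models behind each side and observe how the final positions shift.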
For AI builders, TradingAgents offers a concrete example of how to structure multi-agent workflows with LangGraph, handle non-determinism honestly, and build persistence into agent systems. The decision log with reflection is a pattern that could transfer to other domains where agents need to learn from past failures. The honest treatment of reproducibility is a model for how open-source AI projects should document their limitations. The framework is not going to replace human traders, but it might help researchers understand what happens when you let LLMs argue with each other about money.