Meituan's LongCat-2.0: A 1.6T-Parameter MoE Trained on Chinese Chips, Now Open Source

Meituan released LongCat-2.0, a 1.6-trillion-parameter MoE model trained on 50,000 Huawei Ascend cards, the largest known training run on domestic Chinese silicon.

The model scores 59.5 on SWE-bench Pro and 70.8 on Terminal-Bench, with a 1M-token context window via LongCat Sparse Attention, activating 33B-56B parameters per token.

LongCat-2.0 is open-source, marking the first public trillion-parameter model trained entirely on non-NVIDIA hardware, challenging US export control assumptions.

Meituan releases LongCat-2.0, a 1.6T-parameter open-source MoE model trained on 50,000 domestic chips, with a 1M-token context window and a top-three OpenRouter ranking by call volume.

Tessera Newsroom · 4 min read · June 30, 2026

Source: LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active (longcat.chat)

Meituan released LongCat-2.0 today, a 1.6-trillion-parameter Mixture-of-Experts model trained end-to-end on a cluster of 50,000 domestic compute cards. The model, detailed on the LongCat blog, is open-source and marks the first time a trillion-parameter model trained entirely on non-NVIDIA hardware has been released to the public. It supports a native 1-million-token context window via LongCat Sparse Attention, activates 33 billion to 56 billion parameters per token, and ranks in the top three on OpenRouter by call volume.

The model is built for agentic coding and software engineering. On SWE-bench Pro it scores 59.5, and on Terminal-Bench it reaches 70.8. These are not frontier-chasing numbers, but they are production-viable numbers for a model that costs less to run than comparable frontier models because it was trained on cheaper, domestic hardware. The architecture introduces three technical innovations: LongCat Sparse Attention (LSA) for the 1M context window, MOPD multi-expert fusion that combines agent, reasoning, and interaction experts, and zero-computation experts with dynamic activation that keep the active parameter count between 33B and 56B despite the 1.6T total.
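To make the zero-computation idea concrete, here is a minimal sketch of how routing some tokens to parameter-free experts lets the active parameter count vary per token. It is a toy under stated assumptions, not Meituan's implementation: the expert pool, the parameter sizes, the router scores, and the names `Expert`, `route`, and `active_params` are all hypothetical. Only the 33B, 56B, and 1.6T figures come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str
    params: int  # parameters used when this expert runs; 0 for a zero-computation expert


def route(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest-scoring experts (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def active_params(experts: list[Expert], scores: list[float], k: int,
                  shared_params: int = 0) -> int:
    """Parameters touched for one token: always-on shared weights plus the chosen experts."""
    return shared_params + sum(experts[i].params for i in route(scores, k))


# Hypothetical toy pool: four ordinary FFN experts and two zero-computation experts.
EXPERTS = ([Expert(f"ffn{i}", 1_000) for i in range(4)]
           + [Expert(f"zero{i}", 0) for i in range(2)])

# Hypothetical router scores for two tokens.
hard_token = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3]  # router prefers the FFN experts
easy_token = [0.1, 0.2, 0.3, 0.4, 0.9, 0.8]  # router prefers the zero-computation experts

hard = active_params(EXPERTS, hard_token, k=3, shared_params=500)
easy = active_params(EXPERTS, easy_token, k=3, shared_params=500)

# Active share implied by the article's figures (33B to 56B active out of 1.6T total).
TOTAL_PARAMS = 1.6e12
low_share = 33e9 / TOTAL_PARAMS
high_share = 56e9 / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"hard token touches {hard} params, easy token touches {easy}")
    print(f"active share of total: {low_share:.2%} to {high_share:.2%}")
```

In the toy, the token that lands on zero-computation experts touches fewer parameters than the one that lands on FFN experts, even though both select the same number of experts. That is the general shape of a per-token range like 33B to 56B; the real routing policy is whatever the LongCat blog specifies.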

The chip story is the real story here. The 50,000-chip cluster used Huawei Ascend cards, not NVIDIA. This is the largest known training run on domestic Chinese silicon. Meituan CEO Wang Xing has been building toward this for two years. The company’s AI strategy has three layers: AI at Work, AI in Products, and Building LLMs. LongCat-2.0 is the third layer, and without it, the first two layers depend on infrastructure Meituan does not control. That is not an acceptable position for China’s largest food delivery platform, which processes tens of millions of orders daily and manages hundreds of thousands of restaurants.

The timing is not accidental. On April 24, DeepSeek released V4, a 1.6-trillion-parameter model also trained on Huawei Ascend chips. Two trillion-parameter models, two companies, a little over two months apart, zero NVIDIA. That is a pattern, not a coincidence. The US export control regime that restricted A100s, H100s, H800s, and H20s assumed that China could not train frontier-class models without access to NVIDIA’s most capable chips. That assumption now looks outdated.

LongCat-2.0 is open-source, a shift from the invited-access preview that Meituan released in April. The open-source release includes the model weights on Hugging Face and GitHub, and the model is available on OpenRouter. That is a strategic choice. DeepSeek V4 is also open-source. The competition between Chinese AI labs is now playing out in the open-source arena, not in closed APIs or benchmark leaderboards. Both labs are betting that the fastest path to adoption is letting developers run the models themselves.

The model’s performance on agentic benchmarks is competitive but not dominant. SWE-bench Pro 59.5 is below the 70+ scores that frontier models like Claude 4.7 and GPT-5 achieve, but it is above the 50-55 range that most open-source models reach. Terminal-Bench 70.8 is stronger, suggesting the model is particularly good at command-line and terminal-based agent tasks. The 1M context window is a genuine differentiator. Most models at this scale cap out at 128K or 256K. Meituan’s LSA mechanism makes the full million tokens computationally feasible.
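A rough count shows why a million-token window needs some form of sparse attention. The sketch below compares the number of query-key scores per head per layer under dense causal attention with a generic scheme in which each query attends to at most a fixed budget of keys. The article does not describe how LSA selects keys, so the budget of 2,048 and both function names are hypothetical, and this is not a model of LSA itself.

```python
def dense_score_entries(n_tokens: int) -> int:
    """Dense causal attention: query i attends to itself and every earlier token."""
    return n_tokens * (n_tokens + 1) // 2


def sparse_score_entries(n_tokens: int, budget: int) -> int:
    """Generic sparse attention: each query attends to at most `budget` keys."""
    if budget >= n_tokens:
        return dense_score_entries(n_tokens)
    # The first `budget` queries see all available keys; the rest are capped at `budget`.
    return budget * (budget + 1) // 2 + (n_tokens - budget) * budget


CONTEXT = 1_000_000  # the 1M-token window from the article
BUDGET = 2_048       # hypothetical per-query key budget, for illustration only

if __name__ == "__main__":
    dense = dense_score_entries(CONTEXT)
    sparse = sparse_score_entries(CONTEXT, BUDGET)
    print(f"dense scores per head per layer:  {dense:,}")
    print(f"sparse scores per head per layer: {sparse:,}")
    print(f"reduction factor: {dense / sparse:.0f}x")
```

Dense cost grows with the square of the context length while the capped version grows linearly, which is why the gap widens as the window gets longer. Selecting which keys to keep has its own cost that this count ignores.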

The open-source release raises a policy question. The US export controls were designed to prevent China from training frontier models. LongCat-2.0 is a frontier-scale model trained on domestic chips. If the policy goal was to slow China’s AI progress, the evidence suggests it has not worked. Instead, it has accelerated the development of domestic chip supply chains and training infrastructure. Huawei’s Ascend 950PR delivers 1.56 petaflops per card at FP4 precision, roughly 2.8 times the FP4 performance of the H20, the chip US policy currently allows into China. ByteDance has reportedly committed $5.6 billion to Ascend orders. Industry estimates put total Chinese demand for Ascend chips at $12 billion to $15 billion in 2026.

The caveat is that aggregate flops and efficient flops are different things. Neither Meituan nor DeepSeek has published detailed training efficiency metrics. The 50,000-chip cluster may compensate for lower per-chip efficiency with brute scale, but that strategy has limits. The question is whether the next generation of domestic chips closes the efficiency gap or widens it.
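The gap between aggregate and efficient flops can be put in numbers with a short sketch. It takes the article's 50,000 cards and the 1.56 petaflops FP4 figure quoted for the Ascend 950PR, and assumes for illustration that the cluster used that card, which the article does not confirm. The utilization values are purely hypothetical, since neither company has published efficiency metrics, and training rarely runs at FP4 peak in practice.

```python
import math

CARDS = 50_000               # cluster size reported in the article
PEAK_PFLOPS_PER_CARD = 1.56  # FP4 peak quoted for the Ascend 950PR (assumed card model)


def aggregate_peak_pflops(cards: int, peak_per_card: float) -> float:
    """Theoretical peak of the whole cluster, ignoring all overheads."""
    return cards * peak_per_card


def effective_pflops(cards: int, peak_per_card: float, utilization: float) -> float:
    """Throughput actually delivered at a given fraction of peak."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return aggregate_peak_pflops(cards, peak_per_card) * utilization


def cards_needed(target_pflops: float, peak_per_card: float, utilization: float) -> int:
    """Cards required to deliver a target effective throughput."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return math.ceil(target_pflops / (peak_per_card * utilization))


if __name__ == "__main__":
    peak = aggregate_peak_pflops(CARDS, PEAK_PFLOPS_PER_CARD)
    print(f"aggregate peak: {peak:,.0f} PFLOPS")
    for util in (0.2, 0.4):  # hypothetical utilization levels
        eff = effective_pflops(CARDS, PEAK_PFLOPS_PER_CARD, util)
        print(f"utilization {util:.0%}: {eff:,.0f} effective PFLOPS")
```

Halving utilization doubles the number of cards needed for the same delivered throughput, which is the sense in which brute scale can stand in for efficiency only up to a point.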

For AI builders, the takeaway is straightforward. There is now a viable open-source alternative to NVIDIA-dependent training infrastructure for trillion-parameter models. The cost advantage is real: domestic chips are cheaper than NVIDIA’s latest offerings, and the cluster economics favor scale over per-chip performance. For developers building agentic coding tools or enterprise automation workflows, LongCat-2.0 is a model that runs on commodity hardware and delivers production-quality results on agent benchmarks. The 1M context window alone makes it worth evaluating for long-document tasks.

The model is not a GPT-5 killer. It is not even the strongest open-source model on the market. But it is the first trillion-parameter model trained entirely on domestic Chinese chips that is available for anyone to download and run. That is a milestone, and it changes the calculus for anyone building AI infrastructure in a world where NVIDIA’s dominance is no longer assumed.
