DATA · MAINTAINED TABLE · T-DATA-LLM
Frontier Model Comparison
A side-by-side reference for today’s frontier large language models: context window, modalities, API pricing, open-weights status, and headline benchmarks. Many flagship models share architectural lineage (mixture-of-experts routing and chain-of-thought reasoning are now common), but they diverge sharply on context length (128K to 10M tokens in the table below), openness, and cost.
| Vendor | Model | Released | Context (tokens) | Modalities | Weights | Input ($ / Mtok) | Output ($ / Mtok) | MMLU | GPQA | SWE-bench |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI | GPT-5.1 | 2025-11 | 400K | text, image, audio | CLOSED | — | — | — | — | — |
| OpenAI | GPT-5 | 2025-08 | 400K | text, image, audio | CLOSED | — | — | — | — | — |
| Anthropic | Claude Opus 4.5 | 2025-11 | 200K | text, image | CLOSED | — | — | — | — | — |
| Anthropic | Claude Sonnet 4.5 | 2025-09 | 200K | text, image | CLOSED | $3.00 | $15.00 | — | — | — |
| Google DeepMind | Gemini 3 Pro | 2025-11 | 1M | text, image, audio, video | CLOSED | — | — | — | — | — |
| Google DeepMind | Gemini 2.5 Pro | 2025-03 | 1M | text, image, audio, video | CLOSED | — | — | — | — | — |
| Meta | Llama 4 Maverick | 2025-04 | 1M | text, image | OPEN | — | — | — | — | — |
| Meta | Llama 4 Scout | 2025-04 | 10M | text, image | OPEN | — | — | — | — | — |
| DeepSeek | DeepSeek-V3 | 2024-12 | 128K | text | OPEN | — | — | — | — | — |
| DeepSeek | DeepSeek-R1 | 2025-01 | 128K | text | OPEN | — | — | — | — | — |
| Mistral AI | Mistral Large 2 | 2024-07 | 128K | text | OPEN | — | — | — | — | — |
| xAI | Grok 4 | 2025-07 | 256K | text, image | CLOSED | — | — | — | — | — |
| Alibaba (Qwen) | Qwen3-235B | 2025-04 | 128K | text | OPEN | — | — | — | — | — |
| Cohere | Command A | 2025-03 | 256K | text | OPEN | — | — | — | — | — |
Figures are compiled from public sources (vendor announcements, model cards, published evals). Prices are in USD per million tokens (Mtok). A value that is not publicly confirmed is shown as — rather than estimated. Benchmark scores vary by harness and prompting; treat them as indicative. Corrections: [email protected]
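To make the per-Mtok pricing columns concrete, the following is a minimal Python sketch of how a request cost could be derived from them. The names `PRICES` and `request_cost` are hypothetical and exist only for this illustration; the only prices used are the Claude Sonnet 4.5 figures from the table, and an unconfirmed cell is assumed to be represented as `None`.

```python
from typing import Optional

# USD per million tokens (Mtok) as (input, output).
# None stands in for a "—" cell, i.e. a price that is not publicly confirmed.
# PRICES is a hypothetical structure for this sketch, not a published schema.
PRICES = {
    "Claude Sonnet 4.5": (3.00, 15.00),  # values taken from the table
    "GPT-5": (None, None),               # unconfirmed in the table
}


def request_cost(model: str, tokens_in: int, tokens_out: int) -> Optional[float]:
    """Return the USD cost of one request, or None if a price is unconfirmed."""
    price_in, price_out = PRICES[model]
    if price_in is None or price_out is None:
        # Propagate the gap instead of estimating, as the table does.
        return None
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


# A 200K-token prompt with a 4K-token reply on Claude Sonnet 4.5:
# (200,000 * 3.00 + 4,000 * 15.00) / 1,000,000 = 0.66 USD
print(request_cost("Claude Sonnet 4.5", 200_000, 4_000))
print(request_cost("GPT-5", 200_000, 4_000))  # None: price not confirmed
```

Returning `None` rather than a default keeps unconfirmed prices from silently turning into zero-cost estimates in downstream comparisons.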
Editor review queue (1)
- **Claude Opus 4.8** – new model release, not in current list.