Benchmarks
Comparative LLM performance data across standardized benchmarks including SWE-Bench, ARC-AGI, GPQA Diamond, and LMSYS Arena.
Comprehensive LLM model intelligence including routing recommendations, benchmarks, and changelog. AI-consumable feeds for autonomous agents.
The Model Intelligence section provides comprehensive tracking of the rapidly evolving LLM landscape. We cover frontier models from major providers (OpenAI, Anthropic, Google) and emerging open-source alternatives (GLM, Qwen, DeepSeek).
- Model Guide — Routing recommendations by task type: a quick reference for which model to use for coding, research, vision, and reasoning tasks.
- Benchmarks — Comparative performance data across standardized benchmarks, including SWE-Bench, ARC-AGI, GPQA Diamond, and LMSYS Arena.
- Changelog — A chronological intelligence log of model releases, updates, and routing changes, with source citations.
| Category | Top Model(s) | Key Strength |
|---|---|---|
| Agentic Coding | Claude Opus 4.6, GPT-5.4 | Complex reasoning, tool use |
| Cost-Efficient | Gemini 3.1 Flash-Lite | $0.25/$1.50 per M tokens |
| Reasoning | Gemini 3.1 Pro | 77.1% ARC-AGI-2 |
| Writing | Claude Sonnet 4.6 | 70% blind test preference |
| Open Weights | GLM-5 | Score of 50 on Intelligence Index v4.0 |
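The table above can be read as a lookup from task category to recommended model. A minimal sketch of that routing logic is below; the model identifier strings and the `pick_model` helper are illustrative assumptions, not an actual API from this site.

```python
# Sketch of task-based model routing mirroring the table above.
# Model ID strings are hypothetical slugs, not official API names.
ROUTING_TABLE = {
    "agentic_coding": ["claude-opus-4.6", "gpt-5.4"],
    "cost_efficient": ["gemini-3.1-flash-lite"],
    "reasoning": ["gemini-3.1-pro"],
    "writing": ["claude-sonnet-4.6"],
    "open_weights": ["glm-5"],
}

def pick_model(task: str, fallback: str = "gemini-3.1-flash-lite") -> str:
    """Return the top recommended model for a task category,
    falling back to the cost-efficient default for unknown tasks."""
    return ROUTING_TABLE.get(task, [fallback])[0]

print(pick_model("reasoning"))      # gemini-3.1-pro
print(pick_model("unknown_task"))   # gemini-3.1-flash-lite (fallback)
```

In practice an agent would refresh this table from the changelog feed rather than hard-coding it, since routing recommendations change as models are released.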
All intelligence is source-backed and updated regularly. For programmatic access, see our API feeds.
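As an illustration of what consuming an AI-readable feed entry might look like, the sketch below parses a single JSON record. The field names and values here are hypothetical examples, not the actual feed schema.

```python
import json

# Hypothetical feed entry; field names are illustrative, not the real schema.
raw_entry = """
{
  "model": "glm-5",
  "category": "open_weights",
  "benchmark": "Intelligence Index v4.0",
  "score": 50,
  "source": "https://example.com/citation"
}
"""

entry = json.loads(raw_entry)
summary = f'{entry["model"]}: {entry["score"]} on {entry["benchmark"]}'
print(summary)  # glm-5: 50 on Intelligence Index v4.0
```

An autonomous agent would fetch such records from the feed endpoint, verify the `source` citation, and update its own routing table accordingly.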