Benchmarks
Comparative LLM performance data across standardized benchmarks including SWE-Bench, ARC-AGI, GPQA Diamond, and LMSYS Arena.
Comprehensive LLM model intelligence including routing recommendations, benchmarks, and changelog. AI-consumable feeds for autonomous agents.
The Model Intelligence section provides comprehensive tracking of the rapidly evolving LLM landscape. We cover frontier models from major providers (OpenAI, Anthropic, Google) and emerging open-source alternatives (GLM, Qwen, DeepSeek).
- Model Guide — Routing recommendations by task type: a quick reference for which model to use for coding, research, vision, and reasoning tasks.
- Benchmarks — Comparative performance data across standardized benchmarks, including SWE-Bench, ARC-AGI, GPQA Diamond, and LMSYS Arena.
- Changelog — A chronological intelligence log of model releases, updates, and routing changes, with source citations.
| Category | Top Model(s) | Key Strength |
|---|---|---|
| Agentic Coding | Claude Opus 4.6, GPT-5.4 | Complex reasoning, tool use |
| Cost-Efficient | Gemini 3.1 Flash-Lite | $0.25/$1.50 per M tokens |
| Reasoning | Gemini 3.1 Pro | 77.1% ARC-AGI-2 |
| Writing | Claude Sonnet 4.6 | 70% blind test preference |
| Open Weights | GLM-5 | Score of 50 on Intelligence Index v4.0 |
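The table above can be read as a lookup from task category to recommended model. A minimal sketch of that routing logic is below; the model identifier strings and the `pick_model` helper are illustrative assumptions, not an actual API from this site.

```python
# Sketch of task-based model routing mirroring the table above.
# Model ID strings are hypothetical slugs, not official API names.
ROUTING_TABLE = {
    "agentic_coding": ["claude-opus-4.6", "gpt-5.4"],
    "cost_efficient": ["gemini-3.1-flash-lite"],
    "reasoning": ["gemini-3.1-pro"],
    "writing": ["claude-sonnet-4.6"],
    "open_weights": ["glm-5"],
}

def pick_model(task: str, fallback: str = "gemini-3.1-flash-lite") -> str:
    """Return the top recommended model for a task category,
    falling back to the cost-efficient default for unknown tasks."""
    return ROUTING_TABLE.get(task, [fallback])[0]

print(pick_model("reasoning"))      # gemini-3.1-pro
print(pick_model("unknown_task"))   # gemini-3.1-flash-lite (fallback)
```

In practice an agent would refresh this table from the changelog feed rather than hard-coding it, since routing recommendations change as models are released.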
All intelligence is source-backed and updated regularly. For programmatic access, see our API feeds.
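As an illustration of what consuming an AI-readable feed entry might look like, the sketch below parses a single JSON record. The field names and values here are hypothetical examples, not the actual feed schema.

```python
import json

# Hypothetical feed entry; field names are illustrative, not the real schema.
raw_entry = """
{
  "model": "glm-5",
  "category": "open_weights",
  "benchmark": "Intelligence Index v4.0",
  "score": 50,
  "source": "https://example.com/citation"
}
"""

entry = json.loads(raw_entry)
summary = f'{entry["model"]}: {entry["score"]} on {entry["benchmark"]}'
print(summary)  # glm-5: 50 on Intelligence Index v4.0
```

An autonomous agent would fetch such records from the feed endpoint, verify the `source` citation, and update its own routing table accordingly.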