Claude Sonnet 4.6: Opus-Level Performance, 40% Cheaper — Benchmark Deep Dive

Claude Sonnet 4.6 scores 79.6% on SWE-bench, 72.5% on OSWorld, and 1633 Elo on GDPval-AA, matching or beating Opus 4.6 on production tasks. It is priced at $3/$15 per million tokens, versus $5/$25 for Opus 4.6. This post analyzes Adaptive Thinking, Context Compaction, and the OSWorld growth trajectory.

Did Sonnet Just Beat Opus? — Claude Sonnet 4.6 Benchmark Deep Dive

Anthropic released Claude Sonnet 4.6 on February 17, and it outperforms the flagship Opus 4.6 on several key benchmarks at roughly 40% lower cost. The secret is not that Sonnet is a "cheaper knock-off"; it is structural change at the architecture level.
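
The "roughly 40%" figure can be checked against the prices quoted in the summary above. The following is a minimal sketch, assuming the first number in each pair is the input-token price and the second the output-token price (per million tokens); the helper names and the sample workload are hypothetical, chosen only for illustration.

```python
# Prices in USD per million tokens, as quoted in the summary.
# Assumption: first figure = input price, second figure = output price.
SONNET = {"input": 3.00, "output": 15.00}
OPUS = {"input": 5.00, "output": 25.00}


def cost(prices, input_tokens, output_tokens):
    """Return the USD cost of a workload at the given per-million-token prices."""
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / 1_000_000


def savings(input_tokens, output_tokens):
    """Fraction of the Opus bill saved by running the same workload on Sonnet."""
    opus = cost(OPUS, input_tokens, output_tokens)
    sonnet = cost(SONNET, input_tokens, output_tokens)
    return (opus - sonnet) / opus


# Hypothetical workload: 2M input tokens and 500k output tokens.
print(f"Sonnet: ${cost(SONNET, 2_000_000, 500_000):.2f}")
print(f"Opus:   ${cost(OPUS, 2_000_000, 500_000):.2f}")
print(f"Saved:  {savings(2_000_000, 500_000):.0%}")
```

Because both Sonnet prices are exactly three-fifths of the corresponding Opus prices, the saving comes out to 40% for any mix of input and output tokens, not just this sample workload.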

Opus vs Sonnet: What Changed?

The old Opus-Sonnet dynamic was straightforward. Opus was the full-spec brain; Sonnet was the compressed version. Same architecture, smaller size, naturally lower performance.

In the 4.6 generation, that formula breaks.
