MiniMax M2.5: Opus-Level Performance at $1 per Hour
MiniMax M2.5 scores 80.2% on SWE-bench Verified using only 10B active parameters from a 230B MoE architecture, delivering coding performance comparable to Claude Opus at 1/20th the cost. Covers the Forge RL framework, benchmark analysis, and a pricing comparison.

MiniMax M2.5: Opus-Level Performance at $1 per Hour
On February 12, 2026, Shanghai-based AI startup MiniMax released M2.5. It scores 80.2% on SWE-bench Verified, 76.3% on BrowseComp, and 51.3% on Multi-SWE-Bench, all within 0.6 percentage points of Claude Opus 4.6 at 1/20th the price.
The model is available as open weights on Hugging Face under a modified MIT license. It uses a 230B-parameter MoE architecture that activates only 10B parameters per token at inference time. Running the 100 TPS (tokens per second) Lightning variant continuously for one hour costs about $1.
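As a quick sanity check on that figure, the back-of-the-envelope arithmetic below derives the implied per-token price from the two quoted numbers (100 TPS, $1 per hour). The resulting per-million-token figure is an inference from those numbers, not an official list price.

```python
# Back-of-the-envelope check of the "100 TPS for one hour costs about $1" claim.
# The implied per-million-token price is derived from those two quoted numbers,
# not an official list price.
tps = 100                      # tokens per second (Lightning variant, as quoted)
cost_per_hour = 1.00           # USD per hour, as quoted

tokens_per_hour = tps * 3600                         # 360,000 tokens
usd_per_million = cost_per_hour / tokens_per_hour * 1_000_000

print(f"{tokens_per_hour:,} tokens per hour")        # 360,000 tokens per hour
print(f"~${usd_per_million:.2f} per 1M tokens")      # ~$2.78 per 1M tokens
```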
This post analyzes M2.5's architecture, training methodology, benchmark performance, and pricing structure, and examines what it means for the AI industry.
Architecture: 230B Total, 10B Active
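Before digging into the specifics, here is a minimal, generic top-k MoE routing sketch in PyTorch to make the total-vs-active distinction concrete. The expert count, dimensions, and top-k below are hypothetical illustrations, not M2.5's actual configuration: the point is simply that each token passes through only a small subset of experts, so the parameters touched per token are a fraction of the total.

```python
# Generic top-k MoE routing sketch (illustrative sizes only; NOT M2.5's config).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """A router picks top_k experts per token; only those experts run."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (n_tokens, d_model)
        logits = self.router(x)                    # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # each token hits only top_k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)                         # torch.Size([16, 64])
```

In this toy layer, total parameters grow with the number of experts, but the compute per token grows only with top_k, which is the same lever a 230B-total / 10B-active model pulls at scale.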