Qwen3-Max-Thinking Snapshot Release: A New Standard in Reasoning AI

The recent trend in the LLM market goes beyond simply learning "more data" — it's now focused on "how the model thinks." Alibaba Cloud has released an API snapshot (qwen3-max-2026-01-23) of its most powerful model, Qwen3-Max-Thinking.

Beyond simple text generation, this model thinks deeply like a human and autonomously selects its own tools. Here's a summary of why it's shaking up the current AI landscape.

"Thinking" AI: Test-time Scaling

The most significant feature of Qwen3-Max-Thinking is the introduction of Reasoning Mode. Before providing an answer, this model strengthens its reasoning steps (thinking mode) and integrates tool calls into its reasoning flow as needed.

Multi-round Self-verification: Improves reasoning quality through multi-round test-time scaling and self-verification (self-correction) loops.
Parallel Test-time Compute: Combined with a code interpreter, it maximizes mathematical reasoning capabilities through parallel test-time compute.
Accuracy and Traceability: Provides accurate and logically traceable answers for technical problems in algebra, number theory, probability, and more.

Self-selecting Tools: Adaptive Tool-use

While previous models only used tools (search, code execution, etc.) specified by users, Qwen3-Max-Thinking autonomously selects tools based on conversation context.

According to Model Studio documentation, Thinking mode integrates 3 built-in tools into the reasoning process through interleaved thinking:

Web Search: Automatically calls search engines when up-to-date information is needed.
Webpage Content Extraction: Extracts and analyzes webpage content.
Code Interpreter: Writes and executes Python code on the spot when complex calculations or data analysis are required.

Benchmarks: Perfect Scores in Math Reasoning with Tools

Qwen3-Max-Thinking achieved top scores in math reasoning under tool usage + scaled test-time compute conditions.

Benchmark	Score	Conditions
AIME 2025	100%	Code interpreter + parallel test-time compute
HMMT	100%	Code interpreter + parallel test-time compute
GPQA	Excellent	PhD-level scientific reasoning

Technical Specifications

Alibaba has demonstrated through this model what over 1 trillion parameters combined with reinforcement learning can achieve.

Parameters: 1T+ (trillion scale)
Training Data: 36T tokens
Training Context: Up to 1M tokens possible with ChunkFlow technology
Architecture: MoE (Mixture of Experts)

Service Context Windows

Model	Context	Max Input	Max Output
qwen3-max (Non-thinking)	262,144	258,048	65,536
qwen3-max-2026-01-23 (Thinking)	81,920	-	-

Note: The Thinking snapshot is documented with context (81,920) as the primary specification. Detailed limits may vary by deployment/invocation method—refer to the latest documentation.

Pricing (Tiered by Token Range)

Qwen3-Max applies tiered pricing based on input token ranges:

Input Token Range	Input Price (per 1M)	Output Price (per 1M)
≤32K	$1.20	$6.00
32K~128K	$2.40	$12.00
128K+	$3.00	$15.00

Note: Pricing tables differ by deployment mode (International/US/Mainland China). This table is based on the $1.2/$6 tier in the Model Studio documentation. For the latest prices, check the deployment-specific tables in the Model Studio documentation.

Future Roadmap

Researchers have announced improvements in the following areas:

Multilingual Reasoning: Enhanced reasoning capabilities in languages beyond English
Safety Alignment: Generating safer AI responses
Robustness under Distribution Shift: Resilience in scenarios different from training data

Try It Now

"Evolution from an AI that knows a lot, to an AI that truly knows how to think"

Qwen3-Max-Thinking is currently available through the following channels:

Web: chat.qwen.ai (Qwen Chat)
API: Alibaba Cloud Model Studio (qwen3-max-2026-01-23 snapshot)

Enterprise users can test tool usage and step-by-step reasoning capabilities across various fields including finance, research, and operations.

Qwen3-Max-Thinking Snapshot Release: A New Standard in Reasoning AI