
Reasoning Models

DeepSeek R1, o3, QwQ — a new class of model that thinks before it speaks.

Through 2024 and 2025, a new category of model emerged that did not fit the old scaling laws. Instead of being made bigger, reasoning models were trained to think longer, spending more compute at inference time. The result: dramatic gains on math, code, and multi-step problems, sometimes 30+ percentage points over their base models.

Capability vs compute — and the regime shift with reasoning models

[Figure: capability plotted against log(compute). Labeled points: GPT-2, GPT-3, GPT-4, and "Reasoning models?"; annotation: "RL on reasoning".]

Old laws: more compute → smooth capability gains. New regime: training with RL on verifiable rewards unlocks step changes.

What makes a reasoning model different

A reasoning model is post-trained with reinforcement learning on verifiable rewards. The training loop:

  1. Give the model a math/code problem with a known answer
  2. Let it generate a long chain of thought + final answer
  3. If the answer is correct, reward the chain
  4. Repeat for millions of problems
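As a rough illustration of steps 1-3, the sketch below scores sampled completions against a known answer. It is a toy, not any lab's actual training code: the `Answer:` marker, the function names, and the sample completions are all assumptions made for the example, and the mean-reward baseline is just one common way to turn raw rewards into a learning signal.

```python
import re
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, known_answer: str) -> float:
    """1.0 if the final answer matches the known answer exactly, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer == known_answer.strip() else 0.0


# Hypothetical completions sampled for one problem: "What is 17 * 6?"
samples: List[str] = [
    "17 * 6 = 17 * 5 + 17 = 85 + 17 = 102. Answer: 102",
    "17 * 6 is about 100. Answer: 100",
    "Let me check: 6 * 17 = 60 + 42 = 102. Answer: 102",
]

# Step 3: reward each chain only by whether its final answer is correct.
rewards = [verifiable_reward(s, "102") for s in samples]

# Subtract the group mean so correct chains get a positive signal and
# incorrect ones a negative signal (an assumed, simplified baseline).
baseline = sum(rewards) / len(rewards)
advantages = [r - baseline for r in rewards]
```

In a real training run, these advantages would weight a policy-gradient update on the tokens of each chain; that part is omitted here because it depends on the training framework.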

The model learns to search and verify within its own context window. It generates 5,000-30,000 tokens of "thinking" — backtracking, checking work, exploring alternatives — before emitting a final answer.

The visible chain-of-thought

These models expose their reasoning to varying degrees. DeepSeek R1 emits a <think>...</think> block with its scratchwork before answering, and QwQ does the same. o3 does not return its raw chain of thought; it returns reasoning summaries via the API. Either way, you can watch the model reason.

Sometimes this is fascinating ("ah, I made an error, let me reconsider..."). Sometimes it's embarrassing ("the user is asking about X but I'll pretend to know..."). Either way, it's a new layer of observability.
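For models that emit a <think>...</think> block, the scratchwork can be separated from the final answer with a few lines of string handling. The sketch below assumes the raw completion arrives as a single string containing at most one such block; the function name `split_reasoning` is hypothetical, and some hosted APIs may return the reasoning in a separate field instead.

```python
import re
from typing import Tuple

# Non-greedy match across newlines for a single <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> Tuple[str, str]:
    """Split a raw completion into (reasoning, final_answer).

    If no <think> block is present, reasoning is the empty string.
    """
    match = THINK_RE.search(raw)
    if match is None:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the block is treated as the user-facing answer.
    answer = (raw[: match.start()] + raw[match.end():]).strip()
    return reasoning, answer


raw_output = "<think>2 + 2: add the units, no carry. Result 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(raw_output)
```

Logging `reasoning` separately from `answer` is one way to use the trace for debugging without showing it to end users.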

When reasoning models win

They dominate when:

  • The task has a verifiable answer (math, code, logic puzzles)
  • Multi-step reasoning is required
  • The problem can be decomposed

They tie or lose when:

  • The task is creative writing
  • Speed matters (they're 10-100× slower)
  • The task is simple Q&A or summary

Cost and latency trade-offs

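A reasoning model spends 5,000-30,000 tokens of internal thought. At $15/M tokens that is $0.075 to $0.45 per query for the thinking alone. Compare that with about $0.005 for the same query on GPT-4.1: a long 30,000-token trace costs about 90× more, while a short 5,000-token trace costs about 15× more. On hard problems that use most of the thinking budget, reasoning models are 70-100× more expensive per request.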
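The arithmetic behind these figures is simple enough to check directly. The sketch below uses only the numbers quoted in the paragraph above ($15 per million tokens, 5,000-30,000 thinking tokens, $0.005 for the standard model); the function name is hypothetical, and real bills also include prompt and answer tokens, which are ignored here.

```python
def thinking_cost_usd(thinking_tokens: int, price_per_million_usd: float) -> float:
    """Cost of the thinking tokens alone, in US dollars."""
    return thinking_tokens / 1_000_000 * price_per_million_usd


PRICE_PER_MILLION = 15.0      # $/M tokens, as quoted in the text
STANDARD_QUERY_COST = 0.005   # $ per query on the standard model, as quoted

low = thinking_cost_usd(5_000, PRICE_PER_MILLION)    # short trace
high = thinking_cost_usd(30_000, PRICE_PER_MILLION)  # long trace

ratio_low = low / STANDARD_QUERY_COST
ratio_high = high / STANDARD_QUERY_COST

print(f"short trace: ${low:.3f} ({ratio_low:.0f}x the standard query)")
print(f"long trace:  ${high:.3f} ({ratio_high:.0f}x the standard query)")
```

The multiplier scales linearly with trace length, so the headline ratio depends heavily on how long the model actually thinks for a given workload.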

Latency: 30-90 seconds typical for hard problems vs 2-5 seconds for standard models. Not a chat UX — more like an async tool you queue work for.

The major reasoning models in 2026

  • OpenAI o3 — best overall, expensive ($60/M output). Multimodal.
  • OpenAI o4-mini — 90% of o3 capability, 10% of the cost.
  • DeepSeek R1 (0528 refresh) — open weights, free via OpenRouter, frontier-tier on math/code.
  • Google Gemini 2.5 Pro Thinking — strong, generous free tier.
  • Alibaba QwQ-32B — open weights, strong reasoning at small scale.
  • Anthropic Claude Opus 4 (extended thinking), a reasoning mode you toggle on in Claude.

When to use them in production

Reasoning models go in your expensive lane: hard customer questions that need a real answer, code generation for non-trivial tasks, math/finance/science workflows. Use a routing classifier to send only the hard 10% of queries to a reasoning model; the easy 90% go to a fast cheap model.

The trick: spend reasoning model compute only where it matters.
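A minimal sketch of such a router follows. Everything in it is an assumption made for illustration: the model names are placeholders, and the keyword-plus-length heuristic stands in for whatever classifier you would actually train or prompt. The point is only the shape of the pattern: score the query, compare against a threshold, pick a lane.

```python
FAST_MODEL = "fast-cheap-model"      # hypothetical model name
REASONING_MODEL = "reasoning-model"  # hypothetical model name

# Toy signals that a query may need multi-step reasoning (assumed list).
HARD_HINTS = ("prove", "debug", "derive", "optimize", "step by step", "calculate")


def difficulty_score(query: str) -> float:
    """Toy heuristic in [0, 1]: keyword hits plus a small length bonus."""
    text = query.lower()
    hits = sum(hint in text for hint in HARD_HINTS)
    length_bonus = min(len(text.split()) / 200, 0.5)
    return min(hits * 0.3 + length_bonus, 1.0)


def route(query: str, threshold: float = 0.5) -> str:
    """Send queries at or above the threshold to the reasoning model."""
    if difficulty_score(query) >= threshold:
        return REASONING_MODEL
    return FAST_MODEL


easy = route("What are your opening hours?")
hard = route("Debug this function and prove the loop terminates")
```

In practice the threshold would be tuned on logged traffic so that roughly the hardest tenth of queries lands in the reasoning lane, and a small classifier model would usually replace the keyword heuristic.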

Knowledge Check


Q1. How are reasoning models trained differently from standard LLMs?

Q2. How much more expensive is a reasoning model per query on average?

Q3. When does a reasoning model NOT outperform a standard model?

Q4. Best production pattern for reasoning models?


LLMAtlas — The Open Ecosystem Workspace for LLMs