How to Choose a Model

More than 1,000 LLMs are publicly tracked, and you will never benchmark all of them. Use this five-axis framework to narrow the field instead.

The five axes

Axis	Question	Where it matters
Cost	What is the price in $ per million tokens?	High-volume apps
Latency	What are the time-to-first-token and tokens/sec?	User-facing chat
Quality	How does it score on MMLU, GSM8K, or your own eval?	Anything quality-sensitive
Context	How much text can it hold in one request?	Long docs, RAG
Compliance	Where does your data go?	Regulated industries
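
One way to apply the five axes is to treat each one as a minimum requirement and drop any candidate that misses one. The sketch below assumes that approach. The names Candidate, Requirements, passes, and shortlist are hypothetical, and every number is a placeholder, not a real price or benchmark result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    usd_per_million_tokens: float  # Cost
    tokens_per_second: float       # Latency
    eval_score: float              # Quality: average 1-5 rating on your own prompts
    context_tokens: int            # Context
    meets_data_policy: bool        # Compliance


@dataclass(frozen=True)
class Requirements:
    tokens_per_month: int
    max_monthly_usd: float
    min_tokens_per_second: float
    min_eval_score: float
    min_context_tokens: int
    needs_data_policy: bool


def monthly_cost_usd(c: Candidate, tokens_per_month: int) -> float:
    """Cost axis: price per million tokens scaled to expected volume."""
    return c.usd_per_million_tokens * tokens_per_month / 1_000_000


def passes(c: Candidate, r: Requirements) -> bool:
    """True only if the candidate clears the floor on every axis."""
    return (
        monthly_cost_usd(c, r.tokens_per_month) <= r.max_monthly_usd
        and c.tokens_per_second >= r.min_tokens_per_second
        and c.eval_score >= r.min_eval_score
        and c.context_tokens >= r.min_context_tokens
        and (c.meets_data_policy or not r.needs_data_policy)
    )


def shortlist(candidates: list[Candidate], r: Requirements) -> list[Candidate]:
    """Keep the candidates that satisfy all five requirements."""
    return [c for c in candidates if passes(c, r)]


# Placeholder figures for illustration only.
CANDIDATES = [
    Candidate("model-a", 0.50, 250.0, 3.8, 128_000, False),
    Candidate("model-b", 3.00, 80.0, 4.5, 200_000, True),
    Candidate("model-c", 0.20, 400.0, 3.1, 32_000, True),
]

REQUIREMENTS = Requirements(
    tokens_per_month=200_000_000,
    max_monthly_usd=500.0,
    min_tokens_per_second=50.0,
    min_eval_score=3.5,
    min_context_tokens=100_000,
    needs_data_policy=False,
)

if __name__ == "__main__":
    for c in shortlist(CANDIDATES, REQUIREMENTS):
        print(c.name, monthly_cost_usd(c, REQUIREMENTS.tokens_per_month))
```

With these placeholder numbers, model-b is over the monthly budget and model-c falls below the quality floor, so only model-a survives. Hard floors are a design choice here: a weighted score would let a cheap model offset a compliance failure, which is rarely what you want.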

The shortlist heuristic

Pick 3 candidates, run 5 of your real prompts through each, and rate each output from 1 to 5. The model with the highest average rating on your own prompts is your pick. Public benchmarks are a starting point, not a verdict.
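
The tally for this heuristic can be sketched in a few lines. This assumes you have already collected your 1-5 ratings by hand. The function name pick_winner, the candidate names, and the scores are all hypothetical placeholders.

```python
from statistics import mean


def pick_winner(ratings: dict[str, list[int]]) -> tuple[str, dict[str, float]]:
    """Return the model with the highest average rating, plus all averages.

    ratings maps a model name to its 1-5 scores, one score per prompt.
    On a tie, the model listed first wins.
    """
    prompt_counts = {len(scores) for scores in ratings.values()}
    if len(prompt_counts) != 1 or 0 in prompt_counts:
        raise ValueError("every model needs a score for the same prompts")
    for model, scores in ratings.items():
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"ratings for {model} must be between 1 and 5")

    averages = {model: mean(scores) for model, scores in ratings.items()}
    winner = max(averages, key=averages.get)
    return winner, averages


# Placeholder ratings: 3 candidates, 5 prompts each.
RATINGS = {
    "candidate-a": [4, 3, 5, 4, 4],
    "candidate-b": [5, 4, 4, 5, 4],
    "candidate-c": [3, 3, 4, 2, 4],
}

if __name__ == "__main__":
    winner, averages = pick_winner(RATINGS)
    for model, avg in averages.items():
        print(f"{model}: {avg:.1f}")
    print("winner:", winner)
```

Five prompts is a small sample, so treat a narrow gap between two averages as a tie and add prompts before deciding.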

When in doubt

Need it free, fast, and good? → Llama 3.3 70B via Groq.
Need long context? → Gemini 1.5 Flash (1M-token context window, free tier available).
Need top-tier reasoning? → DeepSeek V3 or one of the distilled R1 models.
Need code? → Qwen 2.5 Coder 32B.

Open the Comparison Lab to test any three of these on your prompts in under a minute.
