Zero-shot, Few-shot & Chain-of-Thought
Three prompting modes, each with a sweet spot.
A prompt can include zero, one, or many examples. It can ask for a direct answer or a step-by-step reasoning trace. These choices have names, and they strongly affect output quality.
Zero-shot
You just ask. No examples, no reasoning scaffold:
Classify the sentiment: "The pizza was cold but the staff was lovely."
Zero-shot works when the task is common in training data (sentiment, summarisation, translation, simple Q&A). It tends to break down when the task is unusual or needs a specific output format the model is unlikely to have seen.
Few-shot
You give 2–5 worked examples before the real one:
Classify the sentiment: "The pizza was burnt." → negative
Classify the sentiment: "Great service, fast delivery." → positive
Classify the sentiment: "Food was fine, nothing special." → neutral
Classify the sentiment: "The pizza was cold but the staff was lovely." →
Few-shot teaches by demonstration. It's the most reliable way to nail a specific output format. The examples also shape edge cases: showing a "neutral" example makes the model less prone to forcing every input into positive/negative.
Quality > quantity. A handful of diverse, high-quality examples usually beats a long list of mediocre ones. Cover edge cases, and avoid skewed example sets: if every example is positive, or every example is short, the model may copy that pattern.
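As a rough sketch of how the few-shot prompt above could be assembled in code: the helper names (build_few_shot_prompt, label_counts) are made up for illustration and don't come from any library. The second helper is just a quick way to spot a skewed example set before you send anything.

```python
from typing import Dict, List, Tuple

def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    instruction: str = "Classify the sentiment:",
) -> str:
    """Render labelled examples followed by the unlabelled query."""
    lines = [f'{instruction} "{text}" → {label}' for text, label in examples]
    # The last line ends at the arrow so the model fills in the label.
    lines.append(f'{instruction} "{query}" →')
    return "\n".join(lines)

def label_counts(examples: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often each label appears, to reveal a lopsided example set."""
    counts: Dict[str, int] = {}
    for _, label in examples:
        counts[label] = counts.get(label, 0) + 1
    return counts

# The three worked examples from the text.
EXAMPLES = [
    ("The pizza was burnt.", "negative"),
    ("Great service, fast delivery.", "positive"),
    ("Food was fine, nothing special.", "neutral"),
]

prompt = build_few_shot_prompt(
    EXAMPLES, "The pizza was cold but the staff was lovely."
)
print(prompt)
print(label_counts(EXAMPLES))
```

Keeping the examples as data rather than a hard-coded string makes it easy to swap in edge cases and check label balance as the set grows.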
Chain-of-thought (CoT)
Ask the model to show its work:
Q: A bat and a ball cost $1.10 together. The bat costs $1 more than the ball.
How much does the ball cost? Think step by step.
A: Let the ball cost x. Then the bat costs x + 1.
Total: x + (x + 1) = 1.10 → 2x + 1 = 1.10 → 2x = 0.10 → x = $0.05.
For non-reasoning models, CoT often improves accuracy substantially on math, logic, and multi-step problems. The intermediate tokens act as a scratchpad: the model conditions each new step on its own previous reasoning.
For reasoning models (DeepSeek R1, o3, QwQ), CoT happens internally, so you usually shouldn't ask for it explicitly. They've been RL-trained to reason before answering; prompting "think step by step" can sometimes hurt.
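One way to act on this is to add the CoT cue only when the target model doesn't reason internally. The sketch below assumes you already know which kind of model you're calling (the is_reasoning_model flag is a hypothetical parameter, not something an API reports). It also includes a simple heuristic for pulling the final dollar amount out of a trace like the one above; real answers will need a parser suited to their own format.

```python
import re
from typing import Optional

COT_SUFFIX = "Think step by step."

def build_prompt(question: str, is_reasoning_model: bool) -> str:
    """Append the CoT cue only for models that do not reason internally."""
    if is_reasoning_model:
        return question
    return f"{question} {COT_SUFFIX}"

def extract_final_amount(answer: str) -> Optional[str]:
    """Return the last dollar amount in a reasoning trace, or None.

    Heuristic: assumes the final answer is the last "$..." figure written.
    """
    amounts = re.findall(r"\$\d+(?:\.\d+)?", answer)
    return amounts[-1] if amounts else None

QUESTION = (
    "A bat and a ball cost $1.10 together. The bat costs $1 more than "
    "the ball. How much does the ball cost?"
)
# The trace from the text, used here as a stand-in for a model response.
TRACE = (
    "Let the ball cost x. Then the bat costs x + 1.\n"
    "Total: x + (x + 1) = 1.10 → 2x = 0.10 → x = $0.05."
)

print(build_prompt(QUESTION, is_reasoning_model=False))
print(build_prompt(QUESTION, is_reasoning_model=True))
print(extract_final_amount(TRACE))
```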
Combining them
The most powerful pattern is few-shot + CoT: give 2–3 examples where each example shows the reasoning explicitly. The model learns both the format and the thinking style.
Q: ... A: First, ... Then, ... So the answer is ...
Q: ... A:
This combo is the workhorse of high-stakes prompting. Few-shot prompts with worked reasoning also show up often in research from labs such as Anthropic, OpenAI, and Google when they demonstrate complex agent behaviours.
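Here is a minimal sketch of the Q/A template above as code. The WorkedExample type, the builder function, and both sample problems are invented for illustration; swap in examples from your own task.

```python
from typing import List, NamedTuple

class WorkedExample(NamedTuple):
    question: str
    reasoning: str  # the explicit steps the model should imitate
    answer: str

def build_few_shot_cot_prompt(
    examples: List[WorkedExample], question: str
) -> str:
    """Render each example with its reasoning, then leave the last answer open."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} So the answer is {ex.answer}."
        for ex in examples
    ]
    # The final block stops at "A:" so the model continues in the same style.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

# Illustrative examples only (not from any benchmark).
COT_EXAMPLES = [
    WorkedExample(
        question=(
            "A pen and a notebook cost $3.00 together. The notebook costs "
            "$2.00 more than the pen. How much does the pen cost?"
        ),
        reasoning=(
            "First, let the pen cost x, so the notebook costs x + 2. "
            "Then x + (x + 2) = 3.00, so 2x = 1.00."
        ),
        answer="$0.50",
    ),
    WorkedExample(
        question="Tickets cost $12 each. How much do 4 tickets cost after a $5 discount?",
        reasoning="First, 4 × 12 = 48. Then 48 − 5 = 43.",
        answer="$43",
    ),
]

cot_prompt = build_few_shot_cot_prompt(
    COT_EXAMPLES,
    "A bat and a ball cost $1.10 together. The bat costs $1 more than "
    "the ball. How much does the ball cost?",
)
print(cot_prompt)
```

Ending every example with the same "So the answer is ..." phrase gives you a predictable marker to parse in the model's reply.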
Diminishing returns
As a rule of thumb, after about 5 examples each new example helps less. Past about 10, the added token cost often outweighs the gain. If you need many examples, you probably want fine-tuning, not prompting.
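To get a feel for the token side of this trade-off, you can estimate how the prompt grows with each added example. The sketch below uses a crude assumption of roughly 4 characters per token; real tokenizers vary by model and language, so treat the numbers as ballpark only. Both function names are hypothetical.

```python
from typing import List, Tuple

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumed ratio, not a real tokenizer)."""
    return max(1, round(len(text) / chars_per_token))

def prompt_cost_by_shots(
    examples: List[str], query: str, max_shots: int
) -> List[Tuple[int, int]]:
    """Estimated prompt size in tokens for 0..max_shots examples."""
    limit = min(max_shots, len(examples))
    costs = []
    for k in range(limit + 1):
        prompt = "\n".join(examples[:k] + [query])
        costs.append((k, estimate_tokens(prompt)))
    return costs

SHOTS = [
    'Classify the sentiment: "The pizza was burnt." → negative',
    'Classify the sentiment: "Great service, fast delivery." → positive',
    'Classify the sentiment: "Food was fine, nothing special." → neutral',
]
QUERY = 'Classify the sentiment: "The pizza was cold but the staff was lovely." →'

costs = prompt_cost_by_shots(SHOTS, QUERY, max_shots=3)
for shots, tokens in costs:
    print(f"{shots} examples: ~{tokens} prompt tokens")
```

The estimate grows roughly linearly with the number of examples, and that cost is paid on every request, which is why a large example set starts to favour fine-tuning.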