Fine-Tuning vs Prompting vs RAG
A common (and expensive) mistake is jumping to fine-tuning before exhausting cheaper options. This decision tree picks the cheapest approach that fits:
Does the model already know the answer?
├── Yes → Better prompting
└── No → Does it need facts that change?
    ├── Yes → RAG
    └── No → Does it need a new skill or style?
        ├── Yes → Fine-tuning (LoRA first, full only if needed)
        └── No → Reconsider: maybe a different base model
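
The tree above can be written as a small function, which makes the order of the questions explicit. This is only a sketch: the `Approach` enum, the `choose_approach` name, and the three boolean parameters are hypothetical names introduced for illustration, and in practice each answer comes from your own evaluation, not from a flag.

```python
from enum import Enum


class Approach(Enum):
    # Labels mirror the leaves of the decision tree (names are illustrative).
    PROMPTING = "better prompting"
    RAG = "RAG"
    FINE_TUNING = "fine-tuning (LoRA first, full only if needed)"
    RECONSIDER = "reconsider, maybe a different base model"


def choose_approach(
    knows_answer: bool,
    needs_changing_facts: bool,
    needs_new_skill_or_style: bool,
) -> Approach:
    """Walk the decision tree top to bottom and return the first matching leaf."""
    if knows_answer:
        return Approach.PROMPTING
    if needs_changing_facts:
        return Approach.RAG
    if needs_new_skill_or_style:
        return Approach.FINE_TUNING
    return Approach.RECONSIDER


# Example: the model lacks the knowledge, and the knowledge changes over time.
print(choose_approach(False, True, False).value)
```

Because the questions are checked in order, an earlier "yes" always wins: a model that already knows the answer gets better prompting even if the facts also change.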
Rough costs
| Approach | Setup | Per-request | Maintenance |
|---|---|---|---|
| Prompting | $0 | Inference only | None |
| RAG | $50-$500 | Inference + retrieval | Re-index when data changes |
| Fine-tuning (LoRA) | $20-$2,000 | Inference (slightly higher) | Re-tune when the base model is deprecated |
A pragmatic order
- Try harder on the prompt. As a rough rule of thumb, a great prompt closes about 70% of the gap.
- Add RAG. If the gap is factual, retrieval almost always wins.
- Switch base models. Sometimes a bigger or different model already does what you want.
- Then consider fine-tuning, and start with LoRA. Full fine-tuning is rarely worth it.
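
One way to apply this order is as an escalation loop: evaluate each option cheapest-first and stop at the first one that reaches your quality target. The sketch below assumes you already have some evaluation that returns a score per approach. The `first_sufficient_step` helper, the step names, and the scores in the example are hypothetical placeholders, not measurements.

```python
from typing import Callable, Optional, Sequence, Tuple

# A step is a name plus a zero-argument function that returns an eval score.
Step = Tuple[str, Callable[[], float]]


def first_sufficient_step(steps: Sequence[Step], target: float) -> Optional[str]:
    """Evaluate steps in order; return the first whose score meets the target.

    Later (more expensive) steps are never evaluated once one passes.
    Returns None if no step reaches the target.
    """
    for name, evaluate in steps:
        if evaluate() >= target:
            return name
    return None


# Placeholder scores for illustration only; replace with real evaluations.
steps = [
    ("prompting", lambda: 0.62),
    ("rag", lambda: 0.85),
    ("switch base model", lambda: 0.88),
    ("fine-tuning (LoRA)", lambda: 0.90),
]

print(first_sufficient_step(steps, target=0.80))  # prints "rag"
```

Since each evaluation is a callable, the expensive options (such as a fine-tuning run) are only triggered if every cheaper option falls short.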