What is a Large Language Model?
A Large Language Model (LLM) is a neural network trained to predict the next token (roughly, the next word) given the text that comes before it. That's it. Everything else, including chat, code, reasoning, and summarisation, emerges from doing that prediction extremely well at very large scale.
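To make "predict the next word" concrete, here is a minimal sketch in Python. It replaces the neural network with a simple bigram count table. That is an assumption made purely for illustration, since no real LLM works this way. The generation loop is the same idea, though: predict one word, append it, and predict again. The names `train_bigram`, `predict_next`, and `generate` are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count which word follows which: a toy stand-in for a trained model."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(model: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent word seen after `word`, or None if unseen."""
    counts = model.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(model: dict[str, Counter], start: str, max_words: int = 5) -> list[str]:
    """Build text by repeatedly predicting the next word (greedy decoding)."""
    out = [start]
    for _ in range(max_words):
        nxt = predict_next(model, out[-1])
        if nxt is None:  # no known continuation: stop early
            break
        out.append(nxt)
    return out


corpus = ["the cat sat on the mat", "the cat ate the fish", "the dog sat on the rug"]
model = train_bigram(corpus)
print(" ".join(generate(model, "dog", 3)))  # prints "dog sat on the"
```

A real LLM replaces the count table with a neural network that scores every token in its vocabulary and conditions on the whole preceding context, not just the last word.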
The three ideas you need
- Tokens: LLMs don't see characters or words. They see tokens (subword pieces). The word "unbelievable" might be split into three tokens: un, believ, able. API providers typically charge per token, and a model's maximum context window is measured in tokens.
- Parameters: These are the learned weights. A 7B model has roughly 7 billion of them. More parameters generally mean more capability, but also more compute and memory to train and run.
- Context window: How much text, measured in tokens, the model can consider at once. Tiny: 4K (an essay). Modern: 128K (a novel). Frontier: 1M+ (a small codebase).
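The three ideas above can be made concrete with a few lines of Python. This is a sketch under stated assumptions. The tokenizer uses a tiny hand-written vocabulary and greedy longest-match splitting, whereas real tokenizers (byte-pair encoding, for example) learn their vocabulary from data. The context check assumes the prompt and the model's reply share one window, as is typical. The memory estimate assumes 16-bit weights (2 bytes per parameter). All function names and the `VOCAB` set are hypothetical.

```python
from __future__ import annotations

# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands of
# subword pieces from data instead of using a hand-written set like this.
VOCAB = frozenset({"un", "believ", "able", "token", "s"})


def tokenize(text: str, vocab: frozenset[str] = VOCAB) -> list[str]:
    """Split text into vocabulary pieces by greedy longest match.

    Any character the vocabulary cannot cover becomes a one-character token.
    """
    longest = max(len(piece) for piece in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        piece = text[i]  # fallback if no vocabulary piece matches here
        # Try the longest possible piece first, then shrink.
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in vocab:
                piece = text[i:i + size]
                break
        tokens.append(piece)
        i += len(piece)
    return tokens


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """Assumes the prompt and the generated reply share one token budget."""
    return prompt_tokens + max_output_tokens <= context_window


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory to hold the weights alone; 2 bytes/param assumes 16-bit weights."""
    return num_params * bytes_per_param / 1e9


print(tokenize("unbelievable"))              # ['un', 'believ', 'able']
print(fits_in_context(3_000, 1_000, 4_096))  # True
print(weight_memory_gb(7e9))                 # 14.0
```

Under these assumptions, a 7B model needs about 14 GB just to hold its weights, before counting activations or the cache used during generation.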
Why "Large"?
Three scaling axes drive capability:
- Model size (parameters)
- Training data (tokens seen)
- Compute (FLOPs spent training)
The "scaling laws" reported by OpenAI and DeepMind showed that a model's loss (its error at next-token prediction) falls predictably, roughly as a power law, as you scale up all three. That insight helped launch the modern era of AI.
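The scaling-law idea can be sketched numerically. The snippet below rests on two assumptions. The first is the common rule of thumb that training compute is about 6 × parameters × tokens FLOPs. The second is the parametric loss form L(N, D) = E + A/N^α + B/D^β used in DeepMind's Chinchilla study. The default constants are illustrative, approximately the published Chinchilla fit. Treat them as assumptions rather than numbers to plan with. The function names are hypothetical.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Rule of thumb: training compute C is roughly 6 * N * D FLOPs."""
    return 6.0 * num_params * num_tokens


def predicted_loss(num_params: float, num_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style loss curve L(N, D) = E + A / N**alpha + B / D**beta.

    E is the irreducible loss; the other two terms shrink as the model (N)
    and the training data (D) grow. Default constants are illustrative.
    """
    return E + A / num_params ** alpha + B / num_tokens ** beta


# Hold the data fixed and grow the model: compute rises, predicted loss falls.
d_tokens = 1.4e12
for n_params in (1e9, 7e9, 70e9):
    print(f"N={n_params:.0e}  C={training_flops(n_params, d_tokens):.1e} FLOPs  "
          f"loss={predicted_loss(n_params, d_tokens):.3f}")
```

Growing either N or D lowers the predicted loss, but never below E, the loss the model cannot remove however large it gets. That floor is one reason each further gain from scaling costs more compute than the last.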
What's next
In the next lesson we'll look at how to choose a model from the 1,000+ available, which is the same problem LLMAtlas was built to solve.